DeepSeek-OCR

Information

DeepSeek AI

DeepSeek-OCR: Contexts Optical Compression

Explore the boundaries of visual-text compression.

## Usage

Inference using Huggingface transformers on NVIDIA GPUs. Requirements, tested on Python 3.12.9 + CUDA 11.8:

```
torch==2.6.0
transformers==4.46.3
tokenizers==0.20.3
einops
addict
easydict
pip install flash-attn==2.7.3 --no-build-isolation
```

```python
from transformers import AutoModel, AutoTokenizer
import torch
import os

# Select the GPU to run on.
os.environ["CUDA_VISIBLE_DEVICES"] = '0'
model_name = 'deepseek-ai/DeepSeek-OCR'

# The model ships custom code, so trust_remote_code=True is required.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)

# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = 'your_image.jpg'
output_path = 'your/output/dir'

# infer(self, tokenizer, prompt='', image_file='', output_path = ' ', base_size = 1024, image_size = 640, crop_mode = True, test_compress = False, save_results = False):

# Resolution presets:
# Tiny: base_size = 512, image_size = 512, crop_mode = False
# Small: base_size = 640, image_size = 640, crop_mode = False
# Base: base_size = 1024, image_size = 1024, crop_mode = False
# Large: base_size = 1280, image_size = 1280, crop_mode = False
# Gundam: base_size = 1024, image_size = 640, crop_mode = True

res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path = output_path, base_size = 1024, image_size = 640, crop_mode=True, save_results = True, test_compress = True)
```

## vLLM

Refer to [GitHub](https://github.com/deepseek-ai/DeepSeek-OCR/) for guidance on model inference acceleration, PDF processing, and more.
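
The resolution presets listed in the Usage comments (Tiny, Small, Base, Large, Gundam) can be kept in a small lookup table so that a mode name expands into the matching `infer` keyword arguments. This is a minimal sketch, not part of the DeepSeek-OCR repository: `MODE_PRESETS` and `infer_kwargs` are hypothetical names introduced here, and the values are copied from the comments in the Usage snippet.

```python
# Hypothetical helper: MODE_PRESETS and infer_kwargs are illustrative names,
# not functions or constants provided by DeepSeek-OCR.
# Values are taken from the preset comments in the Usage snippet.
MODE_PRESETS = {
    "tiny":   {"base_size": 512,  "image_size": 512,  "crop_mode": False},
    "small":  {"base_size": 640,  "image_size": 640,  "crop_mode": False},
    "base":   {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "large":  {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    "gundam": {"base_size": 1024, "image_size": 640,  "crop_mode": True},
}


def infer_kwargs(mode: str) -> dict:
    """Return a fresh copy of the keyword arguments for the named preset."""
    key = mode.lower()
    if key not in MODE_PRESETS:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(MODE_PRESETS)}"
        )
    # Copy so callers can modify the result without changing the table.
    return dict(MODE_PRESETS[key])
```

With the `model`, `tokenizer`, `prompt`, `image_file`, and `output_path` variables from the Usage snippet, a call might then look like `model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, save_results=True, **infer_kwargs("gundam"))`.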
## Acknowledgement

We would like to thank [Vary](https://github.com/Ucas-HaoranWei/Vary/), [GOT-OCR2.0](https://github.com/Ucas-HaoranWei/GOT-OCR2.0/), [MinerU](https://github.com/opendatalab/MinerU), [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR), [OneChart](https://github.com/LingyvKong/OneChart), and [Slow Perception](https://github.com/Ucas-HaoranWei/Slow-Perception) for their valuable models and ideas.

We also appreciate the benchmarks: [Fox](https://github.com/ucaslcl/Fox), [OmniDocBench](https://github.com/opendatalab/OmniDocBench).

## Citation

```bibtex
@article{wei2024deepseek-ocr,
  title={DeepSeek-OCR: Contexts Optical Compression},
  author={Wei, Haoran and Sun, Yaofeng and Li, Yukun},
  journal={arXiv preprint arXiv:2510.18234},
  year={2025}
}
```

Prompts

1. Alibaba Financial Reports PDF and Image Processing
2. Receipt Recognition

Reviews


  • BaileyZimX 2025-10-23 15:22
    Interesting:5,Helpfulness:5,Correctness:5
    Prompt: Receipt Recognition

    The results for the Red Lobster receipt are 100% correct. I tested a receipt image on the DeepSeek OCR model, and the address is recognized correctly. Pretty good. The difficult part of the receipt image is correct: **RED LOBSTER 0319**, 3707 Mccain Blvd, North Little Rock, AR 72116-8023.


  • BaileyZimX 2025-10-23 15:10
    Interesting:3,Table Processing:3,Helpfulness:4,Correctness:5
    Prompt: Alibaba Financial Reports PDF and Image Processing

    I am working on a financial report processing project and recently compared a few OCR models on Hugging Face. The latest DeepSeek OCR model produces good results overall, with what I would estimate as more than 99% accuracy. The one exception is the financial table from the cost of revenue report: the processed output is missing its last column, 'USS'. I have attached that table to this review. I ran the test in the Space at https://huggingface.co/spaces/khang119966/DeepSeek-OCR-DEMO. Apart from the missing table column, all the remaining text and digits are correct.
