
DeepSeek-OCR


DeepSeek AI

Github | Model Download | Paper Link | Arxiv Paper Link

DeepSeek-OCR: Contexts Optical Compression

Explore the boundaries of visual-text compression.

## Usage
Inference using Huggingface transformers on NVIDIA GPUs. Requirements tested on Python 3.12.9 + CUDA11.8:

```
torch==2.6.0
transformers==4.46.3
tokenizers==0.20.3
einops
addict
easydict
pip install flash-attn==2.7.3 --no-build-isolation
```

```python
from transformers import AutoModel, AutoTokenizer
import torch
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '0'

model_name = 'deepseek-ai/DeepSeek-OCR'
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)

# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = 'your_image.jpg'
output_path = 'your/output/dir'

# infer(self, tokenizer, prompt='', image_file='', output_path=' ', base_size=1024, image_size=640, crop_mode=True, test_compress=False, save_results=False):
# Tiny:   base_size = 512,  image_size = 512,  crop_mode = False
# Small:  base_size = 640,  image_size = 640,  crop_mode = False
# Base:   base_size = 1024, image_size = 1024, crop_mode = False
# Large:  base_size = 1280, image_size = 1280, crop_mode = False
# Gundam: base_size = 1024, image_size = 640,  crop_mode = True
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, base_size=1024, image_size=640, crop_mode=True, save_results=True, test_compress=True)
```

## vLLM
Refer to [GitHub](https://github.com/deepseek-ai/DeepSeek-OCR/) for guidance on model inference acceleration and PDF processing, etc.

[2025/10/23] DeepSeek-OCR is now officially supported in upstream [vLLM](https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-OCR.html#installing-vllm).

```shell
uv venv
source .venv/bin/activate
# Until v0.11.1 release, you need to install vLLM from nightly build
uv pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```

```python
from vllm import LLM, SamplingParams
from vllm.model_executor.models.deepseek_ocr import NGramPerReqLogitsProcessor
from PIL import Image

# Create model instance
llm = LLM(
    model="deepseek-ai/DeepSeek-OCR",
    enable_prefix_caching=False,
    mm_processor_cache_gb=0,
    logits_processors=[NGramPerReqLogitsProcessor]
)

# Prepare batched input with your image files
image_1 = Image.open("path/to/your/image_1.png").convert("RGB")
image_2 = Image.open("path/to/your/image_2.png").convert("RGB")
prompt = "<image>\nFree OCR."

model_input = [
    {"prompt": prompt, "multi_modal_data": {"image": image_1}},
    {"prompt": prompt, "multi_modal_data": {"image": image_2}}
]

sampling_param = SamplingParams(
    temperature=0.0,
    max_tokens=8192,
    # ngram logit processor args
    extra_args=dict(
        ngram_size=30,
        window_size=90,
        whitelist_token_ids={128821, 128822},  # whitelist: <td>, </td>
    ),
    skip_special_tokens=False,
)

# Generate output
model_outputs = llm.generate(model_input, sampling_param)

# Print output
for output in model_outputs:
    print(output.outputs[0].text)
```
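For PDF processing, refer to the GitHub repository linked above. As a minimal sketch of the idea, a multi-page PDF can be rasterized to page images and each page passed to the `model.infer` call from the Usage example. The `ocr_pdf` helper below is hypothetical, and PyMuPDF (`pip install pymupdf`) is an assumed extra dependency not listed in the requirements above:

```python
# Hypothetical helper: rasterize a PDF with PyMuPDF (assumed dependency)
# and run the transformers `model.infer` call from the Usage example on
# each page image (Gundam mode settings).
import os
import fitz  # PyMuPDF


def ocr_pdf(pdf_path, tokenizer, model, output_dir, dpi=144):
    """Render each PDF page to a PNG and run DeepSeek-OCR on it."""
    os.makedirs(output_dir, exist_ok=True)
    prompt = "<image>\n<|grounding|>Convert the document to markdown. "
    results = []
    doc = fitz.open(pdf_path)
    for i, page in enumerate(doc):
        image_file = os.path.join(output_dir, f"page_{i:04d}.png")
        page.get_pixmap(dpi=dpi).save(image_file)  # rasterize one page
        res = model.infer(tokenizer, prompt=prompt, image_file=image_file,
                          output_path=output_dir, base_size=1024,
                          image_size=640, crop_mode=True, save_results=True)
        results.append(res)
    doc.close()
    return results
```

Whether `infer` returns the recognized text directly or only writes result files to `output_path` depends on the model's custom code, so treat the collected return values here as an assumption and check the saved outputs.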
## Acknowledgement
We would like to thank [Vary](https://github.com/Ucas-HaoranWei/Vary/), [GOT-OCR2.0](https://github.com/Ucas-HaoranWei/GOT-OCR2.0/), [MinerU](https://github.com/opendatalab/MinerU), [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR), [OneChart](https://github.com/LingyvKong/OneChart), [Slow Perception](https://github.com/Ucas-HaoranWei/Slow-Perception) for their valuable models and ideas.

We also appreciate the benchmarks: [Fox](https://github.com/ucaslcl/Fox), [OmniDocBench](https://github.com/opendatalab/OmniDocBench).

## Citation
```bibtex
@article{wei2025deepseek,
  title={DeepSeek-OCR: Contexts Optical Compression},
  author={Wei, Haoran and Sun, Yaofeng and Li, Yukun},
  journal={arXiv preprint arXiv:2510.18234},
  year={2025}
}
```

Prompts

1. Alibaba Financial Reports PDF and Image Processing
2. Receipt Recognition

Reviews


  • BaileyZimX 2025-10-23 15:22
    Interesting: 5, Helpfulness: 5, Correctness: 5
    Prompt: Receipt Recognition

    The Red Lobster receipt results are 100% correct. I tested a receipt image on the DeepSeek-OCR model and the address is correctly recognized. Pretty good. The difficult part of the receipt image is also recognized correctly: **RED LOBSTER 0319**, 3707 Mccain Blvd, North Little Rock, AR 72116-8023.


  • BaileyZimX 2025-10-23 15:10
    Interesting: 3, Table Processing: 3, Helpfulness: 4, Correctness: 5
    Prompt: Alibaba Financial Reports PDF and Image Processing

    I am working on a financial report processing project and recently compared a few OCR models on Huggingface. The latest DeepSeek-OCR model produces overall pretty good results, with more than 99% accuracy. The exception is that the processed result of the financial table is missing the last column ('US$') of the cost of revenue report, which I also attach to this review. I tried it on the Space https://huggingface.co/spaces/khang119966/DeepSeek-OCR-DEMO. All the remaining text and digits are correct, apart from the missing table column.
