
Overview

MODEL

deepseek-ai/deepseek-ocr

GitHub | Model Download | Paper | arXiv Paper | DeepSeek-OCR: Contexts Optical Compression. Explore the boundaries of visual-text compression. ## Usage Inference uses Hugging Face transformers on NVIDIA GPUs. Requirements, tested on Python 3.12.9 + CUDA 11.8: `torch==2.6.0`, `transformers==4.46`
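Below is a minimal loading sketch consistent with the requirements above; it assumes the repo id `deepseek-ai/DeepSeek-OCR` and that the card's custom code path (`trust_remote_code=True`) is acceptable. The actual inference call is defined by the model's remote code, so defer to the card's own snippet.

```python
# Minimal sketch: load DeepSeek-OCR with Hugging Face transformers on an
# NVIDIA GPU. The dtype and .cuda() placement are assumptions, not card-verified.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-OCR"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).eval().cuda()
```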

qwen-alibaba/qwen-3

Qwen 3 is the latest large reasoning model developed by Alibaba. It surpasses multiple baselines on coding and math, and surpasses SOTA performance on multiple benchmarks. It is said to be released by May 2025. # Qwen3 Qwen Chat | Hugging Face | ModelScope | Paper | Blog | Documentation | Demo | WeChat (微信) | Discord Visit our Hugging Fac

gemini-google/nano-banana

## Google Gemini Flash Nano Banana Today, we’re excited to introduce Gemini 2.5 Flash Image (aka nano-banana), our state-of-the-art image generation and editing model. This update enables you to blend multiple images into a single image, maintain character consistency for rich storytelling, make targeted transformations using natural language, and use Gemini's world knowledge to generate and ed
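As a sketch of calling the model through the google-genai Python SDK: the model id string and the inline-image handling below are assumptions based on the announcement, not a verified quickstart.

```python
# Hedged sketch: text-to-image with the google-genai SDK. The model id
# "gemini-2.5-flash-image" is an assumption based on the name above.
from google import genai

client = genai.Client()  # expects GEMINI_API_KEY in the environment
response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents="A photorealistic nano banana on a lab bench",
)
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:  # image bytes are returned inline
        with open("banana.png", "wb") as f:
            f.write(part.inline_data.data)
```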

qwen-alibaba/qwen3-235b-a22b

Qwen3 Highlights Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: # Qwen3-235B-A22B ## Qw

Qwen/qwen3-vl-8b-instruct

# Qwen3-VL-8B-Instruct Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. Available in

qwen-alibaba/qwen3-coder-480b-a35b-instruct

# Qwen3-Coder-480B-A35B-Instruct ## Highlights Today, we're announcing **Qwen3-Coder**, our most agentic code model to date. **Qwen3-Coder** is available in multiple sizes, but we're excited to introduce its most powerful variant first: **Qwen3-Coder-480B-A35B-Instruct**, featuring the following key enhancements: - **Significant Performance** among open models on **Agentic Cod

openai/gpt-oss-20b

Try gpt-oss · Guides · Model card · OpenAI blog Welcome to the gpt-oss series, designed for powerful reasoning, agentic tasks, and versatile developer use cases. We're releasing two flavors of these open models: - `gpt-oss-120b` — for production, general purpose, high reasoning use cases that fit into a single 80GB GPU (like NVIDIA H100 or AMD MI300X) (117B parameters with 5
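A minimal sketch of running the smaller 20B variant locally through the transformers chat pipeline; the dtype and device settings are assumptions, and the official guides take precedence.

```python
# Sketch: chat-style generation with gpt-oss-20b via the transformers pipeline.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",  # assumption: let transformers choose the dtype
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize what an agentic task is."}]
result = generator(messages, max_new_tokens=256)
# The chat pipeline returns the full message list, ending with the new turn.
print(result[0]["generated_text"][-1]["content"])
```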

zai-org/glm-4-6

# GLM-4.6 Join our Discord community. Check out the GLM-4.6 technical blog, technical report(GLM-4.5), and Zhipu AI technical documentation. Use GLM-4.6 API services on Z.ai API Platform. One click to GLM-4.6. ## Model Introduction Compared with GLM-4.5, **GLM-4.6** brings several key improvements: * **Longer context window:** The context window has be

nanonets/nanonets-ocr2-3b

Nanonets-OCR2: A model for transforming documents into structured markdown with intelligent content recognition and semantic tagging. Live Demo | Blog | ⌨️ GitHub Cookbooks. Nanonets-OCR2 is a family of powerful, state-of-the-art image-to-markdown OCR models that go far beyond traditional text extraction. It transforms documents into structured markdown with intelli

PaddlePaddle/paddleocr-vl

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model. GitHub: https://github.com/PaddlePaddle/PaddleOCR | Hugging Face: https://huggingface.co/PaddlePaddle/PaddleOCR-VL | ModelScope: https://modelscope.cn/models/PaddlePaddle/PaddleOCR-VL | Online Demo (HF): https://huggingface.co/spaces/PaddlePaddle/PaddleOCR-VL_Online_Demo | Online Demo (ModelScope): https://modelscope.cn/studios/PaddlePaddle/PaddleOCR-VL_Online_D

microsoft/userlm-8b

# microsoft/UserLM-8b model card ## Model description Unlike typical LLMs that are trained to play the role of the "assistant" in conversation, we trained UserLM-8b to simulate the “user” role in conversation (by training it to predict user turns in a large corpus of conversations called WildChat). This model is useful in simulating more realistic conversations, which is in turn useful in the de
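Since the model flips the usual roles, here is a heavily hedged sketch of what user-turn simulation could look like: the repo id is from the card, but the prompt layout and the assumption that the chat template emits a generation prompt for the user side are mine, so defer to the card's own example.

```python
# Hypothetical sketch: asking UserLM-8b to produce the *user* side of the
# next turn. The prompt layout is an assumption, not the card's documented format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/UserLM-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system",
     "content": "You are a user who wants help writing a regex that matches ISO dates."},
    {"role": "assistant", "content": "Hi! What can I help you with today?"},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```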

katanemo/arch-router-1-5b

# katanemo/Arch-Router-1.5B ## Overview With the rapid proliferation of large language models (LLMs) -- each optimized for different strengths, style, or latency/cost profile -- routing has become an essential technique to operationalize the use of different models. However, existing LLM routing approaches are limited in two key ways: they evaluate performance using benchmarks that often fail to
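To make the routing idea concrete, a hypothetical sketch of preference-based route selection: the route names, prompt wording, and output parsing here are illustrative assumptions; the model card defines the real prompt format Arch-Router expects.

```python
# Hypothetical sketch: ask a small router LLM to pick a named route for a
# query. The prompt format below is illustrative, not Arch-Router's own.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "katanemo/Arch-Router-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

routes = {
    "code_generation": "writing or fixing source code",
    "creative_writing": "stories, poems, marketing copy",
    "general_qa": "everything else",
}
query = "Refactor this Python function to be iterative."
prompt = (
    "Pick the best route for the user query.\n"
    + "\n".join(f"- {name}: {desc}" for name, desc in routes.items())
    + f"\nQuery: {query}\nRoute:"
)
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True, return_tensors="pt",
).to(model.device)
out = model.generate(inputs, max_new_tokens=16)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```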

tencent/hunyuanworld-mirror

HunyuanWorld-Mirror is a versatile feed-forward model for comprehensive 3D geometric prediction. It integrates diverse geometric priors (**camera poses**, **calibrated intrinsics**, **depth maps**) and simultaneously generates various 3D representations (**point clouds**, **multi-view depths**, **camera parameters**, **surface normals**, **3D Gaussians**) in a single forward pas

chatgpt-openai/sora-2-model

## OpenAI just released their flagship video and audio model, Sora 2 You can download the app from the App Store: https://apps.apple.com/us/app/sora-by-openai/id6744034028 ## Introduction The original Sora model from February 2024 was in many ways the GPT-1 moment for video—the first time video generation started to seem like it was working, and simple behaviors like object permanence emer

black-forest-labs/flux-1-dev

# Key Features 1. Cutting-edge output quality, second only to our state-of-the-art model `FLUX.1 [pro]`. 2. Competitive prompt following, matching the performance of closed-source alternatives. 3. Trained using guidance distillation, making `FLUX.1 [dev]` more efficient. 4. Open weights to drive new scientific research, and empower artists to develop innovative workflows. 5. Generated ou
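A short diffusers sketch, assuming the `FluxPipeline` class and the commonly documented settings for the guidance-distilled model:

```python
# Sketch: text-to-image with FLUX.1 [dev] via diffusers.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use

image = pipe(
    "a tiny astronaut hatching from an egg on the moon",
    guidance_scale=3.5,  # low guidance suits the distilled model
    num_inference_steps=50,
).images[0]
image.save("flux-dev.png")
```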

claude-anthropic/claude-opus-4

Claude Opus 4 is a hybrid reasoning model that pushes the frontier for coding and AI agents, featuring a 200K context window. Claude Opus 4 is our most intelligent model to date, pushing the frontier in coding, agentic search, and creative writing. We've also made it possible to run Claude Code in the background, enabling developers to assign long-running coding tasks for Opus to handle indepe

claude-anthropic/claude-sonnet-4-5

## Main Features It's the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains in reasoning and math. Code is everywhere. It runs every application, spreadsheet, and software tool you use. Being able to use those tools and reason through hard problems is how modern work gets done. Claude Sonnet 4.5 makes this possible. We're r

# **OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM** arXiv: https://arxiv.org/abs/2510.15870 | GitHub: https://github.com/NVlabs/OmniVinci | Hugging Face: https://huggingface.co/nvidia/omnivinci | Project page: https://nvlabs.github.io/OmniVinci ## Introduction OmniVinci is an NVIDIA research project focused on exploring omni-modal LLMs that can not only see and read but also listen, speak, and reason. We a

claude-anthropic/claude-4

Anthropic launched the next generation of Claude models today—Opus 4 and Sonnet 4—designed for coding, advanced reasoning, and the support of the next generation of capable, autonomous AI agents. Claude 4 hybrid reasoning models let customers choose between near-instant responses and deeper reasoning. Claude 4 models offer improvements in coding, with Opus 4 as the “world’s best coding model

JunhaoZhuang/flashvsr

# FlashVSR **Towards Real-Time Diffusion-Based Streaming Video Super-Resolution** **Authors:** Junhao Zhuang, Shi Guo, Xin Cai, Xiaohui Li, Yihao Liu, Chun Yuan, Tianfan Xue **Your star means a lot to us as we develop this project!** :star: --- ### Abstract Diffusion models have recently advanced video restoration, but applying them to real-world video sup

REASON

qwen-alibaba/qwen3-235b-a22b

Qwen3 Highlights Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: # Qwen3-235B-A22B ## Qw

qwen-alibaba/qwen-3

Qwen 3 is the latest large reasoning model developed by Alibaba. It surpasses multiple baselines on coding and math, and surpasses SOTA performance on multiple benchmarks. It is said to be released by May 2025. # Qwen3 Qwen Chat | Hugging Face | ModelScope | Paper | Blog | Documentation | Demo | WeChat (微信) | Discord Visit our Hugging Fac

DeepSeek R2 is the latest large reasoning model developed by DeepSeek. It surpasses multiple baselines on coding and math benchmarks and lowers training and inference costs by 95%. It is said to be released by May 2025.

gemini-google/gemini-3

## Gemini 3 Model Release News and Reviews Gemini 3 is reported to be released by Oct 22, 2025. It's useful for generating webpages, mini-games, music, and much more. Right now many developers are getting beta-test access keys. Gemini 3 vs. GPT-5 ability comparison: while Gemini 3 base models show great potential in coding and agent abilities, what's the relative c

claude-anthropic/claude-opus-4

Claude Opus 4 is a hybrid reasoning model that pushes the frontier for coding and AI agents, featuring a 200K context window. Claude Opus 4 is our most intelligent model to date, pushing the frontier in coding, agentic search, and creative writing. We've also made it possible to run Claude Code in the background, enabling developers to assign long-running coding tasks for Opus to handle indepe

deepseek/deepseek-prover-v2-671b

DeepSeek-Prover-V2 is an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thou
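To make the setting concrete, here is an illustrative Lean 4 statement with an explicit `have` subgoal, echoing the decomposition described above; the theorem is a textbook toy, not drawn from the model's training or benchmark data.

```lean
-- Toy Lean 4 goal of the kind a formal prover targets: establish a
-- subgoal first, then chain it, mirroring subgoal decomposition.
theorem pos_add_pos (a b : Nat) (ha : 0 < a) : 0 < a + b := by
  have hle : a ≤ a + b := Nat.le_add_right a b  -- subgoal: a ≤ a + b
  exact Nat.lt_of_lt_of_le ha hle
```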

claude-anthropic/claude-4

Anthropic launched the next generation of Claude models today—Opus 4 and Sonnet 4—designed for coding, advanced reasoning, and the support of the next generation of capable, autonomous AI agents. Claude 4 hybrid reasoning models let customers choose between near-instant responses and deeper reasoning. Claude 4 models offer improvements in coding, with Opus 4 as the “world’s best coding model

DeepSeek V3 0324 is the latest-generation LLM developed by DeepSeek. It is reported to surpass multiple baselines.

# **OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM** arXiv: https://arxiv.org/abs/2510.15870 | GitHub: https://github.com/NVlabs/OmniVinci | Hugging Face: https://huggingface.co/nvidia/omnivinci | Project page: https://nvlabs.github.io/OmniVinci ## Introduction OmniVinci is an NVIDIA research project focused on exploring omni-modal LLMs that can not only see and read but also listen, speak, and reason. We a

qwen-alibaba/qwen3-0-6b

Qwen3-0.6B has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 0.6B
- Number of Parameters (Non-Embedding): 0.44B
- Number of Layers: 28
- Number of Attention Heads (GQA): 16 for Q and 8 for KV
- Context Length: 32,768

# Qwen3-0.6B ## Qwen3 Highlights Qwen3 is the latest generation of large language models
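The Qwen3 cards document a chat-template switch between thinking and non-thinking modes; a minimal sketch, assuming the `enable_thinking` flag behaves as described there (prompt text is illustrative):

```python
# Sketch: toggling Qwen3's thinking mode through the chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are below 20?"}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # set False for non-thinking mode
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```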

grok4-xai/grok-4

Grok 4 is the latest model released by xAI. It surpasses multiple benchmarks and is trained on corpus from X/Twitter.

qwen-alibaba/qwen3-32b

Qwen3-32B has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 32.8B
- Number of Parameters (Non-Embedding): 31.2B
- Number of Layers: 64
- Number of Attention Heads (GQA): 64 for Q and 8 for KV
- Context Length: 32,768 natively and 131,072 tokens with YaRN

# Qwen3-32B ## Qwen3 Highlights Qwen3 is the late
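The 131,072-token figure above relies on YaRN; a hedged sketch of the `rope_scaling` override in the shape the Qwen3 cards describe (factor 4.0 over the native 32,768), worth double-checking against the card before use:

```python
# Sketch: enabling YaRN length extrapolation for Qwen3-32B via config.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Qwen/Qwen3-32B"
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,  # 32,768 native * 4 = 131,072 tokens
    "original_max_position_embeddings": 32768,
}
model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype="auto", device_map="auto"
)
```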

claude-anthropic/claude-sonnet-4-5

## Main Features It's the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains in reasoning and math. Code is everywhere. It runs every application, spreadsheet, and software tool you use. Being able to use those tools and reason through hard problems is how modern work gets done. Claude Sonnet 4.5 makes this possible. We're r

claude-anthropic/claude-sonnet-4

Hybrid reasoning model with superior intelligence for high-volume use cases, and 200K context window Claude Sonnet 4 improves on Claude Sonnet 3.7 across a variety of areas, especially coding. It offers frontier performance that’s practical for most AI use cases, including user-facing AI assistants and high-volume tasks. Claude Sonnet 3.7 is the first hybrid reasoning model and our most inte

Qwen/qwen2-5-vl-3b-instruct

# Qwen2.5-VL-3B-Instruct ## Introduction In the past five months since Qwen2-VL's release, numerous developers have built new models on the Qwen2-VL vision-language models, pro

Qwen/qwen3-next-80b-a3b-instruct

# Qwen3-Next-80B-A3B-Instruct Over the past few months, we have observed increasingly clear trends toward scaling both total parameters and context lengths in the pursuit of more powerful and agentic artificial intelligence (AI). We are excited to share our latest advancements in addressing these demands, centered on improving scaling efficiency through innovative model architecture. We

Qwen/qwen2-5-vl-7b-instruct

# Qwen2.5-VL-7B-Instruct ## Introduction In the past five months since Qwen2-VL's release, numerous developers have built new models on the Qwen2-VL vision-language models, providing us with valuable feedback. During this period, we focused on building more useful vi
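A hedged image-text-to-text sketch: the model class name follows the model card, while the placeholder image URL and the inline `apply_chat_template` image handling assume a recent transformers release.

```python
# Sketch: single-image question answering with Qwen2.5-VL-7B-Instruct.
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://example.com/sample.jpg"},  # placeholder URL
        {"type": "text", "text": "Describe this image."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```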

Qwen/qwen3-4b-instruct-2507

# Qwen3-4B-Instruct-2507 ## Highlights We introduce the updated version of the **Qwen3-4B non-thinking mode**, named **Qwen3-4B-Instruct-2507**, featuring the following key enhancements: - **Significant improvements** in general capabilities, including **instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage**. - **Substantial gains** in

deepseek-ai/deepseek-r1-distill-qwen-32b

# DeepSeek-R1 Paper Link ## 1. Introduction We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable pe

# Qwen3-8B ## Qwen3 Highlights Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: - **Uniquely support

