Overview
MODEL
Github | Model Download | Paper Link | Arxiv Paper Link | DeepSeek-OCR: Contexts Optical Compression Explore the boundaries of visual-text compression. ## Usage Inference using Huggingface transformers on NVIDIA GPUs. Requirements tested on python 3.12.9 + CUDA11.8: \`\`\` torch==2.6.0 transformers==4.46
Qwen 3 is the latest large reasoning model developed by Alibaba. It surpasses multiple baselines on coding and math, and exceeds SOTA model performance on multiple benchmarks. It is said to be released by May 2025. # Qwen3 Qwen Chat | Hugging Face | ModelScope | Paper | Blog | Documentation Demo | WeChat (微信) | Discord Visit our Hugging Fac
## Google Gemini Flash Nano Banana Today, we’re excited to introduce Gemini 2.5 Flash Image (aka nano-banana), our state-of-the-art image generation and editing model. This update enables you to blend multiple images into a single image, maintain character consistency for rich storytelling, make targeted transformations using natural language, and use Gemini's world knowledge to generate and ed
Qwen3 Highlights Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: # Qwen3-235B-A22B ## Qw
# Qwen3-VL-8B-Instruct Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. Available in
# Qwen3-Coder-480B-A35B-Instruct ## Highlights Today, we're announcing **Qwen3-Coder**, our most agentic code model to date. **Qwen3-Coder** is available in multiple sizes, but we're excited to introduce its most powerful variant first: **Qwen3-Coder-480B-A35B-Instruct**. featuring the following key enhancements: - **Significant Performance** among open models on **Agentic Cod
Try gpt-oss · Guides · Model card · OpenAI blog Welcome to the gpt-oss series, designed for powerful reasoning, agentic tasks, and versatile developer use cases. We’re releasing two flavors of these open models: - \`gpt-oss-120b\` — for production, general purpose, high reasoning use cases that fit into a single 80GB GPU (like NVIDIA H100 or AMD MI300X) (117B parameters with 5
# GLM-4.6 Join our Discord community. Check out the GLM-4.6 technical blog, technical report(GLM-4.5), and Zhipu AI technical documentation. Use GLM-4.6 API services on Z.ai API Platform. One click to GLM-4.6. ## Model Introduction Compared with GLM-4.5, **GLM-4.6** brings several key improvements: * **Longer context window:** The context window has be
Nanonets-OCR2: A model for transforming documents into structured markdown with intelligent content recognition and semantic tagging Live Demo | Blog | GitHub Cookbooks Nanonets-OCR2 is a family of powerful, state-of-the-art image-to-markdown OCR models that go far beyond traditional text extraction. It transforms documents into structured markdown with intelli
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model https://github.com/PaddlePaddle/PaddleOCR | https://huggingface.co/PaddlePaddle/PaddleOCR-VL | https://modelscope.cn/models/PaddlePaddle/PaddleOCR-VL | https://huggingface.co/spaces/PaddlePaddle/PaddleOCR-VL_Online_Demo | https://modelscope.cn/studios/PaddlePaddle/PaddleOCR-VL_Online_D
# microsoft/UserLM-8b model card ## Model description Unlike typical LLMs that are trained to play the role of the "assistant" in conversation, we trained UserLM-8b to simulate the “user” role in conversation (by training it to predict user turns in a large corpus of conversations called WildChat). This model is useful in simulating more realistic conversations, which is in turn useful in the de
# katanemo/Arch-Router-1.5B ## Overview With the rapid proliferation of large language models (LLMs) -- each optimized for different strengths, style, or latency/cost profile -- routing has become an essential technique to operationalize the use of different models. However, existing LLM routing approaches are limited in two key ways: they evaluate performance using benchmarks that often fail to
HunyuanWorld-Mirror is a versatile feed-forward model for comprehensive 3D geometric prediction. It integrates diverse geometric priors (**camera poses**, **calibrated intrinsics**, **depth maps**) and simultaneously generates various 3D representations (**point clouds**, **multi-view depths**, **camera parameters**, **surface normals**, **3D Gaussians**) in a single forward pas
## OpenAI just released their flagship video and audio model Sora 2 You can download the apps from app store: https://apps.apple.com/us/app/sora-by-openai/id6744034028 ## Introduction The original Sora model from February 2024 was in many ways the GPT‑1 moment for video—the first time video generation started to seem like it was working, and simple behaviors like object permanence emer
# Key Features 1. Cutting-edge output quality, second only to our state-of-the-art model \`FLUX.1 [pro]\`. 2. Competitive prompt following, matching the performance of closed source alternatives . 3. Trained using guidance distillation, making \`FLUX.1 [dev]\` more efficient. 4. Open weights to drive new scientific research, and empower artists to develop innovative workflows. 5. Generated ou
Claude Opus 4 is the Hybrid reasoning model that pushes the frontier for coding and AI agents, featuring a 200K context window Claude Opus 4 is our most intelligent model to date, pushing the frontier in coding, agentic search, and creative writing. We’ve also made it possible to run Claude Code in the background, enabling developers to assign long-running coding tasks for Opus to handle indepe
## Main Features It's the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains in reasoning and math. Code is everywhere. It runs every application, spreadsheet, and software tool you use. Being able to use those tools and reason through hard problems is how modern work gets done. Claude Sonnet 4.5 makes this possible. We're r
# **OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM** arxiv.org/abs/2510.15870 | https://github.com/NVlabs/OmniVinci | https://huggingface.co/nvidia/omnivinci | https://nvlabs.github.io/OmniVinci ## Introduction OmniVinci is an NVIDIA research project focused on exploring omni-modal LLMs that can not only see and read but also listen, speak, and reason. We a
Anthropic launched the next generation of Claude models today—Opus 4 and Sonnet 4—designed for coding, advanced reasoning, and the support of the next generation of capable, autonomous AI agents. Claude 4 hybrid reasoning models let customers choose between near-instant responses and deeper reasoning. Claude 4 models offer improvements in coding, with Opus 4 as the “world’s best coding model
# FlashVSR **Towards Real-Time Diffusion-Based Streaming Video Super-Resolution** **Authors:** Junhao Zhuang, Shi Guo, Xin Cai, Xiaohui Li, Yihao Liu, Chun Yuan, Tianfan Xue **Your star means a lot for us to develop this project!** :star: --- ### Abstract Diffusion models have recently advanced video restoration, but applying them to real-world video sup
REASON
Qwen3 Highlights Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: # Qwen3-235B-A22B ## Qw
Qwen 3 is the latest large reasoning model developed by Alibaba. It surpasses multiple baselines on coding and math, and exceeds SOTA model performance on multiple benchmarks. It is said to be released by May 2025. # Qwen3 Qwen Chat | Hugging Face | ModelScope | Paper | Blog | Documentation Demo | WeChat (微信) | Discord Visit our Hugging Fac
DeepSeek R2 is the latest large reasoning model developed by DeepSeek. It surpasses multiple baselines on coding and math benchmarks and lowers both training and inference costs by 95%. It is said to be released by May 2025.
## Gemini 3 Model Release News and Reviews Gemini 3 is reported to be released soon, by Oct 22, 2025. It is useful for generating webpages, mini-games, music, and much more. Right now many developers are getting beta test access keys. Gemini 3 vs. GPT 5 abilities comparison: Gemini 3 base models show great potential in coding and agent abilities. What's the relative c
Claude Opus 4 is the Hybrid reasoning model that pushes the frontier for coding and AI agents, featuring a 200K context window Claude Opus 4 is our most intelligent model to date, pushing the frontier in coding, agentic search, and creative writing. We’ve also made it possible to run Claude Code in the background, enabling developers to assign long-running coding tasks for Opus to handle indepe
DeepSeek-Prover-V2 is an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thou
Anthropic launched the next generation of Claude models today—Opus 4 and Sonnet 4—designed for coding, advanced reasoning, and the support of the next generation of capable, autonomous AI agents. Claude 4 hybrid reasoning models let customers choose between near-instant responses and deeper reasoning. Claude 4 models offer improvements in coding, with Opus 4 as the “world’s best coding model
# **OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM** arxiv.org/abs/2510.15870 | https://github.com/NVlabs/OmniVinci | https://huggingface.co/nvidia/omnivinci | https://nvlabs.github.io/OmniVinci ## Introduction OmniVinci is an NVIDIA research project focused on exploring omni-modal LLMs that can not only see and read but also listen, speak, and reason. We a
Qwen3-0.6B has the following features: Type: Causal Language Models; Training Stage: Pretraining & Post-training; Number of Parameters: 0.6B; Number of Parameters (Non-Embedding): 0.44B; Number of Layers: 28; Number of Attention Heads (GQA): 16 for Q and 8 for KV; Context Length: 32,768. # Qwen3-0.6B ## Qwen3 Highlights Qwen3 is the latest generation of large language models
Qwen3-32B has the following features: Type: Causal Language Models; Training Stage: Pretraining & Post-training; Number of Parameters: 32.8B; Number of Parameters (Non-Embedding): 31.2B; Number of Layers: 64; Number of Attention Heads (GQA): 64 for Q and 8 for KV; Context Length: 32,768 natively and 131,072 tokens with YaRN. # Qwen3-32B ## Qwen3 Highlights Qwen3 is the late
## Main Features It's the strongest model for building complex agents. It’s the best model at using computers. And it shows substantial gains in reasoning and math. Code is everywhere. It runs every application, spreadsheet, and software tool you use. Being able to use those tools and reason through hard problems is how modern work gets done. Claude Sonnet 4.5 makes this possible. We're r
Hybrid reasoning model with superior intelligence for high-volume use cases, and 200K context window Claude Sonnet 4 improves on Claude Sonnet 3.7 across a variety of areas, especially coding. It offers frontier performance that’s practical for most AI use cases, including user-facing AI assistants and high-volume tasks. Claude Sonnet 3.7 is the first hybrid reasoning model and our most inte
--- license_name: qwen-research license_link: https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct/blob/main/LICENSE language: - en pipeline_tag: image-text-to-text tags: - multimodal library_name: transformers --- # Qwen2.5-VL-3B-Instruct ## Introduction In the past five months since Qwen2-VL’s release, numerous developers have built new models on the Qwen2-VL vision-language models, pro
# Qwen3-Next-80B-A3B-Instruct Over the past few months, we have observed increasingly clear trends toward scaling both total parameters and context lengths in the pursuit of more powerful and agentic artificial intelligence (AI). We are excited to share our latest advancements in addressing these demands, centered on improving scaling efficiency through innovative model architecture. We
--- license: apache-2.0 language: - en pipeline_tag: image-text-to-text tags: - multimodal library_name: transformers --- # Qwen2.5-VL-7B-Instruct ## Introduction In the past five months since Qwen2-VL’s release, numerous developers have built new models on the Qwen2-VL vision-language models, providing us with valuable feedback. During this period, we focused on building more useful vi
# Qwen3-4B-Instruct-2507 ## Highlights We introduce the updated version of the **Qwen3-4B non-thinking mode**, named **Qwen3-4B-Instruct-2507**, featuring the following key enhancements: - **Significant improvements** in general capabilities, including **instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage**. - **Substantial gains** in
# DeepSeek-R1 Paper Link️ ## 1. Introduction We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable pe
# Qwen3-8B ## Qwen3 Highlights Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: - **Uniquely support
Community
-
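What good and bad experiences have you had when generating videos with Kling AI (可灵AI)? Please be sure to include the prompt text and a video screenshot or short video clip.
-
What good and bad experiences have you had when generating videos with Douyin's Jimeng AI (即梦AI)? Please be sure to include the prompt text and a video screenshot or short video clip.
-
What good and bad experiences have you had with the Search and Recommendation feature of Kuaishou (Kwai) short videos? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
-
What good and bad experiences have you had with the Search and Recommendation feature of the Xiaohongshu app? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
-
What good and bad experiences have you had with the Search and Recommendation feature of the WeChat app? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
-
What good and bad experiences have you had with the AI Q&A feature of the WeChat app? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
-
What good and bad experiences have you had with the Search and Recommendation feature of the Zhihu app? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
-
What good and bad experiences have you had with the Search and Recommendation feature of the JD app? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
-
What good and bad experiences have you had with the Search and Recommendation feature of the Taobao app? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
-
What good and bad experiences have you had with the Search and Recommendation feature of the Alipay app? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
-
What good and bad experiences have you had with the Search and Recommendation feature of the Pinduoduo (PDD Temu) app? Please describe how to reproduce them, for example the prompt text you entered, and upload screenshots.
-
What good and bad experiences have you had with the AI search feature of Zhihu Zhida (知乎直答)? Please describe the input you used at the time, for example the prompt text, or upload screenshots.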
-
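What good and bad experiences have you had with the AI search feature of Kuaishou? Please describe the input you used at the time, for example the prompt text, or upload screenshots.
-
What good and bad experiences have you had with the AI search feature of Douyin (TikTok)? Please describe the input you used at the time, for example the prompt text, or upload screenshots.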
-
Please leave your thoughts on the best and coolest AI Generated Images.
-
Please leave your thoughts on free alternatives to Midjourney, Stable Diffusion, and other AI image generators.
-
Please leave your thoughts on the scariest or creepiest AI-generated images.
-
We are witnessing great success in the recent development of generative artificial intelligence in many fields, such as AI assistants, chatbots, and AI writers. Among all AI-native products, AI search engines such as Perplexity, Gemini, and SearchGPT are the most attractive to website owners, bloggers, and web content publishers. An AI search engine is a new tool that provides answers directly to users' questions (queries). In this blog, we briefly introduce the basic concepts of AI search engines, including Large Language Models (LLM), Retrieval-Augmented Generation (RAG), and Citations and Sources. Then we highlight some major differences between traditional Search Engine Optimization (SEO) and Generative Engine Optimization (GEO). Finally, we cover some of the latest research and strategies to help website owners and content publishers better optimize their content for generative AI search engines.
-
We are seeing more applications of robotaxis and self-driving vehicles worldwide. Many large companies such as Waymo, Tesla, and Baidu are accelerating robotaxi deployment in multiple cities. Some human drivers, especially cab drivers, worry that they will lose their jobs to AI. They argue that lower operating costs, and the fact that AI can work 24 hours a day without rest, give robotaxis a competitive advantage over human drivers. What do you think?