Most Reviewed

Qwen3 Highlights: Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, …

Qwen 3 is the latest large reasoning model developed by Alibaba. It surpasses multiple baselines on coding and math and surpasses SOTA model performance on multiple benchmarks. It is said to be released …

DeepSeek V3 0324 is the latest-generation LLM developed by DeepSeek. It is reported to surpass multiple baselines.

DeepSeek-Prover-V2 is an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem-proving pipeline powered by DeepSeek-V3.

Qwen3-32B has the following features: Type: Causal Language Models; Training Stage: Pretraining & Post-training; Number of Parameters: 32.8B; Number of Parameters (Non-Embedding): 31.2B; Number of …

DeepSeek R2 is the latest large reasoning model developed by DeepSeek. It surpasses multiple baselines on coding and math benchmarks and lowers both training and inference cost by 9…

Qwen3-14B has the following features: Type: Causal Language Models; Training Stage: Pretraining & Post-training; Number of Parameters: 14.8B; Number of Parameters (Non-Embedding): 13.2B; Number of …

Qwen3-8B has the following features: Type: Causal Language Models; Training Stage: Pretraining & Post-training; Number of Parameters: 8.2B; Number of Parameters (Non-Embedding): 6.95B; Number of Layers: …

Qwen3 Highlights: Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, …

Qwen3-4B has the following features: Type: Causal Language Models; Training Stage: Pretraining & Post-training; Number of Parameters: 4.0B; Number of Parameters (Non-Embedding): 3.6B; Number of Layers: …

Qwen3-1.7B has the following features: Type: Causal Language Models; Training Stage: Pretraining & Post-training; Number of Parameters: 1.7B; Number of Parameters (Non-Embedding): 1.4B; Number of Layers: …

Qwen3-0.6B has the following features: Type: Causal Language Models; Training Stage: Pretraining & Post-training; Number of Parameters: 0.6B; Number of Parameters (Non-Embedding): 0.44B; Number of …
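
All of the checkpoints listed above are causal language models intended for standard decoder-only tooling. As an illustration only, here is a minimal sketch of loading the smallest listed checkpoint with the Hugging Face transformers library; the repository id Qwen/Qwen3-0.6B, the dtype/device settings, and the prompt text are assumptions of mine, not details taken from the list.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed Hugging Face repo id for the 0.6B checkpoint listed above.
    model_id = "Qwen/Qwen3-0.6B"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" needs the accelerate package; drop it to load on CPU.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

    # Build a chat-formatted prompt and generate a short completion.
    messages = [{"role": "user", "content": "Briefly explain what a mixture-of-experts model is."}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))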


Reviews

  • aigc_coder 2025-05-02 12:25
    Interesting:4,Helpfulness:4,Correctness:3

    DeepSeek V3 shows very high hallucination compared with other large MoE models of a similarly huge parameter count.


  • aigc_coder 2025-05-02 12:03
    Interesting:5,Helpfulness:5,Correctness:5

    The Qwen3 32B series is among the most widely adopted and deployed models in industrial applications, balancing inference speed against performance. This updated Qwen3 32B offers both a thinking mode and a non-thinking mode, supporting common chat/text-generation tasks as well as more complex math and code-generation tasks. On AIME and many other math benchmarks, Qwen3 surpasses many of its open-source counterparts.


  • aigc_coder 2025-05-02 11:56
    Interesting:3,Helpfulness:2,Correctness:3

    The Qwen3 235B A22B model feels more like an upgraded version of DeepSeek-R1, and it is compared against DeepSeek R1 on multiple code and math benchmarks. Personally, I don't think Qwen3 is a huge upgrade over the Gemini/OpenAI and DeepSeek models; it is more like a compromise between complex thinking and realistic usage.


  • AILearner98 2025-05-02 11:49
    Interesting:5,Helpfulness:5,Correctness:5
    Prompt: In plane quadrilateral ABCD, AB = AC = CD = 1, \angle ADC = 30^{\circ}, \angle DAB = 120^{\circ}. Fold triangle ACD along AC to triangle ACP, where P is a moving point. Find the minimum cosine value of the dihedral angle A-CP-B.

    Correct result: \sqrt{3}/3. Testing this geometry question in the Qwen app: with thinking mode enabled the model returns the correct answer \sqrt{3}/3; without thinking mode it gives a wrong answer. Overall, the 235B model is quite powerful compared with previous SOTA models.

    More on the key updates in Qwen3: hybrid reasoning, expanded language support (100+ languages), and enhanced tool-calling, with Qwen-Agent supporting MCP. The newly open-sourced Qwen3 is China's first "hybrid reasoning model", a concept initially proposed by Claude 3.7 and recently adopted by Gemini 2.5 Flash. Essentially, this lets the model toggle its reasoning process on or off. The primary purpose is to speed up response generation for simple queries or time-sensitive scenarios by optionally disabling the thinking process while maintaining output quality. Previous approaches struggled to suppress reasoning steps in LLMs directly without retraining, since prompt engineering offered only limited control. Qwen3 introduces two control methods: 1) a hard switch via the enable_thinking parameter (True/False), and 2) when thinking is enabled, a secondary soft switch by appending /no_think or /think tokens. Qwen also provides recommended sampling configurations for optimal performance: thinking mode: Temperature=0.6, TopP=0.95, TopK=20, MinP=0; non-thinking mode: Temperature=0.7, TopP=0.8, TopK=20, MinP=0. Additionally, Qwen3 features specialized training for tool invocation, with Qwen-Agent now supporting MCP.
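
    To make the two switches concrete, here is a minimal sketch of how the hard switch and the recommended thinking-mode sampling settings described above might be wired up with Hugging Face transformers; the checkpoint name Qwen/Qwen3-8B and the example question are placeholders of my own, not part of the review.

        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_id = "Qwen/Qwen3-8B"  # assumed checkpoint; other Qwen3 chat models should work the same way
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

        messages = [{"role": "user", "content": "What is the last digit of 7^2025?"}]

        # Hard switch: enable_thinking toggles the reasoning block in the chat template.
        text = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
            enable_thinking=True,  # set False to disable the thinking process entirely
        )
        # Soft switch (only meaningful when enable_thinking=True): append "/no_think" or
        # "/think" to the user message to suppress or force reasoning for that turn.

        inputs = tokenizer([text], return_tensors="pt").to(model.device)
        output_ids = model.generate(
            **inputs,
            max_new_tokens=1024,
            do_sample=True,
            temperature=0.6,  # recommended thinking-mode settings quoted in the review
            top_p=0.95,
            top_k=20,
        )
        print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))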


  • HaoZLi 2025-05-01 09:48
    Interesting:4,Helpfulness:3,Correctness:4
    Prompt: How to use KL divergence to help regularize the RL training of large reasoning model? What's the drawback of current RL algorithm?

    There is no public access yet to test the prover model, so I reused an earlier machine-learning prompt and asked the DeepSeek model to produce a proof. The question seems oversimplified, though, and the model only gave an introductory summary; still, its thinking process is quite interesting.
