Reviews
-
DeepSeek V3 shows a notably high hallucination rate compared to other large MoE models of similar parameter scale.
-
The Qwen3 32B series is among the most widely adopted and deployed models in industrial applications, striking a balance between inference speed and performance. This updated Qwen3 32B supports both a thinking mode and a non-thinking mode, covering common chat/text-generation tasks as well as more complex tasks such as math and code generation. On AIME and many other math benchmarks, Qwen3 surpasses many of its open-source counterparts.
-
The Qwen3 235B-A22B model reads more like an upgraded DeepSeek-R1, and it is compared against DeepSeek-R1 on multiple code and math benchmarks. Personally, I don't think Qwen3 is a huge upgrade over the Gemini/OpenAI and DeepSeek models; it feels more like a compromise between complex reasoning and everyday usage.
-
Prompt: In plane quadrilateral ABCD, AB = AC = CD = 1, ∠ADC = 30°, ∠DAB = 120°. Fold triangle ACD along AC to triangle ACP, where P is a moving point. Find the minimum cosine value of the dihedral angle A-CP-B. Correct result: √3/3.

Testing this geometry question in the Qwen app: with thinking mode enabled, the model returns the correct answer √3/3; without thinking mode, it gives a wrong answer. Overall, the 235B model is quite powerful compared to previous SOTA models.

Key updates in Qwen3: a hybrid reasoning model, expanded language support (100+ languages), and enhanced tool-calling capabilities, with Qwen-Agent supporting MCP. The newly open-sourced Qwen3 is China's first "hybrid reasoning model", a concept initially introduced by Claude 3.7 and recently adopted by Gemini 2.5 Flash. Essentially, this allows the model to toggle its reasoning process on or off. The primary purpose is to accelerate response generation for simple queries or time-sensitive scenarios by optionally disabling the thinking process while maintaining output quality. Previous approaches struggled to directly suppress reasoning steps in LLMs without retraining, since prompt engineering offered only limited control. Qwen3 introduces two control methods: 1) a hard switch via the enable_thinking parameter (True/False), and 2) when thinking is enabled, a secondary soft switch by appending /no_think or /think tokens to the prompt.

Qwen also provides recommended parameter configurations to ensure optimal performance:
Think mode: Temperature=0.6, TopP=0.95, TopK=20, MinP=0
Non-think mode: Temperature=0.7, TopP=0.8, TopK=20, MinP=0

Additionally, Qwen3 features specialized training for tool invocation, with Qwen-Agent now supporting MCP.
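The two control methods and the recommended sampling configurations above can be sketched in a few lines. This is a minimal illustration, not an official Qwen API: the helper functions build_request and soft_switch are hypothetical, and only the enable_thinking flag, the /think and /no_think tokens, and the parameter values are taken from the description above.

```python
# Hypothetical sketch of Qwen3's two thinking-mode controls.
# Parameter values are the recommended configs quoted above.

THINK_PARAMS = {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0}
NO_THINK_PARAMS = {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0}

def build_request(user_message: str, thinking: bool) -> dict:
    """Hard switch: set enable_thinking and pick the matching
    recommended sampling parameters for that mode."""
    return {
        "messages": [{"role": "user", "content": user_message}],
        "enable_thinking": thinking,
        "sampling": THINK_PARAMS if thinking else NO_THINK_PARAMS,
    }

def soft_switch(user_message: str, thinking: bool) -> str:
    """Soft switch (only effective when enable_thinking=True):
    append /think or /no_think to the current turn."""
    return f"{user_message} {'/think' if thinking else '/no_think'}"

req = build_request("Fold triangle ACD along AC ...", thinking=True)
print(req["sampling"]["temperature"])            # 0.6
print(soft_switch("What is 2+2?", thinking=False))  # ... /no_think
```

The point of the hard/soft split is that the hard switch is fixed per request, while the soft token lets a multi-turn conversation toggle reasoning turn by turn without changing the API call.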
-
Prompt: How can KL divergence be used to regularize the RL training of large reasoning models? What are the drawbacks of current RL algorithms?

There is no public access to test the prover model, so I reused an earlier machine-learning prompt to ask the DeepSeek model for a proof. The question seems oversimplified, and the model only gave an introductory summary, but its thinking process is quite interesting.
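For context on the KL question in the prompt: a standard approach (not something the review itself spells out) is to subtract a per-token penalty β·KL(π_θ‖π_ref) from the reward, keeping the trained policy close to a frozen reference model. A minimal sketch of two common single-sample KL estimators, assuming you already have the log-probabilities of the sampled tokens under both models:

```python
import numpy as np

def kl_k1(logp_theta, logp_ref):
    """Naive estimator: log pi_theta - log pi_ref.
    Unbiased but high variance, and can go negative per sample."""
    return logp_theta - logp_ref

def kl_k3(logp_theta, logp_ref):
    """Low-variance, always-nonnegative estimator (used e.g. in GRPO):
    exp(r) - r - 1 with r = logp_ref - logp_theta."""
    r = logp_ref - logp_theta
    return np.exp(r) - r - 1.0

def regularized_reward(task_reward, logp_theta, logp_ref, beta=0.05):
    """Per-token reward with the KL penalty subtracted; beta trades off
    task reward against staying close to the reference policy."""
    return task_reward - beta * kl_k3(logp_theta, logp_ref)

logp_theta = np.array([-1.0, -0.5, -2.0])  # toy sampled-token log-probs
logp_ref = np.array([-1.2, -0.5, -1.5])
print(regularized_reward(np.ones(3), logp_theta, logp_ref))
```

This also hints at one commonly cited drawback: the penalty anchors the policy to the reference model, so an overly large β can prevent the long exploratory reasoning chains that RL training is supposed to discover.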
-
Community
-
When using Kling AI (可灵AI) to generate videos, what good and problematic experiences have you had? Please be sure to include the prompt text and a video screenshot or short clip.
-
When using Douyin's Jimeng AI (即梦AI) to generate videos, what good and problematic experiences have you had? Please be sure to include the prompt text and a video screenshot or short clip.
-
When using the Search and Recommendation features of the Kuaishou (Kwai) short-video app, what good and problematic experiences have you had? Please describe the conditions to reproduce them, e.g. the query text you entered, and upload screenshots.
-
When using the Search and Recommendation features of the Xiaohongshu (小红书) app, what good and problematic experiences have you had? Please describe the conditions to reproduce them, e.g. the query text you entered, and upload screenshots.
-
When using the Search and Recommendation features of the WeChat (微信) app, what good and problematic experiences have you had? Please describe the conditions to reproduce them, e.g. the query text you entered, and upload screenshots.
-
When using the AI Q&A feature of the WeChat (微信) app, what good and problematic experiences have you had? Please describe the conditions to reproduce them, e.g. the prompt text you entered, and upload screenshots.
-
When using the Search and Recommendation features of the Zhihu (知乎) app, what good and problematic experiences have you had? Please describe the conditions to reproduce them, e.g. the query text you entered, and upload screenshots.
-
When using the Search and Recommendation features of the JD (京东) app, what good and problematic experiences have you had? Please describe the conditions to reproduce them, e.g. the query text you entered, and upload screenshots.
-
When using the Search and Recommendation features of the Taobao (淘宝) app, what good and problematic experiences have you had? Please describe the conditions to reproduce them, e.g. the query text you entered, and upload screenshots.
-
When using the Search and Recommendation features of the Alipay (支付宝) app, what good and problematic experiences have you had? Please describe the conditions to reproduce them, e.g. the query text you entered, and upload screenshots.