Nemotron-Research-GooseReason-0.7M
# GooseReason-0.7M

**Synthesized with *Golden Goose*: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text**

[![Paper](https://img.shields.io/badge/arXiv-2601.22975-b31b1b.svg)](https://arxiv.org/abs/2601.22975) [![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
**GooseReason-0.7M** is a large-scale RLVR dataset with over **0.7 million tasks** across mathematics, programming, and general scientific domains, synthesized by the **Golden Goose** pipeline. It is used to train [GooseReason-4B-Instruct](https://huggingface.co/nvidia/Nemotron-Research-GooseReason-4B-Instruct), which achieves new state-of-the-art results among 4B-Instruct models across 15 diverse benchmarks, spanning mathematics, programming, STEM reasoning, instruction following, and logical puzzles.

This dataset is for research and development only.

## Golden Goose

Scaling up RLVR is bottlenecked by the scarcity of verifiable training data, where improvements increasingly saturate after prolonged training on existing datasets. **Golden Goose** is a simple, scalable pipeline that synthesizes *unlimited* RLVR tasks from reasoning-rich but unverifiable internet text—corpora such as science textbooks, Olympiad math forums, and cybersecurity web scrapes that were previously excluded from RLVR data construction due to the difficulty of automatic verification.

**The key idea:** given a source text *S*, we prompt an LLM to identify a contiguous span *t* of crucial reasoning steps and replace it with a `[MASK]` token, constructing a masked context *S*_mask. Treating *t* as the ground-truth answer, the LLM then generates a set of diverse, plausible distractors *D* = {*d*₁, ..., *d*ₖ} that are similar in style and length to the removed span yet incorrect in context, forming a multiple-choice question:

*Q* = (*S*_mask, {*t*} ∪ *D*)

Verification during RL simply checks whether the model's prediction matches the ground-truth option—no external judge or test execution needed. This formulation unlocks reasoning-rich corpora that were previously unusable for RLVR: Olympiad-level theorem proving from AoPS-Instruct, free-form textbook QA from MegaScience, and coding problems without test cases from rStar-Coder.
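The construction and the matching reward can be sketched in a few lines of Python. This is a minimal illustration, not the paper's pipeline: in Golden Goose the span selection and the distractors both come from an LLM, whereas here (`build_goose_task`, a hypothetical helper) they are passed in by hand.

```python
import random

def build_goose_task(source_text: str, span_start: int, span_end: int,
                     distractors: list[str], rng: random.Random) -> dict:
    """Build one Golden Goose-style MCQ task from a source passage.

    The span [span_start, span_end) is treated as the ground-truth
    reasoning step t; it is replaced with [MASK] and shuffled into the
    distractor list to form the options.
    """
    truth = source_text[span_start:span_end]
    masked = source_text[:span_start] + "[MASK]" + source_text[span_end:]
    options = distractors + [truth]
    rng.shuffle(options)  # hide the ground truth among the distractors
    answer = chr(ord("A") + options.index(truth))  # 0 -> A, 1 -> B, ...
    return {"question": masked, "options": options, "answer": answer}

def reward(prediction: str, task: dict) -> float:
    """Binary RLVR reward: exact match on the option letter."""
    return 1.0 if prediction.strip().upper() == task["answer"] else 0.0
```

Because verification is a single string comparison, no judge model or test harness is needed at RL time, regardless of how hard the masked reasoning step is.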
## GooseReason-0.7M Dataset

Using the Golden Goose pipeline, we synthesize **GooseReason-0.7M**, a large-scale RLVR dataset with over **0.7 million tasks** spanning mathematics, programming, and general scientific domains. The dataset is constructed from the following source corpora:

| Domain | # Examples | Source | Description |
|--------|-----------|--------|-------------|
| Math | 235,836 | AoPS‑Instruct | ~600K QA pairs from the Art of Problem Solving forum, predominantly featuring Olympiad-level math problems with community-driven solutions |
| Code | 281,793 | rStar‑Coder | ~418K coding problems from competitive programming platforms; we use the `synthetic_sft` split (questions + teacher model solutions without test cases), which is not directly usable for RL training |
| STEM | 155,496 | MegaScience | ~650K QA pairs from ~12K university-level scientific textbooks spanning physics, biology, chemistry, medicine, computer science, mathematics, and economics |

The data mixing ratio used to train GooseReason-4B-Instruct is **55% ProRL data, 15% GooseReason-0.7M Math, 15% GooseReason-0.7M Code, and 15% GooseReason-0.7M STEM**.

## Data Format

Each record contains three fields:

| Field | Type | Description |
|-------|------|-------------|
| `question` | `str` | A masked passage or solution with `[MASK]` indicating the missing reasoning steps |
| `options` | `list[str]` | Answer choices in order (index 0 = option A, index 1 = option B, …) |
| `answer` | `str` | The correct option letter (e.g., `"A"`) |

Example record:

```json
{
  "question": "You are given a math problem and its solution, with some steps replaced by [MASK]...\n\n**Question:**\n...\n\n**Solution:**\n...\n[MASK]\n...",
  "options": [
    "6. Because QR is a radical axis, the point E has equal power...",
    "6. Using power of a point, E has equal power to the circle...",
    "..."
  ],
  "answer": "E"
}
```

## Citation

If you find this dataset or the Golden Goose paper helpful, please cite:

```bibtex
@article{lu2026goldengoose,
  title={Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text},
  author={Lu, Ximing and Acuna, David and Jung, Jaehun and Hu, Jian and Zhang, Di and Diao, Shizhe and Zou, Yunheng and Zhang, Shaokun and Cui, Brandon and Liu, Mingjie and Kim, Hyunwoo and Ammanabrolu, Prithviraj and Kautz, Jan and Dong, Yi and Choi, Yejin},
  journal={arXiv preprint arXiv:2601.22975},
  year={2026}
}
```
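For quick inspection, a record in this format can be rendered into a lettered multiple-choice prompt. The `render_prompt` helper below is a hypothetical sketch (not shipped with the dataset); it maps option index 0 to letter A, index 1 to B, and so on, consistent with the `answer` field described above.

```python
def render_prompt(record: dict) -> str:
    """Turn one GooseReason record into a lettered MCQ prompt string."""
    lines = [record["question"], "", "Options:"]
    for i, option in enumerate(record["options"]):
        lines.append(f"{chr(ord('A') + i)}. {option}")  # 0 -> A, 1 -> B, ...
    return "\n".join(lines)
```

A model's reply can then be scored by comparing its chosen letter against `record["answer"]`, which is exactly the verification rule Golden Goose relies on during RL.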
