10 months ago · 55d247541a
--- a/README.md
+++ b/README.md
@@ -1,42 +1,210 @@
 
				 # open-r1
			
 
				-Hugging Face's open-source reproduction of DeepSeek-R1
			
 
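				+Hugging Face's open-source reproduction of DeepSeek-R1. The codebase is small; it reproduces DeepSeek-R1 model training by distilling large models (OpenAI and others) on a small number of H800 GPUs.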
			
 
				 
			
 
				 ## Develop
			
 
				 
			
 
				 Reproduction steps:
			
 
				 
			
 
				--   Step 1: **Replicate the R1-Distill models**: distill a high-quality corpus from DeepSeek-R1 and use it to replicate the R1-Distill models.
			
 
				+-   Step 1: **Distill the small R1-Distill models**: distill DeepSeek-R1 into smaller Qwen models to replicate the R1-Distill models.
			
 
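				 -   Step 2: **Replicate the pure RL pipeline**: reproduce the pure reinforcement learning (RL) pipeline that DeepSeek used to create R1-Zero, which will likely require building new large-scale datasets for math, reasoning, and code.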
			
 
				--   Step 3: **Validate multi-stage training**: show that multi-stage training can take a base model to an RL-tuned model.
			
 
				+-   Step 3: **Validate multi-stage training**: multi-stage training can take a base model to an RL-tuned model.
			
 
				 
			
 
				 Code structure:
			
 
				 
			
 
				 -   **src/open_r1 folder**: contains the scripts for training and evaluating models and for generating synthetic data.
			
 
				 
			
 
				--   -   `grpo.py` trains a model with the GRPO algorithm on a given dataset;
			
 
				+-   -   `grpo.py` trains a model with GRPO (Group Relative Policy Optimization) on a given dataset;
			
 
				     -   `sft.py` runs supervised fine-tuning (SFT) of a model;
			
 
				     -   `evaluate.py` evaluates a model on the R1 benchmarks;
			
 
				     -   `generate.py` generates synthetic data from a model using Distilabel.
			
 
				 
			
 
				-
			
 
				 Development:
			
 
				 
			
 
				+Install dependencies:
			
 
				+
			
 
				 * Linux
			
 
				 * CUDA 12.1
			
 
				 * PyTorch v2.5.1
			
 
				 * vLLM
			
 
				-*
			
 
				 
			
 
				 ```
			
 
				+git clone https://github.com/huggingface/open-r1
			
 
				+cd open-r1/rl_training  
			
 
				+pip install -r requirements.txt  
			
 
				+
			
 
				+# conda create -n openr1 python=3.11 && conda activate openr1  # use uv instead of conda
			
 
				 uv venv openr1 --python 3.11 && source openr1/bin/activate && uv pip install --upgrade pip
			
 
				 
			
 
				 uv pip install vllm==0.6.6.post1
			
 
				+uv pip install -e ".[dev]"
			
 
				 
			
 
				 # Log in before downloading models
			
 
				-huggingface-cli loginwandb login
			
 
				+huggingface-cli login
			
 
				+wandb login
			
 
				+
			
 
				+sudo apt-get install git-lfs
			
 
				+
			
 
				+```
			
 
				+
			
 
				+
			
 
				+
			
 
				+-   **SFT (supervised fine-tuning) stage**
			
 
				+
			
 
				+
			
 
				+```
			
 
				+accelerate launch --config_file=configs/zero3.yaml src/open_r1/sft.py \
			
 
				+    --model_name_or_path Qwen/Qwen2.5-Math-1.5B-Instruct \
			
 
				+    --dataset_name HuggingFaceH4/Bespoke-Stratos-17k \
			
 
				+    --learning_rate 2.0e-5 \
			
 
				+    --num_train_epochs 1 \
			
 
				+    --packing \
			
 
				+    --max_seq_length 4096 \
			
 
				+    --per_device_train_batch_size 4 \
			
 
				+    --per_device_eval_batch_size 4 \
			
 
				+    --gradient_accumulation_steps 4 \
			
 
				+    --gradient_checkpointing \
			
 
				+    --bf16 \
			
 
				+    --logging_steps 5 \
			
 
				+    --eval_strategy steps \
			
 
				+    --eval_steps 100 \
			
 
				+    --output_dir data/Qwen2.5-1.5B-Open-R1-Distill
			
 
				+```
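
The SFT command above sets a per-device batch size and a gradient accumulation factor, but the batch size that the optimizer actually sees also depends on how many processes `accelerate` launches. The following is a minimal sketch of that arithmetic. The value `num_processes=8` is an assumption for illustration (the source does not state the GPU count), and the helper names are hypothetical.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_processes: int) -> int:
    """Number of (packed) sequences that contribute to one optimizer step."""
    return per_device * grad_accum * num_processes


def max_tokens_per_step(batch_size: int, max_seq_length: int) -> int:
    """Upper bound on tokens per optimizer step when every sequence is full."""
    return batch_size * max_seq_length


# Values taken from the SFT command; num_processes=8 is assumed, not from the source.
sft_batch = effective_batch_size(per_device=4, grad_accum=4, num_processes=8)
print(sft_batch)                             # 128
print(max_tokens_per_step(sft_batch, 4096))  # 524288
```

If the GPU count changes, adjusting `--gradient_accumulation_steps` so that the product stays the same keeps the optimizer step comparable across setups.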
			
 
				+
			
 
				+
			
 
				+
			
 
				+-   **GRPO**
			
 
				+
			
 
				+```
			
 
				+accelerate launch --config_file configs/zero3.yaml src/open_r1/grpo.py \
			
 
				+    --output_dir DeepSeek-R1-Distill-Qwen-7B-GRPO \
			
 
				+    --model_name_or_path deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
			
 
				+    --dataset_name AI-MO/NuminaMath-TIR \
			
 
				+    --max_prompt_length 256 \
			
 
				+    --per_device_train_batch_size 1 \
			
 
				+    --gradient_accumulation_steps 16 \
			
 
				+    --logging_steps 10 \
			
 
				+    --bf16
			
 
				+```
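
The "group relative" part of GRPO can be illustrated without any training framework: several completions are sampled for the same prompt, each is scored, and each score is normalized against the mean and standard deviation of its own group. The sketch below shows only that normalization step. It is not the code in `grpo.py`; the function name, the use of the population standard deviation, and the `eps` value are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each reward against its own group of sampled completions.

    A completion that scores above the group mean gets a positive advantage,
    one below the mean gets a negative advantage. eps avoids division by zero
    when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two judged correct (1.0), two incorrect (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)

# When all completions get the same reward, there is no learning signal.
print(group_relative_advantages([1.0, 1.0, 1.0, 1.0]))  # [0.0, 0.0, 0.0, 0.0]
```

Because the baseline comes from the group itself, this approach does not need a separate learned value model.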
			
 
				+
			
 
				+
			
 
				+
			
 
				+Model evaluation:
			
 
				+
			
 
				+```
			
 
				+MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
			
 
				+MODEL_ARGS="pretrained=$MODEL,dtype=float16,max_model_length=32768,gpu_memory_utilisation=0.8"
			
 
				+TASK=aime24
			
 
				+OUTPUT_DIR=data/evals/$MODEL
			
 
				+
			
 
				+lighteval vllm $MODEL_ARGS "custom|$TASK|0|0" \
			
 
				+    --custom-tasks src/open_r1/evaluate.py \
			
 
				+    --use-chat-template \
			
 
				+    --system-prompt="Please reason step by step, and put your final answer within \boxed{}." \
			
 
				+    --output-dir $OUTPUT_DIR
			
 
				+ 
			
 
				+```
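
The system prompt in the evaluation command asks the model to put its final answer inside `\boxed{}`, so any scoring has to pull that answer back out of the generated text. The following is a rough sketch of such an extraction step, assuming the last `\boxed{...}` in the output is the final answer. It is not lighteval's implementation, and `extract_boxed` is a hypothetical helper.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None.

    Braces are counted so that nested LaTeX such as \\frac{1}{2} is kept whole.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed, e.g. the generation was truncated


print(extract_boxed("First \\boxed{3}, after checking: \\boxed{\\frac{1}{2}}"))
print(extract_boxed("The answer is 42."))  # None
```

A real evaluator would normally also normalize the extracted expression before comparing it with the reference answer, which this sketch leaves out.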
			
 
				+
			
 
				+
			
 
				+
			
 
				+Data generation (generate.py):
			
 
				+
			
 
				+```
			
 
				+from datasets import load_dataset
			
 
				+from distilabel.models import vLLM
			
 
				+from distilabel.pipeline import Pipeline
			
 
				+from distilabel.steps.tasks import TextGeneration
			
 
				+
			
 
				+prompt_template = """\
			
 
				+You will be given a problem. Please reason step by step, and put your final answer within \\boxed{}:
			
 
				+{{ instruction }}"""
			
 
				+
			
 
				+dataset = load_dataset("AI-MO/NuminaMath-TIR", split="train").select(range(10))
			
 
				+
			
 
				+model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
			
 
				+
			
 
				+with Pipeline(
			
 
				+    name="distill-qwen-7b-r1",
			
 
				+    description="A pipeline to generate data from a distilled r1 model",
			
 
				+) as pipeline:
			
 
				+
			
 
				+    llm = vLLM(
			
 
				+        model=model_id,
			
 
				+        tokenizer=model_id,
			
 
				+        extra_kwargs={
			
 
				+            "tensor_parallel_size": 1,
			
 
				+            "max_model_len": 8192,
			
 
				+        },
			
 
				+        generation_kwargs={
			
 
				+            "temperature": 0.6,
			
 
				+            "max_new_tokens": 8192,
			
 
				+        },
			
 
				+    )
			
 
				+    prompt_column = "problem"
			
 
				+    text_generation = TextGeneration(
			
 
				+        llm=llm, 
			
 
				+        template=prompt_template,
			
 
				+        num_generations=4,
			
 
				+        input_mappings={"instruction": prompt_column} if prompt_column is not None else {}
			
 
				+    )
			
 
				+
			
 
				+if __name__ == "__main__":
			
 
				+    distiset = pipeline.run(dataset=dataset)
			
 
				+    distiset.push_to_hub(repo_id="username/numina-deepseek-r1-qwen-7b")
			
 
				+```
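
In the pipeline above, `max_model_len` and `max_new_tokens` are both 8192. In vLLM, `max_model_len` covers the prompt and the generated tokens together, so the space actually left for generation is smaller than `max_new_tokens` by the length of the prompt. The sketch below works through that budget and the total number of completions requested. The prompt length of 200 tokens is an assumed figure, and both helper names are hypothetical.

```python
def generation_budget(max_model_len: int, prompt_tokens: int, max_new_tokens: int) -> int:
    """Tokens that can still be generated once the prompt occupies the context."""
    remaining = max(max_model_len - prompt_tokens, 0)
    return min(max_new_tokens, remaining)


def total_generations(num_prompts: int, num_generations: int) -> int:
    """Completions produced when each prompt is sampled num_generations times."""
    return num_prompts * num_generations


# Settings from the pipeline above; prompt_tokens=200 is an assumption.
print(generation_budget(max_model_len=8192, prompt_tokens=200, max_new_tokens=8192))  # 7992
# select(range(10)) keeps 10 problems, each sampled num_generations=4 times.
print(total_generations(num_prompts=10, num_generations=4))  # 40
```

If long reasoning traces get cut off, raising `max_model_len` or lowering `max_new_tokens` so the two leave room for the prompt is one option to consider.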
			
 
				+
			
 
				+
			
 
				 
			
 
				 ```
			
 
				+# pip install openr1==0.5.2  (run this in a shell first)
			
 
				+
			
 
				+# Run a pretrained model
			
 
				+from openr1 import load_model  
			
 
				+model = load_model("uid/qwen-1.5b-r1-distilled")  
			
 
				+response = model.generate("Approximate √2 using Newton's method")
			
 
				+print(response.thought_chain)  # print the full reasoning steps
			
 
				+```
			
 
				+
			
 
				+
			
 
				+
			
 
				+Datasets:
			
 
				+
			
 
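				+bespokelabs/Bespoke-Stratos-17k: a replication of the Berkeley Sky-T1 data pipeline that uses DeepSeek-R1 to create a dataset of questions, reasoning traces, and answers. The data was then used to fine-tune 7B and 32B Qwen models with a distillation approach similar to the one in the R1 paper.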
			
 
				+
			
 
				+open-thoughts/OpenThoughts-114k: an "open synthetic reasoning dataset with 114k high-quality examples covering math, science, code, and puzzles". Part of the Open Thoughts effort.
			
 
				+
			
 
				+cognitivecomputations/dolphin-r1: an 800k-sample dataset with completions from DeepSeek-R1 and Gemini flash, plus 200k samples from Dolphin chat, intended to help train R1-style models.
			
 
				+
			
 
				+ServiceNow-AI/R1-Distill-SFT: currently 17,000 samples; an effort by the ServiceNow Language Models lab to create data in support of the Open-R1 work.
			
 
				+
			
 
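				+NovaSky-AI/Sky-T1_data_17k: the dataset used to train Sky-T1-32B-Preview. It is part of an early effort to replicate o1-style reasoning. Training a model on this dataset cost less than $450. The accompanying blog post covers this in more detail.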
			
 
				+
			
 
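				+Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B: this dataset extends Magpie, an approach that generates instruction data without seed prompts, so that the responses include reasoning. The instructions are generated by Llama 3.1 70B Instruct and Llama 3.3 70B Instruct, and the responses by DeepSeek-R1-Distill-Llama-70B.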
			
 
				+
			
 
				+
			
 
				+
			
 
				+## **Data generation**
			
 
				+
			
 
				+-   **Generate data from a small distilled R1 model**
			
 
				+
			
 
				+Generates data from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B on a single H100 GPU.
			
 
				+
			
 
				+-   **Generate data from DeepSeek-R1**
			
 
				+
			
 
				+Generates data from the DeepSeek-R1 model using 2 nodes, each with 8 H100 GPUs.
			
 
				+
			
 
				+
			
 
				+
			
 
				+## Reference
			
 
				+
			
 
				+https://github.com/huggingface/open-r1
			
 
				+
			
 
				+https://colab.research.google.com/github/huggingface/open-r1
			
 
				+
			
 
				+[deepseek-ai/DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1)
			
 
				+
			
 
				+[deepseek-ai/DeepSeek-V2](https://github.com/deepseek-ai/DeepSeek-V2)
			
 
				 
			
 
				+[deepseek-ai/DeepSeek-VL2](https://github.com/deepseek-ai/DeepSeek-VL2)
			
 
				 
			
 
				+[deepseek-ai/DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3)
			
 
				 
			
 
				-https://github.com/huggingface/open-r1