Fine-Tuning & Training

The DeepSeek Paper's Main Algorithm

Archive

Expert

300 pts, 0 solves

The DeepSeek-R1-Zero paper used pure RL, with no supervised fine-tuning stage, and trained reasoning by scaling a specific algorithm that samples multiple rollouts for each question. Name the algorithm (four-letter acronym). Flag format: CONGRESS{acronym}. Example: CONGRESS{rloo}.

Hint:

Group + the classic three-letter PPO.
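The two halves of the hint can be sketched concretely. "Group" refers to scoring each rollout against the other rollouts sampled for the same question, which replaces PPO's learned value network. "PPO" refers to the clipped probability-ratio objective that the resulting advantage is plugged into. The minimal sketch below assumes one scalar outcome reward per rollout, uses the population standard deviation, and picks a clip range of 0.2. These are illustrative choices, not the paper's exact settings. It also omits the per-token averaging and the KL penalty toward a reference model.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Score each rollout relative to its group: (r - mean) / std.

    Assumption: population std over the group; eps avoids division by zero
    when every rollout in the group receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO-style clipped objective for one sample (to be maximized).

    ratio is pi_new(action) / pi_old(action); clip_eps=0.2 is an illustrative value.
    """
    clipped_ratio = max(1 - clip_eps, min(ratio, 1 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# Example: 4 rollouts for one question, reward 1.0 if the final answer is correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
print(clipped_surrogate(1.5, advantages[0]))
```

Correct rollouts get positive advantages and incorrect ones get negative advantages, and the advantages sum to about zero within each group. That group-relative baseline is what lets the method skip training a separate critic.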

Archive — no submissions accepted
