The DeepSeek Paper's Main Algorithm
Archive · Expert
DeepSeek-R1-Zero, introduced in the DeepSeek-R1 paper, was trained with pure RL and no supervised fine-tuning stage, scaling a specific algorithm that samples multiple rollouts per question to improve reasoning. Name the algorithm (four-letter acronym). Flag format: CONGRESS{acronym}. Example: CONGRESS{rloo}.
Hint: "Group" plus the classic three-letter acronym PPO.
Archive — no submissions accepted
This challenge is preserved for reference.