RLHF Pipeline
Archive · Medium
RLHF has three stages: supervised fine-tuning _____(1), training a model from human preferences _____(2), and optimizing with reinforcement learning _____(3).
Flag format: CONGRESS{1:[stage],2:[stage],3:[stage]}
Example: CONGRESS{1:pretrain,2:finetune,3:deploy}
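The challenge specifies only the string format, not any tooling. Here is a minimal sketch of building and checking a flag in that format, assuming each stage is a short label with no commas or braces. The helpers `make_flag` and `parse_flag` are hypothetical, and the stage values are taken from the example above, not from the answer.

```python
import re

# Hypothetical helpers for illustration; the challenge defines only the format.
# Each label may not contain commas or braces (an assumption).
FLAG_RE = re.compile(r"^CONGRESS\{1:([^,{}]+),2:([^,{}]+),3:([^,{}]+)\}$")


def make_flag(stages: list[str]) -> str:
    """Join exactly three stage labels into CONGRESS{1:a,2:b,3:c}."""
    if len(stages) != 3:
        raise ValueError("expected exactly three stages")
    body = ",".join(f"{i}:{s}" for i, s in enumerate(stages, start=1))
    return f"CONGRESS{{{body}}}"


def parse_flag(flag: str) -> list[str]:
    """Recover the three stage labels, raising ValueError on a malformed flag."""
    m = FLAG_RE.match(flag)
    if m is None:
        raise ValueError(f"not in CONGRESS{{1:..,2:..,3:..}} format: {flag!r}")
    return list(m.groups())


example = make_flag(["pretrain", "finetune", "deploy"])
print(example)              # CONGRESS{1:pretrain,2:finetune,3:deploy}
print(parse_flag(example))  # ['pretrain', 'finetune', 'deploy']
```

The regex rejects commas and braces inside a label. That matches the example, but what the original checker actually accepted is an assumption.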
Hint: SFT first, then learn what humans like, then optimize for it.
Archive — no submissions accepted
This challenge is preserved for reference. Play live challenges at /challenges.