Archive
Fine-Tuning & Training

RLHF Pipeline

Medium
150 pts, 35 solves
RLHF has 3 stages: supervised fine-tuning _____(1), train a model from human preferences _____(2), optimize with reinforcement learning _____(3).

Flag format: CONGRESS{1:[stage],2:[stage],3:[stage]}
Example: CONGRESS{1:pretrain,2:finetune,3:deploy}
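
The flag format above can be checked mechanically before submitting. The following is a minimal Python sketch, not part of the challenge itself: the helper names `build_flag` and `parse_flag` are hypothetical, and the assumption that each stage name consists only of lowercase letters and underscores is inferred from the example flag, not stated by the challenge.

```python
import re

# Assumption: each stage name is lowercase letters/underscores only,
# inferred from the example flag rather than stated by the challenge.
FLAG_PATTERN = re.compile(r"CONGRESS\{1:([a-z_]+),2:([a-z_]+),3:([a-z_]+)\}")


def build_flag(stage1: str, stage2: str, stage3: str) -> str:
    """Assemble a flag string from three stage names (hypothetical helper)."""
    return f"CONGRESS{{1:{stage1},2:{stage2},3:{stage3}}}"


def parse_flag(flag: str):
    """Return the three stage names, or None if the flag is malformed."""
    match = FLAG_PATTERN.fullmatch(flag)
    return match.groups() if match else None


# Uses the placeholder values from the challenge's own example,
# not the actual answer.
example = build_flag("pretrain", "finetune", "deploy")
print(example)
print(parse_flag(example))
```

Running the sketch on the example values round-trips the flag, which is a quick way to catch a missing brace or a stray space before submitting.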
Hint: SFT first, then learn what humans like, then optimize for it.

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.