Fine-Tuning & Training

RLHF Pipeline

Medium

150 pts · 35 solves

RLHF has three stages: supervised fine-tuning _____(1), training a model on human preferences _____(2), and optimizing with reinforcement learning _____(3).

Flag format: CONGRESS{1:[stage],2:[stage],3:[stage]}
Example: CONGRESS{1:pretrain,2:finetune,3:deploy}
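The flag format can be made concrete with a minimal Python sketch that assembles and parses a flag string. The helper names `build_flag` and `parse_flag` and the regular expression are hypothetical, introduced only for illustration. How the challenge site actually validates submissions (case sensitivity, whitespace handling) is not stated here, so the pattern below is an assumption. The values used are the placeholder ones from the example, not the solution.

```python
import re

# Assumed pattern for the stated flag format; the site's real validation rules are unknown.
FLAG_PATTERN = re.compile(r"^CONGRESS\{1:([^,{}]+),2:([^,{}]+),3:([^,{}]+)\}$")


def build_flag(stage1: str, stage2: str, stage3: str) -> str:
    """Assemble a flag string from three stage names (hypothetical helper)."""
    return "CONGRESS{" + f"1:{stage1},2:{stage2},3:{stage3}" + "}"


def parse_flag(flag: str) -> tuple[str, str, str]:
    """Split a flag back into its three stage names, or raise ValueError."""
    match = FLAG_PATTERN.match(flag)
    if match is None:
        raise ValueError(f"not a well-formed flag: {flag!r}")
    return match.group(1), match.group(2), match.group(3)


# Placeholder values taken from the example in the challenge text.
example = build_flag("pretrain", "finetune", "deploy")
print(example)
print(parse_flag(example))
```

Running the string through `parse_flag` before submitting is a cheap way to catch a missing brace or a misplaced comma.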

Hint

SFT first, then learn what humans like, then optimize for it.

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.