RLHF Pipeline

Medium150 pts0 solves
RLHF has 3 stages: supervised fine-tuning, training a model from human preferences, then optimizing with reinforcement learning. Name all 3 stages. Flag format: CONGRESS{[stage1],[stage2],[stage3]} Example: CONGRESS{pretrain,finetune,deploy}
Hint
SFT first, then learn what humans like, then optimize for it.