Archive
Fine-Tuning & Training

The Model Trained From Preferences

Archive
Very Easy
50pts0 solves
The training technique that shapes a model to align with human preferences via a reward model + policy gradient loop is known by this four-letter acronym. Flag format: CONGRESS{acronym}. Example: CONGRESS{sft}.
Show hint
Reinforcement Learning from Human ...

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.