Archive
Fine-Tuning & Training

The Single-Stage ORPO

Medium
150 pts, 0 solves
Hong et al. (2024) proposed ORPO (Odds Ratio Preference Optimization): a preference-learning loss that adds an odds-ratio preference term to the supervised fine-tuning loss, removing the need for a separate preference stage. ORPO's key practical claim is that it combines which two training stages?
Hint: The two stages that usually come right after pretraining.
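
For reference, the loss described in the question can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each response has already been scored by the model as a length-normalized (per-token average) log-probability, and the function names and the default weight `lam` are hypothetical choices made here.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    Assumes avg_logp < 0 so that p < 1; avg_logp == 0 would divide by zero.
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float, lam: float = 0.1) -> float:
    """Sketch of a single-example ORPO-style loss.

    lam is a hypothetical default weight on the odds-ratio term.
    """
    # Supervised fine-tuning term: negative log-likelihood of the chosen response.
    sft_term = -avg_logp_chosen

    # Log of the odds ratio between the chosen and the rejected response.
    log_odds_ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)

    # Odds-ratio term: -log(sigmoid(z)), written as log(1 + exp(-z)).
    or_term = math.log1p(math.exp(-log_odds_ratio))

    return sft_term + lam * or_term
```

With `lam=0` the function reduces to the plain supervised loss on the chosen response, and the odds-ratio term shrinks as the model assigns relatively higher odds to the chosen response than to the rejected one.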

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.