DPO
Archive · Hard
DPO (Direct Preference Optimization) achieves results comparable to RLHF with a simpler training pipeline.
What is DPO's key simplification?
Hint: One fewer model to train.
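For reference, the standard DPO objective for a single preference pair can be sketched as below. This is a minimal illustration, not code from the challenge: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions, and the inputs are assumed to be summed sequence log-probabilities computed elsewhere. Note that the only models involved are the trainable policy and a frozen reference copy.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses.

    All four inputs are log-probabilities of a full response given the prompt.
    The names and the default beta are illustrative assumptions.
    """
    # How much more the policy favors each response than the frozen reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Scaled difference of the two margins.
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(logits)), written in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))
```

When the policy is identical to the reference, both margins are zero and the loss equals log 2; it drops below that as the policy shifts probability toward the chosen response relative to the rejected one.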
Archive — no submissions accepted
This challenge is preserved for reference. Play live challenges at /challenges.