Human Preference
Archive · Hard
RLHF's reward model is trained on human comparisons.
What does it learn to predict?
Hint: Given two outputs, which one would a human choose?
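The pairwise choice in the hint is often formalized with a Bradley-Terry style model, where each output gets a scalar reward and the difference in rewards sets the probability that a human picks one over the other. The challenge does not say which formulation it has in mind, so the sketch below is an assumption, and the function names are hypothetical.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Modeled probability that a human prefers output A over output B.

    Assumes a Bradley-Terry style model: a sigmoid of the reward difference.
    """
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the observed human choice.

    The loss shrinks as the chosen output's reward rises above the
    rejected output's reward.
    """
    return -math.log(preference_probability(reward_chosen, reward_rejected))
```

Under this assumption, equal rewards give a probability of 0.5, and training pushes the reward of the chosen output above that of the rejected one.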
Archive — no submissions accepted
This challenge is preserved for reference. Play live challenges at /challenges.