Human Preference

Hard
200 pts, 44 solves
RLHF's reward model is trained on human comparisons. What does it learn to predict?
Hint: Given two outputs, which one would a human choose?
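
To make the hint concrete, here is a minimal sketch of how a pairwise comparison is commonly turned into a training signal. It assumes a Bradley-Terry style model, which is a frequent choice for RLHF reward models but is not stated by this challenge. The function names and the scalar rewards are hypothetical and for illustration only.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Modeled probability that a human picks output A over output B.

    Assumption: Bradley-Terry form, i.e. a sigmoid of the reward difference.
    """
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the comparison the human actually made."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))


# Hypothetical scalar scores a reward model might assign to two outputs.
p = preference_probability(2.0, 0.5)
loss_if_human_chose_a = pairwise_loss(2.0, 0.5)
loss_if_human_chose_b = pairwise_loss(0.5, 2.0)
```

Under this assumption the loss is small when the model scores the human-chosen output higher and large when it scores the rejected output higher, so minimizing it pushes the scores toward agreeing with the recorded human choices.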

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.