Human Preference

Archive

Hard

200 pts · 44 solves

RLHF's reward model is trained on human comparisons. What does it learn to predict?

Hint:

Given two outputs, which one would a human choose?
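The hint describes the usual pairwise setup. A common way to formalize it in RLHF reward modeling is the Bradley-Terry model. The reward model assigns each output a scalar score, and the probability that a human picks output A over output B is modeled as the logistic sigmoid of the score difference. Training minimizes the negative log-likelihood of the choices humans actually made. The sketch below assumes that formulation. The function names and reward values are hypothetical, and a real reward model would compute the scores with a neural network instead of taking them as inputs.

```python
import math

def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that a human prefers output A over output B."""
    diff = reward_a - reward_b
    # Numerically stable logistic sigmoid of the reward difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    z = math.exp(diff)
    return z / (1.0 + z)

def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the recorded choice: -log sigmoid(r_chosen - r_rejected)."""
    diff = reward_chosen - reward_rejected
    # Stable softplus(-diff) = log(1 + exp(-diff)), which avoids overflow for large |diff|.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

# Hypothetical scalar rewards a reward model might assign to the chosen and rejected outputs.
r_chosen, r_rejected = 2.0, 0.5
print(preference_probability(r_chosen, r_rejected))
print(pairwise_loss(r_chosen, r_rejected))
```

Minimizing this loss pushes the chosen output's score above the rejected one's. The scalar scores themselves carry no absolute meaning. Only their differences matter, and those differences encode the predicted probability of the human's choice.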
