Elo Rating System [LMSYS]
Medium | 150 pts | 0 solves
[LMSYS Chatbot Arena] LMSYS's Chatbot Arena shows users two anonymous model responses to the same prompt and lets them vote for the one they prefer. Model rankings are then updated in the style of chess Elo ratings. This is one specific evaluation platform, not a universal eval method.
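For readers unfamiliar with chess-style ratings, here is a minimal sketch of how a single vote could shift two ratings under the textbook Elo formula. It is illustrative only and is not the platform's actual code: the function names, the starting rating of 1000, and the K-factor of 32 are all assumptions made for this example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A wins the vote, 0.0 if B wins, 0.5 for a tie.
    k (the K-factor) is an assumed value, not a platform setting.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Whatever A gains, B loses, so the rating total is conserved.
    return rating_a + delta, rating_b - delta


# Hypothetical example: two models start level and a voter prefers model A.
new_a, new_b = update_elo(1000.0, 1000.0, score_a=1.0)
print(new_a, new_b)
```

An upset (a lower-rated model winning the vote) moves the ratings more than an expected result, which is why many votes across many pairings are needed before the ranking stabilizes.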
What is its core evaluation mechanism?
Flag format: CONGRESS{[mechanism]}
Example: CONGRESS{automated_benchmark_score}
Hint
Two models compete head-to-head, humans pick the winner. See lmarena.ai.