Evaluation & Benchmarks

Elo Rating [LMSYS]

Medium
150 pts, 32 solves
[LMSYS Chatbot Arena] Chatbot Arena ranks LLMs using a chess-like Elo system. What is the core evaluation mechanism?
Hint: Pairwise comparison, not absolute scoring.
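
To make the hint concrete, here is a minimal sketch of a chess-style Elo update driven by pairwise outcomes. The K-factor of 32, the 400-point scale, the starting rating of 1000, and the model names are illustrative assumptions borrowed from common chess conventions, not parameters confirmed by the challenge or by Chatbot Arena.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A beats B under the Elo model (scale=400 is an assumed convention)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k=32 is an assumed K-factor, chosen only for illustration.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Hypothetical models and votes: (model_a, model_b, score for model_a).
ratings = {"model_x": 1000.0, "model_y": 1000.0}
battles = [
    ("model_x", "model_y", 1.0),  # voter preferred model_x
    ("model_x", "model_y", 1.0),  # voter preferred model_x again
    ("model_x", "model_y", 0.5),  # tie
]

for a, b, score_a in battles:
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], score_a)

print(ratings)
```

No model ever receives an absolute score here. Each rating moves only in response to head-to-head preferences, and the size of the move depends on how surprising the outcome was given the current ratings.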

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.