Archive
Evaluation & Benchmarks

The Arena Of Pairwise Preference

Easy
100 pts · 0 solves
LMSYS released a live web leaderboard where two anonymous models answer the same user query and a human votes for the better response; the votes are then aggregated into Elo scores. Name the site (two words). Flag format: CONGRESS{two-words}. Example: CONGRESS{human bench}.
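How pairwise votes turn into Elo scores can be sketched as follows. This is a minimal online Elo update under assumed parameters (K = 32, starting rating 1000, logistic scale 400). The leaderboard's exact method and constants may differ, and the model names are hypothetical.

```python
def expected_score(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def update(ratings: dict, winner: str, loser: str,
           k: float = 32.0, init: float = 1000.0) -> dict:
    """Apply one human vote: `winner` was preferred over `loser`.

    k and init are assumed values, not the leaderboard's real settings.
    """
    r_w = ratings.get(winner, init)
    r_l = ratings.get(loser, init)
    e_w = expected_score(r_w, r_l)
    # Winner gains k * (1 - expected); loser loses the same amount (zero-sum).
    ratings[winner] = r_w + k * (1.0 - e_w)
    ratings[loser] = r_l - k * (1.0 - e_w)
    return ratings


def rate(votes, k: float = 32.0, init: float = 1000.0) -> dict:
    """Fold a sequence of (winner, loser) votes into Elo ratings."""
    ratings: dict = {}
    for winner, loser in votes:
        update(ratings, winner, loser, k, init)
    return ratings


# Hypothetical votes between three anonymous models.
votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-c", "model-b")]
print(rate(votes))
```

Online Elo depends on the order in which votes arrive; fitting a Bradley–Terry model over all votes at once is a common way to remove that order dependence.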
Think of a Roman combat venue, plus the kind of system being compared.

Archive: no submissions accepted.

This challenge is preserved for reference. Play live challenges at /challenges.