Archive
Evaluation & Benchmarks

G-Eval

Hard
200 pts, 47 solves
G-Eval improves LLM-as-judge by having the evaluator reason step-by-step before scoring. What does G-Eval add to simple 'rate 1-5' prompting?
Hint: The same technique that improves reasoning also improves evaluation.
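
To make the contrast in the question concrete, here is a minimal sketch of the two prompting styles: a bare 'rate 1-5' prompt, and a prompt that asks the evaluator to work through explicit evaluation steps before giving a score. Everything in it is an assumption for illustration: the function names (simple_prompt, step_by_step_prompt, parse_score), the prompt wording, and the example steps are hypothetical and are not the prompts used by G-Eval itself. No model is called; the reply is a hand-written stand-in.

```python
import re
from typing import List, Optional


def simple_prompt(criterion: str, source: str, summary: str) -> str:
    """Baseline: ask for a bare 1-5 rating with no reasoning (hypothetical wording)."""
    return (
        f"Rate the {criterion} of the summary from 1 to 5.\n\n"
        f"Source:\n{source}\n\nSummary:\n{summary}\n\nScore:"
    )


def step_by_step_prompt(
    criterion: str, definition: str, steps: List[str], source: str, summary: str
) -> str:
    """Step-by-step style: spell out the criterion and the evaluation steps,
    and ask the evaluator to reason before scoring (hypothetical wording)."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        f"You will rate a summary on {criterion} from 1 to 5.\n\n"
        f"Criterion: {definition}\n\n"
        f"Evaluation steps:\n{numbered}\n\n"
        f"Source:\n{source}\n\nSummary:\n{summary}\n\n"
        "Work through the steps in order, writing your reasoning, "
        "then finish with a line of the form 'Score: <1-5>'."
    )


def parse_score(reply: str) -> Optional[int]:
    """Return the last 'Score: N' found in the reply, or None if there is none."""
    matches = re.findall(r"Score:\s*([1-5])\b", reply)
    return int(matches[-1]) if matches else None


# Illustrative evaluation steps (assumed, not taken from the challenge).
steps = [
    "Read the source and identify its main points.",
    "Check whether the summary presents those points in a logical order.",
    "Assign a score from 1 to 5 based on the checks above.",
]
prompt = step_by_step_prompt(
    "coherence",
    "The summary should be well structured and well organized.",
    steps,
    "Long source text goes here.",
    "Candidate summary goes here.",
)

# Hand-written stand-in for a model reply; a real reply would come from an LLM.
reply = "The summary covers the main points in order.\nScore: 4"
print(parse_score(reply))
```

The only difference between the two prompts is that the second one makes the evaluator produce its reasoning before the number, which is the step-by-step behaviour the question describes.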

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.