G-Eval
Archive · Hard
G-Eval improves LLM-as-judge evaluation by having the evaluator reason step by step before it assigns a score.
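Below is a minimal sketch of the prompt side of this idea. It assumes a judge model that can be told to end its reply with a line of the form `Score: N`. The function names `build_geval_prompt` and `parse_score`, the prompt wording, and that score-line convention are all hypothetical; this is not G-Eval's published template. The model call itself is omitted. The point is the ordering: the judge gets explicit evaluation steps and must reason through them before it commits to a number.

```python
from __future__ import annotations

import re


def build_geval_prompt(criteria: str, steps: list[str], source: str, output: str) -> str:
    """Assemble a judge prompt (hypothetical wording) that asks for reasoning before the score."""
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))
    return (
        f"Evaluation criteria:\n{criteria}\n\n"
        f"Evaluation steps:\n{numbered}\n\n"
        f"Source:\n{source}\n\n"
        f"Response to evaluate:\n{output}\n\n"
        # Reasoning is requested first; the score comes only on the final line.
        "Work through each evaluation step and explain your reasoning. "
        "Then, on the last line, write 'Score: N' where N is an integer from 1 to 5."
    )


def parse_score(judge_reply: str) -> int | None:
    """Return the last 'Score: N' (N in 1-5) in the judge's reply, or None if absent."""
    matches = re.findall(r"Score:\s*([1-5])\b", judge_reply)
    return int(matches[-1]) if matches else None
```

For example, `parse_score("Step 1: covers the main points.\nStep 2: one minor omission.\nScore: 4")` returns `4`. Taking the last match means that if the judge revises its score partway through its reasoning, the final score wins.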
What does G-Eval add to simple 'rate 1-5' prompting?
Hint: The same technique that improves reasoning also improves evaluation.
Archive — no submissions accepted
This challenge is preserved for reference. Play live challenges at /challenges.