G-Eval
Hard · 200 pts · 0 solves
G-Eval improves on plain LLM-as-a-judge scoring by having the evaluator reason step-by-step through explicit evaluation criteria before assigning a score.
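A minimal sketch of the idea (all names here are hypothetical, and this only builds the prompt rather than calling a model; the full method also derives the evaluation steps automatically via chain-of-thought):

```python
# Sketch of a G-Eval-style prompt. The key addition over bare "rate 1-5"
# prompting: the judge is given explicit evaluation steps and must reason
# through them before emitting a score.

def build_geval_prompt(task: str, criteria: str, steps: list[str], text: str) -> str:
    """Assemble a step-by-step evaluation prompt (hypothetical helper)."""
    step_lines = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))
    return (
        f"Task: {task}\n"
        f"Criteria: {criteria}\n\n"
        "Evaluation Steps:\n"
        f"{step_lines}\n\n"
        f"Text to evaluate:\n{text}\n\n"
        "Work through the evaluation steps one by one, "
        "then output a score from 1 to 5."
    )

prompt = build_geval_prompt(
    task="Summarization quality",
    criteria="Coherence: the summary should be well-structured and logically ordered.",
    steps=[
        "Read the summary and identify its main points.",
        "Check whether the points follow a logical order.",
        "Assign a coherence score from 1 to 5.",
    ],
    text="The cat sat. The economy grew. Therefore rain.",
)
print(prompt)
```

A plain judge prompt would contain only the criteria and the score request; the step list is what forces the reasoning to happen before the number appears.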
What does G-Eval add to simple 'rate 1-5' prompting?
Flag format: CONGRESS{[what_it_adds]}
Example: CONGRESS{multiple_evaluators_averaging}
Hint
The same technique that improves reasoning also improves evaluation.