G-Eval
Hard · 200 pts · 0 solves
G-Eval improves on plain LLM-as-a-judge scoring by having the evaluator reason step-by-step through explicit evaluation criteria before assigning a score.
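A minimal sketch of the idea (all names here are hypothetical, and this only builds the prompt rather than calling a model; the full method also derives the evaluation steps automatically via chain-of-thought):

```python
# Sketch of a G-Eval-style prompt. The key addition over bare "rate 1-5"
# prompting: the judge is given explicit evaluation steps and must reason
# through them before emitting a score.

def build_geval_prompt(task: str, criteria: str, steps: list[str], text: str) -> str:
    """Assemble a step-by-step evaluation prompt (hypothetical helper)."""
    step_lines = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))
    return (
        f"Task: {task}\n"
        f"Criteria: {criteria}\n\n"
        "Evaluation Steps:\n"
        f"{step_lines}\n\n"
        f"Text to evaluate:\n{text}\n\n"
        "Work through the evaluation steps one by one, "
        "then output a score from 1 to 5."
    )

prompt = build_geval_prompt(
    task="Summarization quality",
    criteria="Coherence: the summary should be well-structured and logically ordered.",
    steps=[
        "Read the summary and identify its main points.",
        "Check whether the points follow a logical order.",
        "Assign a coherence score from 1 to 5.",
    ],
    text="The cat sat. The economy grew. Therefore rain.",
)
print(prompt)
```

A plain judge prompt would contain only the criteria and the score request; the step list is what forces the reasoning to happen before the number appears.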
What does G-Eval add to simple 'rate 1-5' prompting?
Flag format: CONGRESS{[what_it_adds]}
Example: CONGRESS{multiple_evaluators_averaging}
Hint
The same technique that improves reasoning also improves evaluation.