BLEU Score Limitations
Easy · 100 pts · 0 solves
The BLEU score measures n-gram overlap between generated text and a reference text. For example, "The cat sat on the mat" vs "A feline rested on the rug" scores poorly even though the two sentences mean nearly the same thing.
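To make the overlap concrete, here is a minimal sketch of the clipped n-gram precision that BLEU is built on. It is not a full BLEU implementation: it leaves out the geometric mean over n-gram orders and the brevity penalty. The function names `ngrams` and `clipped_precision` are hypothetical, and treating the first sentence as the reference, with lowercased whitespace tokenization, is an assumption made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it
    appears in the reference (the "clipping" step).
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched / total


# Assumption: simple lowercased whitespace tokenization.
reference = "the cat sat on the mat".split()
candidate = "a feline rested on the rug".split()

unigram_precision = clipped_precision(candidate, reference, 1)
bigram_precision = clipped_precision(candidate, reference, 2)
trigram_precision = clipped_precision(candidate, reference, 3)
```

With these inputs only "on" and "the" match as unigrams (2 of 6), only "on the" matches as a bigram (1 of 5), and no trigram matches, so every content word in the paraphrase contributes nothing to the score.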
What is BLEU's fundamental limitation?
Flag format: CONGRESS{limitation_in_snake_case}
Hint
BLEU measures word overlap, not meaning.