Archive
Evaluation & Benchmarks

LLM Regression Testing

Hard
200 pts, 46 solves
After updating your prompt, 3 previously working cases fail. What prevents regressions?
Hint:
Like unit tests, but for prompts.
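
One way to read the hint is as a fixed suite of input/expectation pairs that is rerun on every prompt change and compared against the last known-good run. The sketch below is only an illustration under stated assumptions: `Case`, `run_suite`, `find_regressions`, and `fake_model` are hypothetical names, the model is a deterministic stub rather than a real LLM call, and the substring check stands in for whatever grader a real suite would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Case:
    """One regression case: an input and a substring the output must contain."""
    name: str
    user_input: str
    must_contain: str  # assumption: a substring check is a good enough grader


def run_suite(prompt: str,
              model: Callable[[str, str], str],
              cases: List[Case]) -> Dict[str, bool]:
    """Run every case against one prompt version and record pass/fail by name."""
    return {
        case.name: case.must_contain in model(prompt, case.user_input)
        for case in cases
    }


def find_regressions(baseline: Dict[str, bool],
                     candidate: Dict[str, bool]) -> List[str]:
    """Return cases that passed under the baseline but fail under the candidate."""
    return sorted(
        name for name, passed in baseline.items()
        if passed and not candidate.get(name, False)
    )


def fake_model(prompt: str, user_input: str) -> str:
    """Hypothetical stand-in for an LLM call so the sketch runs offline."""
    if "JSON" in prompt:
        return '{"answer": "' + user_input.upper() + '"}'
    return "Answer: " + user_input.upper()


CASES = [
    Case("greeting", "hello", "HELLO"),
    Case("prefix", "paris", "Answer:"),
]

# Baseline: the prompt version that is known to work.
baseline = run_suite("Reply plainly.", fake_model, CASES)
# Candidate: the updated prompt under review.
candidate = run_suite("Reply in JSON.", fake_model, CASES)

regressions = find_regressions(baseline, candidate)
if regressions:
    print("Regressions:", regressions)
```

In a setup like this, the suite would typically run before a prompt change ships, and a non-empty `regressions` list would block the change until the failing cases are fixed or the expectations are deliberately updated.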

Archive — no submissions accepted

This challenge is preserved for reference. Play live challenges at /challenges.