Challenges
16 challenges available
📊 Evaluation & Benchmarks
Vibe Check vs Systematic Eval
Very Easy
0 solves · 50 pts
📊 Evaluation & Benchmarks
Vibe Check Problem
Very Easy
0 solves · 50 pts
📊 Evaluation & Benchmarks
BLEU Score Limitations
Easy
0 solves · 100 pts
📊 Evaluation & Benchmarks
BLEU Limitation
Easy
0 solves · 100 pts
📊 Evaluation & Benchmarks
LLM-as-a-Judge Biases
Easy
0 solves · 100 pts
📊 Evaluation & Benchmarks
LLM-as-a-Judge
Easy
0 solves · 100 pts
📊 Evaluation & Benchmarks
Hallucination Detection
Medium
0 solves · 150 pts
📊 Evaluation & Benchmarks
Benchmark Contamination
Medium
0 solves · 150 pts
📊 Evaluation & Benchmarks
Elo Rating System [LMSYS]
Medium
0 solves · 150 pts
📊 Evaluation & Benchmarks
Faithfulness Score [RAGAS]
Medium
0 solves · 150 pts
📊 Evaluation & Benchmarks
Elo Ratings
Medium
0 solves · 150 pts
📊 Evaluation & Benchmarks
Answer Relevance [RAGAS]
Medium
0 solves · 150 pts
📊 Evaluation & Benchmarks
LLM Regression Testing
Hard
0 solves · 200 pts
📊 Evaluation & Benchmarks
Human Preference Prediction
Hard
0 solves · 200 pts
📊 Evaluation & Benchmarks
Regression Testing for LLMs
Hard
0 solves · 200 pts
📊 Evaluation & Benchmarks
G-Eval
Hard
0 solves · 200 pts