Six Hundred And Sixty Million Pages
Archive | Hard
Most open-source LLM pretraining corpora, including C4 and RedPajama, derive their web text from a single public, ongoing web scrape. Name it (two words). Flag format: CONGRESS{two-words}. Example: CONGRESS{open web}.
Hint: The name is exactly what it is.
Archive: no submissions accepted.
This challenge is preserved for reference. Play live challenges at /challenges.