The Absorbed Attention Of DeepSeek-V2
Archive · Expert
DeepSeek-V2 introduced an attention variant that jointly compresses the keys and values (KV) into a low-rank latent vector, shrinking the KV cache by roughly 10x compared with standard multi-head attention. Name the variant; the answer is a three-letter acronym. Flag format: CONGRESS{acronym or full name}. Example: CONGRESS{mha}.
Hint: M + L + A. The first letter stands for "multi"; the middle letter names the hidden space.
Archive — no submissions accepted
This challenge is preserved for reference.