Speculative Decoding
Hard | 200 pts | 0 solves
A small fast model generates N candidate tokens. The large model verifies all N in one forward pass.
Describe both roles.
Flag format: CONGRESS{small_model:[role],large_model:[role]}
Example: CONGRESS{small_model:embed,large_model:generate}
Hint
Draft cheap, verify expensive. Verifying N tokens in one parallel pass of the large model is faster than generating them with N sequential passes of that model.
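The loop described in the hint can be sketched with toy models. This is a minimal illustration, not a reference implementation: `target_next`, `draft_next`, `speculative_step`, `generate`, and `greedy` are hypothetical names, the "models" are arithmetic functions over digits, and acceptance uses simple greedy matching (sampling-based variants use a probabilistic accept/reject rule instead). The list comprehension over prefixes stands in for what would be a single batched forward pass in a real transformer.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    """Hypothetical large model: slow but authoritative next-token choice."""
    return (sum(ctx) * 31 + len(ctx)) % 10


def draft_next(ctx: List[Token]) -> Token:
    """Hypothetical small model: agrees with the target except at some positions."""
    guess = target_next(ctx)
    return guess if len(ctx) % 4 != 3 else (guess + 1) % 10


def speculative_step(ctx: List[Token], draft: NextToken,
                     target: NextToken, n: int) -> List[Token]:
    # 1. The small model proposes n candidate tokens, one after another.
    proposal: List[Token] = []
    for _ in range(n):
        proposal.append(draft(ctx + proposal))

    # 2. The large model gives its own choice at each of the n + 1 positions.
    #    (Assumption: in a real system this is one parallel forward pass.)
    verified = [target(ctx + proposal[:i]) for i in range(n + 1)]

    # 3. Keep the longest prefix where both agree, then append the large
    #    model's token at the first disagreement (or a bonus token if none).
    accepted: List[Token] = []
    for i in range(n):
        if proposal[i] != verified[i]:
            break
        accepted.append(proposal[i])
    accepted.append(verified[len(accepted)])
    return accepted


def generate(prompt: List[Token], num_tokens: int, n: int = 4) -> List[Token]:
    """Speculative generation: each step yields between 1 and n + 1 tokens."""
    out = list(prompt)
    while len(out) - len(prompt) < num_tokens:
        out.extend(speculative_step(out, draft_next, target_next, n))
    return out[:len(prompt) + num_tokens]


def greedy(prompt: List[Token], num_tokens: int) -> List[Token]:
    """Baseline: the large model alone, one token per call."""
    out = list(prompt)
    for _ in range(num_tokens):
        out.append(target_next(out))
    return out
```

Under greedy matching, the speculative output is identical to what the large model would have produced on its own; the only thing that changes is how many large-model passes are needed to get there.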