Tests a per-token-identity (or fuzzy-n-gram) lookup table of attention patterns built during prefill and queried during decode, plus its use for adaptive KV-cache quantization. The full write-up is in ...
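As a rough illustration of the idea described above, here is a minimal sketch, not the project's actual implementation. It assumes the table maps each token id to the running mean of the attention mass that token received during prefill, and that the decode-time lookup drives a two-level bit-width choice for that token's KV entries. The names `AttentionPatternTable`, `observe`, `query`, and `choose_bits`, as well as the threshold and bit widths, are hypothetical; the fuzzy-n-gram variant is not shown.

```python
from collections import defaultdict


class AttentionPatternTable:
    """Hypothetical table: token id -> mean attention mass received in prefill."""

    def __init__(self):
        self._sum = defaultdict(float)   # accumulated attention per token id
        self._count = defaultdict(int)   # number of observations per token id

    def observe(self, token_ids, attn_received):
        """Record prefill statistics: one attention value per prompt position."""
        for tok, attn in zip(token_ids, attn_received):
            self._sum[tok] += attn
            self._count[tok] += 1

    def query(self, token_id, default=0.0):
        """Decode-time lookup; unseen token ids fall back to `default`."""
        n = self._count.get(token_id, 0)
        return self._sum[token_id] / n if n else default


def choose_bits(score, threshold=0.25, high_bits=8, low_bits=4):
    """Assumed policy: keep more precision for tokens that attract attention."""
    return high_bits if score >= threshold else low_bits
```

For example, after `table.observe([5, 7, 5], [0.5, 0.125, 0.25])`, `table.query(5)` returns the mean of the two values seen for token 5, and `choose_bits(table.query(5))` selects the higher bit width under the assumed threshold.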