Part 14 (5 points, non-coding task)
In generative AI, such as GPT, we autoregressively generate tokens. For a given position l, the key and value at this position, \mathbf{k}_l and \mathbf{v}_l, are repeatedly used when generating tokens for positions l' > l.
Therefore, the values of \mathbf{k}_l and \mathbf{v}_l are typically stored in a cache (no need to revise your code from earlier parts if it does not support this). This storage is called the KV-cache.
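As an illustration only (this sketch is not part of the assignment code, and the names `attend`, `Wq`, `Wk`, `Wv` are made up for the example), the following single-head attention step shows how \mathbf{k}_l and \mathbf{v}_l are computed once, appended to the cache, and then reused at every later position:

```python
import numpy as np

D = 8                                   # model dimension (illustrative value)
rng = np.random.default_rng(0)
# Hypothetical projection matrices for queries, keys, and values.
Wq, Wk, Wv = (rng.standard_normal((D, D)) for _ in range(3))

k_cache, v_cache = [], []               # grows by one entry per generated position

def attend(x_l):
    """Process one new token embedding x_l, reusing all cached keys/values."""
    k_cache.append(Wk @ x_l)            # k_l is computed once and stored
    v_cache.append(Wv @ x_l)            # v_l likewise
    q = Wq @ x_l
    K = np.stack(k_cache)               # (l, D): keys for all positions <= l
    V = np.stack(v_cache)               # (l, D): values for all positions <= l
    scores = K @ q / np.sqrt(D)
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()
    return w @ V                        # attention output for the new position

for _ in range(4):                      # four autoregressive decoding steps
    out = attend(rng.standard_normal(D))

assert len(k_cache) == 4                # one (k_l, v_l) pair cached per position
```

Note that without the cache, every step would have to recompute \mathbf{k}_{l'} and \mathbf{v}_{l'} for all earlier positions l' from scratch.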
Do the following tasks to determine the size of the KV-cache in different models during autoregressive inference (reasoning is required):
- In MHA, the KV-cache at each position is 2D. Explain why.
- In MLA, what is the size of the KV-cache at each position?