Problem 2 (100 points)
Multi-head attention (MHA) is a major breakthrough in AI, and many variants have been proposed that improve on its original form.
In this problem, you are asked to study multi-head attention and its variants.
We use the following notation in this problem.
- B: batch size. b: index of a sample.
- L_1: length of an attending sequence. l_1: index of a position in this sequence.
- L_2: length of the sequence being attended to. l_2: index of a position in this sequence.
- D_1: dimension of a hidden state/token in an attending sequence.
- D_2: dimension of a hidden state/token in the sequence being attended to.
- H: number of heads. h: index of a head.
- D_v: dimension of a value vector.
- D_{qk}: dimension of a query/key vector.
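To make this notation concrete, the short sketch below builds tensors whose shapes follow the convention above. The particular sizes (B = 2, L_1 = 4, L_2 = 6, D_1 = 8, D_2 = 10, H = 3, D_{qk} = 5, D_v = 7) are hypothetical values chosen only for illustration and are not part of the problem.
# Optional illustration (hypothetical sizes, not part of the graded cells)
import torch

B, L1, L2 = 2, 4, 6         # batch size, attending length, attended length
D1, D2 = 8, 10              # hidden dimensions of the two sequences
H, Dqk, Dv = 3, 5, 7        # number of heads, query/key dimension, value dimension

X = torch.randn(B, L1, D1)  # attending sequence: one x_{l_1} per sample b and position l_1
Y = torch.randn(B, L2, D2)  # sequence being attended to: one y_{l_2} per sample b and position l_2

print(X.shape)  # torch.Size([2, 4, 8])
print(Y.shape)  # torch.Size([2, 6, 10])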
# Run code in this cell
"""
DO NOT MAKE ANY CHANGE IN THIS CELL.
"""
import torch
import torch.nn as nn
import numpy as np
\color{red}{\text{WARNING !!!}}
- Beyond importing the libraries/modules/classes/functions in the preceding cell, you are NOT allowed to import anything else for the following purposes:
  - As part of your final solution. For instance, if a problem asks you to build a model without using sklearn but you use it, then you will not earn points.
  - Temporarily importing something to help you obtain a solution. For instance, if a problem asks you to manually compute eigenvalues but you temporarily use np.linalg.eig to get an answer and then delete your code, then you violate the rule.

Rule of thumb: each part has a particular purpose and is intentionally designed to test something specific. Do not attempt to find a shortcut that circumvents the rule.
Part 1 (5 points, non-coding task)
Do the following tasks (Reasoning is not required).
- For each hidden state at position l_1 in an attending sequence, \mathbf{x}_{l_1} \in \Bbb R^{D_1}, we project it into a query vector for head h according to \mathbf{q}_{l_1,h} = \mathbf{W}^{\mathbf{Q}}_h \mathbf{x}_{l_1}. What is the shape of \mathbf{W}^{\mathbf{Q}}_h?
- For each hidden state at position l_2 in the sequence being attended to, \mathbf{y}_{l_2} \in \Bbb R^{D_2}, we project it into a key vector for head h according to \mathbf{k}_{l_2,h} = \mathbf{W}^{\mathbf{K}}_h \mathbf{y}_{l_2}. What is the shape of \mathbf{W}^{\mathbf{K}}_h?
- For each hidden state at position l_2 in the sequence being attended to, \mathbf{y}_{l_2} \in \Bbb R^{D_2}, we project it into a value vector for head h according to \mathbf{v}_{l_2,h} = \mathbf{W}^{\mathbf{V}}_h \mathbf{y}_{l_2}. What is the shape of \mathbf{W}^{\mathbf{V}}_h?
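After working out your answers, the minimal sketch below can serve as a sanity check: it applies the three per-head projections to a single hidden state. The dimensions (D_1 = 8, D_2 = 10, D_{qk} = 5, D_v = 7) and the weight shapes used here are assumptions chosen only so that the matrix-vector products above are well defined; the sketch illustrates how shapes can be verified and is not part of the graded answer.
# Optional sanity-check sketch (hypothetical dimensions, for illustration only)
import torch

D1, D2, Dqk, Dv = 8, 10, 5, 7

x = torch.randn(D1)         # hidden state x_{l_1} from the attending sequence
y = torch.randn(D2)         # hidden state y_{l_2} from the sequence being attended to

# Per-head projection matrices; shapes chosen so the products below are well defined.
W_Q = torch.randn(Dqk, D1)
W_K = torch.randn(Dqk, D2)
W_V = torch.randn(Dv, D2)

q = W_Q @ x  # query vector q_{l_1,h}, shape (Dqk,)
k = W_K @ y  # key vector   k_{l_2,h}, shape (Dqk,)
v = W_V @ y  # value vector v_{l_2,h}, shape (Dv,)

print(q.shape, k.shape, v.shape)  # torch.Size([5]) torch.Size([5]) torch.Size([7])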