2025 USA-NA-AIO Round 2, Problem 2, Part 4

USAAIO · May 14, 2025, 10:33pm

Part 4 (5 points, non-coding task)

At position l_1 in an attending sequence, for head h, the information extracted by attending to a being-attended sequence is given by

\mathbf{o}_{h,l_1} = \sum_{l_2 = 0}^{L_2 - 1} \alpha_{h, l_1 l_2} \mathbf{v}_{l_2,h} .

We hereafter call \mathbf{o}_{h,l_1} a pre-out-projection output vector.

Do the following tasks.

1. What is the shape of vector \mathbf{o}_{h,l_1}?

2. We concatenate \left\{\mathbf{o}_{h,l_1} : h \in \left\{ 0, 1 , \cdots , H-1 \right\} \right\} along axis 0:

\mathbf{o}_{l_1} = \begin{bmatrix} \mathbf{o}_{0,l_1} \\ \mathbf{o}_{1,l_1} \\ \vdots \\ \mathbf{o}_{H-1,l_1} \end{bmatrix}

What is the shape of \mathbf{o}_{l_1}?

3. We project \mathbf{o}_{l_1} to a post-out-projection output vector via an out-projection matrix:

\mathbf{x}_{l_1}^{out} = \mathbf{W}^O \mathbf{o}_{l_1} \in \Bbb R^{D_1} ,

where

\mathbf{W}^O = \begin{bmatrix} \mathbf{W}^O_0 & \mathbf{W}^O_1 & \cdots & \mathbf{W}^O_{H-1} \end{bmatrix}

What is the shape of \mathbf{W}^O_h for each h \in \left\{ 0 , 1 , \cdots , H-1 \right\} and \mathbf{W}^O?

USAAIO · May 14, 2025, 10:34pm

\color{green}{\text{### WRITE YOUR SOLUTION HERE ###}}

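The shape of \mathbf{o}_{h,l_1} is \left( D_v, \right), since it is a weighted sum of the value vectors \mathbf{v}_{l_2,h}, each of shape \left( D_v, \right).
The shape of \mathbf{o}_{l_1} is \left( H \cdot D_v, \right), since it stacks H vectors of shape \left( D_v, \right) along axis 0.
For each head h, the shape of \mathbf{W}^O_h is \left( D_1, D_v \right), since \mathbf{W}^O \mathbf{o}_{l_1} = \sum_{h=0}^{H-1} \mathbf{W}^O_h \mathbf{o}_{h,l_1} must lie in \Bbb R^{D_1}.

The shape of \mathbf{W}^O is \left( D_1, H \cdot D_v \right), since its H blocks \mathbf{W}^O_h are placed side by side along axis 1.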
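
The shapes above can be checked numerically. The following is a minimal NumPy sketch, not part of the official solution. The sizes H, L1, L2, D_v, D_1 and all array names are hypothetical values chosen only for illustration, and the attention weights and value vectors are random placeholders rather than outputs of a real attention layer.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
H, L1, L2, D_v, D_1 = 4, 3, 5, 8, 16

rng = np.random.default_rng(0)

# alpha[h, l1, l2]: placeholder attention weights, normalized over l2.
alpha = rng.random((H, L1, L2))
alpha /= alpha.sum(axis=-1, keepdims=True)

# v[l2, h]: placeholder value vectors, each of shape (D_v,).
v = rng.standard_normal((L2, H, D_v))

# o[h, l1] = sum over l2 of alpha[h, l1, l2] * v[l2, h]
o = np.einsum("hij,jhd->hid", alpha, v)  # shape (H, L1, D_v)

l1 = 0
o_h = o[0, l1]  # pre-out-projection vector for head 0

# Concatenate the H per-head vectors along axis 0.
o_l1 = np.concatenate([o[h, l1] for h in range(H)], axis=0)

# Per-head blocks W^O_h, placed side by side along axis 1 to form W^O.
W_O_heads = [rng.standard_normal((D_1, D_v)) for _ in range(H)]
W_O = np.concatenate(W_O_heads, axis=1)

# Post-out-projection vector, computed two equivalent ways.
x_out = W_O @ o_l1
x_sum = sum(W_O_heads[h] @ o[h, l1] for h in range(H))

print(o_h.shape, o_l1.shape, W_O_heads[0].shape, W_O.shape, x_out.shape)
print(np.allclose(x_out, x_sum))
```

The last comparison illustrates why each block \mathbf{W}^O_h must have D_v columns: the block-matrix product splits into one term per head, and each term multiplies \mathbf{W}^O_h by a vector of shape \left( D_v, \right).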

\color{red}{\text{""" END OF THIS PART """}}
