Part 3 (10 points, non-coding task)
Define the function $\text{Softmax}: \Bbb R^d \rightarrow \Bbb R^d$, with the $i$th output value given by
$$
\text{Softmax}_i \left( \mathbf{z} \right)
= \frac{\exp \left( z_i \right)}{\sum_{j=0}^{d-1} \exp \left( z_j \right)} .
$$
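As a quick illustration of this definition, here is a minimal sketch in plain Python. The function name `softmax` and the max-subtraction step are assumptions added for illustration. They are not part of the task statement. Subtracting $\max_j z_j$ does not change the result, because the common factor cancels between numerator and denominator, and it avoids overflow in `exp`.

```python
import math


def softmax(z):
    """Return Softmax(z) for a list of floats z, indexed 0..d-1 as in the definition."""
    # Assumption for numerical stability: shift by the maximum entry.
    # exp(z_i - m) / sum_j exp(z_j - m) equals exp(z_i) / sum_j exp(z_j).
    m = max(z)
    exps = [math.exp(z_j - m) for z_j in z]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    probs = softmax([1.0, 2.0, 3.0])
    print(probs)       # entries are positive and increase with the input values
    print(sum(probs))  # sums to 1 up to floating-point rounding
```

The output entries are positive and sum to 1, so for a fixed attending position the scores over all attended positions form a probability distribution.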
For head $h$, the attention score from position $l_1$ in the attending sequence to position $l_2$ in the attended sequence is denoted by $\alpha_{h, l_1 l_2}$.
We can write $\alpha_{h, l_1 l_2}$ in the following form:
$$
\alpha_{h, l_1 l_2}
= \text{Softmax}_{l_2} \left( \color{red}{\boxed{???}} \right) .
$$
What is the formula in the above red box (reasoning is not required)?