Part 5 (10 points, coding task)
In this part, you are asked to build your own multi-head attention module that subclasses nn.Module.
- For simplicity, we ignore any masking. That is, each position in the attending sequence attends to all positions in the being-attended sequence.
- You do not need to worry about whether your code is efficient during autoregressive token generation when your module is used for inference in a GPT-like task. That is, if we use your code in a GPT-like task to autoregressively generate tokens, it is totally fine to repeatedly recompute the key and value at a given position rather than, more efficiently, storing them in a cache.
- The class name is MyMHA.
- Attributes:
  - D_1: Dimension of a hidden state/token in the attending sequence.
  - D_2: Dimension of a hidden state/token in the being-attended sequence.
  - D_v: Dimension of a value vector.
  - D_qk: Dimension of a query/key vector.
  - H: Number of heads.
  - W_Q: A linear module whose weight is the query-projection matrix. The shape should be consistent with your answer in Part 2. No bias.
  - W_K: A linear module whose weight is the key-projection matrix. The shape should be consistent with your answer in Part 2. No bias.
  - W_V: A linear module whose weight is the value-projection matrix. The shape should be consistent with your answer in Part 2. No bias.
  - W_O: A linear module whose weight is the out-projection matrix. The shape should be consistent with your answer in Part 4. No bias.
- Method __init__:
  - Inputs:
    - D_1
    - D_2
    - D_qk
    - D_v
    - H
  - Outputs: None
  - What to do inside this method: initialize the attribute values.
- Method forward:
  - Inputs:
    - An attending sequence (tensor) with shape (B, L_1, D_1)
    - A being-attended sequence (tensor) with shape (B, L_2, D_2)
  - Outputs: post-out-projection outputs with shape (B, L_1, D_1)
  - What to do inside this method (an illustrative sketch follows this list):
    - Compute the outputs.
    - After each operation, add a comment on the tensor shape.
    - Do not use any loop.
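A minimal sketch of one possible implementation is below, assuming the projections have the standard cross-attention shapes: W_Q maps D_1 → H·D_qk, W_K maps D_2 → H·D_qk, W_V maps D_2 → H·D_v, and W_O maps H·D_v → D_1. Parts 2 and 4 are not reproduced here, so these shapes are assumptions; your own answers to those parts take precedence.

```python
# Illustrative sketch only: the projection shapes are assumed from a
# standard cross-attention setup, not taken from the Part 2 / Part 4 answers.
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class MyMHA(nn.Module):
    def __init__(self, D_1, D_2, D_qk, D_v, H):
        super().__init__()
        self.D_1, self.D_2, self.D_qk, self.D_v, self.H = D_1, D_2, D_qk, D_v, H
        # Queries come from the attending sequence (dim D_1);
        # keys/values come from the being-attended sequence (dim D_2).
        self.W_Q = nn.Linear(D_1, H * D_qk, bias=False)
        self.W_K = nn.Linear(D_2, H * D_qk, bias=False)
        self.W_V = nn.Linear(D_2, H * D_v, bias=False)
        self.W_O = nn.Linear(H * D_v, D_1, bias=False)

    def forward(self, x1, x2):
        B, L_1, _ = x1.shape                                      # x1: (B, L_1, D_1)
        L_2 = x2.shape[1]                                         # x2: (B, L_2, D_2)
        Q = self.W_Q(x1).view(B, L_1, self.H, self.D_qk)          # (B, L_1, H, D_qk)
        K = self.W_K(x2).view(B, L_2, self.H, self.D_qk)          # (B, L_2, H, D_qk)
        V = self.W_V(x2).view(B, L_2, self.H, self.D_v)           # (B, L_2, H, D_v)
        Q = Q.transpose(1, 2)                                     # (B, H, L_1, D_qk)
        K = K.transpose(1, 2)                                     # (B, H, L_2, D_qk)
        V = V.transpose(1, 2)                                     # (B, H, L_2, D_v)
        scores = Q @ K.transpose(-2, -1) / math.sqrt(self.D_qk)   # (B, H, L_1, L_2)
        attn = F.softmax(scores, dim=-1)                          # (B, H, L_1, L_2)
        out = attn @ V                                            # (B, H, L_1, D_v)
        out = out.transpose(1, 2).reshape(B, L_1, self.H * self.D_v)  # (B, L_1, H*D_v)
        return self.W_O(out)                                      # (B, L_1, D_1)
```

Note that nn.Linear(in_features, out_features) stores its weight with shape (out_features, in_features), so under the assumed shapes self.W_Q.weight is (H·D_qk, D_1); if your Part 2 answer orients the matrices differently, adjust accordingly. As a quick sanity check, MyMHA(D_1=16, D_2=12, D_qk=8, D_v=8, H=4)(torch.randn(2, 5, 16), torch.randn(2, 7, 12)) should return a tensor of shape (2, 5, 16).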