USAAIO
March 28, 2025, 5:19am
1
Part 2 (10 points, non-coding task)
Define \nabla_{\mathbf{z}} \ f\left( \mathbf{z} \right) to be the gradient of function f with respect to vector/matrix \mathbf{z} .
Compute the following gradients. Reasoning is required.
\nabla_{\mathbf{x}} \ y .
The final answer should be in a matrix form.
\nabla_{\mathbf{W}} \ y .
The final answer should be in an element-wise form.
\nabla_{\mathbf{b}} \ y .
The final answer should be in a matrix form.
USAAIO
March 28, 2025, 5:20am
2
\color{green}{\text{### WRITE YOUR SOLUTION HERE ###}}
Since \mathbf{y} \in \Bbb R^M and \mathbf{x} \in \Bbb R^N , \nabla_{\mathbf{x}} \ \mathbf{y} \in \Bbb R^{M \times N} .
We have
\begin{align*}
\frac{\partial y_m}{\partial x_n}
& = \frac{\partial \left( \sum_{i=0}^{N-1} w_{mi} x_i + b_m \right)}{\partial x_n} \\
& = w_{mn} .
\end{align*}
Therefore,
\boxed{\nabla_{\mathbf{x}} \ \mathbf{y} = \mathbf{W} }.
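As a sanity check on the result above, here is a minimal finite-difference sketch in plain Python. It assumes the layer is y_m = \sum_i w_{mi} x_i + b_m, as in the derivation. The helper names (`affine`, `jacobian_wrt_x`) and the numeric values are made up for illustration and are not part of the problem.

```python
# Finite-difference check that the Jacobian of y = W x + b with respect to x is W.
# Sizes and values are arbitrary illustration choices, not from the problem.

def affine(W, x, b):
    """Return y with y[m] = sum_i W[m][i] * x[i] + b[m]."""
    return [sum(w * xi for w, xi in zip(row, x)) + bm for row, bm in zip(W, b)]


def jacobian_wrt_x(W, x, b, eps=1e-6):
    """Central-difference estimate of J[m][n] = d y_m / d x_n."""
    M, N = len(W), len(x)
    J = [[0.0] * N for _ in range(M)]
    for n in range(N):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[n] += eps
        x_minus[n] -= eps
        y_plus = affine(W, x_plus, b)
        y_minus = affine(W, x_minus, b)
        for m in range(M):
            J[m][n] = (y_plus[m] - y_minus[m]) / (2 * eps)
    return J


W = [[1.0, -2.0, 0.5],
     [3.0, 4.0, -1.5]]  # M = 2, N = 3
x = [0.3, -0.7, 2.0]
b = [0.1, -0.2]

J = jacobian_wrt_x(W, x, b)
# Largest entrywise gap between the numerical Jacobian and W.
max_err = max(abs(J[m][n] - W[m][n]) for m in range(2) for n in range(3))
print("max |J - W| =", max_err)
```

Because y is linear in x, the central difference should match W up to floating-point rounding, regardless of the chosen x and b.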
Since \mathbf{y} \in \Bbb R^M and \mathbf{W} \in \Bbb R^{M \times N} , \nabla_{\mathbf{W}} \ \mathbf{y} \in \Bbb R^{M \times M \times N} .
We have
\begin{align*}
\frac{\partial y_m}{\partial w_{kn}}
& = \frac{\partial \left( \sum_{i=0}^{N-1} w_{mi} x_i + b_m \right)}{\partial w_{kn}} \\
& = \boxed{x_n \delta_{mk}}.
\end{align*}
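The element-wise formula can be checked the same way. The sketch below perturbs one entry w_{kn} at a time and compares the result against x_n \delta_{mk}. It again assumes y_m = \sum_i w_{mi} x_i + b_m; the names `affine` and `dy_dW` and the test values are hypothetical.

```python
# Finite-difference check of d y_m / d w_kn = x_n * delta_mk for y = W x + b.
# Sizes and values are arbitrary illustration choices, not from the problem.

def affine(W, x, b):
    """Return y with y[m] = sum_i W[m][i] * x[i] + b[m]."""
    return [sum(w * xi for w, xi in zip(row, x)) + bm for row, bm in zip(W, b)]


def dy_dW(W, x, b, eps=1e-6):
    """Central-difference estimate of T[m][k][n] = d y_m / d w_kn."""
    M, N = len(W), len(x)
    T = [[[0.0] * N for _ in range(M)] for _ in range(M)]
    for k in range(M):
        for n in range(N):
            W_plus = [row[:] for row in W]
            W_minus = [row[:] for row in W]
            W_plus[k][n] += eps
            W_minus[k][n] -= eps
            y_plus = affine(W_plus, x, b)
            y_minus = affine(W_minus, x, b)
            for m in range(M):
                T[m][k][n] = (y_plus[m] - y_minus[m]) / (2 * eps)
    return T


W = [[1.0, -2.0, 0.5],
     [3.0, 4.0, -1.5]]  # M = 2, N = 3
x = [0.3, -0.7, 2.0]
b = [0.1, -0.2]
M, N = 2, 3

T = dy_dW(W, x, b)
# Closed form from the derivation: x_n when m == k, zero otherwise.
expected = [[[x[n] if m == k else 0.0 for n in range(N)]
             for k in range(M)] for m in range(M)]
max_err = max(abs(T[m][k][n] - expected[m][k][n])
              for m in range(M) for k in range(M) for n in range(N))
print("max error =", max_err)
```

The array `T` has shape M × M × N, matching the shape stated for \nabla_{\mathbf{W}} \ \mathbf{y}; only the slices with m = k should be nonzero.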
Since \mathbf{y} \in \Bbb R^M and \mathbf{b} \in \Bbb R^M , \nabla_{\mathbf{b}} \ \mathbf{y} \in \Bbb R^{M \times M} .
We have
\begin{align*}
\frac{\partial y_m}{\partial b_k}
& = \frac{\partial \left( \sum_{i=0}^{N-1} w_{mi} x_i + b_m \right)}{\partial b_k} \\
& = \delta_{mk} .
\end{align*}
Therefore,
\boxed{\nabla_{\mathbf{b}} \ \mathbf{y} = \mathbf{I}_{M \times M}} .
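A short numerical check of the bias gradient, under the same assumption y_m = \sum_i w_{mi} x_i + b_m. The helper names `affine` and `jacobian_wrt_b` and the values are illustrative only.

```python
# Finite-difference check that the Jacobian of y = W x + b with respect to b
# is the M x M identity. Values are arbitrary illustration choices.

def affine(W, x, b):
    """Return y with y[m] = sum_i W[m][i] * x[i] + b[m]."""
    return [sum(w * xi for w, xi in zip(row, x)) + bm for row, bm in zip(W, b)]


def jacobian_wrt_b(W, x, b, eps=1e-6):
    """Central-difference estimate of J[m][k] = d y_m / d b_k."""
    M = len(b)
    J = [[0.0] * M for _ in range(M)]
    for k in range(M):
        b_plus = list(b)
        b_minus = list(b)
        b_plus[k] += eps
        b_minus[k] -= eps
        y_plus = affine(W, x, b_plus)
        y_minus = affine(W, x, b_minus)
        for m in range(M):
            J[m][k] = (y_plus[m] - y_minus[m]) / (2 * eps)
    return J


W = [[1.0, -2.0, 0.5],
     [3.0, 4.0, -1.5]]  # M = 2, N = 3
x = [0.3, -0.7, 2.0]
b = [0.1, -0.2]
M = 2

J = jacobian_wrt_b(W, x, b)
identity = [[1.0 if m == k else 0.0 for k in range(M)] for m in range(M)]
max_err = max(abs(J[m][k] - identity[m][k]) for m in range(M) for k in range(M))
print("max |J - I| =", max_err)
```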
\color{red}{\text{""" END OF THIS PART """}}
For 1 and 3, we can take partial derivatives just as we would if the quantities were scalars rather than vectors/matrices.
\boxed{\nabla_x y = W}
We can treat b as Ib , where I is the M\times M identity matrix, and we get \boxed{\nabla_b y = I} .
For 2, \frac{\partial y_i}{\partial W_{jk}} can be nonzero only when the entry W_{jk} lies in row i of W , that is, when j = i , because y_i depends only on that row. So we have \boxed{\frac{\partial y_i}{\partial W_{jk}} = x_{k}\delta_{ij}} .