In this part, \mathbf{w} refers to the one defined in Part 3.
Let \mathbf{w}_t be the normal vector of the separating hyperplane after the $t$th update.
Let j_t be the index of the data point used in the $t$th update.
First, we compute < \mathbf{w}_t , \mathbf{w} >.
We have
\begin{align*}
< \mathbf{w}_t , \mathbf{w} >
& = < \mathbf{w}_{t-1} + y_{j_t} \tilde \phi \left( \mathbf{x}_{j_t} \right) , \mathbf{w} > \\
& = < \mathbf{w}_{t-1} , \mathbf{w} >
+ < y_{j_t} \tilde \phi \left( \mathbf{x}_{j_t} \right) , \mathbf{w} > \\
& = < \mathbf{w}_{t-1} , \mathbf{w} >
+ y_{j_t} \tilde \phi \left( \mathbf{x}_{j_t} \right)^\top \mathbf{w} \\
& = < \mathbf{w}_{t-1} , \mathbf{w} >
+ 1 ,
\end{align*}
where the last equality follows from the analysis in Part 3.
Because \mathbf{w}_0 = 0, we have
< \mathbf{w}_t , \mathbf{w} \> = t .
Second, we establish an upper bound of \left|\left| \mathbf{w}_t \right|\right|_2^2.
We have
\begin{align*}
\left|\left| \mathbf{w}_t \right|\right|_2^2
& = < \mathbf{w}_t , \mathbf{w}_t > \\
& = < \mathbf{w}_{t-1} + y_{j_t} \tilde \phi \left( \mathbf{x}_{j_t} \right)
, \mathbf{w}_{t-1} + y_{j_t} \tilde \phi \left( \mathbf{x}_{j_t} \right) > \\
& = < \mathbf{w}_{t-1} , \mathbf{w}_{t-1} >
+ < y_{j_t} \tilde \phi \left( \mathbf{x}_{j_t} \right)
, y_{j_t} \tilde \phi \left( \mathbf{x}_{j_t} \right) >
+ 2 < \mathbf{w}_{t-1} , y_{j_t} \tilde \phi \left( \mathbf{x}_{j_t} \right) > \\
& = \left|\left| \mathbf{w}_{t-1} \right|\right|_2^2
+ y_{j_t}^2 \tilde \kappa \left( \mathbf{x}_{j_t} , \mathbf{x}_{j_t} \right)
+ 2 < \mathbf{w}_{t-1} , y_{j_t} \tilde \phi \left( \mathbf{x}_{j_t} \right) > \\
& \leq \left|\left| \mathbf{w}_{t-1} \right|\right|_2^2
+ y_{j_t}^2 \tilde \kappa \left( \mathbf{x}_{j_t} , \mathbf{x}_{j_t} \right) \\
& = \left|\left| \mathbf{w}_{t-1} \right|\right|_2^2
+ \tilde \kappa \left( \mathbf{x}_{j_t} , \mathbf{x}_{j_t} \right) \\
& = \left|\left| \mathbf{w}_{t-1} \right|\right|_2^2
+ 1 + a ,
\end{align*}
where the inequality follows from the Perceptron algorithm update criterion,
the second from the last equality follows from the property that y_{j_t} \in \left\{ -1 , 1 \right\},
the last equality follows from the property that \tilde \kappa \left( \mathbf{x}_{j_t} , \mathbf{x}_{j_t} \right) = \kappa \left( \mathbf{x}_{j_t} , \mathbf{x}_{j_t} \right) + a and \kappa is a normalized kernel that \kappa \left( \mathbf{x}_{j_t} , \mathbf{x}_{j_t} \right) = 1.
Because \mathbf{w}_0 = 0, we have
\mathbf{w}_t \leq t \left( 1 + a \right) .
Third, we establish an upper bound of \left|\left| \mathbf{w} \right|\right|_2^2.
We have
\begin{align*}
\left|\left| \mathbf{w} \right|\right|_2^2
& = \mathbf{w}^\top \mathbf{w} \\
& = \left( \sum_{i=1}^m \tilde \alpha_i \tilde \phi \left( \mathbf{x}_i \right)^\top \right)
\left( \sum_{j=1}^m \tilde \alpha_j \tilde \phi \left( \mathbf{x}_j \right) \right) \\
& = \sum_{i=1}^m \sum_{j=1}^m
\tilde \alpha_i
\tilde \kappa \left( \mathbf{x}_i , \mathbf{x}_j \right)
\tilde \alpha_j \\
& = \tilde \alpha^T \tilde K \tilde \alpha \\
& = \left( \tilde K^{-1} \mathbf{y} \right)^T \mathbf{y} \\
& = \mathbf{y}^\top \tilde K^{-1, \top} \mathbf{y} \\
& = \mathbf{y}^\top \tilde K^{-1} \mathbf{y} \\
& = \left|\left| \mathbf{y}^\top \tilde K^{-1} \mathbf{y} \right|\right|_2 \\
& \leq \left|\left| \mathbf{y}^\top \right|\right|_2
\left|\left| \tilde K^{-1} \right|\right|_2
\left|\left| \mathbf{y} \right|\right|_2 \\
& \leq m^{1/2} \left|\left| \tilde K^{-1} \right|\right|_2 m^{1/2} \\
& = m \left|\left| \tilde K^{-1} \right|\right|_2 \\
& = \frac{m}{\lambda_{\min} \left( \tilde K \right)} \\
& \leq \frac{m}{a} ,
\end{align*}
where the seventh equality follows from the property that \tilde K is a symmetric matrix, the second inequality follows from the property that \mathbf{y} \in \left\{ -1 , 1 \right\}^m,
the last inequality follows from the property that the smallest eigenvalue of \tilde K satisfies \lambda_{\min} \left( \tilde K \right) = \lambda_{\min} \left( K \right) + a \geq a.
Following from the Cauchy-Schwarz inequality, we have
\left( < \mathbf{w}_t , \mathbf{w} > \right)^2
\leq \left|\left| \mathbf{w}_t \right|\right|_2^2
\left|\left| \mathbf{w} \right|\right|_2^2 .
Plugging all above three results, we get
t^2 \leq t \left( 1 + a \right) \cdot \frac{m}{a} .
Therefore, the number of updates on \mathbf{w}_t is upper bounded by
t \leq \frac{m \left( 1 + a \right)}{a} .