Derive the Bellman equation for state s_1 assuming the agent uses a policy \pi that always chooses the action “attempt right” in s_1 and “attempt down” in all other states.
Denote this given policy by \pi.
We have
\begin{align*}
V_\pi \left( s_1 \right)
& = R \left( s_1, \text{attempt right} \right)
+ \gamma
\left( 0.7 V_\pi \left( s_2 \right)
+ 0.1 V_\pi \left( s_1 \right)
+ 0.1 V_\pi \left( s_2 \right)
+ 0.1 V_\pi \left( s_4 \right)
\right) \\
& = R \left( s_1, \text{attempt right} \right)
+ \gamma
\left( 0.8 V_\pi \left( s_2 \right)
+ 0.1 V_\pi \left( s_1 \right)
+ 0.1 V_\pi \left( s_4 \right)
\right) \\
& = 1 + 0.9
\left( 0.8 V_\pi \left( s_2 \right)
+ 0.1 V_\pi \left( s_1 \right)
+ 0.1 V_\pi \left( s_4 \right)
\right) .
\end{align*}
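The final line can be solved for $V_\pi(s_1)$ in closed form, since $V_\pi(s_1)$ appears on both sides. The sketch below does this numerically, assuming $R(s_1, \text{attempt right}) = 1$ and $\gamma = 0.9$ as in the derivation; the values passed in for $V_\pi(s_2)$ and $V_\pi(s_4)$ are hypothetical placeholders, since the source does not fix them here.

```python
def v_s1(v_s2: float, v_s4: float, r: float = 1.0, gamma: float = 0.9) -> float:
    """Solve V(s_1) = r + gamma * (0.8*V(s_2) + 0.1*V(s_1) + 0.1*V(s_4))
    for V(s_1) by moving the 0.1*gamma*V(s_1) term to the left-hand side."""
    return (r + gamma * (0.8 * v_s2 + 0.1 * v_s4)) / (1.0 - 0.1 * gamma)

# Example with hypothetical downstream values V(s_2) = 10, V(s_4) = 5.
v = v_s1(10.0, 5.0)

# Sanity check: the result satisfies the original Bellman equation.
assert abs(v - (1.0 + 0.9 * (0.8 * 10.0 + 0.1 * v + 0.1 * 5.0))) < 1e-9
```

Rearranging symbolically gives the same thing: $V_\pi(s_1) = \frac{1 + 0.72\, V_\pi(s_2) + 0.09\, V_\pi(s_4)}{0.91}$.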