2024 IAIO Problem 6.2

Derive the Bellman equation for state s_1 assuming the agent uses a policy \pi that always chooses the action “attempt right” in s_1 and “attempt down” in all other states.

Let \pi denote the given policy. Writing out the Bellman equation for s_1 under \pi, we have

\begin{align*}
V_\pi \left( s_1 \right) & = R \left( s_1, \text{attempt right} \right) + \gamma \left( 0.7\, V_\pi \left( s_2 \right) + 0.1\, V_\pi \left( s_1 \right) + 0.1\, V_\pi \left( s_2 \right) + 0.1\, V_\pi \left( s_4 \right) \right) \\
& = R \left( s_1, \text{attempt right} \right) + \gamma \left( 0.8\, V_\pi \left( s_2 \right) + 0.1\, V_\pi \left( s_1 \right) + 0.1\, V_\pi \left( s_4 \right) \right) \\
& = 1 + 0.9 \left( 0.8\, V_\pi \left( s_2 \right) + 0.1\, V_\pi \left( s_1 \right) + 0.1\, V_\pi \left( s_4 \right) \right),
\end{align*}

where the second line combines the two V_\pi(s_2) terms, and the last line substitutes R(s_1, \text{attempt right}) = 1 and \gamma = 0.9.
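Since the equation is linear in V_\pi(s_1), it can be solved for V_\pi(s_1) once V_\pi(s_2) and V_\pi(s_4) are known. The short sketch below illustrates this; the values assigned to V_\pi(s_2) and V_\pi(s_4) are hypothetical placeholders, not part of the problem.

```python
# Sketch: solving the derived Bellman equation for V_pi(s_1).
# gamma and the reward follow the derivation above; v_s2 and v_s4
# are HYPOTHETICAL placeholder values chosen only for illustration.
gamma = 0.9   # discount factor
r = 1.0       # R(s_1, attempt right)
v_s2 = 10.0   # hypothetical value of V_pi(s_2)
v_s4 = 5.0    # hypothetical value of V_pi(s_4)

# V(s_1) = r + gamma * (0.8*V(s_2) + 0.1*V(s_1) + 0.1*V(s_4))
# Rearranging: V(s_1) * (1 - 0.1*gamma) = r + gamma * (0.8*V(s_2) + 0.1*V(s_4))
v_s1 = (r + gamma * (0.8 * v_s2 + 0.1 * v_s4)) / (1 - 0.1 * gamma)

# Check the fixed point: v_s1 satisfies the original equation.
residual = v_s1 - (r + gamma * (0.8 * v_s2 + 0.1 * v_s1 + 0.1 * v_s4))
print(round(v_s1, 4), abs(residual) < 1e-9)
```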