2024 IAIO Problem 6.3

Compute the value function V_\pi for the policy \pi.

The Bellman equations for \pi, with discount factor 0.9, are

\begin{align*} V_\pi \left( s_1 \right) & = 1 + 0.9 \left( 0.8 V_\pi \left( s_2 \right) + 0.1 V_\pi \left( s_1 \right) + 0.1 V_\pi \left( s_4 \right) \right) \\ V_\pi \left( s_2 \right) & = 2 + 0.9 \left( 0.1 V_\pi \left( s_1 \right) + 0.1 V_\pi \left( s_3 \right) + 0.8 V_\pi \left( s_5 \right) \right) \\ V_\pi \left( s_3 \right) & = 0.9 \left( 0.1 V_\pi \left( s_2 \right) + 0.1 V_\pi \left( s_3 \right) + 0.8 V_\pi \left( s_6 \right) \right) \\ V_\pi \left( s_4 \right) & = 0.9 \left( 0.1 V_\pi \left( s_4 \right) + 0.1 V_\pi \left( s_5 \right) + 0.8 V_\pi \left( s_7 \right) \right) \\ V_\pi \left( s_5 \right) & = 0.9 \left( 0.1 V_\pi \left( s_4 \right) + 0.1 V_\pi \left( s_6 \right) + 0.8 V_\pi \left( s_8 \right) \right) \\ V_\pi \left( s_6 \right) & = 0.9 \left( 0.1 V_\pi \left( s_5 \right) + 0.1 V_\pi \left( s_6 \right) + 0.8 V_\pi \left( s_9 \right) \right) \\ V_\pi \left( s_7 \right) & = 0.9 \left( 0.9 V_\pi \left( s_7 \right) + 0.1 V_\pi \left( s_8 \right) \right) \\ V_\pi \left( s_8 \right) & = 0.9 \left( 0.1 V_\pi \left( s_7 \right) + 0.1 V_\pi \left( s_9 \right) + 0.8 V_\pi \left( s_8 \right) \right) \\ V_\pi \left( s_9 \right) & = 0.9 \left( 0.1 V_\pi \left( s_8 \right) + 0.9 V_\pi \left( s_9 \right) \right) \end{align*}

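The equations for states s_4, \cdots , s_9 involve only those six states and contain no reward term, so they form a homogeneous linear system. Because the discount factor 0.9 is less than 1, the only solution of that system is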

V_\pi \left( s_i \right) = 0, \ \forall \ i \in \left\{ 4, 5 , \cdots , 9 \right\} .
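The claim can be checked numerically. The sketch below is an illustration, not part of the original solution. It reads the transition probabilities among s_4, \cdots , s_9 off the equations above and applies the backup v \mapsto 0.9 P v repeatedly. The names P, GAMMA, and bellman_backup are hypothetical. Each row of P sums to 1, so every backup shrinks the largest entry by a factor of 0.9, and any fixed point must therefore be zero.

```python
# Transition probabilities among s_4..s_9 under pi, read off the equations above.
# P[i][j] is the probability of moving from state i to state j (index 0 is s_4).
P = [
    [0.1, 0.1, 0.0, 0.8, 0.0, 0.0],  # s_4 -> s_4, s_5, s_7
    [0.1, 0.0, 0.1, 0.0, 0.8, 0.0],  # s_5 -> s_4, s_6, s_8
    [0.0, 0.1, 0.1, 0.0, 0.0, 0.8],  # s_6 -> s_5, s_6, s_9
    [0.0, 0.0, 0.0, 0.9, 0.1, 0.0],  # s_7 -> s_7, s_8
    [0.0, 0.0, 0.0, 0.1, 0.8, 0.1],  # s_8 -> s_7, s_8, s_9
    [0.0, 0.0, 0.0, 0.0, 0.1, 0.9],  # s_9 -> s_8, s_9
]
GAMMA = 0.9  # discount factor appearing in every equation


def bellman_backup(v):
    """One application of v -> GAMMA * P v (no reward in these states)."""
    return [GAMMA * sum(p * x for p, x in zip(row, v)) for row in P]


# Start from an arbitrary nonzero guess and iterate the backup.
v = [1.0] * 6
for _ in range(500):
    v = bellman_backup(v)

max_abs = max(abs(x) for x in v)
print(max_abs)  # a value indistinguishable from zero
```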

Therefore, we only need to solve the first three equations above. They take the following simplified form:

\begin{align*} V_\pi \left( s_1 \right) & = 1 + 0.9 \left( 0.8 V_\pi \left( s_2 \right) + 0.1 V_\pi \left( s_1 \right) \right) \\ V_\pi \left( s_2 \right) & = 2 + 0.9 \left( 0.1 V_\pi \left( s_1 \right) + 0.1 V_\pi \left( s_3 \right) \right) \\ V_\pi \left( s_3 \right) & = 0.9 \left( 0.1 V_\pi \left( s_2 \right) + 0.1 V_\pi \left( s_3 \right) \right) \end{align*}
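As an intermediate step, expand the products (0.9 \times 0.8 = 0.72 and 0.9 \times 0.1 = 0.09) and collect the unknowns on the left-hand side of each equation:

\begin{align*} 0.91 V_\pi \left( s_1 \right) - 0.72 V_\pi \left( s_2 \right) & = 1 \\ -0.09 V_\pi \left( s_1 \right) + V_\pi \left( s_2 \right) - 0.09 V_\pi \left( s_3 \right) & = 2 \\ -0.09 V_\pi \left( s_2 \right) + 0.91 V_\pi \left( s_3 \right) & = 0 \end{align*}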

To solve this system of equations, we write it in matrix form:

\begin{align*} \begin{bmatrix} 0.91 & -0.72 & 0 \\ -0.09 & 1 & -0.09 \\ 0 & -0.09 & 0.91 \end{bmatrix} \begin{bmatrix} V_\pi \left( s_1 \right) \\ V_\pi \left( s_2 \right) \\ V_\pi \left( s_3 \right) \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} \end{align*}

Solving this system gives

\begin{bmatrix} V_\pi \left( s_1 \right) \\ V_\pi \left( s_2 \right) \\ V_\pi \left( s_3 \right) \end{bmatrix} \approx \begin{bmatrix} 2.904 \\ 2.282 \\ 0.226 \end{bmatrix}
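As a sanity check on the arithmetic, the following sketch solves the 3 \times 3 system exactly using rational arithmetic from Python's standard library. It is one possible way to carry out the solve, not necessarily the method the solution intends. The helper name solve is hypothetical, and the matrix entries are copied from the matrix form above.

```python
from fractions import Fraction as F

# Coefficient matrix and right-hand side of the 3x3 system, as exact fractions.
A = [
    [F(91, 100), F(-72, 100), F(0)],
    [F(-9, 100), F(1), F(-9, 100)],
    [F(0), F(-9, 100), F(91, 100)],
]
b = [F(1), F(2), F(0)]


def solve(A, b):
    """Gauss-Jordan elimination on exact fractions (hypothetical helper)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Swap a row with a nonzero entry in this column into pivot position.
        pivot_row = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot_row] = M[pivot_row], M[col]
        # Scale the pivot row so the pivot entry becomes 1.
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


v = solve(A, b)
for name, value in zip(("s_1", "s_2", "s_3"), v):
    print(f"V_pi({name}) ~ {float(value):.3f}")
# V_pi(s_1) ~ 2.904
# V_pi(s_2) ~ 2.282
# V_pi(s_3) ~ 0.226
```

Substituting the result back into the first equation, 0.91 \times 2.904 - 0.72 \times 2.282 \approx 1, confirms the solve up to rounding.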