2024 IAIO Problem 6.2

Derive the Bellman equation for state s_1 assuming the agent uses a policy \pi that always chooses the action “attempt right” in s_1 and “attempt down” in all other states.

Let \pi denote the given policy. Writing out the Bellman equation for s_1 under \pi, we have

\begin{align*}
V_\pi \left( s_1 \right) & = R \left( s_1, \text{attempt right} \right) + \gamma \left( 0.7\, V_\pi \left( s_2 \right) + 0.1\, V_\pi \left( s_1 \right) + 0.1\, V_\pi \left( s_2 \right) + 0.1\, V_\pi \left( s_4 \right) \right) \\
& = R \left( s_1, \text{attempt right} \right) + \gamma \left( 0.8\, V_\pi \left( s_2 \right) + 0.1\, V_\pi \left( s_1 \right) + 0.1\, V_\pi \left( s_4 \right) \right) \\
& = 1 + 0.9 \left( 0.8\, V_\pi \left( s_2 \right) + 0.1\, V_\pi \left( s_1 \right) + 0.1\, V_\pi \left( s_4 \right) \right),
\end{align*}

where the second line combines the two V_\pi(s_2) terms, and the last line substitutes R(s_1, \text{attempt right}) = 1 and \gamma = 0.9.
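Since the equation is linear in V_\pi(s_1), it can be solved for V_\pi(s_1) once V_\pi(s_2) and V_\pi(s_4) are known. The short sketch below illustrates this; the values assigned to V_\pi(s_2) and V_\pi(s_4) are hypothetical placeholders, not part of the problem.

```python
# Sketch: solving the derived Bellman equation for V_pi(s_1).
# gamma and the reward follow the derivation above; v_s2 and v_s4
# are HYPOTHETICAL placeholder values chosen only for illustration.
gamma = 0.9   # discount factor
r = 1.0       # R(s_1, attempt right)
v_s2 = 10.0   # hypothetical value of V_pi(s_2)
v_s4 = 5.0    # hypothetical value of V_pi(s_4)

# V(s_1) = r + gamma * (0.8*V(s_2) + 0.1*V(s_1) + 0.1*V(s_4))
# Rearranging: V(s_1) * (1 - 0.1*gamma) = r + gamma * (0.8*V(s_2) + 0.1*V(s_4))
v_s1 = (r + gamma * (0.8 * v_s2 + 0.1 * v_s4)) / (1 - 0.1 * gamma)

# Check the fixed point: v_s1 satisfies the original equation.
residual = v_s1 - (r + gamma * (0.8 * v_s2 + 0.1 * v_s1 + 0.1 * v_s4))
print(round(v_s1, 4), abs(residual) < 1e-9)
```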