2024 IAIO Problem 6.1

This question concerns Markov Decision Processes (MDPs) and the Bellman equations in reinforcement learning. Assume an agent is navigating a grid world with the following characteristics:

  • The grid is a 3x3 matrix, with each cell representing a state s \in \{s_1, s_2, \dots, s_9\}. States are numbered row by row, so s_1 is the top-left cell, s_2 is immediately to its right, and s_4 is immediately below s_1.

  • The agent can move up, down, left, or right. If the agent tries to move outside the grid, it stays in the same position.

  • The rewards are given as follows:

    • R(s_1, \text{attempts right}) = 1
    • R(s_2, \text{attempts down}) = 2
    • All other rewards are 0.
  • The transition probabilities are stochastic; the 0.1 slip directions below are absolute grid directions, regardless of which action was attempted:

    • With probability 0.7, the agent actually moves in the chosen direction.
    • With probability 0.1, the agent actually moves left.
    • With probability 0.1, the agent actually moves right.
    • With probability 0.1, the agent actually moves down.

The agent starts at s_1. Assume a discount factor of \gamma = 0.9.
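For concreteness, the dynamics above can be encoded in a short Python sketch. The helper names (next_state, transition_probs, reward) are illustrative choices made here, not part of the problem statement; the state numbering follows the row-by-row layout noted above.

```python
# Minimal sketch of the 3x3 grid world described above.
# Assumption: states are numbered 1..9 row by row, with s_1 in the top-left corner.
GAMMA = 0.9

def next_state(s, direction):
    """Deterministic move from state s (1..9); moves off the grid leave the agent in place."""
    row, col = divmod(s - 1, 3)
    if direction == "up":
        row = max(row - 1, 0)
    elif direction == "down":
        row = min(row + 1, 2)
    elif direction == "left":
        col = max(col - 1, 0)
    elif direction == "right":
        col = min(col + 1, 2)
    return 3 * row + col + 1

def transition_probs(s, attempted):
    """Successor distribution: 0.7 for the attempted direction, 0.1 each for left, right, down."""
    probs = {}
    for direction, p in [(attempted, 0.7), ("left", 0.1), ("right", 0.1), ("down", 0.1)]:
        s_next = next_state(s, direction)
        probs[s_next] = probs.get(s_next, 0.0) + p
    return probs

def reward(s, attempted):
    """Rewards as specified: R(s_1, attempts right) = 1, R(s_2, attempts down) = 2, else 0."""
    if s == 1 and attempted == "right":
        return 1
    if s == 2 and attempted == "down":
        return 2
    return 0
```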

Consider performing Q-learning with the value function V initialised to 0. Calculate the expected state-action value Q(s_1, \text{right}) after attempting to move right in the first iteration.

Note: In the Q function in this problem, the second argument should be understood as the agent’s attempted action, not the realized action.

From s_1, attempting to move right, the agent lands in s_2 with probability 0.7 (the intended move) plus 0.1 (the right slip), stays in s_1 with probability 0.1 (the left slip is blocked by the wall), and moves to s_4 with probability 0.1 (the down slip). We therefore have

\begin{align*}
Q \left( s_1, \text{attempts right} \right) & = R \left( s_1, \text{attempts right} \right) + \gamma \left( 0.7\, V \left( s_2 \right) + 0.1\, V \left( s_1 \right) + 0.1\, V \left( s_2 \right) + 0.1\, V \left( s_4 \right) \right) \\
& = 1 + 0.9 \left( 0.7 \cdot 0 + 0.1 \cdot 0 + 0.1 \cdot 0 + 0.1 \cdot 0 \right) \\
& = 1 .
\end{align*}
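Reusing next_state, transition_probs, reward, and GAMMA from the sketch above, the same number falls out of the one-step backup when every V(s') is initialised to 0:

```python
# One-step backup Q(s, a) = R(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s'),
# with the value function V initialised to 0 (first iteration).
V = {s: 0.0 for s in range(1, 10)}

def q_value(s, attempted):
    return reward(s, attempted) + GAMMA * sum(
        p * V[s_next] for s_next, p in transition_probs(s, attempted).items()
    )

print(q_value(1, "right"))  # 1.0
```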