Policy Iteration

Motivation: I could not find an easy-to-understand post that described exactly how policy iteration worked along with a functional implementation.

What is Policy Iteration?
- Concept
- What is Policy Evaluation?
  - Concept
  - Required GridWorld Environment Code
  - Implementation
- What is Policy Improvement?
  - Concept
  - Implementation
- Putting it all together
- Full Implementation of Policy Iteration
  - LineWorld
  - Deterministic GridWorld
  - Non-deterministic GridWorld
References

What is Policy Iteration?

Concept

Policy Iteration is an algorithm for finding an optimal policy for a given environment. A policy in this context is a 1-D vector of actions, where the action at index i is the action to take in the state at index i. For example, imagine you're given a 5x5 grid. Each square is a state, so we can represent the state space as a 1-D vector S with 25 entries, one per square. For each state in S, there is an optimal action A to take. Policy iteration computes the optimal policy vector, in which each action A is the best action to take in its corresponding state.
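To make the "policy as a 1-D vector" idea concrete, here is a minimal sketch. The action encoding, the `to_index` helper, and the "always move right" starting policy are assumptions made for illustration; they are not taken from the GridWorld code used later in this post.

```python
import numpy as np

# Hypothetical 5x5 grid: 25 states, flattened row by row into a 1-D vector.
N_ROWS, N_COLS = 5, 5
states = np.arange(N_ROWS * N_COLS)  # state indices 0..24

# Hypothetical action encoding; the names and numbering are assumptions.
ACTIONS = {0: "up", 1: "right", 2: "down", 3: "left"}

# A policy stores one action per state. Start with "always move right".
policy = np.full(states.shape, 1, dtype=int)

def to_index(row, col):
    """Map a (row, col) grid square to its index in the flattened state vector."""
    return row * N_COLS + col

s = to_index(2, 3)
print(s, ACTIONS[policy[s]])  # 13 right
```

This starting policy is not optimal. Policy iteration would overwrite the entries of `policy` until each one holds the best action for its state.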

Jargon:

- The optimal policy for a given environment is also known as pi star (π*).
- S is the set of all states within the environment.
- P is the transition probability matrix. You can think of each element as answering the question "What is the probability of going from state s to state s' if I take action a?".
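One common way to store P is a 3-D array indexed by state, action, and next state. The sketch below assumes that layout and a made-up 3-state, 2-action line environment; the GridWorld code in this post may organise its transitions differently.

```python
import numpy as np

n_states, n_actions = 3, 2  # hypothetical tiny environment
# P[s, a, s_next] = probability of landing in s_next after taking action a in state s
P = np.zeros((n_states, n_actions, n_states))

# Action 0 is "left" and action 1 is "right" on a 3-state line (assumed encoding).
P[0, 0, 0] = 1.0  # moving left from state 0 hits the wall and stays put
P[0, 1, 1] = 1.0
P[1, 0, 0] = 1.0
P[1, 1, 2] = 1.0
P[2, 0, 1] = 1.0
P[2, 1, 2] = 1.0  # moving right from state 2 hits the wall and stays put

# Make one move non-deterministic: "right" from state 1 slips and stays 20% of the time.
P[1, 1] = [0.0, 0.2, 0.8]

# Every (state, action) row must be a probability distribution over next states.
assert np.allclose(P.sum(axis=2), 1.0)
print(P[1, 1, 2])  # 0.8
```

In a deterministic environment every row contains a single 1.0. In a non-deterministic one the probability mass is spread over several next states.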

Sanity Checks:

- The policy vector (also known as pi) has the same length as the state vector S.
- pi[x] is the action to take in state S[x], where x is an index into the state space.

What is Policy Evaluation?

Concept

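Policy Evaluation is a process that computes the value function for a given policy. The policy does not need to be optimal: evaluation tells you how good the current policy is, meaning the expected return from each state if you keep following that policy.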
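As a rough sketch of the idea, the snippet below runs iterative policy evaluation on a made-up 3-state line rather than on GridWorld. The function name `evaluate_policy`, the reward array `R`, the discount factor `gamma`, and the tolerance `tol` are all assumptions for illustration and are not part of the implementation given later.

```python
import numpy as np

def evaluate_policy(P, R, policy, gamma=0.9, tol=1e-8):
    """Iteratively estimate the value of each state under a fixed policy.

    P[s, a, s_next] holds transition probabilities and R[s, a] the expected
    immediate reward (both layouts are assumptions of this sketch).
    """
    n_states = P.shape[0]
    idx = np.arange(n_states)
    V = np.zeros(n_states)
    while True:
        # Bellman expectation backup, using only the action the policy picks.
        V_new = R[idx, policy] + gamma * P[idx, policy] @ V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Hypothetical line: states 0, 1, 2; action 0 = left, 1 = right; state 2 is absorbing.
P = np.zeros((3, 2, 3))
P[0, 0, 0] = P[0, 1, 1] = 1.0
P[1, 0, 0] = P[1, 1, 2] = 1.0
P[2, 0, 2] = P[2, 1, 2] = 1.0

R = np.zeros((3, 2))
R[1, 1] = 1.0  # reward for stepping from state 1 into the goal state 2

policy = np.array([1, 1, 1])  # "always move right"
V = evaluate_policy(P, R, policy)
print(V.tolist())  # approximately [0.9, 1.0, 0.0]
```

State 1 is one step from the reward, so its value is 1. State 0 is two steps away, so its value is discounted once by `gamma`.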

To see policy evaluation in action for a 5x5 GridWorld environment, copy the following code to a Python file and run it. Make sure you have installed the requirements beforehand and have included the GridWorld environment code.

Required GridWorld Environment Code

Store this code in the same file as your policy iteration code. If you store it in a separate file, make sure you can import it from your policy iteration file.
