Monte Carlo Control with Blackjack (WIP)

Motivation: I could not find an easy-to-understand post that explains exactly how Monte Carlo Control works and pairs that explanation with a working implementation for a concrete environment.

Table of Contents:

Introduction
What is Monte-Carlo Control?
- Main idea
- Why action values over state values?
Implementation
- Easy21 Environment
- Monte-Carlo Policy Evaluation
- Monte-Carlo Policy Improvement
- Putting it all together
Conclusions
- How is this more practical than Dynamic Programming?
- Where is this used in practice?
- Limitations
References

Introduction

This post is based on the Easy21 assignment that David Silver handed out as a lab in this lecture. Here is the assignment PDF if you would like to take a look:

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/f6978c26-432a-4bb5-9969-3ff7ea6c12ec/easy21.pdf

The assignment has several parts, but this post focuses only on Monte-Carlo Control in Easy21. We will implement the Easy21 game environment and then learn an optimal policy for it using Monte-Carlo Control.

What is Monte-Carlo Control?

Main Idea

Why action values over state values?

Implementation

Easy21 Environment
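
As a starting point, here is a minimal sketch of the environment, assuming the rules as I read them in the assignment PDF: cards are valued 1 to 10 uniformly, a card is black (added) with probability 2/3 and red (subtracted) with probability 1/3, both sides start with one black card, a sum below 1 or above 21 is a bust, and the dealer hits until reaching 17 or more. The names `HIT`, `STICK`, `draw_card`, `is_bust`, `initial_state`, the explicit `rng` argument, and the extra `done` flag returned by `step` are my own choices for illustration, not something the assignment prescribes.

```python
import random

HIT, STICK = 0, 1  # hypothetical action encoding


def draw_card(rng, force_black=False):
    """Return a signed card value: black cards add, red cards subtract."""
    value = rng.randint(1, 10)  # uniform over 1..10, no face cards
    if force_black or rng.random() < 2 / 3:
        return value
    return -value


def is_bust(total):
    """A sum below 1 or above 21 is a bust."""
    return total < 1 or total > 21


def initial_state(rng):
    """Both dealer and player start with one black card."""
    return (draw_card(rng, force_black=True), draw_card(rng, force_black=True))


def step(state, action, rng):
    """Apply one action to (dealer, player); return (next_state, reward, done)."""
    dealer, player = state
    if action == HIT:
        player += draw_card(rng)
        if is_bust(player):
            return (dealer, player), -1, True
        return (dealer, player), 0, False

    # STICK: the dealer plays out their hand, hitting below 17.
    while not is_bust(dealer) and dealer < 17:
        dealer += draw_card(rng)
    if is_bust(dealer) or player > dealer:
        reward = 1
    elif player < dealer:
        reward = -1
    else:
        reward = 0
    return (dealer, player), reward, True
```

A single episode can then be played by calling `initial_state(rng)` once and `step` repeatedly until `done` is true, with `rng = random.Random(seed)` so that runs are reproducible.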