Motivation: I could not find an easy-to-understand post that explained exactly how Monte-Carlo Control works and paired that explanation with a working implementation for a specific environment.

Introduction

This post is based on the Easy21 assignment that David Silver handed out as a lab in this lecture. Here is the exact PDF if you're interested in taking a look:

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/f6978c26-432a-4bb5-9969-3ff7ea6c12ec/easy21.pdf

The assignment has a few sections, but this post focuses only on Monte-Carlo Control in Easy21. We will implement the Easy21 game environment and then learn an optimal policy for it using Monte-Carlo Control.

What is Monte-Carlo Control?

Main Idea

<TBW>

Why action-values over state values?

<TBW>

Implementation

Easy21 Environment
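
As a starting point, here is a minimal sketch of what an Easy21 environment could look like, following the rules in the assignment PDF: cards are drawn uniformly from 1 to 10, black (added) with probability 2/3 and red (subtracted) with probability 1/3; both sides start with one black card; a sum below 1 or above 21 is a bust; the dealer sticks on 17 or more. The names `draw_card`, `is_bust`, `reset`, and `step`, the `HIT`/`STICK` constants, and the extra `done` flag in the return value are my own assumptions for illustration, not something the assignment prescribes.

```python
import random

HIT, STICK = 0, 1  # hypothetical action encoding


def draw_card(rng, force_black=False):
    """Return a signed card value: black cards add, red cards subtract."""
    value = rng.randint(1, 10)
    if force_black or rng.random() < 2 / 3:
        return value   # black card
    return -value      # red card


def is_bust(total):
    """A sum below 1 or above 21 loses immediately."""
    return total < 1 or total > 21


def reset(rng):
    """Start state: (dealer's first card, player's sum), both black cards."""
    return draw_card(rng, force_black=True), draw_card(rng, force_black=True)


def step(state, action, rng):
    """Apply one player action and return (next_state, reward, done)."""
    dealer, player = state
    if action == HIT:
        player += draw_card(rng)
        if is_bust(player):
            return (dealer, player), -1, True
        return (dealer, player), 0, False
    # STICK: the dealer takes its turn, hitting until it reaches 17+ or busts.
    while not is_bust(dealer) and dealer < 17:
        dealer += draw_card(rng)
    if is_bust(dealer) or player > dealer:
        reward = 1
    elif player < dealer:
        reward = -1
    else:
        reward = 0
    return (dealer, player), reward, True
```

Passing in a `random.Random` instance (for example `rng = random.Random(0)`) keeps episodes reproducible, which helps when debugging the learning code later. An episode is then `state = reset(rng)` followed by repeated calls to `step` until `done` is true.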