These are my posts on reinforcement learning (RL), written as I self-study. Each post breaks down a core concept with a mixture of theory and implementation. My motivation is to drastically shorten the learning curve for people who want to learn these concepts. There are many excellent learning resources out there; what sets these posts apart is a focus on fundamental RL concepts, taking each one end to end from concept to functional implementation.
Monte Carlo Control with Blackjack (WIP)
Temporal Difference Learning with Blackjack (WIP)
https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ&index=1