Description

Let's teach our AI how to get from point A to point B in a Frozen Lake environment in the most efficient way possible using dynamic programming. This is a reinforcement learning problem, and we'll try two popular techniques: policy iteration and value iteration. We'll use OpenAI's Gym environment and pure Python to do this.
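
To give a feel for the two techniques, here is a minimal sketch in pure Python. It is not the code from the video: it assumes a hand-coded 4x4 map in the style of Frozen Lake instead of loading Gym, it assumes the ice is not slippery (every move is deterministic), and the discount factor of 0.9 and all function names are made up for illustration.

```python
# Hypothetical 4x4 map in the style of Frozen Lake (assumption: hand-coded, not loaded from Gym).
# S = start, F = frozen (safe), H = hole, G = goal.
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
N = len(GRID)
N_STATES = N * N
# Actions as (d_row, d_col): 0 = left, 1 = down, 2 = right, 3 = up.
ACTIONS = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}
GAMMA = 0.9  # discount factor (assumed value)


def step(state, action):
    """Deterministic transition (assumption: no slipping). Returns (next_state, reward)."""
    row, col = divmod(state, N)
    if GRID[row][col] in "HG":  # holes and the goal are absorbing
        return state, 0.0
    d_row, d_col = ACTIONS[action]
    row = min(max(row + d_row, 0), N - 1)  # walls: stay on the grid
    col = min(max(col + d_col, 0), N - 1)
    reward = 1.0 if GRID[row][col] == "G" else 0.0
    return row * N + col, reward


def q_value(values, state, action):
    """One-step lookahead: immediate reward plus discounted value of the next state."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def greedy_policy(values):
    """Pick the best action in every state (ties go to the lowest action id)."""
    return [max(ACTIONS, key=lambda a: q_value(values, s, a)) for s in range(N_STATES)]


def value_iteration(tol=1e-8):
    """Repeatedly apply the Bellman optimality backup until values stop changing."""
    values = [0.0] * N_STATES
    while True:
        new_values = [max(q_value(values, s, a) for a in ACTIONS) for s in range(N_STATES)]
        delta = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if delta < tol:
            return values, greedy_policy(values)


def policy_iteration(tol=1e-8):
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    policy = [0] * N_STATES
    while True:
        values = [0.0] * N_STATES
        while True:  # policy evaluation: back up values under the current policy only
            new_values = [q_value(values, s, policy[s]) for s in range(N_STATES)]
            delta = max(abs(n - o) for n, o in zip(new_values, values))
            values = new_values
            if delta < tol:
                break
        new_policy = greedy_policy(values)  # policy improvement
        if new_policy == policy:
            return values, policy
        policy = new_policy


def rollout(policy, start=0, max_steps=100):
    """Follow a policy from the start state and return the list of visited states."""
    path = [start]
    for _ in range(max_steps):
        row, col = divmod(path[-1], N)
        if GRID[row][col] in "HG":
            break
        next_state, _ = step(path[-1], policy[path[-1]])
        path.append(next_state)
    return path
```

Calling `rollout(value_iteration()[1])` walks the agent from the start tile to the goal along a shortest safe route. On this deterministic map both techniques should end up with the same policy; the real Frozen Lake environment is slippery by default, so there the backups would have to average over transition probabilities instead of following a single next state.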

Code for this video:
https://github.com/llSourcell/navigating_a_virtual_world_with_dynamic_programming

Please Subscribe! And like. And comment. That's what keeps me going.

Want more inspiration & education? Connect with me:
Twitter: https://twitter.com/sirajraval
Facebook: https://www.facebook.com/sirajology

More learning resources:
https://ocw.mit.edu/courses/aeronautics-and-astronautics/16-410-principles-of-autonomy-and-decision-making-fall-2010/lecture-notes/MIT16_410F10_lec23.pdf
http://uhaweb.hartford.edu/compsci/ccli/projects/QLearning.pdf
https://medium.com/@m.alzantot/deep-reinforcement-learning-demysitifed-episode-2-policy-iteration-value-iteration-and-q-978f9e89ddaa
https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node19.html
http://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_dp.html
https://www.quora.com/How-is-policy-iteration-different-from-value-iteration
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/DP.pdf

Join us in the Wizards Slack channel:
http://wizards.herokuapp.com/

And please support me on Patreon:
https://www.patreon.com/user?u=3191693
Instagram: https://www.instagram.com/sirajraval/