Hands-On Reinforcement Learning
In this hands-on project, we will first implement a simple RL algorithm and apply it to solve the CartPole-v1 environment. Once familiar with the basic workflow, we will learn to use various tools for training, monitoring, and sharing machine learning models by applying them to train a robotic arm.
REINFORCE
Here we implement the REINFORCE algorithm using PyTorch (code: reinforce_cartpole.py).
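For reference, here is a minimal sketch of the training loop, assuming gymnasium and PyTorch; the network size, learning rate, and episode count are illustrative and not necessarily the values used in reinforce_cartpole.py.

```python
# Minimal REINFORCE sketch for CartPole-v1 (assumes gymnasium + PyTorch).
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns, computed backwards, then normalized for stability.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Policy-gradient loss: maximize expected discounted return.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Here is the plot showing the total reward across episodes: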
Familiarization with a complete RL pipeline: Application to training a robotic arm
In this section, we will use the Stable-Baselines3 package to train a robotic arm using RL.
Following the Stable-Baselines3 documentation, we implement the Python script a2c_sb3_cartpole.py, which solves the CartPole environment with the Advantage Actor-Critic (A2C) algorithm.
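A minimal sketch of what a2c_sb3_cartpole.py can look like (the timestep budget is illustrative):

```python
# Sketch of a2c_sb3_cartpole.py: A2C on CartPole-v1 with Stable-Baselines3.
import gymnasium as gym
from stable_baselines3 import A2C

env = gym.make("CartPole-v1")
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)  # illustrative budget
model.save("a2c_cartpole")

# Quick evaluation rollout with the trained policy.
obs, _ = env.reset()
for _ in range(1_000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```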
Weights & Biases repository
Here is the link to the Weights & Biases project where the training run can be found.
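For reference, a sketch of how the run can be logged to W&B with the WandbCallback integration (the project name below is a placeholder, not the actual project):

```python
# Logging an SB3 training run to Weights & Biases.
import gymnasium as gym
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import A2C

run = wandb.init(project="a2c-cartpole", sync_tensorboard=True)  # placeholder project name
model = A2C(
    "MlpPolicy",
    gym.make("CartPole-v1"),
    verbose=1,
    tensorboard_log=f"runs/{run.id}",  # W&B picks up the TensorBoard logs
)
model.learn(total_timesteps=100_000, callback=WandbCallback(verbose=2))
run.finish()
```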
Hugging Face Hub repository
The trained model can also be found in this Hugging Face repository: link
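As a sketch, the upload can be done with the huggingface_sb3 helpers; the repo id below is a placeholder, and a prior `huggingface-cli login` is assumed:

```python
# Uploading the saved SB3 checkpoint to the Hugging Face Hub.
from huggingface_sb3 import push_to_hub

# "a2c_cartpole.zip" is assumed to be the checkpoint produced by model.save("a2c_cartpole").
push_to_hub(
    repo_id="your-username/a2c-cartpole",  # placeholder repo id
    filename="a2c_cartpole.zip",
    commit_message="Upload A2C CartPole agent",
)
```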
Full workflow with panda-gym
In this section, we will get familiar with one of the environments provided by panda-gym, PandaReachJointsDense-v2. The objective is to learn how to reach any point in 3D space by directly controlling the robot's joints.
The resulting code is: a2c_sb3_panda_reach.py.
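Below is a minimal sketch of what the script can look like, assuming panda-gym v2 (which registers the -v2 environment ids on import) alongside the classic gym API; the timestep budget is illustrative:

```python
# Sketch of a2c_sb3_panda_reach.py: A2C on PandaReachJointsDense-v2.
import gym
import panda_gym  # noqa: F401  (importing registers the Panda environments)
from stable_baselines3 import A2C

env = gym.make("PandaReachJointsDense-v2")
# The observation is a dict (observation / achieved_goal / desired_goal),
# so SB3 needs the MultiInputPolicy instead of MlpPolicy.
model = A2C("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)  # illustrative budget
model.save("a2c_panda_reach")
```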
The run is stored in this Weights & Biases project: panda_reach-sb3_a2c
The model can be found in the same Hugging Face project as previously: link
Author
Antoine Lebtahi
License
MIT