
Hands-On Reinforcement Learning

In this hands-on project, we first implement a simple RL algorithm and apply it to solve the CartPole-v1 environment. Once familiar with the basic workflow, we learn to use various tools for training, monitoring, and sharing machine learning models by applying them to train a robotic arm.

REINFORCE

Here we implement the REINFORCE algorithm using PyTorch (code: reinforce_cartpole.py). The plot below shows the total reward across episodes:

[Plot: total reward per episode]
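The heart of REINFORCE is weighting each action's log-probability by the discounted return that followed it. As a minimal standalone sketch (not the actual reinforce_cartpole.py), here is the return computation that such a script typically performs after each episode:

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    for every timestep, by accumulating backwards over the episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()  # restore chronological order
    return returns
```

In the full algorithm, each return G_t multiplies the corresponding -log pi(a_t | s_t) term of the loss; in practice the returns are usually also normalized (zero mean, unit variance) to reduce gradient variance.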

Familiarization with a complete RL pipeline: Application to training a robotic arm

In this section, we will use the Stable-Baselines3 package to train a robotic arm using RL.

Following the Stable-Baselines3 documentation, we implement the Python script a2c_sb3_cartpole.py, which solves the CartPole environment with the Advantage Actor-Critic (A2C) algorithm.

Weights & Biases repository

Here is the link to the Weights & Biases project where the training run can be found.

Hugging Face Hub repository

The trained model can also be found in this Hugging Face repository: link

Full workflow with panda-gym

In this section, we get familiar with one of the environments provided by panda-gym, PandaReachJointsDense-v2. The objective is to learn to reach any point in 3D space by directly controlling the robot's joints.

The resulting code is a2c_sb3_panda_reach.py.

The run is stored in this Weights & Biases project: panda_reach-sb3_a2c

The model can be found in the same Hugging Face repository as previously: link

Author

Antoine Lebtahi

License

MIT