TD 1 : Hands-On Reinforcement Learning
This TD introduces different algorithms, frameworks and tools used in Reinforcement Learning. The methods are applied to robotics tasks: the CartPole and PandaReachJointsDense environments.
Files list
This repo contains several files:
- images: the images displayed in the README file
- reinforce_cartpole.py: Python script for the REINFORCE section
- a2c_sb3_cartpole.py: Python script for the "Familiarization with a complete RL pipeline" section
- a2c_sb3_panda_reach.py: Python script for the "Full workflow with panda-gym" section
Use
A Python installation is needed to run the scripts.
Install the following Python packages:
pip install gym
pip install stable_baselines3
pip install tqdm
pip install wandb
pip install moviepy
pip install huggingface-sb3
pip install tensorboard
pip install panda-gym
pip install dill
Then run the scripts in your command prompt:
python "path_to_your_python_script"
REINFORCE
The REINFORCE algorithm is used to solve the CartPole environment. The plot showing the total reward across episodes can be seen below:
The Python script used is: reinforce_cartpole.py.
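As an illustration, here is a minimal sketch of a REINFORCE training loop on CartPole, assuming PyTorch (installed alongside stable_baselines3) and the Gym >= 0.26 reset/step API; the network size, learning rate and number of episodes are illustrative and may differ from the actual script.

```python
# Minimal REINFORCE sketch for CartPole-v1 (assumes gym >= 0.26 API and PyTorch).
# Illustrative outline; hyperparameters may differ from reinforce_cartpole.py.
import gym
import torch
import torch.nn as nn
from torch.distributions import Categorical

env = gym.make("CartPole-v1")

# Small policy network: observation -> action probabilities
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = Categorical(probs)
        action = dist.sample()
        obs, reward, terminated, truncated, _ = env.step(action.item())
        done = terminated or truncated
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)

    # Compute the discounted return from each step of the episode
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalize for stability

    # Policy gradient loss: -sum(log pi(a|s) * return)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Normalizing the returns is a common variance-reduction trick; the actual script may instead use a different baseline or normalization.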
Familiarization with a complete RL pipeline: Application to training a robotic arm
Stable-Baselines3 and HuggingFace
In this section, the Stable-Baselines3 package is used to solve the Cartpole with the Advantage Actor-Critic (A2C) algorithm.
The Python code used is: a2c_sb3_cartpole.py.
The trained model is shared on HuggingFace and is available at the following link: https://huggingface.co/oscarchaufour/a2c-CartPole-v1
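For reference, a sketch of the typical Stable-Baselines3 training and Hub upload flow follows, assuming the push_to_hub helper from huggingface-sb3 and a prior `huggingface-cli login`; the training budget and commit message are illustrative.

```python
# Sketch: train A2C on CartPole with Stable-Baselines3 and upload the saved model
# to the HuggingFace Hub. Assumes `huggingface-cli login` has been run beforehand.
from stable_baselines3 import A2C
from huggingface_sb3 import push_to_hub

# Passing the environment id lets SB3 build the environment itself
model = A2C("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=100_000)  # illustrative budget

# Save the trained model locally (creates a2c-CartPole-v1.zip), then push the zip to the Hub
model.save("a2c-CartPole-v1")
push_to_hub(
    repo_id="oscarchaufour/a2c-CartPole-v1",
    filename="a2c-CartPole-v1.zip",
    commit_message="Upload A2C CartPole-v1 model",
)
```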
Weights & Biases
The Weights & Biases package is used to visualize the training and the performance of a model. The link to the run visualization on WandB is: https://wandb.ai/oscar-chaufour/a2c-cartpole-v1?workspace=user-oscar-chaufour
The evolution of certain metrics during training can be visualized. For example, the policy loss at each step can be seen below:
The policy loss follows a decreasing trend, which is consistent with the model learning during the training phase.
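For reference, a sketch of how such a run can be tracked with WandB follows, assuming wandb's Stable-Baselines3 integration; the project name and training budget are illustrative. Setting sync_tensorboard=True is what forwards SB3's TensorBoard metrics (including the policy loss above) to WandB.

```python
# Sketch: track an A2C CartPole training run with Weights & Biases.
# Assumes `wandb login` has been run; project name and budget are illustrative.
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import A2C

config = {"policy_type": "MlpPolicy", "env_id": "CartPole-v1", "total_timesteps": 100_000}
run = wandb.init(
    project="a2c-cartpole-v1",
    config=config,
    sync_tensorboard=True,  # forward SB3's tensorboard metrics (e.g. policy loss) to WandB
)

model = A2C(
    config["policy_type"],
    config["env_id"],
    verbose=1,
    tensorboard_log=f"runs/{run.id}",  # SB3 writes TensorBoard logs here; wandb syncs them
)
model.learn(total_timesteps=config["total_timesteps"], callback=WandbCallback(verbose=2))
run.finish()
```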
Full workflow with panda-gym
The full training-visualization-sharing workflow is applied to the PandaReachJointsDense environment. The Python script used is: a2c_sb3_panda_reach.py. It appears that the PandaReachJointsDense-v3 environment is not registered and could not be used, as shown in the following error:
---------------------------------------------------------------------------
NameNotFound Traceback (most recent call last)
<ipython-input-5-e21f3cb1d225> in <cell line: 21>()
19 register(id=env_id, entry_point='gym.envs.classic_control:CartPoleEnv', max_episode_steps=500)
20
---> 21 env = gym.make(env_id)
22
23 model = A2C("MlpPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
2 frames
/usr/local/lib/python3.10/dist-packages/gym/envs/registration.py in _check_name_exists(ns, name)
210 suggestion_msg = f"Did you mean: `{suggestion[0]}`?" if suggestion else ""
211
--> 212 raise error.NameNotFound(
213 f"Environment {name} doesn't exist{namespace_msg}. {suggestion_msg}"
214 )
NameNotFound: Environment PandaPushJointsDense doesn't exist.
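A likely cause of this NameNotFound error is that the Panda environments are only registered once the panda_gym package has been imported. Below is a minimal sketch of the intended workflow, assuming panda-gym v3 (which registers its environments with gymnasium on import) and a gymnasium-compatible Stable-Baselines3 version; the environment id and training budget are illustrative.

```python
# Sketch: train A2C on the PandaReachJointsDense environment.
# Assumes panda-gym v3 and a gymnasium-compatible stable_baselines3 version.
import gymnasium as gym
import panda_gym  # noqa: F401 -- the import itself registers the Panda environments
from stable_baselines3 import A2C

env = gym.make("PandaReachJointsDense-v3")
# Panda observations are dictionaries, so a multi-input policy is needed
model = A2C("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)  # illustrative budget
model.save("a2c-PandaReachJointsDense-v3")
```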
Contribute
This tutorial may contain errors, inaccuracies, typos or areas for improvement. Feel free to contribute to its improvement by opening an issue.
Author
Oscar Chaufour
License
MIT