TD 1 : Hands-On Reinforcement Learning
MSO 3.4 Apprentissage Automatique
REINFORCE
The REINFORCE algorithm (also known as Vanilla Policy Gradient) is a policy gradient method that optimizes the policy directly by gradient ascent on the expected return. The following is the pseudocode of the REINFORCE algorithm:
🛠️ To be handed in: Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward across episodes in the `README.md`. Also share a file `reinforce_cartpole.pth` containing the learned weights. For saving and loading PyTorch models, check this tutorial.
Model Evaluation
Now that you have trained your model, it is time to evaluate its performance. Run it with rendering for a few trials and see if the policy is capable of completing the task.
🛠️ To be handed in: Implement a script that loads your saved model and uses it to solve the CartPole environment. Run 100 evaluations and share the final success rate across all evaluations in the `README.md`. Share the code in `evaluate_reinforce_cartpole.py`.
From the OpenAI Gym wiki we know that the environment counts as solved when the average reward is greater than or equal to 195 over 100 consecutive trials. With the evaluation script I used, the success rate is 1.0 when the environment is allowed to run for its maximum number of steps.
Familiarization with a complete RL pipeline: Application to training a robotic arm
Stable-Baselines3 (SB3) is a high-level RL library that provides various algorithms and integrated tools to easily train and test reinforcement learning models.
🛠️ To be handed in: Store the code in `a2c_sb3_cartpole.py`. Unless otherwise stated, you'll work upon this file for the next sections.
🛠️ To be handed in: Link the trained model in the `README.md` file.
🛠️ Share the link of the wandb run in the `README.md` file.
wandb: https://wandb.ai/lennartecl-centrale-lyon/sb3?nw=nwuserlennartecl
huggingface: https://huggingface.co/lennartoe/Cartpole-v1/tree/main
Full workflow with panda-gym
Panda-gym is a collection of environments for robotic simulation and control. It provides a range of challenges for training robotic agents in a simulated environment. In this section, you will get familiar with one of the environments provided by panda-gym, `PandaReachJointsDense-v3`. The objective is to learn how to reach any point in 3D space by directly controlling the robot's joints.
🛠️ To be handed in: Share all the code in `a2c_sb3_panda_reach.py`. Share the link of the wandb run and the trained model in the `README.md` file.
wandb: https://wandb.ai/lennartecl-centrale-lyon/pandasgym_sb3?nw=nwuserlennartecl
huggingface: https://huggingface.co/lennartoe/PandaReachJointsDense-v3/tree/main