Hands-On Reinforcement Learning
Thomas DESGREYS
REINFORCE algorithm
Training
The model is trained and saved as "reinforce_cartpole_best.pth"; the evolution of the loss and of the score (i.e. the total episode reward) across episodes is shown below.
These plots highlight the instability of this training algorithm.
Still, with a bit of luck we end up with a model that reaches the maximum number of steps permitted by this gym environment
(500 steps).
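For reference, here is a minimal sketch of a REINFORCE training loop on CartPole. The network size, learning rate, discount factor, and episode budget below are assumptions, not the exact values used for the saved model, and the classic 4-tuple gym API is assumed:

```python
import gym
import torch
import torch.nn as nn
from torch.distributions import Categorical

# Assumed hyperparameters -- not necessarily those used for the checkpoint.
GAMMA, LR, EPISODES = 0.99, 1e-3, 500

env = gym.make("CartPole-v1")
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=LR)

for episode in range(EPISODES):
    log_probs, rewards = [], []
    obs, done = env.reset(), False  # classic gym API; gymnasium differs
    while not done:
        dist = Categorical(policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        obs, reward, done, info = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)

    # Discounted returns, computed backwards, then normalized for stability.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + GAMMA * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # REINFORCE loss: maximizing expected return <=> minimizing -sum(log_pi * G).
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The actual script presumably keeps the best-scoring weights (hence the
# "best" in the filename); the final weights are saved here for brevity.
torch.save(policy.state_dict(), "reinforce_cartpole_best.pth")
```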
Evaluation
see evaluate_reinforce_cartpole.py
During evaluation, we get a 100% success rate over 100 trials.
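A possible shape for that evaluation loop, assuming an episode counts as a success when it reaches the 500-step cap (the exact criterion used in evaluate_reinforce_cartpole.py is an assumption, and the network must match the training architecture):

```python
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")

# Same architecture as in the training sketch (must match the checkpoint).
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
policy.load_state_dict(torch.load("reinforce_cartpole_best.pth"))
policy.eval()

successes = 0
for trial in range(100):
    obs, done, steps = env.reset(), False, 0
    while not done:
        with torch.no_grad():
            probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        obs, _, done, _ = env.step(probs.argmax().item())  # greedy action
        steps += 1
    if steps >= 500:  # assumed success criterion: episode reaches the cap
        successes += 1

print(f"Success rate: {successes}/100")
```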
Familiarization with a complete RL pipeline:
Application to training a robotic arm
Stable-Baselines3
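With Stable-Baselines3 the whole training loop collapses to a few lines. A minimal sketch on CartPole, assuming A2C (the algorithm used in the panda-gym notebook; the algorithm and timestep budget for CartPole are assumptions):

```python
import gym
from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("CartPole-v1")
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)  # assumed budget
model.save("a2c_cartpole")  # writes a2c_cartpole.zip

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```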
Hugging Face Hub
Link to the trained model (cartpole)
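One way to push the saved checkpoint to the Hub is via huggingface_hub directly (the repo id below is a placeholder, not the actual repository linked above):

```python
from huggingface_hub import HfApi

api = HfApi()
# Placeholder repo id -- replace with your own namespace/name.
repo_id = "your-username/a2c-cartpole"
api.create_repo(repo_id, exist_ok=True)
api.upload_file(
    path_or_fileobj="a2c_cartpole.zip",
    path_in_repo="a2c_cartpole.zip",
    repo_id=repo_id,
)
```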
Weights & Biases
Link to the wandb run (cartpole)
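The usual SB3/W&B wiring looks like the sketch below (the project name is a placeholder; whether the linked run used these exact options is an assumption):

```python
import gym
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import A2C

# sync_tensorboard forwards SB3's TensorBoard logs to the wandb run.
run = wandb.init(project="hands-on-rl", sync_tensorboard=True)

env = gym.make("CartPole-v1")
model = A2C("MlpPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=100_000, callback=WandbCallback())
run.finish()
```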
Full workflow with panda-gym
As I couldn't make it work on my PC (difficulties installing panda-gym), I used Google Colab.
see my notebook here (online) or directly a2c_sb3_panda_reach.ipynb
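The notebook boils down to something like the following. The environment id depends on the installed panda-gym release; PandaReach-v3 with gymnasium (panda-gym v3, SB3 >= 2.0) is assumed here:

```python
# In Colab: !pip install stable-baselines3 panda-gym
import gymnasium as gym
import panda_gym  # registers the Panda* environments
from stable_baselines3 import A2C

env = gym.make("PandaReach-v3")
# MultiInputPolicy handles the dict observations (observation /
# achieved_goal / desired_goal) of this goal-conditioned environment.
model = A2C("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)  # assumed budget
model.save("a2c_panda_reach")
```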