Hands-On Reinforcement Learning
Thomas DESGREYS
REINFORCE algorithm
Training
The model is trained and saved as "reinforce_cartpole_best.pth"; the evolution of the loss and of the score (i.e. the total episode reward) across episodes is shown below.
These plots highlight the instability of this training algorithm.
Still, with a bit of luck we end up with a model that reaches the maximum number of steps permitted by this gym environment
(500 steps).
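For reference, here is a minimal sketch of a REINFORCE training loop on CartPole. The network size, learning rate, discount factor, and episode budget below are assumptions, not the exact values used for the saved model, and the classic 4-tuple gym API is assumed:

```python
import gym
import torch
import torch.nn as nn
from torch.distributions import Categorical

# Assumed hyperparameters -- not necessarily those used for the checkpoint.
GAMMA, LR, EPISODES = 0.99, 1e-3, 500

env = gym.make("CartPole-v1")
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=LR)

for episode in range(EPISODES):
    log_probs, rewards = [], []
    obs, done = env.reset(), False  # classic gym API; gymnasium differs
    while not done:
        dist = Categorical(policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        obs, reward, done, info = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)

    # Discounted returns, computed backwards, then normalized for stability.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + GAMMA * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # REINFORCE loss: maximizing expected return <=> minimizing -sum(log_pi * G).
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The actual script presumably keeps the best-scoring weights (hence the
# "best" in the filename); the final weights are saved here for brevity.
torch.save(policy.state_dict(), "reinforce_cartpole_best.pth")
```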
Evaluation
see evaluate_reinforce_cartpole.py
During evaluation, we get a 100% success rate over 100 trials.
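A possible shape for that evaluation loop, assuming an episode counts as a success when it reaches the 500-step cap (the exact criterion used in evaluate_reinforce_cartpole.py is an assumption, and the network must match the training architecture):

```python
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")

# Same architecture as in the training sketch (must match the checkpoint).
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
policy.load_state_dict(torch.load("reinforce_cartpole_best.pth"))
policy.eval()

successes = 0
for trial in range(100):
    obs, done, steps = env.reset(), False, 0
    while not done:
        with torch.no_grad():
            probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        obs, _, done, _ = env.step(probs.argmax().item())  # greedy action
        steps += 1
    if steps >= 500:  # assumed success criterion: episode reaches the cap
        successes += 1

print(f"Success rate: {successes}/100")
```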
Familiarization with a complete RL pipeline:
Application to training a robotic arm
Stable-Baselines3
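With Stable-Baselines3 the whole training loop collapses to a few lines. A minimal sketch on CartPole, assuming A2C (the algorithm used in the panda-gym notebook; the algorithm and timestep budget for CartPole are assumptions):

```python
import gym
from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("CartPole-v1")
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)  # assumed budget
model.save("a2c_cartpole")  # writes a2c_cartpole.zip

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```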
Hugging Face Hub
Link to the trained model (cartpole)
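One way to push the saved checkpoint to the Hub is via huggingface_hub directly (the repo id below is a placeholder, not the actual repository linked above):

```python
from huggingface_hub import HfApi

api = HfApi()
# Placeholder repo id -- replace with your own namespace/name.
repo_id = "your-username/a2c-cartpole"
api.create_repo(repo_id, exist_ok=True)
api.upload_file(
    path_or_fileobj="a2c_cartpole.zip",
    path_in_repo="a2c_cartpole.zip",
    repo_id=repo_id,
)
```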
Weights & Biases
Link to the wandb run (cartpole)
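The usual SB3/W&B wiring looks like the sketch below (the project name is a placeholder; whether the linked run used these exact options is an assumption):

```python
import gym
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import A2C

# sync_tensorboard forwards SB3's TensorBoard logs to the wandb run.
run = wandb.init(project="hands-on-rl", sync_tensorboard=True)

env = gym.make("CartPole-v1")
model = A2C("MlpPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=100_000, callback=WandbCallback())
run.finish()
```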
Full workflow with panda-gym
As I couldn't make it work on my PC (difficulties installing panda-gym), I used Google Colab.
see my notebook here (online) or directly a2c_sb3_panda_reach.ipynb
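The notebook boils down to something like the following. The environment id depends on the installed panda-gym release; PandaReach-v3 with gymnasium (panda-gym v3, SB3 >= 2.0) is assumed here:

```python
# In Colab: !pip install stable-baselines3 panda-gym
import gymnasium as gym
import panda_gym  # registers the Panda* environments
from stable_baselines3 import A2C

env = gym.make("PandaReach-v3")
# MultiInputPolicy handles the dict observations (observation /
# achieved_goal / desired_goal) of this goal-conditioned environment.
model = A2C("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)  # assumed budget
model.save("a2c_panda_reach")
```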