Hands-on Reinforcement Learning
MOREAU Maxime, 3A - Computer science & M2 DS, ECL22

1. RL for CartPole-v1

1.1 Training


Policy network: one fully connected hidden layer followed by a softmax output


Reinforcement: policy gradient


Save: policy_cartpole.pth


Code: reinforce_cartpole.py
Below are the rewards across 300 episodes:

Model: policy_cartpole.pth
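
The policy-gradient update used for training weights each action's log-probability by the return that followed it. The sketch below illustrates that computation in plain Python. It is not taken from reinforce_cartpole.py: the function names `discounted_returns` and `reinforce_loss` are hypothetical, and the discount factor `gamma` is an assumption, since the value used in the script is not stated here.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode.

    gamma=0.99 is an assumed default, not a value taken from the repository.
    """
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, returns):
    """Policy-gradient loss: minus the sum of log pi(a_t | s_t) * G_t."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# CartPole-v1 gives a reward of 1 for every step the pole stays up.
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)  # [1.75, 1.5, 1.0]

# Pretend the policy picked each action with probability 0.5.
log_probs = [math.log(0.5)] * 3
loss = reinforce_loss(log_probs, returns)
```

In an actual training script the log-probabilities would come from the softmax output of the policy network, and the loss would be minimized with an optimizer after each episode.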


1.2 Evaluation


Code: evaluate_reinforce_cartpole.py
The evaluation was run over 100 episodes, with the success threshold set to 400.

The evaluation ends with a 100% success rate.
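
The success rate reported above can be computed from the per-episode total rewards, as in the following sketch. It is not the code of evaluate_reinforce_cartpole.py: the helper name `success_rate` is hypothetical, and counting an episode as successful when its reward is greater than or equal to the threshold (rather than strictly greater) is an assumption.

```python
def success_rate(episode_rewards, threshold=400):
    """Fraction of episodes whose total reward reaches the threshold.

    Assumes "success" means reward >= threshold; the actual script may
    use a strict comparison instead.
    """
    if not episode_rewards:
        raise ValueError("episode_rewards must not be empty")
    successes = sum(1 for reward in episode_rewards if reward >= threshold)
    return successes / len(episode_rewards)


# Illustrative rewards only, not results from the repository.
# CartPole-v1 truncates an episode at 500 steps, so 500 is the maximum.
example_rewards = [500.0, 450.0, 400.0, 120.0]
rate = success_rate(example_rewards, threshold=400)  # 0.75
```

With 100 evaluation episodes, a 100% success rate means every episode reached the threshold.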


2. Complete RL pipeline to solve the CartPole environment with A2C
Here we set up a complete pipeline to solve the CartPole environment with the A2C algorithm.
Weights & Biases (wandb) is used to track the training phase.
https://wandb.ai/maximecerise-ecl/cartpole-a2c
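
For context, A2C differs from the plain policy gradient of section 1 in that the actor is updated with an advantage estimate from a learned value function (the critic) rather than the raw return. The sketch below shows a one-step advantage computation in plain Python. It is only an illustration of the idea, not the pipeline used in this repository: the function name `one_step_advantages` and the value `gamma=0.99` are assumptions, and library implementations of A2C typically use multi-step or generalized estimates.

```python
def one_step_advantages(rewards, values, next_values, dones, gamma=0.99):
    """One-step advantage: A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    When an episode terminates at step t (dones[t] is True), the
    bootstrap term gamma * V(s_{t+1}) is dropped.
    """
    advantages = []
    for reward, value, next_value, done in zip(rewards, values, next_values, dones):
        bootstrap = 0.0 if done else gamma * next_value
        advantages.append(reward + bootstrap - value)
    return advantages


# Illustrative numbers only: two transitions, the second one terminal.
advantages = one_step_advantages(
    rewards=[1.0, 1.0],
    values=[2.0, 1.0],
    next_values=[1.0, 5.0],
    dones=[False, True],
    gamma=0.5,
)  # [-0.5, 0.0]
```

A positive advantage means the action did better than the critic expected, so its probability is increased; a negative one decreases it.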