Hands-on Reinforcement Learning
MOREAU Maxime, 3A - Computer science & M2 DS, ECL22
1. RL for CartPole-v1
1.1 Training
- Policy Network: one fully connected hidden layer followed by a softmax
- Reinforcement: Policy Gradient
- Save: policy_cartpole.pth
- Code: reinforce_cartpole.py
Below are the rewards across 300 episodes:
Model: policy_cartpole.pth
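As an illustration of the policy-gradient step listed in 1.1, here is a minimal sketch of the quantities a REINFORCE-style update usually relies on: discounted returns, their normalization, and the resulting loss. It is not taken from reinforce_cartpole.py. The discount factor `gamma=0.99`, the normalization step, and the function names are assumptions, and plain Python floats stand in for the tensors a PyTorch implementation would use.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def normalize(values, eps=1e-8):
    """Center and scale the returns (assumed variance-reduction step)."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / (std + eps) for v in values]


def reinforce_loss(log_probs, returns):
    """Policy-gradient loss: minus the sum of log pi(a_t | s_t) * G_t."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

In a PyTorch version, `log_probs` would come from the softmax output of the policy network, and calling `backward()` on this loss would give the gradient used by the optimizer.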
1.2 Evaluation
- Code: evaluate_reinforce_cartpole.py
The evaluation was run over 100 episodes, with the success threshold set to 400.
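The evaluation protocol can be sketched as follows. This is a hedged illustration, not the content of evaluate_reinforce_cartpole.py: it assumes a Gymnasium-style environment (`reset()` returning `(observation, info)` and `step()` returning five values), a hypothetical `select_action` callable wrapping the trained policy, and that an episode counts as a success when its total reward is at least the threshold.

```python
def run_episode(env, select_action):
    """Play one episode and return its total (undiscounted) reward."""
    observation, _info = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = select_action(observation)
        observation, reward, terminated, truncated, _info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    return total_reward


def success_rate(env, select_action, n_episodes=100, threshold=400):
    """Fraction of episodes whose total reward reaches the threshold."""
    rewards = [run_episode(env, select_action) for _ in range(n_episodes)]
    successes = sum(r >= threshold for r in rewards)
    return successes / n_episodes
```

With CartPole-v1, `env` would be created with `gymnasium.make("CartPole-v1")`, and `select_action` would pick an action from the probabilities produced by the loaded policy network.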
The evaluation reaches a 100% success rate:
2. Complete RL pipeline to solve the CartPole environment with A2C
Here we set up a complete pipeline to solve the CartPole environment with the A2C algorithm.
WandB was set up to track the learning phase: Report here
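One way such a pipeline can be wired together is sketched below. It assumes that Stable-Baselines3 provides the A2C implementation for this section (the text only confirms this for Panda Reach) and that the `wandb` package with its SB3 integration is installed. The project name `a2c-cartpole` and the number of timesteps are hypothetical values, not taken from the report.

```python
CONFIG = {
    "env_id": "CartPole-v1",
    "policy": "MlpPolicy",
    "total_timesteps": 100_000,  # assumed value, not from the report
}


def train_cartpole(config=None, project="a2c-cartpole"):
    """Train A2C on CartPole and log the run to Weights & Biases."""
    # Third-party imports stay inside the function so the configuration
    # above can be read without the packages installed.
    import wandb
    from stable_baselines3 import A2C
    from wandb.integration.sb3 import WandbCallback

    config = config or CONFIG
    # sync_tensorboard forwards the SB3 TensorBoard metrics to the run.
    run = wandb.init(project=project, config=config, sync_tensorboard=True)
    model = A2C(
        config["policy"],
        config["env_id"],
        verbose=1,
        tensorboard_log=f"runs/{run.id}",
    )
    model.learn(
        total_timesteps=config["total_timesteps"],
        callback=WandbCallback(),
    )
    run.finish()
    return model
```

Calling `train_cartpole()` starts a tracked run. The learning curves then appear in the WandB project, where they can be assembled into a report.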
3. Panda Reach
The Stable-Baselines3 package is used to train an A2C model on the PandaReachJointsDense-v3 environment for 500k timesteps.
a2c_sb3_panda_reach.py:
To run:
pip install -r requirement_reach.txt
python a2c_sb3_panda_reach.py
- Code: a2c_sb3_panda_reach.py
- Hugging Face: Here
- WandB report: a2c_reach_panda_report
- Preview here
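For orientation, the training described in this section can be sketched as below. This is an assumption-laden outline rather than the content of a2c_sb3_panda_reach.py: it assumes the `panda_gym` package registers the environment, uses `MultiInputPolicy` because Panda observations are dictionaries, and saves under the hypothetical name `a2c_panda_reach`. Only the environment id and the 500k timesteps come from the text above.

```python
ENV_ID = "PandaReachJointsDense-v3"
TOTAL_TIMESTEPS = 500_000


def train_panda_reach(save_path="a2c_panda_reach"):
    """Train A2C on the Panda reach task and save the resulting model."""
    # Third-party imports stay inside the function so the constants above
    # can be read without the packages installed.
    import gymnasium as gym
    import panda_gym  # noqa: F401  (importing registers the Panda environments)
    from stable_baselines3 import A2C

    env = gym.make(ENV_ID)
    # Observations are dicts (observation, achieved_goal, desired_goal),
    # which is why MultiInputPolicy is used instead of MlpPolicy.
    model = A2C("MultiInputPolicy", env, verbose=1)
    model.learn(total_timesteps=TOTAL_TIMESTEPS)
    model.save(save_path)
    env.close()
    return model
```

The saved archive can later be reloaded with `A2C.load(save_path)` for evaluation or for uploading to the Hugging Face Hub.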