diff --git a/README.md b/README.md
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0c50cc419c84a15a5e515b3706d6aa27ac978538 100644
--- a/README.md
+++ b/README.md
@@ -0,0 +1,30 @@
+# Hands-on Reinforcement Learning
+<i>MOREAU Maxime, 3A - Computer science & M2 DS, ECL22</i>
+### 1. RL for CartPole-v1
+#### 1.1 Training
+
+- <b>Policy Network: </b> One fully connected hidden layer followed by a softmax
+
+- <b>Reinforcement: </b> Policy Gradient
+
+- <b>Save:</b> [policy_cartpole.pth](saves/policy_cartpole.pth)
+- <b>Code:</b> [reinforce_cartpole.py](reinforce_cartpole.py)
+Below are the rewards across 300 episodes:
+
+
+#### 1.2 Evaluation
+
+- <b>Code:</b> [evaluate_reinforce_cartpole.py](evaluate_reinforce_cartpole.py)
+The evaluation was run over 100 episodes, with the success threshold set at a score of 400.
+
+The evaluation reaches a 100% success rate:
+
+
+
+### 2. Complete RL pipeline to solve the CartPole environment with A2C
+
+Here we set up a complete pipeline to solve the CartPole environment with the A2C algorithm.
+
+Wandb is set up to monitor the learning phase.
+
+
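
The README above relies on two ideas: a policy-gradient update and an evaluation against a score threshold of 400. As a minimal sketch, the snippet below shows how discounted returns are commonly computed for such an update and how a success rate over evaluation episodes could be measured. The function names, the discount factor `gamma`, and the use of `>=` for the threshold comparison are assumptions made for illustration; none of them are taken from `reinforce_cartpole.py` or `evaluate_reinforce_cartpole.py`.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode.

    gamma is an assumed value; the README does not state the one actually used.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the return of the following step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def success_rate(scores: List[float], threshold: float = 400.0) -> float:
    """Fraction of evaluation episodes whose total reward reaches the threshold.

    Counting a score equal to the threshold as a success is an assumption.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(score >= threshold for score in scores) / len(scores)


if __name__ == "__main__":
    # CartPole gives a reward of 1 per step, so a 3-step episode looks like this.
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
    # Hypothetical scores for illustration only, not results from the repository.
    print(success_rate([500.0, 450.0, 120.0, 400.0]))
```

In a policy-gradient loop, the returns would typically weight the log-probabilities of the actions taken. In the evaluation script, the scores would be the total rewards of the 100 evaluation episodes.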