# Hands-on Reinforcement Learning
<i>MOREAU Maxime, 3A - Computer science & M2 DS, ECL22</i>
### 1. RL for CartPole-v1
#### 1.1 Training
- <b>Policy network:</b> a fully connected network with a single hidden layer and a softmax output over the actions
- <b>Algorithm:</b> policy gradient (REINFORCE)
- <b>Save:</b> [policy_cartpole.pth](saves/policy_cartpole.pth)
- <b>Code:</b> [reinforce_cartpole.py](reinforce_cartpole.py)
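
To illustrate the policy-gradient update, the sketch below computes the discounted returns `G_t = r_t + gamma * G_{t+1}` of one episode and the REINFORCE loss `-sum_t log pi(a_t | s_t) * G_t`. This is a minimal sketch, not the code of reinforce_cartpole.py. The helper names `discounted_returns` and `reinforce_loss`, and the value of `gamma`, are hypothetical. Plain floats stand in for the PyTorch tensors that a real implementation would backpropagate through.

```python
import math
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} by iterating backwards over one episode."""
    returns: List[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()  # restore chronological order
    return returns


def reinforce_loss(log_probs: List[float], rewards: List[float], gamma: float = 0.99) -> float:
    """REINFORCE objective for one episode: -sum_t log pi(a_t | s_t) * G_t.

    Minimising this value with gradient descent performs gradient ascent on the
    expected return. Hypothetical helper: floats here, autograd tensors in practice.
    """
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * g for lp, g in zip(log_probs, returns))


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]  # CartPole gives +1 per step survived
    print(discounted_returns(rewards, gamma=0.5))  # [1.75, 1.5, 1.0]
    print(reinforce_loss([math.log(0.5)] * 3, rewards, gamma=0.5))
```

Many implementations also normalize the returns (subtract the mean, divide by the standard deviation) to reduce gradient variance. The README does not say whether the training script does this, so the sketch leaves it out.
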

The rewards across 300 episodes are shown below:
![Rewards across episodes](saves/plot_rewards500.png)
#### 1.2 Evaluation
- <b>Code:</b> [evaluate_reinforce_cartpole.py](evaluate_reinforce_cartpole.py)

The evaluation was run on 100 episodes, with the success threshold set at a score of 400.
The trained policy succeeds in 100% of the evaluation episodes:
![Evaluation success rate](saves/eval_sucess_rate.png)
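
The success rate above can be computed from a list of per-episode scores. This sketch assumes that an episode counts as a success when its total reward is at least 400. Whether evaluate_reinforce_cartpole.py uses `>=` or `>` is an assumption. The function `success_rate` and the example scores are hypothetical.

```python
from typing import Sequence


def success_rate(episode_scores: Sequence[float], threshold: float = 400.0) -> float:
    """Fraction of evaluation episodes whose total reward reaches the threshold."""
    if not episode_scores:
        raise ValueError("need at least one evaluation episode")
    # Assumption: a score equal to the threshold counts as a success.
    successes = sum(1 for score in episode_scores if score >= threshold)
    return successes / len(episode_scores)


if __name__ == "__main__":
    # Hypothetical scores; CartPole-v1 caps an episode at 500 steps (reward +1 per step).
    scores = [500.0] * 98 + [420.0, 350.0]
    print(f"success rate: {success_rate(scores):.0%}")  # success rate: 99%
```
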
### 2. Complete RL pipeline to solve the CartPole environment with A2C
This section sets up a complete pipeline that solves the CartPole environment with the A2C (Advantage Actor-Critic) algorithm.
Weights & Biases (wandb) is used to track the training phase.
![Rollout](saves/rollout.png)
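
A2C trains an actor (the policy) and a critic (a state-value estimate `V(s)`) together. The pure-Python sketch below shows how the pieces fit for one rollout: bootstrapped returns, advantages `A_t = R_t - V(s_t)`, and the combined loss. It rests on several assumptions. The function names, `gamma`, and the weights `value_coef` and `entropy_coef` are hypothetical. If the pipeline relies on a library implementation of A2C, these terms are computed internally.

```python
import math
from typing import List, Tuple


def n_step_returns(
    rewards: List[float], dones: List[bool], last_value: float, gamma: float = 0.99
) -> List[float]:
    """Bootstrapped returns R_t = r_t + gamma * R_{t+1} * (1 - done_t) for one rollout.

    The recursion starts from the critic's estimate V(s_T) of the state reached
    after the last step; a terminal step (done_t = True) cuts the bootstrap.
    """
    returns: List[float] = []
    running = last_value
    for r, done in zip(reversed(rewards), reversed(dones)):
        running = r + gamma * running * (0.0 if done else 1.0)
        returns.append(running)
    returns.reverse()  # restore chronological order
    return returns


def a2c_losses(
    log_probs: List[float],  # log pi(a_t | s_t) from the actor
    values: List[float],  # V(s_t) from the critic
    entropies: List[float],  # policy entropy at each step
    rewards: List[float],
    dones: List[bool],
    last_value: float,
    gamma: float = 0.99,
    value_coef: float = 0.5,  # hypothetical weighting of the critic loss
    entropy_coef: float = 0.01,  # hypothetical weighting of the entropy bonus
) -> Tuple[float, float, float]:
    """Return (policy_loss, value_loss, total_loss), each averaged over the rollout."""
    returns = n_step_returns(rewards, dones, last_value, gamma)
    advantages = [g - v for g, v in zip(returns, values)]
    n = len(rewards)
    # Actor: raise the log-probability of actions with positive advantage.
    policy_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / n
    # Critic: mean squared error between V(s_t) and the bootstrapped return.
    value_loss = sum(adv ** 2 for adv in advantages) / n
    # Entropy bonus: subtracted, so higher entropy lowers the loss and favours exploration.
    entropy = sum(entropies) / n
    total_loss = policy_loss + value_coef * value_loss - entropy_coef * entropy
    return policy_loss, value_loss, total_loss


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]  # hypothetical 3-step rollout
    print(n_step_returns(rewards, [False, False, True], last_value=10.0, gamma=0.5))  # [1.75, 1.5, 1.0]
    print(a2c_losses([math.log(0.5)] * 3, [1.0] * 3, [0.5] * 3, rewards, [False] * 3, last_value=2.0, gamma=0.5))
```

With an autograd framework, the advantage is usually detached in the policy term so that the actor loss does not send gradients into the critic. With plain floats there is nothing to detach.
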