Commit 07497c85 authored by MaximeCerise

add readme

parent 2c4c5de5
# Hands-on Reinforcement Learning
<i>MOREAU Maxime, 3A - Computer science & M2 DS, ECL22</i>
### 1. RL for CartPole-v1
#### 1.1 Training
- <b>Policy Network: </b> A single fully connected hidden layer followed by a softmax output
- <b>Algorithm: </b> Policy gradient (REINFORCE)
- <b>Save:</b> [policy_cartpole.pth](saves/policy_cartpole.pth)
- <b>Code:</b> [reinforce_cartpole.py](reinforce_cartpole.py)
The plot below shows the rewards across 300 episodes:
![Rewards across episodes](saves/plot_rewards500.png)
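
The README does not spell out how the policy gradient update is computed, so the following is only a minimal sketch of the standard REINFORCE quantities, written in plain Python for readability. The function names and the discount factor `gamma=0.99` are assumptions for illustration; the actual logic lives in `reinforce_cartpole.py` and may differ (for example, it may normalize the returns).

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for each step of one episode.

    `gamma` is an assumed value, not taken from the repository.
    """
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, returns):
    """REINFORCE surrogate loss: -sum_t log pi(a_t | s_t) * G_t."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

In a training loop, `log_probs` would be the log-probabilities of the sampled actions under the softmax policy, and minimizing this loss increases the probability of actions that were followed by high returns.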
#### 1.2 Evaluation
- <b>Code:</b> [evaluate_reinforce_cartpole.py](evaluate_reinforce_cartpole.py)
The evaluation was run over 100 episodes, with the success threshold set at a score of 400.
The policy reaches a 100% success rate:
![Evaluation success rate](saves/eval_sucess_rate.png)
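
As a rough illustration of the metric described above, the success rate can be computed from the per-episode scores as sketched below. The helper name `success_rate` is hypothetical, and counting a score equal to the threshold as a success is an assumption; `evaluate_reinforce_cartpole.py` may define it differently.

```python
def success_rate(episode_scores, threshold=400):
    """Return the percentage of episodes whose score reaches the threshold.

    Assumption: a score equal to the threshold counts as a success.
    """
    if not episode_scores:
        raise ValueError("episode_scores must contain at least one episode")
    successes = sum(1 for score in episode_scores if score >= threshold)
    return 100.0 * successes / len(episode_scores)
```

With 100 evaluation episodes, a result of 100% means every episode scored at least 400.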
### 2. Complete RL pipeline to solve the CartPole environment with A2C
Here we set up a complete pipeline to solve the CartPole environment with the A2C algorithm.
Weights & Biases (wandb) is used to track the training phase.
![Rollout metrics](saves/rollout.png)
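
The README does not say which A2C implementation the pipeline relies on, so the sketch below only illustrates the core quantity that distinguishes A2C from plain REINFORCE: the advantage, i.e. a bootstrapped return minus the critic's value estimate. The function name, the `gamma=0.99` default, and the simplification that the rollout contains no episode termination are all assumptions.

```python
def a2c_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Compute bootstrapped returns and advantages for one rollout.

    rewards:         rewards r_0 .. r_{n-1} collected during the rollout
    values:          critic estimates V(s_0) .. V(s_{n-1})
    bootstrap_value: critic estimate V(s_n) for the state after the rollout
    Assumption: no episode ends inside the rollout.
    """
    returns = []
    running = bootstrap_value
    # Backward pass: R_t = r_t + gamma * R_{t+1}, seeded with V(s_n).
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    # Advantage A_t = R_t - V(s_t): how much better the outcome was than expected.
    advantages = [ret - val for ret, val in zip(returns, values)]
    return returns, advantages
```

The actor is then updated with the advantages as weights on the action log-probabilities, while the critic is regressed toward the returns.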