Commit 54877b97 authored by td

clean readme until "Full workflow with panda-gym"

parent da1ca4a3
@@ -2,6 +2,8 @@
Thomas DESGREYS
## REINFORCE algorithm
### Training
see [reinforce_cartpole.py](reinforce_cartpole.py)
The model is trained and saved as "reinforce_cartpole_best.pth". The evolution of the loss and score (aka reward)
over the episodes is shown below.
![cartpole loss](cartpole_loss.png)
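The training script itself is not reproduced in this diff. As a rough illustration of the loss plotted above, here is a minimal sketch of the quantity a REINFORCE-style trainer typically minimizes for one episode. The discount factor `gamma=0.99`, the return normalization, and all function names are assumptions made for illustration, not values taken from `reinforce_cartpole.py`.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def normalize(values, eps=1e-8):
    """Shift to zero mean and scale to unit variance (a common variance-reduction step)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    return [(v - mean) / (std + eps) for v in values]


def reinforce_loss(log_probs, returns):
    """Policy-gradient loss: minus the sum of log pi(a_t | s_t) weighted by G_t."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# Toy episode: three steps with reward 1 each, every action taken with probability 0.5.
rewards = [1.0, 1.0, 1.0]
log_probs = [math.log(0.5)] * 3
loss = reinforce_loss(log_probs, normalize(discounted_returns(rewards)))
```

In a PyTorch version the log-probabilities would be tensors attached to the policy network, so calling `backward()` on the loss would produce the policy gradient.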
@@ -12,11 +14,20 @@ Although, with a bit of luck we end up with a model that reaches the max steps p
### Evaluation
see [evaluate_reinforce_cartpole.py](evaluate_reinforce_cartpole.py)
During evaluation, we get a 100% success rate for 100 trials.
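The README does not say how a "success" is counted. One plausible reading, sketched below, is that a trial succeeds when the episode lasts the maximum number of steps. Both that criterion and the `max_steps=500` default (the CartPole-v1 limit) are assumptions, and `run_episode` is a hypothetical callable standing in for one rollout of the trained policy.

```python
from typing import Callable


def success_rate(
    run_episode: Callable[[], int],
    n_trials: int = 100,
    max_steps: int = 500,
) -> float:
    """Fraction of trials whose episode length reaches max_steps.

    run_episode is a hypothetical callable: it plays one episode with the
    trained policy and returns the number of steps survived.
    """
    successes = sum(1 for _ in range(n_trials) if run_episode() >= max_steps)
    return successes / n_trials


# Stub rollout that always survives the full episode, for demonstration only.
def always_full_episode() -> int:
    return 500


rate = success_rate(always_full_episode, n_trials=100)
print(f"success rate: {rate:.0%}")
```

With a real environment, `run_episode` would reset the environment, step the policy until termination or truncation, and return the step count.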
## Familiarization with a complete RL pipeline: Application to training a robotic arm
We initialize the
## Familiarization with a complete RL pipeline:
Application to training a robotic arm
### Stable-Baselines3
see [a2c_sb3_cartpole.py](a2c_sb3_cartpole.py)
### Hugging Face Hub
[Link to the trained model](https://huggingface.co/Thomstr/A2C_CartPole/tree/main)
https://huggingface.co/Thomstr/A2C_CartPole/tree/main
### Weights & Biases
[Link to the wandb run](https://wandb.ai/thomasdgr-ecole-centrale-de-lyon/cartpole/runs/vh4anh20/workspace?nw=nwuserthomasdgr)
https://wandb.ai/thomasdgr-ecole-centrale-de-lyon/cartpole/runs/vh4anh20/workspace?nw=nwuserthomasdgr
\ No newline at end of file
### Full workflow with panda-gym