diff --git a/README.md b/README.md
index 6a163fc44bebfe2a4ac585664b334f49755c4128..40ab7da556692a6b00a8dd31bf1ff62e391d1eba 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,8 @@ Thomas DESGREYS
 ## REINFORCE algorithm
 ### Training
+see [reinforce_cartpole.py](reinforce_cartpole.py)
+
 The model is trained and as save as "reinforce_cartpole_best.pth" and the evolutions of loss and score (aka reward) through the episodes are shown below.
@@ -12,11 +14,20 @@ Although, with a bit of luck we end up with a model that reaches the max steps p
 ### Evaluation
+see [evaluate_reinforce_cartpole.py](evaluate_reinforce_cartpole.py)
+
 During evaluation, we get a 100% success rate for 100 trials.
-## Familiarization with a complete RL pipeline: Application to training a robotic arm
-We initialize the
+## Familiarization with a complete RL pipeline:
+Application to training a robotic arm
+### Stable-Baselines3
+see [a2c_sb3_cartpole.py](a2c_sb3_cartpole.py)
+
+### Hugging Face Hub
+
+[Link to the trained model](https://huggingface.co/Thomstr/A2C_CartPole/tree/main)
-https://huggingface.co/Thomstr/A2C_CartPole/tree/main
+### Weights & Biases
+[Link to the wandb run](https://wandb.ai/thomasdgr-ecole-centrale-de-lyon/cartpole/runs/vh4anh20/workspace?nw=nwuserthomasdgr)
-https://wandb.ai/thomasdgr-ecole-centrale-de-lyon/cartpole/runs/vh4anh20/workspace?nw=nwuserthomasdgr
\ No newline at end of file
+### Full workflow with panda-gym