From 54877b9755fe695e891fa1371025f09936523299 Mon Sep 17 00:00:00 2001
From: td <thomasdesgreys@gmail.com>
Date: Wed, 5 Mar 2025 16:25:57 +0100
Subject: [PATCH] clean readme until "Full workflow with panda-gym"

---
 README.md | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 6a163fc..40ab7da 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,8 @@
 Thomas DESGREYS
 ## REINFORCE algorithm
 ### Training
+See [reinforce_cartpole.py](reinforce_cartpole.py).
+
 The model is trained and as save as "reinforce_cartpole_best.pth" and the evolutions of loss and score (aka reward)
 through the episodes are shown below.
 ![cartpole loss](cartpole_loss.png)
@@ -12,11 +14,20 @@ Although, with a bit of luck we end up with a model that reaches the max steps p
 
 
 ### Evaluation
+See [evaluate_reinforce_cartpole.py](evaluate_reinforce_cartpole.py).
+
 During evaluation, we get a 100% success rate for 100 trials.
 
-## Familiarization with a complete RL pipeline: Application to training a robotic arm
-We initialize the 
+## Familiarization with a complete RL pipeline:
+Application to training a robotic arm
+### Stable-Baselines3
+See [a2c_sb3_cartpole.py](a2c_sb3_cartpole.py).
+
+### Hugging Face Hub
+
+[Link to the trained model](https://huggingface.co/Thomstr/A2C_CartPole/tree/main)
 
-https://huggingface.co/Thomstr/A2C_CartPole/tree/main
+### Weights & Biases
+[Link to the wandb run](https://wandb.ai/thomasdgr-ecole-centrale-de-lyon/cartpole/runs/vh4anh20/workspace?nw=nwuserthomasdgr)
 
-https://wandb.ai/thomasdgr-ecole-centrale-de-lyon/cartpole/runs/vh4anh20/workspace?nw=nwuserthomasdgr
\ No newline at end of file
+### Full workflow with panda-gym
-- 
GitLab