From c1bb033e13f066a1e1afc663b5b3cce97877654e Mon Sep 17 00:00:00 2001
From: Benyahia Mohammed Oussama <mohammed.benyahia@etu.ec-lyon.fr>
Date: Thu, 6 Mar 2025 13:04:40 +0000
Subject: [PATCH] README.md: center figures, captions, and videos

---
 README.md | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/README.md b/README.md
index 629055d..a31264a 100644
--- a/README.md
+++ b/README.md
@@ -13,8 +13,8 @@ This repository contains my individual work for the **Hands-On Reinforcement Lea
 ### Training Results  
 - The model was trained for **500 episodes**, showing a steady increase in total rewards. The goal (total reward = 500) was reached consistently after **400 episodes**, confirming successful learning.
 - **Training Plot:**  
-  ![Training Plot](/images/train_rewards.png)  
-  *(Figure: Total rewards increase per episode, indicating successful learning.)*
+  <p align="center"> ![Training Plot](/images/train_rewards.png)  
+  *(Figure: Total rewards increase per episode, indicating successful learning.)* </p>
 
 ### Model Saving  
 - The trained model is saved as: `reinforce_cartpole.pth`.
@@ -25,11 +25,11 @@ This repository contains my individual work for the **Hands-On Reinforcement Lea
 - **Evaluation Results:**  
   - **100%** of the episodes reached a total reward of **500**, demonstrating the model’s reliability.
 - **Evaluation Plot:**  
-  ![Evaluation Plot](/images/eval_rewards.png)  
-  *(Figure: The model consistently reaches a total reward of 500 over 100 evaluation episodes.)*
+  <p align="center"> ![Evaluation Plot](/images/eval_rewards.png)  
+  *(Figure: The model consistently reaches a total reward of 500 over 100 evaluation episodes.)*</p>
 
 - **Example Video:**  
-  ![REINFORCE CartPole Evaluation Video](reinforce_cartpole.mp4)  
+  <p align="center"> ![REINFORCE CartPole Evaluation Video](reinforce_cartpole.mp4) </p> 
 
 ---
 
@@ -42,14 +42,14 @@ This repository contains my individual work for the **Hands-On Reinforcement Lea
 ### Training Results  
 - The model was trained for **500,000 timesteps**, reaching a total reward of **500** consistently after **400 episodes**. It continued training for **1,400 episodes**, confirming stable convergence similar to the REINFORCE approach.
 - **Training Plot:**  
-  ![SB3 CartPole Training Plot](/images/sb3_train.png)  
-  *(Figure: A2C training performance over time.)*
+  <p align="center"> ![SB3 CartPole Training Plot](/images/sb3_train.png)  
+  *(Figure: A2C training performance over time.)* </p>
 
 ### Evaluation  
 - The trained model was evaluated, achieving **100% success**, with all episodes reaching a total reward of **500**.
 - **Evaluation Plot:**  
-  ![SB3 CartPole Evaluation Plot](/images/sb3_eval.png)  
-  *(Figure: A2C model consistently achieves perfect performance over 100 episodes.)*
+  <p align="center"> ![SB3 CartPole Evaluation Plot](/images/sb3_eval.png)  
+  *(Figure: A2C model consistently achieves perfect performance over 100 episodes.)* </p>
 
 ### Model Upload  
 - The trained A2C model is available on Hugging Face Hub:  
@@ -70,7 +70,7 @@ This repository contains my individual work for the **Hands-On Reinforcement Lea
   - The training curve indicates that the **A2C model stabilizes after 1,300 episodes**.  
   - The model exhibits strong and consistent performance.
 - **Training Plot:**  
-  ![W&B Training Plot](/images/sb3_wb_train.png)  
+  <p align="center"> ![W&B Training Plot](/images/sb3_wb_train.png) </p> 
 
 ### Model Upload  
 - The trained A2C model (tracked with W&B) is available on Hugging Face Hub:  
@@ -80,12 +80,12 @@ This repository contains my individual work for the **Hands-On Reinforcement Lea
 - **Evaluation Results:**  
   - **100%** of episodes reached a total reward of **500**, confirming the model’s reliability.
 - **Evaluation Plot:**  
-  ![W&B Evaluation Plot](/images/sb3_wb_eval.png)  
-  *(Figure: Evaluation results tracked using W&B.)*
+  <p align="center">![W&B Evaluation Plot](/images/sb3_wb_eval.png)  
+  *(Figure: Evaluation results tracked using W&B.)* </p>
 - **Example Video:**  
-  ![W&B Evaluation Video](a2c_sb3_cartpole.mp4)  
+  <p align="center">![W&B Evaluation Video](a2c_sb3_cartpole.mp4)  
   The A2C model stabilizes the balancing process more efficiently due to its superior performance compared to the REINFORCE approach.
-
+  </p>
 ---
 
 ## 4. Full Workflow with Panda-Gym
@@ -104,8 +104,8 @@ This repository contains my individual work for the **Hands-On Reinforcement Lea
   - The model successfully learns to reach the target efficiently.  
   - It stabilizes after **2,500 episodes**, with minor fluctuations in rewards.
 - **Training Plot:**  
-  ![Training Total Rewards Plot](/images/panda_sb3_train.png)  
-  *(Figure: The robotic arm’s learning progress over 500,000 timesteps.)*
+  <p align="center"> ![Training Total Rewards Plot](/images/panda_sb3_train.png)  
+  *(Figure: The robotic arm’s learning progress over 500,000 timesteps.)*</p>
 
 ### Model Upload and Evaluation  
 - The trained model is available on Hugging Face Hub:  
@@ -116,14 +116,14 @@ This repository contains my individual work for the **Hands-On Reinforcement Lea
   - The total reward across all episodes ranged between **0 and -1**, indicating stable control.  
   - **100% of episodes** met the success criteria.
 
-![Result](/images/panda_sb3_eval1.png){:width="20%"}
+<p align="center"> ![Result](/images/panda_sb3_eval1.png){:width="20%"}</p>
 
   
 - **Evaluation Plot:**  
-  ![Evaluation Plot](/images/panda_sb3_eval.png)  
-  *(Figure: The robotic arm’s performance in the PandaReachJointsDense-v3 environment.)*
+  <p align="center"> ![Evaluation Plot](/images/panda_sb3_eval.png)  
+  *(Figure: The robotic arm’s performance in the PandaReachJointsDense-v3 environment.)*</p>
 - **Example Video:**  
-  ![Panda-Gym Evaluation Video](a2c_sb3_panda_reach.mp4)  
+  <p align="center"> ![Panda-Gym Evaluation Video](a2c_sb3_panda_reach.mp4) </p> 
 
 ---
 