From c1bb033e13f066a1e1afc663b5b3cce97877654e Mon Sep 17 00:00:00 2001
From: Benyahia Mohammed Oussama <mohammed.benyahia@etu.ec-lyon.fr>
Date: Thu, 6 Mar 2025 13:04:40 +0000
Subject: [PATCH] Edit README.md

---
 README.md | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/README.md b/README.md
index 629055d..a31264a 100644
--- a/README.md
+++ b/README.md
@@ -13,8 +13,8 @@ This repository contains my individual work for the **Hands-On Reinforcement Lea
 ### Training Results
 - The model was trained for **500 episodes**, showing a steady increase in total rewards. The goal (total reward = 500) was reached consistently after **400 episodes**, confirming successful learning.
 - **Training Plot:**
-  [training plot]
-  *(Figure: Total rewards increase per episode, indicating successful learning.)*
+  <p align="center"> [training plot]
+  *(Figure: Total rewards increase per episode, indicating successful learning.)* </p>
 
 ### Model Saving
 - The trained model is saved as: `reinforce_cartpole.pth`.
@@ -25,11 +25,11 @@ This repository contains my individual work for the **Hands-On Reinforcement Lea
 - **Evaluation Results:**
   - **100%** of the episodes reached a total reward of **500**, demonstrating the model’s reliability.
 - **Evaluation Plot:**
-  [evaluation plot]
-  *(Figure: The model consistently reaches a total reward of 500 over 100 evaluation episodes.)*
+  <p align="center"> [evaluation plot]
+  *(Figure: The model consistently reaches a total reward of 500 over 100 evaluation episodes.)* </p>
 
 - **Example Video:**
-  [example video]
+  <p align="center"> [example video] </p>
 
 ---
@@ -42,14 +42,14 @@ This repository contains my individual work for the **Hands-On Reinforcement Lea
 ### Training Results
 - The model was trained for **500,000 timesteps**, reaching a total reward of **500** consistently after **400 episodes**. It continued training for **1,400 episodes**, confirming stable convergence similar to the REINFORCE approach.
 - **Training Plot:**
-  [training plot]
-  *(Figure: A2C training performance over time.)*
+  <p align="center"> [training plot]
+  *(Figure: A2C training performance over time.)* </p>
 
 ### Evaluation
 - The trained model was evaluated, achieving **100% success**, with all episodes reaching a total reward of **500**.
 - **Evaluation Plot:**
-  [evaluation plot]
-  *(Figure: A2C model consistently achieves perfect performance over 100 episodes.)*
+  <p align="center"> [evaluation plot]
+  *(Figure: A2C model consistently achieves perfect performance over 100 episodes.)* </p>
 
 ### Model Upload
 - The trained A2C model is available on Hugging Face Hub:
@@ -70,7 +70,7 @@ This repository contains my individual work for the **Hands-On Reinforcement Lea
 - The training curve indicates that the **A2C model stabilizes after 1,300 episodes**.
 - The model exhibits strong and consistent performance.
 - **Training Plot:**
-  [training plot]
+  <p align="center"> [training plot] </p>
 
 ### Model Upload
 - The trained A2C model (tracked with W&B) is available on Hugging Face Hub:
@@ -80,12 +80,12 @@ This repository contains my individual work for the **Hands-On Reinforcement Lea
 - **Evaluation Results:**
   - **100%** of episodes reached a total reward of **500**, confirming the model’s reliability.
 - **Evaluation Plot:**
-  [evaluation plot]
-  *(Figure: Evaluation results tracked using W&B.)*
+  <p align="center"> [evaluation plot]
+  *(Figure: Evaluation results tracked using W&B.)* </p>
 
 - **Example Video:**
-  [example video]
+  <p align="center"> [example video]
 The A2C model stabilizes the balancing process more efficiently due to its superior performance compared to the REINFORCE approach.
-
+  </p>
 
 ---
 
 ## 4. Full Workflow with Panda-Gym
@@ -104,8 +104,8 @@ This repository contains my individual work for the **Hands-On Reinforcement Lea
 - The model successfully learns to reach the target efficiently.
 - It stabilizes after **2,500 episodes**, with minor fluctuations in rewards.
 - **Training Plot:**
-  [training plot]
-  *(Figure: The robotic arm’s learning progress over 500,000 timesteps.)*
+  <p align="center"> [training plot]
+  *(Figure: The robotic arm’s learning progress over 500,000 timesteps.)* </p>
 
 ### Model Upload and Evaluation
 - The trained model is available on Hugging Face Hub:
@@ -116,14 +116,14 @@ This repository contains my individual work for the **Hands-On Reinforcement Lea
   - The total reward across all episodes ranged between **-1 and 0**, indicating stable control.
   - **100% of episodes** met the success criteria.
 
-[image]{:width="20%"}
+<p align="center"> [image]{:width="20%"}</p>
 
 - **Evaluation Plot:**
-  [evaluation plot]
-  *(Figure: The robotic arm’s performance in the PandaReachJointsDense-v3 environment.)*
+  <p align="center"> [evaluation plot]
+  *(Figure: The robotic arm’s performance in the PandaReachJointsDense-v3 environment.)* </p>
 
 - **Example Video:**
-  [example video]
+  <p align="center"> [example video] </p>
 
 ---
--
GitLab
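
The evaluation protocol this patch documents (100 episodes, success defined as a total reward of 500) can be reproduced with a short script. The following is a minimal sketch, assuming the A2C model was trained with Stable-Baselines3 and uploaded to the Hugging Face Hub; the `repo_id` and `filename` values are hypothetical placeholders, not the actual artifacts linked in the README.

```python
import gymnasium as gym
from huggingface_sb3 import load_from_hub
from stable_baselines3 import A2C

# Download the checkpoint from the Hub (hypothetical repo id and filename,
# shown for illustration only -- substitute the actual Hub artifacts).
checkpoint = load_from_hub(
    repo_id="<username>/a2c-cartpole",
    filename="a2c_cartpole.zip",
)
model = A2C.load(checkpoint)

env = gym.make("CartPole-v1")
n_episodes, successes = 100, 0
for _ in range(n_episodes):
    obs, _ = env.reset()
    done, total_reward = False, 0.0
    while not done:
        # Greedy action from the trained policy
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        done = terminated or truncated
    # CartPole-v1 truncates episodes at 500 steps, so 500 is the maximum return.
    successes += int(total_reward >= 500)

print(f"{100 * successes / n_episodes:.0f}% of episodes reached a total reward of 500")
```

Using `deterministic=True` makes the policy greedy at evaluation time, which is the usual choice when reporting a success rate rather than sampling from the action distribution as during training.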