Commit c1bb033e authored by Benyahia Mohammed Oussama

Edit README.md

parent cc8cb174
### Training Results
- The model was trained for **500 episodes**, showing a steady increase in total rewards. The goal (total reward = 500) was reached consistently after **400 episodes**, confirming successful learning.
- **Training Plot:**
<p align="center">
  <img src="/images/train_rewards.png" alt="Training Plot"><br>
  <em>(Figure: Total rewards increase per episode, indicating successful learning.)</em>
</p>
### Model Saving
- The trained model is saved as: `reinforce_cartpole.pth`.
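
For reference, the block below is a minimal sketch of how a REINFORCE agent of this kind can be trained on CartPole-v1 and saved to `reinforce_cartpole.pth`. The network architecture, learning rate, and discount factor are illustrative assumptions, not necessarily the values used in this repository.

```python
# Minimal REINFORCE sketch for CartPole-v1 (illustrative hyperparameters).
import gymnasium as gym
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Small MLP mapping a CartPole observation (4 floats) to action probabilities."""
    def __init__(self, obs_dim=4, n_actions=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions), nn.Softmax(dim=-1),
        )

    def forward(self, x):
        return self.net(x)

env = gym.make("CartPole-v1")
policy = PolicyNet()
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns (normalized for stability), then the policy-gradient loss.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    loss = -(torch.stack(log_probs) * returns).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

torch.save(policy.state_dict(), "reinforce_cartpole.pth")
```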
### Evaluation
- **Evaluation Results:**
- **100%** of the episodes reached a total reward of **500**, demonstrating the model’s reliability.
- **Evaluation Plot:**
<p align="center">
  <img src="/images/eval_rewards.png" alt="Evaluation Plot"><br>
  <em>(Figure: The model consistently reaches a total reward of 500 over 100 evaluation episodes.)</em>
</p>
- **Example Video:**
![REINFORCE CartPole Evaluation Video](reinforce_cartpole.mp4)
---
### Training Results
- The model was trained for **500,000 timesteps**, consistently reaching a total reward of **500** after **400 episodes**. Training continued through **1,400 episodes** in total, confirming stable convergence similar to the REINFORCE approach.
- **Training Plot:**
<p align="center">
  <img src="/images/sb3_train.png" alt="SB3 CartPole Training Plot"><br>
  <em>(Figure: A2C training performance over time.)</em>
</p>
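
A minimal sketch of how such an A2C agent can be trained for 500,000 timesteps with Stable-Baselines3; the save filename is an assumption:

```python
# Minimal A2C training sketch with Stable-Baselines3.
import gymnasium as gym
from stable_baselines3 import A2C

env = gym.make("CartPole-v1")
model = A2C("MlpPolicy", env, verbose=1)  # default A2C hyperparameters
model.learn(total_timesteps=500_000)      # 500k timesteps, as reported above
model.save("a2c_sb3_cartpole")            # writes a2c_sb3_cartpole.zip (assumed filename)
```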
### Evaluation
- The trained model was evaluated, achieving **100% success**, with all episodes reaching a total reward of **500**.
- **Evaluation Plot:**
<p align="center">
  <img src="/images/sb3_eval.png" alt="SB3 CartPole Evaluation Plot"><br>
  <em>(Figure: The A2C model consistently achieves perfect performance over 100 evaluation episodes.)</em>
</p>
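
The evaluation can be reproduced with SB3's built-in `evaluate_policy` helper; a short sketch, assuming the model file from the training sketch above:

```python
# Evaluation sketch over 100 deterministic episodes.
import gymnasium as gym
from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy

model = A2C.load("a2c_sb3_cartpole")
eval_env = gym.make("CartPole-v1")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=100, deterministic=True)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```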
### Model Upload
- The trained A2C model is available on Hugging Face Hub:
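
One way such an upload can be done is with the `huggingface_hub` client; in the sketch below the repository id and filename are placeholders, not the actual repository used in this project:

```python
# Sketch of uploading the saved SB3 model to the Hugging Face Hub.
from huggingface_hub import HfApi

api = HfApi()  # assumes `huggingface-cli login` has already been run
api.create_repo(repo_id="<username>/a2c-cartpole-v1", exist_ok=True)  # placeholder repo id
api.upload_file(
    path_or_fileobj="a2c_sb3_cartpole.zip",
    path_in_repo="a2c_sb3_cartpole.zip",
    repo_id="<username>/a2c-cartpole-v1",
)
```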
- The training curve indicates that the **A2C model stabilizes after 1,300 episodes**.
- The model exhibits strong and consistent performance.
- **Training Plot:**
<p align="center">
  <img src="/images/sb3_wb_train.png" alt="W&B Training Plot">
</p>
### Model Upload
- The trained A2C model (tracked with W&B) is available on Hugging Face Hub:
### Evaluation
- **Evaluation Results:**
- **100%** of episodes reached a total reward of **500**, confirming the model’s reliability.
- **Evaluation Plot:**
<p align="center">
  <img src="/images/sb3_wb_eval.png" alt="W&B Evaluation Plot"><br>
  <em>(Figure: Evaluation results tracked using W&B.)</em>
</p>
- **Example Video:**
![W&B Evaluation Video](a2c_sb3_cartpole.mp4)

The A2C model stabilizes the balancing process more efficiently than the REINFORCE agent, reflecting its stronger overall performance.
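
For reference, an evaluation episode can be recorded to an `.mp4` with gymnasium's `RecordVideo` wrapper; the sketch below assumes the model file from the earlier sketches, and the video in this repository may have been produced differently:

```python
# Sketch of recording an evaluation episode to video.
import gymnasium as gym
from gymnasium.wrappers import RecordVideo
from stable_baselines3 import A2C

model = A2C.load("a2c_sb3_cartpole")
env = RecordVideo(
    gym.make("CartPole-v1", render_mode="rgb_array"),
    video_folder="videos",
    episode_trigger=lambda episode_id: True,  # record every episode
)

obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(int(action))
    done = terminated or truncated
env.close()  # finalizes and writes the video file
```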
---
## 4. Full Workflow with Panda-Gym
### Training Results
- The model successfully learns to reach the target efficiently.
- It stabilizes after **2,500 episodes**, with minor fluctuations in rewards.
- **Training Plot:**
<p align="center">
  <img src="/images/panda_sb3_train.png" alt="Training Total Rewards Plot"><br>
  <em>(Figure: The robotic arm’s learning progress over 500,000 timesteps.)</em>
</p>
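
A minimal training sketch for this environment with SB3's A2C; panda-gym's dictionary observations require the `MultiInputPolicy`, and the save filename is an assumption:

```python
# Sketch of training A2C on PandaReachJointsDense-v3 for 500k timesteps.
import gymnasium as gym
import panda_gym  # noqa: F401 -- importing registers the Panda environments
from stable_baselines3 import A2C

env = gym.make("PandaReachJointsDense-v3")
model = A2C("MultiInputPolicy", env, verbose=1)  # dict observations -> MultiInputPolicy
model.learn(total_timesteps=500_000)
model.save("a2c_sb3_panda_reach")  # assumed filename
```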
### Model Upload and Evaluation
- The trained model is available on Hugging Face Hub:
- The total reward across all episodes ranged between **-1 and 0**, indicating stable control.
- **100% of episodes** met the success criteria.
<p align="center">
  <img src="/images/panda_sb3_eval1.png" alt="Result" width="20%">
</p>
- **Evaluation Plot:**
<p align="center">
  <img src="/images/panda_sb3_eval.png" alt="Evaluation Plot"><br>
  <em>(Figure: The robotic arm’s performance in the PandaReachJointsDense-v3 environment.)</em>
</p>
- **Example Video:**
![Panda-Gym Evaluation Video](a2c_sb3_panda_reach.mp4)
---