@@ -13,8 +13,8 @@ This repository contains my individual work for the **Hands-On Reinforcement Lea
### Training Results
- The model was trained for **500 episodes**, showing a steady increase in total rewards. The goal (total reward = 500) was reached consistently after **400 episodes**, confirming successful learning.
-**Training Plot:**

*(Figure: Total rewards increase per episode, indicating successful learning.)*
@@ -42,14 +42,14 @@ This repository contains my individual work for the **Hands-On Reinforcement Lea
### Training Results
- The model was trained for **500,000 timesteps** and reached a total reward of **500** consistently after **400 episodes**. Training continued for **1,400 episodes**, confirming stable convergence similar to the REINFORCE approach.
-**Training Plot:**

<p align="center">
*(Figure: A2C training performance over time.)*
</p>
### Evaluation
- The trained model was evaluated, achieving **100% success**, with all episodes reaching a total reward of **500**.
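
As a rough illustration of how a success rate like the one above could be computed, here is a minimal sketch. It assumes an episode counts as a success when its total reward reaches 500. The function name `success_rate` and the sample returns are hypothetical and are not taken from this repository's evaluation script.

```python
def success_rate(episode_returns, target=500.0):
    """Return the fraction of episodes whose total reward reaches `target`."""
    if not episode_returns:
        raise ValueError("need at least one episode return")
    # Count episodes that hit the target reward (assumed success criterion).
    hits = sum(1 for total in episode_returns if total >= target)
    return hits / len(episode_returns)


# Hypothetical evaluation returns, for illustration only (not real results).
example_returns = [500.0, 500.0, 500.0, 500.0]
print(f"Success rate: {success_rate(example_returns):.0%}")
```

With per-episode returns collected during evaluation, the same function gives the share of episodes that reached the goal.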