From de18e83b8f74b7e66ee48740bca41d3a555783c8 Mon Sep 17 00:00:00 2001
From: Majdi Karim <karim.majdi@etu.ec-lyon.fr>
Date: Tue, 19 Mar 2024 23:51:28 +0000
Subject: [PATCH] Update README.md

---
 README.md | 19 +------------------
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/README.md b/README.md
index 8e265d2..8dd7015 100644
--- a/README.md
+++ b/README.md
@@ -14,24 +14,7 @@ gym : module to provide (and charge) the environment. In this case, CartPole-v1
 torch : module used to create the neural network, which is used as the policy to select actions in the CartPole environment. It uses the REINFORCE algorithm to update the policy parameters. It provides a seamless integration with CUDA, which has enabled the execution of GPU-accelerated computations. It is a very extensive machine learning framework, was originally developed by Meta AI and now part of the Linux Foundation umbrella.
 
-While running I got an unexpected error while playing an episode :
-
-/usr/local/lib/python3.10/dist-packages/gymnasium/envs/classic_control/cartpole.py in render(self)
-    279 gfxdraw.filled_polygon(self.surf, pole_coords, (202, 152, 101))
-    280
---> 281 gfxdraw.aacircle(
-    282 self.surf,
-    283 int(cartx),
-
-OverflowError: signed short integer is less than minimum,
-
-The error pops-up randomly during the episode inside the terminated loop in fact, after sampling the action from the calculated probability distribution. I insert the action into the environement to make a step :
-next_observation, reward, done, a, b = env.step(action.item())
-
-but then I get the OverflowError which indicate the action.item() which is the action (either 0 or 1) is wrong.
-
-
-
+The file LOSS.png shows how the policy loss evolves throughout the iterations. The loss oscillates considerably during the optimization process, showing noisy and rapid variations; towards the end of the run it decreases significantly, since its peaks at that point are the lowest of the whole process.
 
 # Advantage Actor-Critic (A2C) algorithm
-- 
GitLab
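
The README lines touched by this patch describe the REINFORCE update and the `env.step(action.item())` call only in prose, so here is a minimal sketch of what such a training loop typically looks like with gymnasium and torch. It is an illustration under stated assumptions, not the repository's actual code: the network shape, the hyperparameters (`gamma`, the learning rate), and the episode count are all placeholders.

```python
# Minimal REINFORCE sketch for CartPole-v1 (illustrative, not the repo's code).
# Assumed: a 2-layer policy network, gamma=0.99, Adam with lr=1e-2.
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

for episode in range(500):
    obs, info = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        # Sample an action (0 or 1) from the policy's probability distribution.
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        # Gymnasium's step() returns a 5-tuple; the episode ends when
        # either terminated or truncated is True.
        obs, reward, terminated, truncated, info = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns G_t, accumulated backwards over the episode.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # variance reduction

    # REINFORCE policy-gradient loss: -sum_t log pi(a_t|s_t) * G_t
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```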
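Two hedged observations on the error report that this patch removes. First, gymnasium's `step()` returns `(observation, reward, terminated, truncated, info)`, so looping only on the third value (`done` in the removed snippet) silently ignores truncation. Second, the quoted `OverflowError` is raised inside pygame's `gfxdraw`, whose drawing primitives take 16-bit signed coordinates; it is therefore usually a symptom of the cart's x position having drifted far off-screen (for instance when stepping continues past the end of the episode), rather than of an invalid action value, since a sampled CartPole action can only be 0 or 1.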