diff --git a/README.md b/README.md
index 8e265d29f71515e778eb3870da551f3ad9625901..8dd70150f2f29f06e25a72f57e1fed30b0d7dbf7 100644
--- a/README.md
+++ b/README.md
@@ -14,24 +14,7 @@
 gym : module to provide (and charge) the environment. In this case, CartPole-v1
 
 torch : module used to create the neural network, which is used as the policy to select actions in the CartPole environment. It uses the REINFORCE algorithm to update the policy parameters. It provides a seamless integration with CUDA, which has enabled the execution of GPU-accelerated computations. It is a very extensive machine learning framework, was originally developed by Meta AI and now part of the Linux Foundation umbrella.
 
-While running I got an unexpected error while playing an episode :
-
-/usr/local/lib/python3.10/dist-packages/gymnasium/envs/classic_control/cartpole.py in render(self)
-    279             gfxdraw.filled_polygon(self.surf, pole_coords, (202, 152, 101))
-    280
---> 281         gfxdraw.aacircle(
-    282             self.surf,
-    283             int(cartx),
-
-OverflowError: signed short integer is less than minimum,
-
-The error pops-up randomly during the episode inside the terminated loop in fact, after sampling the action from the calculated probability distribution. I insert the action into the environement to make a step :
-next_observation, reward, done, a, b = env.step(action.item())
-
-but then I get the OverflowError which indicate the action.item() which is the action (either 0 or 1) is wrong.
-
-
-
+The file LOSS.png shows how the policy loss evolves over the training iterations. The loss oscillates considerably throughout the optimization process, with noisy and rapid variations; towards the end of the run it appears to decrease significantly, since its peaks at that point are the lowest of the whole curve.
 
 # Advantage Actor-Critic (A2C) algorithm
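For readers wondering what the plotted "policy loss" is, the sketch below illustrates how a REINFORCE policy loss is commonly computed in PyTorch. It is a minimal, hypothetical example (the function `reinforce_loss` and its arguments are not taken from this repository); the high variance of the Monte-Carlo returns it relies on is one common reason such a loss curve looks noisy.

```python
import torch

# Hypothetical sketch, not this repository's code: the REINFORCE policy loss is
# the negative sum of the log-probabilities of the sampled actions, each weighted
# by the discounted return observed after that action.
def reinforce_loss(log_probs, rewards, gamma=0.99):
    """log_probs: list of log pi(a_t | s_t) tensors collected during one episode.
    rewards: list of float rewards from the same episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):      # discounted return G_t, computed backwards
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    # Normalising the returns reduces the variance that makes the loss curve noisy.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    # Minimising this loss performs gradient ascent on the expected return.
    return -(torch.stack(log_probs) * returns).sum()
```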