Commit de18e83b authored by Majdi Karim

Update README.md

parent 52bd513e
@@ -14,24 +14,7 @@ gym : module to provide (and load) the environment. In this case, CartPole-v1
torch : module used to create the neural network that serves as the policy for selecting actions in the CartPole environment. The policy parameters are updated with the REINFORCE algorithm. torch integrates with CUDA, which enables GPU-accelerated computation. It is an extensive machine learning framework, originally developed by Meta AI and now part of the Linux Foundation umbrella.
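The REINFORCE update mentioned above can be sketched in plain Python, without torch, to show what the policy loss is built from. This is only an illustration under assumptions: the function names `discounted_returns` and `reinforce_loss` and the discount factor `gamma` are hypothetical and do not come from this repository.

```python
import math

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode.

    `gamma` is an assumed discount factor, not a value taken from the README.
    """
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

def reinforce_loss(log_probs, returns):
    """REINFORCE objective as a loss: minus the sum of log pi(a_t | s_t) * G_t."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))

# Example: three steps with reward 1 and a policy that picked each action with p = 0.5.
example_returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
example_loss = reinforce_loss([math.log(0.5)] * 3, example_returns)
```

In a torch implementation the same quantity would be computed on tensors so that gradients flow back into the policy network; the arithmetic is unchanged.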
- While running, I got an unexpected error while playing an episode:
- /usr/local/lib/python3.10/dist-packages/gymnasium/envs/classic_control/cartpole.py in render(self)
- 279 gfxdraw.filled_polygon(self.surf, pole_coords, (202, 152, 101))
- 280
- --> 281 gfxdraw.aacircle(
- 282 self.surf,
- 283 int(cartx),
- OverflowError: signed short integer is less than minimum
- The error pops up at random during an episode, inside the loop that runs until termination, right after the action is sampled from the computed probability distribution. I pass the action to the environment to take a step:
- next_observation, reward, done, truncated, info = env.step(action.item())
- but then I get the OverflowError, which seems to indicate that action.item(), the action (either 0 or 1), is wrong.
+ The file LOSS.pnj shows how the policy loss evolves over the iterations. The loss oscillates considerably during optimization, with noisy and rapid variations. Toward the end of the process it appears to decrease significantly: its peaks at that point are lower than all the earlier ones.
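The OverflowError traceback ends in `render()`, at the `int(cartx)` argument of `gfxdraw.aacircle`, so one possible reading is that the value out of range is the cart's pixel coordinate rather than the action. The sketch below checks when that coordinate stops fitting in a signed short. It assumes the renderer maps the cart position to pixels as `x * scale + screen_width / 2` with a 600-pixel screen and a position threshold of 2.4; these constants and the helper names are assumptions for illustration, not values confirmed by the log above.

```python
# Assumed rendering constants (believed to match CartPole's defaults; treat as assumptions).
SCREEN_WIDTH = 600
X_THRESHOLD = 2.4
SCALE = SCREEN_WIDTH / (2 * X_THRESHOLD)  # pixels per unit of cart position

# Range of a signed 16-bit integer, which the error message refers to.
SHORT_MIN, SHORT_MAX = -32768, 32767

def cart_pixel_x(x):
    """Hypothetical helper: map a cart position to a horizontal pixel coordinate."""
    return int(x * SCALE + SCREEN_WIDTH / 2.0)

def fits_signed_short(value):
    """True if the value can be passed as a signed short without overflowing."""
    return SHORT_MIN <= value <= SHORT_MAX

# A position inside the threshold is safe; a position far outside it is not.
inside_ok = fits_signed_short(cart_pixel_x(2.4))
far_left_ok = fits_signed_short(cart_pixel_x(-300.0))
```

Under these assumptions the cart would have to be hundreds of units off screen before the coordinate overflows. One situation that could produce such a position is calling `env.step` again after the episode has ended without calling `env.reset`; this is a guess that the traceback alone does not confirm.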
# Advantage Actor-Critic (A2C) algorithm