TD 1 : Hands-On Reinforcement Learning
MSO 3.4 Apprentissage Automatique
Due to local installation issues, the code was run in the cloud, on the Google Colab platform.
REINFORCE algorithm
To implement the REINFORCE algorithm on the CartPole environment, two important Python modules are used to create instances of two different elements:
gym: module used to provide (and load) the environment, in this case CartPole-v1 (described in more detail in the following exercise). Gym is a standard API for reinforcement learning with a diverse collection of reference environments, developed by OpenAI.
torch: module used to build the neural network that serves as the policy for selecting actions in the CartPole environment; its parameters are updated with the REINFORCE algorithm. PyTorch integrates seamlessly with CUDA, which enabled GPU-accelerated computations. It is a very extensive machine learning framework, originally developed by Meta AI and now part of the Linux Foundation umbrella.
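For reference, here is a minimal sketch of what this setup can look like. It is not the exact notebook code: the network architecture, learning rate, discount factor, and episode count are illustrative assumptions (the traceback below shows that gymnasium is the backend actually installed on Colab).

```python
# Minimal REINFORCE sketch on CartPole-v1 (hyperparameters are assumptions).
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")

# Small MLP policy: maps the 4-dimensional state to probabilities over 2 actions.
policy = nn.Sequential(
    nn.Linear(4, 128),
    nn.ReLU(),
    nn.Linear(128, 2),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)
gamma = 0.99  # discount factor (assumed value)

for episode in range(500):
    observation, info = env.reset()
    log_probs, rewards = [], []
    terminated = truncated = False
    while not (terminated or truncated):
        probs = policy(torch.as_tensor(observation, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        observation, reward, terminated, truncated, info = env.step(action.item())
        rewards.append(reward)

    # Discounted returns, accumulated backwards over the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalize for stability

    # REINFORCE loss: negative log-probability of each action, weighted by its return.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```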
While running, I got an unexpected error while playing an episode:
```
/usr/local/lib/python3.10/dist-packages/gymnasium/envs/classic_control/cartpole.py in render(self)
    279         gfxdraw.filled_polygon(self.surf, pole_coords, (202, 152, 101))
    280
--> 281         gfxdraw.aacircle(
    282             self.surf,
    283             int(cartx),

OverflowError: signed short integer is less than minimum
```
The error pops up at random points during an episode, inside the loop that runs until termination, right after sampling an action from the computed probability distribution. I pass the action to the environment to take a step:

```python
next_observation, reward, terminated, truncated, info = env.step(action.item())
```

But then I get the OverflowError. Since the sampled action (either 0 or 1) is always valid, the problem is not the action itself: the traceback points at the rendering code, where `int(cartx)` overflows the signed short that pygame's gfxdraw expects. This typically means the cart's x position has drifted far off-screen, for example when the environment keeps being stepped after the episode has already ended.
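A hedged sketch of an episode loop that avoids this failure mode, assuming a hypothetical `sample_action` helper that wraps the policy: stopping on both termination signals prevents stepping a finished episode, which is the usual cause of the state diverging.

```python
# sample_action is a hypothetical helper that samples 0 or 1 from the policy.
observation, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action = sample_action(observation)
    # Stop on *both* signals: terminated (pole fell / cart out of bounds)
    # and truncated (time limit reached).
    observation, reward, terminated, truncated, info = env.step(action)
env.close()
```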
Advantage Actor-Critic (A2C) algorithm
In order to explore the A2C algorithm, the Stable-Baselines3 module is used. It provides reliable implementations of state-of-the-art reinforcement learning (RL) algorithms, including DQN, A2C, PPO, and others. It also has built-in support for vectorized (parallel) environments and GPU acceleration through CUDA, which helps speed up the learning process.
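As a reference, a minimal training sketch with Stable-Baselines3 can look like the following; the timestep budget and save path are illustrative assumptions, not the exact values used.

```python
# Minimal A2C training sketch on CartPole-v1 with Stable-Baselines3.
import gymnasium as gym
from stable_baselines3 import A2C

env = gym.make("CartPole-v1")
model = A2C("MlpPolicy", env, verbose=1)   # default MLP actor-critic
model.learn(total_timesteps=100_000)       # assumed training budget
model.save("a2c_cartpole")                 # produces the .zip uploaded to the Hub
```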
Link to the trained model: https://huggingface.co/Karim-20/a2c_cartpole/blob/main/ECL-TD-RL1-a2c_cartpole.zip
PandaReachJointsDense-v2
This environment requires the panda-gym module, a set of Reinforcement Learning (RL) environments for the Franka Emika Panda robot, integrated with OpenAI Gym. It is a continuous control task where the goal is to bring the end-effector of the robot arm to a target position. The state of the environment includes the joint angles and velocities of the robot arm, as well as the position of the end-effector.
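As a hedged sketch, training on this environment with Stable-Baselines3 can look like the following; panda-gym v2 registers its environments with gym on import, and the timestep budget and save path here are assumptions.

```python
# Minimal A2C training sketch on PandaReachJointsDense-v2 (values are assumptions).
import gym
import panda_gym  # noqa: F401 -- side-effect import registers the Panda environments
from stable_baselines3 import A2C

env = gym.make("PandaReachJointsDense-v2")
# The observation space is a Dict (observation / achieved_goal / desired_goal),
# so Stable-Baselines3's MultiInputPolicy is needed instead of MlpPolicy.
model = A2C("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)  # assumed training budget
model.save("a2c_panda_reach")
```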
Link to the wandb training run: https://wandb.ai/aiblackbelt/sb3-panda-reach/runs/ihcoeovn?workspace=user-aiblackbelt
Link to the trained model: https://huggingface.co/Karim-20/a2c_cartpole/blob/main/ECL-TD-RL1-a2c_panda_reach.zip