TD 1 : Hands-On Reinforcement Learning

MSO 3.4 Apprentissage Automatique

Due to local installation issues, the code was run on Google Cloud via the Google Colab platform.

REINFORCE algorithm

To implement the REINFORCE algorithm on the CartPole environment, two important Python modules are used to create instances of two different elements:

gym: module that provides (and loads) the environment, in this case CartPole-v1 (described in more detail in the following exercise). Gym is a standard API for reinforcement learning with a diverse collection of reference environments, developed by OpenAI.

torch: module used to create the neural network that serves as the policy for selecting actions in the CartPole environment; the REINFORCE algorithm updates the policy parameters. It integrates seamlessly with CUDA, which enabled GPU-accelerated computation. It is a very extensive machine learning framework, originally developed by Meta AI and now under the Linux Foundation umbrella. A minimal training sketch is shown below.
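The following is a minimal sketch of the REINFORCE loop described above, assuming the gymnasium API (which the traceback below shows is the version actually in use). The architecture and hyperparameters (hidden size 128, learning rate 5e-3, gamma 0.99, 500 episodes) are illustrative assumptions, not the values from the original notebook.

```python
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")

# Policy network: maps a 4-dim observation to a distribution over 2 actions.
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        obs, reward, terminated, truncated, _ = env.step(action.item())
        done = terminated or truncated
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)

    # Discounted returns G_t, computed backwards over the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # variance reduction

    # REINFORCE update: minimize -sum_t log pi(a_t|s_t) * G_t.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```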

While running, I got an unexpected error while playing an episode:

```
/usr/local/lib/python3.10/dist-packages/gymnasium/envs/classic_control/cartpole.py in render(self)
    279         gfxdraw.filled_polygon(self.surf, pole_coords, (202, 152, 101))
    280
--> 281         gfxdraw.aacircle(
    282             self.surf,
    283             int(cartx),

OverflowError: signed short integer is less than minimum
```

The error pops up randomly during the episode, inside the termination loop, right after sampling the action from the computed probability distribution. I pass the action to the environment to take a step:

```python
next_observation, reward, terminated, truncated, info = env.step(action.item())
done = terminated or truncated
```

but then I get the OverflowError. Since action.item() is just the sampled action (either 0 or 1) and is valid for this environment, the traceback points instead at render(): int(cartx) falls outside the signed-short range accepted by gfxdraw when the cart drifts far off-screen.
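A possible workaround (my assumption, not something verified in the original run): since the overflow is raised inside render(), creating the training environment without a render mode avoids the gfxdraw call entirely.

```python
import gymnasium as gym

# No render mode: render() is never called during training, so gfxdraw
# never receives an out-of-range cart position.
train_env = gym.make("CartPole-v1")
# Render only when visually evaluating a trained policy.
eval_env = gym.make("CartPole-v1", render_mode="human")
```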

Advantage Actor-Critic (A2C) algorithm

In order to explore the A2C algorithm, the Stable-Baselines3 module is used. It provides implementations of state-of-the-art reinforcement learning (RL) algorithms, including DQN, A2C, PPO, and others. It also has built-in support for vectorized (parallel) environments and for GPU acceleration through CUDA, which helps speed up the learning process. A training sketch is shown below.
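A minimal sketch of such a training run (the hyperparameters are Stable-Baselines3 defaults and the 100k-step budget is an assumption, not necessarily what produced the uploaded model):

```python
from stable_baselines3 import A2C

# Stable-Baselines3 builds the CartPole environment from its id.
model = A2C("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=100_000)  # assumed training budget
model.save("ECL-TD-RL1-a2c_cartpole")  # archive name matching the upload below
```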

Link to the trained model: https://huggingface.co/Karim-20/a2c_cartpole/blob/main/ECL-TD-RL1-a2c_cartpole.zip

PandaReachJointsDense-v2

This environment requires the panda-gym module, a set of Reinforcement Learning (RL) environments for the Franka Emika Panda robot, integrated with OpenAI Gym. It is a continuous control task where the goal is to reach a target position with the end-effector of the robot arm. The observation includes the position and velocity of the end-effector together with the target position; the Joints suffix means actions are joint-space commands, and Dense means the reward is the negative distance to the target rather than a sparse success signal. A training sketch is shown below.
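A minimal sketch of training A2C on this environment (assuming panda-gym v2, whose import registers the PandaReachJointsDense-v2 id with classic gym, and a Stable-Baselines3 version that accepts gym environments; the timestep budget is an assumption):

```python
import gym
import panda_gym  # noqa: F401 -- importing registers the Panda environments
from stable_baselines3 import A2C

# The dict observation (observation / achieved_goal / desired_goal)
# requires MultiInputPolicy rather than MlpPolicy.
env = gym.make("PandaReachJointsDense-v2")
model = A2C("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)  # assumed training budget
model.save("ECL-TD-RL1-a2c_panda_reach")
```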

Link to the wandb training evolution: https://wandb.ai/aiblackbelt/sb3-panda-reach/runs/ihcoeovn?workspace=user-aiblackbelt

Link to the trained model: https://huggingface.co/Karim-20/a2c_cartpole/blob/main/ECL-TD-RL1-a2c_panda_reach.zip