    TD 1 : Hands-On Reinforcement Learning

This TD introduces different algorithms, frameworks and tools used in Reinforcement Learning. The methods are applied to robotics tasks: the CartPole and PandaReachJointsDense environments.

    Files list

    This repo contains several files:

• images: the images displayed in the README file
• reinforce_cartpole.py: Python script for the "REINFORCE" section
• a2c_sb3_cartpole.py: Python script for the "Familiarization with a complete RL pipeline" section
• a2c_sb3_panda_reach.py: Python script for the "Full workflow with panda-gym" section

    Use

    A Python installation is needed to run the scripts.

    Install the following Python packages:

pip install gym
pip install stable-baselines3
pip install tqdm
pip install wandb
pip install moviepy
pip install huggingface-sb3
pip install tensorboard
pip install panda-gym
pip install dill
    

Then run the scripts from your command prompt:

    python "path_to_your_python_script"

    REINFORCE

The REINFORCE algorithm is used to solve the CartPole environment. The plot below shows the total reward across episodes. (Figure: total reward per episode.)

The Python script used is: reinforce_cartpole.py.
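
For reference, the core of the method can be sketched as below. This is a minimal REINFORCE loop, assuming the gym >= 0.26 API and PyTorch; the network size, learning rate and episode count are illustrative and may differ from reinforce_cartpole.py.

# Minimal REINFORCE sketch (hyperparameters are illustrative,
# not necessarily those of reinforce_cartpole.py).
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()  # gym >= 0.26 returns (obs, info)
    log_probs, rewards = [], []
    done = False
    while not done:
        probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        done = terminated or truncated
        rewards.append(reward)

    # Discounted returns, computed backwards, then normalized for stability.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Policy gradient step: minimize -(sum of log-prob * return).
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()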

    Familiarization with a complete RL pipeline: Application to training a robotic arm

    Stable-Baselines3 and HuggingFace

In this section, the Stable-Baselines3 package is used to solve the CartPole environment with the Advantage Actor-Critic (A2C) algorithm.

The Python script used is: a2c_sb3_cartpole.py.

The trained model is shared on HuggingFace, available at the following link: https://huggingface.co/oscarchaufour/a2c-CartPole-v1
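
The pipeline boils down to a few Stable-Baselines3 calls. A minimal sketch, assuming a recent SB3 release; the timestep budget and the checkpoint filename on the Hub are assumptions, not values taken from a2c_sb3_cartpole.py:

# Train A2C on CartPole with Stable-Baselines3 (sketch; details may
# differ from a2c_sb3_cartpole.py).
from stable_baselines3 import A2C

model = A2C("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=100_000)  # illustrative budget
model.save("a2c_cartpole")

# Retrieve the shared checkpoint from the Hugging Face Hub
# (the filename is an assumption; check the model page for the real one).
from huggingface_sb3 import load_from_hub
checkpoint = load_from_hub(
    repo_id="oscarchaufour/a2c-CartPole-v1",
    filename="a2c-CartPole-v1.zip",
)
model = A2C.load(checkpoint)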

    Weights & Biases

The Weights & Biases package is used to visualize the training and the performance of a model. The link to the run visualization on WandB is: https://wandb.ai/oscar-chaufour/a2c-cartpole-v1?workspace=user-oscar-chaufour

The evolution of certain metrics during training can be visualized. For example, the policy loss at each step can be seen below. (Figure: policy loss per training step.)

The policy loss follows a decreasing trend, which is consistent with the model learning during the training phase.
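
The logging itself takes only a few lines with the SB3 integration shipped in the wandb package. A minimal sketch; the project name and timestep budget are illustrative:

# Log an SB3 training run to Weights & Biases (sketch).
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import A2C

run = wandb.init(
    project="a2c-cartpole-v1",
    sync_tensorboard=True,  # forward SB3's TensorBoard metrics (e.g. policy loss) to W&B
)
model = A2C("MlpPolicy", "CartPole-v1", verbose=1,
            tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=100_000, callback=WandbCallback())
run.finish()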

    Full workflow with panda-gym

The full training-visualization-sharing workflow is applied to the PandaReachJointsDense environment. The Python script used is: a2c_sb3_panda_reach.py. It appears that the PandaReachJointsDense-v3 environment is not registered and could not be created, as shown in the following error (the traceback below was captured with the PandaPushJointsDense variant, which fails in the same way):

    ---------------------------------------------------------------------------
    NameNotFound                              Traceback (most recent call last)
    <ipython-input-5-e21f3cb1d225> in <cell line: 21>()
         19 register(id=env_id, entry_point='gym.envs.classic_control:CartPoleEnv', max_episode_steps=500)
         20 
    ---> 21 env = gym.make(env_id)
         22 
         23 model = A2C("MlpPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
    
    2 frames
    /usr/local/lib/python3.10/dist-packages/gym/envs/registration.py in _check_name_exists(ns, name)
        210     suggestion_msg = f"Did you mean: `{suggestion[0]}`?" if suggestion else ""
        211 
    --> 212     raise error.NameNotFound(
        213         f"Environment {name} doesn't exist{namespace_msg}. {suggestion_msg}"
        214     )
    
    NameNotFound: Environment PandaPushJointsDense doesn't exist. 
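
A likely cause, assuming panda-gym v3 is installed: the Panda environments are registered with Gymnasium (not the legacy gym package) when panda_gym is imported, so the id is only known after that import. A minimal sketch of the fix under that assumption:

# Possible fix, assuming panda-gym v3: importing panda_gym registers the
# Panda environments with Gymnasium, and gymnasium (not legacy gym) must
# be the package that creates them.
import gymnasium as gym
import panda_gym  # noqa: F401 -- the import itself registers the environments

env = gym.make("PandaReachJointsDense-v3")
obs, info = env.reset()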

    Contribute

This tutorial may contain errors, inaccuracies, typos or areas for improvement. Feel free to contribute by opening an issue.

    Author

    Oscar Chaufour

    License

    MIT