    TD 1 : Hands-On Reinforcement Learning

This TD introduces different algorithms, frameworks and tools used in Reinforcement Learning. The methods are applied to the robotics domain: the CartPole and PandaReachJointsDense environments.

    REINFORCE

The REINFORCE algorithm is used to solve the CartPole environment. The plot below shows the total reward across episodes (figure: total reward per episode). The Python script used is: reinforce_cartpole.py.
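For reference, here is a minimal sketch of a REINFORCE training loop on CartPole-v1. The network size, learning rate and episode count are illustrative assumptions, not necessarily those used in reinforce_cartpole.py.

```python
# Minimal REINFORCE sketch for CartPole-v1 (illustrative hyperparameters,
# not necessarily those used in reinforce_cartpole.py).
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Compute discounted returns, iterating backwards over the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Policy-gradient loss: maximize the expected return.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```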

    Familiarization with a complete RL pipeline: Application to training a robotic arm

    Stable-Baselines3 and HuggingFace

In this section, the Stable-Baselines3 package is used to solve the CartPole environment with the Advantage Actor-Critic (A2C) algorithm. The Python script used is: a2c_sb3_cartpole.py.
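A minimal sketch of this step with Stable-Baselines3 is shown below; the timestep budget is illustrative and may differ from the one in a2c_sb3_cartpole.py.

```python
# Minimal Stable-Baselines3 A2C training sketch for CartPole-v1
# (illustrative timestep budget).
import gymnasium as gym
from stable_baselines3 import A2C

env = gym.make("CartPole-v1")
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("a2c_cartpole")

# Quick evaluation rollout with the trained policy.
obs, _ = env.reset()
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```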

The trained model is shared on HuggingFace and is available at the following link: https://huggingface.co/oscarchaufour/a2c-CartPole-v1
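As a usage example, the shared model can be retrieved with the huggingface_sb3 helper. The zip filename below is an assumption about how the checkpoint was uploaded to the repository.

```python
# Sketch: loading the shared model back from the Hugging Face Hub.
from huggingface_sb3 import load_from_hub
from stable_baselines3 import A2C

checkpoint = load_from_hub(
    repo_id="oscarchaufour/a2c-CartPole-v1",
    filename="a2c-CartPole-v1.zip",  # assumed filename inside the repo
)
model = A2C.load(checkpoint)
```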

    Weights & Biases

The Weights & Biases package is used to visualize the training and the performance of the model. The link to the run visualization on WandB is: https://wandb.ai/oscar-chaufour/a2c-cartpole-v1?workspace=user-oscar-chaufour
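Below is a sketch of how the Stable-Baselines3 training can be tracked with Weights & Biases through its SB3 callback; the project name and timestep budget are illustrative.

```python
# Sketch of the SB3 + Weights & Biases integration (illustrative settings).
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import A2C

run = wandb.init(
    project="a2c-cartpole-v1",
    sync_tensorboard=True,  # forward SB3's TensorBoard logs to W&B
    monitor_gym=True,
)

model = A2C("MlpPolicy", "CartPole-v1", verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(
    total_timesteps=100_000,
    callback=WandbCallback(model_save_path=f"models/{run.id}", verbose=2),
)
run.finish()
```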

    Full workflow with panda-gym

    The full training-visualization-sharing workflow is applied to the PandaReachJointsDense environment.
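A minimal sketch of the training step on this environment is shown below. The environment id assumes panda-gym v3 (earlier releases register "-v2" ids), and the timestep budget is illustrative; the W&B callback and Hugging Face upload from the previous sections apply unchanged.

```python
# Sketch: training A2C on the panda-gym reach task (assumes panda-gym v3).
import gymnasium as gym
import panda_gym  # noqa: F401  (importing registers the Panda environments)
from stable_baselines3 import A2C

env = gym.make("PandaReachJointsDense-v3")
model = A2C("MultiInputPolicy", env, verbose=1)  # dict observations need MultiInputPolicy
model.learn(total_timesteps=500_000)
model.save("a2c_panda_reach")
```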