# TD 1 : Hands-On Reinforcement Learning

    MSO 3.4 Apprentissage Automatique

## REINFORCE

The REINFORCE algorithm (also known as Vanilla Policy Gradient) is a policy gradient method that optimizes the policy directly by gradient ascent on the expected return: it samples complete episodes with the current policy, then increases the log-probability of each action in proportion to the discounted return that followed it.

🛠️ To be handed in: Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in reinforce_cartpole.py, and share a plot showing the total reward across episodes in the README.md. Also share a file reinforce_cartpole.pth containing the learned weights. For saving and loading PyTorch models, check this tutorial.
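A minimal sketch of what reinforce_cartpole.py could look like, assuming Gymnasium's CartPole-v1; the network size, learning rate, episode count, and return normalization are illustrative choices, not required ones:

```python
import gymnasium as gym
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Small policy network; architecture and hyperparameters are illustrative."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions), nn.Softmax(dim=-1),
        )

    def forward(self, x):
        return self.net(x)

if __name__ == "__main__":
    env = gym.make("CartPole-v1")
    policy = Policy(env.observation_space.shape[0], env.action_space.n)
    optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)
    gamma, episode_rewards = 0.99, []

    for episode in range(500):
        obs, _ = env.reset()
        log_probs, rewards, done = [], [], False
        while not done:
            probs = policy(torch.as_tensor(obs, dtype=torch.float32))
            dist = torch.distributions.Categorical(probs)
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            obs, reward, terminated, truncated, _ = env.step(action.item())
            done = terminated or truncated
            rewards.append(reward)

        # Discounted returns, computed backwards, then normalized for stability.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)

        # Policy-gradient loss: maximize sum of log pi(a|s) * G_t.
        loss = -(torch.stack(log_probs) * returns).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        episode_rewards.append(sum(rewards))

    torch.save(policy.state_dict(), "reinforce_cartpole.pth")
```

The `episode_rewards` list collects the per-episode totals used for the reward plot in the README.md.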

## Model Evaluation

    Now that you have trained your model, it is time to evaluate its performance. Run it with rendering for a few trials and see if the policy is capable of completing the task.

🛠️ To be handed in: Implement a script which loads your saved model and uses it to solve the CartPole environment. Run 100 evaluations and share the final success rate across all evaluations in the README.md. Share the code in evaluate_reinforce_cartpole.py.
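A possible shape for evaluate_reinforce_cartpole.py, assuming the Policy class and reinforce_cartpole.pth from the training sketch above (the training loop is guarded by `if __name__ == "__main__"`, so the import does not retrain), and using the 195-reward threshold discussed below as the success criterion:

```python
import gymnasium as gym
import torch

from reinforce_cartpole import Policy  # Policy class from the sketch above

if __name__ == "__main__":
    env = gym.make("CartPole-v1")  # pass render_mode="human" to watch a few trials
    policy = Policy(env.observation_space.shape[0], env.action_space.n)
    policy.load_state_dict(torch.load("reinforce_cartpole.pth"))
    policy.eval()

    successes = 0
    for _ in range(100):
        obs, _ = env.reset()
        total_reward, done = 0.0, False
        while not done:
            with torch.no_grad():
                probs = policy(torch.as_tensor(obs, dtype=torch.float32))
            action = probs.argmax().item()  # act greedily at evaluation time
            obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            total_reward += reward
        successes += total_reward >= 195  # solved threshold from the Gym wiki

    print(f"Success rate: {successes / 100:.2f}")
```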

According to the OpenAI Gym wiki, the environment counts as solved when the average reward is greater than or equal to 195 over 100 consecutive trials. With the evaluation script I used, the success rate is 1.0 when episodes are allowed to run for the environment's maximum number of steps.

## Familiarization with a complete RL pipeline: Application to training a robotic arm

    Stable-Baselines3 (SB3) is a high-level RL library that provides various algorithms and integrated tools to easily train and test reinforcement learning models.

🛠️ To be handed in: Store the code in a2c_sb3_cartpole.py. Unless otherwise stated, you'll build on this file in the next sections.

🛠️ To be handed in: Link the trained model in the README.md file.

🛠️ To be handed in: Share the link of the wandb run in the README.md file.
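A sketch of a2c_sb3_cartpole.py, assuming SB3's A2C and the official wandb SB3 integration; the project name mirrors the run linked below but is otherwise a placeholder, and the timestep budget is illustrative:

```python
import wandb
from stable_baselines3 import A2C
from wandb.integration.sb3 import WandbCallback

config = {"policy": "MlpPolicy", "total_timesteps": 100_000}
run = wandb.init(project="sb3", config=config, sync_tensorboard=True)

model = A2C(config["policy"], "CartPole-v1", verbose=1,
            tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=config["total_timesteps"], callback=WandbCallback())
model.save("a2c_sb3_cartpole")  # then upload, e.g. with huggingface_sb3.package_to_hub
run.finish()
```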

    wandb: https://wandb.ai/lennartecl-centrale-lyon/sb3?nw=nwuserlennartecl

    huggingface: https://huggingface.co/lennartoe/Cartpole-v1/tree/main

## Full workflow with panda-gym

Panda-gym is a collection of environments for robotic simulation and control. It provides a range of challenges for training robotic agents in a simulated environment. In this section, you will get familiar with one of the environments provided by panda-gym, the PandaReachJointsDense-v3. The objective is to learn how to reach any point in 3D space by directly controlling the robot's joints.

🛠️ To be handed in: Share all the code in a2c_sb3_panda_reach.py. Share the link of the wandb run and the trained model in the README.md file.
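A sketch of what a2c_sb3_panda_reach.py might contain, assuming panda-gym v3 registers PandaReachJointsDense-v3 on import; the project name and timestep budget are placeholders:

```python
import panda_gym  # noqa: F401 -- registers the Panda environments on import
import wandb
from stable_baselines3 import A2C
from wandb.integration.sb3 import WandbCallback

run = wandb.init(project="pandasgym_sb3", sync_tensorboard=True)

# Panda observations are dicts (observation / achieved_goal / desired_goal),
# so SB3 needs the MultiInputPolicy rather than MlpPolicy.
model = A2C("MultiInputPolicy", "PandaReachJointsDense-v3", verbose=1,
            tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=500_000, callback=WandbCallback())
model.save("a2c_sb3_panda_reach")
run.finish()
```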

    wandb: https://wandb.ai/lennartecl-centrale-lyon/pandasgym_sb3?nw=nwuserlennartecl

    huggingface: https://huggingface.co/lennartoe/PandaReachJointsDense-v3/tree/main