
    Hands-On Reinforcement Learning

    In this hands-on project, we will first implement a simple RL algorithm and apply it to solve the CartPole-v1 environment. Once we are familiar with the basic workflow, we will learn to use tools for training, monitoring, and sharing machine learning models by applying them to train a robotic arm.

    REINFORCE

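    The REINFORCE algorithm (also known as Vanilla Policy Gradient) is a policy gradient method that optimizes the policy parameters directly by gradient ascent on the expected return, estimating the gradient from the returns of complete (Monte Carlo) episodes.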
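    To make the update concrete, here is a minimal, dependency-free sketch of REINFORCE. For each step t of a finished episode it computes the discounted return G_t = r_t + γ·G_{t+1} and moves the policy parameters θ along G_t·∇θ log π_θ(a_t | s_t), which is gradient ascent on the expected return. Assumptions: the task is a hypothetical stand-in for CartPole-v1 (action 1 "keeps the pole up" and earns +1, action 0 ends the episode), and the policy is a single-state softmax over two action preferences rather than the neural network one would normally use with the real environment. The helper names (`returns_to_go`, `run_episode`, `reinforce`) are illustrative.

```python
import math
import random


def returns_to_go(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one finished episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(prefs):
    """Action probabilities from preferences (max subtracted for numerical stability)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_episode(theta, rng, max_steps=10):
    """Hypothetical toy task standing in for CartPole: action 1 earns +1 and the
    episode continues; action 0 earns 0 and ends it. Truncated after max_steps."""
    probs = softmax(theta)  # state-independent policy, so fixed for the episode
    actions, rewards = [], []
    for _ in range(max_steps):
        a = 1 if rng.random() < probs[1] else 0
        actions.append(a)
        rewards.append(1.0 if a == 1 else 0.0)
        if a == 0:
            break
    return actions, rewards


def reinforce(num_episodes=300, lr=0.05, gamma=0.99, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # preferences of a state-independent softmax policy
    for _ in range(num_episodes):
        actions, rewards = run_episode(theta, rng)
        probs = softmax(theta)  # the policy that generated this episode
        grad = [0.0] * len(theta)
        for a, g in zip(actions, returns_to_go(rewards, gamma)):
            # d/d theta_k of log pi(a) for a softmax policy: 1[k == a] - pi(k)
            for k in range(len(theta)):
                grad[k] += g * ((1.0 if k == a else 0.0) - probs[k])
        # Gradient ascent on the expected return, applied once per episode.
        theta = [th + lr * gk for th, gk in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    theta = reinforce()
    print("P(action 1) after training:", softmax(theta)[1])
```

    With the real environment, `run_episode` would instead step a Gymnasium `CartPole-v1` instance, and the action probabilities would come from a network that takes the observation as input; the return and gradient computations stay the same.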

    The figure below shows the rewards obtained over the training episodes.

    [Figure: rewards over episodes]
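    Per-episode rewards from REINFORCE are noisy, so curves like this are usually read through a moving average. For reference, Gymnasium registers CartPole-v1 with a reward threshold of 475 (episodes are truncated at 500 steps), and a common convention is to call the task solved once the average return over 100 consecutive episodes reaches that threshold. The sketch below assumes the per-episode returns were collected in a plain Python list; the function names and the 100-episode window are illustrative choices, not taken from this project.

```python
from collections import deque


def moving_average(values, window=100):
    """Running mean over the last `window` values (fewer at the start of training)."""
    recent = deque(maxlen=window)
    averages = []
    for v in values:
        recent.append(v)
        averages.append(sum(recent) / len(recent))
    return averages


def is_solved(episode_returns, threshold=475.0, window=100):
    """True if the mean return of the last `window` episodes reaches `threshold`."""
    if len(episode_returns) < window:
        return False
    return sum(episode_returns[-window:]) / window >= threshold


if __name__ == "__main__":
    returns = [20.0, 35.0, 60.0, 120.0, 500.0]  # illustrative values only
    print(moving_average(returns, window=3))
    print(is_solved(returns))
```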

    Familiarization with a complete RL pipeline: Application to training a robotic arm

    In this section, we will use the Stable-Baselines3 package to train a robotic arm using RL.

    Stable-Baselines3 (SB3) is a high-level RL library built on PyTorch. It provides implementations of common algorithms such as A2C, PPO, and SAC, along with tools that make it easy to train, evaluate, and save reinforcement learning models.

    Trained model on the Hugging Face Hub: model

    Weights & Biases: Wandb

    Full workflow with panda-gym

    The objective is to learn to reach any point in 3D space by directly controlling the robot's joints (articulations), using the PandaReachJointsDense-v2 environment.
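    In panda-gym's naming scheme, `Joints` means the action commands the arm's joints directly (instead of moving the end-effector), and `Dense` selects a shaped reward. To our understanding of panda-gym's Reach task, the dense reward is the negative Euclidean distance between the end-effector position (`achieved_goal`) and the target (`desired_goal`), and an episode counts as a success when that distance falls below 0.05 m. The sketch below restates that logic in plain Python for a single pair of 3D points; the threshold value and function names are assumptions to check against the installed panda-gym version, which computes the same quantities with NumPy.

```python
import math

# Assumed success radius in metres (panda-gym's default, to our understanding).
DISTANCE_THRESHOLD = 0.05


def dense_reward(achieved_goal, desired_goal):
    """Dense reward: negative distance to the target, so it peaks at 0 on the target."""
    return -math.dist(achieved_goal, desired_goal)


def sparse_reward(achieved_goal, desired_goal, threshold=DISTANCE_THRESHOLD):
    """Sparse variant, for comparison: -1 until the target is reached, then 0."""
    return -1.0 if math.dist(achieved_goal, desired_goal) > threshold else 0.0


def is_success(achieved_goal, desired_goal, threshold=DISTANCE_THRESHOLD):
    """Success when the end-effector is within `threshold` of the target."""
    return math.dist(achieved_goal, desired_goal) < threshold


if __name__ == "__main__":
    target = (0.10, 0.00, 0.20)
    for ee in [(0.0, 0.0, 0.0), (0.08, 0.0, 0.18), (0.10, 0.01, 0.20)]:
        print(ee, round(dense_reward(ee, target), 3), is_success(ee, target))
```

    The dense signal tells the agent at every step whether it moved closer to the target, which typically makes early learning easier than with the sparse variant, where the reward stays at -1 until the arm first reaches the goal.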

    Trained model on the Hugging Face Hub: model

    Weights & Biases: Wandb