Hands-On Reinforcement Learning
In this hands-on project, we will first implement a simple RL algorithm and apply it to solve the CartPole-v1 environment. Once we become familiar with the basic workflow, we will learn to use various tools for machine learning model training, monitoring, and sharing, by applying these tools to train a robotic arm.
REINFORCE
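The REINFORCE algorithm (also known as Vanilla Policy Gradient) is a policy gradient method that optimizes the policy directly by gradient ascent on the expected return. In practice this is implemented as gradient descent on the negative of that objective.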
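As a rough illustration of the quantities REINFORCE works with, the sketch below computes the discounted return of each step in an episode and the loss whose gradient gives the policy update. It is a minimal pure-Python sketch, not the project's actual implementation: the function names, the discount factor `gamma=0.99`, and the optional return normalization are assumptions made for illustration.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t of one episode."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def normalize(values, eps=1e-8):
    """Standardize returns (a common, optional variance-reduction step)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var)
    return [(v - mean) / (std + eps) for v in values]


def reinforce_loss(log_probs, returns):
    """Negative of sum_t log pi(a_t | s_t) * G_t; minimizing it ascends the return."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# Toy episode: three steps with reward 1 each (as CartPole gives per step).
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)
# Hypothetical log-probabilities of the actions that were taken.
log_probs = [-0.5, -0.5, -0.5]
loss = reinforce_loss(log_probs, returns)
```

In a real training loop the log-probabilities would come from the policy network and the loss would be backpropagated through them with an automatic differentiation library.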
The figure below represents rewards over episodes.
Familiarization with a complete RL pipeline: Application to training a robotic arm
In this section, we will use the Stable-Baselines3 package to train a robotic arm using RL.
Stable-Baselines3 (SB3) is a high-level RL library that provides various algorithms and integrated tools to easily train and test reinforcement learning models.
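To give an idea of how compact an SB3 training script can be, here is a minimal sketch that trains, evaluates, and saves an agent. The choice of A2C, the default environment `CartPole-v1`, the number of timesteps, and the save path are assumptions made for illustration; the function name `train_a2c` is hypothetical.

```python
def train_a2c(env_id="CartPole-v1", total_timesteps=50_000, save_path="a2c_cartpole"):
    """Train, evaluate and save an A2C agent with Stable-Baselines3."""
    # Imported inside the function so that defining it does not require SB3.
    from stable_baselines3 import A2C
    from stable_baselines3.common.evaluation import evaluate_policy

    # Passing the environment id as a string lets SB3 create the environment.
    model = A2C("MlpPolicy", env_id, verbose=1)
    model.learn(total_timesteps=total_timesteps)

    # Average the episode reward over a few evaluation episodes.
    mean_reward, std_reward = evaluate_policy(
        model, model.get_env(), n_eval_episodes=10
    )

    # Writes a .zip archive that can be reloaded later with A2C.load(save_path).
    model.save(save_path)
    return mean_reward, std_reward
```

Calling `train_a2c()` runs the whole pipeline and returns the mean and standard deviation of the evaluation reward.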
Trained model on Hugging Face: model
Weights & Biases: Wandb
Full workflow with panda-gym
The objective is to learn how to reach any point in 3D space by directly controlling the robot's joints, using the PandaReachJointsDense-v2 environment.
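One way to combine training and monitoring for this task is sketched below. It is an illustration under several assumptions, not the project's actual script: A2C as the algorithm, the project name, and the timestep budget are made up, and `train_with_tracking` is a hypothetical helper. It also assumes that the installed panda-gym and SB3 versions rely on the same Gym backend, so that importing `panda_gym` registers the environment id that SB3 then creates.

```python
CONFIG = {
    "env_id": "PandaReachJointsDense-v2",
    # The environment returns dictionary observations, hence MultiInputPolicy.
    "policy_type": "MultiInputPolicy",
    "total_timesteps": 500_000,  # assumed budget, tune as needed
}


def train_with_tracking(config=None, project="panda-reach-a2c"):
    """Train A2C on the Panda reach task and log the run to Weights & Biases."""
    # Imported inside the function so that defining it does not require the packages.
    import panda_gym  # noqa: F401  (importing registers the Panda environments)
    import wandb
    from stable_baselines3 import A2C
    from wandb.integration.sb3 import WandbCallback

    config = CONFIG if config is None else config

    # sync_tensorboard forwards SB3's TensorBoard metrics to the W&B run.
    run = wandb.init(project=project, config=config, sync_tensorboard=True)

    model = A2C(
        config["policy_type"],
        config["env_id"],
        verbose=1,
        tensorboard_log=f"runs/{run.id}",
    )
    model.learn(
        total_timesteps=config["total_timesteps"],
        # The callback saves the model and uploads it with the run.
        callback=WandbCallback(model_save_path=f"models/{run.id}", verbose=2),
    )
    run.finish()
    return model
```

Keeping the hyperparameters in a single dictionary means the same values are used for training and recorded with the run, which makes experiments easier to compare afterwards.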
Trained model on Hugging Face: model
Weights & Biases: Wandb