"In this hands-on project, we will first implement a simple RL algorithm and apply it to solve the CartPole-v1 environment. Once we become familiar with the basic workflow, we will learn to use various tools for machine learning model training, monitoring, and sharing, by applying these tools to train a robotic arm.\n",
"\n",
"## To be handed in\n",
"\n",
"This work must be done individually. The expected output is a repository named `hands-on-rl` on https://gitlab.ec-lyon.fr. \n",
"\n",
"We assume that `git` is installed, and that you are familiar with the basic `git` commands. (Optionnaly, you can use GitHub Desktop.)\n",
"We also assume that you have access to the [ECL GitLab](https://gitlab.ec-lyon.fr/). If necessary, please consult [this tutorial](https://gitlab.ec-lyon.fr/edelland/inf_tc2/-/blob/main/Tutoriel_gitlab/tutoriel_gitlab.md).\n",
"\n",
"Your repository must contain a `README.md` file that explains **briefly** the successive steps of the project. It must be private, so you need to add your teacher as \"developer\" member.\n",
"\n",
"Throughout the subject, you will find a 🛠 symbol indicating that a specific production is expected.\n",
"\n",
"The last commit is due before 11:59 pm on March 5, 2024. Subsequent commits will not be considered.\n",
"\n",
"> ⚠️ **Warning**\n",
"> Ensure that you only commit the files that are requested. For example, your directory should not contain the generated `.zip` files, nor the `runs` folder... At the end, your repository must contain one `README.md`, three python scripts, and optionally image files for the plots.\n",
"\n",
"## Before you start\n",
"\n",
"Make sure you know the basics of Reinforcement Learning. In case of need, you can refer to the [introduction of the Hugging Face RL course](https://huggingface.co/blog/deep-rl-intro).\n",
"\n",
"## Introduction to Gym\n",
"\n",
"[Gym](https://gymnasium.farama.org/) is a framework for developing and evaluating reinforcement learning environments. It offers various environments, including classic control and toy text scenarios, to test RL algorithms.\n",
"\n",
"### Installation\n",
"\n",
"We recommend to use Python virtual environnements to install the required modules : https://docs.python.org/3/library/venv.html\n",
"Requirement already satisfied: numpy==1.25.0 in c:\\users\\coren\\anaconda3\\lib\\site-packages (1.25.0)\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install numpy==1.25.0 --user"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "bb4d7c39",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: gym==0.26.2 in c:\\users\\coren\\anaconda3\\lib\\site-packages (0.26.2)\n",
"Requirement already satisfied: importlib-metadata>=4.8.0 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from gym==0.26.2) (4.8.1)\n",
"Requirement already satisfied: numpy>=1.18.0 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from gym==0.26.2) (1.25.0)\n",
"Requirement already satisfied: cloudpickle>=1.2.0 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from gym==0.26.2) (2.0.0)\n",
"Requirement already satisfied: gym-notices>=0.0.4 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from gym==0.26.2) (0.0.8)\n",
"Requirement already satisfied: zipp>=0.5 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from importlib-metadata>=4.8.0->gym==0.26.2) (3.6.0)\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install gym==0.26.2"
]
},
{
"cell_type": "markdown",
"id": "c4339dd2",
"metadata": {},
"source": [
"Install also pyglet for the rendering."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ae74426f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: pyglet==2.0.10 in c:\\users\\coren\\anaconda3\\lib\\site-packages (2.0.10)\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install pyglet==2.0.10"
]
},
{
"cell_type": "markdown",
"id": "d6a2d90b",
"metadata": {},
"source": [
"If needed "
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "712fb75a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: pygame==2.5.2 in c:\\users\\coren\\anaconda3\\lib\\site-packages (2.5.2)\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install pygame==2.5.2"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3cdd7bcc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: PyQt5 in c:\\users\\coren\\anaconda3\\lib\\site-packages (5.15.10)\n",
"Requirement already satisfied: PyQt5-Qt5>=5.15.2 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from PyQt5) (5.15.2)\n",
"Note: you may need to restart the kernel to use updated packages.Requirement already satisfied: PyQt5-sip<13,>=12.13 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from PyQt5) (12.13.0)\n",
"\n"
]
}
],
"source": [
"pip install PyQt5"
]
},
{
"cell_type": "markdown",
"id": "79581e82",
"metadata": {},
"source": [
"### Usage\n",
"\n",
"Here is an example of how to use Gym to solve the `CartPole-v1` environment [Documentation](https://gymnasium.farama.org/environments/classic_control/cart_pole/):"
"# Reset the environment and get the initial observation\n",
"observation = env.reset()\n",
"\n",
"for _ in range(1000):\n",
" # Select a random action from the action space\n",
" action = env.action_space.sample()\n",
" # Apply the action to the environment\n",
" # Returns next observation, reward, done signal (indicating\n",
" # if the episode has ended), and an additional info dictionary\n",
" observation, reward, terminated, truncated, info = env.step(action)\n",
" # Render the environment to visualize the agent's behavior\n",
" env.render()\n",
" if terminated: \n",
" # Terminated before max step\n",
" break\n",
"\n",
"env.close()"
]
},
{
"cell_type": "markdown",
"id": "fea5e4cf",
"metadata": {},
"source": [
"## REINFORCE\n",
"\n",
"The REINFORCE algorithm (also known as Vanilla Policy Gradient) is a policy gradient method that optimizes the policy directly using gradient descent. The following is the pseudocode of the REINFORCE algorithm:\n",
"\n",
"```txt\n",
"Setup the CartPole environment\n",
"Setup the agent as a simple neural network with:\n",
" - One fully connected layer with 128 units and ReLU activation followed by a dropout layer\n",
" - One fully connected layer followed by softmax activation\n",
"Repeat 500 times:\n",
" Reset the environment\n",
" Reset the buffer\n",
" Repeat until the end of the episode:\n",
" Compute action probabilities \n",
" Sample the action based on the probabilities and store its probability in the buffer \n",
" Step the environment with the action\n",
" Compute and store in the buffer the return using gamma=0.99 \n",
" Normalize the return\n",
" Compute the policy loss as -sum(log(prob) * return)\n",
" Update the policy using an Adam optimizer and a learning rate of 5e-3\n",
"```\n",
"\n",
"To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/introduction).\n",
"\n",
"> 🛠 **To be handed in**\n",
"> Use PyTorch to implement REINFORCE and solve the CartPole environement. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward accross episodes in the `README.md`.\n",
"\n",
"## Familiarization with a complete RL pipeline: Application to training a robotic arm\n",
"\n",
"In this section, you will use the Stable-Baselines3 package to train a robotic arm using RL. You'll get familiar with several widely-used tools for training, monitoring and sharing machine learning models.\n",
"\n",
"### Get familiar with Stable-Baselines3\n",
"\n",
"Stable-Baselines3 (SB3) is a high-level RL library that provides various algorithms and integrated tools to easily train and test reinforcement learning models.\n",
"\n",
"#### Installation"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "7b5d4e63",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting stable-baselines3\n",
" Using cached stable_baselines3-2.2.1-py3-none-any.whl (181 kB)\n",
"Requirement already satisfied: matplotlib in c:\\users\\coren\\anaconda3\\lib\\site-packages (from stable-baselines3) (3.4.3)\n",
"Requirement already satisfied: numpy>=1.20 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from stable-baselines3) (1.26.4)\n",
"Requirement already satisfied: pandas in c:\\users\\coren\\anaconda3\\lib\\site-packages (from stable-baselines3) (1.3.4)\n",
"Requirement already satisfied: torch>=1.13 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from stable-baselines3) (2.1.1)\n",
"Requirement already satisfied: cloudpickle in c:\\users\\coren\\anaconda3\\lib\\site-packages (from stable-baselines3) (2.0.0)\n",
"Collecting gymnasium<0.30,>=0.28.1\n",
" Using cached gymnasium-0.29.1-py3-none-any.whl (953 kB)\n",
"Collecting farama-notifications>=0.0.1\n",
" Using cached Farama_Notifications-0.0.4-py3-none-any.whl (2.5 kB)\n",
"Requirement already satisfied: importlib-metadata>=4.8.0 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from gymnasium<0.30,>=0.28.1->stable-baselines3) (4.8.1)\n",
"Requirement already satisfied: typing-extensions>=4.3.0 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from gymnasium<0.30,>=0.28.1->stable-baselines3) (4.4.0)\n",
"Requirement already satisfied: zipp>=0.5 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from importlib-metadata>=4.8.0->gymnasium<0.30,>=0.28.1->stable-baselines3) (3.6.0)\n",
"Requirement already satisfied: jinja2 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from torch>=1.13->stable-baselines3) (2.11.3)\n",
"Requirement already satisfied: sympy in c:\\users\\coren\\anaconda3\\lib\\site-packages (from torch>=1.13->stable-baselines3) (1.9)\n",
"Requirement already satisfied: fsspec in c:\\users\\coren\\anaconda3\\lib\\site-packages (from torch>=1.13->stable-baselines3) (2021.10.1)\n",
"Requirement already satisfied: networkx in c:\\users\\coren\\anaconda3\\lib\\site-packages (from torch>=1.13->stable-baselines3) (2.6.3)\n",
"Requirement already satisfied: filelock in c:\\users\\coren\\anaconda3\\lib\\site-packages (from torch>=1.13->stable-baselines3) (3.3.1)\n",
"Requirement already satisfied: MarkupSafe>=0.23 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from jinja2->torch>=1.13->stable-baselines3) (1.1.1)\n",
"Requirement already satisfied: python-dateutil>=2.7 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from matplotlib->stable-baselines3) (2.8.2)\n",
"Requirement already satisfied: pillow>=6.2.0 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from matplotlib->stable-baselines3) (8.4.0)\n",
"Requirement already satisfied: pyparsing>=2.2.1 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from matplotlib->stable-baselines3) (3.0.4)\n",
"Requirement already satisfied: kiwisolver>=1.0.1 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from matplotlib->stable-baselines3) (1.3.1)\n",
"Requirement already satisfied: cycler>=0.10 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from matplotlib->stable-baselines3) (0.10.0)\n",
"Requirement already satisfied: six in c:\\users\\coren\\anaconda3\\lib\\site-packages (from cycler>=0.10->matplotlib->stable-baselines3) (1.16.0)\n",
"Requirement already satisfied: pytz>=2017.3 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from pandas->stable-baselines3) (2021.3)\n",
"Requirement already satisfied: mpmath>=0.19 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from sympy->torch>=1.13->stable-baselines3) (1.2.1)\n",
" obs, reward, done, info = vec_env.step(action)\n",
" #vec_env.render(\"human\")\n",
" # VecEnv resets automatically\n",
" # if done:\n",
" # obs = vec_env.reset()\n",
"env.close()"
]
},
{
"cell_type": "markdown",
"id": "70d3cc66",
"metadata": {},
"source": [
"#### Usage\n",
"\n",
"Use the [Stable-Baselines3 documentation](https://stable-baselines3.readthedocs.io/en/master/) to implement the code to solve the CartPole environment with the Advantage Actor-Critic (A2C) algorithm.\n",
"\n",
"\n",
"> 🛠 **To be handed in**\n",
"> Store the code in `a2c_sb3_cartpole.py`. Unless otherwise stated, you'll work upon this file for the next sections.\n",
"\n",
"### Get familiar with Hugging Face Hub\n",
"\n",
"Hugging Face Hub is a platform for easy sharing and versioning of trained machine learning models. With Hugging Face Hub, you can quickly and easily share your models with others and make them usable through the API. For example, see the trained A2C agent for CartPole: https://huggingface.co/sb3/a2c-CartPole-v1. Hugging Face Hub provides an API to download and upload SB3 models.\n",
"\n",
"#### Installation of `huggingface_sb3`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cd890835",
"metadata": {},
"outputs": [],
"source": [
"pip install huggingface-sb3==2.3.1"
]
},
{
"cell_type": "markdown",
"id": "0d0bef5b",
"metadata": {},
"source": [
"#### Upload the model on the Hub\n",
"\n",
"Follow the [Hugging Face Hub documentation](https://huggingface.co/docs/hub/stable-baselines3) to upload the previously learned model to the Hub.\n",
"\n",
"> 🛠 **To be handed in**\n",
"> Link the trained model in the `README.md` file.\n",
"\n",
"> 📝 **Note**\n",
"> [RL-Zoo3](https://stable-baselines3.readthedocs.io/en/master/guide/rl_zoo.html) provides more advanced features to save hyperparameters, generate renderings and metrics. Feel free to try them.\n",
"\n",
"### Get familiar with Weights & Biases\n",
"\n",
"Weights & Biases (W&B) is a tool for machine learning experiment management. With W&B, you can track and compare your experiments, visualize your model training and performance.\n",
"\n",
"#### Installation\n",
"\n",
"You'll need to install both `wand` and `tensorboar`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6645c23a",
"metadata": {},
"outputs": [],
"source": [
"pip install wandb tensorboard"
]
},
{
"cell_type": "markdown",
"id": "9af0d167",
"metadata": {},
"source": [
"Use the documentation of [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) and [Weights & Biases](https://docs.wandb.ai/guides/integrations/stable-baselines-3) to track the CartPole training. Make the run public.\n",
"\n",
"🛠 Share the link of the wandb run in the `README.md` file.\n",
"\n",
"> ⚠️ **Warning**\n",
"> Make sure to make the run public!\n",
"\n",
"### Full workflow with panda-gym\n",
"\n",
"[Panda-gym](https://github.com/qgallouedec/panda-gym) is a collection of environments for robotic simulation and control. It provides a range of challenges for training robotic agents in a simulated environment. In this section, you will get familiar with one of the environments provided by panda-gym, the `PandaReachJointsDense-v3`. The objective is to learn how to reach any point in 3D space by directly controlling the robot's articulations.\n",
"\n",
"#### Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "897573a8",
"metadata": {},
"outputs": [],
"source": [
"pip install panda-gym==3.0.7"
]
},
{
"cell_type": "markdown",
"id": "443008be",
"metadata": {},
"source": [
"#### Train, track, and share\n",
"\n",
"Use the Stable-Baselines3 package to train A2C model on the `PandaReachJointsDense-v2` environment. 500k timesteps should be enough. Track the environment with Weights & Biases. Once the training is over, upload the trained model on the Hub.\n",
"\n",
"> 🛠 **To be handed in**\n",
"> Share all the code in `a2c_sb3_panda_reach.py`. Share the link of the wandb run and the trained model in the `README.md` file.\n",
"\n",
"## Contribute\n",
"\n",
"This tutorial may contain errors, inaccuracies, typos or areas for improvement. Feel free to contribute to its improvement by opening an issue.\n",
"\n",
"## Author\n",
"\n",
"Quentin Gallouédec\n",
"\n",
"Updates by Léo Schneider, Emmanuel Dellandréa\n",
"\n",
"## License\n",
"\n",
"MIT"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
%% Cell type:markdown id:e687aca8 tags:
GEREST CORENTIN
# TD 1 : Hands-On Reinforcement Learning
MSO 3.4 Apprentissage Automatique
In this hands-on project, we will first implement a simple RL algorithm and apply it to solve the CartPole-v1 environment. Once we become familiar with the basic workflow, we will learn to use various tools for machine learning model training, monitoring, and sharing, by applying these tools to train a robotic arm.
## To be handed in
This work must be done individually. The expected output is a repository named `hands-on-rl` on https://gitlab.ec-lyon.fr.
We assume that `git` is installed, and that you are familiar with the basic `git` commands. (Optionally, you can use GitHub Desktop.)
We also assume that you have access to the [ECL GitLab](https://gitlab.ec-lyon.fr/). If necessary, please consult [this tutorial](https://gitlab.ec-lyon.fr/edelland/inf_tc2/-/blob/main/Tutoriel_gitlab/tutoriel_gitlab.md).
Your repository must contain a `README.md` file that explains **briefly** the successive steps of the project. It must be private, so you need to add your teacher as a "developer" member.
Throughout this assignment, you will find a 🛠 symbol indicating that a specific deliverable is expected.
The last commit is due before 11:59 pm on March 5, 2024. Subsequent commits will not be considered.
> ⚠️ **Warning**
> Ensure that you only commit the files that are requested. For example, your directory should not contain the generated `.zip` files, nor the `runs` folder. At the end, your repository must contain one `README.md`, three Python scripts, and optionally image files for the plots.
## Before you start
Make sure you know the basics of Reinforcement Learning. In case of need, you can refer to the [introduction of the Hugging Face RL course](https://huggingface.co/blog/deep-rl-intro).
## Introduction to Gym
[Gym](https://gymnasium.farama.org/) is a framework for developing and evaluating reinforcement learning algorithms. It offers various environments, including classic control and toy text scenarios, to test RL algorithms.
### Installation
We recommend using a Python virtual environment to install the required modules: https://docs.python.org/3/library/venv.html

Start by installing the pinned NumPy version:
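%% Cell type:code tags:

``` python
pip install numpy==1.25.0 --user
```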
%% Cell type:code id:bb4d7c39 tags:
``` python
pip install gym==0.26.2
```
%% Cell type:markdown id:c4339dd2 tags:
Also install pyglet for rendering:
%% Cell type:code id:ae74426f tags:
``` python
pip install pyglet==2.0.10
```
%% Cell type:markdown id:d6a2d90b tags:
If needed, also install pygame and PyQt5:
%% Cell type:code id:712fb75a tags:
``` python
pip install pygame==2.5.2
```
%% Cell type:code id:3cdd7bcc tags:
``` python
pip install PyQt5
```
%% Cell type:markdown id:79581e82 tags:
### Usage
Here is an example of how to use Gym to interact with the `CartPole-v1` environment ([documentation](https://gymnasium.farama.org/environments/classic_control/cart_pole/)):
%% Cell type:code id:800853bd tags:
``` python
import gym

# Create the environment
env = gym.make("CartPole-v1", render_mode="human")

# Reset the environment and get the initial observation
observation, info = env.reset()

for _ in range(1000):
    # Select a random action from the action space
    action = env.action_space.sample()
    # Apply the action to the environment
    # Returns the next observation, the reward, terminated/truncated flags
    # (indicating if the episode has ended), and an additional info dictionary
    observation, reward, terminated, truncated, info = env.step(action)
    # Render the environment to visualize the agent's behavior
    env.render()
    if terminated or truncated:
        # Episode ended before the loop limit
        break

env.close()
```
%% Cell type:markdown id:fea5e4cf tags:
## REINFORCE
The REINFORCE algorithm (also known as Vanilla Policy Gradient) is a policy gradient method that optimizes the policy directly using gradient descent. The following is the pseudocode of the REINFORCE algorithm:
```txt
Setup the CartPole environment
Setup the agent as a simple neural network with:
- One fully connected layer with 128 units and ReLU activation followed by a dropout layer
- One fully connected layer followed by softmax activation
Repeat 500 times:
Reset the environment
Reset the buffer
Repeat until the end of the episode:
Compute action probabilities
Sample the action based on the probabilities and store its probability in the buffer
Step the environment with the action
Compute and store in the buffer the return using gamma=0.99
Normalize the return
Compute the policy loss as -sum(log(prob) * return)
Update the policy using an Adam optimizer and a learning rate of 5e-3
```
To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/introduction).
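As a starting point, a minimal PyTorch sketch of this loop could look as follows. The layer sizes, discount factor, and learning rate come from the pseudocode above; the dropout rate and the remaining structure are implementation assumptions, not the reference solution:

``` python
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")

# Policy network: one hidden layer of 128 units with ReLU and dropout,
# then a linear layer followed by a softmax over the actions
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # dropout rate is an assumption; the pseudocode does not fix it
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)
gamma = 0.99

for episode in range(500):
    observation, info = env.reset()
    log_probs, rewards = [], []
    terminated = truncated = False
    while not (terminated or truncated):
        probs = policy(torch.as_tensor(observation, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        observation, reward, terminated, truncated, info = env.step(action.item())
        rewards.append(reward)

    # Discounted returns, computed backwards through the episode
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalize

    # Policy gradient loss: -sum(log(prob) * return)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

env.close()
```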
> 🛠 **To be handed in**
> Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward across episodes in the `README.md`.
## Familiarization with a complete RL pipeline: Application to training a robotic arm
In this section, you will use the Stable-Baselines3 package to train a robotic arm using RL. You'll get familiar with several widely-used tools for training, monitoring and sharing machine learning models.
### Get familiar with Stable-Baselines3
Stable-Baselines3 (SB3) is a high-level RL library that provides various algorithms and integrated tools to easily train and test reinforcement learning models.
#### Installation
%% Cell type:code id:7b5d4e63 tags:
``` python
!pip install stable-baselines3 --user
!pip install moviepy
```
%% Cell type:markdown id:70d3cc66 tags:

#### Usage
Use the [Stable-Baselines3 documentation](https://stable-baselines3.readthedocs.io/en/master/) to implement the code to solve the CartPole environment with the Advantage Actor-Critic (A2C) algorithm.
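For reference, a minimal training script following the SB3 quickstart might look like the sketch below (the timestep budget is an assumption; the evaluation loop mirrors the SB3 documentation example):

``` python
import gymnasium as gym
from stable_baselines3 import A2C

env = gym.make("CartPole-v1")
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=25_000)  # timestep budget is an assumption

# Quick rollout of the trained policy on SB3's internal vectorized env
vec_env = model.get_env()
obs = vec_env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = vec_env.step(action)
    # vec_env.render("human")
    # VecEnv resets automatically, no manual reset needed

env.close()
```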
> 🛠 **To be handed in**
> Store the code in `a2c_sb3_cartpole.py`. Unless otherwise stated, you'll work upon this file for the next sections.
### Get familiar with Hugging Face Hub
Hugging Face Hub is a platform for easy sharing and versioning of trained machine learning models. With Hugging Face Hub, you can quickly and easily share your models with others and make them usable through the API. For example, see the trained A2C agent for CartPole: https://huggingface.co/sb3/a2c-CartPole-v1. Hugging Face Hub provides an API to download and upload SB3 models.
#### Installation of `huggingface_sb3`
%% Cell type:code id:cd890835 tags:
``` python
pip install huggingface-sb3==2.3.1
```
%% Cell type:markdown id:0d0bef5b tags:
#### Upload the model to the Hub
Follow the [Hugging Face Hub documentation](https://huggingface.co/docs/hub/stable-baselines3) to upload the previously learned model to the Hub.
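For an SB3 model, the upload can be as simple as the following sketch (assuming you are logged in with `huggingface-cli login`; the repo id is a placeholder):

``` python
from huggingface_sb3 import push_to_hub

# `model` is the A2C agent trained above; saving produces a2c-CartPole-v1.zip
model.save("a2c-CartPole-v1")

push_to_hub(
    repo_id="your-username/a2c-CartPole-v1",  # placeholder: use your own Hub repo
    filename="a2c-CartPole-v1.zip",
    commit_message="Trained A2C agent for CartPole-v1",
)
```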
> 🛠 **To be handed in**
> Link the trained model in the `README.md` file.
> 📝 **Note**
> [RL-Zoo3](https://stable-baselines3.readthedocs.io/en/master/guide/rl_zoo.html) provides more advanced features to save hyperparameters, generate renderings and metrics. Feel free to try them.
### Get familiar with Weights & Biases
Weights & Biases (W&B) is a tool for machine learning experiment management. With W&B, you can track and compare your experiments, visualize your model training and performance.
#### Installation
You'll need to install both `wandb` and `tensorboard`.
%% Cell type:code id:6645c23a tags:
``` python
pip install wandb tensorboard
```
%% Cell type:markdown id:9af0d167 tags:
Use the documentation of [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) and [Weights & Biases](https://docs.wandb.ai/guides/integrations/stable-baselines-3) to track the CartPole training. Make the run public.
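A minimal sketch of the integration, based on the W&B guide linked above (the project name and timestep budget are placeholders):

``` python
import gymnasium as gym
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import A2C

config = {"policy_type": "MlpPolicy", "total_timesteps": 25_000, "env_id": "CartPole-v1"}
run = wandb.init(
    project="sb3-cartpole",   # placeholder project name
    config=config,
    sync_tensorboard=True,    # auto-upload SB3's tensorboard metrics to W&B
)

env = gym.make(config["env_id"])
model = A2C(config["policy_type"], env, verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(
    total_timesteps=config["total_timesteps"],
    callback=WandbCallback(model_save_path=f"models/{run.id}", verbose=2),
)
run.finish()
```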
> 🛠 **To be handed in**
> Share the link of the wandb run in the `README.md` file.
> ⚠️ **Warning**
> Make sure to make the run public!
### Full workflow with panda-gym
[Panda-gym](https://github.com/qgallouedec/panda-gym) is a collection of environments for robotic simulation and control. It provides a range of challenges for training robotic agents in a simulated environment. In this section, you will get familiar with one of the environments provided by panda-gym, `PandaReachJointsDense-v3`. The objective is to learn how to reach any point in 3D space by directly controlling the robot's joints.
#### Installation
%% Cell type:code id:897573a8 tags:
``` python
pip install panda-gym==3.0.7
```
%% Cell type:markdown id:443008be tags:
#### Train, track, and share
Use the Stable-Baselines3 package to train an A2C model on the `PandaReachJointsDense-v3` environment. 500k timesteps should be enough. Track the training with Weights & Biases. Once the training is over, upload the trained model to the Hub.
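Putting the previous pieces together, a minimal training script might look like the sketch below (the W&B project name is a placeholder; `MultiInputPolicy` is used because the observation is a dictionary):

``` python
import gymnasium as gym
import panda_gym  # noqa: F401 -- importing panda_gym registers the Panda environments
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import A2C

run = wandb.init(project="a2c-panda-reach", sync_tensorboard=True)  # placeholder name

env = gym.make("PandaReachJointsDense-v3")
# The observation is a dict (observation / achieved_goal / desired_goal),
# hence MultiInputPolicy instead of MlpPolicy
model = A2C("MultiInputPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=500_000, callback=WandbCallback(verbose=2))
model.save("a2c-PandaReachJointsDense-v3")

run.finish()
env.close()
```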
> 🛠 **To be handed in**
> Share all the code in `a2c_sb3_panda_reach.py`. Share the link of the wandb run and the trained model in the `README.md` file.
## Contribute
This tutorial may contain errors, inaccuracies, typos or areas for improvement. Feel free to contribute to its improvement by opening an issue.
"In this hands-on project, we will first implement a simple RL algorithm and apply it to solve the CartPole-v1 environment. Once we become familiar with the basic workflow, we will learn to use various tools for machine learning model training, monitoring, and sharing, by applying these tools to train a robotic arm.\n",
"\n",
"## To be handed in\n",
"\n",
"This work must be done individually. The expected output is a repository named `hands-on-rl` on https://gitlab.ec-lyon.fr. \n",
"\n",
"We assume that `git` is installed, and that you are familiar with the basic `git` commands. (Optionnaly, you can use GitHub Desktop.)\n",
"We also assume that you have access to the [ECL GitLab](https://gitlab.ec-lyon.fr/). If necessary, please consult [this tutorial](https://gitlab.ec-lyon.fr/edelland/inf_tc2/-/blob/main/Tutoriel_gitlab/tutoriel_gitlab.md).\n",
"\n",
"Your repository must contain a `README.md` file that explains **briefly** the successive steps of the project. It must be private, so you need to add your teacher as \"developer\" member.\n",
"\n",
"Throughout the subject, you will find a 🛠 symbol indicating that a specific production is expected.\n",
"\n",
"The last commit is due before 11:59 pm on March 5, 2024. Subsequent commits will not be considered.\n",
"\n",
"> ⚠️ **Warning**\n",
"> Ensure that you only commit the files that are requested. For example, your directory should not contain the generated `.zip` files, nor the `runs` folder... At the end, your repository must contain one `README.md`, three python scripts, and optionally image files for the plots.\n",
"\n",
"## Before you start\n",
"\n",
"Make sure you know the basics of Reinforcement Learning. In case of need, you can refer to the [introduction of the Hugging Face RL course](https://huggingface.co/blog/deep-rl-intro).\n",
"\n",
"## Introduction to Gym\n",
"\n",
"[Gym](https://gymnasium.farama.org/) is a framework for developing and evaluating reinforcement learning environments. It offers various environments, including classic control and toy text scenarios, to test RL algorithms.\n",
"\n",
"### Installation\n",
"\n",
"We recommend to use Python virtual environnements to install the required modules : https://docs.python.org/3/library/venv.html\n",
"Requirement already satisfied: numpy==1.25.0 in c:\\users\\coren\\anaconda3\\lib\\site-packages (1.25.0)\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install numpy==1.25.0 --user"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "bb4d7c39",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: gym==0.26.2 in c:\\users\\coren\\anaconda3\\lib\\site-packages (0.26.2)\n",
"Requirement already satisfied: importlib-metadata>=4.8.0 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from gym==0.26.2) (4.8.1)\n",
"Requirement already satisfied: numpy>=1.18.0 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from gym==0.26.2) (1.25.0)\n",
"Requirement already satisfied: cloudpickle>=1.2.0 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from gym==0.26.2) (2.0.0)\n",
"Requirement already satisfied: gym-notices>=0.0.4 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from gym==0.26.2) (0.0.8)\n",
"Requirement already satisfied: zipp>=0.5 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from importlib-metadata>=4.8.0->gym==0.26.2) (3.6.0)\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install gym==0.26.2"
]
},
{
"cell_type": "markdown",
"id": "c4339dd2",
"metadata": {},
"source": [
"Install also pyglet for the rendering."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ae74426f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: pyglet==2.0.10 in c:\\users\\coren\\anaconda3\\lib\\site-packages (2.0.10)\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install pyglet==2.0.10"
]
},
{
"cell_type": "markdown",
"id": "d6a2d90b",
"metadata": {},
"source": [
"If needed "
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "712fb75a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: pygame==2.5.2 in c:\\users\\coren\\anaconda3\\lib\\site-packages (2.5.2)\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"pip install pygame==2.5.2"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3cdd7bcc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: PyQt5 in c:\\users\\coren\\anaconda3\\lib\\site-packages (5.15.10)\n",
"Requirement already satisfied: PyQt5-Qt5>=5.15.2 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from PyQt5) (5.15.2)\n",
"Note: you may need to restart the kernel to use updated packages.Requirement already satisfied: PyQt5-sip<13,>=12.13 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from PyQt5) (12.13.0)\n",
"\n"
]
}
],
"source": [
"pip install PyQt5"
]
},
{
"cell_type": "markdown",
"id": "79581e82",
"metadata": {},
"source": [
"### Usage\n",
"\n",
"Here is an example of how to use Gym to solve the `CartPole-v1` environment [Documentation](https://gymnasium.farama.org/environments/classic_control/cart_pole/):"
"# Reset the environment and get the initial observation\n",
"observation = env.reset()\n",
"\n",
"for _ in range(1000):\n",
" # Select a random action from the action space\n",
" action = env.action_space.sample()\n",
" # Apply the action to the environment\n",
" # Returns next observation, reward, done signal (indicating\n",
" # if the episode has ended), and an additional info dictionary\n",
" observation, reward, terminated, truncated, info = env.step(action)\n",
" # Render the environment to visualize the agent's behavior\n",
" env.render()\n",
" if terminated: \n",
" # Terminated before max step\n",
" break\n",
"\n",
"env.close()"
]
},
{
"cell_type": "markdown",
"id": "fea5e4cf",
"metadata": {},
"source": [
"## REINFORCE\n",
"\n",
"The REINFORCE algorithm (also known as Vanilla Policy Gradient) is a policy gradient method that optimizes the policy directly using gradient descent. The following is the pseudocode of the REINFORCE algorithm:\n",
"\n",
"```txt\n",
"Setup the CartPole environment\n",
"Setup the agent as a simple neural network with:\n",
" - One fully connected layer with 128 units and ReLU activation followed by a dropout layer\n",
" - One fully connected layer followed by softmax activation\n",
"Repeat 500 times:\n",
" Reset the environment\n",
" Reset the buffer\n",
" Repeat until the end of the episode:\n",
" Compute action probabilities \n",
" Sample the action based on the probabilities and store its probability in the buffer \n",
" Step the environment with the action\n",
" Compute and store in the buffer the return using gamma=0.99 \n",
" Normalize the return\n",
" Compute the policy loss as -sum(log(prob) * return)\n",
" Update the policy using an Adam optimizer and a learning rate of 5e-3\n",
"```\n",
"\n",
"To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/introduction).\n",
"\n",
"> 🛠 **To be handed in**\n",
"> Use PyTorch to implement REINFORCE and solve the CartPole environement. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward accross episodes in the `README.md`.\n",
"\n",
"## Familiarization with a complete RL pipeline: Application to training a robotic arm\n",
"\n",
"In this section, you will use the Stable-Baselines3 package to train a robotic arm using RL. You'll get familiar with several widely-used tools for training, monitoring and sharing machine learning models.\n",
"\n",
"### Get familiar with Stable-Baselines3\n",
"\n",
"Stable-Baselines3 (SB3) is a high-level RL library that provides various algorithms and integrated tools to easily train and test reinforcement learning models.\n",
"\n",
"#### Installation"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "7b5d4e63",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting stable-baselines3\n",
" Using cached stable_baselines3-2.2.1-py3-none-any.whl (181 kB)\n",
"Requirement already satisfied: matplotlib in c:\\users\\coren\\anaconda3\\lib\\site-packages (from stable-baselines3) (3.4.3)\n",
"Requirement already satisfied: numpy>=1.20 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from stable-baselines3) (1.26.4)\n",
"Requirement already satisfied: pandas in c:\\users\\coren\\anaconda3\\lib\\site-packages (from stable-baselines3) (1.3.4)\n",
"Requirement already satisfied: torch>=1.13 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from stable-baselines3) (2.1.1)\n",
"Requirement already satisfied: cloudpickle in c:\\users\\coren\\anaconda3\\lib\\site-packages (from stable-baselines3) (2.0.0)\n",
"Collecting gymnasium<0.30,>=0.28.1\n",
" Using cached gymnasium-0.29.1-py3-none-any.whl (953 kB)\n",
"Collecting farama-notifications>=0.0.1\n",
" Using cached Farama_Notifications-0.0.4-py3-none-any.whl (2.5 kB)\n",
"Requirement already satisfied: importlib-metadata>=4.8.0 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from gymnasium<0.30,>=0.28.1->stable-baselines3) (4.8.1)\n",
"Requirement already satisfied: typing-extensions>=4.3.0 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from gymnasium<0.30,>=0.28.1->stable-baselines3) (4.4.0)\n",
"Requirement already satisfied: zipp>=0.5 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from importlib-metadata>=4.8.0->gymnasium<0.30,>=0.28.1->stable-baselines3) (3.6.0)\n",
"Requirement already satisfied: jinja2 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from torch>=1.13->stable-baselines3) (2.11.3)\n",
"Requirement already satisfied: sympy in c:\\users\\coren\\anaconda3\\lib\\site-packages (from torch>=1.13->stable-baselines3) (1.9)\n",
"Requirement already satisfied: fsspec in c:\\users\\coren\\anaconda3\\lib\\site-packages (from torch>=1.13->stable-baselines3) (2021.10.1)\n",
"Requirement already satisfied: networkx in c:\\users\\coren\\anaconda3\\lib\\site-packages (from torch>=1.13->stable-baselines3) (2.6.3)\n",
"Requirement already satisfied: filelock in c:\\users\\coren\\anaconda3\\lib\\site-packages (from torch>=1.13->stable-baselines3) (3.3.1)\n",
"Requirement already satisfied: MarkupSafe>=0.23 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from jinja2->torch>=1.13->stable-baselines3) (1.1.1)\n",
"Requirement already satisfied: python-dateutil>=2.7 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from matplotlib->stable-baselines3) (2.8.2)\n",
"Requirement already satisfied: pillow>=6.2.0 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from matplotlib->stable-baselines3) (8.4.0)\n",
"Requirement already satisfied: pyparsing>=2.2.1 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from matplotlib->stable-baselines3) (3.0.4)\n",
"Requirement already satisfied: kiwisolver>=1.0.1 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from matplotlib->stable-baselines3) (1.3.1)\n",
"Requirement already satisfied: cycler>=0.10 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from matplotlib->stable-baselines3) (0.10.0)\n",
"Requirement already satisfied: six in c:\\users\\coren\\anaconda3\\lib\\site-packages (from cycler>=0.10->matplotlib->stable-baselines3) (1.16.0)\n",
"Requirement already satisfied: pytz>=2017.3 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from pandas->stable-baselines3) (2021.3)\n",
"Requirement already satisfied: mpmath>=0.19 in c:\\users\\coren\\anaconda3\\lib\\site-packages (from sympy->torch>=1.13->stable-baselines3) (1.2.1)\n",
" obs, reward, done, info = vec_env.step(action)\n",
" #vec_env.render(\"human\")\n",
" # VecEnv resets automatically\n",
" # if done:\n",
" # obs = vec_env.reset()\n",
"env.close()"
]
},
{
"cell_type": "markdown",
"id": "70d3cc66",
"metadata": {},
"source": [
"#### Usage\n",
"\n",
"Use the [Stable-Baselines3 documentation](https://stable-baselines3.readthedocs.io/en/master/) to implement the code to solve the CartPole environment with the Advantage Actor-Critic (A2C) algorithm.\n",
"\n",
"\n",
"> 🛠 **To be handed in**\n",
"> Store the code in `a2c_sb3_cartpole.py`. Unless otherwise stated, you'll work upon this file for the next sections.\n",
"\n",
"### Get familiar with Hugging Face Hub\n",
"\n",
"Hugging Face Hub is a platform for easy sharing and versioning of trained machine learning models. With Hugging Face Hub, you can quickly and easily share your models with others and make them usable through the API. For example, see the trained A2C agent for CartPole: https://huggingface.co/sb3/a2c-CartPole-v1. Hugging Face Hub provides an API to download and upload SB3 models.\n",
"\n",
"#### Installation of `huggingface_sb3`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cd890835",
"metadata": {},
"outputs": [],
"source": [
"pip install huggingface-sb3==2.3.1"
]
},
{
"cell_type": "markdown",
"id": "0d0bef5b",
"metadata": {},
"source": [
"#### Upload the model on the Hub\n",
"\n",
"Follow the [Hugging Face Hub documentation](https://huggingface.co/docs/hub/stable-baselines3) to upload the previously learned model to the Hub.\n",
"\n",
"> 🛠 **To be handed in**\n",
"> Link the trained model in the `README.md` file.\n",
"\n",
"> 📝 **Note**\n",
"> [RL-Zoo3](https://stable-baselines3.readthedocs.io/en/master/guide/rl_zoo.html) provides more advanced features to save hyperparameters, generate renderings and metrics. Feel free to try them.\n",
"\n",
"### Get familiar with Weights & Biases\n",
"\n",
"Weights & Biases (W&B) is a tool for machine learning experiment management. With W&B, you can track and compare your experiments, visualize your model training and performance.\n",
"\n",
"#### Installation\n",
"\n",
"You'll need to install both `wand` and `tensorboar`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6645c23a",
"metadata": {},
"outputs": [],
"source": [
"pip install wandb tensorboard"
]
},
{
"cell_type": "markdown",
"id": "9af0d167",
"metadata": {},
"source": [
"Use the documentation of [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) and [Weights & Biases](https://docs.wandb.ai/guides/integrations/stable-baselines-3) to track the CartPole training. Make the run public.\n",
"\n",
"🛠 Share the link of the wandb run in the `README.md` file.\n",
"\n",
"> ⚠️ **Warning**\n",
"> Make sure to make the run public!\n",
"\n",
"### Full workflow with panda-gym\n",
"\n",
"[Panda-gym](https://github.com/qgallouedec/panda-gym) is a collection of environments for robotic simulation and control. It provides a range of challenges for training robotic agents in a simulated environment. In this section, you will get familiar with one of the environments provided by panda-gym, the `PandaReachJointsDense-v3`. The objective is to learn how to reach any point in 3D space by directly controlling the robot's articulations.\n",
"\n",
"#### Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "897573a8",
"metadata": {},
"outputs": [],
"source": [
"pip install panda-gym==3.0.7"
]
},
{
"cell_type": "markdown",
"id": "443008be",
"metadata": {},
"source": [
"#### Train, track, and share\n",
"\n",
"Use the Stable-Baselines3 package to train A2C model on the `PandaReachJointsDense-v2` environment. 500k timesteps should be enough. Track the environment with Weights & Biases. Once the training is over, upload the trained model on the Hub.\n",
"\n",
"> 🛠 **To be handed in**\n",
"> Share all the code in `a2c_sb3_panda_reach.py`. Share the link of the wandb run and the trained model in the `README.md` file.\n",
"\n",
"## Contribute\n",
"\n",
"This tutorial may contain errors, inaccuracies, typos or areas for improvement. Feel free to contribute to its improvement by opening an issue.\n",
"\n",
"## Author\n",
"\n",
"Quentin Gallouédec\n",
"\n",
"Updates by Léo Schneider, Emmanuel Dellandréa\n",
"\n",
"## License\n",
"\n",
"MIT"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
%% Cell type:markdown id:e687aca8 tags:
GEREST CORENTIN
# TD 1 : Hands-On Reinforcement Learning
MSO 3.4 Apprentissage Automatique
In this hands-on project, we will first implement a simple RL algorithm and apply it to solve the CartPole-v1 environment. Once we become familiar with the basic workflow, we will learn to use various tools for machine learning model training, monitoring, and sharing, by applying these tools to train a robotic arm.
## To be handed in
This work must be done individually. The expected output is a repository named `hands-on-rl` on https://gitlab.ec-lyon.fr.
We assume that `git` is installed, and that you are familiar with the basic `git` commands. (Optionnaly, you can use GitHub Desktop.)
We also assume that you have access to the [ECL GitLab](https://gitlab.ec-lyon.fr/). If necessary, please consult [this tutorial](https://gitlab.ec-lyon.fr/edelland/inf_tc2/-/blob/main/Tutoriel_gitlab/tutoriel_gitlab.md).
Your repository must contain a `README.md` file that explains **briefly** the successive steps of the project. It must be private, so you need to add your teacher as "developer" member.
Throughout the subject, you will find a 🛠 symbol indicating that a specific production is expected.
The last commit is due before 11:59 pm on March 5, 2024. Subsequent commits will not be considered.
> ⚠️ **Warning**
> Ensure that you only commit the files that are requested. For example, your directory should not contain the generated `.zip` files, nor the `runs` folder... At the end, your repository must contain one `README.md`, three python scripts, and optionally image files for the plots.
## Before you start
Make sure you know the basics of Reinforcement Learning. In case of need, you can refer to the [introduction of the Hugging Face RL course](https://huggingface.co/blog/deep-rl-intro).
## Introduction to Gym
[Gym](https://gymnasium.farama.org/) is a framework for developing and evaluating reinforcement learning environments. It offers various environments, including classic control and toy text scenarios, to test RL algorithms.
### Installation
We recommend to use Python virtual environnements to install the required modules : https://docs.python.org/3/library/venv.html
Requirement already satisfied: numpy==1.25.0 in c:\users\coren\anaconda3\lib\site-packages (1.25.0)
Note: you may need to restart the kernel to use updated packages.
%% Cell type:code id:bb4d7c39 tags:
``` python
pipinstallgym==0.26.2
```
%% Output
Requirement already satisfied: gym==0.26.2 in c:\users\coren\anaconda3\lib\site-packages (0.26.2)
Requirement already satisfied: importlib-metadata>=4.8.0 in c:\users\coren\anaconda3\lib\site-packages (from gym==0.26.2) (4.8.1)
Requirement already satisfied: numpy>=1.18.0 in c:\users\coren\anaconda3\lib\site-packages (from gym==0.26.2) (1.25.0)
Requirement already satisfied: cloudpickle>=1.2.0 in c:\users\coren\anaconda3\lib\site-packages (from gym==0.26.2) (2.0.0)
Requirement already satisfied: gym-notices>=0.0.4 in c:\users\coren\anaconda3\lib\site-packages (from gym==0.26.2) (0.0.8)
Requirement already satisfied: zipp>=0.5 in c:\users\coren\anaconda3\lib\site-packages (from importlib-metadata>=4.8.0->gym==0.26.2) (3.6.0)
Note: you may need to restart the kernel to use updated packages.
%% Cell type:markdown id:c4339dd2 tags:
Install also pyglet for the rendering.
%% Cell type:code id:ae74426f tags:
``` python
pipinstallpyglet==2.0.10
```
%% Output
Requirement already satisfied: pyglet==2.0.10 in c:\users\coren\anaconda3\lib\site-packages (2.0.10)
Note: you may need to restart the kernel to use updated packages.
%% Cell type:markdown id:d6a2d90b tags:
If needed
%% Cell type:code id:712fb75a tags:
``` python
pipinstallpygame==2.5.2
```
%% Output
Requirement already satisfied: pygame==2.5.2 in c:\users\coren\anaconda3\lib\site-packages (2.5.2)
Note: you may need to restart the kernel to use updated packages.
%% Cell type:code id:3cdd7bcc tags:
``` python
pipinstallPyQt5
```
%% Output
Requirement already satisfied: PyQt5 in c:\users\coren\anaconda3\lib\site-packages (5.15.10)
Requirement already satisfied: PyQt5-Qt5>=5.15.2 in c:\users\coren\anaconda3\lib\site-packages (from PyQt5) (5.15.2)
Note: you may need to restart the kernel to use updated packages.Requirement already satisfied: PyQt5-sip<13,>=12.13 in c:\users\coren\anaconda3\lib\site-packages (from PyQt5) (12.13.0)
%% Cell type:markdown id:79581e82 tags:
### Usage
Here is an example of how to use Gym to solve the `CartPole-v1` environment [Documentation](https://gymnasium.farama.org/environments/classic_control/cart_pole/):
%% Cell type:code id:800853bd tags:
``` python
importgym
# Create the environment
env=gym.make("CartPole-v1",render_mode="human")
# Reset the environment and get the initial observation
observation=env.reset()
for_inrange(1000):
# Select a random action from the action space
action=env.action_space.sample()
# Apply the action to the environment
# Returns next observation, reward, done signal (indicating
# if the episode has ended), and an additional info dictionary
# Render the environment to visualize the agent's behavior
env.render()
ifterminated:
# Terminated before max step
break
env.close()
```
%% Cell type:markdown id:fea5e4cf tags:
## REINFORCE
The REINFORCE algorithm (also known as Vanilla Policy Gradient) is a policy gradient method that optimizes the policy directly using gradient descent. The following is the pseudocode of the REINFORCE algorithm:
```txt
Setup the CartPole environment
Setup the agent as a simple neural network with:
- One fully connected layer with 128 units and ReLU activation followed by a dropout layer
- One fully connected layer followed by softmax activation
Repeat 500 times:
Reset the environment
Reset the buffer
Repeat until the end of the episode:
Compute action probabilities
Sample the action based on the probabilities and store its probability in the buffer
Step the environment with the action
Compute and store in the buffer the return using gamma=0.99
Normalize the return
Compute the policy loss as -sum(log(prob) * return)
Update the policy using an Adam optimizer and a learning rate of 5e-3
```
To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/introduction).
> 🛠 **To be handed in**
> Use PyTorch to implement REINFORCE and solve the CartPole environement. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward accross episodes in the `README.md`.
## Familiarization with a complete RL pipeline: Application to training a robotic arm
In this section, you will use the Stable-Baselines3 package to train a robotic arm using RL. You'll get familiar with several widely-used tools for training, monitoring and sharing machine learning models.
### Get familiar with Stable-Baselines3
Stable-Baselines3 (SB3) is a high-level RL library that provides various algorithms and integrated tools to easily train and test reinforcement learning models.
#### Installation
%% Cell type:code id:7b5d4e63 tags:
``` python
!pipinstallstable-baselines3--user
!pipinstallmoviepy
```
%% Output
Collecting stable-baselines3
Using cached stable_baselines3-2.2.1-py3-none-any.whl (181 kB)
Requirement already satisfied: matplotlib in c:\users\coren\anaconda3\lib\site-packages (from stable-baselines3) (3.4.3)
Requirement already satisfied: numpy>=1.20 in c:\users\coren\anaconda3\lib\site-packages (from stable-baselines3) (1.26.4)
Requirement already satisfied: pandas in c:\users\coren\anaconda3\lib\site-packages (from stable-baselines3) (1.3.4)
Requirement already satisfied: torch>=1.13 in c:\users\coren\anaconda3\lib\site-packages (from stable-baselines3) (2.1.1)
Requirement already satisfied: cloudpickle in c:\users\coren\anaconda3\lib\site-packages (from stable-baselines3) (2.0.0)
Collecting gymnasium<0.30,>=0.28.1
Using cached gymnasium-0.29.1-py3-none-any.whl (953 kB)
Collecting farama-notifications>=0.0.1
Using cached Farama_Notifications-0.0.4-py3-none-any.whl (2.5 kB)
Requirement already satisfied: importlib-metadata>=4.8.0 in c:\users\coren\anaconda3\lib\site-packages (from gymnasium<0.30,>=0.28.1->stable-baselines3) (4.8.1)
Requirement already satisfied: typing-extensions>=4.3.0 in c:\users\coren\anaconda3\lib\site-packages (from gymnasium<0.30,>=0.28.1->stable-baselines3) (4.4.0)
Requirement already satisfied: zipp>=0.5 in c:\users\coren\anaconda3\lib\site-packages (from importlib-metadata>=4.8.0->gymnasium<0.30,>=0.28.1->stable-baselines3) (3.6.0)
Requirement already satisfied: jinja2 in c:\users\coren\anaconda3\lib\site-packages (from torch>=1.13->stable-baselines3) (2.11.3)
Requirement already satisfied: sympy in c:\users\coren\anaconda3\lib\site-packages (from torch>=1.13->stable-baselines3) (1.9)
Requirement already satisfied: fsspec in c:\users\coren\anaconda3\lib\site-packages (from torch>=1.13->stable-baselines3) (2021.10.1)
Requirement already satisfied: networkx in c:\users\coren\anaconda3\lib\site-packages (from torch>=1.13->stable-baselines3) (2.6.3)
Requirement already satisfied: filelock in c:\users\coren\anaconda3\lib\site-packages (from torch>=1.13->stable-baselines3) (3.3.1)
Requirement already satisfied: MarkupSafe>=0.23 in c:\users\coren\anaconda3\lib\site-packages (from jinja2->torch>=1.13->stable-baselines3) (1.1.1)
Requirement already satisfied: python-dateutil>=2.7 in c:\users\coren\anaconda3\lib\site-packages (from matplotlib->stable-baselines3) (2.8.2)
Requirement already satisfied: pillow>=6.2.0 in c:\users\coren\anaconda3\lib\site-packages (from matplotlib->stable-baselines3) (8.4.0)
Requirement already satisfied: pyparsing>=2.2.1 in c:\users\coren\anaconda3\lib\site-packages (from matplotlib->stable-baselines3) (3.0.4)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\coren\anaconda3\lib\site-packages (from matplotlib->stable-baselines3) (1.3.1)
Requirement already satisfied: cycler>=0.10 in c:\users\coren\anaconda3\lib\site-packages (from matplotlib->stable-baselines3) (0.10.0)
Requirement already satisfied: six in c:\users\coren\anaconda3\lib\site-packages (from cycler>=0.10->matplotlib->stable-baselines3) (1.16.0)
Requirement already satisfied: pytz>=2017.3 in c:\users\coren\anaconda3\lib\site-packages (from pandas->stable-baselines3) (2021.3)
Requirement already satisfied: mpmath>=0.19 in c:\users\coren\anaconda3\lib\site-packages (from sympy->torch>=1.13->stable-baselines3) (1.2.1)
Use the [Stable-Baselines3 documentation](https://stable-baselines3.readthedocs.io/en/master/) to implement the code to solve the CartPole environment with the Advantage Actor-Critic (A2C) algorithm.
> 🛠 **To be handed in**
> Store the code in `a2c_sb3_cartpole.py`. Unless otherwise stated, you'll work upon this file for the next sections.
### Get familiar with Hugging Face Hub
Hugging Face Hub is a platform for easy sharing and versioning of trained machine learning models. With Hugging Face Hub, you can quickly and easily share your models with others and make them usable through the API. For example, see the trained A2C agent for CartPole: https://huggingface.co/sb3/a2c-CartPole-v1. Hugging Face Hub provides an API to download and upload SB3 models.
#### Installation of `huggingface_sb3`
%% Cell type:code id:cd890835 tags:
``` python
pipinstallhuggingface-sb3==2.3.1
```
%% Cell type:markdown id:0d0bef5b tags:
#### Upload the model on the Hub
Follow the [Hugging Face Hub documentation](https://huggingface.co/docs/hub/stable-baselines3) to upload the previously learned model to the Hub.
> 🛠 **To be handed in**
> Link the trained model in the `README.md` file.
> 📝 **Note**
> [RL-Zoo3](https://stable-baselines3.readthedocs.io/en/master/guide/rl_zoo.html) provides more advanced features to save hyperparameters, generate renderings and metrics. Feel free to try them.
### Get familiar with Weights & Biases
Weights & Biases (W&B) is a tool for machine learning experiment management. With W&B, you can track and compare your experiments, visualize your model training and performance.
#### Installation
You'll need to install both `wand` and `tensorboar`.
%% Cell type:code id:6645c23a tags:
``` python
pipinstallwandbtensorboard
```
%% Cell type:markdown id:9af0d167 tags:
Use the documentation of [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) and [Weights & Biases](https://docs.wandb.ai/guides/integrations/stable-baselines-3) to track the CartPole training. Make the run public.
🛠 Share the link of the wandb run in the `README.md` file.
> ⚠️ **Warning**
> Make sure to make the run public!
### Full workflow with panda-gym
[Panda-gym](https://github.com/qgallouedec/panda-gym) is a collection of environments for robotic simulation and control. It provides a range of challenges for training robotic agents in a simulated environment. In this section, you will get familiar with one of the environments provided by panda-gym, the `PandaReachJointsDense-v3`. The objective is to learn how to reach any point in 3D space by directly controlling the robot's articulations.
#### Installation
%% Cell type:code id:897573a8 tags:
``` python
pipinstallpanda-gym==3.0.7
```
%% Cell type:markdown id:443008be tags:
#### Train, track, and share
Use the Stable-Baselines3 package to train A2C model on the `PandaReachJointsDense-v2` environment. 500k timesteps should be enough. Track the environment with Weights & Biases. Once the training is over, upload the trained model on the Hub.
> 🛠 **To be handed in**
> Share all the code in `a2c_sb3_panda_reach.py`. Share the link of the wandb run and the trained model in the `README.md` file.
## Contribute
This tutorial may contain errors, inaccuracies, typos or areas for improvement. Feel free to contribute to its improvement by opening an issue.