...
## To be handed in
This work must be done individually. The expected output is a repository named `hands-on-rl` on https://gitlab.ec-lyon.fr.
We assume that `git` is installed and that you are familiar with the basic `git` commands. (Optionally, you can use GitHub Desktop.)
We also assume that you have access to the [ECL GitLab](https://gitlab.ec-lyon.fr/). If necessary, please consult [this tutorial](https://gitlab.ec-lyon.fr/edelland/inf_tc2/-/blob/main/Tutoriel_gitlab/tutoriel_gitlab.md).
Your repository must contain a `README.md` file that explains **briefly** the successive steps of the project. Throughout the subject, you will find a 🛠 symbol indicating that a specific production is expected.
The last commit is due before 11:59 pm on March 5, 2024. Subsequent commits will not be considered.
> ⚠️ **Warning**
> Ensure that you only commit the files that are requested. For example, your directory should not contain the generated `.zip` files, nor the `runs` folder. At the end, your repository must contain one `README.md` file, three Python scripts, and optionally image files for the plots.
...
## Introduction to Gym
[Gym](https://gymnasium.farama.org/) is a framework for developing and evaluating reinforcement learning environments. It offers various environments, including classic control and toy text scenarios, to test RL algorithms.
### Installation
We recommend using a Python [virtual environment](https://docs.python.org/3/library/venv.html) to install the required modules.
```sh
pip install gym==0.26.2
```
Also install `pyglet` for rendering.
```sh
pip install pyglet==2.0.10
```
If needed, you can also install:
```sh
pip install pygame==2.5.2
```
```sh
pip install PyQt5
```
### Usage
Here is an example of how to use Gym to solve the `CartPole-v1` environment ([documentation](https://gymnasium.farama.org/environments/classic_control/cart_pole/)):
```python
import gym

# Create the environment
env = gym.make("CartPole-v1", render_mode="human")

# Reset the environment and get the initial observation (and info dictionary)
observation, info = env.reset()
for _ in range(100):
    # Select a random action from the action space
    action = env.action_space.sample()
    # Apply the action to the environment
    # Returns the next observation, the reward, terminated and truncated
    # flags (indicating if the episode has ended), and an info dictionary
    observation, reward, terminated, truncated, info = env.step(action)
    # Render the environment to visualize the agent's behavior
    env.render()
    if terminated:
        # Terminated before max step
        break

env.close()
```
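Note that since gym 0.26, `env.step` returns two separate end-of-episode flags: `terminated` (the task itself ended, e.g. the pole fell) and `truncated` (a time limit cut the episode short). The loop pattern can be illustrated with a minimal stand-in environment; `ToyEnv` below is a hypothetical illustration, not part of Gym:

```python
class ToyEnv:
    """Hypothetical stand-in that mimics the gym 0.26 step API."""

    def __init__(self, fail_at=3):
        self.fail_at = fail_at  # step at which the episode "fails"
        self.t = 0

    def reset(self):
        self.t = 0
        return 0, {}  # observation, info

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.fail_at  # task-level end (e.g. pole fell)
        truncated = False                    # no time limit in this toy example
        return self.t, 1.0, terminated, truncated, {}


env = ToyEnv()
observation, info = env.reset()
steps = 0
while True:
    observation, reward, terminated, truncated, info = env.step(0)
    steps += 1
    if terminated or truncated:
        break
# The episode ends after `fail_at` steps, so steps == 3 here
```

In a real training loop, you would usually stop an episode on either flag, since both mean no further transitions follow.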
## REINFORCE
...
Update the policy using an Adam optimizer and a learning rate of 5e-3
```
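As a concrete illustration of one step a REINFORCE loop typically performs, the discounted returns of an episode can be computed by walking backwards through the collected rewards. The function name and the value of `gamma` below are illustrative assumptions, not fixed by the subject:

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for each step of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # accumulate future rewards, discounted
        returns.append(g)
    returns.reverse()  # restore chronological order
    return returns

# discounted_returns([1, 1, 1], gamma=0.5) == [1.75, 1.5, 1.0]
```

These returns are the weights applied to the log-probabilities of the chosen actions when forming the policy loss.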
To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/introduction).
> 🛠 **To be handed in**
> Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward across episodes in the `README.md`.
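As a possible starting point (not the required solution), the policy for CartPole can be a small network mapping the 4-dimensional observation to probabilities over the 2 actions; the hidden size and architecture below are assumptions:

```python
import torch
import torch.nn as nn

# Hypothetical policy network for CartPole: 4 observations -> 2 action probabilities
policy = nn.Sequential(
    nn.Linear(4, 128),
    nn.ReLU(),
    nn.Linear(128, 2),
    nn.Softmax(dim=-1),
)

obs = torch.zeros(1, 4)           # a dummy observation batch
probs = policy(obs)               # action probabilities, summing to 1
dist = torch.distributions.Categorical(probs)
action = dist.sample()            # sampled action (0 or 1)
log_prob = dist.log_prob(action)  # log-probability used in the REINFORCE loss
```

The stored `log_prob` values, weighted by the episode returns, form the loss that the Adam optimizer minimizes.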
...
```sh
pip install stable-baselines3
pip install moviepy
```
#### Usage
...
#### Installation of `huggingface_sb3`
```sh
pip install huggingface-sb3==2.3.1
```
#### Upload the model on the Hub
Follow the [Hugging Face Hub documentation](https://huggingface.co/docs/hub/stable-baselines3) to upload the previously learned model to the Hub.
> 🛠 **To be handed in**
> Link the trained model in the `README.md` file.
...
pip install wandb tensorboard
```
Use the documentation of [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) and [Weights & Biases](https://docs.wandb.ai/guides/integrations/stable-baselines-3) to track the CartPole training. Make the run public.
> 🛠 **To be handed in**
> Share the link of the wandb run in the `README.md` file.
...
#### Installation
```shell
pip install panda-gym==3.0.7
```
#### Train, track, and share
...