In this hands-on project, we will first implement a simple RL algorithm and apply it to ...
## To be handed in

This work must be done individually. The expected output is a repository named `hands-on-rl` on https://gitlab.ec-lyon.fr.

We assume that `git` is installed and that you are familiar with the basic `git` commands. (Optionally, you can use GitHub Desktop.)
We also assume that you have access to the [ECL GitLab](https://gitlab.ec-lyon.fr/). If necessary, please consult [this tutorial](https://gitlab.ec-lyon.fr/edelland/inf_tc2/-/blob/main/Tutoriel_gitlab/tutoriel_gitlab.md).

Your repository must contain a `README.md` file that explains **briefly** the successive steps of the project. Throughout the subject, you will find a 🛠 symbol indicating that a specific production is expected.

The last commit is due before 11:59 pm on March 5, 2024. Subsequent commits will not be considered.

> ⚠️ **Warning**
> Ensure that you only commit the files that are requested. For example, your directory should not contain the generated `.zip` files, nor the `runs` folder... At the end, your repository must contain one `README.md`, three Python scripts, and optionally image files for the plots.
Make sure you know the basics of Reinforcement Learning. In case of need, you can ...
## Introduction to Gym

[Gym](https://gymnasium.farama.org/) is a framework that provides a collection of reinforcement learning environments, including classic control and toy text scenarios, for developing and evaluating RL algorithms.
### Installation

We recommend using a Python virtual environment to install the required modules: https://docs.python.org/3/library/venv.html

```sh
pip install gym==0.26.2
```

Also install pyglet for rendering:

```sh
pip install pyglet==2.0.10
```

If needed:

```sh
pip install pygame==2.5.2
```

```sh
pip install PyQt5
```
### Usage

Here is an example of how to use Gym to solve the `CartPole-v1` environment ([documentation](https://gymnasium.farama.org/environments/classic_control/cart_pole/)):

```python
import gym

# Create the environment
env = gym.make("CartPole-v1", render_mode="human")

# Reset the environment and get the initial observation
observation, info = env.reset()

for _ in range(100):
    # Select a random action from the action space
    action = env.action_space.sample()

    # Apply the action to the environment
    # Returns the next observation, the reward, the terminated and truncated flags
    # (indicating if the episode has ended), and an additional info dictionary
    observation, reward, terminated, truncated, info = env.step(action)

    # Render the environment to visualize the agent's behavior
    env.render()

    if terminated:
        # Terminated before max step
        break

env.close()
```
## REINFORCE

...

```
Repeat 500 times:
    ...
    Update the policy using an Adam optimizer and a learning rate of 5e-3
```

To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/introduction).
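To make the update rule above concrete, here is a minimal, non-authoritative sketch of a REINFORCE training loop in PyTorch. The network architecture, the discount factor of 0.99 and the normalization of the returns are illustrative assumptions; your `reinforce_cartpole.py` should follow the full specification given in the subject.

```python
# Sketch only: network size, discount factor and return normalization are assumptions.
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)

for episode in range(500):
    observation, info = env.reset()
    log_probs, rewards = [], []
    terminated = truncated = False

    # Roll out one episode, storing log-probabilities and rewards
    while not (terminated or truncated):
        probs = policy(torch.as_tensor(observation, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        observation, reward, terminated, truncated, info = env.step(action.item())
        rewards.append(reward)

    # Compute the discounted return G_t for each step of the episode
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + 0.99 * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalization (assumption)

    # REINFORCE loss: minus the sum of log pi(a_t|s_t) weighted by the returns
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

env.close()
```

The essential idea is that the loss is the negative sum of the log-probabilities of the chosen actions weighted by their returns, so the optimizer increases the probability of actions that led to high rewards.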
> 🛠 **To be handed in**
> Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward across episodes in the `README.md`.
Stable-Baselines3 (SB3) is a high-level RL library that provides various algorithms ...
```sh
pip install stable-baselines3
pip install moviepy
```

#### Usage
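The usage example of the original subject is not reproduced in this excerpt. As a hedged illustration of the SB3 API, training an agent on CartPole can look like the sketch below; the choice of A2C, the timestep budget and the file name are assumptions made for the example.

```python
# Sketch only: algorithm choice (A2C), timestep budget and file name are assumptions.
from stable_baselines3 import A2C

# Passing the environment id lets SB3 create and wrap the environment itself
model = A2C("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)
model.save("a2c_cartpole")  # hypothetical file name for the saved model
```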
Hugging Face Hub is a platform for easy sharing and versioning of trained machine learning models. ...
#### Installation of `huggingface_sb3`

```sh
pip install huggingface-sb3==2.3.1
```

#### Upload the model to the Hub

Follow the [Hugging Face Hub documentation](https://huggingface.co/docs/hub/stable-baselines3) to upload the previously learned model to the Hub.
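As a hedged illustration, `huggingface_sb3` provides a `push_to_hub` helper that uploads a saved model archive once you have logged in with `huggingface-cli login`; the repository id and file name below are placeholders, not values prescribed by the subject.

```python
# Sketch only: the repo id and file name are placeholders.
from huggingface_sb3 import push_to_hub

push_to_hub(
    repo_id="your-username/a2c-CartPole-v1",   # hypothetical Hub repository
    filename="a2c_cartpole.zip",               # archive produced by model.save()
    commit_message="Add A2C agent trained on CartPole-v1",
)
```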
> 🛠 **To be handed in**
> Link the trained model in the `README.md` file.
You'll need to install both `wandb` and `tensorboard`.

```sh
pip install wandb tensorboard
```

Use the documentation of [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) and [Weights & Biases](https://docs.wandb.ai/guides/integrations/stable-baselines-3) to track the CartPole training. Make the run public.
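As a hedged sketch, the W&B integration for SB3 exposes a `WandbCallback` that can be passed to `model.learn`; the project name and timestep budget below are placeholders.

```python
# Sketch only: project name and timestep budget are placeholders.
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import A2C

run = wandb.init(project="hands-on-rl", sync_tensorboard=True)  # sync SB3's TensorBoard logs to W&B
model = A2C("MlpPolicy", "CartPole-v1", verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=10_000, callback=WandbCallback())
run.finish()
```

Remember not to commit the generated `runs` folder (see the warning above).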
🛠 Share the link of the wandb run in the `README.md` file.
Panda-gym is a collection of environments for robotic simulation and control. It ...
#### Installation

```shell
pip install panda-gym==3.0.7
```

#### Train, track, and share
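The environment used in the subject is not visible in this excerpt. As a hedged sketch, assuming one of the dense-reward reach tasks of panda-gym 3.x (which exposes Gymnasium environments with dictionary observations), training an SB3 agent could look like this; the environment id, the algorithm (DDPG) and the timestep budget are assumptions.

```python
# Sketch only: environment id, algorithm and timestep budget are assumptions.
import gymnasium as gym
import panda_gym  # noqa: F401 -- importing registers the Panda environments
from stable_baselines3 import DDPG

env = gym.make("PandaReachDense-v3")              # hypothetical choice of task
model = DDPG("MultiInputPolicy", env, verbose=1)  # dict observations require MultiInputPolicy
model.learn(total_timesteps=10_000)
model.save("ddpg_panda_reach")
env.close()
```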
This tutorial may contain errors, inaccuracies, typos or areas for improvement. ...

## Author

Quentin Gallouédec

Updates by Léo Schneider, Emmanuel Dellandréa

## License