Compare revisions

Changes are shown as if the source revision were being merged into the target revision.

Source

Select target project
No results found
Select Git revision
  • main
1 result

Target

Select target project
  • loestrei/mso_3_4-td1
  • edelland/mso_3_4-td1
  • schneidl/mso_3_4-td1
  • epaganel/mso_3_4-td1
  • asennevi/armand-senneville-mso-3-4-td-1
  • hchauvin/mso_3_4-td1
  • mbabay/mso_3_4-td1
  • ochaufou/mso_3_4-td1
  • cgerest/hands-on-rl
  • robertr/mso_3_4-td1
  • kmajdi/mso_3_4-td1
  • jseksik/hands-on-rl
  • coulonj/mso_3_4-td1
  • tdesgreys/mso_3_4-td1
14 results
Select Git revision
  • main
1 result
Show changes
Commits on Source (7)
@@ -17,7 +17,7 @@ Your repository must contain a `README.md` file that explains **briefly** the su
Throughout the subject, you will find a 🛠 symbol indicating that a specific deliverable is expected.
- The last commit is due before 11:59 pm on March 5, 2024. Subsequent commits will not be considered.
+ The last commit is due before 11:59 pm on March 17, 2025. Subsequent commits will not be considered.
> ⚠️ **Warning**
> Ensure that you commit only the files that are requested. For example, your directory should not contain the generated `.zip` files or the `runs` folder. In the end, your repository must contain one `README.md`, three Python scripts, and optionally image files for the plots.
@@ -32,7 +32,7 @@ Make sure you know the basics of Reinforcement Learning. In case of need, you ca
### Installation
- We recommend to use Python virtual environnements to install the required modules : https://docs.python.org/3/library/venv.html
+ We recommend using Python virtual environments to install the required modules: https://docs.python.org/3/library/venv.html, or https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html if you are using conda.
First, install PyTorch: https://pytorch.org/get-started/locally.
@@ -40,23 +40,11 @@ Then install the following modules :
```sh
- pip install gym==0.26.2
+ pip install gymnasium
```
- Install also pyglet for the rendering.
- ```sh
- pip install pyglet==2.0.10
- ```
- If needed
- ```sh
- pip install pygame==2.5.2
- ```
- ```sh
- pip install PyQt5
+ pip install "gymnasium[classic-control]"
```
@@ -65,7 +53,7 @@ pip install PyQt5
Here is an example of how to use Gym to solve the `CartPole-v1` environment [Documentation](https://gymnasium.farama.org/environments/classic_control/cart_pole/):
```python
- import gym
+ import gymnasium as gym
# Create the environment
env = gym.make("CartPole-v1", render_mode="human")
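# --- Editor's sketch: the rest of this example is not shown in this diff view. ---
# The loop below is an illustrative continuation using the standard Gymnasium API
# (random actions for a fixed number of steps); it is not the course's exact code.

# Reset the environment to get the first observation
observation, info = env.reset()

for _ in range(200):
    # Pick a random action (a trained policy would be queried here instead)
    action = env.action_space.sample()

    # Apply the action; with render_mode="human" the window is updated automatically
    observation, reward, terminated, truncated, info = env.step(action)

    # Start a new episode when the current one ends
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```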
@@ -109,12 +97,21 @@ Repeat 500 times:
Normalize the return
Compute the policy loss as -sum(log(prob) * return)
Update the policy using an Adam optimizer and a learning rate of 5e-3
Save the model weights
```
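For orientation, here is a minimal PyTorch sketch of such a training loop. It is not the reference solution: the network architecture, the discount factor, and the single-episode update are assumptions layered on top of the pseudo-code above.

```python
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
# Simple policy network (architecture is an assumption, not a requirement)
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns computed backwards (gamma = 0.99 is an assumption), then normalized
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + 0.99 * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Policy loss: -sum(log(prob) * return), followed by one Adam step
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Save the learned weights
torch.save(policy.state_dict(), "reinforce_cartpole.pth")
```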
- To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/introduction).
+ To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/policy-gradient).
> 🛠 **To be handed in**
- > Use PyTorch to implement REINFORCE and solve the CartPole environement. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward accross episodes in the `README.md`.
+ > Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward across episodes in the `README.md`. Also, share a file `reinforce_cartpole.pth` containing the learned weights. For saving and loading PyTorch models, check [this tutorial](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference).
+ ## Model Evaluation
+ Now that you have trained your model, it is time to evaluate its performance. Run it with rendering for a few trials and see if the policy is capable of completing the task.
+ > 🛠 **To be handed in**
+ > Implement a script that loads your saved model and uses it to solve the CartPole environment. Run 100 evaluations and share the final success rate across all evaluations in the `README.md`. Share the code in `evaluate_reinforce_cartpole.py`.
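A possible shape for such an evaluation script is sketched below. The success criterion (an episode reaching the 500-step limit of `CartPole-v1`), the network architecture, and the file names are assumptions; adapt them to your own training code.

```python
import gymnasium as gym
import torch
import torch.nn as nn

# Rebuild the same architecture that was trained (sketch assumption: 4-128-2 MLP)
policy = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2), nn.Softmax(dim=-1))
policy.load_state_dict(torch.load("reinforce_cartpole.pth"))
policy.eval()

env = gym.make("CartPole-v1")
successes = 0
for _ in range(100):
    obs, _ = env.reset()
    episode_reward, done = 0.0, False
    while not done:
        with torch.no_grad():
            probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        action = int(torch.argmax(probs))  # greedy action at evaluation time
        obs, reward, terminated, truncated, _ = env.step(action)
        episode_reward += reward
        done = terminated or truncated
    # Assumed success criterion: the pole stayed up for the full 500 steps
    successes += episode_reward >= 500

print(f"Success rate: {successes}/100")
env.close()
```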
## Familiarization with a complete RL pipeline: Application to training a robotic arm
@@ -128,6 +125,7 @@ Stable-Baselines3 (SB3) is a high-level RL library that provides various algorit
```sh
- pip install stable-baselines3
+ pip install "stable-baselines3[extra]"
+ pip install moviepy
```
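To give an idea of how compact the SB3 workflow is, a minimal illustrative training snippet might look like the following; the choice of A2C, the timestep budget, and the file name are assumptions rather than prescribed settings.

```python
import gymnasium as gym
from stable_baselines3 import A2C

# Create the environment and train an A2C agent on it (budget is an assumption)
env = gym.make("CartPole-v1")
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)

# Save the trained agent; this produces a .zip file next to the script
model.save("a2c_cartpole")
```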
@@ -146,7 +144,7 @@ Hugging Face Hub is a platform for easy sharing and versioning of trained machin
#### Installation of `huggingface_sb3`
```sh
- pip install huggingface-sb3==2.3.1
+ pip install huggingface-sb3
```
#### Upload the model on the Hub
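The upload instructions themselves are not visible in this diff. As a rough illustration, pushing a saved SB3 model with `huggingface_sb3` can look like the sketch below; the repository id and file name are placeholders, and it assumes you are already authenticated (e.g., via `huggingface-cli login`).

```python
from huggingface_sb3 import push_to_hub

# Upload a previously saved SB3 model file to a (placeholder) Hub repository
push_to_hub(
    repo_id="your-username/a2c-cartpole",  # placeholder repository id
    filename="a2c_cartpole.zip",           # file produced by model.save(...)
    commit_message="Add A2C CartPole model",
)
```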
@@ -165,10 +163,10 @@ Weights & Biases (W&B) is a tool for machine learning experiment management. Wit
#### Installation
- You'll need to install both `wand` and `tensorboar`.
+ You'll need to install `wandb`.
```shell
- pip install wandb tensorboard
+ pip install wandb
```
Use the documentation of [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) and [Weights & Biases](https://docs.wandb.ai/guides/integrations/stable-baselines-3) to track the CartPole training. Make the run public.
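As a sketch of what this integration can look like (the project name and timestep budget are placeholders, and `tensorboard` is assumed to be available, e.g., from `stable-baselines3[extra]`):

```python
import gymnasium as gym
import wandb
from stable_baselines3 import A2C
from wandb.integration.sb3 import WandbCallback

# Start a W&B run; sync_tensorboard forwards SB3's TensorBoard logs to W&B
run = wandb.init(project="cartpole-a2c", sync_tensorboard=True)

model = A2C("MlpPolicy", gym.make("CartPole-v1"), verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=100_000, callback=WandbCallback())

run.finish()
```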
@@ -176,7 +174,7 @@ Use the documentation of [Stable-Baselines3](https://stable-baselines3.readthedo
🛠 Share the link of the wandb run in the `README.md` file.
> ⚠️ **Warning**
- > Make sure to make the run public!
+ > Make sure to make the run public! If that is not possible (due to restrictions on your account), you can create a WandB [report](https://docs.wandb.ai/guides/reports/create-a-report/), add all relevant graphs and any textual descriptions or explanations you find pertinent, then export it as a PDF (landscape format) and upload it along with the code to GitLab. Arrange the plots so that they remain readable in the PDF (e.g., one graph per row, correct axes, etc.), and specify which report corresponds to which experiment.
### Full workflow with panda-gym
@@ -190,7 +188,7 @@ pip install panda-gym==3.0.7
#### Train, track, and share
- Use the Stable-Baselines3 package to train A2C model on the `PandaReachJointsDense-v2` environment. 500k timesteps should be enough. Track the environment with Weights & Biases. Once the training is over, upload the trained model on the Hub.
+ Use the Stable-Baselines3 package to train an A2C model on the `PandaReachJointsDense-v3` environment. 500k timesteps should be enough. Track the training with Weights & Biases. Once the training is over, upload the trained model to the Hub.
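Putting the pieces together, one possible (illustrative) shape for this step is sketched below; the policy type, the project name, and the file name are assumptions to adapt.

```python
import gymnasium as gym
import panda_gym  # noqa: F401  (importing registers the Panda environments)
import wandb
from stable_baselines3 import A2C
from wandb.integration.sb3 import WandbCallback

run = wandb.init(project="panda-reach-a2c", sync_tensorboard=True)

# Dict observations (achieved/desired goals) require a multi-input policy
env = gym.make("PandaReachJointsDense-v3")
model = A2C("MultiInputPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=500_000, callback=WandbCallback())

# Save locally; the resulting .zip can then be pushed to the Hub as shown earlier
model.save("a2c_panda_reach")
run.finish()
```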
> 🛠 **To be handed in**
> Share all the code in `a2c_sb3_panda_reach.py`. Share the link of the wandb run and the trained model in the `README.md` file.
@@ -203,7 +201,7 @@ This tutorial may contain errors, inaccuracies, typos or areas for improvement.
Quentin Gallouédec
- Updates by Léo Schneider, Emmanuel Dellandréa
+ Updates by Bruno Machado, Léo Schneider, Emmanuel Dellandréa
## License