# TD 1 : Hands-On Reinforcement Learning
MSO 3.4 Apprentissage Automatique
In this hands-on project, we will first implement a simple RL algorithm and apply it to solve the CartPole-v1 environment. Once we become familiar with the basic workflow, we will learn to use various tools for machine learning model training, monitoring, and sharing, by applying these tools to train a robotic arm.
## To be handed in
This work must be done individually. The expected output is a repository named `hands-on-rl` on https://gitlab.ec-lyon.fr.
We assume that `git` is installed, and that you are familiar with the basic `git` commands. (Optionally, you can use GitHub Desktop.)
We also assume that you have access to the [ECL GitLab](https://gitlab.ec-lyon.fr/). If necessary, please consult [this tutorial](https://gitlab.ec-lyon.fr/edelland/inf_tc2/-/blob/main/Tutoriel_gitlab/tutoriel_gitlab.md).
Your repository must contain a `README.md` file that explains **briefly** the successive steps of the project. It must be private, so you need to add your teacher as a "developer" member.
Throughout the subject, you will find a 🛠 symbol indicating that a specific production is expected.
The last commit is due before 11:59 pm on March 17, 2025. Subsequent commits will not be considered.
> ⚠️ **Warning**
> Ensure that you only commit the files that are requested. For example, your directory should not contain the generated `.zip` files, nor the `runs` folder... At the end, your repository must contain one `README.md`, four Python scripts, the saved model weights, and optionally image files for the plots.
## Before you start
Make sure you know the basics of Reinforcement Learning. In case of need, you can refer to the [introduction of the Hugging Face RL course](https://huggingface.co/blog/deep-rl-intro).
## Introduction to Gym
[Gym](https://gymnasium.farama.org/) is a framework for developing and evaluating reinforcement learning agents. It offers various environments, including classic control and toy text scenarios, to test RL algorithms.
### Installation
We recommend using Python virtual environments to install the required modules: https://docs.python.org/3/library/venv.html, or https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html if you are using conda.
First, install PyTorch: https://pytorch.org/get-started/locally.
Then, install the following modules:
```sh
pip install "gymnasium[classic-control]"
```
### Usage
Here is an example of how to use Gym to run the `CartPole-v1` environment ([documentation](https://gymnasium.farama.org/environments/classic_control/cart_pole/)):
```python
import gymnasium as gym

# Create the environment
env = gym.make("CartPole-v1", render_mode="human")

# Reset the environment and get the initial observation
observation, info = env.reset()

for _ in range(100):
    # Select a random action from the action space
    action = env.action_space.sample()
    # Apply the action to the environment
    # Returns next observation, reward, terminated and truncated flags
    # (indicating if the episode has ended), and an additional info dictionary
    observation, reward, terminated, truncated, info = env.step(action)
    # Render the environment to visualize the agent's behavior
    env.render()
    if terminated:
        # Terminated before max step
        break
env.close()
```
## REINFORCE
```
Repeat 500 times:
    Normalize the return
    Compute the policy loss as -sum(log(prob) * return)
    Update the policy using an Adam optimizer and a learning rate of 5e-3
Save the model weights
```
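Below is a minimal PyTorch sketch of how this pseudocode could be turned into a training script. The policy architecture (a single hidden layer of 128 units), the discount factor, and the variable names are assumptions, not requirements; treat it as a starting point rather than a reference solution.

```python
# Minimal REINFORCE sketch (illustrative): rollout, normalized returns,
# the -sum(log(prob) * return) loss, and an Adam update with lr=5e-3.
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
# Assumed architecture: a single hidden layer of 128 units
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)

for episode in range(500):
    log_probs, rewards = [], []
    observation, info = env.reset()
    done = False
    while not done:
        probs = policy(torch.as_tensor(observation, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        observation, reward, terminated, truncated, info = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns (gamma=0.99 is an assumption)
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + 0.99 * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    # Normalize the return
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Policy loss: -sum(log(prob) * return)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Save the model weights
torch.save(policy.state_dict(), "reinforce_cartpole.pth")
```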
To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/policy-gradient).
> 🛠 **To be handed in**
> Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward across episodes in the `README.md`. Also, share a file `reinforce_cartpole.pth` containing the learned weights. For saving and loading PyTorch models, check [this tutorial](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference).
## Model Evaluation
Now that you have trained your model, it is time to evaluate its performance. Run it with rendering for a few trials and see if the policy is capable of completing the task.
> 🛠 **To be handed in**
> Implement a script which loads your saved model and uses it to solve the CartPole environment. Run 100 evaluations and share the final success rate across all evaluations in the `README.md`. Share the code in `evaluate_reinforce_cartpole.py`.
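A possible skeleton for this evaluation script is sketched below. The success criterion (reaching the maximum episode return of 500) is an assumption you may adapt, and the policy definition must match the one used during training.

```python
# Evaluation sketch: load the trained policy and estimate its success rate
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
# Must match the architecture used in reinforce_cartpole.py
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
policy.load_state_dict(torch.load("reinforce_cartpole.pth"))
policy.eval()

successes = 0
for _ in range(100):
    observation, info = env.reset()
    total_reward, done = 0.0, False
    while not done:
        with torch.no_grad():
            probs = policy(torch.as_tensor(observation, dtype=torch.float32))
        action = torch.argmax(probs).item()  # act greedily at evaluation time
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    # Assumed success criterion: the pole stays up for the whole episode
    if total_reward >= 500:
        successes += 1

print(f"Success rate: {successes}/100")
```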
## Familiarization with a complete RL pipeline: Application to training a robotic arm
### Get familiar with Stable-Baselines3
Stable-Baselines3 (SB3) is a high-level RL library that provides various algorithms and integrated tools to easily train and test reinforcement learning agents.
#### Installation
```sh
pip install "stable-baselines3[extra]"
pip install moviepy
```
> ⚠️ If you use zsh as a shell, you'll need to add extra quotes: `stable-baselines3"[extra]"`
#### Usage
Use the [Stable-Baselines3 documentation](https://stable-baselines3.readthedocs.io/en/master/) to implement the code to solve the CartPole environment with the Advantage Actor-Critic (A2C) algorithm.
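For reference, a minimal training script could look like the sketch below (the timestep budget is an arbitrary choice):

```python
# Minimal A2C training on CartPole with Stable-Baselines3 (timestep budget is an assumption)
import gymnasium as gym
from stable_baselines3 import A2C

env = gym.make("CartPole-v1")
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Save the trained agent so it can be uploaded to the Hub later
model.save("a2c_cartpole")
```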
> 🛠 **To be handed in**
> Store the code in `a2c_sb3_cartpole.py`. Unless otherwise stated, you'll build upon this file for the next sections.
### Get familiar with Hugging Face Hub
Hugging Face Hub is a platform for easy sharing and versioning of trained machine learning models. With Hugging Face Hub, you can quickly and easily share your models with others and make them usable through the API. For example, see the trained A2C agent for CartPole: https://huggingface.co/sb3/a2c-CartPole-v1. Hugging Face Hub provides an API to download and upload SB3 models.
#### Installation of `huggingface_sb3`
```sh
pip install huggingface-sb3
```
#### Upload the model on the Hub
Follow the [Hugging Face Hub documentation](https://huggingface.co/docs/hub/stable-baselines3) to upload the previously learned model to the Hub.
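For illustration, pushing the saved model with `huggingface_sb3` could look like the sketch below; the repository id and filename are placeholders (`package_to_hub` is an alternative that also generates a model card):

```python
# Sketch of a model upload to the Hugging Face Hub (repo id and filename are placeholders)
from huggingface_sb3 import push_to_hub

# Log in first, e.g. with `huggingface-cli login`
push_to_hub(
    repo_id="your-username/a2c-CartPole-v1",
    filename="a2c_cartpole.zip",
    commit_message="A2C agent trained on CartPole-v1",
)
```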
> 🛠 **To be handed in**
> Link the trained model in the `README.md` file.
> 📝 **Note**
> [RL-Zoo3](https://stable-baselines3.readthedocs.io/en/master/guide/rl_zoo.html) provides more advanced features to save hyperparameters, generate renderings and metrics. Feel free to try them.
### Get familiar with Weights & Biases
Weights & Biases (W&B) is a tool for machine learning experiment management. With W&B, you can track and compare your experiments, visualize your model training and performance.
#### Installation
You'll need to install `wandb`.
```shell
pip install wandb
```
Use the documentation of [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) and [Weights & Biases](https://docs.wandb.ai/guides/integrations/stable-baselines-3) to track the CartPole training. Make the run public.
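As a sketch, the integration could look like this (project name and timestep budget are placeholders):

```python
# Sketch of tracking an SB3 training run with Weights & Biases
import gymnasium as gym
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import A2C

run = wandb.init(project="a2c-cartpole", sync_tensorboard=True)

env = gym.make("CartPole-v1")
# Logging to TensorBoard lets W&B pick up the training metrics
model = A2C("MlpPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=10_000, callback=WandbCallback())

run.finish()
```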
> 🛠 **To be handed in**
> Share the link of the wandb run in the `README.md` file.
> ⚠️ **Warning**
> Make sure to make the run public! If it is not possible (due to the restrictions on your account), you can create a WandB [report](https://docs.wandb.ai/guides/reports/create-a-report/), add all relevant graphs and any textual descriptions or explanations you find pertinent, then download a PDF file (landscape format) and upload it along with the code to GitLab. Make sure to arrange the plots in a way that makes them understandable in the PDF (e.g., one graph per row, correct axes, etc.). Specify which report corresponds to which experiment.
### Full workflow with panda-gym
[Panda-gym](https://github.com/qgallouedec/panda-gym) is a collection of environments for robotic simulation and control. It provides a range of challenges for training robotic agents in a simulated environment. In this section, you will get familiar with one of the environments provided by panda-gym, the `PandaReachJointsDense-v3`. The objective is to learn how to reach any point in 3D space by directly controlling the robot's joints.
#### Installation
```shell
pip install panda-gym==3.0.7
```
#### Train, track, and share
Use the Stable-Baselines3 package to train an A2C model on the `PandaReachJointsDense-v3` environment. 500k timesteps should be enough. Track the training with Weights & Biases. Once the training is over, upload the trained model on the Hub.
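A minimal sketch of the training part is shown below; W&B tracking and the Hub upload can be added exactly as in the previous sections.

```python
# Sketch of A2C training on PandaReachJointsDense-v3 (tracking and upload omitted)
import gymnasium as gym
import panda_gym  # importing panda_gym registers the Panda environments
from stable_baselines3 import A2C

env = gym.make("PandaReachJointsDense-v3")
# The observation space is a dictionary, hence the MultiInputPolicy
model = A2C("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)
model.save("a2c_panda_reach")
```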
> 🛠 **To be handed in**
> Share all the code in `a2c_sb3_panda_reach.py`. Share the link of the wandb run and the trained model in the `README.md` file.
## Contribute
This tutorial may contain errors, inaccuracies, typos or areas for improvement. Feel free to open an issue or a merge request if you spot something that should be fixed.
## Author
Quentin Gallouédec
Updates by Bruno Machado, Léo Schneider, Emmanuel Dellandréa
## License
MIT