Compare revisions

Changes are shown as if the source revision were being merged into the target revision.

Source

Select target project
No results found
Select Git revision
  • main
1 result

Target

Select target project
  • loestrei/mso_3_4-td1
  • edelland/mso_3_4-td1
  • schneidl/mso_3_4-td1
  • epaganel/mso_3_4-td1
  • asennevi/armand-senneville-mso-3-4-td-1
  • hchauvin/mso_3_4-td1
  • mbabay/mso_3_4-td1
  • ochaufou/mso_3_4-td1
  • cgerest/hands-on-rl
  • robertr/mso_3_4-td1
  • kmajdi/mso_3_4-td1
  • jseksik/hands-on-rl
  • coulonj/mso_3_4-td1
  • tdesgreys/mso_3_4-td1
14 results
Select Git revision
  • main
1 result
Show changes
Commits on Source (7)
@@ -17,7 +17,7 @@ Your repository must contain a `README.md` file that explains **briefly** the su
Throughout the subject, you will find a 🛠 symbol indicating that a specific deliverable is expected.
- The last commit is due before 11:59 pm on March 5, 2024. Subsequent commits will not be considered.
+ The last commit is due before 11:59 pm on March 17, 2025. Subsequent commits will not be considered.
> ⚠️ **Warning**
> Ensure that you commit only the files that are requested. For example, your directory should not contain the generated `.zip` files or the `runs` folder. In the end, your repository must contain one `README.md`, three Python scripts, and optionally image files for the plots.
@@ -32,7 +32,7 @@ Make sure you know the basics of Reinforcement Learning. In case of need, you ca
### Installation
- We recommend to use Python virtual environnements to install the required modules : https://docs.python.org/3/library/venv.html
+ We recommend using Python virtual environments to install the required modules: https://docs.python.org/3/library/venv.html, or https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html if you are using conda.
First, install PyTorch: https://pytorch.org/get-started/locally.
@@ -40,23 +40,11 @@ Then install the following modules :
```sh
- pip install gym==0.26.2
+ pip install gymnasium
```
- Install also pyglet for the rendering.
- ```sh
- pip install pyglet==2.0.10
- ```
- If needed
- ```sh
- pip install pygame==2.5.2
- ```
- ```sh
- pip install PyQt5
+ pip install "gymnasium[classic-control]"
```
@@ -65,7 +53,7 @@ pip install PyQt5
Here is an example of how to use Gym to solve the `CartPole-v1` environment [Documentation](https://gymnasium.farama.org/environments/classic_control/cart_pole/):
```python
- import gym
+ import gymnasium as gym
# Create the environment
env = gym.make("CartPole-v1", render_mode="human")
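# --- Editor's sketch: the rest of this example is not shown in this diff view. ---
# The loop below is an illustrative continuation using the standard Gymnasium API
# (random actions for a fixed number of steps); it is not the course's exact code.

# Reset the environment to get the first observation
observation, info = env.reset()

for _ in range(200):
    # Pick a random action (a trained policy would be queried here instead)
    action = env.action_space.sample()

    # Apply the action; with render_mode="human" the window is updated automatically
    observation, reward, terminated, truncated, info = env.step(action)

    # Start a new episode when the current one ends
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```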
@@ -109,12 +97,21 @@ Repeat 500 times:
Normalize the return
Compute the policy loss as -sum(log(prob) * return)
Update the policy using an Adam optimizer and a learning rate of 5e-3
Save the model weights
```
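For orientation, here is a minimal PyTorch sketch of such a training loop. It is not the reference solution: the network architecture, the discount factor, and the single-episode update are assumptions layered on top of the pseudo-code above.

```python
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
# Simple policy network (architecture is an assumption, not a requirement)
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns computed backwards (gamma = 0.99 is an assumption), then normalized
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + 0.99 * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Policy loss: -sum(log(prob) * return), followed by one Adam step
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Save the learned weights
torch.save(policy.state_dict(), "reinforce_cartpole.pth")
```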
- To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/introduction).
+ To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/policy-gradient).
> 🛠 **To be handed in**
- > Use PyTorch to implement REINFORCE and solve the CartPole environement. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward accross episodes in the `README.md`.
+ > Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward across episodes in the `README.md`. Also, share a file `reinforce_cartpole.pth` containing the learned weights. For saving and loading PyTorch models, check [this tutorial](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference).
+ ## Model Evaluation
+ Now that you have trained your model, it is time to evaluate its performance. Run it with rendering for a few trials and see if the policy is capable of completing the task.
+ > 🛠 **To be handed in**
+ > Implement a script that loads your saved model and uses it to solve the CartPole environment. Run 100 evaluations and share the final success rate across all evaluations in the `README.md`. Share the code in `evaluate_reinforce_cartpole.py`.
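A possible shape for such an evaluation script is sketched below. The success criterion (an episode reaching the 500-step limit of `CartPole-v1`), the network architecture, and the file names are assumptions; adapt them to your own training code.

```python
import gymnasium as gym
import torch
import torch.nn as nn

# Rebuild the same architecture that was trained (sketch assumption: 4-128-2 MLP)
policy = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2), nn.Softmax(dim=-1))
policy.load_state_dict(torch.load("reinforce_cartpole.pth"))
policy.eval()

env = gym.make("CartPole-v1")
successes = 0
for _ in range(100):
    obs, _ = env.reset()
    episode_reward, done = 0.0, False
    while not done:
        with torch.no_grad():
            probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        action = int(torch.argmax(probs))  # greedy action at evaluation time
        obs, reward, terminated, truncated, _ = env.step(action)
        episode_reward += reward
        done = terminated or truncated
    # Assumed success criterion: the pole stayed up for the full 500 steps
    successes += episode_reward >= 500

print(f"Success rate: {successes}/100")
env.close()
```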
## Familiarization with a complete RL pipeline: Application to training a robotic arm
@@ -128,6 +125,7 @@ Stable-Baselines3 (SB3) is a high-level RL library that provides various algorit
```sh
- pip install stable-baselines3
+ pip install "stable-baselines3[extra]"
+ pip install moviepy
```
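To give an idea of how compact the SB3 workflow is, a minimal illustrative training snippet might look like the following; the choice of A2C, the timestep budget, and the file name are assumptions rather than prescribed settings.

```python
import gymnasium as gym
from stable_baselines3 import A2C

# Create the environment and train an A2C agent on it (budget is an assumption)
env = gym.make("CartPole-v1")
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)

# Save the trained agent; this produces a .zip file next to the script
model.save("a2c_cartpole")
```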
@@ -146,7 +144,7 @@ Hugging Face Hub is a platform for easy sharing and versioning of trained machin
#### Installation of `huggingface_sb3`
```sh
- pip install huggingface-sb3==2.3.1
+ pip install huggingface-sb3
```
#### Upload the model on the Hub
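The upload instructions themselves are not visible in this diff. As a rough illustration, pushing a saved SB3 model with `huggingface_sb3` can look like the sketch below; the repository id and file name are placeholders, and it assumes you are already authenticated (e.g., via `huggingface-cli login`).

```python
from huggingface_sb3 import push_to_hub

# Upload a previously saved SB3 model file to a (placeholder) Hub repository
push_to_hub(
    repo_id="your-username/a2c-cartpole",  # placeholder repository id
    filename="a2c_cartpole.zip",           # file produced by model.save(...)
    commit_message="Add A2C CartPole model",
)
```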
@@ -165,10 +163,10 @@ Weights & Biases (W&B) is a tool for machine learning experiment management. Wit
#### Installation
- You'll need to install both `wand` and `tensorboar`.
+ You'll need to install `wandb`.
```shell
- pip install wandb tensorboard
+ pip install wandb
```
Use the documentation of [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) and [Weights & Biases](https://docs.wandb.ai/guides/integrations/stable-baselines-3) to track the CartPole training. Make the run public.
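As a sketch of what this integration can look like (the project name and timestep budget are placeholders, and `tensorboard` is assumed to be available, e.g., from `stable-baselines3[extra]`):

```python
import gymnasium as gym
import wandb
from stable_baselines3 import A2C
from wandb.integration.sb3 import WandbCallback

# Start a W&B run; sync_tensorboard forwards SB3's TensorBoard logs to W&B
run = wandb.init(project="cartpole-a2c", sync_tensorboard=True)

model = A2C("MlpPolicy", gym.make("CartPole-v1"), verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=100_000, callback=WandbCallback())

run.finish()
```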
@@ -176,7 +174,7 @@ Use the documentation of [Stable-Baselines3](https://stable-baselines3.readthedo
🛠 Share the link of the wandb run in the `README.md` file.
> ⚠️ **Warning**
- > Make sure to make the run public!
+ > Make sure to make the run public! If that is not possible (due to restrictions on your account), you can create a WandB [report](https://docs.wandb.ai/guides/reports/create-a-report/), add all relevant graphs and any textual descriptions or explanations you find pertinent, then export it as a PDF (landscape format) and upload it along with the code to GitLab. Arrange the plots so that they remain readable in the PDF (e.g., one graph per row, correct axes, etc.), and specify which report corresponds to which experiment.
### Full workflow with panda-gym
@@ -190,7 +188,7 @@ pip install panda-gym==3.0.7
#### Train, track, and share
- Use the Stable-Baselines3 package to train A2C model on the `PandaReachJointsDense-v2` environment. 500k timesteps should be enough. Track the environment with Weights & Biases. Once the training is over, upload the trained model on the Hub.
+ Use the Stable-Baselines3 package to train an A2C model on the `PandaReachJointsDense-v3` environment. 500k timesteps should be enough. Track the training with Weights & Biases. Once the training is over, upload the trained model to the Hub.
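Putting the pieces together, one possible (illustrative) shape for this step is sketched below; the policy type, the project name, and the file name are assumptions to adapt.

```python
import gymnasium as gym
import panda_gym  # noqa: F401  (importing registers the Panda environments)
import wandb
from stable_baselines3 import A2C
from wandb.integration.sb3 import WandbCallback

run = wandb.init(project="panda-reach-a2c", sync_tensorboard=True)

# Dict observations (achieved/desired goals) require a multi-input policy
env = gym.make("PandaReachJointsDense-v3")
model = A2C("MultiInputPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=500_000, callback=WandbCallback())

# Save locally; the resulting .zip can then be pushed to the Hub as shown earlier
model.save("a2c_panda_reach")
run.finish()
```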
> 🛠 **To be handed in**
> Share all the code in `a2c_sb3_panda_reach.py`. Share the link of the wandb run and the trained model in the `README.md` file.
@@ -203,7 +201,7 @@ This tutorial may contain errors, inaccuracies, typos or areas for improvement.
Quentin Gallouédec
- Updates by Léo Schneider, Emmanuel Dellandréa
+ Updates by Bruno Machado, Léo Schneider, Emmanuel Dellandréa
## License