Commit 180efac4 authored by Dellandrea Emmanuel

Update README.md

parent 74c92be4
Install also pyglet for the rendering.
```sh
pip install pyglet==2.0.10
```
```sh
pip install numpy==1.26.4
```
If needed
```sh
pip install pygame==2.5.2
pip install PyQt5
```
```sh
pip install opencv-python
```
### Usage
Repeat 500 times:
    Normalize the return
    Compute the policy loss as -sum(log(prob) * return)
    Update the policy using an Adam optimizer and a learning rate of 5e-3
Save the model weights
```
To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/introduction).
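As a hedged sketch, one pass of the update described above might look as follows in PyTorch (the network architecture, the made-up episode data, and the discount factor 0.99 are placeholder assumptions for illustration, not part of the assignment):

```python
import torch

# Sketch of one REINFORCE update (assumptions: a 2-action policy net,
# placeholder episode data, and gamma = 0.99).
policy = torch.nn.Sequential(
    torch.nn.Linear(4, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 2), torch.nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)

states = torch.randn(10, 4)           # placeholder episode observations
actions = torch.randint(0, 2, (10,))  # placeholder sampled actions
rewards = [1.0] * 10                  # placeholder rewards

# Discounted returns, computed backwards through the episode, then normalized.
gamma, G, returns = 0.99, 0.0, []
for r in reversed(rewards):
    G = r + gamma * G
    returns.insert(0, G)
returns = torch.tensor(returns)
returns = (returns - returns.mean()) / (returns.std() + 1e-8)

# Policy loss: -sum(log(prob of taken action) * return), then one Adam step.
probs = policy(states).gather(1, actions.unsqueeze(1)).squeeze(1)
loss = -(torch.log(probs) * returns).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Save the learned weights, as the pseudocode's last step asks.
torch.save(policy.state_dict(), "reinforce_cartpole.pth")
```

In a real run, the states, actions, and rewards come from rolling out the current policy in the environment for one episode before each update.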
> 🛠 **To be handed in**
> Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward across episodes in the `README.md`. Also, share a file `reinforce_cartpole.pth` containing the learned weights. For saving and loading PyTorch models, check [this tutorial](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference).
## Model Evaluation
Now that you have trained your model, it is time to evaluate its performance. Run it with rendering for a few trials and see if the policy is capable of completing the task.
> 🛠 **To be handed in**
> Implement a script which loads your saved model and uses it to solve the CartPole environment. Run 100 evaluations and share the final success rate across all evaluations in the `README.md`. Share the code in `evaluate_reinforce_cartpole.py`.
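The save/load round trip itself can be sketched as below (the network architecture here is an assumption; it must match whatever net your training script actually saved, and the saving step is included only so the example is self-contained):

```python
import torch

# Assumed policy architecture; must match the one used during training.
def make_policy():
    return torch.nn.Sequential(
        torch.nn.Linear(4, 128), torch.nn.ReLU(),
        torch.nn.Linear(128, 2), torch.nn.Softmax(dim=-1),
    )

# Stand-in for the training script's final step.
torch.save(make_policy().state_dict(), "reinforce_cartpole.pth")

# In evaluate_reinforce_cartpole.py: rebuild the net, load weights, eval mode.
eval_policy = make_policy()
eval_policy.load_state_dict(torch.load("reinforce_cartpole.pth"))
eval_policy.eval()

# Pick the greedy action for one observation (a zero vector here, for brevity).
with torch.no_grad():
    action = eval_policy(torch.zeros(4)).argmax().item()
```

The evaluation script would then step the environment with actions chosen this way and count how many of the 100 episodes reach the success criterion.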
## Familiarization with a complete RL pipeline: Application to training a robotic arm
Stable-Baselines3 (SB3) is a high-level RL library that provides various algorithms.
```sh
pip install stable-baselines3
pip install stable-baselines3[extra]
pip install moviepy
```
```sh
pip install panda-gym==3.0.7
```
#### Train, track, and share
Use the Stable-Baselines3 package to train an A2C model on the `PandaReachJointsDense-v3` environment. 500k timesteps should be enough. Track the training with Weights & Biases. Once the training is over, upload the trained model on the Hub.
> 🛠 **To be handed in**
> Share all the code in `a2c_sb3_panda_reach.py`. Share the link of the wandb run and the trained model in the `README.md` file.
This tutorial may contain errors, inaccuracies, typos or areas for improvement.
Quentin Gallouédec
Updates by Bruno Machado, Léo Schneider, Emmanuel Dellandréa
## License