diff --git a/README.md b/README.md
index 76f5e073b12e54f3e20ae77210cd09946a190af8..88a21a53621c4dcfce703ae114e9afa5165e661e 100644
--- a/README.md
+++ b/README.md
@@ -49,6 +49,10 @@ Install also pyglet for the rendering.
 pip install pyglet==2.0.10
 ```
 
+```sh
+pip install numpy==1.26.4
+```
+
 If needed
 
 ```sh
@@ -59,6 +63,9 @@ pip install pygame==2.5.2
 pip install PyQt5
 ```
 
+```sh
+pip install opencv-python
+```
 
 ### Usage
 
@@ -109,12 +116,21 @@ Repeat 500 times:
 Normalize the return
 Compute the policy loss as -sum(log(prob) * return)
 Update the policy using an Adam optimizer and a learning rate of 5e-3
+Save the model weights
 ```
 
 To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/introduction).
 
 > 🛠**To be handed in**
-> Use PyTorch to implement REINFORCE and solve the CartPole environement. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward accross episodes in the `README.md`.
+> Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward across episodes in the `README.md`. Also, share a file `reinforce_cartpole.pth` containing the learned weights. For saving and loading PyTorch models, check [this tutorial](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference).
+
+## Model Evaluation
+
+Now that you have trained your model, it is time to evaluate its performance. Run it with rendering for a few trials and check whether the policy is able to complete the task.
+
+> 🛠**To be handed in**
+> Implement a script that loads your saved model and uses it to solve the CartPole environment. Run 100 evaluations and share the final success rate across all evaluations in the `README.md`. Share the code in `evaluate_reinforce_cartpole.py`.
+
 
 ## Familiarization with a complete RL pipeline: Application to training a robotic arm
 
@@ -128,6 +144,7 @@ Stable-Baselines3 (SB3) is a high-level RL library that provides various algorit
 
 ```sh
 pip install stable-baselines3
+pip install stable-baselines3[extra]
 pip install moviepy
 ```
 
@@ -190,7 +207,7 @@ pip install panda-gym==3.0.7
 
 #### Train, track, and share
 
-Use the Stable-Baselines3 package to train A2C model on the `PandaReachJointsDense-v2` environment. 500k timesteps should be enough. Track the environment with Weights & Biases. Once the training is over, upload the trained model on the Hub.
+Use the Stable-Baselines3 package to train an A2C model on the `PandaReachJointsDense-v3` environment. 500k timesteps should be enough. Track the training with Weights & Biases. Once the training is over, upload the trained model to the Hub.
 
 > 🛠**To be handed in**
 > Share all the code in `a2c_sb3_panda_reach.py`. Share the link of the wandb run and the trained model in the `README.md` file.
@@ -203,7 +220,7 @@ This tutorial may contain errors, inaccuracies, typos or areas for improvement.
 
 Quentin Gallouédec
 
-Updates by Léo Schneider, Emmanuel Dellandréa
+Updates by Bruno Machado, Léo Schneider, Emmanuel Dellandréa
 
 ## License
 
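
For reference, the REINFORCE recipe added in the hunk above (500 episodes, normalized returns, a loss of `-sum(log(prob) * return)`, Adam with a learning rate of 5e-3, then saving the weights) could look roughly like the sketch below. The policy architecture, the discount factor of 0.99, and the use of Gymnasium's `CartPole-v1` are assumptions of this sketch, not part of the hand-out.

```python
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")

# Small policy network: observation -> action probabilities (architecture is an assumption)
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns (gamma = 0.99 is an assumption), then normalization
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + 0.99 * g
        returns.insert(0, g)
    returns = torch.tensor(returns, dtype=torch.float32)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Policy loss as described in the README: -sum(log(prob) * return)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Save the learned weights under the file name requested in the hand-in
torch.save(policy.state_dict(), "reinforce_cartpole.pth")
```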
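
A possible shape for `evaluate_reinforce_cartpole.py`, as requested in the new Model Evaluation section, is sketched below. It assumes the same policy architecture as the training sketch and counts an episode as a success when it reaches CartPole's 500-step cap; that success criterion is an assumption, since the README does not define one.

```python
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")

# Must match the architecture used for training (an assumption of this sketch)
policy = nn.Sequential(
    nn.Linear(env.observation_space.shape[0], 128),
    nn.ReLU(),
    nn.Linear(128, env.action_space.n),
    nn.Softmax(dim=-1),
)
policy.load_state_dict(torch.load("reinforce_cartpole.pth"))
policy.eval()

successes = 0
for _ in range(100):
    obs, _ = env.reset()
    done, total_reward = False, 0.0
    while not done:
        with torch.no_grad():
            probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        action = torch.argmax(probs).item()  # act greedily at evaluation time
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        done = terminated or truncated
    # Success criterion (assumed): the episode reaches the 500-step cap
    if total_reward >= 500:
        successes += 1

print(f"Success rate: {successes / 100:.0%}")
```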
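
Finally, a minimal sketch of training A2C on `PandaReachJointsDense-v3` with Weights & Biases tracking, assuming the `WandbCallback` from the wandb SB3 integration and a placeholder project name; uploading the trained model to the Hub (for example with the `huggingface_sb3` package) is left out of the sketch.

```python
import gymnasium as gym
import panda_gym  # noqa: F401  (registers the Panda environments)
import wandb
from stable_baselines3 import A2C
from wandb.integration.sb3 import WandbCallback

# Placeholder project name; sync_tensorboard forwards SB3's TensorBoard logs to wandb
run = wandb.init(project="a2c-panda-reach", sync_tensorboard=True)

env = gym.make("PandaReachJointsDense-v3")

# MultiInputPolicy handles the dict (goal-conditioned) observations of panda-gym
model = A2C("MultiInputPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=500_000, callback=WandbCallback())

model.save("a2c_panda_reach")
run.finish()
```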