Commit 180efac4 authored by Dellandrea Emmanuel

Update README.md

parent 74c92be4
Install also pyglet for the rendering.
```sh
pip install pyglet==2.0.10
```
```sh
pip install numpy==1.26.4
```
If needed
```sh
pip install pygame==2.5.2
pip install PyQt5
```
```sh
pip install opencv-python
```
### Usage
Repeat 500 times:
    Normalize the return
    Compute the policy loss as -sum(log(prob) * return)
    Update the policy using an Adam optimizer and a learning rate of 5e-3
Save the model weights
```
To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/introduction).
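As a hedged sketch, one pass of the update described above might look as follows in PyTorch (the network architecture, the made-up episode data, and the discount factor 0.99 are placeholder assumptions for illustration, not part of the assignment):

```python
import torch

# Sketch of one REINFORCE update (assumptions: a 2-action policy net,
# placeholder episode data, and gamma = 0.99).
policy = torch.nn.Sequential(
    torch.nn.Linear(4, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 2), torch.nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)

states = torch.randn(10, 4)           # placeholder episode observations
actions = torch.randint(0, 2, (10,))  # placeholder sampled actions
rewards = [1.0] * 10                  # placeholder rewards

# Discounted returns, computed backwards through the episode, then normalized.
gamma, G, returns = 0.99, 0.0, []
for r in reversed(rewards):
    G = r + gamma * G
    returns.insert(0, G)
returns = torch.tensor(returns)
returns = (returns - returns.mean()) / (returns.std() + 1e-8)

# Policy loss: -sum(log(prob of taken action) * return), then one Adam step.
probs = policy(states).gather(1, actions.unsqueeze(1)).squeeze(1)
loss = -(torch.log(probs) * returns).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Save the learned weights, as the pseudocode's last step asks.
torch.save(policy.state_dict(), "reinforce_cartpole.pth")
```

In a real run, the states, actions, and rewards come from rolling out the current policy in the environment for one episode before each update.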
> 🛠 **To be handed in**
> Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward across episodes in the `README.md`. Also, share a file `reinforce_cartpole.pth` containing the learned weights. For saving and loading PyTorch models, check [this tutorial](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference).
## Model Evaluation
Now that you have trained your model, it is time to evaluate its performance. Run it with rendering for a few trials and see if the policy is capable of completing the task.
> 🛠 **To be handed in**
> Implement a script which loads your saved model and uses it to solve the CartPole environment. Run 100 evaluations and share the final success rate across all evaluations in the `README.md`. Share the code in `evaluate_reinforce_cartpole.py`.
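The save/load round trip itself can be sketched as below (the network architecture here is an assumption; it must match whatever net your training script actually saved, and the saving step is included only so the example is self-contained):

```python
import torch

# Assumed policy architecture; must match the one used during training.
def make_policy():
    return torch.nn.Sequential(
        torch.nn.Linear(4, 128), torch.nn.ReLU(),
        torch.nn.Linear(128, 2), torch.nn.Softmax(dim=-1),
    )

# Stand-in for the training script's final step.
torch.save(make_policy().state_dict(), "reinforce_cartpole.pth")

# In evaluate_reinforce_cartpole.py: rebuild the net, load weights, eval mode.
eval_policy = make_policy()
eval_policy.load_state_dict(torch.load("reinforce_cartpole.pth"))
eval_policy.eval()

# Pick the greedy action for one observation (a zero vector here, for brevity).
with torch.no_grad():
    action = eval_policy(torch.zeros(4)).argmax().item()
```

The evaluation script would then step the environment with actions chosen this way and count how many of the 100 episodes reach the success criterion.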
## Familiarization with a complete RL pipeline: Application to training a robotic arm
Stable-Baselines3 (SB3) is a high-level RL library that provides various algorithms.
```sh
pip install stable-baselines3
pip install stable-baselines3[extra]
pip install moviepy
```
```sh
pip install panda-gym==3.0.7
```
#### Train, track, and share
Use the Stable-Baselines3 package to train an A2C model on the `PandaReachJointsDense-v3` environment. 500k timesteps should be enough. Track the training with Weights & Biases. Once the training is over, upload the trained model on the Hub.
> 🛠 **To be handed in**
> Share all the code in `a2c_sb3_panda_reach.py`. Share the link of the wandb run and the trained model in the `README.md` file.
This tutorial may contain errors, inaccuracies, typos or areas for improvement.
Quentin Gallouédec
Updates by Bruno Machado, Léo Schneider, Emmanuel Dellandréa
## License