Also install pyglet for rendering:
```sh
pip install pyglet==2.0.10
```
Also install numpy:
```sh
pip install numpy==1.26.4
```
If needed:
```sh
...
...
```sh
pip install pygame==2.5.2
pip install PyQt5
```
Also install OpenCV:
```sh
pip install opencv-python
```
### Usage
...
...
Repeat 500 times:
Normalize the return
Compute the policy loss as -sum(log(prob) * return)
Update the policy using an Adam optimizer and a learning rate of 5e-3
Save the model weights
```
To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/introduction).
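As an illustration of the update step above, here is a minimal PyTorch sketch of the return normalization and policy-loss computation. It is not the full assignment solution: the `optimizer`, `log_probs`, and `rewards` objects, as well as the discount factor, are assumptions about how the surrounding training loop is organized.
```python
# Minimal sketch of the return normalization and policy update
# (assumed names: optimizer, log_probs, rewards; GAMMA is an assumed discount factor).
import torch

GAMMA = 0.99  # assumption: discount factor, not specified in the pseudocode above

def reinforce_update(optimizer, log_probs, rewards):
    # Compute the discounted return for every step of the episode (backwards pass)
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + GAMMA * g
        returns.insert(0, g)
    returns = torch.tensor(returns, dtype=torch.float32)
    # Normalize the return
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    # Policy loss: -sum(log(prob) * return)
    loss = -(torch.stack(log_probs) * returns).sum()
    # Update the policy (the optimizer is assumed to be Adam with lr=5e-3)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
Once the 500 training episodes are done, the learned weights can be saved with `torch.save(policy.state_dict(), "reinforce_cartpole.pth")`, following the tutorial linked in the hand-in note below.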
> 🛠 **To be handed in**
> Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward across episodes in the `README.md`. Also, share a file `reinforce_cartpole.pth` containing the learned weights. For saving and loading PyTorch models, check [this tutorial](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference).
## Model Evaluation
Now that you have trained your model, it is time to evaluate its performance. Run it with rendering for a few trials and see if the policy is capable of completing the task.
> 🛠 **To be handed in**
> Implement a script which loads your saved model and uses it to solve the CartPole environment. Run 100 evaluations and share the final success rate across all evaluations in the `README.md`. Share the code in `evaluate_reinforce_cartpole.py`.
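A possible skeleton for the evaluation script is sketched below. The `Policy` class, the greedy action selection, and the success criterion (reaching the CartPole-v1 reward cap of 500) are assumptions; adapt them to your own training script and success definition.
```python
# Sketch of evaluate_reinforce_cartpole.py -- the Policy class and the success
# threshold are assumptions, adapt them to your training script.
import gymnasium as gym
import torch

from reinforce_cartpole import Policy  # assumption: the network class lives in the training script

env = gym.make("CartPole-v1")
policy = Policy()
policy.load_state_dict(torch.load("reinforce_cartpole.pth"))
policy.eval()

n_evals, successes = 100, 0
for _ in range(n_evals):
    obs, _ = env.reset()
    total_reward, done = 0.0, False
    while not done:
        with torch.no_grad():
            probs = policy(torch.as_tensor(obs, dtype=torch.float32))  # assumed to output action probabilities
        action = torch.argmax(probs).item()  # greedy action for evaluation
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        done = terminated or truncated
    if total_reward >= 500:  # assumed success criterion: reaching the CartPole-v1 reward cap
        successes += 1

print(f"Success rate: {successes / n_evals:.0%}")
```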
## Familiarization with a complete RL pipeline: Application to training a robotic arm
...
...
Stable-Baselines3 (SB3) is a high-level RL library that provides various algorithms.
```sh
pip install stable-baselines3
pip install stable-baselines3[extra]
pip install moviepy
```
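Before moving on to the assignment environments, it can help to see how little code a basic SB3 training run requires. The following is a generic sketch (CartPole-v1, A2C, and 10k timesteps are arbitrary choices), not the script to hand in.
```python
# Generic Stable-Baselines3 example (CartPole-v1 and 10k timesteps are arbitrary choices).
import gymnasium as gym
from stable_baselines3 import A2C

model = A2C("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)
model.save("a2c_cartpole")

# Quick rollout of the trained policy
env = gym.make("CartPole-v1")
obs, _ = env.reset()
for _ in range(500):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(int(action))
    if terminated or truncated:
        obs, _ = env.reset()
```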
...
...
pip install panda-gym==3.0.7
#### Train, track, and share
Use the Stable-Baselines3 package to train an A2C model on the `PandaReachJointsDense-v3` environment. 500k timesteps should be enough. Track the training run with Weights & Biases. Once the training is over, upload the trained model to the Hub.
> 🛠 **To be handed in**
> Share all the code in `a2c_sb3_panda_reach.py`. Share the link of the wandb run and the trained model in the `README.md` file.
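A hedged sketch of how the training, tracking, and saving steps could fit together is given below. The project name, run configuration, and policy choice (`MultiInputPolicy`, since panda-gym observations are dictionaries) are assumptions; whether to import `gym` or `gymnasium` depends on the panda-gym version installed, and the exact SB3 callback options should be checked against the Weights & Biases documentation.
```python
# Sketch of a2c_sb3_panda_reach.py -- project name and callback options are placeholders.
import gymnasium as gym  # or `import gym`, depending on the panda-gym version installed
import panda_gym  # noqa: F401 -- importing registers the Panda environments
import wandb
from stable_baselines3 import A2C
from wandb.integration.sb3 import WandbCallback

run = wandb.init(project="a2c-panda-reach", sync_tensorboard=True)  # assumed project name

env = gym.make("PandaReachJointsDense-v3")
# panda-gym observations are dictionaries, hence the MultiInputPolicy
model = A2C("MultiInputPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=500_000, callback=WandbCallback())
model.save("a2c_panda_reach")
run.finish()
# The upload of the trained model to the Hub is not shown here.
```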
...
...
This tutorial may contain errors, inaccuracies, typos or areas for improvement.
Quentin Gallouédec
Updates by Bruno Machado, Léo Schneider, Emmanuel Dellandréa