Commit 180efac4 authored by Dellandrea Emmanuel

Update README.md

parent 74c92be4
@@ -49,6 +49,10 @@ Install also pyglet for the rendering.
pip install pyglet==2.0.10
```
```sh
pip install numpy==1.26.4
```
If needed:
```sh
@@ -59,6 +63,9 @@ pip install pygame==2.5.2
pip install PyQt5
```
```sh
pip install opencv-python
```
### Usage
@@ -109,12 +116,21 @@ Repeat 500 times:
Normalize the return
Compute the policy loss as -sum(log(prob) * return)
Update the policy using an Adam optimizer and a learning rate of 5e-3
Save the model weights
```
To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/introduction).
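As a reference point, here is a minimal sketch of what such a training loop can look like, assuming the Gymnasium API and a small two-layer softmax policy (the network size and the discount factor are our choices, not prescribed above):
```python
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 2), nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)
gamma = 0.99  # discount factor (our choice)

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Compute discounted returns, then normalize them as the pseudo-code asks.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Policy loss: -sum(log(prob) * return), followed by one Adam step.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

torch.save(policy.state_dict(), "reinforce_cartpole.pth")  # save the weights
```
Sampling the action from `Categorical` keeps the log-probabilities differentiable, which is exactly what the policy-loss line of the pseudo-code relies on.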
> 🛠 **To be handed in**
- > Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward across episodes in the `README.md`.
+ > Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward across episodes in the `README.md`. Also, share a file `reinforce_cartpole.pth` containing the learned weights. For saving and loading PyTorch models, check [this tutorial](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference); a short save/load sketch follows below.
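For the saving step, the state-dict route from the linked tutorial looks roughly like this (`policy` stands for whatever network you defined during training):
```python
import torch

# Save only the learned parameters (the approach recommended in the tutorial).
torch.save(policy.state_dict(), "reinforce_cartpole.pth")

# Later, for inference: rebuild the same architecture, then load the weights.
policy.load_state_dict(torch.load("reinforce_cartpole.pth"))
policy.eval()  # disable any training-only behaviour (dropout, batch norm)
```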
## Model Evaluation
Now that you have trained your model, it is time to evaluate its performance. Run it with rendering enabled for a few trials and check whether the learned policy completes the task.
> 🛠 **To be handed in**
> Implement a script which loads your saved model and uses it to solve the CartPole environment. Run 100 evaluations and share the final success rate across all evaluations in the `README.md`. Share the code in `evaluate_reinforce_cartpole.py`. A sketch follows below.
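A possible shape for this evaluation script, assuming the same policy architecture as in training; note that the success criterion used below (a return of at least 475, CartPole-v1's reward threshold) is our assumption, since the assignment does not define one:
```python
import gymnasium as gym
import torch
import torch.nn as nn

# Rebuild the same architecture used for training, then load the weights.
policy = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 2), nn.Softmax(dim=-1),
)
policy.load_state_dict(torch.load("reinforce_cartpole.pth"))
policy.eval()

env = gym.make("CartPole-v1")  # add render_mode="human" to watch a few trials
successes = 0
for _ in range(100):
    obs, _ = env.reset()
    total_reward, done = 0.0, False
    while not done:
        with torch.no_grad():
            probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        action = int(probs.argmax())  # greedy action at evaluation time
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        done = terminated or truncated
    successes += total_reward >= 475  # "solved" threshold (our criterion)

print(f"Success rate: {successes}/100")
```
Taking the greedy (arg-max) action at evaluation time is a common choice; sampling from the policy distribution would also be defensible.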
## Familiarization with a complete RL pipeline: Application to training a robotic arm
@@ -128,6 +144,7 @@ Stable-Baselines3 (SB3) is a high-level RL library that provides various algorithms
```sh
pip install stable-baselines3
pip install stable-baselines3[extra]
pip install moviepy
```
@@ -190,7 +207,7 @@ pip install panda-gym==3.0.7
#### Train, track, and share
- Use the Stable-Baselines3 package to train an A2C model on the `PandaReachJointsDense-v2` environment. 500k timesteps should be enough. Track the training run with Weights & Biases. Once the training is over, upload the trained model to the Hub.
+ Use the Stable-Baselines3 package to train an A2C model on the `PandaReachJointsDense-v3` environment. 500k timesteps should be enough. Track the training run with Weights & Biases. Once the training is over, upload the trained model to the Hub.
> 🛠 **To be handed in**
> Share all the code in `a2c_sb3_panda_reach.py`. Share the link to the wandb run and to the trained model in the `README.md` file (a training sketch follows below).
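A minimal sketch of such a training script, assuming the `wandb` SB3 integration and the `huggingface_sb3` helper for the Hub upload (the project name and repo id are placeholders):
```python
import gymnasium as gym
import panda_gym  # noqa: F401 -- registers the Panda environments
import wandb
from huggingface_sb3 import push_to_hub
from stable_baselines3 import A2C
from wandb.integration.sb3 import WandbCallback

run = wandb.init(project="a2c-panda-reach", sync_tensorboard=True)

# Panda observations are dicts, hence the MultiInputPolicy.
env = gym.make("PandaReachJointsDense-v3")
model = A2C("MultiInputPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=500_000, callback=WandbCallback())
model.save("a2c_panda_reach")  # writes a2c_panda_reach.zip
run.finish()

# Upload to the Hugging Face Hub (log in first, e.g. `huggingface-cli login`).
push_to_hub(
    repo_id="<your-username>/a2c-panda-reach",  # placeholder repo id
    filename="a2c_panda_reach.zip",
    commit_message="A2C trained on PandaReachJointsDense-v3",
)
```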
@@ -203,7 +220,7 @@ This tutorial may contain errors, inaccuracies, typos or areas for improvement.
Quentin Gallouédec
- Updates by Léo Schneider, Emmanuel Dellandréa
+ Updates by Bruno Machado, Léo Schneider, Emmanuel Dellandréa
## License