Commit d08231bc authored by number_cruncher

graphic

parent 671cde51
@@ -12,6 +12,8 @@ The REINFORCE algorithm (also known as Vanilla Policy Gradient) is a policy grad
> 🛠 **To be handed in**
> Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward across episodes in the `README.md`. Also, share a file `reinforce_cartpole.pth` containing the learned weights. For saving and loading PyTorch models, check [this tutorial](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference)
![REINFORCE CartPole](reinforce_cartpole_dr_0.5.png)
## Model Evaluation
@@ -23,8 +25,6 @@ Now that you have trained your model, it is time to evaluate its performance. Ru
From the OpenAI Gym wiki, we know that the environment counts as solved when the average reward is greater than or equal to 195 over 100 consecutive trials.
From the evaluation script I used, the success rate is 1.0 when we allow the maximum number of steps the environment offers.
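
The solved criterion quoted above can be checked mechanically from a list of per-episode total rewards. The following is a minimal sketch, not the evaluation script referred to in the README: the function name `is_solved` and its parameters are hypothetical, and the defaults (195 and 100) are simply the values stated in the text, kept as parameters because other environment versions may use different ones.

```python
def is_solved(episode_rewards, threshold=195.0, window=100):
    """Return True if some run of `window` consecutive episodes
    has a mean total reward of at least `threshold`.

    `episode_rewards` is a list of per-episode total rewards.
    The name and signature are illustrative assumptions.
    """
    if len(episode_rewards) < window:
        return False  # not enough episodes to form a single window

    # Sum of the first window, then slide it one episode at a time.
    running = sum(episode_rewards[:window])
    if running / window >= threshold:
        return True
    for i in range(window, len(episode_rewards)):
        running += episode_rewards[i] - episode_rewards[i - window]
        if running / window >= threshold:
            return True
    return False
```

For example, `is_solved(rewards)` could be called on the rewards logged during evaluation; with fewer than 100 logged episodes it returns `False`, because no complete window exists yet.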
![REINFORCE CartPole](reinforce_cartpole_dr_0.5.png)
## Familiarization with a complete RL pipeline: Application to training a robotic arm
Stable-Baselines3 (SB3) is a high-level RL library that provides various algorithms and integrated tools to easily train and test reinforcement learning models.
......