From d914e5f2696879a45cf8486be01facd77f67d019 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Quentin=20GALLOU=C3=89DEC?= <gallouedec.quentin@gmail.com>
Date: Fri, 3 Feb 2023 13:07:00 +0100
Subject: [PATCH] total reward and A2C for cartpole

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index e111f2e..afdc178 100644
--- a/README.md
+++ b/README.md
@@ -63,7 +63,7 @@ Repeat 500 times:
     Update the policy using an Adam optimizer and a learning rate of 5e-3
 ```
 
-🛠Use PyTorch to implement REINFORCE and solve the CartPole environement. Share the code in `reinforce.py`, and share a plot showing the return accross episodes in the `README.md`.
+🛠Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in `reinforce.py`, and share a plot showing the total reward across episodes in the `README.md`.
 
 ## Familiarization with a complete RL pipeline: Application to training a robotic arm
 
@@ -83,7 +83,7 @@ pip install stable-baselines3[extra]
 
 #### Usage
 
-Use the Stable-Baselines3 documentation and implement a code to solve the CartPole environment.
+Use the Stable-Baselines3 documentation and implement code to solve the CartPole environment with the Advantage Actor-Critic (A2C) algorithm.
 
 🛠Store the code in `cartpole_sb3.py`. Unless otherwise stated, you'll work upon this file for the next sections.
 
-- 
GitLab
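The patch's first hunk replaces "return" with "total reward" in the plotting instruction. The two quantities differ: the total reward is the undiscounted sum of rewards over an episode (the usual metric to plot for CartPole), while the discounted return is what the REINFORCE update itself uses. A minimal pure-Python sketch of the distinction, with hypothetical helper names not taken from the patch:

```python
def episode_total_reward(rewards):
    """Total (undiscounted) reward over one episode -- the quantity the
    patched README asks to plot across episodes."""
    return sum(rewards)

def discounted_return(rewards, gamma=0.99):
    """Discounted return from the first step, as used in the policy-gradient
    update. Computed backwards: G_t = r_t + gamma * G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Example: a CartPole episode that survived 3 steps (reward 1 per step).
rewards = [1.0, 1.0, 1.0]
print(episode_total_reward(rewards))   # 3.0
print(discounted_return(rewards))      # 1 + 0.99 + 0.99**2 = 2.9701
```

The second hunk then asks for the same environment to be solved with Stable-Baselines3's A2C instead; that part only requires following the library documentation, so no sketch is given here.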