From 936f260f5af1d7fe9296faa257738a7d0c52530d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Quentin=20GALLOU=C3=89DEC?= <gallouedec.quentin@gmail.com>
Date: Fri, 3 Feb 2023 11:59:16 +0100
Subject: [PATCH] only store the chosen action prob in reinforce

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 298799b..e111f2e 100644
--- a/README.md
+++ b/README.md
@@ -54,8 +54,8 @@ Repeat 500 times:
 Reset the environment
 Reset the buffer
 Repeat until the end of the episode:
-Compute and store in the buffer the action probabilities
-Sample the action based on the probabilities
+Compute action probabilities
+Sample the action based on the probabilities and store its probability in the buffer
 Step the environment with the action
 Compute and store in the buffer the return using gamma=0.99
 Normalize the return
-- 
GitLab
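The change above means the buffer keeps one number per step, the probability of the action that was actually sampled, instead of the full action distribution. REINFORCE only needs that value, because its loss is `-sum_t log pi(a_t | s_t) * G_t`. What follows is a minimal pure-Python sketch of that bookkeeping under stated assumptions. The helpers `softmax`, `discounted_returns`, `normalize`, `run_episode` and `reinforce_loss`, and the toy environment `toy_step`, are hypothetical and do not come from the README. The sketch has no autograd, so it computes the loss value but does not train a policy. In an autograd framework, the stored value would be the framework's tensor, so that gradients can flow back through it.

```python
import math
import random


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma=0.99):
    # Walk the episode backwards: G_t = r_t + gamma * G_{t+1}.
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def normalize(values, eps=1e-8):
    # Zero-mean, unit-std scaling; eps avoids division by zero.
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / (std + eps) for v in values]


def run_episode(policy_logits, step, max_steps=200, rng=random):
    """Collect one episode, buffering only the chosen action's probability."""
    chosen_probs = []  # the buffer: one probability per step
    rewards = []
    state = 0
    for _ in range(max_steps):
        probs = softmax(policy_logits(state))  # compute action probabilities
        action = rng.choices(range(len(probs)), weights=probs)[0]  # sample
        chosen_probs.append(probs[action])  # store only the sampled action's prob
        state, reward, done = step(state, action)  # step the environment
        rewards.append(reward)
        if done:
            break
    return chosen_probs, rewards


def reinforce_loss(chosen_probs, rewards, gamma=0.99):
    # Normalized returns weight the log-probability of each chosen action.
    returns = normalize(discounted_returns(rewards, gamma))
    return -sum(math.log(p) * g for p, g in zip(chosen_probs, returns))


def toy_step(state, action):
    # Hypothetical 5-step environment: action 0 earns a reward of 1.
    next_state = state + 1
    return next_state, 1.0 if action == 0 else 0.0, next_state >= 5


if __name__ == "__main__":
    random.seed(0)
    probs, rewards = run_episode(lambda s: [0.0, 0.0], toy_step)
    print(len(probs), reinforce_loss(probs, rewards))
```

Storing a single float per step keeps the buffer the same length as the reward list, which makes the final `zip` in `reinforce_loss` straightforward.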