Commit 936f260f authored by Quentin GALLOUÉDEC

only store the chosen action prob in reinforce

parent 1c50ff9b
@@ -54,8 +54,8 @@ Repeat 500 times:
     Reset the environment
     Reset the buffer
     Repeat until the end of the episode:
-        Compute and store in the buffer the action probabilities
-        Sample the action based on the probabilities
+        Compute action probabilities
+        Sample the action based on the probabilities and store its probability in the buffer
         Step the environment with the action
     Compute and store in the buffer the return using gamma=0.99
     Normalize the return
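
The revised steps can be sketched in plain Python as follows. This is only an illustrative sketch, not the repository's implementation: the helper names (`softmax`, `sample_action`, `discounted_returns`, `normalize`, `run_episode`), the fixed-logit stateless policy, and the toy reward rule (reward 1 when action 0 is chosen) are all assumptions made for the example. The point it illustrates is the one in the commit message: the buffer holds a single probability per step, that of the sampled action, rather than the full distribution.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng):
    """Sample an action index and return it with its own probability only."""
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, probs[action]


def discounted_returns(rewards, gamma=0.99):
    """Compute the discounted return G_t for every step of one episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def normalize(values, eps=1e-8):
    """Shift to zero mean and scale to unit standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / (std + eps) for v in values]


def run_episode(logits, rng, n_steps=5):
    """One episode on a hypothetical toy task: reward 1 when action 0 is chosen."""
    chosen_probs, rewards = [], []  # the buffer, reset at the start of each episode
    for _ in range(n_steps):
        probs = softmax(logits)                   # compute action probabilities
        action, prob = sample_action(probs, rng)  # sample the action
        chosen_probs.append(prob)                 # store only the chosen action's probability
        rewards.append(1.0 if action == 0 else 0.0)  # stand-in for stepping the environment
    returns = normalize(discounted_returns(rewards, gamma=0.99))
    return chosen_probs, returns


if __name__ == "__main__":
    probs, rets = run_episode([0.2, -0.1], random.Random(0))
    print(len(probs), len(rets))
```

With an autograd library, the stored value would typically be kept as a differentiable tensor (often as a log-probability) so that the later policy update can backpropagate through it; that part lies outside the lines shown in this hunk and is not sketched here.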
......