diff --git a/README.md b/README.md
index 4e23205ac37cc6f8dc8fc1c912b5571c70c9086d..2e5a480c4e7bee3afc041530b5cf878c763aab0b 100644
--- a/README.md
+++ b/README.md
@@ -4,13 +4,15 @@ Thomas DESGREYS
 ### Training
 see [reinforce_cartpole.py](reinforce_cartpole.py)
-The model is trained and as save as "reinforce_cartpole_best.pth" and the evolutions of loss and score (aka reward)
+The model is trained and saved as [reinforce_cartpole_best.pth](reinforce_cartpole_best.pth) and the evolutions of loss and score (aka reward)
 through the episodes are shown below.
+
 
+
 These graphics point out the instability of this training algorithm. Although, with a bit of luck we end up with a model that reaches the max steps permitted by this gym environment
-(500 steps)
+(500 steps).
 ### Evaluation