update

Commit de3fdf11 authored 1 year ago by oscarchaufour
--- a/README.md
+++ b/README.md
@@ -9,6 +9,7 @@ The python script used is: reinforce_cartpole.py.
 ## Familiarization with a complete RL pipeline: Application to training a robotic arm
 ### Stable-Baselines3 and HuggingFace
 In this section, the Stable-Baselines3 package is used to solve the CartPole environment with the Advantage Actor-Critic (A2C) algorithm.
+
 The Python code used is: a2c_sb3_cartpole.py.

 The trained model is shared on HuggingFace at the following link: https://huggingface.co/oscarchaufour/a2c-CartPole-v1
@@ -16,8 +17,14 @@ The trained model is shared on HuggingFace, available on the following link: htt
 ### Weights & Biases
 The Weights & Biases package is used to visualize the training and the performance of a model. The run visualization on WandB is available at: https://wandb.ai/oscar-chaufour/a2c-cartpole-v1?workspace=user-oscar-chaufour

+The evolution of certain metrics during training can be visualized. For example, the policy loss at each step is shown below: ![Alt text](images/policy_loss_a2c_cartpole.png)
+
+
+
 ### Full workflow with panda-gym
-The full training-visualization-sharing workflow is applied to the PandaReachJointsDense environment.
+The full training-visualization-sharing workflow is applied to the PandaReachJointsDense environment. However, the PandaReachJointsDense-v2 environment was not recognized and could not be used (NameNotFound: Environment PandaReachJointsDense doesn't exist.)
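A `NameNotFound` error like the one quoted in this line means the requested id was not in the environment registry when the environment was created. Common causes are a missing import of the package that registers the environments or a version suffix that does not match the installed release; the commit does not say which applies here. The sketch below uses only the standard library and shows one way to look for near-miss ids. The function `suggest_env_ids` and the sample id list are hypothetical; in practice the list would have to be read from the registry of whichever Gym or Gymnasium version is installed.

```python
import difflib


def suggest_env_ids(requested_id, registered_ids, max_suggestions=3):
    """Return registered ids that look similar to the requested one.

    registered_ids is any iterable of environment id strings; where it comes
    from (for example the installed library's registry) is left to the caller.
    """
    return difflib.get_close_matches(
        requested_id, list(registered_ids), n=max_suggestions, cutoff=0.6
    )


# Hypothetical registry contents, for illustration only (not taken from the commit).
example_ids = ["CartPole-v1", "PandaReach-v3", "PandaReachJointsDense-v3"]

suggestions = suggest_env_ids("PandaReachJointsDense-v2", example_ids)
print(suggestions)
```

If the closest suggestion differs from the requested id only in its version suffix, the installed package version is a likely place to look first.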
+
+




--- a/images/policy_loss_a2c_cartople.png
+++ b/images/policy_loss_a2c_cartople.png
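Note that the README line added in this commit links to `images/policy_loss_a2c_cartpole.png`, while the image file listed here is named `images/policy_loss_a2c_cartople.png`, so the image may not render as committed. A minimal sketch of a check that could catch this kind of mismatch is given below, using only the standard library. The function name `missing_images` and the regular expression are illustrative assumptions, and the pattern only covers the simple `![alt](path)` form used in this README.

```python
import re
from pathlib import Path

# Matches the simple inline image form ![alt](target); titles and
# reference-style images are deliberately not handled in this sketch.
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")
URL_SCHEME = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*://")


def missing_images(markdown_text, repo_root):
    """Return local image targets in markdown_text that are not files under repo_root."""
    root = Path(repo_root)
    missing = []
    for target in IMAGE_LINK.findall(markdown_text):
        if URL_SCHEME.match(target):
            continue  # remote images cannot be checked on disk
        if not (root / target).is_file():
            missing.append(target)
    return missing


# Example usage from the repository root (assumes README.md exists there):
#   print(missing_images(Path("README.md").read_text(encoding="utf-8"), "."))
```

Either renaming the image file or updating the link in README.md would make the two names agree; the commit itself does not indicate which spelling was intended.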