diff --git a/README.md b/README.md
index adf23a7d35a03d97eb70c299e1a55f84366f58c5..a6421b0d31407f71156b90ec947ca5788ba8e397 100644
--- a/README.md
+++ b/README.md
@@ -9,6 +9,7 @@ The python script used is: reinforce_cartpole.py.
 ## Familiarization with a complete RL pipeline: Application to training a robotic arm
 ### Stable-Baselines3 and HuggingFace
 In this section, the Stable-Baselines3 package is used to solve the Cartpole with the Advantage Actor-Critic (A2C) algorithm.
+The python script used is: a2c_sb3_cartpole.py.
 The trained model is shared on HuggingFace, available on the following link: https://huggingface.co/oscarchaufour/a2c-CartPole-v1
@@ -16,8 +17,14 @@ The trained model is shared on HuggingFace, available on the following link: htt
 ### Weights & Biases
 The Weights & Biases package is used to visualize the training and the performance of a model. The link to the run visualization on WandB is: https://wandb.ai/oscar-chaufour/a2c-cartpole-v1?workspace=user-oscar-chaufour
+The evolution of certain metrics during training can be visualized. For example, the policy loss at each step can be seen below:
+
+
 ### Full workflow with panda-gym
-The full training-visualization-sharing workflow is applied to the PandaReachJointsDense environment.
+The full training-visualization-sharing workflow is applied to the PandaReachJointsDense environment. It appears that the PandaReachJointsDense-v2 environment is not registered and could not be used (NameNotFound: Environment PandaReachJointsDense doesn't exist).
+
+
diff --git a/images/policy_loss_a2c_cartople.png b/images/policy_loss_a2c_cartople.png
new file mode 100644
index 0000000000000000000000000000000000000000..510aa0452b16bf00d56b2898cdd65fabaf8d2588
Binary files /dev/null and b/images/policy_loss_a2c_cartople.png differ