diff --git a/README.md b/README.md
index 78dc2fddbd906d44c011d1907ac856d58e4eafd8..8704f773fa1a549e164861691238486d4120e3a2 100644
--- a/README.md
+++ b/README.md
@@ -1,31 +1,49 @@
 # TD 1 : Hands-On Reinforcement Learning
+
 This TD introduces different algorithms, frameworks and tools used in Reinforcement Learning. The methods are applied to the robotic field: a Cartpole and the PandaReachJointsDense environment.
 ## REINFORCE
+
 The REINFORCE algorithm is used to solve the Cartpole environment. The plot showing the total reward accross episodes can be seen below:
 The python script used is: reinforce_cartpole.py.
 ## Familiarization with a complete RL pipeline: Application to training a robotic arm
+
 ### Stable-Baselines3 and HuggingFace
+
 In this section, the Stable-Baselines3 package is used to solve the Cartpole with the Advantage Actor-Critic (A2C) algorithm.
-The python code used is: a2c_sb3_cartpole.py.
+The python code used is: ```a2c_sb3_cartpole.py```.
 The trained model is shared on HuggingFace, available on the following link: https://huggingface.co/oscarchaufour/a2c-CartPole-v1
 ### Weights & Biases
+
 The Weights & Biases package is used to visualize the taining and the performances of a model. The link to the run visualization on WandB is: https://wandb.ai/oscar-chaufour/a2c-cartpole-v1?workspace=user-oscar-chaufour
-The evolution of certain metrics during the training can be visualized. For example the policy loss for at each step can be seen below:
+The evolution of certain metrics during the training can be visualized. For example the policy loss at each step can be seen below:
 The policy loss follows a decreasing trend, which is coherent to the model learning during the training phase.
 ### Full workflow with panda-gym
+
 The full training-visualization-sharing workflow is applied to the PandaReachJointsDense environment.
 It appears that the PandaReachJointsDense-v2 environment is not known and could not be used (NameNotFound: Environment PandaReachJointsDense doesn't exist.)
+## Contribute
+
+This tutorial may contain errors, inaccuracies, typos or areas for improvement. Feel free to contribute to its improvement by opening an issue.
+
+## Author
+
+Oscar Chaufour
+
+## License
+
+MIT
+