This TD introduces different algorithms, frameworks, and tools used in Reinforcement Learning. The methods are applied to the robotics field: the CartPole and PandaReachJointsDense environments.
## REINFORCE
The REINFORCE algorithm is used to solve the CartPole environment. The plot showing the total reward across episodes can be seen below:
The Python script used is `reinforce_cartpole.py`.
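For reference, a minimal sketch of a REINFORCE training loop on CartPole is shown below. The network architecture, learning rate, discount factor, and number of episodes are assumptions for illustration, not necessarily the values used in `reinforce_cartpole.py`.

```python
# Minimal REINFORCE sketch on CartPole-v1 (hyperparameters are assumptions)
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 2), nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Compute discounted returns backwards, then normalize them
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Policy-gradient loss: maximize the return-weighted log-probabilities
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```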
## Familiarization with a complete RL pipeline: Application to training a robotic arm
### Stable-Baselines3 and HuggingFace
In this section, the Stable-Baselines3 package is used to solve the CartPole environment with the Advantage Actor-Critic (A2C) algorithm.
The Python script used is `a2c_sb3_cartpole.py`.
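A minimal sketch of how A2C can be trained on CartPole with Stable-Baselines3 is given below; the number of timesteps and the policy type are assumptions, not necessarily the values used in `a2c_sb3_cartpole.py`.

```python
# Sketch: training A2C on CartPole-v1 with Stable-Baselines3
# (total_timesteps is an assumed value)
import gymnasium as gym
from stable_baselines3 import A2C

env = gym.make("CartPole-v1")
model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("a2c_cartpole")

# Quick evaluation rollout with the trained policy
obs, _ = env.reset()
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```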
The trained model is shared on HuggingFace, available on the following link: https://huggingface.co/oscarchaufour/a2c-CartPole-v1
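One way to share the saved model on the Hugging Face Hub is via the `huggingface_sb3` helper package, as sketched below; the archive filename and commit message are illustrative assumptions.

```python
# Sketch: uploading the saved SB3 model to the Hugging Face Hub
# (requires being logged in, e.g. via `huggingface-cli login`;
#  filename and commit message are assumptions)
from huggingface_sb3 import push_to_hub

push_to_hub(
    repo_id="oscarchaufour/a2c-CartPole-v1",
    filename="a2c_cartpole.zip",  # the model archive saved with model.save()
    commit_message="Upload A2C CartPole model",
)
```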
### Weights & Biases
The Weights & Biases package is used to visualize the training and the performance of a model. The link to the run visualization on WandB is: https://wandb.ai/oscar-chaufour/a2c-cartpole-v1?workspace=user-oscar-chaufour
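Below is a sketch of how such a run can be logged with the Stable-Baselines3 integration of Weights & Biases; the project name and timestep count are assumptions.

```python
# Sketch: logging an SB3 A2C run to Weights & Biases
# (project name and total_timesteps are assumptions)
import gymnasium as gym
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import A2C

run = wandb.init(
    project="a2c-cartpole-v1",
    sync_tensorboard=True,  # forward SB3's tensorboard metrics (e.g. policy loss) to W&B
)

env = gym.make("CartPole-v1")
model = A2C("MlpPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=100_000, callback=WandbCallback())
run.finish()
```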
The evolution of certain metrics during the training can be visualized. For example the policy loss at each step can be seen below: 
The policy loss follows a decreasing trend, which is consistent with the model learning during the training phase.
### Full workflow with panda-gym
The full training-visualization-sharing workflow is applied to the PandaReachJointsDense environment. However, the PandaReachJointsDense-v2 environment is not registered and could not be used (`NameNotFound: Environment PandaReachJointsDense doesn't exist.`).
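For completeness, a sketch of the intended training step is given below. It assumes panda-gym is installed and that the environment ID matches the installed version (recent panda-gym releases may register `PandaReachJointsDense-v3` rather than `-v2`, which could explain the error above); the algorithm and timestep count are also assumptions.

```python
# Sketch: training on the Panda reach task with dense rewards and joint control
# (the -v3 suffix is an assumption that depends on the installed panda-gym version)
import gymnasium as gym
import panda_gym  # registers the Panda environments
from stable_baselines3 import A2C

env = gym.make("PandaReachJointsDense-v3")
model = A2C("MultiInputPolicy", env, verbose=1)  # dict observations, hence MultiInputPolicy
model.learn(total_timesteps=500_000)
model.save("a2c_panda_reach")
```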
## Contribute
This tutorial may contain errors, inaccuracies, typos, or areas for improvement. Feel free to help improve it by opening an issue.