# TD 1: Hands-On Reinforcement Learning
This TD introduces several algorithms, frameworks and tools used in reinforcement learning. The methods are applied to robotics tasks: the CartPole environment and the PandaReachJointsDense environment.
## REINFORCE
The REINFORCE algorithm is used to solve the CartPole environment. The plot below shows the total reward across episodes:
![Total reward per episode with REINFORCE](images/reinforce_rewards.png)
The Python script used is `reinforce_cartpole.py`.
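The core of REINFORCE is to weight the log-probability of each action by the discounted return that followed it. The snippet below is a minimal sketch of that computation in plain Python, not the contents of `reinforce_cartpole.py`: the function names, the default `gamma=0.99` and the optional return normalization are assumptions. In a real implementation the log-probabilities would be PyTorch tensors so that the loss can be back-propagated.

```python
from statistics import mean, pstdev


def discounted_returns(rewards, gamma=0.99, normalize=False):
    """Compute G_t = sum_k gamma^k * r_{t+k} for every step of one episode."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each G_t reuses G_{t+1}: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    if normalize and len(returns) > 1:
        mu, sigma = mean(returns), pstdev(returns)
        # Standardizing returns is a common variance-reduction trick (an assumption, not vanilla REINFORCE)
        returns = [(x - mu) / (sigma + 1e-8) for x in returns]
    return returns


def reinforce_loss(log_probs, returns):
    """Monte Carlo policy-gradient loss: minimizing it is gradient ascent on expected return."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Computing the returns backwards keeps the cost linear in the episode length instead of re-summing the tail of the episode for every step.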
## Familiarization with a complete RL pipeline: Application to training a robotic arm
### Stable-Baselines3 and HuggingFace
In this section, the Stable-Baselines3 package is used to solve CartPole with the Advantage Actor-Critic (A2C) algorithm.
The Python script used is `a2c_sb3_cartpole.py`.
The trained model is shared on Hugging Face and is available at: https://huggingface.co/oscarchaufour/a2c-CartPole-v1
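Unlike REINFORCE, A2C does not wait for the end of an episode: it collects a short rollout and bootstraps the return from the critic's value estimate. Stable-Baselines3 computes these quantities internally; the sketch below only illustrates the idea with plain Python lists. The function name, the argument layout and `gamma=0.99` are assumptions for illustration.

```python
def n_step_advantages(rewards, values, last_value, dones, gamma=0.99):
    """Bootstrapped n-step returns and advantages for one rollout (A2C-style sketch).

    rewards[t]  -- reward received at step t
    values[t]   -- critic estimate V(s_t)
    last_value  -- critic estimate V(s_n) for the state after the rollout
    dones[t]    -- True if the episode ended after step t
    """
    returns = [0.0] * len(rewards)
    g = last_value
    for t in reversed(range(len(rewards))):
        # Do not bootstrap across an episode boundary
        if dones[t]:
            g = 0.0
        g = rewards[t] + gamma * g
        returns[t] = g
    # Advantage: how much better the observed return was than the critic predicted
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages


returns, advantages = n_step_advantages([1.0, 1.0], [0.5, 0.5], 2.0, [False, True], gamma=0.5)
print(returns, advantages)  # [1.5, 1.0] [1.0, 0.5]
```

Subtracting the critic's estimate reduces the variance of the policy gradient compared with using raw returns, which is the main motivation for the actor-critic design.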
### Weights & Biases
The Weights & Biases package is used to visualize the training and the performance of the model. The run can be viewed on WandB at: https://wandb.ai/oscar-chaufour/a2c-cartpole-v1?workspace=user-oscar-chaufour
The evolution of several metrics during training can be visualized. For example, the policy loss at each step is shown below:
![Policy loss of A2C on CartPole during training](images/policy_loss_a2c_cartpole.png)
The policy loss follows a decreasing trend, which is consistent with the model learning during training.
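To make the logged quantity concrete, the A2C policy loss is typically the negative advantage-weighted log-probability of the actions taken, and the critic is trained with a squared error against the bootstrapped returns. The sketch below uses plain floats and a hypothetical function name; the exact loss logged by Stable-Baselines3, which also adds an entropy term to the total loss, may differ in details.

```python
from statistics import fmean


def a2c_losses(log_probs, advantages, returns, values):
    """Policy (actor) and value (critic) losses for one batch, A2C-style sketch."""
    # Actor: increase log pi(a|s) for actions with positive advantage, decrease it otherwise
    policy_loss = -fmean(a * lp for a, lp in zip(advantages, log_probs))
    # Critic: mean squared error between bootstrapped returns and predicted values
    value_loss = fmean((r - v) ** 2 for r, v in zip(returns, values))
    return policy_loss, value_loss


print(a2c_losses([-0.5, -1.0], [2.0, 1.0], [1.0, 3.0], [0.0, 1.0]))  # (1.0, 2.5)
```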
### Full workflow with panda-gym
The full training, visualization and sharing workflow was to be applied to the PandaReachJointsDense environment. However, the PandaReachJointsDense-v2 environment could not be found, and creating it raised the following error: `NameNotFound: Environment PandaReachJointsDense doesn't exist.`
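A `NameNotFound` error generally means that no environment with that name was registered in the registry used by `make`. One plausible cause, not verified here, is that `panda_gym` was not imported before the environment was created (the package registers its environments on import), or that it registered them with a different package (panda-gym 2.x targets `gym`, while 3.x targets `gymnasium`). The helper below is a hypothetical sketch that lists the registered IDs closest to the requested one, using only the standard library `difflib` module.

```python
import difflib


def suggest_env_ids(requested_id, registered_ids, n=3):
    """Return up to n registered environment IDs that resemble requested_id."""
    # get_close_matches ranks candidates by similarity ratio, best first
    return difflib.get_close_matches(requested_id, list(registered_ids), n=n, cutoff=0.6)


# Possible usage (assumes gymnasium and panda-gym are installed):
#   import gymnasium as gym
#   import panda_gym  # registers the Panda environments on import
#   print(suggest_env_ids("PandaReachJointsDense-v2", gym.envs.registry.keys()))
```

An empty result would suggest that the Panda environments were not registered at all, rather than registered under a slightly different ID.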
## Contribute
This tutorial may contain errors, inaccuracies, typos or areas for improvement. Feel free to contribute to its improvement by opening an issue.
## Author
Oscar Chaufour
## License
MIT