diff --git a/README.md b/README.md
index 4f3241bd449c4b33deef3e45f914e85f0a0696b0..f746d64ab75a8fbdd1d70f459aab64d67ce0850d 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,14 @@
-# TD1-Reinforcement-learning
+# Hands-On Reinforcement Learning
 
-## REINFORCE Implementation for CartPole
+This repository contains my implementations of several reinforcement learning algorithms, applied to different environments, as part of the MSO 3.4 Apprentissage Automatique course.
 
-This repository contains an implementation of the REINFORCE algorithm (Monte Carlo Policy Gradient) to solve the CartPole-v1 environment from OpenAI Gym.
+## 1. REINFORCE Implementation for CartPole
 
-### Implementation Details
+### Algorithm Overview
+I implemented the REINFORCE algorithm (Monte Carlo Policy Gradient) to solve the CartPole-v1 environment from OpenAI Gym. The algorithm optimizes the policy directly, by gradient ascent on the expected return estimated from complete episodes.
 
-The implementation consists of:
+### Implementation Details
+My implementation includes:
 
 1. A simple policy network with:
    - Input layer (4 units for state space)
@@ -20,27 +22,48 @@ The implementation consists of:
    - Includes return normalization for training stability
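+
+For illustration, the following is a minimal, hypothetical sketch of this setup. It is not the exact contents of `reinforce_cartpole.py`: the hidden size, learning rate, and discount factor are assumptions, and it uses the Gym >= 0.26 step/reset API.
+
+```python
+import gym
+import torch
+import torch.nn as nn
+
+# Assumed architecture: only the 4-unit input and softmax output are documented above.
+policy = nn.Sequential(
+    nn.Linear(4, 128), nn.ReLU(),           # CartPole state -> hidden layer
+    nn.Linear(128, 2), nn.Softmax(dim=-1),  # probabilities over the 2 actions
+)
+optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)
+gamma = 0.99
+
+env = gym.make("CartPole-v1")
+for episode in range(500):
+    log_probs, rewards = [], []
+    state, _ = env.reset()
+    done = False
+    while not done:
+        probs = policy(torch.as_tensor(state, dtype=torch.float32))
+        dist = torch.distributions.Categorical(probs)
+        action = dist.sample()
+        log_probs.append(dist.log_prob(action))
+        state, reward, terminated, truncated, _ = env.step(action.item())
+        rewards.append(reward)
+        done = terminated or truncated
+
+    # Monte Carlo returns, accumulated backwards from the end of the episode
+    returns, g = [], 0.0
+    for r in reversed(rewards):
+        g = r + gamma * g
+        returns.insert(0, g)
+    returns = torch.tensor(returns)
+    # Return normalization for training stability (see above)
+    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
+
+    # Policy-gradient step: push up log-probabilities weighted by the return
+    loss = -(torch.stack(log_probs) * returns).sum()
+    optimizer.zero_grad()
+    loss.backward()
+    optimizer.step()
+
+torch.save(policy.state_dict(), "reinforce_cartpole.pth")  # weights checkpoint
+```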
 
 ### Training Results
-
 The agent was trained for 500 episodes. The plot below shows the total reward obtained in each episode during training:
 
 ![Training Progress](training_plot.png)
 
-### Files
-
-- `reinforce_cartpole.py`: Contains the implementation of the policy network and REINFORCE algorithm
-- `reinforce_cartpole.pth`: Saved model weights after training
-- `training_plot.png`: Visualization of the training progress
-
 ### Evaluation Results
-After training, the agent was evaluated on 100 episodes:
+Evaluated over 100 episodes after training, the agent achieved:
 - Success Rate: 100.00%
 - Average Reward: 498.60
 
-### HuggingFace Model
+## 2. A2C Implementation with Stable-Baselines3
+
+For the second part of the project, I used the Advantage Actor-Critic (A2C) implementation from Stable-Baselines3 (SB3) on both the CartPole environment and a robotic arm simulation.
+
+### CartPole with A2C
+I trained an A2C agent on the CartPole-v1 environment, relying on the high-level SB3 API for training and evaluation.
+
+### External Tools Integration
+
+#### Hugging Face Hub
+The trained A2C model for CartPole is available on Hugging Face:
 https://huggingface.co/SimRams/a2c_sb3_cartpole
 
-### Wandb link
+#### Weights & Biases
+Training progress was tracked using Weights & Biases:
 https://wandb.ai/sim-ramos01-centrale-lyon/sb3/runs/bv67u8pe?nw=nwusersimramos01
 
-### Disclaimer about Panda_gym
-For an unknown reason, I could not download and use panda_gym. So I just put the code in a2c_sb3_panda_reach.py, but I don't have any way to test it.
\ No newline at end of file
+## 3. Robotic Arm Training
+
+### Panda-gym Implementation
+I prepared code to train an A2C agent on the PandaReachJointsDense-v3 environment, in which a robotic arm must reach target points in 3D space.
+
+**Note:** I encountered technical issues installing and running panda_gym, so while the implementation is included in `a2c_sb3_panda_reach.py`, I was unable to test it. The code is structured to do the following (an illustrative sketch is given in the appendix at the end of this README):
+1. Set up the environment with the proper configuration
+2. Train the A2C model for 500k timesteps
+3. Track training with Weights & Biases
+4. Upload the trained model to Hugging Face Hub
+
+## Files
+
+- `reinforce_cartpole.py`: Implementation of the REINFORCE algorithm
+- `reinforce_cartpole.pth`: Saved model weights after training
+- `evaluate_reinforce_cartpole.py`: Script for evaluating the trained REINFORCE model
+- `a2c_sb3_cartpole.py`: Implementation using Stable-Baselines3 A2C
+- `wandb_cartpole.py`: Weights & Biases integration for CartPole training
+- `a2c_sb3_panda_reach.py`: Code for training the robotic arm (untested)
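+
+## Appendix: Panda Pipeline Sketch
+
+Because panda_gym could not be run, the sketch below is illustrative and untested, like `a2c_sb3_panda_reach.py` itself. The Hugging Face `repo_id` is hypothetical; the W&B project name is taken from the run linked above.
+
+```python
+import gymnasium as gym
+import panda_gym  # noqa: F401 -- importing registers the Panda environments
+import wandb
+from huggingface_sb3 import package_to_hub
+from stable_baselines3 import A2C
+from stable_baselines3.common.vec_env import DummyVecEnv
+from wandb.integration.sb3 import WandbCallback
+
+# 1. Set up the environment (Dict observations -> MultiInputPolicy)
+env = gym.make("PandaReachJointsDense-v3")
+
+# 2./3. Train A2C for 500k timesteps while tracking with Weights & Biases
+run = wandb.init(project="sb3", sync_tensorboard=True)
+model = A2C("MultiInputPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
+model.learn(total_timesteps=500_000, callback=WandbCallback())
+run.finish()
+
+# 4. Upload the trained model to the Hugging Face Hub
+package_to_hub(
+    model=model,
+    model_name="a2c-PandaReachJointsDense-v3",
+    model_architecture="A2C",
+    env_id="PandaReachJointsDense-v3",
+    eval_env=DummyVecEnv([lambda: gym.make("PandaReachJointsDense-v3")]),
+    repo_id="SimRams/a2c_panda_reach",  # hypothetical repo id
+    commit_message="A2C agent for PandaReachJointsDense-v3",
+)
+```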