# Hands-On Reinforcement Learning
This repository contains my implementation of various reinforcement learning algorithms and their application to different environments, as part of the MSO 3.4 Apprentissage Automatique course.
## 1. REINFORCE Implementation for CartPole
### Algorithm Overview
I implemented the REINFORCE algorithm (Monte Carlo Policy Gradient) to solve the CartPole-v1 environment from OpenAI Gym. The algorithm directly optimizes the policy using gradient ascent on the expected return.
### Implementation Details
My implementation includes:
1. A simple policy network with:
- Input layer (4 units for state space)
...
- Includes return normalization for training stability
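
For reference, here is a minimal sketch of the training loop described above. The network sizes, learning rate, and the classic Gym step API are assumptions; the full implementation is in `reinforce_cartpole.py`.

```python
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
# Small policy network: 4 state inputs -> 2 action probabilities (assumed sizes)
policy = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 2), nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)
gamma = 0.99

for episode in range(500):
    obs = env.reset()  # with gymnasium: obs, _ = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        dist = torch.distributions.Categorical(
            policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done, _ = env.step(action.item())  # gymnasium also returns `truncated`
        rewards.append(reward)

    # Monte Carlo returns, normalized for training stability
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Gradient ascent on the expected return = minimize -(log-prob * return)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```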
### Training Results
The agent was trained for 500 episodes. The plot below shows the total reward obtained in each episode during training:
![Training Plot](cartpole_first/training_plot.png)
### Evaluation Results
After training, the agent was evaluated on 100 episodes and achieved:
- Success Rate: 100.00%
- Average Reward: 498.60
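
Evaluation follows the same pattern with a greedy policy; a minimal sketch is below. The success threshold and network sizes are assumptions, and the actual script is `evaluate_reinforce_cartpole.py`.

```python
import gym
import torch
import torch.nn as nn

# Must match the architecture saved in reinforce_cartpole.pth (assumed sizes)
policy = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 2), nn.Softmax(dim=-1),
)
policy.load_state_dict(torch.load("reinforce_cartpole.pth"))
policy.eval()

SUCCESS_THRESHOLD = 475  # assumed definition of a "successful" episode
env = gym.make("CartPole-v1")
successes, total_reward = 0, 0.0
for _ in range(100):
    obs, done, episode_reward = env.reset(), False, 0.0  # gymnasium: obs, _ = env.reset()
    while not done:
        with torch.no_grad():
            probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        obs, reward, done, _ = env.step(probs.argmax().item())  # greedy action
        episode_reward += reward
    total_reward += episode_reward
    successes += episode_reward >= SUCCESS_THRESHOLD

print(f"Success Rate: {successes / 100:.2%}")
print(f"Average Reward: {total_reward / 100:.2f}")
```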
## 2. A2C Implementation with Stable-Baselines3
For the second part of the project, I used Stable-Baselines3 to implement the Advantage Actor-Critic (A2C) algorithm for both the CartPole environment and a robotic arm simulation.
### CartPole with A2C
I used the A2C implementation from Stable-Baselines3 (SB3) to train an agent on the CartPole-v1 environment, relying on SB3's high-level API for training and evaluation.
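
A sketch of that flow is below; the timestep budget is an assumption, and the exact settings are in `a2c_sb3_cartpole.py`.

```python
from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy

# SB3 builds the environment itself from its Gym ID
model = A2C("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=100_000)  # timestep budget is an assumption

mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=100)
print(f"Mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")
model.save("a2c_sb3_cartpole")
```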
### External Tools Integration
#### Hugging Face Hub
The trained A2C model for CartPole is available on Hugging Face:
https://huggingface.co/SimRams/a2c_sb3_cartpole
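
The published model can be pulled back down with the `huggingface_sb3` helper; the checkpoint filename inside the repo is an assumption.

```python
from huggingface_sb3 import load_from_hub
from stable_baselines3 import A2C

checkpoint = load_from_hub(
    repo_id="SimRams/a2c_sb3_cartpole",
    filename="a2c_sb3_cartpole.zip",  # assumed name of the checkpoint file
)
model = A2C.load(checkpoint)
```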
#### Weights & Biases
Training progress was tracked using Weights & Biases:
https://wandb.ai/sim-ramos01-centrale-lyon/sb3/runs/bv67u8pe?nw=nwusersimramos01
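
A minimal sketch of this tracking setup using the wandb SB3 integration; the timestep budget and log paths are assumptions, and the actual setup is in `wandb_cartpole.py`.

```python
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import A2C

run = wandb.init(project="sb3", sync_tensorboard=True)  # project name taken from the run URL
model = A2C("MlpPolicy", "CartPole-v1", verbose=1,
            tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=100_000, callback=WandbCallback())  # assumed budget
run.finish()
```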
## 3. Robotic Arm Training
### Panda-gym Implementation
I prepared code to train an A2C agent on the PandaReachJointsDense-v3 environment, which involves controlling a robotic arm to reach points in 3D space.
**Note:** I encountered technical issues with installing and running panda_gym. While the implementation code is included in `a2c_sb3_panda_reach.py`, I was unable to test it. The code is structured to:
1. Set up the environment with the proper configuration
2. Train the A2C model for 500k timesteps
3. Track training with Weights & Biases
4. Upload the trained model to Hugging Face Hub
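
An untested sketch of the flow listed above; the policy type and save path are assumptions, and panda-gym v3 environments expose dict observations, hence `MultiInputPolicy`. The W&B tracking and Hugging Face upload steps follow the same pattern as in the CartPole sections.

```python
import gymnasium as gym
import panda_gym  # noqa: F401 -- importing registers the Panda environments
from stable_baselines3 import A2C

env = gym.make("PandaReachJointsDense-v3")
model = A2C("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)
model.save("a2c_sb3_panda_reach")
```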
## Files
- `reinforce_cartpole.py`: Implementation of the REINFORCE algorithm
- `reinforce_cartpole.pth`: Saved model weights after training
- `cartpole_first/training_plot.png`: Visualization of the training progress
- `evaluate_reinforce_cartpole.py`: Script for evaluating the trained REINFORCE model
- `a2c_sb3_cartpole.py`: Implementation using Stable-Baselines3 A2C
- `wandb_cartpole.py`: Integration with Weights & Biases
- `a2c_sb3_panda_reach.py`: Code for training a robotic arm (not tested)