# Hands-On Reinforcement Learning
This repository contains my implementation of various reinforcement learning algorithms and their application to different environments, as part of the MSO 3.4 Apprentissage Automatique course.
## 1. REINFORCE Implementation for CartPole
### Algorithm Overview
I implemented the REINFORCE algorithm (Monte Carlo Policy Gradient) to solve the CartPole-v1 environment from OpenAI Gym. The algorithm directly optimizes the policy using gradient ascent on the expected return.
### Implementation Details
My implementation includes:
1. A simple policy network with:
- Input layer (4 units for state space)
...
- Includes return normalization for training stability
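
For reference, here is a minimal sketch of the training loop described above. The network sizes, learning rate, and the classic Gym step API are assumptions; the full implementation is in `reinforce_cartpole.py`.

```python
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
# Small policy network: 4 state inputs -> 2 action probabilities (assumed sizes)
policy = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 2), nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)
gamma = 0.99

for episode in range(500):
    obs = env.reset()  # with gymnasium: obs, _ = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        dist = torch.distributions.Categorical(
            policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done, _ = env.step(action.item())  # gymnasium also returns `truncated`
        rewards.append(reward)

    # Monte Carlo returns, normalized for training stability
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Gradient ascent on the expected return = minimize -(log-prob * return)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```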
### Training Results
The agent was trained for 500 episodes. The plot below shows the total reward obtained in each episode during training:
![Training Plot](cartpole_first/training_plot.png)
### Evaluation Results
After training, the agent was evaluated on 100 episodes and achieved:
- Success Rate: 100.00%
- Average Reward: 498.60
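
Evaluation follows the same pattern with a greedy policy; a minimal sketch is below. The success threshold and network sizes are assumptions, and the actual script is `evaluate_reinforce_cartpole.py`.

```python
import gym
import torch
import torch.nn as nn

# Must match the architecture saved in reinforce_cartpole.pth (assumed sizes)
policy = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 2), nn.Softmax(dim=-1),
)
policy.load_state_dict(torch.load("reinforce_cartpole.pth"))
policy.eval()

SUCCESS_THRESHOLD = 475  # assumed definition of a "successful" episode
env = gym.make("CartPole-v1")
successes, total_reward = 0, 0.0
for _ in range(100):
    obs, done, episode_reward = env.reset(), False, 0.0  # gymnasium: obs, _ = env.reset()
    while not done:
        with torch.no_grad():
            probs = policy(torch.as_tensor(obs, dtype=torch.float32))
        obs, reward, done, _ = env.step(probs.argmax().item())  # greedy action
        episode_reward += reward
    total_reward += episode_reward
    successes += episode_reward >= SUCCESS_THRESHOLD

print(f"Success Rate: {successes / 100:.2%}")
print(f"Average Reward: {total_reward / 100:.2f}")
```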
## 2. A2C Implementation with Stable-Baselines3
For the second part of the project, I used Stable-Baselines3 to implement the Advantage Actor-Critic (A2C) algorithm for both the CartPole environment and a robotic arm simulation.
### CartPole with A2C
I used the A2C implementation from Stable-Baselines3 (SB3) to train an agent on the CartPole-v1 environment, relying on SB3's high-level API for training and evaluation.
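
A sketch of that flow is below; the timestep budget is an assumption, and the exact settings are in `a2c_sb3_cartpole.py`.

```python
from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy

# SB3 builds the environment itself from its Gym ID
model = A2C("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=100_000)  # timestep budget is an assumption

mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=100)
print(f"Mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")
model.save("a2c_sb3_cartpole")
```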
### External Tools Integration
#### Hugging Face Hub
The trained A2C model for CartPole is available on Hugging Face:
https://huggingface.co/SimRams/a2c_sb3_cartpole
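
The published model can be pulled back down with the `huggingface_sb3` helper; the checkpoint filename inside the repo is an assumption.

```python
from huggingface_sb3 import load_from_hub
from stable_baselines3 import A2C

checkpoint = load_from_hub(
    repo_id="SimRams/a2c_sb3_cartpole",
    filename="a2c_sb3_cartpole.zip",  # assumed name of the checkpoint file
)
model = A2C.load(checkpoint)
```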
#### Weights & Biases
Training progress was tracked using Weights & Biases:
https://wandb.ai/sim-ramos01-centrale-lyon/sb3/runs/bv67u8pe?nw=nwusersimramos01
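
A minimal sketch of this tracking setup using the wandb SB3 integration; the timestep budget and log paths are assumptions, and the actual setup is in `wandb_cartpole.py`.

```python
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3 import A2C

run = wandb.init(project="sb3", sync_tensorboard=True)  # project name taken from the run URL
model = A2C("MlpPolicy", "CartPole-v1", verbose=1,
            tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=100_000, callback=WandbCallback())  # assumed budget
run.finish()
```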
## 3. Robotic Arm Training
### Panda-gym Implementation
I prepared code to train an A2C agent on the PandaReachJointsDense-v3 environment, which involves controlling a robotic arm to reach points in 3D space.
**Note:** I encountered technical issues with installing and running panda_gym. While the implementation code is included in `a2c_sb3_panda_reach.py`, I was unable to test it. The code is structured to:
1. Set up the environment with the proper configuration
2. Train the A2C model for 500k timesteps
3. Track training with Weights & Biases
4. Upload the trained model to Hugging Face Hub
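
An untested sketch of the flow listed above; the policy type and save path are assumptions, and panda-gym v3 environments expose dict observations, hence `MultiInputPolicy`. The W&B tracking and Hugging Face upload steps follow the same pattern as in the CartPole sections.

```python
import gymnasium as gym
import panda_gym  # noqa: F401 -- importing registers the Panda environments
from stable_baselines3 import A2C

env = gym.make("PandaReachJointsDense-v3")
model = A2C("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)
model.save("a2c_sb3_panda_reach")
```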
## Files
- `reinforce_cartpole.py`: Implementation of the REINFORCE algorithm
- `reinforce_cartpole.pth`: Saved model weights after training
- `cartpole_first/training_plot.png`: Visualization of the training progress
- `evaluate_reinforce_cartpole.py`: Script for evaluating the trained REINFORCE model
- `a2c_sb3_cartpole.py`: Implementation using Stable-Baselines3 A2C
- `wandb_cartpole.py`: Integration with Weights & Biases
- `a2c_sb3_panda_reach.py`: Code for training a robotic arm (not tested)