Commit b5b834ae authored by Masmoudi Hamza
# TD 1 Reinforcement Learning
## Project Overview
This project is part of the TD 1 Reinforcement Learning assignment. It includes implementations of reinforcement learning algorithms to solve the CartPole-v1 environment and train a robotic arm using Panda-gym.
## Environment Setup
### Python Virtual Environment
To set up a Python virtual environment, use the following commands:

```bash
python -m venv venv
source venv/bin/activate   # On macOS/Linux
venv\Scripts\activate      # On Windows
```
*MSO 3.4 Apprentissage Automatique (Machine Learning)*
# Installation
### Required Libraries
If you want to use this project, you can simply clone or fork the repository.

First, don't forget to install [PyTorch](https://pytorch.org/get-started/locally).

Install the necessary libraries with the following command:

```bash
pip install gymnasium stable-baselines3 wandb panda-gym torch matplotlib
```

Then, you'll need to install some packages for running the Python files:

## **reinforce_cartpole.py**

```bash
pip install gym==0.26.2
pip install pyglet==2.0.10
pip install pygame==2.5.2
pip install PyQt5
```
## Project Structure
The project is organized as follows:
```
.
├── README.md
├── a2c_sb3_cartpole.py
├── a2c_sb3_panda_reach.py
├── evaluate_reinforce_cartpole.py
├── reinforce_cartpole.py
├── reward_plot.png
├── script_hub.py
├── test.py
├── training_wandb.py
├── venv
└── wandb
```
## **a2c_sb3_cartpole.py**

```bash
pip install stable-baselines3
pip install moviepy
pip install huggingface-sb3==2.3.1
pip install wandb tensorboard
```

## **a2c_sb3_panda_reach.py**

```bash
pip install stable-baselines3
pip install moviepy
pip install huggingface-sb3==2.3.1
pip install wandb tensorboard
pip install panda-gym==3.0.7
```
## Experiment Tracking
### Weights & Biases
- **CartPole Experiment:** The link to the Weights & Biases run is not available because the sharing feature was unavailable.
- **Panda Reach Experiment:** The link to the Weights & Biases run is not available because the sharing feature was unavailable.
## Trained Models
### Hugging Face Hub
- **CartPole Model:** [Hugging Face Hub - CartPole Model](https://huggingface.co/whoshamza/a2c-cartpole/tree/main)
- **Panda Reach Model:** [Hugging Face Hub - Panda Reach Model](https://huggingface.co/whoshamza/panda-reach-model)
## Usage
### Running the CartPole Experiment
To run the CartPole experiment, use the following command:

```bash
python a2c_sb3_cartpole.py
```
### Running the Panda Reach Experiment
To run the Panda Reach experiment, use the following command:

```bash
python a2c_sb3_panda_reach.py
```

# REINFORCE Algorithm
The first part of this project consists of implementing the REINFORCE algorithm. The process follows these steps:
### **Setup the CartPole environment**
### **Setup the agent as a simple neural network with:**
- One fully connected layer with 128 units and ReLU activation followed by a dropout layer
- One fully connected layer followed by softmax activation
### **Repeat 500 times:**
1. Reset the environment
2. Reset the buffer
3. Repeat until the end of the episode:
- Compute action probabilities
- Sample the action based on the probabilities and store its probability in the buffer
- Step the environment with the action
   - Compute the return using gamma=0.99 and store it in the buffer
4. Normalize the return
5. Compute the policy loss as `-sum(log(prob) * return)`
6. Update the policy using an Adam optimizer and a learning rate of `5e-3`
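The return and loss computations in steps 3–5 can be sketched in plain Python (the function names are illustrative, not taken from the repository):

```python
import math

def discounted_returns(rewards, gamma=0.99):
    """Step 3: return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

def normalize(values, eps=1e-8):
    """Step 4: zero-mean, unit-variance normalization of the returns."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / (std + eps) for v in values]

def policy_loss(log_probs, returns):
    """Step 5: loss = -sum(log(prob) * return)."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```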
If you want to run this experiment, execute the following file: [reinforce_cartpole.py](reinforce_cartpole.py).
You'll get a curve similar to this one:
![CartPole Reward Plot](Cartpole_WanDB.jpeg)
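To make the agent description above concrete, here is a dependency-free sketch of that forward pass — one 128-unit ReLU layer with dropout, then a softmax layer — with random weights just to exercise it (the actual reinforce_cartpole.py presumably uses PyTorch):

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def policy_forward(obs, w1, b1, w2, b2, drop_p=0.0):
    """Linear(4 -> 128) + ReLU (+ inverted dropout), then Linear(128 -> 2) + softmax."""
    h = [max(0.0, sum(w * o for w, o in zip(row, obs)) + b)
         for row, b in zip(w1, b1)]
    if drop_p > 0.0:  # dropout is only active during training
        h = [0.0 if random.random() < drop_p else v / (1.0 - drop_p) for v in h]
    logits = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w2, b2)]
    return softmax(logits)

# Randomly initialised weights, just to exercise the forward pass
random.seed(0)
w1 = [[random.gauss(0.0, 0.1) for _ in range(4)] for _ in range(128)]
b1 = [0.0] * 128
w2 = [[random.gauss(0.0, 0.1) for _ in range(128)] for _ in range(2)]
b2 = [0.0] * 2
probs = policy_forward([0.1, 0.0, -0.2, 0.05], w1, b1, w2, b2)
```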
### Evaluating the CartPole Model
To evaluate the CartPole model, use the following command:

```bash
python evaluate_reinforce_cartpole.py
```

# CartPole & Stable-Baselines3
This section introduces the **Advantage Actor-Critic (A2C)** algorithm.

The CartPole environment is solved with A2C in the following script: [a2c_sb3_cartpole.py](a2c_sb3_cartpole.py).

You can download the trained model from **Hugging Face Hub** [here](https://huggingface.co/whoshamza/a2c-cartpole/tree/main).

If you want to analyze how the model behaves, check out the **Weights & Biases** logs:
![CartPole Weights & Biases](Cartpole_WanDB.jpeg)
## Results
### Reward Plot
The reward plot is shown below:
![Reward Plot](reward_plot.png)

# Full Workflow with Panda-Gym
Now, we use **panda-gym** (specifically the **PandaReach-v1** environment) for training a robotic arm.

You can run the experiment using the following script: [a2c_sb3_panda_reach.py](a2c_sb3_panda_reach.py).
The trained model is available on **Hugging Face Hub** [here](https://huggingface.co/whoshamza/panda-reach-model).

You can also visualize experiment tracking with **Weights & Biases** using this image:
![Panda Reach Tracking](Panda_Reach_WanDB.jpeg)

## License
This project is licensed under the MIT License.