Commit b5b834ae authored by Masmoudi Hamza
# TD 1 Reinforcement Learning
## Project Overview
This project is part of the TD 1 Reinforcement Learning assignment. It includes implementations of reinforcement learning algorithms to solve the CartPole-v1 environment and train a robotic arm using Panda-gym.
## Environment Setup
### Python Virtual Environment
To set up a Python virtual environment, use the following commands:

```bash
python -m venv venv
source venv/bin/activate   # On macOS/Linux
venv\Scripts\activate      # On Windows
```
*MSO 3.4 Apprentissage Automatique (Machine Learning)*
# Installation
### Required Libraries
If you want to use this project, you can simply clone or fork the repository.

First, don't forget to install [PyTorch](https://pytorch.org/get-started/locally).

Install the necessary libraries with the following command:

```bash
pip install gymnasium stable-baselines3 wandb panda-gym torch matplotlib
```

Then, you'll need to install some packages for running the Python files:

## **reinforce_cartpole.py**

```bash
pip install gym==0.26.2
pip install pyglet==2.0.10
pip install pygame==2.5.2
pip install PyQt5
```
## Project Structure
The project is organized as follows:
```
.
├── README.md
├── a2c_sb3_cartpole.py
├── a2c_sb3_panda_reach.py
├── evaluate_reinforce_cartpole.py
├── reinforce_cartpole.py
├── reward_plot.png
├── script_hub.py
├── test.py
├── training_wandb.py
├── venv
└── wandb
```
## **a2c_sb3_cartpole.py**

```bash
pip install stable-baselines3
pip install moviepy
pip install huggingface-sb3==2.3.1
pip install wandb tensorboard
```

## **a2c_sb3_panda_reach.py**

```bash
pip install stable-baselines3
pip install moviepy
pip install huggingface-sb3==2.3.1
pip install wandb tensorboard
pip install panda-gym==3.0.7
```
## Experiment Tracking
### Weights & Biases
- **CartPole Experiment:** The link to the Weights & Biases run is not available because the sharing feature was unavailable.
- **Panda Reach Experiment:** The link to the Weights & Biases run is not available because the sharing feature was unavailable.
## Trained Models
### Hugging Face Hub
- **CartPole Model:** [Hugging Face Hub - CartPole Model](https://huggingface.co/whoshamza/a2c-cartpole/tree/main)
- **Panda Reach Model:** [Hugging Face Hub - Panda Reach Model](https://huggingface.co/whoshamza/panda-reach-model)
## Usage
### Running the CartPole Experiment
To run the CartPole experiment, use the following command:

```bash
python a2c_sb3_cartpole.py
```
### Running the Panda Reach Experiment
To run the Panda Reach experiment, use the following command:

```bash
python a2c_sb3_panda_reach.py
```

# REINFORCE Algorithm
The first part of this project consists of implementing the REINFORCE algorithm. The process follows these steps:
### **Setup the CartPole environment**
### **Setup the agent as a simple neural network with:**
- One fully connected layer with 128 units and ReLU activation followed by a dropout layer
- One fully connected layer followed by softmax activation
### **Repeat 500 times:**
1. Reset the environment
2. Reset the buffer
3. Repeat until the end of the episode:
- Compute action probabilities
- Sample the action based on the probabilities and store its probability in the buffer
- Step the environment with the action
   - Compute the return using gamma=0.99 and store it in the buffer
4. Normalize the return
5. Compute the policy loss as `-sum(log(prob) * return)`
6. Update the policy using an Adam optimizer and a learning rate of `5e-3`
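The return and loss computations in steps 3–5 can be sketched in plain Python (the function names are illustrative, not taken from the repository):

```python
import math

def discounted_returns(rewards, gamma=0.99):
    """Step 3: return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

def normalize(values, eps=1e-8):
    """Step 4: zero-mean, unit-variance normalization of the returns."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / (std + eps) for v in values]

def policy_loss(log_probs, returns):
    """Step 5: loss = -sum(log(prob) * return)."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```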
If you want to run this experiment, execute the following file: [reinforce_cartpole.py](reinforce_cartpole.py).
You'll get a curve similar to this one:
![CartPole Reward Plot](Cartpole_WanDB.jpeg)
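To make the agent description above concrete, here is a dependency-free sketch of that forward pass — one 128-unit ReLU layer with dropout, then a softmax layer — with random weights just to exercise it (the actual reinforce_cartpole.py presumably uses PyTorch):

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def policy_forward(obs, w1, b1, w2, b2, drop_p=0.0):
    """Linear(4 -> 128) + ReLU (+ inverted dropout), then Linear(128 -> 2) + softmax."""
    h = [max(0.0, sum(w * o for w, o in zip(row, obs)) + b)
         for row, b in zip(w1, b1)]
    if drop_p > 0.0:  # dropout is only active during training
        h = [0.0 if random.random() < drop_p else v / (1.0 - drop_p) for v in h]
    logits = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w2, b2)]
    return softmax(logits)

# Randomly initialised weights, just to exercise the forward pass
random.seed(0)
w1 = [[random.gauss(0.0, 0.1) for _ in range(4)] for _ in range(128)]
b1 = [0.0] * 128
w2 = [[random.gauss(0.0, 0.1) for _ in range(128)] for _ in range(2)]
b2 = [0.0] * 2
probs = policy_forward([0.1, 0.0, -0.2, 0.05], w1, b1, w2, b2)
```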
### Evaluating the CartPole Model
To evaluate the CartPole model, use the following command:

```bash
python evaluate_reinforce_cartpole.py
```

# CartPole & Stable-Baselines3
This section introduces the **Advantage Actor-Critic (A2C)** algorithm.

The CartPole environment is solved with A2C in the following script: [a2c_sb3_cartpole.py](a2c_sb3_cartpole.py).

You can download the trained model from **Hugging Face Hub** [here](https://huggingface.co/whoshamza/a2c-cartpole/tree/main).

If you want to analyze how the model behaves, check out the **Weights & Biases** logs:
![CartPole Weights & Biases](Cartpole_WanDB.jpeg)
## Results
### Reward Plot
The reward plot is shown below:
![Reward Plot](reward_plot.png)

# Full Workflow with Panda-Gym
Now, we use **panda-gym** (specifically the **PandaReach-v1** environment) for training a robotic arm.

You can run the experiment using the following script: [a2c_sb3_panda_reach.py](a2c_sb3_panda_reach.py).
The trained model is available on **Hugging Face Hub** [here](https://huggingface.co/whoshamza/panda-reach-model).

You can also visualize experiment tracking with **Weights & Biases** using this image:
![Panda Reach Tracking](Panda_Reach_WanDB.jpeg)

## License
This project is licensed under the MIT License.