# TD 1 Report: Hands-On Reinforcement Learning

MSO 3.4 Machine Learning

Corentin GEREST

## Installation
Requires Python 3.9 or later.

Before running the Python files below, install the following libraries with pip:

```shell
pip install torch torchvision torchaudio
pip install gym==0.26.2
pip install pyglet==2.0.10
pip install pygame==2.5.2
pip install PyQt5
pip install stable-baselines3
pip install moviepy
pip install huggingface-sb3==2.3.1
pip install wandb tensorboard
pip install panda-gym==3.0.7
```
## File list
This repo contains the following files:
- ``reinforce_cartpole.py``
- ``a2c_sb3_cartpole.py``
- ``push_model_HF.py``
- ``train_wb.py``
- ``a2c_sb3_panda_reach.py``
- ``rewards_cartpole.png``

## Usage
To run any of these files, use a terminal with the command:
```shell
python <file_name>
```

The first file, `reinforce_cartpole.py`, implements the REINFORCE method on the CartPole-v1 environment and plots the reward over the training iterations (see `rewards_cartpole.png`).

![Image](rewards_cartpole.png)
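
At the core of REINFORCE is the discounted return computed for every step of an episode, which then weights the log-probability of the chosen action in the loss. The sketch below shows one common way to compute and normalize those returns in plain Python. The function names and the discount factor `gamma=0.99` are illustrative assumptions, not taken from `reinforce_cartpole.py`.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def normalize(values, eps=1e-8):
    """Shift to zero mean and scale to unit std (a common variance-reduction step)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    return [(v - mean) / (std + eps) for v in values]


def reinforce_loss(log_probs, returns):
    """Policy-gradient loss: minus the sum of log pi(a_t | s_t) * G_t."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))
```

In a PyTorch training loop, `log_probs` would hold tensors collected during the episode, so that calling `.backward()` on the loss yields the policy gradient.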

The second file, `a2c_sb3_cartpole.py`, is an introduction to the Stable-Baselines3 package, which provides ready-made tools, and uses it to solve the CartPole environment with the A2C (Advantage Actor-Critic) algorithm.
In addition to giving access to the reward over the training iterations, this script saves a model under the name [insert final name].
To upload this model to Hugging Face, I used the short script `push_model_HF.py`.
The trained model is available here: https://huggingface.co/CorentinGst/Cartpolev1/tree/main/a2c_cartpole_model
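
The training and saving steps of such a script can be sketched as follows. This is a minimal sketch, not the content of `a2c_sb3_cartpole.py`: the helper names, the hyperparameter values and the save path `a2c_cartpole_sketch` are assumptions chosen for illustration. Stable-Baselines3 is imported inside the training function so that the configuration can be inspected without loading the library.

```python
def make_a2c_config(total_timesteps=50_000, seed=0):
    """Collect the settings of the run in one place (illustrative values)."""
    return {
        "policy": "MlpPolicy",  # fully connected actor and critic networks
        "env_id": "CartPole-v1",
        "total_timesteps": total_timesteps,
        "seed": seed,
        "save_path": "a2c_cartpole_sketch",  # hypothetical file name
    }


def train_a2c(config):
    """Train A2C with Stable-Baselines3 and save the model to disk."""
    from stable_baselines3 import A2C  # lazy import: heavy dependency

    # Passing the environment ID lets SB3 create and wrap the environment itself.
    model = A2C(config["policy"], config["env_id"], seed=config["seed"], verbose=1)
    model.learn(total_timesteps=config["total_timesteps"])
    model.save(config["save_path"])  # SB3 writes <save_path>.zip
    return model
```

Calling `train_a2c(make_a2c_config())` runs the training; the resulting `.zip` archive is the kind of file an upload script would then send to Hugging Face.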

The third file, `train_wb.py`, tracks training on the CartPole environment with Weights and Biases (W&B). The result is visible here: https://wandb.ai/corentin-ge/wandb_test_cartpole/runs/ybbl1bih?workspace=user-corentin-ge
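
One way to wire W&B into a Stable-Baselines3 run is the `WandbCallback` shipped with the `wandb` package. The sketch below is an assumption about how such tracking can be set up, not a copy of `train_wb.py`: the helper names, the timestep budget and the `runs/` and `models/` directories are hypothetical, and the project name is simply the one that appears in the URL above.

```python
def build_run_config(total_timesteps=25_000):
    """Hyperparameters logged to W&B alongside the run (illustrative values)."""
    return {
        "policy": "MlpPolicy",
        "env_id": "CartPole-v1",
        "total_timesteps": total_timesteps,
    }


def train_with_wandb(config, project="wandb_test_cartpole"):
    """Train A2C on CartPole while streaming metrics to Weights and Biases."""
    import wandb  # lazy imports: these need the packages and a W&B login
    from stable_baselines3 import A2C
    from wandb.integration.sb3 import WandbCallback

    # sync_tensorboard forwards the TensorBoard scalars written by SB3 to W&B.
    run = wandb.init(project=project, config=config, sync_tensorboard=True)
    model = A2C(
        config["policy"],
        config["env_id"],
        verbose=1,
        tensorboard_log=f"runs/{run.id}",  # hypothetical log directory
    )
    model.learn(
        total_timesteps=config["total_timesteps"],
        callback=WandbCallback(model_save_path=f"models/{run.id}", verbose=2),
    )
    run.finish()
    return model
```

Calling `train_with_wandb(build_run_config())` after `wandb login` should create a run page similar to the one linked above.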

Finally, chaining these steps together gives a complete workflow on a new environment: PandaReachJointsDense.

HF: 

W&B: https://wandb.ai/corentin-ge/a2c_sb3_panda_reach/runs/pqlrv40v?workspace=user-corentin-ge
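
A rough sketch of the training step on the Panda environment is given below. It rests on several assumptions that are not stated in this README: that panda-gym 3.x registers its environments with a `-v3` suffix and is used through `gymnasium`, that the helper names and the save path `a2c_panda_reach_sketch` are free choices, and that the timestep budget is arbitrary. It is not the content of `a2c_sb3_panda_reach.py`.

```python
def panda_env_id(task="Reach", joints=True, dense=True, version=3):
    """Build a panda-gym environment ID (naming scheme assumed from panda-gym 3.x)."""
    control = "Joints" if joints else ""  # joint control vs. end-effector control
    reward = "Dense" if dense else ""     # dense vs. sparse reward
    return f"Panda{task}{control}{reward}-v{version}"


def train_panda_reach(total_timesteps=100_000):
    """Train A2C on the Panda reach task and save the model."""
    import gymnasium as gym  # assumption: panda-gym 3.x builds on gymnasium
    import panda_gym  # noqa: F401  (importing registers the Panda environments)
    from stable_baselines3 import A2C

    env = gym.make(panda_env_id())
    # Observations are dictionaries (observation, achieved goal, desired goal),
    # so the multi-input policy is used instead of MlpPolicy.
    model = A2C("MultiInputPolicy", env, verbose=1)
    model.learn(total_timesteps=total_timesteps)
    model.save("a2c_panda_reach_sketch")  # hypothetical file name
    return model
```

The same W&B callback and Hugging Face upload used for CartPole can then be attached to this run to complete the workflow.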


## Credits/Citation
## License