diff --git a/README.md b/README.md
index 009ba30c10e7823edd2a650f7c05dd4784d19cf6..f0139ada38c263f2155c5ac930fcdbe5f35b0c21 100644
--- a/README.md
+++ b/README.md
@@ -1,210 +1,58 @@
-# TD 1 : Hands-On Reinforcement Learning
+# TD 1 Submission: Hands-On Reinforcement Learning
 
 MSO 3.4 Apprentissage Automatique
 
-# 
+Corentin GEREST
 
-In this hands-on project, we will first implement a simple RL algorithm and apply it to solve the CartPole-v1 environment. Once we become familiar with the basic workflow, we will learn to use various tools for machine learning model training, monitoring, and sharing, by applying these tools to train a robotic arm.
+## Installation
+Requires Python 3.9 or later.
 
-## To be handed in
-
-This work must be done individually. The expected output is a repository named `hands-on-rl` on https://gitlab.ec-lyon.fr. 
-
-We assume that `git` is installed, and that you are familiar with the basic `git` commands. (Optionnaly, you can use GitHub Desktop.)
-We also assume that you have access to the [ECL GitLab](https://gitlab.ec-lyon.fr/). If necessary, please consult [this tutorial](https://gitlab.ec-lyon.fr/edelland/inf_tc2/-/blob/main/Tutoriel_gitlab/tutoriel_gitlab.md).
-
-Your repository must contain a `README.md` file that explains **briefly** the successive steps of the project. It must be private, so you need to add your teacher as "developer" member.
-
-Throughout the subject, you will find a 🛠 symbol indicating that a specific production is expected.
-
-The last commit is due before 11:59 pm on March 5, 2024. Subsequent commits will not be considered.
-
-> ⚠️ **Warning**
-> Ensure that you only commit the files that are requested. For example, your directory should not contain the generated `.zip` files, nor the `runs` folder... At the end, your repository must contain one `README.md`, three python scripts, and optionally image files for the plots.
-
-## Before you start
-
-Make sure you know the basics of Reinforcement Learning. In case of need, you can refer to the [introduction of the Hugging Face RL course](https://huggingface.co/blog/deep-rl-intro).
-
-## Introduction to Gym
-
-[Gym](https://gymnasium.farama.org/) is a framework for developing and evaluating reinforcement learning environments. It offers various environments, including classic control and toy text scenarios, to test RL algorithms.
-
-### Installation
-
-We recommend to use Python virtual environnements to install the required modules : https://docs.python.org/3/library/venv.html
-
-First, install Pytorch : https://pytorch.org/get-started/locally.
-
-Then install the following modules :
-
-
-```sh
-pip install gym==0.26.2
-```
-
-Install also pyglet for the rendering.
-
-```sh
-pip install pyglet==2.0.10
-```
-
-If needed 
-
-```sh
-pip install pygame==2.5.2
-```
-
-```sh
-pip install PyQt5
-```
-
-
-### Usage
-
-Here is an example of how to use Gym to solve the `CartPole-v1` environment [Documentation](https://gymnasium.farama.org/environments/classic_control/cart_pole/):
+To run the Python scripts below, first install the following libraries with pip:
 
-```python
+```sh
-import gym
-
-# Create the environment
-env = gym.make("CartPole-v1", render_mode="human")
-
-# Reset the environment and get the initial observation
-observation = env.reset()
-
-for _ in range(100):
-    # Select a random action from the action space
-    action = env.action_space.sample()
-    # Apply the action to the environment
-    # Returns next observation, reward, done signal (indicating
-    # if the episode has ended), and an additional info dictionary
-    observation, reward, terminated, truncated, info = env.step(action)
-    # Render the environment to visualize the agent's behavior
-    env.render()
-    if terminated: 
-        # Terminated before max step
-        break
-
-env.close()
+pip install torch torchvision torchaudio
+pip install gym==0.26.2
+pip install pyglet==2.0.10
+pip install pygame==2.5.2
+pip install PyQt5
+pip install stable-baselines3
+pip install moviepy
+pip install huggingface-sb3==2.3.1
+pip install wandb tensorboard
+pip install panda-gym==3.0.7
 ```
+## File list
+This repository contains the following files:
+- ``reinforce_cartpole.py``
+- ``a2c_sb3_cartpole.py``
+- ``push_model_HF.py``
+- ``train_wb.py``
+- ``a2c_sb3_panda_reach.py``
+- ``rewards_cartpole.png``
 
-## REINFORCE
-
-The REINFORCE algorithm (also known as Vanilla Policy Gradient) is a policy gradient method that optimizes the policy directly using gradient descent. The following is the pseudocode of the REINFORCE algorithm:
-
-```txt
-Setup the CartPole environment
-Setup the agent as a simple neural network with:
-    - One fully connected layer with 128 units and ReLU activation followed by a dropout layer
-    - One fully connected layer followed by softmax activation
-Repeat 500 times:
-    Reset the environment
-    Reset the buffer
-    Repeat until the end of the episode:
-        Compute action probabilities 
-        Sample the action based on the probabilities and store its probability in the buffer 
-        Step the environment with the action
-        Compute and store in the buffer the return using gamma=0.99 
-    Normalize the return
-    Compute the policy loss as -sum(log(prob) * return)
-    Update the policy using an Adam optimizer and a learning rate of 5e-3
+## Usage
+Each script can be run from a terminal with:
+```sh
+python <script_name>.py   # e.g. python reinforce_cartpole.py
 ```
 
-To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/introduction).
-
-> 🛠 **To be handed in**
-> Use PyTorch to implement REINFORCE and solve the CartPole environement. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward accross episodes in the `README.md`.
-
-## Familiarization with a complete RL pipeline: Application to training a robotic arm
-
-In this section, you will use the Stable-Baselines3 package to train a robotic arm using RL. You'll get familiar with several widely-used tools for training, monitoring and sharing machine learning models.
-
-### Get familiar with Stable-Baselines3
-
-Stable-Baselines3 (SB3) is a high-level RL library that provides various algorithms and integrated tools to easily train and test reinforcement learning models.
-
-#### Installation
-
-```sh
-pip install stable-baselines3
-pip install moviepy
-```
-
-#### Usage
-
-Use the [Stable-Baselines3 documentation](https://stable-baselines3.readthedocs.io/en/master/) to implement the code to solve the CartPole environment with the Advantage Actor-Critic (A2C) algorithm.
-
-
-> 🛠 **To be handed in**
-> Store the code in `a2c_sb3_cartpole.py`. Unless otherwise stated, you'll work upon this file for the next sections.
-
-### Get familiar with Hugging Face Hub
-
-Hugging Face Hub is a platform for easy sharing and versioning of trained machine learning models. With Hugging Face Hub, you can quickly and easily share your models with others and make them usable through the API. For example, see the trained A2C agent for CartPole: https://huggingface.co/sb3/a2c-CartPole-v1. Hugging Face Hub provides an API to download and upload SB3 models.
-
-#### Installation of `huggingface_sb3`
-
-```sh
-pip install huggingface-sb3==2.3.1
-```
-
-#### Upload the model on the Hub
-
-Follow the [Hugging Face Hub documentation](https://huggingface.co/docs/hub/stable-baselines3) to upload the previously learned model to the Hub.
-
-> 🛠 **To be handed in**
-> Link the trained model in the `README.md` file.
-
-> 📝 **Note**
->  [RL-Zoo3](https://stable-baselines3.readthedocs.io/en/master/guide/rl_zoo.html) provides more advanced features to save hyperparameters, generate renderings and metrics. Feel free to try them.
-
-### Get familiar with Weights & Biases
-
-Weights & Biases (W&B) is a tool for machine learning experiment management. With W&B, you can track and compare your experiments, visualize your model training and performance.
-
-#### Installation
-
-You'll need to install both `wand` and `tensorboar`.
-
-```shell
-pip install wandb tensorboard
-```
-
-Use the documentation of [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) and [Weights & Biases](https://docs.wandb.ai/guides/integrations/stable-baselines-3) to track the CartPole training. Make the run public.
-
-🛠 Share the link of the wandb run in the `README.md` file.
-
-> ⚠️ **Warning**
-> Make sure to make the run public!
-
-### Full workflow with panda-gym
-
-[Panda-gym](https://github.com/qgallouedec/panda-gym) is a collection of environments for robotic simulation and control. It provides a range of challenges for training robotic agents in a simulated environment. In this section, you will get familiar with one of the environments provided by panda-gym, the `PandaReachJointsDense-v3`. The objective is to learn how to reach any point in 3D space by directly controlling the robot's articulations.
-
-#### Installation
-
-```shell
-pip install panda-gym==3.0.7
-```
-
-#### Train, track, and share
-
-Use the Stable-Baselines3 package to train A2C model on the `PandaReachJointsDense-v2` environment. 500k timesteps should be enough. Track the environment with Weights & Biases. Once the training is over, upload the trained model on the Hub.
+The first script, `reinforce_cartpole.py`, implements the REINFORCE method on the CartPole-v1 environment and plots the evolution of the total reward over episodes (see `rewards_cartpole.png`).
 
-> 🛠 **To be handed in**
-> Share all the code in `a2c_sb3_panda_reach.py`. Share the link of the wandb run and the trained model in the `README.md` file.
+![Total reward per episode for REINFORCE on CartPole-v1](rewards_cartpole.png)
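+
+The core of the update in `reinforce_cartpole.py` is the policy-gradient step. Below is a minimal, self-contained sketch of that loss computation (it follows the script's variables: `saved_log_probs` holds the log-probabilities of the sampled actions, `rewards` the per-step rewards of one episode):
+
+```python
+import numpy as np
+import torch
+
+def reinforce_loss(saved_log_probs, rewards, gamma=0.99):
+    # Discounted return G_t for every timestep of the episode
+    returns = torch.tensor([sum(rewards[i:] * (gamma ** np.arange(len(rewards) - i)))
+                            for i in range(len(rewards))], dtype=torch.float32)
+    # Normalize the returns to stabilize training
+    returns = (returns - returns.mean()) / (returns.std() + 1e-9)
+    # REINFORCE loss: -sum(log pi(a_t | s_t) * G_t)
+    log_probs = torch.stack(saved_log_probs).flatten()  # shape (T,)
+    return -(log_probs * returns).sum()
+```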
 
-## Contribute
+The second script, `a2c_sb3_cartpole.py`, introduces the Stable-Baselines3 package, which provides ready-made algorithms and integrated tools, and uses it to solve the CartPole environment with the A2C (Advantage Actor-Critic) algorithm.
+Besides giving access to the evolution of the reward over episodes, this script saves the trained model as `a2c_cartpole_model.zip`.
+To upload this model to Hugging Face, I used the short script `push_model_HF.py`.
+The trained model is available here: https://huggingface.co/CorentinGst/Cartpolev1/tree/main/a2c_cartpole_model.
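+
+For reference, the Stable-Baselines3 part boils down to the standard API; here is a minimal sketch (the evaluation loop and the reward plot of the full script are omitted, and the training budget is an assumption):
+
+```python
+from stable_baselines3 import A2C
+from stable_baselines3.common.env_util import make_vec_env
+
+# Four CartPole environments running in parallel, as in a2c_sb3_cartpole.py
+env = make_vec_env("CartPole-v1", n_envs=4)
+model = A2C("MlpPolicy", env, verbose=1)
+model.learn(total_timesteps=100_000)  # training budget: an assumption, adjust as needed
+model.save("a2c_cartpole_model")      # writes a2c_cartpole_model.zip
+```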
 
-This tutorial may contain errors, inaccuracies, typos or areas for improvement. Feel free to contribute to its improvement by opening an issue.
+The third script, `train_wb.py`, tracks the CartPole training with Weights & Biases (W&B). The run is available here: https://wandb.ai/corentin-ge/wandb_test_cartpole/runs/ybbl1bih?workspace=user-corentin-ge
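+
+The W&B integration essentially requires a `wandb.init` call plus the `WandbCallback` passed to `model.learn`; a minimal sketch of what `train_wb.py` does (log paths and run names are simplified here):
+
+```python
+import wandb
+from wandb.integration.sb3 import WandbCallback
+from stable_baselines3 import A2C
+from stable_baselines3.common.env_util import make_vec_env
+
+# sync_tensorboard forwards SB3's TensorBoard metrics to the W&B run
+run = wandb.init(project="wandb_test_cartpole", sync_tensorboard=True)
+env = make_vec_env("CartPole-v1", n_envs=4)
+model = A2C("MlpPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
+model.learn(total_timesteps=25_000, callback=WandbCallback(verbose=2))
+run.finish()
+```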
 
-## Author
+Finally, chaining these steps together gives a complete workflow on a new environment, PandaReachJointsDense, implemented in `a2c_sb3_panda_reach.py`.
 
-Quentin Gallouédec
+HF: 
 
-Updates by Léo Schneider, Emmanuel Dellandréa
+W&B: https://wandb.ai/corentin-ge/a2c_sb3_panda_reach/runs/pqlrv40v?workspace=user-corentin-ge
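+
+The same recipe applies to the robotic arm. A minimal sketch of the training part of `a2c_sb3_panda_reach.py` (the W&B tracking and the Hugging Face upload are left out here):
+
+```python
+import gym
+import panda_gym  # noqa: F401 - importing it registers the Panda environments
+from stable_baselines3 import A2C
+
+env = gym.make("PandaReachJointsDense-v3")
+# The observation is a dict (observation / achieved_goal / desired_goal),
+# hence the MultiInputPolicy
+model = A2C("MultiInputPolicy", env, verbose=1)
+model.learn(total_timesteps=500_000)
+model.save("PandaReachJointsDense_1")
+env.close()
+```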
 
-## License
 
-MIT
+## Credits / Citation
+Based on the original tutorial by Quentin Gallouédec, updated by Léo Schneider and Emmanuel Dellandréa.
+
+## License
\ No newline at end of file
diff --git a/a2c_sb3_cartpole.py b/a2c_sb3_cartpole.py
new file mode 100644
index 0000000000000000000000000000000000000000..2e817e9b35ddbe1ab15f58fd6c506595c78361a5
--- /dev/null
+++ b/a2c_sb3_cartpole.py
@@ -0,0 +1,52 @@
+import gym
+from stable_baselines3 import A2C
+from stable_baselines3.common.env_util import make_vec_env
+from stable_baselines3.common.evaluation import evaluate_policy
+import matplotlib.pyplot as plt
+import numpy as np
+from tqdm import tqdm
+
+if __name__ == "__main__":
+    episodes = 500
+
+    # Create and wrap the CartPole environment
+    env = make_vec_env("CartPole-v1", n_envs=4)
+    
+    # Define the A2C model
+    model = A2C('MlpPolicy', env, verbose=1)
+
+    # Train the agent
+    model.learn(total_timesteps=100_000)  # training budget; adjust as needed
+
+    # Evaluation loop: roll out the trained policy and record episode rewards
+    episode_rewards = []
+    for episode in tqdm(range(episodes)):
+        # Reinitialize environment and rewards for each episode
+        obs = env.reset()
+        episode_reward = 0
+
+        while True:
+            # Predict the action from the trained policy
+            action, _states = model.predict(obs, deterministic=True)
+            obs, reward, done, _ = env.step(action)
+            episode_reward += np.sum(reward)
+
+            # Break if any environment is done
+            if np.any(done):
+                break
+        
+        episode_rewards.append(episode_reward)
+
+        # Log progress
+        # print(f"Episode: {episode + 1}, Reward: {episode_reward_sum}")
+
+    # Save model
+    model.save("a2c_cartpole_model")
+
+    # Close environments
+    env.close()
+    
+    # Plot result
+    plt.plot(episode_rewards)
+    plt.xlabel('Episode')
+    plt.ylabel('Total reward')
+    plt.title('Evolution of reward')
+    plt.show()
+
diff --git a/a2c_sb3_panda_reach.py b/a2c_sb3_panda_reach.py
new file mode 100644
index 0000000000000000000000000000000000000000..ea9d1d39c9687710f6f7dbbc009f2d593dca3b9c
--- /dev/null
+++ b/a2c_sb3_panda_reach.py
@@ -0,0 +1,59 @@
+import gym
+from stable_baselines3 import A2C
+from stable_baselines3.common.monitor import Monitor
+from stable_baselines3.common.vec_env import DummyVecEnv
+import wandb
+import panda_gym
+from wandb.integration.sb3 import WandbCallback
+from huggingface_hub import login
+from huggingface_sb3 import push_to_hub
+
+
+if __name__ == "__main__":
+    # Log in HF
+    login()
+
+    # Initialize a new wandb run
+    # Configs
+    config = {
+        "policy_type": "MultiInputPolicy",
+        "total_timesteps": 500_000,
+        "env_name": "PandaReachJointsDense-v3",
+    }
+
+    # WB initialization (pass the config so hyperparameters are logged with the run)
+    run = wandb.init(
+        project="a2c_sb3_panda_reach",
+        config=config,
+        sync_tensorboard=True,
+        monitor_gym=True,
+    )
+
+    # WB callback
+    wandb_callback = WandbCallback(
+        gradient_save_freq=100,
+        model_save_path=f"models/{run.id}",
+        verbose=2,
+    )
+
+    env = gym.make("PandaReachJointsDense-v3")
+
+    model = A2C("MultiInputPolicy",
+        env,
+        verbose=1,
+        tensorboard_log=f"runs/{run.id}"
+    )
+    
+    model.learn(
+        total_timesteps=config["total_timesteps"],
+        callback=wandb_callback
+    )
+    model.save("PandaReachJointsDense_1.zip")
+
+    run.finish()
+
+    # Upload on HF
+    push_to_hub(
+        repo_id="CorentinGst/PandaReachJointsDense_1",
+        filename="PandaReachJointsDense_1.zip",
+        commit_message="Add my 1st model trained on PandaReachJointsDense-v3 env",
+    )
\ No newline at end of file
diff --git a/push_model_HF.py b/push_model_HF.py
new file mode 100644
index 0000000000000000000000000000000000000000..e0705f1452a7b65e73949f8448106168ee5e365c
--- /dev/null
+++ b/push_model_HF.py
@@ -0,0 +1,10 @@
+from huggingface_hub import login
+from huggingface_sb3 import push_to_hub
+
+login()
+
+push_to_hub(
+    repo_id="CorentinGst/CartPolev2",
+    filename="a2c_cartpole_model.zip",
+    commit_message="Test HF push API ",
+)
\ No newline at end of file
diff --git a/reinforce_cartpole.py b/reinforce_cartpole.py
new file mode 100644
index 0000000000000000000000000000000000000000..3fee45cfe9cb38244f8bed8a768fc073ab4d1872
--- /dev/null
+++ b/reinforce_cartpole.py
@@ -0,0 +1,101 @@
+import numpy as np
+from tqdm import tqdm
+import matplotlib.pyplot as plt
+
+import gym
+import torch
+import torch.nn as nn
+import torch.optim as optim
+
+
+# Define the neural network for the policy
+class PolicyNetwork(nn.Module):
+    def __init__(self, input_dim, output_dim):
+        super(PolicyNetwork, self).__init__()
+        self.fc1 = nn.Linear(input_dim, 128)
+        self.relu = nn.ReLU()
+        self.dropout = nn.Dropout(p=0.6)
+        self.fc2 = nn.Linear(128, output_dim)
+        self.softmax = nn.Softmax(dim=1)
+
+    def forward(self, x):
+        x = self.fc1(x)
+        x = self.relu(x)
+        x = self.dropout(x)
+        x = self.fc2(x)
+        return self.softmax(x)
+
+# Normalize function
+def normalize_rewards(rewards):
+    rewards = np.array(rewards)
+    rewards = (rewards - np.mean(rewards)) / (np.std(rewards) + 1e-9)
+    return rewards
+
+
+if __name__ == "__main__":
+    
+    # Hyperparameters
+    learning_rate = 5e-3
+    gamma = 0.99
+    episodes = 450
+
+    # Environment setup
+    env = gym.make("CartPole-v1")  # , render_mode="human")
+
+    input_dim = env.observation_space.shape[0]
+    output_dim = env.action_space.n
+
+    # Policy network
+    policy = PolicyNetwork(input_dim, output_dim)
+    optimizer = optim.Adam(policy.parameters(), lr=learning_rate)
+
+    # Training loop
+    episode_rewards = []
+    for episode in tqdm(range(episodes)):
+
+        state = env.reset()[0]
+        # gym >= 0.26 returns (observation, info) from reset(), hence the [0];
+        # with older gym versions (e.g. on Google Colab), use "state = env.reset()" instead
+
+        saved_log_probs = []
+        rewards = []
+
+        while True:
+            state_tensor = torch.from_numpy(state).float().unsqueeze(0)
+            action_probs = policy(state_tensor)
+            m = torch.distributions.Categorical(action_probs)
+            action = m.sample()
+            saved_log_probs.append(m.log_prob(action))
+            state, reward, terminated, truncated, _ = env.step(action.item())
+            rewards.append(reward)
+            # Stop on termination (pole fell) or truncation (500-step limit reached)
+            if terminated or truncated:
+                break
+
+        # Compute the discounted return for each timestep, then normalize
+        returns = torch.tensor([sum(rewards[i:] * (gamma ** np.arange(len(rewards) - i)))
+                                for i in range(len(rewards))])
+        returns = (returns - returns.mean()) / (returns.std() + 1e-9)
+
+        # Policy loss: -sum(log_prob * return), plus a small entropy bonus
+        # flatten() aligns the stacked (T, 1) log-probs with the (T,) returns
+        policy_loss = -torch.stack(saved_log_probs).flatten().mul(returns).sum()
+        # Note: action_probs holds the distribution of the last visited state only
+        entropy_loss = -0.01 * (action_probs * torch.log(action_probs)).sum(dim=1).mean()
+
+        total_loss = policy_loss + entropy_loss
+
+        optimizer.zero_grad()
+        total_loss.backward()
+        optimizer.step()
+
+        episode_reward = sum(rewards)
+        episode_rewards.append(episode_reward)
+        
+
+    # Plotting
+    plt.plot(episode_rewards)
+    plt.xlabel('Episode')
+    plt.ylabel('Total reward')
+    plt.title('REINFORCE on CartPole')
+    plt.savefig('rewards_cartpole.png')
+    plt.show()
\ No newline at end of file
diff --git a/rewards_cartpole.png b/rewards_cartpole.png
new file mode 100644
index 0000000000000000000000000000000000000000..fd2e3b609d42a37c2be425b4173c53e3473d307e
Binary files /dev/null and b/rewards_cartpole.png differ
diff --git a/train_wb.py b/train_wb.py
new file mode 100644
index 0000000000000000000000000000000000000000..b44d27e854aa43c702acdd3227bd238d6abc69fc
--- /dev/null
+++ b/train_wb.py
@@ -0,0 +1,42 @@
+import gym
+import numpy as np
+
+from stable_baselines3 import A2C
+from stable_baselines3.common.env_util import make_vec_env
+
+from stable_baselines3.common.callbacks import EvalCallback
+import wandb
+from wandb.integration.sb3 import WandbCallback
+
+
+if __name__ == "__main__":
+    # Configuration for W&B (sync_tensorboard forwards SB3's TensorBoard metrics to the run)
+    wandb.init(project="wandb_test_cartpole", name="cartpole_v1", sync_tensorboard=True)
+
+    # Create environnement
+    env = make_vec_env("CartPole-v1", n_envs=4)
+
+    # Model initialization
+    model = A2C("MlpPolicy", env, verbose=1, tensorboard_log="./wandb_test_cartpole_tensorboard/")
+
+    # Evaluation callback
+    eval_callback = EvalCallback(
+        env,
+        best_model_save_path='./logs/',
+        log_path='./logs/',
+        eval_freq=500,
+        deterministic=True,
+        render=False
+    )
+
+    # Weights & Biases callback
+    wb_callback = WandbCallback(
+        gradient_save_freq=1000,
+        model_save_path=f"models/{wandb.run.id}",
+        verbose=2,
+    )
+
+    # Actual training
+    model.learn(total_timesteps=25000, callback=[eval_callback, wb_callback])
+    model.save("test_wb_cartpole")
+    wandb.finish()
\ No newline at end of file