    Hands-On Reinforcement Learning

    This repository contains my implementations of several reinforcement learning algorithms and their application to different environments, as part of the MSO 3.4 Apprentissage Automatique (Machine Learning) course.

    1. REINFORCE Implementation for CartPole

    Algorithm Overview

    I implemented the REINFORCE algorithm (Monte Carlo Policy Gradient) to solve the CartPole-v1 environment from OpenAI Gym. The algorithm directly optimizes the policy using gradient ascent on the expected return.

    Implementation Details

    My implementation includes the following (a minimal code sketch appears after this list):

    1. A simple policy network with:

      • Input layer (4 units for state space)
      • Hidden layer (128 units with ReLU activation and dropout)
      • Output layer (2 units with softmax activation for action probabilities)
    2. REINFORCE algorithm features:

      • Uses PyTorch for the neural network and automatic differentiation
      • Computes full-episode Monte Carlo returns with discount factor γ = 0.99
      • Uses the Adam optimizer with a learning rate of 5e-3
      • Includes return normalization for training stability
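
    The sketch below shows how these pieces could fit together. It is illustrative rather than a verbatim copy of reinforce_cartpole.py; the class and function names and the dropout rate are placeholders.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PolicyNetwork(nn.Module):
        """4 state inputs -> 128 hidden units (ReLU + dropout) -> 2 softmax action probabilities."""
        def __init__(self, state_dim=4, hidden_dim=128, n_actions=2, p_dropout=0.5):
            super().__init__()
            self.fc1 = nn.Linear(state_dim, hidden_dim)
            self.dropout = nn.Dropout(p_dropout)   # dropout rate is an assumption
            self.fc2 = nn.Linear(hidden_dim, n_actions)

        def forward(self, state):
            x = self.dropout(F.relu(self.fc1(state)))
            return F.softmax(self.fc2(x), dim=-1)

    def reinforce_update(optimizer, log_probs, rewards, gamma=0.99):
        """One policy-gradient step from a full episode (Monte Carlo returns)."""
        returns, g = [], 0.0
        for r in reversed(rewards):                # G_t = r_t + gamma * G_{t+1}
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns, dtype=torch.float32)
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalize returns for stability
        loss = -(torch.stack(log_probs) * returns).sum()               # gradient ascent on expected return
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    policy = PolicyNetwork()
    optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)
    ```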

    Training Results

    The agent was trained for 500 episodes. The plot below shows the total reward obtained in each episode during training:

    [Training plot: total reward per episode during training]

    Evaluation Results

    After evaluation on 100 episodes, the agent achieved the following (a sketch of the evaluation loop appears after these results):

    • Success Rate: 100.00%
    • Average Reward: 498.60
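
    For reference, an evaluation loop in this spirit is sketched below. It is not the exact contents of evaluate_reinforce_cartpole.py; the greedy action selection, the Gymnasium-style API, and the success threshold are assumptions.

    ```python
    import gymnasium as gym
    import torch

    def evaluate(policy, n_episodes=100, success_threshold=475.0):
        """Roll out the trained policy and report success rate and average reward.
        The success threshold is an assumption (CartPole-v1 is usually considered solved near 475)."""
        env = gym.make("CartPole-v1")
        policy.eval()
        totals = []
        for _ in range(n_episodes):
            state, _ = env.reset()
            done, total = False, 0.0
            while not done:
                with torch.no_grad():
                    probs = policy(torch.tensor(state, dtype=torch.float32))
                action = int(probs.argmax())                  # greedy action at evaluation time
                state, reward, terminated, truncated, _ = env.step(action)
                total += reward
                done = terminated or truncated
            totals.append(total)
        success_rate = sum(t >= success_threshold for t in totals) / n_episodes
        return success_rate, sum(totals) / n_episodes
    ```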

    2. A2C Implementation with Stable-Baselines3

    For the second part of the project, I used the Advantage Actor-Critic (A2C) implementation from Stable-Baselines3 to train agents on both the CartPole environment and a robotic arm simulation.

    CartPole with A2C

    I used the A2C implementation from Stable-Baselines3 to train an agent on the CartPole-v1 environment, relying on the high-level SB3 API for model construction, training, and evaluation; a minimal training sketch follows.
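
    A minimal version of this workflow is sketched below; the timestep budget and evaluation settings are illustrative rather than the exact values used in a2c_sb3_cartpole.py.

    ```python
    import gymnasium as gym
    from stable_baselines3 import A2C
    from stable_baselines3.common.evaluation import evaluate_policy

    env = gym.make("CartPole-v1")
    model = A2C("MlpPolicy", env, verbose=1)        # default SB3 hyperparameters
    model.learn(total_timesteps=100_000)            # illustrative training budget
    mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=100)
    model.save("a2c_sb3_cartpole")
    ```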

    External Tools Integration

    Hugging Face Hub

    The trained A2C model for CartPole is available on Hugging Face: https://huggingface.co/SimRams/a2c_sb3_cartpole
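
    The published model can be loaded back with the huggingface_sb3 helpers; the checkpoint filename below is an assumption about how the Hub repo is laid out.

    ```python
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import A2C

    # Repo id comes from the link above; the filename inside the repo is assumed.
    checkpoint = load_from_hub(repo_id="SimRams/a2c_sb3_cartpole", filename="a2c_sb3_cartpole.zip")
    model = A2C.load(checkpoint)
    ```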

    Weights & Biases

    Training progress was tracked using Weights & Biases: https://wandb.ai/sim-ramos01-centrale-lyon/sb3/runs/bv67u8pe?nw=nwusersimramos01
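
    The logging follows the standard SB3/W&B callback pattern. The sketch below approximates what wandb_cartpole.py does; the timestep budget is an assumption, while the project name comes from the run URL above.

    ```python
    import gymnasium as gym
    import wandb
    from wandb.integration.sb3 import WandbCallback
    from stable_baselines3 import A2C

    run = wandb.init(project="sb3", sync_tensorboard=True)    # project name taken from the run URL
    model = A2C("MlpPolicy", gym.make("CartPole-v1"), verbose=1, tensorboard_log=f"runs/{run.id}")
    model.learn(total_timesteps=100_000, callback=WandbCallback())  # timestep budget is illustrative
    run.finish()
    ```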

    3. Robotic Arm Training

    Panda-gym Implementation

    I prepared code to train an A2C agent on the PandaReachJointsDense-v3 environment, which involves controlling a robotic arm to reach points in 3D space.

    Note: I encountered technical issues installing and running panda_gym. While the implementation code is included in a2c_sb3_panda_reach.py, I was unable to test it. The code is structured to (a sketch follows the list):

    1. Set up the environment with the proper configuration
    2. Train the A2C model for 500k timesteps
    3. Track training with Weights & Biases
    4. Upload the trained model to Hugging Face Hub
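
    The core of that script would look roughly like the sketch below (untested, as noted above). Uploading to the Hub (step 4) would use the huggingface_sb3 helpers and is omitted here.

    ```python
    import gymnasium as gym
    import panda_gym  # noqa: F401  (importing registers the Panda environments)
    import wandb
    from wandb.integration.sb3 import WandbCallback
    from stable_baselines3 import A2C

    run = wandb.init(project="sb3", sync_tensorboard=True)
    env = gym.make("PandaReachJointsDense-v3")
    # Panda observations are dictionaries (observation / achieved_goal / desired_goal),
    # so SB3 needs MultiInputPolicy rather than MlpPolicy.
    model = A2C("MultiInputPolicy", env, verbose=1, tensorboard_log=f"runs/{run.id}")
    model.learn(total_timesteps=500_000, callback=WandbCallback())
    model.save("a2c_panda_reach")
    run.finish()
    ```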

    Files

    • reinforce_cartpole.py: Implementation of the REINFORCE algorithm
    • reinforce_cartpole.pth: Saved model weights after training
    • evaluate_reinforce_cartpole.py: Script for evaluating the trained REINFORCE model
    • a2c_sb3_cartpole.py: Implementation using Stable-Baselines3 A2C
    • wandb_cartpole.py: Integration with Weights & Biases
    • a2c_sb3_panda_reach.py: Code for training a robotic arm (not tested)