    LaShelb authored

    TD1-Reinforcement-learning

    REINFORCE Implementation for CartPole

    This repository contains an implementation of the REINFORCE algorithm (Monte Carlo Policy Gradient) to solve the CartPole-v1 environment from OpenAI Gym.

    Implementation Details

    The implementation consists of:

    1. A simple policy network with:

      • Input layer (4 units for state space)
      • Hidden layer (128 units with ReLU activation and dropout)
      • Output layer (2 units with softmax activation for action probabilities)
    2. REINFORCE algorithm features:

      • Uses PyTorch for neural network and automatic differentiation
      • Computes full-episode Monte Carlo returns with discount factor γ=0.99
      • Uses Adam optimizer with learning rate 5e-3
      • Includes return normalization for training stability
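
    The pieces listed above can be sketched as follows. This is a minimal illustration, not the contents of reinforce_cartpole.py: the names Policy, discounted_returns and reinforce_loss, the dropout rate of 0.5, and the three-step fake episode are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class Policy(nn.Module):
    """4 -> 128 -> 2 policy network. The dropout rate is an assumed value."""

    def __init__(self, state_dim=4, hidden_dim=128, n_actions=2, p_drop=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(hidden_dim, n_actions),
            nn.Softmax(dim=-1),  # action probabilities
        )

    def forward(self, state):
        return self.net(state)


def discounted_returns(rewards, gamma=0.99, eps=1e-8):
    """Normalized Monte Carlo returns G_t for one complete episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):  # accumulate from the last step backwards
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    returns = torch.tensor(returns, dtype=torch.float32)
    # Normalization for training stability (zero mean, unit variance).
    return (returns - returns.mean()) / (returns.std() + eps)


def reinforce_loss(log_probs, returns):
    """Negative of sum_t log pi(a_t | s_t) * G_t."""
    return -(torch.stack(log_probs) * returns).sum()


policy = Policy()
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)

# Fake three-step episode standing in for real CartPole transitions.
states = torch.randn(3, 4)
rewards = [1.0, 1.0, 1.0]
log_probs = []
for s in states:
    dist = torch.distributions.Categorical(probs=policy(s))
    action = dist.sample()
    log_probs.append(dist.log_prob(action))

# One policy-gradient step on the whole episode.
loss = reinforce_loss(log_probs, discounted_returns(rewards))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

    In a real training loop the states and rewards would come from stepping the CartPole-v1 environment until the episode ends, and the update would run once per episode.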

    Training Results

    The agent was trained for 500 episodes. The plot below shows the total reward obtained in each episode during training:

    [Figure: Training Plot, total reward per episode (training_plot.png)]

    Files

    • reinforce_cartpole.py: Contains the implementation of the policy network and REINFORCE algorithm
    • reinforce_cartpole.pth: Saved model weights after training
    • training_plot.png: Visualization of the training progress

    Evaluation Results

    After training, the agent was evaluated on 100 episodes:

    • Success Rate: 100.00%
    • Average Reward: 498.60
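
    The two metrics above could be computed from a list of per-episode rewards roughly as follows. The README does not define "success", so the threshold of 475 (the reward threshold registered for CartPole-v1) is an assumption, summarize_evaluation is a hypothetical helper, and the rewards in the example are made up rather than taken from the actual evaluation.

```python
def summarize_evaluation(episode_rewards, success_threshold=475.0):
    """Return success rate (in percent) and average reward.

    An episode counts as a success when its total reward reaches
    success_threshold; this definition is an assumption.
    """
    n = len(episode_rewards)
    successes = sum(1 for r in episode_rewards if r >= success_threshold)
    return {
        "success_rate": 100.0 * successes / n,
        "average_reward": sum(episode_rewards) / n,
    }


# Made-up total rewards for four evaluation episodes.
example_rewards = [500.0, 500.0, 480.0, 300.0]
summary = summarize_evaluation(example_rewards)
print(f"Success Rate: {summary['success_rate']:.2f}%")
print(f"Average Reward: {summary['average_reward']:.2f}")
```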

    HuggingFace Model

    https://huggingface.co/SimRams/a2c_sb3_cartpole

    Wandb link

    https://wandb.ai/sim-ramos01-centrale-lyon/sb3/runs/bv67u8pe?nw=nwusersimramos01

    Disclaimer about Panda_gym

    I was unable to install panda_gym and could not identify the cause. The code for that part is provided in a2c_sb3_panda_reach.py, but it is untested.