td1-reinforcement-learning

    LaShelb authored
    69dc99c0

    TD1-Reinforcement-learning

    REINFORCE Implementation for CartPole

    This repository contains an implementation of the REINFORCE algorithm (Monte Carlo Policy Gradient) to solve the CartPole-v1 environment from OpenAI Gym.

    Implementation Details

    The implementation consists of:

    1. A simple policy network with:

      • Input layer (4 units for state space)
      • Hidden layer (128 units with ReLU activation and dropout)
      • Output layer (2 units with softmax activation for action probabilities)
    2. REINFORCE algorithm features:

      • Uses PyTorch for neural network and automatic differentiation
      • Computes full-episode Monte Carlo returns with discount factor γ = 0.99
      • Uses Adam optimizer with learning rate 5e-3
      • Includes return normalization for training stability
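
    The components listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of reinforce_cartpole.py: the names PolicyNetwork, discounted_returns, normalize and reinforce_loss are hypothetical, the dropout rate of 0.5 is an assumption (the README does not state it), and random tensors stand in for CartPole states so the snippet runs without an environment.

```python
import torch
import torch.nn as nn


class PolicyNetwork(nn.Module):
    """Maps a 4-dimensional state to probabilities over 2 actions."""

    def __init__(self, state_dim=4, hidden_dim=128, n_actions=2, dropout_p=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(p=dropout_p),  # assumed rate; not given in the README
            nn.Linear(hidden_dim, n_actions),
            nn.Softmax(dim=-1),
        )

    def forward(self, state):
        return self.net(state)


def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    return torch.tensor(returns, dtype=torch.float32)


def normalize(returns, eps=1e-8):
    """Zero-mean, unit-variance returns; assumes more than one time step."""
    return (returns - returns.mean()) / (returns.std() + eps)


def reinforce_loss(log_probs, returns):
    """REINFORCE loss: -sum_t log pi(a_t | s_t) * G_t."""
    return -(torch.stack(log_probs) * returns).sum()


policy = PolicyNetwork()
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-3)

# One gradient step on a fake 3-step episode (random states, reward 1 per step).
states = torch.randn(3, 4)
log_probs = []
for s in states:
    dist = torch.distributions.Categorical(probs=policy(s))
    action = dist.sample()
    log_probs.append(dist.log_prob(action))

returns = normalize(discounted_returns([1.0, 1.0, 1.0]))
loss = reinforce_loss(log_probs, returns)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

    In a real training run, the states and rewards would come from stepping the CartPole-v1 environment for a full episode before each update. The policy would also be switched to eval mode for evaluation so that dropout is disabled.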

    Training Results

    The agent was trained for 500 episodes. The plot below shows the total reward obtained in each episode during training:

    [Figure: training plot (training_plot.png)]

    Files

    • reinforce_cartpole.py: Contains the implementation of the policy network and REINFORCE algorithm
    • reinforce_cartpole.pth: Saved model weights after training
    • training_plot.png: Visualization of the training progress

    Evaluation Results

    After training, the agent was evaluated on 100 episodes:

    • Success Rate: 100.00%
    • Average Reward: 498.60
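
    The two metrics above could be computed from a list of per-episode rewards along these lines. The README does not define what counts as a success, so the threshold below is an assumption: 475 is the reward threshold registered for CartPole-v1 in Gym, and the actual evaluation script may use a different criterion. The function name summarize_evaluation and the sample rewards are hypothetical.

```python
def summarize_evaluation(episode_rewards, success_threshold=475.0):
    """Return success rate (in percent) and average reward over the episodes.

    An episode counts as a success when its total reward reaches
    success_threshold (an assumed criterion, not taken from the README).
    """
    n = len(episode_rewards)
    successes = sum(1 for r in episode_rewards if r >= success_threshold)
    return {
        "success_rate": 100.0 * successes / n,
        "average_reward": sum(episode_rewards) / n,
    }


# Illustrative rewards only; these are not the repository's evaluation data.
summary = summarize_evaluation([500.0, 500.0, 480.0, 200.0])
print(f"Success Rate: {summary['success_rate']:.2f}%")
print(f"Average Reward: {summary['average_reward']:.2f}")
```

    With the sample rewards shown, three of the four episodes clear the threshold, so this prints a success rate of 75.00% and an average reward of 420.00.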

    HuggingFace Model

    https://huggingface.co/SimRams/a2c_sb3_cartpole

    Wandb link

    https://wandb.ai/sim-ramos01-centrale-lyon/sb3/runs/bv67u8pe?nw=nwusersimramos01