
    Hands-On Reinforcement Learning

    In this hands-on project, we will first implement a simple RL algorithm and apply it to solve the CartPole-v1 environment. Once familiar with the basic workflow, we will learn to use various tools for training, monitoring, and sharing machine learning models, by applying them to train a robotic arm.

    To be handed in

    This work must be done individually. The expected output is a repository named hands-on-rl on https://gitlab.ec-lyon.fr. It must contain a README.md file that briefly explains the successive steps of the project. Throughout the assignment, a 🛠️ symbol indicates that a specific deliverable is expected. The last commit is due before 11:59 pm on Monday, February 13, 2023. Subsequent commits will not be considered.

    Introduction to Gym

    Gym is a toolkit that provides a standard API and a collection of reinforcement learning environments, including classic control and toy text scenarios, for developing and evaluating RL algorithms.

    Installation

    pip install gym==0.21

    Usage

    Here is an example of how to use Gym to interact with the CartPole-v1 environment:

    import gym

    # Create the environment
    env = gym.make("CartPole-v1")

    # Reset the environment and get the initial observation
    observation = env.reset()

    for _ in range(100):
        # Select a random action from the action space
        action = env.action_space.sample()
        # Apply the action to the environment.
        # Returns the next observation, the reward, a done signal (indicating
        # whether the episode has ended), and an additional info dictionary
        observation, reward, done, info = env.step(action)
        # Render the environment to visualize the agent's behavior
        env.render()
        # Start a new episode when the current one ends
        if done:
            observation = env.reset()

    env.close()
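
    As a small, optional extension of the snippet above (a sketch, not part of the assignment), the rewards of each episode can be accumulated to obtain its return, which is the quantity you will later plot for REINFORCE:

    import gym

    env = gym.make("CartPole-v1")
    episode_returns = []  # total (undiscounted) reward collected in each episode

    for _ in range(10):
        observation = env.reset()
        done = False
        total_reward = 0.0
        while not done:
            # Random policy, as in the example above
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            total_reward += reward
        episode_returns.append(total_reward)

    env.close()
    print(episode_returns)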

    REINFORCE

    The REINFORCE algorithm (also known as Vanilla Policy Gradient) is a policy gradient method that optimizes the policy directly, by gradient ascent on the expected return (equivalently, gradient descent on its negative). The following is the pseudocode of the REINFORCE algorithm:

    Set up the CartPole environment
    Set up the agent as a simple neural network with:
        - One fully connected layer with 128 units and ReLU activation, followed by a dropout layer
        - One fully connected layer followed by a softmax activation
    Repeat 500 times:
        Reset the environment
        Reset the buffer
        Repeat until the end of the episode:
            Compute the action probabilities and store them in the buffer
            Sample an action based on the probabilities
            Step the environment with the action and store the reward in the buffer
        Compute the discounted returns from the stored rewards using gamma=0.99
        Normalize the returns
        Compute the policy loss as -sum(log(prob) * return)
        Update the policy using the Adam optimizer with a learning rate of 5e-3
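
    To make the mapping from this pseudocode to PyTorch more concrete, here is a minimal, non-authoritative sketch of the policy network and of the update step. The discounted return at step t is G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..., computed backwards from the end of the episode. The layer sizes, gamma, and learning rate follow the pseudocode above; everything else (variable names, the dropout probability, the loop scaffolding) is an assumption of this sketch, not a prescribed implementation:

    import gym
    import torch
    import torch.nn as nn
    import torch.optim as optim

    # Policy network: 4 observations -> 128 hidden units (ReLU + dropout) -> 2 action probabilities
    policy = nn.Sequential(
        nn.Linear(4, 128),
        nn.ReLU(),
        nn.Dropout(p=0.5),  # dropout probability is an assumption, not fixed by the pseudocode
        nn.Linear(128, 2),
        nn.Softmax(dim=-1),
    )
    optimizer = optim.Adam(policy.parameters(), lr=5e-3)

    env = gym.make("CartPole-v1")
    gamma = 0.99

    for episode in range(500):
        log_probs, rewards = [], []  # the "buffer"
        observation = env.reset()
        done = False
        while not done:
            probs = policy(torch.as_tensor(observation, dtype=torch.float32))
            dist = torch.distributions.Categorical(probs)
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            observation, reward, done, _ = env.step(action.item())
            rewards.append(reward)

        # Discounted returns, computed backwards: G_t = r_t + gamma * G_{t+1}
        returns = []
        g = 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalize

        # Policy loss: -sum(log(prob) * return), minimized with Adam
        loss = -(torch.stack(log_probs) * returns).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    env.close()

    For the plot requested below, you could also record sum(rewards) at the end of each episode and plot it against the episode index (for instance with matplotlib).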

    🛠️ Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in reinforce.py, and include in the README.md a plot showing the return across episodes.

    Familiarization with a complete RL pipeline: Application to training a robotic arm

    In this section, you will use the Stable-Baselines3 package to train a robotic arm with RL. You will get familiar with several widely used tools for training, monitoring, and sharing machine learning models.

    Get familiar with Stable-Baselines3

    Stable-Baselines3 (SB3) is a high-level RL library that provides various algorithms and integrated tools to easily train and test reinforcement learning models.
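
    As a preview of the high-level API that SB3 exposes (a minimal sketch; the installation instructions follow below, and the choice of algorithm, environment, and number of timesteps here is a placeholder, not part of the assignment):

    from stable_baselines3 import A2C

    # Train an A2C agent on CartPole with the default MLP policy
    model = A2C("MlpPolicy", "CartPole-v1", verbose=1)
    model.learn(total_timesteps=10_000)

    # Run the trained policy for a few steps in the training environment
    env = model.get_env()
    obs = env.reset()
    for _ in range(100):
        action, _states = model.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action)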

    Installation