Hands-On Reinforcement Learning
In this hands-on project, we will first implement a simple RL algorithm and apply it to solve the CartPole-v1 environment. Once we become familiar with the basic workflow, we will learn to use various tools for machine learning model training, monitoring, and sharing, by applying these tools to train a robotic arm.
To be handed in
This work must be done individually. The expected output is a repository named hands-on-rl on https://gitlab.ec-lyon.fr. It must contain a README.md file that briefly explains the successive steps of the project. Throughout the subject, you will find a

⚠️ Warning: Ensure that you only commit the files that are requested. For example, your directory should not contain the generated .zip files, nor the runs folder... At the end, your repository must contain one README.md, three Python scripts, and optionally image files for the plots.
Before you start
Make sure you know the basics of Reinforcement Learning. If needed, you can refer to the introduction of the Hugging Face RL course.
Introduction to Gym
Gym is a framework for developing and evaluating reinforcement learning agents. It offers various environments, including classic control and toy text scenarios, on which to test RL algorithms.
Installation
pip install gym==0.21
Also install pyglet, which is needed for rendering:
pip install pyglet==1.5.27
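As an optional sanity check (not required by the subject), you can print the installed versions from Python:
import gym
import pyglet

# Expected versions after the commands above: gym 0.21.0 and pyglet 1.5.27
print(gym.__version__, pyglet.version)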
Usage
Here is an example of how to use Gym to interact with the CartPole-v1 environment:
import gym

# Create the environment
env = gym.make("CartPole-v1")

# Reset the environment and get the initial observation
observation = env.reset()

for _ in range(100):
    # Select a random action from the action space
    action = env.action_space.sample()
    # Apply the action to the environment
    # Returns next observation, reward, done signal (indicating
    # if the episode has ended), and an additional info dictionary
    observation, reward, done, info = env.step(action)
    # Render the environment to visualize the agent's behavior
    env.render()
    # If the episode has ended, reset the environment before stepping again
    if done:
        observation = env.reset()

env.close()
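As a side note (optional, not part of the required scripts), you can inspect the observation and action spaces of the environment to see what the agent observes and which actions it can take:
import gym

env = gym.make("CartPole-v1")
# Box(4,): cart position, cart velocity, pole angle, pole angular velocity
print(env.observation_space)
# Discrete(2): 0 pushes the cart to the left, 1 pushes it to the right
print(env.action_space)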
REINFORCE
The REINFORCE algorithm (also known as Vanilla Policy Gradient) is a policy gradient method that optimizes the policy directly by gradient ascent on the expected return. The pseudocode of the REINFORCE algorithm is as follows:
Setup the CartPole environment
Setup the agent as a simple neural network with:
    - One fully connected layer with 128 units and ReLU activation followed by a dropout layer
    - One fully connected layer followed by softmax activation
Repeat 500 times:
    Reset the environment
    Reset the buffer
    Repeat until the end of the episode:
        Compute action probabilities
        Sample the action based on the probabilities and store its probability in the buffer
        Step the environment with the action
        Compute and store in the buffer the return using gamma=0.99
    Normalize the return
    Compute the policy loss as -sum(log(prob) * return)
    Update the policy using an Adam optimizer and a learning rate of 5e-3
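Implementing this algorithm is the point of the exercise, but to illustrate the network architecture and the loss described in the pseudocode, here is a minimal PyTorch sketch (PyTorch, the dropout rate, and the helper name compute_returns are assumptions of this example, not requirements of the subject):
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Policy network from the pseudocode: one fully connected layer with 128 units,
    ReLU and dropout, followed by a fully connected layer with softmax output."""

    def __init__(self, obs_dim=4, n_actions=2, dropout=0.5):  # dropout rate: assumed value
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, 128)
        self.dropout = nn.Dropout(dropout)
        self.fc2 = nn.Linear(128, n_actions)

    def forward(self, x):
        x = self.dropout(torch.relu(self.fc1(x)))
        return torch.softmax(self.fc2(x), dim=-1)

def compute_returns(rewards, gamma=0.99):
    """Discounted returns G_t = r_t + gamma * G_{t+1}, normalized as in the pseudocode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    return (returns - returns.mean()) / (returns.std() + 1e-8)

# With log_probs the log-probabilities of the sampled actions stored during the
# episode, the policy loss of the pseudocode is:
#   loss = -(torch.stack(log_probs) * compute_returns(rewards)).sum()
# Minimizing this loss with Adam (learning rate 5e-3) performs gradient ascent
# on the expected return.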
To learn more about REINFORCE, you can refer to this unit.