diff --git a/README.md b/README.md
index a2a41cc66d72572d0281a576ed5b76e8b02bd306..233cf416fcc7b34b5c5e6b847b7346caf9776f9a 100644
--- a/README.md
+++ b/README.md
@@ -8,8 +8,14 @@ In this hands-on project, we will first implement a simple RL algorithm and appl
 
 ## To be handed in
 
-This work must be done individually. The expected output is a repository named `hands-on-rl` on https://gitlab.ec-lyon.fr. It must contain a `README.md` file that explains **briefly** the successive steps of the project. Throughout the subject, you will find a 🛠 symbol indicating that a specific production is expected.
-The last commit is due before 11:59 pm on Monday, February 13, 2023. Subsequent commits will not be considered.
+This work must be done individually. The expected output is a repository named `hands-on-rl` on https://gitlab.ec-lyon.fr.
+
+We assume that `git` is installed and that you are familiar with basic `git` commands. (Optionally, you can use GitHub Desktop.)
+We also assume that you have access to the [ECL GitLab](https://gitlab.ec-lyon.fr/). If necessary, please consult [this tutorial](https://gitlab.ec-lyon.fr/edelland/inf_tc2/-/blob/main/Tutoriel_gitlab/tutoriel_gitlab.md).
+
+Your repository must contain a `README.md` file that explains **briefly** the successive steps of the project. Throughout the subject, you will find a 🛠 symbol indicating that a specific production is expected.
+
+The last commit is due before 11:59 pm on March 5, 2024. Subsequent commits will not be considered.
 
 > ⚠️ **Warning**
 > Ensure that you only commit the files that are requested. For example, your directory should not contain the generated `.zip` files, nor the `runs` folder... At the end, your repository must contain one `README.md`, three python scripts, and optionally image files for the plots.
@@ -20,29 +26,43 @@ Make sure you know the basics of Reinforcement Learning. In case of need, you ca
 
 ## Introduction to Gym
 
-Gym is a framework for developing and evaluating reinforcement learning environments. It offers various environments, including classic control and toy text scenarios, to test RL algorithms.
+[Gym](https://gymnasium.farama.org/) is a framework for developing and evaluating reinforcement learning environments. It offers various environments, including classic control and toy text scenarios, to test RL algorithms.
 
 ### Installation
 
+We recommend using Python virtual environments to install the required modules: https://docs.python.org/3/library/venv.html
+
 ```sh
-pip install gym==0.21
+pip install gym==0.26.2
 ```
 
 Install also pyglet for the rendering.
 
 ```sh
-pip install pyglet==1.5.27
+pip install pyglet==2.0.10
+```
+
+If needed:
+
+```sh
+pip install pygame==2.5.2
 ```
+
+```sh
+pip install PyQt5
+```
 
 ### Usage
 
-Here is an example of how to use Gym to solve the `CartPole-v1` environment:
+Here is an example of how to use Gym to solve the `CartPole-v1` environment ([documentation](https://gymnasium.farama.org/environments/classic_control/cart_pole/)):
 
 ```python
 import gym
 
 # Create the environment
-env = gym.make("CartPole-v1")
+env = gym.make("CartPole-v1", render_mode="human")
 
 # Reset the environment and get the initial observation
 observation = env.reset()
@@ -50,12 +70,17 @@ observation = env.reset()
 for _ in range(100):
     # Select a random action from the action space
     action = env.action_space.sample()
-    # Apply the action to the environment
+    # Apply the action to the environment
     # Returns next observation, reward, done signal (indicating
     # if the episode has ended), and an additional info dictionary
-    observation, reward, done, info = env.step(action)
+    observation, reward, terminated, truncated, info = env.step(action)
     # Render the environment to visualize the agent's behavior
-    env.render()
+    env.render()
+    if terminated:
+        # Terminated before max step
+        break
+
+env.close()
 ```
 
 ## REINFORCE
@@ -80,7 +105,7 @@ Repeat 500 times:
         Update the policy using an Adam optimizer and a learning rate of 5e-3
 ```
 
-To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/blog/deep-rl-pg).
+To learn more about REINFORCE, you can refer to [this unit](https://huggingface.co/learn/deep-rl-course/unit4/introduction).
 
 > 🛠 **To be handed in**
 > Use PyTorch to implement REINFORCE and solve the CartPole environment. Share the code in `reinforce_cartpole.py`, and share a plot showing the total reward across episodes in the `README.md`.
@@ -97,6 +122,7 @@ Stable-Baselines3 (SB3) is a high-level RL library that provides various algorit
 
 ```sh
 pip install stable-baselines3
+pip install moviepy
 ```
 
 #### Usage
@@ -114,12 +140,12 @@ Hugging Face Hub is a platform for easy sharing and versioning of trained machin
 
 #### Installation of `huggingface_sb3`
 ```sh
-pip install huggingface_sb3
+pip install huggingface-sb3==2.3.1
 ```
 
 #### Upload the model on the Hub
 
-Follow the [Hugging Face Hub documentation](https://huggingface.co/docs/hub/index) to upload the previously learned model to the Hub.
+Follow the [Hugging Face Hub documentation](https://huggingface.co/docs/hub/stable-baselines3) to upload the previously learned model to the Hub.
 
 > 🛠 **To be handed in**
 > Link the trained model in the `README.md` file.
@@ -139,7 +165,7 @@ You'll need to install both `wandb` and `tensorboard`.
 ```sh
 pip install wandb tensorboard
 ```
 
-Use the documentation of Stable-Baselines3 and [Weights & Biases](https://docs.wandb.ai) to track the CartPole training. Make the run public.
+Use the documentation of [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) and [Weights & Biases](https://docs.wandb.ai/guides/integrations/stable-baselines-3) to track the CartPole training. Make the run public.
 
 🛠 Share the link of the wandb run in the `README.md` file.
@@ -153,7 +179,7 @@ Panda-gym is a collection of environments for robotic simulation and control. It
 
 #### Installation
 ```shell
-pip install panda_gym==2.0.0
+pip install panda-gym==3.0.7
 ```
 
 #### Train, track, and share
@@ -170,6 +196,7 @@ This tutorial may contain errors, inaccuracies, typos or areas for improvement.
 
 ## Author
 
 Quentin Gallouédec
+Updates by Léo Schneider, Emmanuel Dellandréa
 
 ## License
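The REINFORCE update changed in the patch above combines discounted returns with the log-probabilities of the chosen actions. As a minimal, dependency-free sketch of those two quantities (the helper names and toy reward values are illustrative assumptions, not part of the tutorial; a real `reinforce_cartpole.py` would use PyTorch tensors so the loss can be backpropagated):

```python
# Sketch (assumption, not part of the patch): the two core quantities
# of the REINFORCE update, written with plain Python lists.

def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):  # accumulate from the last step backwards
        g = r + gamma * g
        returns.append(g)
    returns.reverse()  # restore chronological order
    return returns

def reinforce_loss(log_probs, returns):
    """Policy-gradient loss: minus the sum of log pi(a_t|s_t) * G_t."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))

# Toy episode: three steps with reward 1, discount 0.5
print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

In the actual script, `log_probs` would come from the policy network and `reinforce_loss` would be minimized with the Adam optimizer (learning rate 5e-3) mentioned in the pseudocode.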