diff --git a/README.md b/README.md
index a2a41cc66d72572d0281a576ed5b76e8b02bd306..f6050f1de36c34426aacadb885873b5c8720b3af 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ In this hands-on project, we will first implement a simple RL algorithm and appl
 ## To be handed in
 
 This work must be done individually. The expected output is a repository named `hands-on-rl` on https://gitlab.ec-lyon.fr. It must contain a `README.md` file that explains **briefly** the successive steps of the project. Throughout the subject, you will find a 🛠 symbol indicating that a specific production is expected.
-The last commit is due before 11:59 pm on Monday, February 13, 2023. Subsequent commits will not be considered.
+The last commit is due before 11:59 pm on February 20, 2024. Subsequent commits will not be considered.
 
 > ⚠️ **Warning**
 > Ensure that you only commit the files that are requested. For example, your directory should not contain the generated `.zip` files, nor the `runs` folder... At the end, your repository must contain one `README.md`, three python scripts, and optionally image files for the plots.
@@ -25,13 +25,19 @@ Gym is a framework for developing and evaluating reinforcement learning environm
 ### Installation
 
 ```sh
-pip install gym==0.21
+pip install gym==0.26.2
 ```
 
 Install also pyglet for the rendering.
 
 ```sh
-pip install pyglet==1.5.27
+pip install pyglet==2.0.10
+```
+
+If needed, also install pygame:
+
+```sh
+pip install pygame==2.5.2
 ```
 
 ### Usage
@@ -42,7 +48,7 @@ Here is an example of how to use Gym to solve the `CartPole-v1` environment:
 import gym
 
 # Create the environment
-env = gym.make("CartPole-v1")
+env = gym.make("CartPole-v1", render_mode="human")
 
-# Reset the environment and get the initial observation
-observation = env.reset()
+# Reset the environment and get the initial observation and info dictionary
+observation, info = env.reset()
@@ -50,12 +56,17 @@ observation = env.reset()
 for _ in range(100):
     # Select a random action from the action space
     action = env.action_space.sample()
     # Apply the action to the environment
-    # Returns next observation, reward, done signal (indicating
-    # if the episode has ended), and an additional info dictionary
-    observation, reward, done, info = env.step(action)
+    # Returns the next observation, the reward, two booleans indicating whether
+    # the episode has terminated or been truncated, and an additional info dictionary
+    observation, reward, terminated, truncated, info = env.step(action)
     # Render the environment to visualize the agent's behavior
     env.render()
+    if terminated or truncated:
+        # Stop as soon as the episode has ended
+        break
+
+env.close()
 ```
 
 ## REINFORCE
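
Note on the `gym==0.26` API used above: `env.reset()` now returns an `(observation, info)` pair, and `env.step()` replaces the single `done` flag with two booleans, `terminated` and `truncated`. Below is a minimal sketch of how code written against the pre-0.26 interface can be adapted; the seed value and the reset-on-episode-end pattern are illustrative, not part of the project requirements:

```python
import gym

env = gym.make("CartPole-v1")

# gym >= 0.26: reset() returns (observation, info) and accepts a seed
observation, info = env.reset(seed=42)

for _ in range(100):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Recover the old-style boolean for code that still expects `done`
    done = terminated or truncated
    if done:
        # Start a new episode instead of stepping a finished one
        observation, info = env.reset()

env.close()
```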