## To be handed in
This work must be done individually. The expected output is a repository named `hands-on-rl` on https://gitlab.ec-lyon.fr. It must contain a `README.md` file that explains **briefly** the successive steps of the project. Throughout the subject, you will find a 🛠 symbol indicating that a specific production is expected.
The last commit is due before 11:59 pm on February 20, 2024. Subsequent commits will not be considered.
> ⚠️ **Warning**
> Ensure that you only commit the files that are requested. For example, your directory should not contain the generated `.zip` files, nor the `runs` folder... At the end, your repository must contain one `README.md`, three python scripts, and optionally image files for the plots.
Gym is a framework for developing and evaluating reinforcement learning environments.
### Installation
```sh
pip install gym==0.26.2
```
Also install pyglet for rendering.
```sh
pip install pyglet==2.0.10
```
If needed, also install pygame:
```sh
pip install pygame==2.5.2
```
### Usage
Here is an example of how to use Gym to solve the `CartPole-v1` environment (https://gymnasium.farama.org/environments/classic_control/cart_pole/):
```python
import gym
# Create the environment
env = gym.make("CartPole-v1", render_mode="human")
# Reset the environment and get the initial observation
observation, info = env.reset()
for _ in range(100):
    # Select a random action from the action space
    action = env.action_space.sample()

    # Apply the action to the environment
    # Returns next observation, reward, terminated and truncated flags
    # (indicating whether the episode has ended), and an info dictionary
    observation, reward, terminated, truncated, info = env.step(action)

    # Render the environment to visualize the agent's behavior
    env.render()

    if terminated or truncated:
        # The episode ended before reaching the step limit
        break

env.close()
```
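The key API change in gym 0.26 is that `env.reset()` now returns `(observation, info)` and `env.step()` returns five values, splitting the old `done` flag into `terminated` (the task itself ended, e.g. the pole fell) and `truncated` (a time limit cut the episode short). This can be illustrated without Gym itself; a minimal sketch with a hypothetical `ToyEnv` (not part of the assignment) that mimics the interface:

```python
class ToyEnv:
    """Toy environment mimicking the gym>=0.26 API (illustration only)."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, seed=None):
        # New API: reset returns (observation, info)
        self.steps = 0
        return 0.0, {}

    def step(self, action):
        # New API: step returns (obs, reward, terminated, truncated, info)
        self.steps += 1
        terminated = False  # task-defined end (e.g. the pole fell) never fires here
        truncated = self.steps >= self.max_steps  # time-limit end
        return float(self.steps), 1.0, terminated, truncated, {}


env = ToyEnv()
obs, info = env.reset()
total = 0.0
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(None)
    total += reward
    done = terminated or truncated
print(total)  # 5.0: one reward per step until truncation
```

With real Gym environments, the same `terminated or truncated` check is what decides when an episode loop should stop.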
## REINFORCE