"Requirement already satisfied: numpy>=1.18.0 in /usr/local/lib/python3.10/dist-packages (from gym==0.26.2) (1.23.5)\n",
"Requirement already satisfied: cloudpickle>=1.2.0 in /usr/local/lib/python3.10/dist-packages (from gym==0.26.2) (2.2.1)\n",
"Requirement already satisfied: gym-notices>=0.0.4 in /usr/local/lib/python3.10/dist-packages (from gym==0.26.2) (0.0.8)\n",
"Building wheels for collected packages: gym\n",
" Building wheel for gym (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for gym: filename=gym-0.26.2-py3-none-any.whl size=827620 sha256=c0e52d00d327e4107d9d7487856d18889281e05f722af341747342271291b415\n",
" Stored in directory: /root/.cache/pip/wheels/b9/22/6d/3e7b32d98451b4cd9d12417052affbeeeea012955d437da1da\n",
"Successfully built gym\n",
"Installing collected packages: gym\n",
" Attempting uninstall: gym\n",
" Found existing installation: gym 0.25.2\n",
" Uninstalling gym-0.25.2:\n",
" Successfully uninstalled gym-0.25.2\n",
"\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"dopamine-rl 4.0.6 requires gym<=0.25.2, but you have gym 0.26.2 which is incompatible.\u001b[0m\u001b[31m\n",
"The REINFORCE algorithm (also known as Vanilla Policy Gradient) is a policy gradient method that optimizes the policy directly using gradient descent. The following is the pseudocode of the REINFORCE algorithm:"
],
"metadata": {
"id": "4w5SGUKBozft"
}
},
{
"cell_type": "markdown",
"source": [
"**Setup the CartPole environment**\n",
"\n",
"**Setup the agent as a simple neural network with:**\n",
" \n",
"- One fully connected layer with 128 units and ReLU activation followed by a dropout layer\n",
"\n",
"- One fully connected layer followed by softmax activation\n",
"\n",
"**Repeat 500 times:**\n",
"\n",
" Reset the environment\n",
" \n",
" Reset the buffer\n",
" \n",
" Repeat until the end of the episode:\n",
" \n",
" - Compute action probabilities\n",
" \n",
" - Sample the action based on the probabilities and\n",
" store its probability in the buffer\n",
" \n",
" - Step the environment with the action\n",
" \n",
" - Compute and store in the buffer the return using gamma=0.99\n",
" \n",
" Normalize the return\n",
" \n",
" Compute the policy loss as -sum(log(prob) * return)\n",
" \n",
" Update the policy using an Adam optimizer and a learning rate of 5e-3"
],
"metadata": {
"id": "ibubEQw2pPfn"
}
},
{
"cell_type": "code",
"source": [
"import gym, pygame, numpy as np, matplotlib.pyplot as plt\n",
"import torch, torch.nn as nn, torch.nn.functional as F, torch.optim as optim\n",
"We can see that even if the average reward per episode increases, it does not reach the max reward value, and still oscillates much after 500 episodes."
],
"metadata": {
"id": "pBWdzYVTin8v"
}
},
{
"cell_type": "markdown",
"source": [
"**Familiarization with a complete RL pipeline: Application to training a robotic arm**\n",
"\n",
"In this section, you will use the Stable-Baselines3 package to train a robotic arm using RL. You'll get familiar with several widely-used tools for training, monitoring and sharing machine learning models."
"/usr/local/lib/python3.10/dist-packages/notebook/utils.py:280: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.\n",
"evalue": "It appears that you do not have permission to access the requested resource. Please reach out to the project owner to grant you access. If you have the correct permissions, verify that there are no issues with your networking setup.(Error 404: Not Found)",
"\u001b[0;31mCommError\u001b[0m: It appears that you do not have permission to access the requested resource. Please reach out to the project owner to grant you access. If you have the correct permissions, verify that there are no issues with your networking setup.(Error 404: Not Found)"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"I am struggling to access my Hugging Face account even though I generated tokens... as I cannot understand what is happening I will stop there :'("
],
"metadata": {
"id": "_fV955Vjvett"
}
}
]
}
\ No newline at end of file
%% Cell type:code id: tags:
```
!pip install torch
```
%% Output
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (2.1.0+cu121)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch) (3.13.1)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch) (4.5.0)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch) (3.2.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch) (3.1.3)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch) (2023.6.0)
Requirement already satisfied: triton==2.1.0 in /usr/local/lib/python3.10/dist-packages (from torch) (2.1.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch) (2.1.4)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch) (1.3.0)
%% Cell type:code id: tags:
```
!pip install gym==0.26.2
!pip install pyglet==2.0.10
!pip install pygame==2.5.2
!pip install PyQt5
```
%% Output
Collecting gym==0.26.2
Downloading gym-0.26.2.tar.gz (721 kB)
[2K [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m721.7/721.7 kB[0m [31m4.1 MB/s[0m eta [36m0:00:00[0m
Requirement already satisfied: numpy>=1.18.0 in /usr/local/lib/python3.10/dist-packages (from gym==0.26.2) (1.23.5)
Requirement already satisfied: cloudpickle>=1.2.0 in /usr/local/lib/python3.10/dist-packages (from gym==0.26.2) (2.2.1)
Requirement already satisfied: gym-notices>=0.0.4 in /usr/local/lib/python3.10/dist-packages (from gym==0.26.2) (0.0.8)
Building wheels for collected packages: gym
Building wheel for gym (pyproject.toml) ... [?25l[?25hdone
Created wheel for gym: filename=gym-0.26.2-py3-none-any.whl size=827620 sha256=c0e52d00d327e4107d9d7487856d18889281e05f722af341747342271291b415
Stored in directory: /root/.cache/pip/wheels/b9/22/6d/3e7b32d98451b4cd9d12417052affbeeeea012955d437da1da
Successfully built gym
Installing collected packages: gym
Attempting uninstall: gym
Found existing installation: gym 0.25.2
Uninstalling gym-0.25.2:
Successfully uninstalled gym-0.25.2
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dopamine-rl 4.0.6 requires gym<=0.25.2, but you have gym 0.26.2 which is incompatible.[0m[31m
The REINFORCE algorithm (also known as Vanilla Policy Gradient) is a policy gradient method that optimizes the policy directly using gradient descent. The following is the pseudocode of the REINFORCE algorithm:
%% Cell type:markdown id: tags:
**Setup the CartPole environment**
**Setup the agent as a simple neural network with:**
- One fully connected layer with 128 units and ReLU activation followed by a dropout layer
- One fully connected layer followed by softmax activation
**Repeat 500 times:**
Reset the environment
Reset the buffer
Repeat until the end of the episode:
- Compute action probabilities
- Sample the action based on the probabilities and
store its probability in the buffer
- Step the environment with the action
- Compute and store in the buffer the return using gamma=0.99
Normalize the return
Compute the policy loss as -sum(log(prob) * return)
Update the policy using an Adam optimizer and a learning rate of 5e-3
%% Cell type:code id: tags:
```
import gym, pygame, numpy as np, matplotlib.pyplot as plt
import torch, torch.nn as nn, torch.nn.functional as F, torch.optim as optim
We can see that even if the average reward per episode increases, it does not reach the max reward value, and still oscillates much after 500 episodes.
%% Cell type:markdown id: tags:
**Familiarization with a complete RL pipeline: Application to training a robotic arm**
In this section, you will use the Stable-Baselines3 package to train a robotic arm using RL. You'll get familiar with several widely-used tools for training, monitoring and sharing machine learning models.
%% Cell type:code id: tags:
```
!pip install stable-baselines3
!pip install moviepy
!pip install huggingface-sb3==2.3.1
!pip install wandb
```
%% Output
Requirement already satisfied: stable-baselines3 in /usr/local/lib/python3.10/dist-packages (2.2.1)
Requirement already satisfied: gymnasium<0.30,>=0.28.1 in /usr/local/lib/python3.10/dist-packages (from stable-baselines3) (0.29.1)
Requirement already satisfied: numpy>=1.20 in /usr/local/lib/python3.10/dist-packages (from stable-baselines3) (1.25.2)
Requirement already satisfied: torch>=1.13 in /usr/local/lib/python3.10/dist-packages (from stable-baselines3) (2.1.0+cu121)
Requirement already satisfied: cloudpickle in /usr/local/lib/python3.10/dist-packages (from stable-baselines3) (2.2.1)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from stable-baselines3) (1.5.3)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (from stable-baselines3) (3.7.1)
Requirement already satisfied: typing-extensions>=4.3.0 in /usr/local/lib/python3.10/dist-packages (from gymnasium<0.30,>=0.28.1->stable-baselines3) (4.10.0)
Requirement already satisfied: farama-notifications>=0.0.1 in /usr/local/lib/python3.10/dist-packages (from gymnasium<0.30,>=0.28.1->stable-baselines3) (0.0.4)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=1.13->stable-baselines3) (3.13.1)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.13->stable-baselines3) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.13->stable-baselines3) (3.2.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.13->stable-baselines3) (3.1.3)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch>=1.13->stable-baselines3) (2023.6.0)
Requirement already satisfied: triton==2.1.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.13->stable-baselines3) (2.1.0)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->stable-baselines3) (1.2.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib->stable-baselines3) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->stable-baselines3) (4.49.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->stable-baselines3) (1.4.5)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->stable-baselines3) (23.2)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->stable-baselines3) (9.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->stable-baselines3) (3.1.1)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib->stable-baselines3) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->stable-baselines3) (2023.4)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.7->matplotlib->stable-baselines3) (1.16.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.13->stable-baselines3) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.13->stable-baselines3) (1.3.0)
Requirement already satisfied: moviepy in /usr/local/lib/python3.10/dist-packages (1.0.3)
Requirement already satisfied: decorator<5.0,>=4.0.2 in /usr/local/lib/python3.10/dist-packages (from moviepy) (4.4.2)
Requirement already satisfied: tqdm<5.0,>=4.11.2 in /usr/local/lib/python3.10/dist-packages (from moviepy) (4.66.2)
Requirement already satisfied: requests<3.0,>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from moviepy) (2.31.0)
Requirement already satisfied: proglog<=1.0.0 in /usr/local/lib/python3.10/dist-packages (from moviepy) (0.1.10)
Requirement already satisfied: numpy>=1.17.3 in /usr/local/lib/python3.10/dist-packages (from moviepy) (1.25.2)
Requirement already satisfied: imageio<3.0,>=2.5 in /usr/local/lib/python3.10/dist-packages (from moviepy) (2.31.6)
Requirement already satisfied: imageio-ffmpeg>=0.2.0 in /usr/local/lib/python3.10/dist-packages (from moviepy) (0.4.9)
Requirement already satisfied: pillow<10.1.0,>=8.3.2 in /usr/local/lib/python3.10/dist-packages (from imageio<3.0,>=2.5->moviepy) (9.4.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from imageio-ffmpeg>=0.2.0->moviepy) (67.7.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3.0,>=2.8.1->moviepy) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3.0,>=2.8.1->moviepy) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3.0,>=2.8.1->moviepy) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3.0,>=2.8.1->moviepy) (2024.2.2)
Requirement already satisfied: huggingface-sb3==2.3.1 in /usr/local/lib/python3.10/dist-packages (2.3.1)
Requirement already satisfied: huggingface-hub~=0.8 in /usr/local/lib/python3.10/dist-packages (from huggingface-sb3==2.3.1) (0.20.3)
Requirement already satisfied: pyyaml~=6.0 in /usr/local/lib/python3.10/dist-packages (from huggingface-sb3==2.3.1) (6.0.1)
Requirement already satisfied: wasabi in /usr/local/lib/python3.10/dist-packages (from huggingface-sb3==2.3.1) (1.1.2)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from huggingface-sb3==2.3.1) (1.25.2)
Requirement already satisfied: cloudpickle>=1.6 in /usr/local/lib/python3.10/dist-packages (from huggingface-sb3==2.3.1) (2.2.1)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from huggingface-hub~=0.8->huggingface-sb3==2.3.1) (3.13.1)
Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub~=0.8->huggingface-sb3==2.3.1) (2023.6.0)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from huggingface-hub~=0.8->huggingface-sb3==2.3.1) (2.31.0)
Requirement already satisfied: tqdm>=4.42.1 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub~=0.8->huggingface-sb3==2.3.1) (4.66.2)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub~=0.8->huggingface-sb3==2.3.1) (4.10.0)
Requirement already satisfied: packaging>=20.9 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub~=0.8->huggingface-sb3==2.3.1) (23.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub~=0.8->huggingface-sb3==2.3.1) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub~=0.8->huggingface-sb3==2.3.1) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub~=0.8->huggingface-sb3==2.3.1) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub~=0.8->huggingface-sb3==2.3.1) (2024.2.2)
/usr/local/lib/python3.10/dist-packages/notebook/utils.py:280: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
return LooseVersion(v) >= LooseVersion(check)
[34m[1mwandb[0m: (1) Private W&B dashboard, no account required
[34m[1mwandb[0m: (2) Use an existing W&B account
[34m[1mwandb[0m: Enter your choice: 1
[34m[1mwandb[0m: You chose 'Private W&B dashboard, no account required'
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
Problem at: <ipython-input-28-b66b1c5d0647> 35 <cell line: 35>
/usr/local/lib/python3.10/dist-packages/wandb/sdk/wandb_init.py in init(self)
783 backend.cleanup()
784 self.teardown()
--> 785 raise error
786
787 assert run_result is not None # for mypy
CommError: It appears that you do not have permission to access the requested resource. Please reach out to the project owner to grant you access. If you have the correct permissions, verify that there are no issues with your networking setup.(Error 404: Not Found)
%% Cell type:markdown id: tags:
I am struggling to access my Hugging Face account even though I generated tokens... as I cannot understand what is happening I will stop there :'(