HeberArteagaJ · 58dab060
--- a/TD2 Deep Learning.ipynb
+++ b/TD2 Deep Learning.ipynb
 %% Cell type:markdown id:7edf7168 tags:

 # TD2: Deep learning

 %% Cell type:markdown id:fbb8c8df tags:

 In this TD, you must modify this notebook to answer the questions. To do this:

 1. Fork this repository
 2. Clone your forked repository on your local computer
 3. Answer the questions
 4. Commit and push regularly

 The last commit is due on Wednesday, December 4, 11:59 PM. Later commits will not be taken into account.

 %% Cell type:markdown id:3d167a29 tags:

 Install and test PyTorch from https://pytorch.org/get-started/locally.

 %% Cell type:code id:330a42f5 tags:

 ``` python
 %pip install torch torchvision
 ```

+%% Output
+
+    Requirement already satisfied: torch in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (2.2.0)
+    Collecting torchvision
+      Downloading torchvision-0.20.1-cp311-cp311-macosx_11_0_arm64.whl.metadata (6.1 kB)
+    Requirement already satisfied: filelock in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from torch) (3.13.1)
+    Requirement already satisfied: typing-extensions>=4.8.0 in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from torch) (4.9.0)
+    Requirement already satisfied: sympy in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from torch) (1.12)
+    Requirement already satisfied: networkx in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from torch) (3.2.1)
+    Requirement already satisfied: jinja2 in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from torch) (3.1.3)
+    Requirement already satisfied: fsspec in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from torch) (2024.2.0)
+    Requirement already satisfied: numpy in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from torchvision) (1.26.3)
+    Collecting torch
+      Downloading torch-2.5.1-cp311-none-macosx_11_0_arm64.whl.metadata (28 kB)
+    Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from torchvision) (10.2.0)
+    Collecting sympy==1.13.1 (from torch)
+      Downloading sympy-1.13.1-py3-none-any.whl.metadata (12 kB)
+    Requirement already satisfied: mpmath<1.4,>=1.1.0 in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from sympy==1.13.1->torch) (1.3.0)
+    Requirement already satisfied: MarkupSafe>=2.0 in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from jinja2->torch) (2.1.5)
+    Downloading torchvision-0.20.1-cp311-cp311-macosx_11_0_arm64.whl (1.8 MB)
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 827.0 kB/s eta 0:00:00
+    Downloading torch-2.5.1-cp311-none-macosx_11_0_arm64.whl (63.9 MB)
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.9/63.9 MB 4.3 MB/s eta 0:00:00
+    Downloading sympy-1.13.1-py3-none-any.whl (6.2 MB)
+       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 4.7 MB/s eta 0:00:00
+    Installing collected packages: sympy, torch, torchvision
+      Attempting uninstall: sympy
+        Found existing installation: sympy 1.12
+        Uninstalling sympy-1.12:
+          Successfully uninstalled sympy-1.12
+      Attempting uninstall: torch
+        Found existing installation: torch 2.2.0
+        Uninstalling torch-2.2.0:
+          Successfully uninstalled torch-2.2.0
+    Successfully installed sympy-1.13.1 torch-2.5.1 torchvision-0.20.1
+    
+    [notice] A new release of pip is available: 24.0 -> 24.3.1
+    [notice] To update, run: pip install --upgrade pip
+    Note: you may need to restart the kernel to use updated packages.
+
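The pip log above shows torch being upgraded from 2.2.0 to 2.5.1 and ends with a reminder that the kernel may need a restart. As a minimal sketch, the installed versions can be read with the standard library alone. The helper name `installed_versions` is an assumption made for illustration, and the check reports what is installed in the environment, not what a running kernel has already imported.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if not installed}."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


report = installed_versions(["torch", "torchvision"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

After restarting the kernel, `torch.__version__` should agree with the version reported here.
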
 %% Cell type:markdown id:0882a636 tags:


 To test the installation, run the following code:

 %% Cell type:code id:b1950f0a tags:

 ``` python
 import torch

 N, D = 14, 10
 x = torch.randn(N, D).type(torch.FloatTensor)
 print(x)

 from torchvision import models

 alexnet = models.alexnet()
 print(alexnet)
 ```

+%% Output
+
+    tensor([[-0.4614,  0.2167,  1.3662,  0.5457,  2.7665,  0.8728, -0.1837,  0.0607,
+              1.5946, -0.7726],
+            [-0.8952,  0.7103, -0.7606,  0.9257, -0.1401,  0.5907,  0.7204,  1.3177,
+             -0.4342,  0.4527],
+            [ 0.7967,  0.1907, -0.5346,  1.4139, -0.5380, -2.1966,  0.4751,  1.4743,
+              1.2449, -0.8389],
+            [ 0.0833,  0.5977, -0.7399, -0.4702, -0.6887,  1.1328, -1.1584,  0.3544,
+              1.0611, -0.0325],
+            [ 0.5764, -0.5985, -1.0803, -0.7565, -1.0020,  1.7249, -0.6647,  0.7847,
+              1.7402,  0.8243],
+            [-0.9695,  0.5117,  1.9237,  1.7299,  1.0193,  0.3211, -0.5839,  0.5866,
+              1.0019, -0.2681],
+            [-0.4172, -2.3619, -1.1206, -0.7292,  0.9231, -0.3644,  0.6110,  1.3185,
+              1.2674, -1.5235],
+            [ 0.2213, -0.5554, -0.4785,  0.9106,  0.1333,  1.1237,  0.2859, -1.6737,
+             -0.8616, -2.5445],
+            [ 0.2351,  1.3325,  0.1848,  0.1473,  1.3133, -0.7523,  0.6736,  1.8610,
+             -0.1847,  1.0223],
+            [-0.6824, -0.0298, -0.1910,  1.4017, -1.9937,  0.4087,  0.0165,  1.7551,
+             -0.6690, -0.7425],
+            [-1.3005, -0.5498, -1.3494, -1.2090,  0.3210,  0.7386,  0.5926, -0.6941,
+             -0.1688, -0.6065],
+            [ 0.4044,  0.6994, -0.9141, -0.3529,  1.0734, -0.9639,  0.0657, -0.2253,
+              0.3391,  0.5039],
+            [-2.1911,  1.6130, -0.7344, -1.0796, -0.3465, -0.9285, -0.5405, -0.0072,
+             -0.1058, -1.7597],
+            [-1.4770,  0.3449,  0.6489,  1.7304, -0.0802, -0.0332, -0.2949,  0.2265,
+             -0.7456,  0.8549]])
+    AlexNet(
+      (features): Sequential(
+        (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
+        (1): ReLU(inplace=True)
+        (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
+        (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
+        (4): ReLU(inplace=True)
+        (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
+        (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
+        (7): ReLU(inplace=True)
+        (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
+        (9): ReLU(inplace=True)
+        (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
+        (11): ReLU(inplace=True)
+        (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
+      )
+      (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
+      (classifier): Sequential(
+        (0): Dropout(p=0.5, inplace=False)
+        (1): Linear(in_features=9216, out_features=4096, bias=True)
+        (2): ReLU(inplace=True)
+        (3): Dropout(p=0.5, inplace=False)
+        (4): Linear(in_features=4096, out_features=4096, bias=True)
+        (5): ReLU(inplace=True)
+        (6): Linear(in_features=4096, out_features=1000, bias=True)
+      )
+    )
+
 %% Cell type:markdown id:23f266da tags:

 ## Exercise 1: CNN on CIFAR10

 The goal is to apply a Convolutional Neural Network (CNN) to the CIFAR10 image dataset and measure its image classification accuracy. Compare this accuracy with that of the neural network implemented during TD1.

 Have a look at the following documentation to become familiar with PyTorch.

 https://pytorch.org/tutorials/beginner/pytorch_with_examples.html

 https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

 %% Cell type:markdown id:4ba1c82d tags:

 You can test whether a GPU is available on your machine and, if so, train on it to speed up the process.

 %% Cell type:code id:6e18f2fd tags:

 ``` python
 import torch

 # check if CUDA is available
 train_on_gpu = torch.cuda.is_available()

 if not train_on_gpu:
    print("CUDA is not available.  Training on CPU ...")
 else:
    print("CUDA is available!  Training on GPU ...")
 ```

+%% Output
+
+    CUDA is not available.  Training on CPU ...
+
+%% Cell type:code id:abb4553c tags:
+
+``` python
+if torch.backends.mps.is_available():
+    mps_device = torch.device("mps")
+    x = torch.ones(1, device=mps_device)
+    print(x)
+else:
+    print("MPS device not found.")
+```
+
+%% Output
+
+    tensor([1.], device='mps:0')
+
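On this machine the CUDA check reports CPU while the MPS check succeeds, yet the training cells only consult `train_on_gpu`, so training stays on the CPU. One possible way to cover both backends is to pick a single device once and reuse it. This is a sketch only: `pick_device` is a hypothetical helper, not part of the notebook.

```python
import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)  # absent in old PyTorch builds
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
x = torch.ones(1, device=device)
print(device, x)
```

With such a device object, `model.to(device)` and `data.to(device)` could stand in for the `.cuda()` calls used in the training loops.
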
 %% Cell type:markdown id:5cf214eb tags:

 Next we load the CIFAR10 dataset

+%% Cell type:code id:711b0b8e tags:
+
+``` python
+import numpy as np
+from torchvision import datasets, transforms
+from torch.utils.data.sampler import SubsetRandomSampler
+```
+
 %% Cell type:code id:462666a2 tags:

 ``` python
 import numpy as np
 from torchvision import datasets, transforms
 from torch.utils.data.sampler import SubsetRandomSampler

 # number of subprocesses to use for data loading
 num_workers = 0
 # how many samples per batch to load
 batch_size = 20
 # percentage of training set to use as validation
 valid_size = 0.2

 # convert data to a normalized torch.FloatTensor
 transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
 )

 # choose the training and test datasets
 train_data = datasets.CIFAR10("data", train=True, download=True, transform=transform)
 test_data = datasets.CIFAR10("data", train=False, download=True, transform=transform)

 # obtain training indices that will be used for validation
 num_train = len(train_data)
 indices = list(range(num_train))
 np.random.shuffle(indices)
 split = int(np.floor(valid_size * num_train))
 train_idx, valid_idx = indices[split:], indices[:split]

 # define samplers for obtaining training and validation batches
 train_sampler = SubsetRandomSampler(train_idx)
 valid_sampler = SubsetRandomSampler(valid_idx)

 # prepare data loaders (combine dataset and sampler)
 train_loader = torch.utils.data.DataLoader(
    train_data, batch_size=batch_size, sampler=train_sampler, num_workers=num_workers
 )
 valid_loader = torch.utils.data.DataLoader(
    train_data, batch_size=batch_size, sampler=valid_sampler, num_workers=num_workers
 )
 test_loader = torch.utils.data.DataLoader(
    test_data, batch_size=batch_size, num_workers=num_workers
 )

 # specify the image classes
 classes = [
    "airplane",
    "automobile",
    "bird",
    "cat",
    "deer",
    "dog",
    "frog",
    "horse",
    "ship",
    "truck",
 ]
 ```

+%% Output
+
+    Files already downloaded and verified
+    Files already downloaded and verified
+
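The loading cell shuffles the 50,000 training indices with `np.random.shuffle` without fixing a seed, so each run produces a different 80/20 train/validation split. A minimal sketch of a reproducible split follows, written with the standard library so that it runs on its own. The function name `split_indices` and the seed value are assumptions for illustration.

```python
import random


def split_indices(num_samples, valid_fraction, seed):
    """Shuffle 0..num_samples-1 reproducibly and split off a validation part."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    split = int(valid_fraction * num_samples)
    return indices[split:], indices[:split]  # (train, validation)


train_idx, valid_idx = split_indices(num_samples=50_000, valid_fraction=0.2, seed=0)
print(len(train_idx), len(valid_idx))
```

The two index lists can be passed to `SubsetRandomSampler` exactly as in the cell above. The samplers still draw batches in random order, but the membership of each subset stays fixed between runs.
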
 %% Cell type:markdown id:58ec3903 tags:

 CNN definition (this one is an example)

 %% Cell type:code id:317bf070 tags:

 ``` python
 import torch.nn as nn
 import torch.nn.functional as F

 # define the CNN architecture


 class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


 # create a complete CNN
 model = Net()
 print(model)
 # move tensors to GPU if CUDA is available
 if train_on_gpu:
    model.cuda()
 ```

+%% Output
+
+    Net(
+      (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
+      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
+      (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
+      (fc1): Linear(in_features=400, out_features=120, bias=True)
+      (fc2): Linear(in_features=120, out_features=84, bias=True)
+      (fc3): Linear(in_features=84, out_features=10, bias=True)
+    )
+
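The `in_features=400` of `fc1` comes from the spatial size of the feature maps after the two convolution and pooling stages. The arithmetic can be sketched as follows. The helper names `conv_out` and `pool_out` are introduced here for illustration and assume no dilation.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (floor division, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride=None):
    """Spatial size after max pooling; stride defaults to the kernel size."""
    stride = kernel if stride is None else stride
    return (size - kernel) // stride + 1


size = 32                                   # CIFAR10 images are 32x32
size = pool_out(conv_out(size, 5), 2)       # conv1 + pool: 32 -> 28 -> 14
size = pool_out(conv_out(size, 5), 2)       # conv2 + pool: 14 -> 10 -> 5
flat_features = 16 * size * size            # 16 channels leave conv2
print(size, flat_features)
```

The result, 16 * 5 * 5 = 400, matches the value passed to `nn.Linear` and used in `x.view(-1, 16 * 5 * 5)`.
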
 %% Cell type:markdown id:a2dc4974 tags:

 Loss function and training using SGD (Stochastic Gradient Descent) optimizer

 %% Cell type:code id:4b53f229 tags:

 ``` python
 import torch.optim as optim

 criterion = nn.CrossEntropyLoss()  # specify loss function
 optimizer = optim.SGD(model.parameters(), lr=0.01)  # specify optimizer

 n_epochs = 30  # number of epochs to train the model
 train_loss_list = []  # list to store loss to visualize
 valid_loss_min = np.inf  # track change in validation loss

 for epoch in range(n_epochs):
    # Keep track of training and validation loss
    train_loss = 0.0
    valid_loss = 0.0

    # Train the model
    model.train()
    for data, target in train_loader:
        # Move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # Clear the gradients of all optimized variables
        optimizer.zero_grad()
        # Forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # Calculate the batch loss
        loss = criterion(output, target)
        # Backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # Perform a single optimization step (parameter update)
        optimizer.step()
        # Update training loss
        train_loss += loss.item() * data.size(0)

    # Validate the model
    model.eval()
    for data, target in valid_loader:
        # Move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # Forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # Calculate the batch loss
        loss = criterion(output, target)
        # Update average validation loss
        valid_loss += loss.item() * data.size(0)

    # Calculate average losses
    train_loss = train_loss / len(train_loader)
    valid_loss = valid_loss / len(valid_loader)
    train_loss_list.append(train_loss)

    # Print training/validation statistics
    print(
        "Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}".format(
            epoch, train_loss, valid_loss
        )
    )

    # Save model if validation loss has decreased
    if valid_loss <= valid_loss_min:
        print(
            "Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...".format(
                valid_loss_min, valid_loss
            )
        )
        torch.save(model.state_dict(), "model_cifar.pt")
        valid_loss_min = valid_loss
 ```

+%% Output
+
+    Epoch: 0 	Training Loss: 28.707199 	Validation Loss: 28.363214
+    Validation loss decreased (inf --> 28.363214).  Saving model ...
+    Epoch: 1 	Training Loss: 27.053440 	Validation Loss: 26.921309
+    Validation loss decreased (28.363214 --> 26.921309).  Saving model ...
+    Epoch: 2 	Training Loss: 25.798181 	Validation Loss: 25.484369
+    Validation loss decreased (26.921309 --> 25.484369).  Saving model ...
+    Epoch: 3 	Training Loss: 24.616021 	Validation Loss: 25.825257
+    Epoch: 4 	Training Loss: 23.607140 	Validation Loss: 24.406983
+    Validation loss decreased (25.484369 --> 24.406983).  Saving model ...
+    Epoch: 5 	Training Loss: 22.641223 	Validation Loss: 23.463277
+    Validation loss decreased (24.406983 --> 23.463277).  Saving model ...
+    Epoch: 6 	Training Loss: 21.727461 	Validation Loss: 23.323754
+    Validation loss decreased (23.463277 --> 23.323754).  Saving model ...
+    Epoch: 7 	Training Loss: 20.908013 	Validation Loss: 22.815489
+    Validation loss decreased (23.323754 --> 22.815489).  Saving model ...
+    Epoch: 8 	Training Loss: 20.072570 	Validation Loss: 22.468899
+    Validation loss decreased (22.815489 --> 22.468899).  Saving model ...
+    Epoch: 9 	Training Loss: 19.337123 	Validation Loss: 23.307148
+    Epoch: 10 	Training Loss: 18.578279 	Validation Loss: 22.322720
+    Validation loss decreased (22.468899 --> 22.322720).  Saving model ...
+    Epoch: 11 	Training Loss: 17.925301 	Validation Loss: 22.491466
+    Epoch: 12 	Training Loss: 17.266396 	Validation Loss: 22.145613
+    Validation loss decreased (22.322720 --> 22.145613).  Saving model ...
+    Epoch: 13 	Training Loss: 16.644972 	Validation Loss: 21.923327
+    Validation loss decreased (22.145613 --> 21.923327).  Saving model ...
+    Epoch: 14 	Training Loss: 16.097757 	Validation Loss: 22.242258
+    Epoch: 15 	Training Loss: 15.522903 	Validation Loss: 22.269535
+    Epoch: 16 	Training Loss: 14.930308 	Validation Loss: 23.073589
+    Epoch: 17 	Training Loss: 14.374154 	Validation Loss: 23.190186
+    Epoch: 18 	Training Loss: 13.829007 	Validation Loss: 23.638800
+    Epoch: 19 	Training Loss: 13.414001 	Validation Loss: 25.147587
+    Epoch: 20 	Training Loss: 12.890743 	Validation Loss: 24.385583
+    Epoch: 21 	Training Loss: 12.456227 	Validation Loss: 24.933902
+    Epoch: 22 	Training Loss: 11.993389 	Validation Loss: 25.289021
+    Epoch: 23 	Training Loss: 11.565563 	Validation Loss: 26.004760
+    Epoch: 24 	Training Loss: 11.188692 	Validation Loss: 26.451757
+    Epoch: 25 	Training Loss: 10.716678 	Validation Loss: 27.236794
+    Epoch: 26 	Training Loss: 10.315807 	Validation Loss: 27.493770
+    Epoch: 27 	Training Loss: 9.975283 	Validation Loss: 27.571290
+    Epoch: 28 	Training Loss: 9.440035 	Validation Loss: 29.006522
+    Epoch: 29 	Training Loss: 9.220511 	Validation Loss: 29.190469
+
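The magnitude of the logged losses deserves a remark. The loop accumulates `loss.item() * data.size(0)`, which is a sum over samples, but then divides by `len(train_loader)`, which is the number of batches. The printed values are therefore the mean per-sample loss multiplied by the batch size. Because 40,000 training and 10,000 validation samples are both multiples of 20, the conversion is an exact division here. A small sketch of the arithmetic, using the epoch 0 values from the log:

```python
import math

batch_size = 20
logged = {"train": 28.707199, "valid": 28.363214}  # epoch 0 values from the log above

# Undo the implicit multiplication by the batch size
per_sample = {name: value / batch_size for name, value in logged.items()}
chance_level = math.log(10)  # cross-entropy of a uniform guess over 10 classes

for name, value in per_sample.items():
    print(f"{name}: {value:.4f} per sample (chance level {chance_level:.4f})")
```

Dividing the accumulated sum by `len(train_loader.sampler)` instead of `len(train_loader)` would yield the per-sample mean directly. The relative comparison between epochs is unaffected either way.
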
 %% Cell type:markdown id:13e1df74 tags:

 Does overfitting occur? If so, apply early stopping.

+%% Cell type:markdown id:4e567158 tags:
+
+Yes, overfitting occurs. In the log above, the validation loss reaches its minimum of 21.923327 at epoch 13; from epoch 14 onward it oscillates and then rises steadily, while the training loss keeps decreasing.
+This indicates that the model is fitting the training data too closely and failing to generalize to the validation data.
+With early stopping, training should therefore end shortly after epoch 13. Continuing beyond this point does not improve validation performance and only increases overfitting.
+
+%% Cell type:code id:11952c52 tags:
+
+``` python
+# EARLY STOP
+import torch.optim as optim
+
+min_epochs = 10
+patience = 3 # Nb of epochs to wait after no improvement
+epochs_no_improve = 0
+
+
+criterion = nn.CrossEntropyLoss()  # specify loss function
+optimizer = optim.SGD(model.parameters(), lr=0.01)  # specify optimizer
+
+n_epochs = 30  # number of epochs to train the model
+valid_loss_list = []  # list to store validation loss to visualize
+train_loss_list = []  # list to store training loss to visualize
+valid_loss_min = np.inf  # track change in validation loss
+
+for epoch in range(n_epochs):
+    # Keep track of training and validation loss
+    train_loss = 0.0
+    valid_loss = 0.0
+
+    # Train the model
+    model.train()
+    for data, target in train_loader:
+        # Move tensors to GPU if CUDA is available
+        if train_on_gpu:
+            data, target = data.cuda(), target.cuda()
+        # Clear the gradients of all optimized variables
+        optimizer.zero_grad()
+        # Forward pass: compute predicted outputs by passing inputs to the model
+        output = model(data)
+        # Calculate the batch loss
+        loss = criterion(output, target)
+        # Backward pass: compute gradient of the loss with respect to model parameters
+        loss.backward()
+        # Perform a single optimization step (parameter update)
+        optimizer.step()
+        # Update training loss
+        train_loss += loss.item() * data.size(0)
+
+    # Validate the model
+    model.eval()
+    for data, target in valid_loader:
+        # Move tensors to GPU if CUDA is available
+        if train_on_gpu:
+            data, target = data.cuda(), target.cuda()
+        # Forward pass: compute predicted outputs by passing inputs to the model
+        output = model(data)
+        # Calculate the batch loss
+        loss = criterion(output, target)
+        # Update average validation loss
+        valid_loss += loss.item() * data.size(0)
+
+    # Calculate average losses
+    train_loss = train_loss / len(train_loader)
+    valid_loss = valid_loss / len(valid_loader)
+    train_loss_list.append(train_loss)
+    valid_loss_list.append(valid_loss)
+
+    # Print training/validation statistics
+    print(
+        "Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}".format(
+            epoch, train_loss, valid_loss
+        )
+    )
+
+    # Save model if validation loss has decreased
+    if valid_loss <= valid_loss_min:
+        print(
+            "Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...".format(
+                valid_loss_min, valid_loss
+            )
+        )
+        torch.save(model.state_dict(), "model_cifar_1_early_stop.pt")
+        valid_loss_min = valid_loss
+        epochs_no_improve = 0
+    elif epoch >= min_epochs:
+        epochs_no_improve += 1
+        if epochs_no_improve >= patience:
+            print(f"Validation loss did not improve for {patience} consecutive epochs. Stopping early.")
+            break
+```
+
+%% Output
+
+    Epoch: 0 	Training Loss: 8.891932 	Validation Loss: 30.875338
+    Validation loss decreased (inf --> 30.875338).  Saving model ...
+
+    ---------------------------------------------------------------------------
+    KeyboardInterrupt                         Traceback (most recent call last)
+    Cell In[35], line 35
+         33 loss = criterion(output, target)
+         34 # Backward pass: compute gradient of the loss with respect to model parameters
+    ---> 35 loss.backward()
+         36 # Perform a single optimization step (parameter update)
+         37 optimizer.step()
+    File ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/_tensor.py:581, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
+        571 if has_torch_function_unary(self):
+        572     return handle_torch_function(
+        573         Tensor.backward,
+        574         (self,),
+       (...)
+        579         inputs=inputs,
+        580     )
+    --> 581 torch.autograd.backward(
+        582     self, gradient, retain_graph, create_graph, inputs=inputs
+        583 )
+    File ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/autograd/__init__.py:347, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
+        342     retain_graph = create_graph
+        344 # The reason we repeat the same comment below is that
+        345 # some Python versions print out the first line of a multi-line function
+        346 # calls in the traceback and some print out the last line
+    --> 347 _engine_run_backward(
+        348     tensors,
+        349     grad_tensors_,
+        350     retain_graph,
+        351     create_graph,
+        352     inputs,
+        353     allow_unreachable=True,
+        354     accumulate_grad=True,
+        355 )
+    File ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/autograd/graph.py:825, in _engine_run_backward(t_outputs, *args, **kwargs)
+        823     unregister_hooks = _register_logging_hooks_on_whole_graph(t_outputs)
+        824 try:
+    --> 825     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
+        826         t_outputs, *args, **kwargs
+        827     )  # Calls into the C++ engine to run the backward pass
+        828 finally:
+        829     if attach_logging_hooks:
+    KeyboardInterrupt:
+
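Since the early-stopping run above was interrupted, its stopping rule (`patience = 3`, `min_epochs = 10`) can be checked offline by replaying it on the validation losses logged during the earlier 30-epoch run. The sketch below mirrors the notebook's `if`/`elif` logic. The function name `replay_early_stopping` is an assumption for illustration.

```python
def replay_early_stopping(valid_losses, patience=3, min_epochs=10):
    """Replay the notebook's stopping rule on a list of validation losses.

    Returns (stop_epoch, best_epoch); stop_epoch is None if training
    would have run to the end of the list.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_no_improve = 0
    for epoch, loss in enumerate(valid_losses):
        if loss <= best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_no_improve = 0
        elif epoch >= min_epochs:
            epochs_no_improve += 1
            if epochs_no_improve >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Validation losses for epochs 0-17 of the 30-epoch run logged earlier
logged_valid = [
    28.363214, 26.921309, 25.484369, 25.825257, 24.406983, 23.463277,
    23.323754, 22.815489, 22.468899, 23.307148, 22.322720, 22.491466,
    22.145613, 21.923327, 22.242258, 22.269535, 23.073589, 23.190186,
]
stop_epoch, best_epoch = replay_early_stopping(logged_valid)
print(stop_epoch, best_epoch)
```

On that log the rule would stop at epoch 16 and keep the checkpoint from epoch 13. Note that the early-stopping cell above appears to continue from the already trained `model`: its first logged training loss, 8.891932, is below the 9.220511 that ended the previous run. A fresh `Net()` would be needed to reproduce a stop at this point.
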
 %% Cell type:code id:d39df818 tags:

 ``` python
 import matplotlib.pyplot as plt

-plt.plot(range(n_epochs), train_loss_list)
+plt.plot(range(len(train_loss_list)), train_loss_list)
+plt.xlabel("Epoch")
+plt.ylabel("Train Loss")
+plt.title("Performance of Model 1")
+plt.show()
+```
+
+%% Output
+
+
+
+%% Cell type:code id:2111dfe9 tags:
+
+``` python
+import matplotlib.pyplot as plt
+
+plt.plot(range(len(valid_loss_list)), valid_loss_list)
 plt.xlabel("Epoch")
-plt.ylabel("Loss")
+plt.ylabel("Validation Loss")
 plt.title("Performance of Model 1")
 plt.show()
 ```

+%% Output
+
+
+
 %% Cell type:markdown id:11df8fd4 tags:

 Now load the model with the lowest validation loss.

 %% Cell type:code id:e93efdfc tags:

 ``` python
-model.load_state_dict(torch.load("./model_cifar.pt"))
+# model.load_state_dict(torch.load("./model_cifar.pt"))
+model.load_state_dict(torch.load("./model_cifar_1_early_stop.pt"))

 # track test loss
 test_loss = 0.0
 class_correct = list(0.0 for i in range(10))
 class_total = list(0.0 for i in range(10))

 model.eval()
 # iterate over test data
 for data, target in test_loader:
    # move tensors to GPU if CUDA is available
    if train_on_gpu:
        data, target = data.cuda(), target.cuda()
    # forward pass: compute predicted outputs by passing inputs to the model
    output = model(data)
    # calculate the batch loss
    loss = criterion(output, target)
    # update test loss
    test_loss += loss.item() * data.size(0)
    # convert output probabilities to predicted class
    _, pred = torch.max(output, 1)
    # compare predictions to true label
    correct_tensor = pred.eq(target.data.view_as(pred))
    correct = (
        np.squeeze(correct_tensor.numpy())
        if not train_on_gpu
        else np.squeeze(correct_tensor.cpu().numpy())
    )
    # calculate test accuracy for each object class
    for i in range(target.size(0)):  # actual batch size; the last batch may be smaller
        label = target.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1

 # average test loss
 test_loss = test_loss / len(test_loader)
 print("Test Loss: {:.6f}\n".format(test_loss))

 for i in range(10):
    if class_total[i] > 0:
        print(
            "Test Accuracy of %5s: %2d%% (%2d/%2d)"
            % (
                classes[i],
                100 * class_correct[i] / class_total[i],
                np.sum(class_correct[i]),
                np.sum(class_total[i]),
            )
        )
    else:
        print("Test Accuracy of %5s: N/A (no test examples)" % (classes[i]))

 print(
    "\nTest Accuracy (Overall): %2d%% (%2d/%2d)"
    % (
        100.0 * np.sum(class_correct) / np.sum(class_total),
        np.sum(class_correct),
        np.sum(class_total),
    )
 )
 ```

+%% Output
+
+    /var/folders/qb/94v41qkx157gvjjjv1rchcr00000gn/T/ipykernel_25820/3291884398.py:1: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
+      model.load_state_dict(torch.load("./model_cifar.pt"))
+
+    Test Loss: 21.811477
+    
+    Test Accuracy of airplane: 71% (716/1000)
+    Test Accuracy of automobile: 75% (750/1000)
+    Test Accuracy of  bird: 55% (558/1000)
+    Test Accuracy of   cat: 44% (442/1000)
+    Test Accuracy of  deer: 60% (604/1000)
+    Test Accuracy of   dog: 52% (521/1000)
+    Test Accuracy of  frog: 64% (644/1000)
+    Test Accuracy of horse: 58% (588/1000)
+    Test Accuracy of  ship: 74% (746/1000)
+    Test Accuracy of truck: 68% (681/1000)
+    
+    Test Accuracy (Overall): 62% (6250/10000)
+
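Evaluation does not need gradients, so the forward passes can be wrapped in `torch.no_grad()` to save memory and time. The sketch below shows one possible evaluation helper. It counts samples with `data.size(0)`, which keeps the statistics correct even when the last batch is smaller than the others. The names `evaluate`, `toy_data`, `toy_loader` and `toy_model` are assumptions for illustration, and the random data stands in for CIFAR10.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion):
    """Return (mean per-sample loss, accuracy) without building autograd graphs."""
    model.eval()
    total_loss, correct, seen = 0.0, 0, 0
    with torch.no_grad():
        for data, target in loader:
            output = model(data)
            total_loss += criterion(output, target).item() * data.size(0)
            correct += (output.argmax(dim=1) == target).sum().item()
            seen += data.size(0)
    return total_loss / seen, correct / seen


# Toy stand-ins (assumptions): random "images" and an untrained linear classifier
torch.manual_seed(0)
toy_data = TensorDataset(torch.randn(50, 3, 32, 32), torch.randint(0, 10, (50,)))
toy_loader = DataLoader(toy_data, batch_size=20)  # last batch holds only 10 samples
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

mean_loss, accuracy = evaluate(toy_model, toy_loader, nn.CrossEntropyLoss())
print(f"loss={mean_loss:.4f} accuracy={accuracy:.2%}")
```

The same helper could be applied to `valid_loader` or `test_loader` once the inputs are moved to the model's device.
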
 %% Cell type:markdown id:944991a2 tags:

 Build a new network with the following structure.

 - It has 3 convolutional layers of kernel size 3 and padding of 1.
 - The first convolutional layer must output 16 channels, the second 32 and the third 64.
 - At each convolutional layer output, we apply a ReLU activation then a MaxPool with kernel size of 2.
 - Then, three fully connected layers, the first two being followed by a ReLU activation and a dropout whose value you will suggest.
 - The first fully connected layer will have an output size of 512.
 - The second fully connected layer will have an output size of 64.

 Compare the results obtained with this new network to those obtained previously.

+%% Cell type:code id:8b67c2c6 tags:
+
+``` python
+import torch.nn as nn
+import torch.nn.functional as F
+
+# define the CNN architecture
+
+class NewNet(nn.Module):
+    def __init__(self, dropout_value=0.5):
+        super(NewNet, self).__init__()
+        # Convolutional layers
+        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
+        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
+        self.conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
+
+        # MaxPool
+        self.pool = nn.MaxPool2d(kernel_size=2)
+
+        # Dropout
+        self.dropout = nn.Dropout(p=dropout_value)
+
+        # Fully connected layers
+        # self.fc1 = nn.Linear(in_features=64 * (input_size // 8) * (input_size // 8), out_features=512)
+        self.fc1 = nn.Linear(in_features=64 * 4 * 4, out_features=512)
+        self.fc2 = nn.Linear(in_features=512, out_features=64)
+        self.fc3 = nn.Linear(64, 10)
+
+    def forward(self, x):
+        # Convolutional layers with ReLU and MaxPool
+        x = self.pool(F.relu(self.conv1(x)))
+        x = self.pool(F.relu(self.conv2(x)))
+        x = self.pool(F.relu(self.conv3(x)))
+
+        x = x.view(x.size(0), -1)
+        x = self.dropout(F.relu(self.fc1(x)))
+        x = self.dropout(F.relu(self.fc2(x)))
+        x = self.fc3(x)
+        return x
+
+
+# # create a complete CNN
+# new_model = NewNet()
+# print(new_model)
+# # move tensors to GPU if CUDA is available
+# if train_on_gpu:
+#     new_model.cuda()
+```
+
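The hard-coded `64 * 4 * 4` in `fc1` can be verified empirically by pushing a dummy CIFAR10-sized tensor through the convolutional stack and reading the resulting shape. The sketch below rebuilds that stack with `nn.Sequential` from the exercise specification. The variable names `features` and `dummy` are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Same convolutional stack as specified in the exercise: three 3x3 convolutions
# with padding 1, each followed by ReLU and a 2x2 max pool.
features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)

with torch.no_grad():
    dummy = torch.zeros(1, 3, 32, 32)        # one CIFAR10-sized image
    out = features(dummy)

flat_features = out.flatten(start_dim=1).size(1)
print(tuple(out.shape), flat_features)
```

Padding 1 with a 3x3 kernel preserves the spatial size, so only the three pooling steps shrink it: 32 to 16 to 8 to 4, which gives 64 * 4 * 4 = 1024 input features for `fc1`.
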
+%% Cell type:code id:3cc6cc8a tags:
+
+``` python
+
+# create a complete CNN
+new_model = NewNet()
+print(new_model)
+# move tensors to GPU if CUDA is available
+if train_on_gpu:
+    new_model.cuda()
+
+
+import torch.optim as optim
+
+min_epochs = 10
+patience = 3 # Nb of epochs to wait after no improvement
+epochs_no_improve = 0
+
+
+criterion = nn.CrossEntropyLoss()  # specify loss function
+optimizer = optim.SGD(new_model.parameters(), lr=0.01)  # specify optimizer
+
+n_epochs = 30  # number of epochs to train the model
+valid_loss_list = []  # list to store validation loss to visualize
+train_loss_list = []  # list to store training loss to visualize
+valid_loss_min = np.inf  # track change in validation loss
+
+for epoch in range(n_epochs):
+    # Keep track of training and validation loss
+    train_loss = 0.0
+    valid_loss = 0.0
+
+    # Train the model
+    new_model.train()
+    for data, target in train_loader:
+        # Move tensors to GPU if CUDA is available
+        if train_on_gpu:
+            data, target = data.cuda(), target.cuda()
+        # Clear the gradients of all optimized variables
+        optimizer.zero_grad()
+        # Forward pass: compute predicted outputs by passing inputs to the model
+        output = new_model(data)
+        # Calculate the batch loss
+        loss = criterion(output, target)
+        # Backward pass: compute gradient of the loss with respect to model parameters
+        loss.backward()
+        # Perform a single optimization step (parameter update)
+        optimizer.step()
+        # Update training loss
+        train_loss += loss.item() * data.size(0)
+
+    # Validate the model
+    new_model.eval()
+    for data, target in valid_loader:
+        # Move tensors to GPU if CUDA is available
+        if train_on_gpu:
+            data, target = data.cuda(), target.cuda()
+        # Forward pass: compute predicted outputs by passing inputs to the model
+        output = new_model(data)
+        # Calculate the batch loss
+        loss = criterion(output, target)
+        # Update average validation loss
+        valid_loss += loss.item() * data.size(0)
+
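+    # Calculate epoch losses. The sums above are weighted by batch size but
+    # divided by the number of batches, so these values are roughly batch_size
+    # times the mean per-sample loss.
+    train_loss = train_loss / len(train_loader)
+    valid_loss = valid_loss / len(valid_loader)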
+    train_loss_list.append(train_loss)
+    valid_loss_list.append(valid_loss)
+
+    # Print training/validation statistics
+    print(
+        "Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}".format(
+            epoch, train_loss, valid_loss
+        )
+    )
+
+    # Save model if validation loss has decreased
+    if valid_loss <= valid_loss_min:
+        print(
+            "Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...".format(
+                valid_loss_min, valid_loss
+            )
+        )
+        torch.save(new_model.state_dict(), "model_cifar_2.pt")
+        valid_loss_min = valid_loss
+        epochs_no_improve = 0
+    elif epoch >= min_epochs:
+        epochs_no_improve += 1
+        if epochs_no_improve >= patience:
+            print(f"Validation loss did not improve for {patience} consecutive epochs. Stopping early.")
+            break
+```
+
+%% Output
+
+    NewNet(
+      (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
+      (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
+      (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
+      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
+      (dropout): Dropout(p=0.5, inplace=False)
+      (fc1): Linear(in_features=1024, out_features=512, bias=True)
+      (fc2): Linear(in_features=512, out_features=64, bias=True)
+      (fc3): Linear(in_features=64, out_features=10, bias=True)
+    )
+    Epoch: 0 	Training Loss: 44.934554 	Validation Loss: 40.292926
+    Validation loss decreased (inf --> 40.292926).  Saving model ...
+    Epoch: 1 	Training Loss: 38.547384 	Validation Loss: 34.307230
+    Validation loss decreased (40.292926 --> 34.307230).  Saving model ...
+    Epoch: 2 	Training Loss: 34.167031 	Validation Loss: 30.783441
+    Validation loss decreased (34.307230 --> 30.783441).  Saving model ...
+    Epoch: 3 	Training Loss: 31.514744 	Validation Loss: 29.177271
+    Validation loss decreased (30.783441 --> 29.177271).  Saving model ...
+    Epoch: 4 	Training Loss: 29.490232 	Validation Loss: 26.770098
+    Validation loss decreased (29.177271 --> 26.770098).  Saving model ...
+    Epoch: 5 	Training Loss: 27.982251 	Validation Loss: 25.774428
+    Validation loss decreased (26.770098 --> 25.774428).  Saving model ...
+    Epoch: 6 	Training Loss: 26.515079 	Validation Loss: 24.038370
+    Validation loss decreased (25.774428 --> 24.038370).  Saving model ...
+    Epoch: 7 	Training Loss: 25.259680 	Validation Loss: 23.620053
+    Validation loss decreased (24.038370 --> 23.620053).  Saving model ...
+    Epoch: 8 	Training Loss: 23.969766 	Validation Loss: 22.249926
+    Validation loss decreased (23.620053 --> 22.249926).  Saving model ...
+    Epoch: 9 	Training Loss: 23.044149 	Validation Loss: 21.061266
+    Validation loss decreased (22.249926 --> 21.061266).  Saving model ...
+    Epoch: 10 	Training Loss: 21.929328 	Validation Loss: 20.193573
+    Validation loss decreased (21.061266 --> 20.193573).  Saving model ...
+    Epoch: 11 	Training Loss: 21.162510 	Validation Loss: 19.769918
+    Validation loss decreased (20.193573 --> 19.769918).  Saving model ...
+    Epoch: 12 	Training Loss: 20.163602 	Validation Loss: 19.290062
+    Validation loss decreased (19.769918 --> 19.290062).  Saving model ...
+    Epoch: 13 	Training Loss: 19.370121 	Validation Loss: 18.626375
+    Validation loss decreased (19.290062 --> 18.626375).  Saving model ...
+    Epoch: 14 	Training Loss: 18.558041 	Validation Loss: 18.075628
+    Validation loss decreased (18.626375 --> 18.075628).  Saving model ...
+
+%% Cell type:code id:97355006 tags:
+
+``` python
+# load the weights saved for the second network into new_model
+# (they do not match the layers of the first model)
+new_model.load_state_dict(torch.load("./model_cifar_2.pt"))
+
+# track test loss
+test_loss = 0.0
+class_correct = list(0.0 for i in range(10))
+class_total = list(0.0 for i in range(10))
+
+new_model.eval()
+# iterate over test data
+for data, target in test_loader:
+    # move tensors to GPU if CUDA is available
+    if train_on_gpu:
+        data, target = data.cuda(), target.cuda()
+    # forward pass: compute predicted outputs by passing inputs to the model
+    output = new_model(data)
+    # calculate the batch loss
+    loss = criterion(output, target)
+    # update test loss
+    test_loss += loss.item() * data.size(0)
+    # convert output probabilities to predicted class
+    _, pred = torch.max(output, 1)
+    # compare predictions to true label
+    correct_tensor = pred.eq(target.data.view_as(pred))
+    correct = (
+        np.squeeze(correct_tensor.numpy())
+        if not train_on_gpu
+        else np.squeeze(correct_tensor.cpu().numpy())
+    )
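+    # calculate test accuracy for each object class
+    # (iterate over the actual batch length so a smaller final batch does not raise an IndexError)
+    for i in range(target.size(0)):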
+        label = target.data[i]
+        class_correct[label] += correct[i].item()
+        class_total[label] += 1
+
+# average test loss
+test_loss = test_loss / len(test_loader)
+print("Test Loss: {:.6f}\n".format(test_loss))
+
+for i in range(10):
+    if class_total[i] > 0:
+        print(
+            "Test Accuracy of %5s: %2d%% (%2d/%2d)"
+            % (
+                classes[i],
+                100 * class_correct[i] / class_total[i],
+                np.sum(class_correct[i]),
+                np.sum(class_total[i]),
+            )
+        )
+    else:
+        print("Test Accuracy of %5s: N/A (no test examples)" % (classes[i]))
+
+print(
+    "\nTest Accuracy (Overall): %2d%% (%2d/%2d)"
+    % (
+        100.0 * np.sum(class_correct) / np.sum(class_total),
+        np.sum(class_correct),
+        np.sum(class_total),
+    )
+)
+```
+
+%% Output
+
+    /var/folders/qb/94v41qkx157gvjjjv1rchcr00000gn/T/ipykernel_32008/3634208260.py:1: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
+      model.load_state_dict(torch.load("./model_cifar_2.pt"))
+
+    ---------------------------------------------------------------------------
+    FileNotFoundError                         Traceback (most recent call last)
+    Cell In[31], line 1
+    ----> 1 model.load_state_dict(torch.load("./model_cifar_2.pt"))
+          3 # track test loss
+          4 test_loss = 0.0
+    File ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/serialization.py:1319, in load(f, map_location, pickle_module, weights_only, mmap, **pickle_load_args)
+       1316 if "encoding" not in pickle_load_args.keys():
+       1317     pickle_load_args["encoding"] = "utf-8"
+    -> 1319 with _open_file_like(f, "rb") as opened_file:
+       1320     if _is_zipfile(opened_file):
+       1321         # The zipfile reader is going to advance the current file position.
+       1322         # If we want to actually tail call to torch.jit.load, we need to
+       1323         # reset back to the original position.
+       1324         orig_position = opened_file.tell()
+    File ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/serialization.py:659, in _open_file_like(name_or_buffer, mode)
+        657 def _open_file_like(name_or_buffer, mode):
+        658     if _is_path(name_or_buffer):
+    --> 659         return _open_file(name_or_buffer, mode)
+        660     else:
+        661         if "w" in mode:
+    File ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/serialization.py:640, in _open_file.__init__(self, name, mode)
+        639 def __init__(self, name, mode):
+    --> 640     super().__init__(open(name, mode))
+    FileNotFoundError: [Errno 2] No such file or directory: './model_cifar_2.pt'
+
+%% Cell type:markdown id:6245b27f tags:
+
+# Test Accuracy: Model 1 vs. Model 2
+
+## Test Accuracy Model 1:
+* Test Loss: 21.811477
+
+* Test Accuracy of airplane: 71% (716/1000)
+* Test Accuracy of automobile: 75% (750/1000)
+* Test Accuracy of bird: 55% (558/1000)
+* Test Accuracy of cat: 44% (442/1000)
+* Test Accuracy of deer: 60% (604/1000)
+* Test Accuracy of dog: 52% (521/1000)
+* Test Accuracy of frog: 64% (644/1000)
+* Test Accuracy of horse: 58% (588/1000)
+* Test Accuracy of ship: 74% (746/1000)
+* Test Accuracy of truck: 68% (681/1000)
+
+* Test Accuracy (Overall): 62% (6250/10000)
+
+
+## Test Accuracy Model 2:
+* Test Loss: 16.239906
+
+* Test Accuracy of airplane: 78% (784/1000)
+* Test Accuracy of automobile: 88% (889/1000)
+* Test Accuracy of bird: 61% (618/1000)
+* Test Accuracy of cat: 61% (615/1000)
+* Test Accuracy of deer: 66% (662/1000)
+* Test Accuracy of dog: 50% (509/1000)
+* Test Accuracy of frog: 82% (823/1000)
+* Test Accuracy of horse: 73% (732/1000)
+* Test Accuracy of ship: 86% (862/1000)
+* Test Accuracy of truck: 75% (751/1000)
+
+* Test Accuracy (Overall): 72% (7245/10000)
+
 %% Cell type:markdown id:bc381cf4 tags:

 ## Exercise 2: Quantization: try to compress the CNN to save space

 The quantization documentation is available at https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic

 The exercise is to apply post-training quantization to the CNN model above, then compare the size reduction and the impact on classification accuracy.


 The size of the model is simply the size of the file that stores its weights.

 %% Cell type:code id:ef623c26 tags:

 ``` python
 import os


 def print_size_of_model(model, label=""):
    torch.save(model.state_dict(), "temp.p")
    size = os.path.getsize("temp.p")
    print("model: ", label, " \t", "Size (KB):", size / 1e3)
    os.remove("temp.p")
    return size


 print_size_of_model(model, "fp32")
 ```

+%% Output
+
+    model:  fp32  	 Size (KB): 2330.946
+
+    2330946
+
 %% Cell type:markdown id:05c4e9ad tags:

 Post-training quantization example

 %% Cell type:code id:c4c65d4b tags:

 ``` python
 import torch.quantization

-
+torch.backends.quantized.engine = 'qnnpack'
 quantized_model = torch.quantization.quantize_dynamic(model, dtype=torch.qint8)
 print_size_of_model(quantized_model, "int8")
 ```

+%% Output
+
+    ---------------------------------------------------------------------------
+    RuntimeError                              Traceback (most recent call last)
+    Cell In[30], line 4
+          1 import torch.quantization
+    ----> 4 quantized_model = torch.quantization.quantize_dynamic(model, dtype=torch.qint8)
+          5 print_size_of_model(quantized_model, "int8")
+    File ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/quantization/quantize.py:564, in quantize_dynamic(model, qconfig_spec, dtype, mapping, inplace)
+        562 model.eval()
+        563 propagate_qconfig_(model, qconfig_spec)
+    --> 564 convert(model, mapping, inplace=True)
+        565 return model
+    File ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/quantization/quantize.py:659, in convert(module, mapping, inplace, remove_qconfig, is_reference, convert_custom_config_dict, use_precomputed_fake_quant)
+        657 if not inplace:
+        658     module = copy.deepcopy(module)
+    --> 659 _convert(
+        660     module,
+        661     mapping,
+        662     inplace=True,
+        663     is_reference=is_reference,
+        664     convert_custom_config_dict=convert_custom_config_dict,
+        665     use_precomputed_fake_quant=use_precomputed_fake_quant,
+        666 )
+        667 if remove_qconfig:
+        668     _remove_qconfig(module)
+    File ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/quantization/quantize.py:724, in _convert(module, mapping, inplace, is_reference, convert_custom_config_dict, use_precomputed_fake_quant)
+        712     if (
+        713         not isinstance(mod, _FusedModule)
+        714         and type_before_parametrizations(mod) not in custom_module_class_mapping
+        715     ):
+        716         _convert(
+        717             mod,
+        718             mapping,
+       (...)
+        722             use_precomputed_fake_quant=use_precomputed_fake_quant,
+        723         )
+    --> 724     reassign[name] = swap_module(
+        725         mod, mapping, custom_module_class_mapping, use_precomputed_fake_quant
+        726     )
+        728 for key, value in reassign.items():
+        729     module._modules[key] = value
+    File ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/quantization/quantize.py:766, in swap_module(mod, mapping, custom_module_class_mapping, use_precomputed_fake_quant)
+        764 sig = inspect.signature(qmod.from_float)
+        765 if "use_precomputed_fake_quant" in sig.parameters:
+    --> 766     new_mod = qmod.from_float(
+        767         mod, use_precomputed_fake_quant=use_precomputed_fake_quant
+        768     )
+        769 else:
+        770     new_mod = qmod.from_float(mod)
+    File ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/nn/quantized/dynamic/modules/linear.py:145, in Linear.from_float(cls, mod, use_precomputed_fake_quant)
+        141 else:
+        142     raise RuntimeError(
+        143         "Unsupported dtype specified for dynamic quantized Linear!"
+        144     )
+    --> 145 qlinear = cls(mod.in_features, mod.out_features, dtype=dtype)
+        146 qlinear.set_weight_bias(qweight, mod.bias)
+        147 return qlinear
+    File ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/nn/quantized/dynamic/modules/linear.py:42, in Linear.__init__(self, in_features, out_features, bias_, dtype)
+         41 def __init__(self, in_features, out_features, bias_=True, dtype=torch.qint8):
+    ---> 42     super().__init__(in_features, out_features, bias_, dtype=dtype)
+         43     # We don't muck around with buffers or attributes or anything here
+         44     # to keep the module simple. *everything* is simply a Python attribute.
+         45     # Serialization logic is explicitly handled in the below serialization and
+         46     # deserialization modules
+         47     self.version = 4
+    File ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/nn/quantized/modules/linear.py:172, in Linear.__init__(self, in_features, out_features, bias_, dtype)
+        169 else:
+        170     raise RuntimeError("Unsupported dtype specified for quantized Linear!")
+    --> 172 self._packed_params = LinearPackedParams(dtype)
+        173 self._packed_params.set_weight_bias(qweight, bias)
+        174 self.scale = 1.0
+    File ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/nn/quantized/modules/linear.py:31, in LinearPackedParams.__init__(self, dtype)
+         29 elif self.dtype == torch.float16:
+         30     wq = torch.zeros([1, 1], dtype=torch.float)
+    ---> 31 self.set_weight_bias(wq, None)
+    File ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/nn/quantized/modules/linear.py:38, in LinearPackedParams.set_weight_bias(self, weight, bias)
+         33 @torch.jit.export
+         34 def set_weight_bias(
+         35     self, weight: torch.Tensor, bias: Optional[torch.Tensor]
+         36 ) -> None:
+         37     if self.dtype == torch.qint8:
+    ---> 38         self._packed_params = torch.ops.quantized.linear_prepack(weight, bias)
+         39     elif self.dtype == torch.float16:
+         40         self._packed_params = torch.ops.quantized.linear_prepack_fp16(weight, bias)
+    File ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/_ops.py:1116, in OpOverloadPacket.__call__(self, *args, **kwargs)
+       1114 if self._has_torchbind_op_overload and _must_dispatch_in_python(args, kwargs):
+       1115     return _call_overload_packet_from_python(self, args, kwargs)
+    -> 1116 return self._op(*args, **(kwargs or {}))
+    RuntimeError: Didn't find engine for operation quantized::linear_prepack NoQEngine
+
+%% Cell type:markdown id:063d405c tags:
+
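The `NoQEngine` error in the traceback above usually indicates that no quantized backend was selected when `quantize_dynamic` ran. A minimal sketch of one way to pick a backend from those compiled into the local PyTorch build follows. The preference order and the small `demo_model` are assumptions made for illustration; in the notebook the trained CNN would take its place.

```python
import torch
import torch.nn as nn
import torch.quantization

# Engines compiled into this PyTorch build; 'none' means no quantized backend.
supported = torch.backends.quantized.supported_engines
print("Supported engines:", supported)

usable = [e for e in supported if e != "none"]
quantized_demo = None
if usable:
    # The preference order is an assumption of this sketch, not a PyTorch rule.
    preferred = [e for e in ("fbgemm", "x86", "qnnpack") if e in usable]
    torch.backends.quantized.engine = preferred[0] if preferred else usable[0]

    # Hypothetical stand-in for the CNN classifier; dynamic quantization
    # only converts the nn.Linear layers listed in the set below.
    demo_model = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 10))
    quantized_demo = torch.quantization.quantize_dynamic(
        demo_model, {nn.Linear}, dtype=torch.qint8
    )
    print(quantized_demo)
```

Since dynamic quantization leaves the convolutional layers in fp32, the size reduction it can achieve is bounded by the share of the weights held in the fully connected layers.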
+
 %% Cell type:markdown id:7b108e17 tags:

 For each class, compare the classification test accuracy of the initial model and the quantized model. Also give the overall test accuracy for both models.
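One possible way to make this comparison is to wrap the per-class bookkeeping in a helper and call it once per model. The sketch below assumes models that run on the CPU; `per_class_accuracy`, `demo_model` and `demo_loader` are hypothetical names, and random data stands in for the CIFAR test set.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def per_class_accuracy(model, loader, num_classes):
    """Return (list of per-class accuracies, overall accuracy) on CPU data."""
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    model.eval()
    with torch.no_grad():
        for data, target in loader:
            pred = model(data).argmax(dim=1)
            for c in range(num_classes):
                mask = target == c
                total[c] += mask.sum()
                correct[c] += (pred[mask] == c).sum()
    per_class = [
        (correct[c] / total[c]).item() if total[c] > 0 else float("nan")
        for c in range(num_classes)
    ]
    overall = (correct.sum() / total.sum()).item()
    return per_class, overall


# Hypothetical stand-ins so the sketch runs on its own.
torch.manual_seed(0)
demo_loader = DataLoader(
    TensorDataset(torch.randn(40, 8), torch.randint(0, 3, (40,))), batch_size=10
)
demo_model = nn.Linear(8, 3)

per_class, overall = per_class_accuracy(demo_model, demo_loader, num_classes=3)
for c, acc in enumerate(per_class):
    print(f"class {c}: {acc:.2%}")
print(f"overall: {overall:.2%}")
```

Calling the helper on the fp32 model and on the quantized model with the same test loader gives two lists that can be printed side by side.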

 %% Cell type:markdown id:a0a34b90 tags:

 Try quantization-aware training to mitigate the impact on accuracy (documentation available at https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic)
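As a rough illustration of the eager-mode workflow, the sketch below prepares a small network with fake-quantization modules, trains it for a few steps, and converts it to int8. `TinyNet`, the random training data and the choice between `fbgemm` and `qnnpack` are assumptions made for this example; the notebook's CNN would need `QuantStub`/`DeQuantStub` added in the same way.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq


class TinyNet(nn.Module):
    """Hypothetical small network; the stubs mark where int8 starts and ends."""

    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.fc1 = nn.Linear(8, 16)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(16, 3)
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.fc2(self.relu(self.fc1(x)))
        return self.dequant(x)


supported = torch.backends.quantized.supported_engines
engine = next((e for e in ("fbgemm", "qnnpack") if e in supported), None)

torch.manual_seed(0)
model = TinyNet()
int8_model = None
if engine is not None:
    torch.backends.quantized.engine = engine
    model.train()  # prepare_qat expects a model in training mode
    model.qconfig = tq.get_default_qat_qconfig(engine)
    qat_model = tq.prepare_qat(model)  # inserts fake-quantization modules

    # A few steps on random data, standing in for the real training loop.
    optimizer = torch.optim.SGD(qat_model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()
    for _ in range(5):
        data, target = torch.randn(16, 8), torch.randint(0, 3, (16,))
        optimizer.zero_grad()
        criterion(qat_model(data), target).backward()
        optimizer.step()

    qat_model.eval()
    int8_model = tq.convert(qat_model)  # swaps in real int8 modules
    print(int8_model)
```

Because the network sees quantization noise during training, its weights can adapt to it, which is why this approach often loses less accuracy than post-training quantization.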

 %% Cell type:markdown id:201470f9 tags:

 ## Exercise 3: working with pre-trained models.

 PyTorch offers several pre-trained models https://pytorch.org/vision/0.8/models.html
 We will use ResNet50 trained on the ImageNet dataset (https://www.image-net.org/index.php). Use the following code with the file `imagenet-simple-labels.json`, which contains the ImageNet labels, and the image `dog.png`, which we will use as a test input.

 %% Cell type:code id:b4d13080 tags:

 ``` python
 import json
 from PIL import Image

 # Choose an image to pass through the model
 test_image = "dog.png"

 # Configure matplotlib for pretty inline plots
 #%matplotlib inline
 #%config InlineBackend.figure_format = 'retina'

 # Prepare the labels
 with open("imagenet-simple-labels.json") as f:
    labels = json.load(f)

 # First prepare the transformations: resize the image to what the model was trained on and convert it to a tensor
 data_transform = transforms.Compose(
    [
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ]
 )
 # Load the image

 image = Image.open(test_image)
 plt.imshow(image), plt.xticks([]), plt.yticks([])

 # Now apply the transformation, expand the batch dimension, and send the image to the GPU
 # image = data_transform(image).unsqueeze(0).cuda()
 image = data_transform(image).unsqueeze(0)

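 # Download the model if it's not there already. It will take a bit on the first run, after that it's fast
 # (`weights=` replaces the deprecated `pretrained=True`; IMAGENET1K_V1 selects the same weights)
 model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)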
 # Send the model to the GPU
 # model.cuda()
 # Set layers such as dropout and batchnorm in evaluation mode
 model.eval()

 # Get the 1000-dimensional model output
 out = model(image)
 # Find the predicted class
 print("Predicted class is: {}".format(labels[out.argmax()]))
 ```

 %% Cell type:markdown id:184cfceb tags:

 Experiments:

 Study the code and the results obtained. Possibly add other images downloaded from the internet.
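When studying the results, it can help to look beyond the single `argmax` and inspect the few most probable classes. The helper below is a small sketch of that idea; `top_k_predictions`, `demo_logits` and `demo_labels` are hypothetical names, with the demo tensors standing in for `out` and `labels` from the cell above.

```python
import torch


def top_k_predictions(logits, labels, k=5):
    """Return the k most probable (label, probability) pairs for one image."""
    probs = torch.softmax(logits, dim=1)[0]  # logits has shape (1, num_classes)
    top_probs, top_idx = probs.topk(k)
    return [(labels[i], p) for i, p in zip(top_idx.tolist(), top_probs.tolist())]


# Hypothetical stand-ins for the model output and the label list.
demo_logits = torch.tensor([[0.1, 2.0, -1.0, 0.5]])
demo_labels = ["cat", "dog", "ship", "frog"]
for label, prob in top_k_predictions(demo_logits, demo_labels, k=3):
    print(f"{label}: {prob:.3f}")
```

Comparing these lists before and after quantization shows whether the ranking changes even when the top prediction stays the same.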

 What is the size of the model? Quantize it and then check if the model is still able to correctly classify the other images.

 Experiment with other pre-trained CNN models.



 %% Cell type:markdown id:5d57da4b tags:

 ## Exercise 4: Transfer Learning


 For this work, we will use a pre-trained model (ResNet18) as a feature extractor and refine the classification by training only the last fully connected layer of the network. The output layer of the pre-trained network is therefore replaced by a layer adapted to the new classes to be recognized, which in our case are ants and bees.
 Download and unzip the dataset into your working directory from the following address:

 https://download.pytorch.org/tutorial/hymenoptera_data.zip

 Execute the following code in order to display some images of the dataset.

 %% Cell type:code id:be2d31f5 tags:

 ``` python
 import os

 import matplotlib.pyplot as plt
 import numpy as np
 import torch
 import torchvision
 from torchvision import datasets, transforms

 # Data augmentation and normalization for training
 # Just normalization for validation
 data_transforms = {
    "train": transforms.Compose(
        [
            transforms.RandomResizedCrop(
                224
            ),  # ImageNet models were trained on 224x224 images
            transforms.RandomHorizontalFlip(),  # flip horizontally 50% of the time - increases train set variability
            transforms.ToTensor(),  # convert it to a PyTorch tensor
            transforms.Normalize(
                [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
            ),  # ImageNet models expect this norm
        ]
    ),
    "val": transforms.Compose(
        [
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ]
    ),
 }

 data_dir = "hymenoptera_data"
 # Create train and validation datasets and loaders
 image_datasets = {
    x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
    for x in ["train", "val"]
 }
 dataloaders = {
    x: torch.utils.data.DataLoader(
        image_datasets[x], batch_size=4, shuffle=True, num_workers=0
    )
    for x in ["train", "val"]
 }
 dataset_sizes = {x: len(image_datasets[x]) for x in ["train", "val"]}
 class_names = image_datasets["train"].classes
 device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

 # Helper function for displaying images
 def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])

    # Un-normalize the images
    inp = std * inp + mean
    # Clip just in case
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated
    plt.show()


 # Get a batch of training data
 inputs, classes = next(iter(dataloaders["train"]))

 # Make a grid from batch
 out = torchvision.utils.make_grid(inputs)

 imshow(out, title=[class_names[x] for x in classes])

 ```

 %% Cell type:markdown id:bbd48800 tags:

 Now execute the following code. It loads a pre-trained ResNet18, replaces its output layer with one suited to the ants/bees classification, and trains the model by updating only the weights of this new output layer.

 %% Cell type:code id:572d824c tags:

 ``` python
 import copy
 import os
 import time

 import matplotlib.pyplot as plt
 import numpy as np
 import torch
 import torch.nn as nn
 import torch.optim as optim
 import torchvision
 from torch.optim import lr_scheduler
 from torchvision import datasets, transforms

 # Data augmentation and normalization for training
 # Just normalization for validation
 data_transforms = {
    "train": transforms.Compose(
        [
            transforms.RandomResizedCrop(
                224
            ),  # ImageNet models were trained on 224x224 images
            transforms.RandomHorizontalFlip(),  # flip horizontally 50% of the time - increases train set variability
            transforms.ToTensor(),  # convert it to a PyTorch tensor
            transforms.Normalize(
                [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
            ),  # ImageNet models expect this norm
        ]
    ),
    "val": transforms.Compose(
        [
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ]
    ),
 }

 data_dir = "hymenoptera_data"
 # Create train and validation datasets and loaders
 image_datasets = {
    x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
    for x in ["train", "val"]
 }
 dataloaders = {
    x: torch.utils.data.DataLoader(
        image_datasets[x], batch_size=4, shuffle=True, num_workers=4
    )
    for x in ["train", "val"]
 }
 dataset_sizes = {x: len(image_datasets[x]) for x in ["train", "val"]}
 class_names = image_datasets["train"].classes
 device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

 # Helper function for displaying images
 def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])

    # Un-normalize the images
    inp = std * inp + mean
    # Clip just in case
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated
    plt.show()


 # Get a batch of training data
 # inputs, classes = next(iter(dataloaders['train']))

 # Make a grid from batch
 # out = torchvision.utils.make_grid(inputs)

 # imshow(out, title=[class_names[x] for x in classes])
 # training


 def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    epoch_time = []  # we'll keep track of the time needed for each epoch

    for epoch in range(num_epochs):
        epoch_start = time.time()
        print("Epoch {}/{}".format(epoch + 1, num_epochs))
        print("-" * 10)

        # Each epoch has a training and validation phase
        for phase in ["train", "val"]:
            if phase == "train":
                model.train()  # Set model to training mode
            else:
                model.eval()  # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # Forward
                # Track history if only in training phase
                with torch.set_grad_enabled(phase == "train"):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == "train":
                        loss.backward()
                        optimizer.step()

                # Statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            # Step the learning-rate scheduler once per epoch, after the optimizer updates
            if phase == "train":
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print("{} Loss: {:.4f} Acc: {:.4f}".format(phase, epoch_loss, epoch_acc))

            # Deep copy the model
            if phase == "val" and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        # Add the epoch time
        t_epoch = time.time() - epoch_start
        epoch_time.append(t_epoch)
        print()

    time_elapsed = time.time() - since
    print(
        "Training complete in {:.0f}m {:.0f}s".format(
            time_elapsed // 60, time_elapsed % 60
        )
    )
    print("Best val Acc: {:.4f}".format(best_acc))

    # Load best model weights
    model.load_state_dict(best_model_wts)
    return model, epoch_time


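 # Download a pre-trained ResNet18 model and freeze its weights
 # (`weights=` replaces the deprecated `pretrained=True`; IMAGENET1K_V1 selects the same weights)
 model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)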
 for param in model.parameters():
    param.requires_grad = False

 # Replace the final fully connected layer
 # Parameters of newly constructed modules have requires_grad=True by default
 num_ftrs = model.fc.in_features
 model.fc = nn.Linear(num_ftrs, 2)
 # Send the model to the GPU
 model = model.to(device)
 # Set the loss function
 criterion = nn.CrossEntropyLoss()

 # Observe that only the parameters of the final layer are being optimized
 optimizer_conv = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
 exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
 model, epoch_time = train_model(
    model, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=10
 )
 ```

 %% Cell type:markdown id:bbd48800 tags:

 Experiments:
 Study the code and the results obtained.

 Modify the code and add an "eval_model" function to allow
 the evaluation of the model on a test set (different from the training and validation sets used during the training phase). Study the results obtained.
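A possible shape for such a function is sketched below. The signature and the returned pair (mean loss, accuracy) are assumptions rather than a required interface, and the tiny linear model with random data only serves to make the example run on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def eval_model(model, dataloader, criterion, device="cpu"):
    """Return (mean loss, accuracy) of `model` over `dataloader`, without training."""
    model.eval()
    running_loss, running_corrects, n_samples = 0.0, 0, 0
    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            running_loss += criterion(outputs, labels).item() * inputs.size(0)
            running_corrects += (outputs.argmax(dim=1) == labels).sum().item()
            n_samples += inputs.size(0)
    return running_loss / n_samples, running_corrects / n_samples


# Hypothetical stand-ins for the trained network and a held-out test loader.
torch.manual_seed(0)
demo_model = nn.Linear(8, 2)
demo_loader = DataLoader(
    TensorDataset(torch.randn(20, 8), torch.randint(0, 2, (20,))), batch_size=4
)
test_loss, test_acc = eval_model(demo_model, demo_loader, nn.CrossEntropyLoss())
print(f"test Loss: {test_loss:.4f} Acc: {test_acc:.4f}")
```

In the notebook, the test loader could be built like the existing ones with `datasets.ImageFolder`, pointing at a directory of images held out from both `train` and `val`.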

 Now modify the code to replace the current classification layer with a set of two layers, using a "relu" activation function for the middle layer and the "dropout" mechanism for both layers. Repeat the experiments and study the results obtained.
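One reading of this instruction is a head made of two linear layers, each preceded by dropout, with a ReLU in between. The sketch below follows that reading; the hidden width of 256, the dropout probability of 0.5 and the name `make_classifier_head` are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn


def make_classifier_head(num_ftrs, hidden_units=256, num_classes=2, p_drop=0.5):
    """Two-layer classifier head with dropout before each linear layer."""
    return nn.Sequential(
        nn.Dropout(p_drop),
        nn.Linear(num_ftrs, hidden_units),
        nn.ReLU(),
        nn.Dropout(p_drop),
        nn.Linear(hidden_units, num_classes),
    )


# 512 is the value of `model.fc.in_features` for ResNet18.
head = make_classifier_head(num_ftrs=512)
head.eval()  # dropout is disabled in evaluation mode
print(head(torch.randn(4, 512)).shape)
```

In the training cell, this would replace `nn.Linear(num_ftrs, 2)` with `model.fc = make_classifier_head(num_ftrs)`, and the optimizer would still receive `model.fc.parameters()`.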

 Apply quantization (post-training and quantization-aware) and evaluate the impact on model size and accuracy.

 %% Cell type:markdown id:04a263f0 tags:

 ## Optional

 Try this at home!!


 PyTorch offers a framework to export a given CNN to your smartphone (either Android or iOS). Have a look at the tutorial https://pytorch.org/mobile/home/

 The exercise consists of deploying the CNN of Exercise 4 on your phone and then testing it live.


 %% Cell type:markdown id:fe954ce4 tags:

 ## Author

 Alberto BOSIO - Ph. D.

 %% Cell type:markdown id:7edf7168 tags:

 # TD2: Deep learning

 %% Cell type:markdown id:fbb8c8df tags:

 In this TD, you must modify this notebook to answer the questions. To do this,

 1. Fork this repository
 2. Clone your forked repository on your local computer
 3. Answer the questions
 4. Commit and push regularly

 The last commit is due on Wednesday, December 4, 11:59 PM. Later commits will not be taken into account.

 %% Cell type:markdown id:3d167a29 tags:

 Install and test PyTorch from  https://pytorch.org/get-started/locally.

 %% Cell type:code id:330a42f5 tags:

 ``` python
 %pip install torch torchvision
 ```

+%% Output
+
+    Requirement already satisfied: torch in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (2.2.0)
+    Collecting torchvision
+      Downloading torchvision-0.20.1-cp311-cp311-macosx_11_0_arm64.whl.metadata (6.1 kB)
+    Requirement already satisfied: filelock in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from torch) (3.13.1)
+    Requirement already satisfied: typing-extensions>=4.8.0 in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from torch) (4.9.0)
+    Requirement already satisfied: sympy in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from torch) (1.12)
+    Requirement already satisfied: networkx in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from torch) (3.2.1)
+    Requirement already satisfied: jinja2 in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from torch) (3.1.3)
+    Requirement already satisfied: fsspec in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from torch) (2024.2.0)
+    Requirement already satisfied: numpy in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from torchvision) (1.26.3)
+    Collecting torch
+      Downloading torch-2.5.1-cp311-none-macosx_11_0_arm64.whl.metadata (28 kB)
+    Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from torchvision) (10.2.0)
+    Collecting sympy==1.13.1 (from torch)
+      Downloading sympy-1.13.1-py3-none-any.whl.metadata (12 kB)
+    Requirement already satisfied: mpmath<1.4,>=1.1.0 in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from sympy==1.13.1->torch) (1.3.0)
+    Requirement already satisfied: MarkupSafe>=2.0 in /Users/heber/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from jinja2->torch) (2.1.5)
+    Downloading torchvision-0.20.1-cp311-cp311-macosx_11_0_arm64.whl (1.8 MB)
+    Downloading torch-2.5.1-cp311-none-macosx_11_0_arm64.whl (63.9 MB)
+    Downloading sympy-1.13.1-py3-none-any.whl (6.2 MB)
+    Installing collected packages: sympy, torch, torchvision
+      Attempting uninstall: sympy
+        Found existing installation: sympy 1.12
+        Uninstalling sympy-1.12:
+          Successfully uninstalled sympy-1.12
+      Attempting uninstall: torch
+        Found existing installation: torch 2.2.0
+        Uninstalling torch-2.2.0:
+          Successfully uninstalled torch-2.2.0
+    Successfully installed sympy-1.13.1 torch-2.5.1 torchvision-0.20.1
+    
+    [notice] A new release of pip is available: 24.0 -> 24.3.1
+    [notice] To update, run: pip install --upgrade pip
+    Note: you may need to restart the kernel to use updated packages.
+
 %% Cell type:markdown id:0882a636 tags:


 To test the installation, run the following code:

 %% Cell type:code id:b1950f0a tags:

 ``` python
 import torch

 N, D = 14, 10
 x = torch.randn(N, D).type(torch.FloatTensor)
 print(x)

 from torchvision import models

 alexnet = models.alexnet()
 print(alexnet)
 ```

+%% Output
+
+    tensor([[-0.4614,  0.2167,  1.3662,  0.5457,  2.7665,  0.8728, -0.1837,  0.0607,
+              1.5946, -0.7726],
+            [-0.8952,  0.7103, -0.7606,  0.9257, -0.1401,  0.5907,  0.7204,  1.3177,
+             -0.4342,  0.4527],
+            [ 0.7967,  0.1907, -0.5346,  1.4139, -0.5380, -2.1966,  0.4751,  1.4743,
+              1.2449, -0.8389],
+            [ 0.0833,  0.5977, -0.7399, -0.4702, -0.6887,  1.1328, -1.1584,  0.3544,
+              1.0611, -0.0325],
+            [ 0.5764, -0.5985, -1.0803, -0.7565, -1.0020,  1.7249, -0.6647,  0.7847,
+              1.7402,  0.8243],
+            [-0.9695,  0.5117,  1.9237,  1.7299,  1.0193,  0.3211, -0.5839,  0.5866,
+              1.0019, -0.2681],
+            [-0.4172, -2.3619, -1.1206, -0.7292,  0.9231, -0.3644,  0.6110,  1.3185,
+              1.2674, -1.5235],
+            [ 0.2213, -0.5554, -0.4785,  0.9106,  0.1333,  1.1237,  0.2859, -1.6737,
+             -0.8616, -2.5445],
+            [ 0.2351,  1.3325,  0.1848,  0.1473,  1.3133, -0.7523,  0.6736,  1.8610,
+             -0.1847,  1.0223],
+            [-0.6824, -0.0298, -0.1910,  1.4017, -1.9937,  0.4087,  0.0165,  1.7551,
+             -0.6690, -0.7425],
+            [-1.3005, -0.5498, -1.3494, -1.2090,  0.3210,  0.7386,  0.5926, -0.6941,
+             -0.1688, -0.6065],
+            [ 0.4044,  0.6994, -0.9141, -0.3529,  1.0734, -0.9639,  0.0657, -0.2253,
+              0.3391,  0.5039],
+            [-2.1911,  1.6130, -0.7344, -1.0796, -0.3465, -0.9285, -0.5405, -0.0072,
+             -0.1058, -1.7597],
+            [-1.4770,  0.3449,  0.6489,  1.7304, -0.0802, -0.0332, -0.2949,  0.2265,
+             -0.7456,  0.8549]])
+    AlexNet(
+      (features): Sequential(
+        (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
+        (1): ReLU(inplace=True)
+        (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
+        (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
+        (4): ReLU(inplace=True)
+        (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
+        (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
+        (7): ReLU(inplace=True)
+        (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
+        (9): ReLU(inplace=True)
+        (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
+        (11): ReLU(inplace=True)
+        (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
+      )
+      (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
+      (classifier): Sequential(
+        (0): Dropout(p=0.5, inplace=False)
+        (1): Linear(in_features=9216, out_features=4096, bias=True)
+        (2): ReLU(inplace=True)
+        (3): Dropout(p=0.5, inplace=False)
+        (4): Linear(in_features=4096, out_features=4096, bias=True)
+        (5): ReLU(inplace=True)
+        (6): Linear(in_features=4096, out_features=1000, bias=True)
+      )
+    )
+
 %% Cell type:markdown id:23f266da tags:

 ## Exercise 1: CNN on CIFAR10

 The goal is to train a Convolutional Neural Network (CNN) on the CIFAR10 image dataset and measure its image-classification accuracy on the test set. Compare this accuracy with that of the neural network implemented during TD1.

 Have a look at the following documentation to become familiar with PyTorch.

 https://pytorch.org/tutorials/beginner/pytorch_with_examples.html

 https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

 %% Cell type:markdown id:4ba1c82d tags:

 You can check whether a GPU is available on your machine and, if so, train on it to speed up the process.
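
As a hedged sketch of how the availability checks can be combined, the helper below picks a backend name in order of preference. The function name `pick_device_name` and the CUDA-then-MPS-then-CPU ordering are assumptions made for illustration and are not part of this TD. The two flags are meant to be filled from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Return the name of the preferred backend (hypothetical helper)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# Example: an Apple Silicon machine without CUDA
print(pick_device_name(cuda_available=False, mps_available=True))
```

With PyTorch installed, `torch.device(pick_device_name(torch.cuda.is_available(), torch.backends.mps.is_available()))` gives a device that a model and its batches can be moved to with `.to(device)`.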

 %% Cell type:code id:6e18f2fd tags:

 ``` python
 import torch

 # check if CUDA is available
 train_on_gpu = torch.cuda.is_available()

 if not train_on_gpu:
    print("CUDA is not available.  Training on CPU ...")
 else:
    print("CUDA is available!  Training on GPU ...")
 ```

+%% Output
+
+    CUDA is not available.  Training on CPU ...
+
+%% Cell type:code id:abb4553c tags:
+
+``` python
+if torch.backends.mps.is_available():
+    mps_device = torch.device("mps")
+    x = torch.ones(1, device=mps_device)
+    print(x)
+else:
+    print("MPS device not found.")
+```
+
+%% Output
+
+    tensor([1.], device='mps:0')
+
 %% Cell type:markdown id:5cf214eb tags:

 Next we load the CIFAR10 dataset
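
The loading cell applies `transforms.ToTensor()`, which scales pixel values to [0, 1], followed by `transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))`, which computes `(x - mean) / std` for each channel. A minimal pure-Python sketch of that arithmetic (the helper name `normalize` is hypothetical) shows that the result lies in [-1, 1]:

```python
def normalize(value: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Apply the per-channel normalization (value - mean) / std."""
    return (value - mean) / std


# Darkest, mid-grey and brightest pixel values after ToTensor()
for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```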

+%% Cell type:code id:711b0b8e tags:
+
+``` python
+import numpy as np
+from torchvision import datasets, transforms
+from torch.utils.data.sampler import SubsetRandomSampler
+```
+
 %% Cell type:code id:462666a2 tags:

 ``` python
 import numpy as np
 from torchvision import datasets, transforms
 from torch.utils.data.sampler import SubsetRandomSampler

 # number of subprocesses to use for data loading
 num_workers = 0
 # how many samples per batch to load
 batch_size = 20
 # percentage of training set to use as validation
 valid_size = 0.2

 # convert data to a normalized torch.FloatTensor
 transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
 )

 # choose the training and test datasets
 train_data = datasets.CIFAR10("data", train=True, download=True, transform=transform)
 test_data = datasets.CIFAR10("data", train=False, download=True, transform=transform)

 # obtain training indices that will be used for validation
 num_train = len(train_data)
 indices = list(range(num_train))
 np.random.shuffle(indices)
 split = int(np.floor(valid_size * num_train))
 train_idx, valid_idx = indices[split:], indices[:split]

 # define samplers for obtaining training and validation batches
 train_sampler = SubsetRandomSampler(train_idx)
 valid_sampler = SubsetRandomSampler(valid_idx)

 # prepare data loaders (combine dataset and sampler)
 train_loader = torch.utils.data.DataLoader(
    train_data, batch_size=batch_size, sampler=train_sampler, num_workers=num_workers
 )
 valid_loader = torch.utils.data.DataLoader(
    train_data, batch_size=batch_size, sampler=valid_sampler, num_workers=num_workers
 )
 test_loader = torch.utils.data.DataLoader(
    test_data, batch_size=batch_size, num_workers=num_workers
 )

 # specify the image classes
 classes = [
    "airplane",
    "automobile",
    "bird",
    "cat",
    "deer",
    "dog",
    "frog",
    "horse",
    "ship",
    "truck",
 ]
 ```

+%% Output
+
+    Files already downloaded and verified
+    Files already downloaded and verified
+
 %% Cell type:markdown id:58ec3903 tags:

 CNN definition (this one is an example)

 %% Cell type:code id:317bf070 tags:

 ``` python
 import torch.nn as nn
 import torch.nn.functional as F

 # define the CNN architecture


 class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


 # create a complete CNN
 model = Net()
 print(model)
 # move tensors to GPU if CUDA is available
 if train_on_gpu:
    model.cuda()
 ```

+%% Output
+
+    Net(
+      (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
+      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
+      (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
+      (fc1): Linear(in_features=400, out_features=120, bias=True)
+      (fc2): Linear(in_features=120, out_features=84, bias=True)
+      (fc3): Linear(in_features=84, out_features=10, bias=True)
+    )
+
 %% Cell type:markdown id:a2dc4974 tags:

 Loss function and training using SGD (Stochastic Gradient Descent) optimizer
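
As a rough illustration of what `nn.CrossEntropyLoss` computes for a single sample, the sketch below applies a log-softmax to the raw scores (logits) and returns the negative log-probability of the target class. It is a pure-Python approximation for intuition, not PyTorch's implementation, and the function name `cross_entropy` is hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class for one sample."""
    # Subtract the maximum before exponentiating for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# An untrained 10-class model that scores all classes equally
uniform = cross_entropy([0.0] * 10, target=3)
print(round(uniform, 4))  # ln(10), about 2.3026
```

A per-sample loss near ln(10) ≈ 2.30 therefore corresponds to chance-level predictions on the ten CIFAR10 classes.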

 %% Cell type:code id:4b53f229 tags:

 ``` python
 import torch.optim as optim

 criterion = nn.CrossEntropyLoss()  # specify loss function
 optimizer = optim.SGD(model.parameters(), lr=0.01)  # specify optimizer

 n_epochs = 30  # number of epochs to train the model
 train_loss_list = []  # list to store loss to visualize
 valid_loss_min = np.inf  # track change in validation loss (np.Inf was removed in NumPy 2.0)

 for epoch in range(n_epochs):
    # Keep track of training and validation loss
    train_loss = 0.0
    valid_loss = 0.0

    # Train the model
    model.train()
    for data, target in train_loader:
        # Move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # Clear the gradients of all optimized variables
        optimizer.zero_grad()
        # Forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # Calculate the batch loss
        loss = criterion(output, target)
        # Backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # Perform a single optimization step (parameter update)
        optimizer.step()
        # Update training loss
        train_loss += loss.item() * data.size(0)

    # Validate the model
    model.eval()
    for data, target in valid_loader:
        # Move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # Forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # Calculate the batch loss
        loss = criterion(output, target)
        # Update average validation loss
        valid_loss += loss.item() * data.size(0)

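    # Calculate average losses (divided by the number of batches, so each value
    # is about batch_size times the per-sample mean loss)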
    train_loss = train_loss / len(train_loader)
    valid_loss = valid_loss / len(valid_loader)
    train_loss_list.append(train_loss)

    # Print training/validation statistics
    print(
        "Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}".format(
            epoch, train_loss, valid_loss
        )
    )

    # Save model if validation loss has decreased
    if valid_loss <= valid_loss_min:
        print(
            "Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...".format(
                valid_loss_min, valid_loss
            )
        )
        torch.save(model.state_dict(), "model_cifar.pt")
        valid_loss_min = valid_loss
 ```

+%% Output
+
+    Epoch: 0 	Training Loss: 28.707199 	Validation Loss: 28.363214
+    Validation loss decreased (inf --> 28.363214).  Saving model ...
+    Epoch: 1 	Training Loss: 27.053440 	Validation Loss: 26.921309
+    Validation loss decreased (28.363214 --> 26.921309).  Saving model ...
+    Epoch: 2 	Training Loss: 25.798181 	Validation Loss: 25.484369
+    Validation loss decreased (26.921309 --> 25.484369).  Saving model ...
+    Epoch: 3 	Training Loss: 24.616021 	Validation Loss: 25.825257
+    Epoch: 4 	Training Loss: 23.607140 	Validation Loss: 24.406983
+    Validation loss decreased (25.484369 --> 24.406983).  Saving model ...
+    Epoch: 5 	Training Loss: 22.641223 	Validation Loss: 23.463277
+    Validation loss decreased (24.406983 --> 23.463277).  Saving model ...
+    Epoch: 6 	Training Loss: 21.727461 	Validation Loss: 23.323754
+    Validation loss decreased (23.463277 --> 23.323754).  Saving model ...
+    Epoch: 7 	Training Loss: 20.908013 	Validation Loss: 22.815489
+    Validation loss decreased (23.323754 --> 22.815489).  Saving model ...
+    Epoch: 8 	Training Loss: 20.072570 	Validation Loss: 22.468899
+    Validation loss decreased (22.815489 --> 22.468899).  Saving model ...
+    Epoch: 9 	Training Loss: 19.337123 	Validation Loss: 23.307148
+    Epoch: 10 	Training Loss: 18.578279 	Validation Loss: 22.322720
+    Validation loss decreased (22.468899 --> 22.322720).  Saving model ...
+    Epoch: 11 	Training Loss: 17.925301 	Validation Loss: 22.491466
+    Epoch: 12 	Training Loss: 17.266396 	Validation Loss: 22.145613
+    Validation loss decreased (22.322720 --> 22.145613).  Saving model ...
+    Epoch: 13 	Training Loss: 16.644972 	Validation Loss: 21.923327
+    Validation loss decreased (22.145613 --> 21.923327).  Saving model ...
+    Epoch: 14 	Training Loss: 16.097757 	Validation Loss: 22.242258
+    Epoch: 15 	Training Loss: 15.522903 	Validation Loss: 22.269535
+    Epoch: 16 	Training Loss: 14.930308 	Validation Loss: 23.073589
+    Epoch: 17 	Training Loss: 14.374154 	Validation Loss: 23.190186
+    Epoch: 18 	Training Loss: 13.829007 	Validation Loss: 23.638800
+    Epoch: 19 	Training Loss: 13.414001 	Validation Loss: 25.147587
+    Epoch: 20 	Training Loss: 12.890743 	Validation Loss: 24.385583
+    Epoch: 21 	Training Loss: 12.456227 	Validation Loss: 24.933902
+    Epoch: 22 	Training Loss: 11.993389 	Validation Loss: 25.289021
+    Epoch: 23 	Training Loss: 11.565563 	Validation Loss: 26.004760
+    Epoch: 24 	Training Loss: 11.188692 	Validation Loss: 26.451757
+    Epoch: 25 	Training Loss: 10.716678 	Validation Loss: 27.236794
+    Epoch: 26 	Training Loss: 10.315807 	Validation Loss: 27.493770
+    Epoch: 27 	Training Loss: 9.975283 	Validation Loss: 27.571290
+    Epoch: 28 	Training Loss: 9.440035 	Validation Loss: 29.006522
+    Epoch: 29 	Training Loss: 9.220511 	Validation Loss: 29.190469
+
 %% Cell type:markdown id:13e1df74 tags:

 Does overfitting occur? If so, apply early stopping.
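
When reading the losses printed by the training loop, note that each batch loss is multiplied by the batch size and the sum is then divided by the number of batches (`len(train_loader)`), so the reported value is roughly `batch_size` times the per-sample mean. The sketch below, which uses made-up batch losses and hypothetical variable names, contrasts this with dividing by the number of samples.

```python
batch_size = 20
batch_mean_losses = [1.5, 1.4, 1.3, 1.2]  # made-up mean losses of four full batches

# Accumulate as in the training loop: batch mean loss times batch size
accumulated = sum(loss * batch_size for loss in batch_mean_losses)

per_batch = accumulated / len(batch_mean_losses)  # what the loop reports
per_sample = accumulated / (len(batch_mean_losses) * batch_size)  # mean loss per image

print(per_batch, per_sample)
```

Under this reading, the first-epoch training loss of 28.707199 corresponds to about 1.44 per image. In PyTorch, dividing by `len(train_loader.sampler)` instead of `len(train_loader)` would give the per-sample value directly.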

+%% Cell type:markdown id:4e567158 tags:
+
+Yes, overfitting occurs. From around epoch 14, the validation loss stops decreasing and then rises steadily, while the training loss keeps decreasing.
+This indicates that the model is fitting the training data too closely and failing to generalize to the validation data.
+With early stopping, training should end shortly after epoch 13, where the validation loss reaches its minimum of 21.923327 in the run shown above. Training beyond this point does not improve validation performance and only deepens the overfitting.
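
To make the stopping rule concrete, the following sketch replays the validation losses recorded in the 30-epoch run above through a patience-based rule. The function name `early_stop_epoch` and the default values `patience=3` and `min_epochs=10` are illustrative assumptions. It is a simplified stand-in for a training loop, not a replacement for one.

```python
def early_stop_epoch(valid_losses, patience=3, min_epochs=10):
    """Return (stop_epoch, best_epoch) for a patience-based early-stopping rule."""
    best_loss = float("inf")
    best_epoch = None
    epochs_no_improve = 0
    for epoch, loss in enumerate(valid_losses):
        if loss <= best_loss:
            # New best validation loss: remember it and reset the counter
            best_loss, best_epoch = loss, epoch
            epochs_no_improve = 0
        elif epoch >= min_epochs:
            # Count epochs without improvement once the warm-up is over
            epochs_no_improve += 1
            if epochs_no_improve >= patience:
                return epoch, best_epoch
    return len(valid_losses) - 1, best_epoch


# Validation losses of epochs 0 to 17 from the run above
valid_losses = [
    28.363214, 26.921309, 25.484369, 25.825257, 24.406983, 23.463277,
    23.323754, 22.815489, 22.468899, 23.307148, 22.322720, 22.491466,
    22.145613, 21.923327, 22.242258, 22.269535, 23.073589, 23.190186,
]
stop_epoch, best_epoch = early_stop_epoch(valid_losses)
print(stop_epoch, best_epoch)  # 16 13
```

With these settings the rule stops at epoch 16 and keeps the weights saved at epoch 13, the epoch with the lowest validation loss.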
+
+%% Cell type:code id:11952c52 tags:
+
+``` python
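+# EARLY STOP
+# Note: `model` still holds the weights from the 30-epoch run above, so this loop
+# resumes from an already overfitted network; re-create it with `model = Net()`
+# beforehand to train from scratch.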
+import torch.optim as optim
+
+min_epochs = 10  # early stopping is only considered from this epoch onward
+patience = 3  # number of consecutive epochs without improvement to tolerate
+epochs_no_improve = 0
+
+
+criterion = nn.CrossEntropyLoss()  # specify loss function
+optimizer = optim.SGD(model.parameters(), lr=0.01)  # specify optimizer
+
+n_epochs = 30  # number of epochs to train the model
+valid_loss_list = []  # list to store validation loss to visualize
+train_loss_list = []  # list to store training loss to visualize
+valid_loss_min = np.inf  # track change in validation loss
+
+for epoch in range(n_epochs):
+    # Keep track of training and validation loss
+    train_loss = 0.0
+    valid_loss = 0.0
+
+    # Train the model
+    model.train()
+    for data, target in train_loader:
+        # Move tensors to GPU if CUDA is available
+        if train_on_gpu:
+            data, target = data.cuda(), target.cuda()
+        # Clear the gradients of all optimized variables
+        optimizer.zero_grad()
+        # Forward pass: compute predicted outputs by passing inputs to the model
+        output = model(data)
+        # Calculate the batch loss
+        loss = criterion(output, target)
+        # Backward pass: compute gradient of the loss with respect to model parameters
+        loss.backward()
+        # Perform a single optimization step (parameter update)
+        optimizer.step()
+        # Update training loss
+        train_loss += loss.item() * data.size(0)
+
+    # Validate the model
+    model.eval()
+    for data, target in valid_loader:
+        # Move tensors to GPU if CUDA is available
+        if train_on_gpu:
+            data, target = data.cuda(), target.cuda()
+        # Forward pass: compute predicted outputs by passing inputs to the model
+        output = model(data)
+        # Calculate the batch loss
+        loss = criterion(output, target)
+        # Update average validation loss
+        valid_loss += loss.item() * data.size(0)
+
+    # Calculate average losses
+    train_loss = train_loss / len(train_loader)
+    valid_loss = valid_loss / len(valid_loader)
+    train_loss_list.append(train_loss)
+    valid_loss_list.append(valid_loss)
+
+    # Print training/validation statistics
+    print(
+        "Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}".format(
+            epoch, train_loss, valid_loss
+        )
+    )
+
+    # Save model if validation loss has decreased
+    if valid_loss <= valid_loss_min:
+        print(
+            "Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...".format(
+                valid_loss_min, valid_loss
+            )
+        )
+        torch.save(model.state_dict(), "model_cifar_1_early_stop.pt")
+        valid_loss_min = valid_loss
+        epochs_no_improve = 0
+    elif epoch >= min_epochs:
+        epochs_no_improve += 1
+        if epochs_no_improve >= patience:
+            print(f"Validation loss has not improved for {patience} consecutive epochs. Stopping early.")
+            break
+```
+
+%% Output
+
+    Epoch: 0 	Training Loss: 8.891932 	Validation Loss: 30.875338
+    Validation loss decreased (inf --> 30.875338).  Saving model ...
+
+    ---------------------------------------------------------------------------
+    KeyboardInterrupt                         Traceback (most recent call last)
+Cell     In[35], line 35
+         33 loss = criterion(output, target)
+         34 # Backward pass: compute gradient of the loss with respect to model parameters
+    ---> 35 loss.backward()
+         36 # Perform a single optimization step (parameter update)
+         37 optimizer.step()
+File     ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/_tensor.py:581, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
+        571 if has_torch_function_unary(self):
+        572     return handle_torch_function(
+        573         Tensor.backward,
+        574         (self,),
+       (...)
+        579         inputs=inputs,
+        580     )
+    --> 581 torch.autograd.backward(
+        582     self, gradient, retain_graph, create_graph, inputs=inputs
+        583 )
+File     ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/autograd/__init__.py:347, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
+        342     retain_graph = create_graph
+        344 # The reason we repeat the same comment below is that
+        345 # some Python versions print out the first line of a multi-line function
+        346 # calls in the traceback and some print out the last line
+    --> 347 _engine_run_backward(
+        348     tensors,
+        349     grad_tensors_,
+        350     retain_graph,
+        351     create_graph,
+        352     inputs,
+        353     allow_unreachable=True,
+        354     accumulate_grad=True,
+        355 )
+File     ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/autograd/graph.py:825, in _engine_run_backward(t_outputs, *args, **kwargs)
+        823     unregister_hooks = _register_logging_hooks_on_whole_graph(t_outputs)
+        824 try:
+    --> 825     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
+        826         t_outputs, *args, **kwargs
+        827     )  # Calls into the C++ engine to run the backward pass
+        828 finally:
+        829     if attach_logging_hooks:
+    KeyboardInterrupt:
+
 %% Cell type:code id:d39df818 tags:

 ``` python
 import matplotlib.pyplot as plt

-plt.plot(range(n_epochs), train_loss_list)
+plt.plot(range(len(train_loss_list)), train_loss_list)
+plt.xlabel("Epoch")
+plt.ylabel("Train Loss")
+plt.title("Performance of Model 1")
+plt.show()
+```
+
+%% Output
+
+
+
+%% Cell type:code id:2111dfe9 tags:
+
+``` python
+import matplotlib.pyplot as plt
+
+plt.plot(range(len(valid_loss_list)), valid_loss_list)
 plt.xlabel("Epoch")
-plt.ylabel("Loss")
+plt.ylabel("Validation Loss")
 plt.title("Performance of Model 1")
 plt.show()
 ```

+%% Output
+
+
+
 %% Cell type:markdown id:11df8fd4 tags:

 Now load the model with the lowest validation loss.

 %% Cell type:code id:e93efdfc tags:

 ``` python
-model.load_state_dict(torch.load("./model_cifar.pt"))
+# model.load_state_dict(torch.load("./model_cifar.pt"))
+model.load_state_dict(torch.load("./model_cifar_1_early_stop.pt"))

 # track test loss
 test_loss = 0.0
 class_correct = list(0.0 for i in range(10))
 class_total = list(0.0 for i in range(10))

 model.eval()
 # iterate over test data
 for data, target in test_loader:
    # move tensors to GPU if CUDA is available
    if train_on_gpu:
        data, target = data.cuda(), target.cuda()
    # forward pass: compute predicted outputs by passing inputs to the model
    output = model(data)
    # calculate the batch loss
    loss = criterion(output, target)
    # update test loss
    test_loss += loss.item() * data.size(0)
    # convert output logits to predicted class (index of the largest logit)
    _, pred = torch.max(output, 1)
    # compare predictions to true label
    correct_tensor = pred.eq(target.data.view_as(pred))
    correct = (
        np.squeeze(correct_tensor.numpy())
        if not train_on_gpu
        else np.squeeze(correct_tensor.cpu().numpy())
    )
    # calculate test accuracy for each object class
    for i in range(len(target)):  # also correct if the last batch is smaller than batch_size
        label = target.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1

 # average test loss
 test_loss = test_loss / len(test_loader)
 print("Test Loss: {:.6f}\n".format(test_loss))

 for i in range(10):
    if class_total[i] > 0:
        print(
            "Test Accuracy of %5s: %2d%% (%2d/%2d)"
            % (
                classes[i],
                100 * class_correct[i] / class_total[i],
                np.sum(class_correct[i]),
                np.sum(class_total[i]),
            )
        )
    else:
        print("Test Accuracy of %5s: N/A (no test examples)" % (classes[i]))

 print(
    "\nTest Accuracy (Overall): %2d%% (%2d/%2d)"
    % (
        100.0 * np.sum(class_correct) / np.sum(class_total),
        np.sum(class_correct),
        np.sum(class_total),
    )
 )
 ```

+%% Output
+
+    /var/folders/qb/94v41qkx157gvjjjv1rchcr00000gn/T/ipykernel_25820/3291884398.py:1: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
+      model.load_state_dict(torch.load("./model_cifar.pt"))
+
+    Test Loss: 21.811477
+    
+    Test Accuracy of airplane: 71% (716/1000)
+    Test Accuracy of automobile: 75% (750/1000)
+    Test Accuracy of  bird: 55% (558/1000)
+    Test Accuracy of   cat: 44% (442/1000)
+    Test Accuracy of  deer: 60% (604/1000)
+    Test Accuracy of   dog: 52% (521/1000)
+    Test Accuracy of  frog: 64% (644/1000)
+    Test Accuracy of horse: 58% (588/1000)
+    Test Accuracy of  ship: 74% (746/1000)
+    Test Accuracy of truck: 68% (681/1000)
+    
+    Test Accuracy (Overall): 62% (6250/10000)
+
 %% Cell type:markdown id:944991a2 tags:

 Build a new network with the following structure.

 - It has 3 convolutional layers of kernel size 3 and padding of 1.
 - The first convolutional layer must output 16 channels, the second 32 and the third 64.
 - At each convolutional layer output, we apply a ReLU activation then a MaxPool with kernel size of 2.
 - Then, three fully connected layers, the first two being followed by a ReLU activation and a dropout whose value you will suggest.
 - The first fully connected layer will have an output size of 512.
 - The second fully connected layer will have an output size of 64.

 Compare the results obtained with this new network to those obtained previously.
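
The input size of the first fully connected layer follows from the layer arithmetic: a convolution maps a side length n to floor((n + 2p - k) / s) + 1, and a max-pool of kernel size 2 halves it. The sketch below traces a 32x32 CIFAR10 image through the example network and through the structure described above. The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(n, kernel, padding=0, stride=1):
    """Side length after a square convolution."""
    return (n + 2 * padding - kernel) // stride + 1


def pool_out(n, kernel=2):
    """Side length after a max-pool whose stride equals its kernel size."""
    return n // kernel


# Example network: two 5x5 convolutions without padding, 16 output channels
n = 32
for _ in range(2):
    n = pool_out(conv_out(n, kernel=5))
example_features = 16 * n * n

# New network: three 3x3 convolutions with padding 1, 64 output channels
m = 32
for _ in range(3):
    m = pool_out(conv_out(m, kernel=3, padding=1))
new_features = 64 * m * m

print(example_features, new_features)  # 400 1024
```

The new network's first fully connected layer therefore takes 64 * 4 * 4 = 1024 input features.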

+%% Cell type:code id:8b67c2c6 tags:
+
+``` python
+import torch.nn as nn
+import torch.nn.functional as F
+
+# define the CNN architecture
+
+class NewNet(nn.Module):
+    def __init__(self, dropout_value=0.5):
+        super(NewNet, self).__init__()
+        # Convolutional layers
+        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
+        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
+        self.conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
+
+        # MaxPool
+        self.pool = nn.MaxPool2d(kernel_size=2)
+
+        # Dropout
+        self.dropout = nn.Dropout(p=dropout_value)
+
+        # Fully connected layers
+        # self.fc1 = nn.Linear(in_features=64 * (input_size // 8) * (input_size // 8), out_features=512)
+        self.fc1 = nn.Linear(in_features=64 * 4 * 4, out_features=512)
+        self.fc2 = nn.Linear(in_features=512, out_features=64)
+        self.fc3 = nn.Linear(64, 10)
+
+    def forward(self, x):
+        # Convolutional layers with ReLU and MaxPool
+        x = self.pool(F.relu(self.conv1(x)))
+        x = self.pool(F.relu(self.conv2(x)))
+        x = self.pool(F.relu(self.conv3(x)))
+
+        x = x.view(x.size(0), -1)
+        x = self.dropout(F.relu(self.fc1(x)))
+        x = self.dropout(F.relu(self.fc2(x)))
+        x = self.fc3(x)
+        return x
+
+
+# # create a complete CNN
+# new_model = NewNet()
+# print(new_model)
+# # move tensors to GPU if CUDA is available
+# if train_on_gpu:
+#     new_model.cuda()
+```
+
+%% Cell type:code id:3cc6cc8a tags:
+
+``` python
+
+# create a complete CNN
+new_model = NewNet()
+print(new_model)
+# move tensors to GPU if CUDA is available
+if train_on_gpu:
+    new_model.cuda()
+
+
+import torch.optim as optim
+
+min_epochs = 10  # early stopping is only considered from this epoch onward
+patience = 3  # number of consecutive epochs without improvement to tolerate
+epochs_no_improve = 0
+
+
+criterion = nn.CrossEntropyLoss()  # specify loss function
+optimizer = optim.SGD(new_model.parameters(), lr=0.01)  # specify optimizer
+
+n_epochs = 30  # number of epochs to train the model
+valid_loss_list = []  # list to store validation loss to visualize
+train_loss_list = []  # list to store training loss to visualize
+valid_loss_min = np.inf  # track change in validation loss
+
+for epoch in range(n_epochs):
+    # Keep track of training and validation loss
+    train_loss = 0.0
+    valid_loss = 0.0
+
+    # Train the model
+    new_model.train()
+    for data, target in train_loader:
+        # Move tensors to GPU if CUDA is available
+        if train_on_gpu:
+            data, target = data.cuda(), target.cuda()
+        # Clear the gradients of all optimized variables
+        optimizer.zero_grad()
+        # Forward pass: compute predicted outputs by passing inputs to the model
+        output = new_model(data)
+        # Calculate the batch loss
+        loss = criterion(output, target)
+        # Backward pass: compute gradient of the loss with respect to model parameters
+        loss.backward()
+        # Perform a single optimization step (parameter update)
+        optimizer.step()
+        # Update training loss
+        train_loss += loss.item() * data.size(0)
+
+    # Validate the model
+    new_model.eval()
+    for data, target in valid_loader:
+        # Move tensors to GPU if CUDA is available
+        if train_on_gpu:
+            data, target = data.cuda(), target.cuda()
+        # Forward pass: compute predicted outputs by passing inputs to the model
+        output = new_model(data)
+        # Calculate the batch loss
+        loss = criterion(output, target)
+        # Update average validation loss
+        valid_loss += loss.item() * data.size(0)
+
+    # Calculate average losses
+    train_loss = train_loss / len(train_loader)
+    valid_loss = valid_loss / len(valid_loader)
+    train_loss_list.append(train_loss)
+    valid_loss_list.append(valid_loss)
+
+    # Print training/validation statistics
+    print(
+        "Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}".format(
+            epoch, train_loss, valid_loss
+        )
+    )
+
+    # Save model if validation loss has decreased
+    if valid_loss <= valid_loss_min:
+        print(
+            "Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...".format(
+                valid_loss_min, valid_loss
+            )
+        )
+        torch.save(new_model.state_dict(), "model_cifar_2.pt")
+        valid_loss_min = valid_loss
+        epochs_no_improve = 0
+    elif epoch >= min_epochs:
+        epochs_no_improve += 1
+        if epochs_no_improve >= patience:
+            print(f"Validation loss has not improved for {patience} consecutive epochs. Stopping early.")
+            break
+```
+
+%% Output
+
+    NewNet(
+      (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
+      (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
+      (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
+      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
+      (dropout): Dropout(p=0.5, inplace=False)
+      (fc1): Linear(in_features=1024, out_features=512, bias=True)
+      (fc2): Linear(in_features=512, out_features=64, bias=True)
+      (fc3): Linear(in_features=64, out_features=10, bias=True)
+    )
+    Epoch: 0 	Training Loss: 44.934554 	Validation Loss: 40.292926
+    Validation loss decreased (inf --> 40.292926).  Saving model ...
+    Epoch: 1 	Training Loss: 38.547384 	Validation Loss: 34.307230
+    Validation loss decreased (40.292926 --> 34.307230).  Saving model ...
+    Epoch: 2 	Training Loss: 34.167031 	Validation Loss: 30.783441
+    Validation loss decreased (34.307230 --> 30.783441).  Saving model ...
+    Epoch: 3 	Training Loss: 31.514744 	Validation Loss: 29.177271
+    Validation loss decreased (30.783441 --> 29.177271).  Saving model ...
+    Epoch: 4 	Training Loss: 29.490232 	Validation Loss: 26.770098
+    Validation loss decreased (29.177271 --> 26.770098).  Saving model ...
+    Epoch: 5 	Training Loss: 27.982251 	Validation Loss: 25.774428
+    Validation loss decreased (26.770098 --> 25.774428).  Saving model ...
+    Epoch: 6 	Training Loss: 26.515079 	Validation Loss: 24.038370
+    Validation loss decreased (25.774428 --> 24.038370).  Saving model ...
+    Epoch: 7 	Training Loss: 25.259680 	Validation Loss: 23.620053
+    Validation loss decreased (24.038370 --> 23.620053).  Saving model ...
+    Epoch: 8 	Training Loss: 23.969766 	Validation Loss: 22.249926
+    Validation loss decreased (23.620053 --> 22.249926).  Saving model ...
+    Epoch: 9 	Training Loss: 23.044149 	Validation Loss: 21.061266
+    Validation loss decreased (22.249926 --> 21.061266).  Saving model ...
+    Epoch: 10 	Training Loss: 21.929328 	Validation Loss: 20.193573
+    Validation loss decreased (21.061266 --> 20.193573).  Saving model ...
+    Epoch: 11 	Training Loss: 21.162510 	Validation Loss: 19.769918
+    Validation loss decreased (20.193573 --> 19.769918).  Saving model ...
+    Epoch: 12 	Training Loss: 20.163602 	Validation Loss: 19.290062
+    Validation loss decreased (19.769918 --> 19.290062).  Saving model ...
+    Epoch: 13 	Training Loss: 19.370121 	Validation Loss: 18.626375
+    Validation loss decreased (19.290062 --> 18.626375).  Saving model ...
+    Epoch: 14 	Training Loss: 18.558041 	Validation Loss: 18.075628
+    Validation loss decreased (18.626375 --> 18.075628).  Saving model ...
+
+%% Cell type:code id:97355006 tags:
+
+``` python
+# Load the best checkpoint into the network trained above (new_model)
+new_model.load_state_dict(torch.load("./model_cifar_2.pt", weights_only=True))
+
+# track test loss
+test_loss = 0.0
+class_correct = list(0.0 for i in range(10))
+class_total = list(0.0 for i in range(10))
+
+new_model.eval()
+# iterate over test data
+for data, target in test_loader:
+    # move tensors to GPU if CUDA is available
+    if train_on_gpu:
+        data, target = data.cuda(), target.cuda()
+    # forward pass: compute predicted outputs by passing inputs to the model
+    output = new_model(data)
+    # calculate the batch loss
+    loss = criterion(output, target)
+    # update test loss
+    test_loss += loss.item() * data.size(0)
+    # convert output probabilities to predicted class
+    _, pred = torch.max(output, 1)
+    # compare predictions to true label
+    correct_tensor = pred.eq(target.data.view_as(pred))
+    correct = (
+        np.squeeze(correct_tensor.numpy())
+        if not train_on_gpu
+        else np.squeeze(correct_tensor.cpu().numpy())
+    )
+    # calculate test accuracy for each object class
+    for i in range(target.size(0)):  # also handles a smaller final batch
+        label = target.data[i]
+        class_correct[label] += correct[i].item()
+        class_total[label] += 1
+
+# average test loss
+test_loss = test_loss / len(test_loader)
+print("Test Loss: {:.6f}\n".format(test_loss))
+
+for i in range(10):
+    if class_total[i] > 0:
+        print(
+            "Test Accuracy of %5s: %2d%% (%2d/%2d)"
+            % (
+                classes[i],
+                100 * class_correct[i] / class_total[i],
+                np.sum(class_correct[i]),
+                np.sum(class_total[i]),
+            )
+        )
+    else:
+        print("Test Accuracy of %5s: N/A (no test examples)" % (classes[i]))
+
+print(
+    "\nTest Accuracy (Overall): %2d%% (%2d/%2d)"
+    % (
+        100.0 * np.sum(class_correct) / np.sum(class_total),
+        np.sum(class_correct),
+        np.sum(class_total),
+    )
+)
+```
+
+%% Output
+
+    /var/folders/qb/94v41qkx157gvjjjv1rchcr00000gn/T/ipykernel_32008/3634208260.py:1: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
+      model.load_state_dict(torch.load("./model_cifar_2.pt"))
+
+    ---------------------------------------------------------------------------
+    FileNotFoundError                         Traceback (most recent call last)
+Cell     In[31], line 1
+    ----> 1 model.load_state_dict(torch.load("./model_cifar_2.pt"))
+          3 # track test loss
+          4 test_loss = 0.0
+File     ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/serialization.py:1319, in load(f, map_location, pickle_module, weights_only, mmap, **pickle_load_args)
+       1316 if "encoding" not in pickle_load_args.keys():
+       1317     pickle_load_args["encoding"] = "utf-8"
+    -> 1319 with _open_file_like(f, "rb") as opened_file:
+       1320     if _is_zipfile(opened_file):
+       1321         # The zipfile reader is going to advance the current file position.
+       1322         # If we want to actually tail call to torch.jit.load, we need to
+       1323         # reset back to the original position.
+       1324         orig_position = opened_file.tell()
+File     ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/serialization.py:659, in _open_file_like(name_or_buffer, mode)
+        657 def _open_file_like(name_or_buffer, mode):
+        658     if _is_path(name_or_buffer):
+    --> 659         return _open_file(name_or_buffer, mode)
+        660     else:
+        661         if "w" in mode:
+File     ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/serialization.py:640, in _open_file.__init__(self, name, mode)
+        639 def __init__(self, name, mode):
+    --> 640     super().__init__(open(name, mode))
+    FileNotFoundError: [Errno 2] No such file or directory: './model_cifar_2.pt'
+
+%% Cell type:markdown id:6245b27f tags:
+
+# Test Accuracy: Model 1 vs. Model 2
+
+## Test Accuracy Model 1:
+* Test Loss: 21.811477
+
+* Test Accuracy of airplane: 71% (716/1000)
+* Test Accuracy of automobile: 75% (750/1000)
+* Test Accuracy of  bird: 55% (558/1000)
+* Test Accuracy of   cat: 44% (442/1000)
+* Test Accuracy of  deer: 60% (604/1000)
+* Test Accuracy of   dog: 52% (521/1000)
+* Test Accuracy of  frog: 64% (644/1000)
+* Test Accuracy of horse: 58% (588/1000)
+* Test Accuracy of  ship: 74% (746/1000)
+* Test Accuracy of truck: 68% (681/1000)
+
+* Test Accuracy (Overall): 62% (6250/10000)
+
+
+## Test Accuracy Model 2:
+* Test Loss: 16.239906
+
+* Test Accuracy of airplane: 78% (784/1000)
+* Test Accuracy of automobile: 88% (889/1000)
+* Test Accuracy of  bird: 61% (618/1000)
+* Test Accuracy of   cat: 61% (615/1000)
+* Test Accuracy of  deer: 66% (662/1000)
+* Test Accuracy of   dog: 50% (509/1000)
+* Test Accuracy of  frog: 82% (823/1000)
+* Test Accuracy of horse: 73% (732/1000)
+* Test Accuracy of  ship: 86% (862/1000)
+* Test Accuracy of truck: 75% (751/1000)
+
+* Test Accuracy (Overall): 72% (7245/10000)
+
 %% Cell type:markdown id:bc381cf4 tags:

 ## Exercise 2: Quantization: try to compress the CNN to save space

 The quantization documentation is available at https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic

 The exercise is to apply post-training quantization to the CNN model above, then compare the size reduction and the impact on classification accuracy.


 The size of the model is simply the size of the file.

 %% Cell type:code id:ef623c26 tags:

 ``` python
 import os


 def print_size_of_model(model, label=""):
    torch.save(model.state_dict(), "temp.p")
    size = os.path.getsize("temp.p")
    print("model: ", label, " \t", "Size (KB):", size / 1e3)
    os.remove("temp.p")
    return size


 print_size_of_model(model, "fp32")
 ```

+%% Output
+
+    model:  fp32  	 Size (KB): 2330.946
+
+    2330946
+
 %% Cell type:markdown id:05c4e9ad tags:

 Post-training quantization example:

 %% Cell type:code id:c4c65d4b tags:

 ``` python
 import torch.quantization

-
+torch.backends.quantized.engine = 'qnnpack'
 quantized_model = torch.quantization.quantize_dynamic(model, dtype=torch.qint8)
 print_size_of_model(quantized_model, "int8")
 ```

+%% Output
+
+    ---------------------------------------------------------------------------
+    RuntimeError                              Traceback (most recent call last)
+Cell     In[30], line 4
+          1 import torch.quantization
+    ----> 4 quantized_model = torch.quantization.quantize_dynamic(model, dtype=torch.qint8)
+          5 print_size_of_model(quantized_model, "int8")
+File     ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/quantization/quantize.py:564, in quantize_dynamic(model, qconfig_spec, dtype, mapping, inplace)
+        562 model.eval()
+        563 propagate_qconfig_(model, qconfig_spec)
+    --> 564 convert(model, mapping, inplace=True)
+        565 return model
+File     ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/quantization/quantize.py:659, in convert(module, mapping, inplace, remove_qconfig, is_reference, convert_custom_config_dict, use_precomputed_fake_quant)
+        657 if not inplace:
+        658     module = copy.deepcopy(module)
+    --> 659 _convert(
+        660     module,
+        661     mapping,
+        662     inplace=True,
+        663     is_reference=is_reference,
+        664     convert_custom_config_dict=convert_custom_config_dict,
+        665     use_precomputed_fake_quant=use_precomputed_fake_quant,
+        666 )
+        667 if remove_qconfig:
+        668     _remove_qconfig(module)
+File     ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/quantization/quantize.py:724, in _convert(module, mapping, inplace, is_reference, convert_custom_config_dict, use_precomputed_fake_quant)
+        712     if (
+        713         not isinstance(mod, _FusedModule)
+        714         and type_before_parametrizations(mod) not in custom_module_class_mapping
+        715     ):
+        716         _convert(
+        717             mod,
+        718             mapping,
+       (...)
+        722             use_precomputed_fake_quant=use_precomputed_fake_quant,
+        723         )
+    --> 724     reassign[name] = swap_module(
+        725         mod, mapping, custom_module_class_mapping, use_precomputed_fake_quant
+        726     )
+        728 for key, value in reassign.items():
+        729     module._modules[key] = value
+File     ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/quantization/quantize.py:766, in swap_module(mod, mapping, custom_module_class_mapping, use_precomputed_fake_quant)
+        764 sig = inspect.signature(qmod.from_float)
+        765 if "use_precomputed_fake_quant" in sig.parameters:
+    --> 766     new_mod = qmod.from_float(
+        767         mod, use_precomputed_fake_quant=use_precomputed_fake_quant
+        768     )
+        769 else:
+        770     new_mod = qmod.from_float(mod)
+File     ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/nn/quantized/dynamic/modules/linear.py:145, in Linear.from_float(cls, mod, use_precomputed_fake_quant)
+        141 else:
+        142     raise RuntimeError(
+        143         "Unsupported dtype specified for dynamic quantized Linear!"
+        144     )
+    --> 145 qlinear = cls(mod.in_features, mod.out_features, dtype=dtype)
+        146 qlinear.set_weight_bias(qweight, mod.bias)
+        147 return qlinear
+File     ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/nn/quantized/dynamic/modules/linear.py:42, in Linear.__init__(self, in_features, out_features, bias_, dtype)
+         41 def __init__(self, in_features, out_features, bias_=True, dtype=torch.qint8):
+    ---> 42     super().__init__(in_features, out_features, bias_, dtype=dtype)
+         43     # We don't muck around with buffers or attributes or anything here
+         44     # to keep the module simple. *everything* is simply a Python attribute.
+         45     # Serialization logic is explicitly handled in the below serialization and
+         46     # deserialization modules
+         47     self.version = 4
+File     ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/nn/quantized/modules/linear.py:172, in Linear.__init__(self, in_features, out_features, bias_, dtype)
+        169 else:
+        170     raise RuntimeError("Unsupported dtype specified for quantized Linear!")
+    --> 172 self._packed_params = LinearPackedParams(dtype)
+        173 self._packed_params.set_weight_bias(qweight, bias)
+        174 self.scale = 1.0
+File     ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/nn/quantized/modules/linear.py:31, in LinearPackedParams.__init__(self, dtype)
+         29 elif self.dtype == torch.float16:
+         30     wq = torch.zeros([1, 1], dtype=torch.float)
+    ---> 31 self.set_weight_bias(wq, None)
+File     ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/ao/nn/quantized/modules/linear.py:38, in LinearPackedParams.set_weight_bias(self, weight, bias)
+         33 @torch.jit.export
+         34 def set_weight_bias(
+         35     self, weight: torch.Tensor, bias: Optional[torch.Tensor]
+         36 ) -> None:
+         37     if self.dtype == torch.qint8:
+    ---> 38         self._packed_params = torch.ops.quantized.linear_prepack(weight, bias)
+         39     elif self.dtype == torch.float16:
+         40         self._packed_params = torch.ops.quantized.linear_prepack_fp16(weight, bias)
+File     ~/.pyenv/versions/3.11.7/lib/python3.11/site-packages/torch/_ops.py:1116, in OpOverloadPacket.__call__(self, *args, **kwargs)
+       1114 if self._has_torchbind_op_overload and _must_dispatch_in_python(args, kwargs):
+       1115     return _call_overload_packet_from_python(self, args, kwargs)
+    -> 1116 return self._op(*args, **(kwargs or {}))
+    RuntimeError: Didn't find engine for operation quantized::linear_prepack NoQEngine
+
+%% Cell type:markdown id:063d405c tags:
+
+
 %% Cell type:markdown id:7b108e17 tags:

 For each class, compare the classification test accuracy of the initial model and the quantized model. Also give the overall test accuracy for both models.
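
The comparison asked for above could be computed with one helper that is called once per model. The following is only a sketch: `per_class_accuracy` and the toy data are hypothetical names introduced for illustration, and the function assumes CPU tensors (dynamically quantized models run on the CPU).

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def per_class_accuracy(model, loader, num_classes):
    """Return (list of per-class accuracies, overall accuracy) on `loader`."""
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    model.eval()
    with torch.no_grad():
        for data, target in loader:
            pred = model(data).argmax(dim=1)
            # bincount tallies samples per class, whatever the batch size
            total += torch.bincount(target, minlength=num_classes)
            correct += torch.bincount(target[pred == target], minlength=num_classes)
    per_class = [
        (correct[c] / total[c]).item() if total[c] > 0 else float("nan")
        for c in range(num_classes)
    ]
    overall = (correct.sum() / total.sum()).item()
    return per_class, overall


# Toy check: hand-made "logits" passed through an identity model
toy_logits = torch.tensor([[2.0, 0.0], [1.0, 0.5], [0.0, 3.0], [4.0, 1.0]])
toy_labels = torch.tensor([0, 0, 1, 1])
toy_loader = DataLoader(TensorDataset(toy_logits, toy_labels), batch_size=3)
per_class, overall = per_class_accuracy(nn.Identity(), toy_loader, num_classes=2)
print(per_class, overall)  # → [1.0, 0.5] 0.75
```

In the notebook, the same helper would be called as `per_class_accuracy(model, test_loader, 10)` and `per_class_accuracy(quantized_model, test_loader, 10)`, assuming both models and `test_loader` live on the CPU.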

 %% Cell type:markdown id:a0a34b90 tags:

 Try quantization-aware training to mitigate the impact on accuracy (documentation available at https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic)
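
As a starting point, eager-mode quantization-aware training in `torch.ao.quantization` roughly follows the pattern below: mark the quantized region with `QuantStub`/`DeQuantStub`, attach a QAT qconfig, call `prepare_qat`, train, then `convert`. This is a minimal sketch under assumptions, not the notebook's solution: `TinyQATNet`, the random data, and the three training steps are hypothetical stand-ins for the real CNN and `train_loader`, and layer fusion is omitted.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq


class TinyQATNet(nn.Module):
    """Hypothetical small CNN; the stubs delimit the quantized region."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.quant = tq.QuantStub()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(8 * 4 * 4, num_classes)
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.pool(self.relu(self.conv(x)))
        x = torch.flatten(x, 1)
        return self.dequant(self.fc(x))


# Pick a quantized backend that this PyTorch build supports
supported = torch.backends.quantized.supported_engines
backend = "fbgemm" if "fbgemm" in supported else "qnnpack"
torch.backends.quantized.engine = backend

torch.manual_seed(0)
qat_model = TinyQATNet()
qat_model.train()  # prepare_qat expects a model in training mode
qat_model.qconfig = tq.get_default_qat_qconfig(backend)
tq.prepare_qat(qat_model, inplace=True)  # inserts fake-quantization modules

# A few steps on random stand-in data (use the real training loop instead)
inputs = torch.randn(16, 3, 8, 8)
targets = torch.randint(0, 10, (16,))
qat_criterion = nn.CrossEntropyLoss()
qat_optimizer = torch.optim.SGD(qat_model.parameters(), lr=0.01)
for _ in range(3):
    qat_optimizer.zero_grad()
    qat_loss = qat_criterion(qat_model(inputs), targets)
    qat_loss.backward()
    qat_optimizer.step()

# Convert the fake-quantized model into a real int8 model for inference
qat_model.eval()
int8_model = tq.convert(qat_model)
with torch.no_grad():
    logits = int8_model(inputs)
print(logits.shape)
```

The converted model runs on the CPU only, so it should be evaluated with CPU tensors.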

 %% Cell type:markdown id:201470f9 tags:

 ## Exercise 3: Working with pre-trained models

 PyTorch offers several pre-trained models: https://pytorch.org/vision/0.8/models.html
 We will use ResNet50 trained on the ImageNet dataset (https://www.image-net.org/index.php). Use the following code with the file `imagenet-simple-labels.json`, which contains the ImageNet labels, and the image `dog.png`, which we will use as a test.

 %% Cell type:code id:b4d13080 tags:

 ``` python
 import json

 import matplotlib.pyplot as plt
 from PIL import Image
 from torchvision import models, transforms

 # Choose an image to pass through the model
 test_image = "dog.png"

 # Configure matplotlib for pretty inline plots
 #%matplotlib inline
 #%config InlineBackend.figure_format = 'retina'

 # Prepare the labels
 with open("imagenet-simple-labels.json") as f:
    labels = json.load(f)

 # First prepare the transformations: resize the image to what the model was trained on and convert it to a tensor
 data_transform = transforms.Compose(
    [
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ]
 )
 # Load the image

 image = Image.open(test_image).convert("RGB")  # drop any alpha channel so the tensor has 3 channels
 plt.imshow(image), plt.xticks([]), plt.yticks([])

 # Now apply the transformation, expand the batch dimension, and send the image to the GPU
 # image = data_transform(image).unsqueeze(0).cuda()
 image = data_transform(image).unsqueeze(0)

 # Download the model if it's not there already. It will take a bit on the first run, after that it's fast
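 # (the `weights` argument replaces the deprecated `pretrained=True`; requires torchvision >= 0.13)
 model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)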
 # Send the model to the GPU
 # model.cuda()
 # Set layers such as dropout and batchnorm in evaluation mode
 model.eval()

 # Get the 1000-dimensional model output
 out = model(image)
 # Find the predicted class
 print("Predicted class is: {}".format(labels[out.argmax()]))
 ```

 %% Cell type:markdown id:184cfceb tags:

 Experiments:

 Study the code and the results obtained. Possibly add other images downloaded from the internet.

 What is the size of the model? Quantize it and then check if the model is still able to correctly classify the other images.

 Experiment with other pre-trained CNN models.
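
When measuring the effect of quantization here, keep in mind that `quantize_dynamic` only converts certain layer types such as `nn.Linear`; convolutions stay in fp32, so a conv-heavy network is expected to shrink much less than a fully connected one. The sketch below illustrates this on a hypothetical toy network (`toy` and `state_dict_size_bytes` are names made up for this example), measuring the serialized size in memory rather than through a temporary file.

```python
import io

import torch
import torch.nn as nn
import torch.ao.quantization as tq


def state_dict_size_bytes(model):
    """Size of the serialized state_dict, measured in an in-memory buffer."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getbuffer().nbytes


# Pick a quantized backend that this PyTorch build supports
supported = torch.backends.quantized.supported_engines
torch.backends.quantized.engine = "fbgemm" if "fbgemm" in supported else "qnnpack"

# Hypothetical stand-in for a CNN with a fully connected classifier
toy = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 512),
    nn.ReLU(),
    nn.Linear(512, 1000),
)
toy.eval()

# Only the nn.Linear layers are replaced by int8 dynamic versions
quantized_toy = tq.quantize_dynamic(toy, {nn.Linear}, dtype=torch.qint8)

fp32_size = state_dict_size_bytes(toy)
int8_size = state_dict_size_bytes(quantized_toy)
print("fp32 bytes:", fp32_size, "int8 bytes:", int8_size)
print(type(quantized_toy[0]).__name__, type(quantized_toy[-1]).__name__)

with torch.no_grad():
    toy_out = quantized_toy(torch.randn(2, 3, 32, 32))
```

The same two calls can be applied to the ResNet50 loaded above; the accuracy check on the test images then shows whether the predicted labels change.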



 %% Cell type:markdown id:5d57da4b tags:

 ## Exercise 4: Transfer Learning


 For this work, we will use a pre-trained model (ResNet18) as a feature extractor and refine the classification by training only the last fully connected layer of the network. The output layer of the pre-trained network is therefore replaced by a layer adapted to the new classes to be recognized, which in our case are ants and bees.
 Download the dataset available at the address below and unzip it in your working directory:

 https://download.pytorch.org/tutorial/hymenoptera_data.zip

 Execute the following code in order to display some images of the dataset.

 %% Cell type:code id:be2d31f5 tags:

 ``` python
 import os

 import matplotlib.pyplot as plt
 import numpy as np
 import torch
 import torchvision
 from torchvision import datasets, transforms

 # Data augmentation and normalization for training
 # Just normalization for validation
 data_transforms = {
    "train": transforms.Compose(
        [
            transforms.RandomResizedCrop(
                224
            ),  # ImageNet models were trained on 224x224 images
            transforms.RandomHorizontalFlip(),  # flip horizontally 50% of the time - increases train set variability
            transforms.ToTensor(),  # convert it to a PyTorch tensor
            transforms.Normalize(
                [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
            ),  # ImageNet models expect this norm
        ]
    ),
    "val": transforms.Compose(
        [
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ]
    ),
 }

 data_dir = "hymenoptera_data"
 # Create train and validation datasets and loaders
 image_datasets = {
    x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
    for x in ["train", "val"]
 }
 dataloaders = {
    x: torch.utils.data.DataLoader(
        image_datasets[x], batch_size=4, shuffle=True, num_workers=0
    )
    for x in ["train", "val"]
 }
 dataset_sizes = {x: len(image_datasets[x]) for x in ["train", "val"]}
 class_names = image_datasets["train"].classes
 device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

 # Helper function for displaying images
 def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])

    # Un-normalize the images
    inp = std * inp + mean
    # Clip just in case
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated
    plt.show()


 # Get a batch of training data
 inputs, classes = next(iter(dataloaders["train"]))

 # Make a grid from batch
 out = torchvision.utils.make_grid(inputs)

 imshow(out, title=[class_names[x] for x in classes])

 ```

 %% Cell type:markdown id:bbd48800 tags:

 Now execute the following code, which loads a pre-trained ResNet18, replaces its output layer for the ants/bees classification, and trains the model by updating only the weights of this output layer.

 %% Cell type:code id:572d824c tags:

 ``` python
 import copy
 import os
 import time

 import matplotlib.pyplot as plt
 import numpy as np
 import torch
 import torch.nn as nn
 import torch.optim as optim
 import torchvision
 from torch.optim import lr_scheduler
 from torchvision import datasets, transforms

 # Data augmentation and normalization for training
 # Just normalization for validation
 data_transforms = {
    "train": transforms.Compose(
        [
            transforms.RandomResizedCrop(
                224
            ),  # ImageNet models were trained on 224x224 images
            transforms.RandomHorizontalFlip(),  # flip horizontally 50% of the time - increases train set variability
            transforms.ToTensor(),  # convert it to a PyTorch tensor
            transforms.Normalize(
                [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
            ),  # ImageNet models expect this norm
        ]
    ),
    "val": transforms.Compose(
        [
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ]
    ),
 }

 data_dir = "hymenoptera_data"
 # Create train and validation datasets and loaders
 image_datasets = {
    x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
    for x in ["train", "val"]
 }
 dataloaders = {
    x: torch.utils.data.DataLoader(
        image_datasets[x], batch_size=4, shuffle=True, num_workers=4
    )
    for x in ["train", "val"]
 }
 dataset_sizes = {x: len(image_datasets[x]) for x in ["train", "val"]}
 class_names = image_datasets["train"].classes
 device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

 # Helper function for displaying images
 def imshow(inp, title=None):
    """Imshow for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])

    # Un-normalize the images
    inp = std * inp + mean
    # Clip just in case
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated
    plt.show()


 # Get a batch of training data
 # inputs, classes = next(iter(dataloaders['train']))

 # Make a grid from batch
 # out = torchvision.utils.make_grid(inputs)

 # imshow(out, title=[class_names[x] for x in classes])
 # training


 def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    epoch_time = []  # we'll keep track of the time needed for each epoch

    for epoch in range(num_epochs):
        epoch_start = time.time()
        print("Epoch {}/{}".format(epoch + 1, num_epochs))
        print("-" * 10)

        # Each epoch has a training and validation phase
        for phase in ["train", "val"]:
            if phase == "train":
                model.train()  # Set model to training mode
            else:
                model.eval()  # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # Forward
                # Track history if only in training phase
                with torch.set_grad_enabled(phase == "train"):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == "train":
                        loss.backward()
                        optimizer.step()

                # Statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            # Step the learning-rate scheduler once per epoch, after the optimizer updates
            if phase == "train":
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print("{} Loss: {:.4f} Acc: {:.4f}".format(phase, epoch_loss, epoch_acc))

            # Deep copy the model
            if phase == "val" and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        # Add the epoch time
        t_epoch = time.time() - epoch_start
        epoch_time.append(t_epoch)
        print()

    time_elapsed = time.time() - since
    print(
        "Training complete in {:.0f}m {:.0f}s".format(
            time_elapsed // 60, time_elapsed % 60
        )
    )
    print("Best val Acc: {:4f}".format(best_acc))

    # Load best model weights
    model.load_state_dict(best_model_wts)
    return model, epoch_time


 # Download a pre-trained ResNet18 model and freeze its weights
 model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)
 for param in model.parameters():
    param.requires_grad = False

 # Replace the final fully connected layer
 # Parameters of newly constructed modules have requires_grad=True by default
 num_ftrs = model.fc.in_features
 model.fc = nn.Linear(num_ftrs, 2)
 # Send the model to the GPU
 model = model.to(device)
 # Set the loss function
 criterion = nn.CrossEntropyLoss()

 # Observe that only the parameters of the final layer are being optimized
 optimizer_conv = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
 exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
 model, epoch_time = train_model(
    model, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=10
 )
 ```

 %% Cell type:markdown id:bbd48800 tags:

 Experiments:
 Study the code and the results obtained.

 Modify the code and add an "eval_model" function to allow
 the evaluation of the model on a test set (different from the training and validation sets used during the training phase). Study the results obtained.

 Now modify the code to replace the current classification layer with a set of two layers using a "relu" activation function for the middle layer, and the "dropout" mechanism for both layers. Repeat the experiments and study the results obtained.

 Apply quantization (post-training and quantization-aware) and evaluate the impact on model size and accuracy.
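
One possible shape for these two modifications is sketched below. It makes a few assumptions: "dropout for both layers" is read as a dropout before each linear layer, the hidden width of 256 is an arbitrary choice, and `make_head` is a hypothetical helper name. The `eval_model` signature is likewise only a suggestion.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def make_head(in_features, hidden_features, num_classes, p=0.5):
    """Two linear layers, dropout before each, ReLU after the first."""
    return nn.Sequential(
        nn.Dropout(p),
        nn.Linear(in_features, hidden_features),
        nn.ReLU(),
        nn.Dropout(p),
        nn.Linear(hidden_features, num_classes),
    )


def eval_model(model, loader, criterion, device="cpu"):
    """Return (mean loss per sample, accuracy) over every sample in `loader`."""
    model.eval()
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            total_loss += criterion(outputs, labels).item() * inputs.size(0)
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            count += inputs.size(0)
    return total_loss / count, correct / count


# ResNet18's final layer has 512 input features
head = make_head(in_features=512, hidden_features=256, num_classes=2)
print(head(torch.randn(4, 512)).shape)

# Toy check of eval_model: hand-made "logits" through an identity model
toy_logits = torch.tensor([[2.0, 0.0], [1.0, 0.5], [0.0, 3.0], [4.0, 1.0]])
toy_labels = torch.tensor([0, 0, 1, 1])
toy_loader = DataLoader(TensorDataset(toy_logits, toy_labels), batch_size=3)
toy_loss, toy_acc = eval_model(nn.Identity(), toy_loader, nn.CrossEntropyLoss())
print(toy_loss, toy_acc)
```

In the notebook, the head would replace the classifier with `model.fc = make_head(model.fc.in_features, 256, 2)`, and `optim.SGD(model.fc.parameters(), ...)` would then optimize both new layers. The test set passed to `eval_model` has to be built separately, for example by holding out part of the `val` folder.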

 %% Cell type:markdown id:04a263f0 tags:

 ## Optional

 Try this at home!!


 PyTorch offers a framework to export a given CNN to your smartphone (either Android or iOS). Have a look at the tutorial: https://pytorch.org/mobile/home/

 The exercise consists of deploying the CNN from Exercise 4 on your phone and then testing it live.


 %% Cell type:markdown id:fe954ce4 tags:

 ## Author

 Alberto BOSIO - Ph. D.