Skip to content
Snippets Groups Projects
Select Git revision
  • b43cb569dc63a214797da24a449570ee88c535ac
  • master default protected
2 results

addition.py

Blame
  • Forked from Vuillemot Romain / INF-TC1
    Source project has a limited visibility.
    TD2 Deep Learning.ipynb 1.27 MiB

    TD2: Deep learning

    In this TD, you must modify this notebook to answer the questions. To do this,

    1. Fork this repository
    2. Clone your forked repository on your local computer
    3. Answer the questions
    4. Commit and push regularly

    The last commit is due on Sunday, December 1, 11:59 PM. Later commits will not be taken into account.

    Install and test PyTorch from https://pytorch.org/get-started/locally.

    %pip install torch torchvision

    To test run the following code

    In [12]:
    import torch
    
    N, D = 14, 10
    x = torch.randn(N, D).type(torch.FloatTensor)
    print(x)
    
    from torchvision import models
    
    alexnet = models.alexnet()
    print(alexnet)
    Out [12]:
    tensor([[-0.9423,  0.9708, -1.1173,  0.7517,  0.1651,  0.1440,  0.3395, -1.5821,
             -1.2083, -0.1834],
            [-0.1060, -1.2338,  0.3896, -0.6367, -2.6377,  0.0821, -0.6745, -0.2476,
             -1.6165, -0.2387],
            [ 0.6264, -1.4330, -0.4977, -0.4685, -0.5423, -0.3698, -0.0252,  1.0910,
              1.0854,  0.2728],
            [-0.2262, -0.9628,  0.2071, -0.3877,  0.7292,  0.0707,  0.6892, -0.2527,
              0.6129, -2.5222],
            [-0.7208, -0.1664,  0.1882,  0.3032, -0.3780, -0.2051,  0.6495,  0.1030,
             -0.3246,  2.1220],
            [-2.2701,  0.0399,  0.7866,  1.0246, -0.9594,  1.1121,  0.2151,  0.3468,
             -1.3409, -0.5332],
            [ 0.9423, -1.1214,  2.2758, -0.6316, -1.4389,  0.6033, -0.6149, -0.4695,
             -0.5999, -0.0356],
            [ 0.3975, -1.3820, -0.2830, -0.1399,  0.3447, -1.4293, -0.1438, -0.0358,
              0.8166, -1.3432],
            [-0.4973, -0.7376,  0.1388, -0.5577, -0.1812,  0.7578, -1.0928, -0.1980,
              0.4528,  0.1032],
            [ 0.3483,  0.1561, -0.6872,  1.2322, -0.9672,  2.4793,  0.1647,  1.8180,
              1.7788,  0.2212],
            [ 1.0694,  0.1998, -0.2160,  0.1633, -1.1462,  0.9828,  2.9743,  0.2247,
              0.8128,  0.5803],
            [-1.9120,  1.5484,  0.8139, -0.1669, -0.6376,  0.1648, -0.1058, -1.0445,
             -1.0512,  0.7069],
            [-0.9353,  1.9754,  0.0052, -0.3826,  2.2824, -0.9010,  1.4606, -1.7802,
              1.6978, -0.4301],
            [ 0.7528, -0.9648,  0.8284, -1.0311,  0.3247, -2.3639,  1.5370,  1.5331,
             -0.7809,  0.0950]])
    AlexNet(
      (features): Sequential(
        (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
        (1): ReLU(inplace=True)
        (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
        (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
        (4): ReLU(inplace=True)
        (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
        (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (7): ReLU(inplace=True)
        (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (9): ReLU(inplace=True)
        (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (11): ReLU(inplace=True)
        (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
      )
      (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
      (classifier): Sequential(
        (0): Dropout(p=0.5, inplace=False)
        (1): Linear(in_features=9216, out_features=4096, bias=True)
        (2): ReLU(inplace=True)
        (3): Dropout(p=0.5, inplace=False)
        (4): Linear(in_features=4096, out_features=4096, bias=True)
        (5): ReLU(inplace=True)
        (6): Linear(in_features=4096, out_features=1000, bias=True)
      )
    )
    

    Exercise 1: CNN on CIFAR10

    The goal is to apply a Convolutional Neural Net (CNN) model on the CIFAR10 image dataset and test the accuracy of the model on the basis of image classification. Compare the Accuracy VS the neural network implemented during TD1.

    Have a look at the following documentation to be familiar with PyTorch.

    https://pytorch.org/tutorials/beginner/pytorch_with_examples.html

    https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

    You can test if GPU is available on your machine and thus train on it to speed up the process

    In [13]:
    import torch
    
    # check if CUDA is available
    train_on_gpu = torch.cuda.is_available()
    
    if not train_on_gpu:
        print("CUDA is not available.  Training on CPU ...")
    else:
        print("CUDA is available!  Training on GPU ...")
    Out [13]:
    CUDA is not available.  Training on CPU ...
    

    Next we load the CIFAR10 dataset

    In [14]:
    import numpy as np
    from torchvision import datasets, transforms
    from torch.utils.data.sampler import SubsetRandomSampler
    
    # number of subprocesses to use for data loading
    num_workers = 0
    # how many samples per batch to load
    batch_size = 20
    # percentage of training set to use as validation
    valid_size = 0.2
    
    # convert data to a normalized torch.FloatTensor
    transform = transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
    )
    
    # choose the training and test datasets
    train_data = datasets.CIFAR10("data", train=True, download=True, transform=transform)
    test_data = datasets.CIFAR10("data", train=False, download=True, transform=transform)
    
    # obtain training indices that will be used for validation
    num_train = len(train_data)
    indices = list(range(num_train))
    np.random.shuffle(indices)
    split = int(np.floor(valid_size * num_train))
    train_idx, valid_idx = indices[split:], indices[:split]
    
    # define samplers for obtaining training and validation batches
    train_sampler = SubsetRandomSampler(train_idx)
    valid_sampler = SubsetRandomSampler(valid_idx)
    
    # prepare data loaders (combine dataset and sampler)
    train_loader = torch.utils.data.DataLoader(
        train_data, batch_size=batch_size, sampler=train_sampler, num_workers=num_workers
    )
    valid_loader = torch.utils.data.DataLoader(
        train_data, batch_size=batch_size, sampler=valid_sampler, num_workers=num_workers
    )
    test_loader = torch.utils.data.DataLoader(
        test_data, batch_size=batch_size, num_workers=num_workers
    )
    
    # specify the image classes
    classes = [
        "airplane",
        "automobile",
        "bird",
        "cat",
        "deer",
        "dog",
        "frog",
        "horse",
        "ship",
        "truck",
    ]
    Out [14]:
    Files already downloaded and verified
    Files already downloaded and verified
    

    CNN definition (this one is an example)

    In [16]:
    import torch.nn as nn
    import torch.nn.functional as F
    
    # define the CNN architecture
    
    
    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(3, 6, 5)
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(6, 16, 5)
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)
    
        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = x.view(-1, 16 * 5 * 5)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x
    
    
    # create a complete CNN
    model = Net()
    print(model)
    # move tensors to GPU if CUDA is available
    if train_on_gpu:
        model.cuda()
    Out [16]:
    Net(
      (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
      (fc1): Linear(in_features=400, out_features=120, bias=True)
      (fc2): Linear(in_features=120, out_features=84, bias=True)
      (fc3): Linear(in_features=84, out_features=10, bias=True)
    )
    

    Loss function and training using SGD (Stochastic Gradient Descent) optimizer

    In [17]:
    import torch.optim as optim
    
    criterion = nn.CrossEntropyLoss()  # specify loss function
    optimizer = optim.SGD(model.parameters(), lr=0.01)  # specify optimizer
    
    n_epochs = 30  # number of epochs to train the model
    train_loss_list = []  # list to store loss to visualize
    valid_loss_min = np.Inf  # track change in validation loss
    
    for epoch in range(n_epochs):
        # Keep track of training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
    
        # Train the model
        model.train()
        for data, target in train_loader:
            # Move tensors to GPU if CUDA is available
            if train_on_gpu:
                data, target = data.cuda(), target.cuda()
            # Clear the gradients of all optimized variables
            optimizer.zero_grad()
            # Forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # Calculate the batch loss
            loss = criterion(output, target)
            # Backward pass: compute gradient of the loss with respect to model parameters
            loss.backward()
            # Perform a single optimization step (parameter update)
            optimizer.step()
            # Update training loss
            train_loss += loss.item() * data.size(0)
    
        # Validate the model
        model.eval()
        for data, target in valid_loader:
            # Move tensors to GPU if CUDA is available
            if train_on_gpu:
                data, target = data.cuda(), target.cuda()
            # Forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # Calculate the batch loss
            loss = criterion(output, target)
            # Update average validation loss
            valid_loss += loss.item() * data.size(0)
    
        # Calculate average losses
        train_loss = train_loss / len(train_loader)
        valid_loss = valid_loss / len(valid_loader)
        train_loss_list.append(train_loss)
    
        # Print training/validation statistics
        print(
            "Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}".format(
                epoch, train_loss, valid_loss
            )
        )
    
        # Save model if validation loss has decreased
        if valid_loss <= valid_loss_min:
            print(
                "Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...".format(
                    valid_loss_min, valid_loss
                )
            )
            torch.save(model.state_dict(), "model_cifar.pt")
            valid_loss_min = valid_loss
    Out [17]:
    Epoch: 0 	Training Loss: 44.726907 	Validation Loss: 39.071647
    Validation loss decreased (inf --> 39.071647).  Saving model ...
    Epoch: 1 	Training Loss: 35.054405 	Validation Loss: 32.110584
    Validation loss decreased (39.071647 --> 32.110584).  Saving model ...
    Epoch: 2 	Training Loss: 30.749012 	Validation Loss: 29.360720
    Validation loss decreased (32.110584 --> 29.360720).  Saving model ...
    Epoch: 3 	Training Loss: 28.391951 	Validation Loss: 27.967922
    Validation loss decreased (29.360720 --> 27.967922).  Saving model ...
    Epoch: 4 	Training Loss: 26.568487 	Validation Loss: 26.119769
    Validation loss decreased (27.967922 --> 26.119769).  Saving model ...
    Epoch: 5 	Training Loss: 25.169793 	Validation Loss: 25.031413
    Validation loss decreased (26.119769 --> 25.031413).  Saving model ...
    Epoch: 6 	Training Loss: 23.912804 	Validation Loss: 24.453721
    Validation loss decreased (25.031413 --> 24.453721).  Saving model ...
    Epoch: 7 	Training Loss: 22.897765 	Validation Loss: 24.236649
    Validation loss decreased (24.453721 --> 24.236649).  Saving model ...
    Epoch: 8 	Training Loss: 21.957622 	Validation Loss: 23.964489
    Validation loss decreased (24.236649 --> 23.964489).  Saving model ...
    Epoch: 9 	Training Loss: 21.138539 	Validation Loss: 22.705581
    Validation loss decreased (23.964489 --> 22.705581).  Saving model ...
    Epoch: 10 	Training Loss: 20.329734 	Validation Loss: 22.290488
    Validation loss decreased (22.705581 --> 22.290488).  Saving model ...
    Epoch: 11 	Training Loss: 19.645637 	Validation Loss: 23.373632
    Epoch: 12 	Training Loss: 18.901699 	Validation Loss: 21.661441
    Validation loss decreased (22.290488 --> 21.661441).  Saving model ...
    Epoch: 13 	Training Loss: 18.234404 	Validation Loss: 21.704771
    Epoch: 14 	Training Loss: 17.646186 	Validation Loss: 22.160250
    Epoch: 15 	Training Loss: 17.069965 	Validation Loss: 22.348481
    Epoch: 16 	Training Loss: 16.463965 	Validation Loss: 21.731248
    Epoch: 17 	Training Loss: 15.982605 	Validation Loss: 21.859619
    Epoch: 18 	Training Loss: 15.390740 	Validation Loss: 22.549561
    Epoch: 19 	Training Loss: 14.906437 	Validation Loss: 22.900866
    Epoch: 20 	Training Loss: 14.399535 	Validation Loss: 22.831292
    Epoch: 21 	Training Loss: 13.890568 	Validation Loss: 23.340409
    Epoch: 22 	Training Loss: 13.447925 	Validation Loss: 23.446479
    Epoch: 23 	Training Loss: 12.976771 	Validation Loss: 23.933286
    Epoch: 24 	Training Loss: 12.543276 	Validation Loss: 24.310838
    Epoch: 25 	Training Loss: 12.146771 	Validation Loss: 25.697770
    Epoch: 26 	Training Loss: 11.725751 	Validation Loss: 25.141558
    Epoch: 27 	Training Loss: 11.372244 	Validation Loss: 26.506025
    Epoch: 28 	Training Loss: 10.863760 	Validation Loss: 26.072285
    Epoch: 29 	Training Loss: 10.430784 	Validation Loss: 26.664543
    

    Does overfit occur? If so, do an early stopping.

    In [18]:
    import matplotlib.pyplot as plt
    
    plt.plot(range(n_epochs), train_loss_list)
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.title("Performance of Model 1")
    plt.show()
    out [18]:

    Now loading the model with the lowest validation loss value

    In [19]:
    model.load_state_dict(torch.load("./model_cifar.pt"))
    
    # track test loss
    test_loss = 0.0
    class_correct = list(0.0 for i in range(10))
    class_total = list(0.0 for i in range(10))
    
    model.eval()
    # iterate over test data
    for data, target in test_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # update test loss
        test_loss += loss.item() * data.size(0)
        # convert output probabilities to predicted class
        _, pred = torch.max(output, 1)
        # compare predictions to true label
        correct_tensor = pred.eq(target.data.view_as(pred))
        correct = (
            np.squeeze(correct_tensor.numpy())
            if not train_on_gpu
            else np.squeeze(correct_tensor.cpu().numpy())
        )
        # calculate test accuracy for each object class
        for i in range(batch_size):
            label = target.data[i]
            class_correct[label] += correct[i].item()
            class_total[label] += 1
    
    # average test loss
    test_loss = test_loss / len(test_loader)
    print("Test Loss: {:.6f}\n".format(test_loss))
    
    for i in range(10):
        if class_total[i] > 0:
            print(
                "Test Accuracy of %5s: %2d%% (%2d/%2d)"
                % (
                    classes[i],
                    100 * class_correct[i] / class_total[i],
                    np.sum(class_correct[i]),
                    np.sum(class_total[i]),
                )
            )
        else:
            print("Test Accuracy of %5s: N/A (no training examples)" % (classes[i]))
    
    print(
        "\nTest Accuracy (Overall): %2d%% (%2d/%2d)"
        % (
            100.0 * np.sum(class_correct) / np.sum(class_total),
            np.sum(class_correct),
            np.sum(class_total),
        )
    )
    Out [19]:
    Test Loss: 21.163006
    
    Test Accuracy of airplane: 66% (661/1000)
    Test Accuracy of automobile: 75% (754/1000)
    Test Accuracy of  bird: 45% (457/1000)
    Test Accuracy of   cat: 41% (414/1000)
    Test Accuracy of  deer: 55% (556/1000)
    Test Accuracy of   dog: 56% (564/1000)
    Test Accuracy of  frog: 67% (678/1000)
    Test Accuracy of horse: 72% (726/1000)
    Test Accuracy of  ship: 74% (746/1000)
    Test Accuracy of truck: 74% (741/1000)
    
    Test Accuracy (Overall): 62% (6297/10000)
    

    Build a new network with the following structure.

    • It has 3 convolutional layers of kernel size 3 and padding of 1.
    • The first convolutional layer must output 16 channels, the second 32 and the third 64.
    • At each convolutional layer output, we apply a ReLU activation then a MaxPool with kernel size of 2.
    • Then, three fully connected layers, the first two being followed by a ReLU activation and a dropout whose value you will suggest.
    • The first fully connected layer will have an output size of 512.
    • The second fully connected layer will have an output size of 64.

    Compare the results obtained with this new network to those obtained previously.

    In [25]:
    import torch.nn as nn
    import torch.nn.functional as F
    
    # define the CNN architecture
    
    
    class second_Net(nn.Module):
        def __init__(self):
            super(second_Net, self).__init__()
            self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
            self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
            #fc1, fc2 and fc3 are the fully connected layers
            self.fc1 = nn.Linear(64 * 4 * 4, 512)
            self.fc2 = nn.Linear(512, 64)
            self.fc3 = nn.Linear(64, 10)
            self.dropout = nn.Dropout(0.2)
    
        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = self.pool(F.relu(self.conv3(x)))
            x = x.view(-1, 64 * 4 * 4)#to reshape the outpout tensor x into a shape that can be fed into a subsequent fully connected layer
            x = self.dropout(F.relu(self.fc1(x)))
            x = self.dropout(F.relu(self.fc2(x)))
            x = self.fc3(x)#we apply only the fully connected layer
            return x
    
    
    # create a complete CNN
    model = second_Net()
    print(model)
    # move tensors to GPU if CUDA is available
    if train_on_gpu:
        model.cuda()
    Out [25]:
    second_Net(
      (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (fc1): Linear(in_features=1024, out_features=512, bias=True)
      (fc2): Linear(in_features=512, out_features=64, bias=True)
      (fc3): Linear(in_features=64, out_features=10, bias=True)
      (dropout): Dropout(p=0.2, inplace=False)
    )
    
    In [26]:
    import torch.optim as optim
    
    criterion = nn.CrossEntropyLoss()  # specify loss function
    optimizer = optim.SGD(model.parameters(), lr=0.01)  # specify optimizer
    
    n_epochs = 30  # number of epochs to train the model
    train_loss_list = []  # list to store loss to visualize
    valid_loss_min = np.Inf  # track change in validation loss
    
    for epoch in range(n_epochs):
        # Keep track of training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
    
        # Train the model
        model.train()
        for data, target in train_loader:
            # Move tensors to GPU if CUDA is available
            if train_on_gpu:
                data, target = data.cuda(), target.cuda()
            # Clear the gradients of all optimized variables
            optimizer.zero_grad()
            # Forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # Calculate the batch loss
            loss = criterion(output, target)
            # Backward pass: compute gradient of the loss with respect to model parameters
            loss.backward()
            # Perform a single optimization step (parameter update)
            optimizer.step()
            # Update training loss
            train_loss += loss.item() * data.size(0)
    
        # Validate the model
        model.eval()
        for data, target in valid_loader:
            # Move tensors to GPU if CUDA is available
            if train_on_gpu:
                data, target = data.cuda(), target.cuda()
            # Forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # Calculate the batch loss
            loss = criterion(output, target)
            # Update average validation loss
            valid_loss += loss.item() * data.size(0)
    
        # Calculate average losses
        train_loss = train_loss / len(train_loader)
        valid_loss = valid_loss / len(valid_loader)
        train_loss_list.append(train_loss)
    
        # Print training/validation statistics
        print(
            "Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}".format(
                epoch, train_loss, valid_loss
            )
        )
    
        # Save model if validation loss has decreased
        if valid_loss <= valid_loss_min:
            print(
                "Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...".format(
                    valid_loss_min, valid_loss
                )
            )
            torch.save(model.state_dict(), "model_cifar.pt")
            valid_loss_min = valid_loss
    Out [26]:
    Epoch: 0 	Training Loss: 45.290243 	Validation Loss: 41.165060
    Validation loss decreased (inf --> 41.165060).  Saving model ...
    Epoch: 1 	Training Loss: 37.510178 	Validation Loss: 32.934895
    Validation loss decreased (41.165060 --> 32.934895).  Saving model ...
    Epoch: 2 	Training Loss: 32.064958 	Validation Loss: 29.373906
    Validation loss decreased (32.934895 --> 29.373906).  Saving model ...
    Epoch: 3 	Training Loss: 29.183704 	Validation Loss: 27.515824
    Validation loss decreased (29.373906 --> 27.515824).  Saving model ...
    Epoch: 4 	Training Loss: 26.968789 	Validation Loss: 25.998306
    Validation loss decreased (27.515824 --> 25.998306).  Saving model ...
    Epoch: 5 	Training Loss: 25.027486 	Validation Loss: 23.763529
    Validation loss decreased (25.998306 --> 23.763529).  Saving model ...
    Epoch: 6 	Training Loss: 23.155620 	Validation Loss: 22.014464
    Validation loss decreased (23.763529 --> 22.014464).  Saving model ...
    Epoch: 7 	Training Loss: 21.475198 	Validation Loss: 20.990993
    Validation loss decreased (22.014464 --> 20.990993).  Saving model ...
    Epoch: 8 	Training Loss: 19.979418 	Validation Loss: 19.800872
    Validation loss decreased (20.990993 --> 19.800872).  Saving model ...
    Epoch: 9 	Training Loss: 18.691629 	Validation Loss: 19.004915
    Validation loss decreased (19.800872 --> 19.004915).  Saving model ...
    Epoch: 10 	Training Loss: 17.424382 	Validation Loss: 18.288136
    Validation loss decreased (19.004915 --> 18.288136).  Saving model ...
    Epoch: 11 	Training Loss: 16.307390 	Validation Loss: 18.244697
    Validation loss decreased (18.288136 --> 18.244697).  Saving model ...
    Epoch: 12 	Training Loss: 15.167211 	Validation Loss: 18.015783
    Validation loss decreased (18.244697 --> 18.015783).  Saving model ...
    Epoch: 13 	Training Loss: 14.113395 	Validation Loss: 16.667591
    Validation loss decreased (18.015783 --> 16.667591).  Saving model ...
    Epoch: 14 	Training Loss: 13.236849 	Validation Loss: 16.125387
    Validation loss decreased (16.667591 --> 16.125387).  Saving model ...
    Epoch: 15 	Training Loss: 12.238603 	Validation Loss: 17.085894
    Epoch: 16 	Training Loss: 11.353403 	Validation Loss: 16.390383
    Epoch: 17 	Training Loss: 10.481450 	Validation Loss: 16.541884
    Epoch: 18 	Training Loss: 9.706518 	Validation Loss: 15.796362
    Validation loss decreased (16.125387 --> 15.796362).  Saving model ...
    Epoch: 19 	Training Loss: 8.832767 	Validation Loss: 16.756571
    Epoch: 20 	Training Loss: 8.153805 	Validation Loss: 16.796708
    Epoch: 21 	Training Loss: 7.322154 	Validation Loss: 17.710617
    Epoch: 22 	Training Loss: 6.726637 	Validation Loss: 18.109654
    Epoch: 23 	Training Loss: 6.115624 	Validation Loss: 18.905652
    Epoch: 24 	Training Loss: 5.584887 	Validation Loss: 19.411039
    Epoch: 25 	Training Loss: 5.083100 	Validation Loss: 19.586951
    Epoch: 26 	Training Loss: 4.624071 	Validation Loss: 19.442007
    Epoch: 27 	Training Loss: 4.313399 	Validation Loss: 21.781538
    Epoch: 28 	Training Loss: 4.065315 	Validation Loss: 21.534176
    Epoch: 29 	Training Loss: 3.495624 	Validation Loss: 21.116070
    
    In [27]:
    import matplotlib.pyplot as plt
    
    plt.plot(range(n_epochs), train_loss_list)
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.title("Performance of Model 2")
    plt.show()
    out [27]:
    In [28]:
    model.load_state_dict(torch.load("./model_cifar.pt"))
    
    # track test loss
    test_loss = 0.0
    class_correct = list(0.0 for i in range(10))
    class_total = list(0.0 for i in range(10))
    
    model.eval()
    # iterate over test data
    for data, target in test_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # update test loss
        test_loss += loss.item() * data.size(0)
        # convert output probabilities to predicted class
        _, pred = torch.max(output, 1)
        # compare predictions to true label
        correct_tensor = pred.eq(target.data.view_as(pred))
        correct = (
            np.squeeze(correct_tensor.numpy())
            if not train_on_gpu
            else np.squeeze(correct_tensor.cpu().numpy())
        )
        # calculate test accuracy for each object class
        for i in range(batch_size):
            label = target.data[i]
            class_correct[label] += correct[i].item()
            class_total[label] += 1
    
    # average test loss
    test_loss = test_loss / len(test_loader)
    print("Test Loss: {:.6f}\n".format(test_loss))
    
    for i in range(10):
        if class_total[i] > 0:
            print(
                "Test Accuracy of %5s: %2d%% (%2d/%2d)"
                % (
                    classes[i],
                    100 * class_correct[i] / class_total[i],
                    np.sum(class_correct[i]),
                    np.sum(class_total[i]),
                )
            )
        else:
            print("Test Accuracy of %5s: N/A (no training examples)" % (classes[i]))
    
    print(
        "\nTest Accuracy (Overall): %2d%% (%2d/%2d)"
        % (
            100.0 * np.sum(class_correct) / np.sum(class_total),
            np.sum(class_correct),
            np.sum(class_total),
        )
    )
    Out [28]:
    Test Loss: 15.936195
    
    Test Accuracy of airplane: 78% (789/1000)
    Test Accuracy of automobile: 89% (898/1000)
    Test Accuracy of  bird: 57% (572/1000)
    Test Accuracy of   cat: 60% (605/1000)
    Test Accuracy of  deer: 73% (733/1000)
    Test Accuracy of   dog: 63% (635/1000)
    Test Accuracy of  frog: 76% (767/1000)
    Test Accuracy of horse: 75% (751/1000)
    Test Accuracy of  ship: 83% (831/1000)
    Test Accuracy of truck: 76% (762/1000)
    
    Test Accuracy (Overall): 73% (7343/10000)
    

    Exercise 2: Quantization: try to compress the CNN to save space

    Quantization doc is available from https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic

    The Exercise is to quantize post training the above CNN model. Compare the size reduction and the impact on the classification accuracy

    The size of the model is simply the size of the file.

    In [29]:
    import os
    
    
    def print_size_of_model(model, label=""):
        torch.save(model.state_dict(), "temp.p")
        size = os.path.getsize("temp.p")
        print("model: ", label, " \t", "Size (KB):", size / 1e3)
        os.remove("temp.p")
        return size
    
    
    print_size_of_model(model, "fp32")
    Out [29]:
    model:  fp32  	 Size (KB): 2330.946
    
    Out [29]:
    2330946

    Post training quantization example

    In [30]:
    import torch.quantization
    
    
    quantized_model = torch.quantization.quantize_dynamic(model, dtype=torch.qint8)
    print_size_of_model(quantized_model, "int8")
    Out [30]:
    model:  int8  	 Size (KB): 659.806
    
    Out [30]:
    659806

    We observe a significant decline in model size from 2330.946KB to 659.806KB which proves the effectivness of quantization in reducing the model's size

    For each class, compare the classification test accuracy of the initial model and the quantized model. Also give the overall test accuracy for both models.

    In [33]:
    
    def compare():
        # track test loss
        test_loss = 0.0
        test_loss_quantized= 0.0
        class_correct = list(0.0 for i in range(10))
        class_total = list(0.0 for i in range(10))
        class_correct_quantized = list(0.0 for i in range(10))
        class_total_quantized = list(0.0 for i in range(10))
    
        model.eval()
        quantized_model.eval()
        # iterate over test data
        for data, target in test_loader:
            # move tensors to GPU if CUDA is available
            if train_on_gpu:
                data, target = data.cuda(), target.cuda()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            quantized_output=quantized_model(data)
            # calculate the batch loss
            loss = criterion(output, target)
            quantized_loss=criterion(quantized_output, target)
            # update test loss
            test_loss += loss.item() * data.size(0)
            test_loss_quantized+= quantized_loss.item() * data.size(0)
            # convert output probabilities to predicted class
            _, pred = torch.max(output, 1)
            _, quantized_pred = torch.max(quantized_output, 1)
            # compare predictions to true label
            correct_tensor = pred.eq(target.data.view_as(pred))
            quantized_correct_tensor = quantized_pred.eq(target.data.view_as(quantized_pred))
            correct = (
                np.squeeze(correct_tensor.numpy())
                if not train_on_gpu
                else np.squeeze(correct_tensor.cpu().numpy())
            )
            quantized_correct = (
                np.squeeze(quantized_correct_tensor.numpy())
                if not train_on_gpu
                else np.squeeze(quantized_correct_tensor.cpu().numpy())
            )
            # calculate test accuracy for each object class for the non quantized model
            for i in range(batch_size):
                label = target.data[i]
                class_correct[label] += correct[i].item()
                class_total[label] += 1
            # calculate test accuracy for each object class for the quantized model
            for i in range(batch_size):
                label = target.data[i]
                class_correct_quantized[label] += quantized_correct[i].item()
                class_total_quantized[label] += 1    
        # average test loss for the non quantized model 
        test_loss = test_loss / len(test_loader)
        print("Test Loss: {:.6f}\n".format(test_loss))
        # average test loss for the  quantized model 
        test_loss_quantized = test_loss_quantized / len(test_loader)
        print("Test Loss quantized: {:.6f}\n".format(test_loss_quantized))
        for i in range(10):
            if class_total[i] > 0:
                print(
                    "Test Accuracy of %5s: %2d%% (%2d/%2d)"
                    % (
                        classes[i],
                        100 * class_correct[i] / class_total[i],
                        np.sum(class_correct[i]),
                        np.sum(class_total[i]),
                    )
                )
            else:
                print("Test Accuracy of %5s: N/A (no training examples)" % (classes[i]))
    
        print(
            "\nTest Accuracy (Overall): %2d%% (%2d/%2d)"
            % (
                100.0 * np.sum(class_correct) / np.sum(class_total),
                np.sum(class_correct),
                np.sum(class_total),
            )
        )
    
        # average test loss for the quantized model 
        test_loss_quantized = test_loss_quantized / len(test_loader)
        print("Test Loss: {:.6f}\n".format(test_loss_quantized))
    
        for i in range(10):
            if class_total_quantized[i] > 0:
                print(
                    "Test Accuracy of %5s: %2d%% (%2d/%2d)"
                    % (
                        classes[i],
                        100 * class_correct_quantized[i] / class_total_quantized[i],
                        np.sum(class_correct_quantized[i]),
                        np.sum(class_total_quantized[i]),
                    )
                )
            else:
                print("Test Accuracy of %5s: N/A (no training examples)" % (classes[i]))
    
        print(
            "\nTest Accuracy (Overall): %2d%% (%2d/%2d)"
            % (
                100.0 * np.sum(class_correct_quantized) / np.sum(class_total_quantized),
                np.sum(class_correct_quantized),
                np.sum(class_total_quantized),
            )
        )
        # the comparision between the two models
        print('\n \n')
        quant_class_model=0
        class_model=0
        for i in range(10):
            quant_class_model=100 * class_correct_quantized[i] / class_total_quantized[i]
            class_model=100 * class_correct[i] / class_total[i]
            if quant_class_model>class_model:
                print('the quantized model performed better for the class',classes[i],'with accuracy equal to',quant_class_model)
            elif quant_class_model<class_model:
                print('the non quantized model performed better for the class',classes[i],'with accuracy equal to',quant_class_model)
            elif quant_class_model==class_model:
                print('the models performed the same for the class',classes[i])
    In [34]:
    compare()
    Out [34]:
    Test Loss: 15.936195
    
    Test Loss quantized: 15.949273
    
    Test Accuracy of airplane: 78% (789/1000)
    Test Accuracy of automobile: 89% (898/1000)
    Test Accuracy of  bird: 57% (572/1000)
    Test Accuracy of   cat: 60% (605/1000)
    Test Accuracy of  deer: 73% (733/1000)
    Test Accuracy of   dog: 63% (635/1000)
    Test Accuracy of  frog: 76% (767/1000)
    Test Accuracy of horse: 75% (751/1000)
    Test Accuracy of  ship: 83% (831/1000)
    Test Accuracy of truck: 76% (762/1000)
    
    Test Accuracy (Overall): 73% (7343/10000)
    Test Loss: 0.031899
    
    Test Accuracy of airplane: 78% (789/1000)
    Test Accuracy of automobile: 89% (898/1000)
    Test Accuracy of  bird: 57% (574/1000)
    Test Accuracy of   cat: 60% (600/1000)
    Test Accuracy of  deer: 73% (733/1000)
    Test Accuracy of   dog: 63% (632/1000)
    Test Accuracy of  frog: 76% (765/1000)
    Test Accuracy of horse: 75% (752/1000)
    Test Accuracy of  ship: 83% (835/1000)
    Test Accuracy of truck: 76% (761/1000)
    
    Test Accuracy (Overall): 73% (7339/10000)
    
     
    
    the models performed the same for the class airplane
    the models performed the same for the class automobile
    the quantized model performed better for the class bird with accuracy equal to 57.4
    the non quantized model performed better for the class cat with accuracy equal to 60.0
    the models performed the same for the class deer
    the non quantized model performed better for the class dog with accuracy equal to 63.2
    the non quantized model performed better for the class frog with accuracy equal to 76.5
    the quantized model performed better for the class horse with accuracy equal to 75.2
    the quantized model performed better for the class ship with accuracy equal to 83.5
    the non quantized model performed better for the class truck with accuracy equal to 76.1
    

    We can observe that the non quantized model performed better than the quantized model for several classes. There seems to be a noticeable accuracy reduction in the quantized model for certain classes. Still, the quantized model performed better for other classes.Yet, the negative effect is more abundent than the positive one (6 classes out of 9). Thus, we can assume that quantization effect negatively the accuracy.

    Try training aware quantization to mitigate the impact on the accuracy (doc available here https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic)

    In [35]:
    class Second_Net(nn.Module):
        def __init__(self):
            super(Second_Net, self).__init__()
            self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
            self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
            self.fc1 = nn.Linear(64 * 4 * 4, 512)
            self.fc2 = nn.Linear(512, 64)
            self.fc3 = nn.Linear(64, 10)
            self.dropout = nn.Dropout(0.2)
            self.quant = torch.quantization.QuantStub()
            self.dequant = torch.quantization.DeQuantStub()
    
        def forward(self, x):
            x = self.quant(x)
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = self.pool(F.relu(self.conv3(x)))
            x = x.view(-1, 64 * 4 * 4)
            x = self.dropout(F.relu(self.fc1(x)))
            x = self.dropout(F.relu(self.fc2(x)))
            x = self.fc3(x)
            x = self.dequant(x)
            return x
    In [36]:
    '''implementing the aware quantization'''
    from torch.ao.quantization import QConfigMapping
    import torch.quantization.quantize_fx as quantize_fx
    import copy
    
    model_fp=Second_Net()
    
    model_fp.train()
    model_to_quantize = copy.deepcopy(model_fp)
    model.qconfig = torch.quantization.get_default_qat_qconfig("qnnpack")
    model_qat = torch.quantization.prepare_qat(model_fp, inplace=False)
    # quantization aware training goes here
    model_qat = torch.quantization.convert(model_qat.eval(), inplace=False)
    n_epochs=30
    criterion = nn.CrossEntropyLoss()  # specify loss function
    optimizer = optim.SGD(model_qat.parameters(), lr=0.01)  # specify optimizer
    for epoch in range(n_epochs):
        # Keep track of training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
    
        # Train the model
        model_qat.train()
        for data, target in train_loader:
            # Move tensors to GPU if CUDA is available
            if train_on_gpu:
                data, target = data.cuda(), target.cuda()
            # Clear the gradients of all optimized variables
            optimizer.zero_grad()
            # Forward pass: compute predicted outputs by passing inputs to the model
            output = model_qat(data)
            # Calculate the batch loss
            loss = criterion(output, target)
            # Backward pass: compute gradient of the loss with respect to model parameters
            loss.backward()
            # Perform a single optimization step (parameter update)
            optimizer.step()
            # Update training loss
            train_loss += loss.item() * data.size(0)
    
        # Validate the model
        model_qat.eval()
        for data, target in valid_loader:
            # Move tensors to GPU if CUDA is available
            if train_on_gpu:
                data, target = data.cuda(), target.cuda()
            # Forward pass: compute predicted outputs by passing inputs to the model
            output = model_qat(data)
            # Calculate the batch loss
            loss = criterion(output, target)
            # Update average validation loss
            valid_loss += loss.item() * data.size(0)
    
        # Calculate average losses
        train_loss = train_loss / len(train_loader)
        valid_loss = valid_loss / len(valid_loader)
        train_loss_list.append(train_loss)
    
        # Print training/validation statistics
        print(
            "Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}".format(
                epoch, train_loss, valid_loss
            )
        )
    Out [36]:
    C:\Users\Maha Kosksi\AppData\Roaming\Python\Python311\site-packages\torch\ao\quantization\quantize.py:309: UserWarning: None of the submodule got qconfig applied. Make sure you passed correct configuration through `qconfig_dict` or by assigning the `.qconfig` attribute directly on submodules
      warnings.warn("None of the submodule got qconfig applied. Make sure you "
    
    Out [36]:
    Epoch: 0 	Training Loss: 45.423676 	Validation Loss: 42.686249
    Epoch: 1 	Training Loss: 39.450979 	Validation Loss: 34.980716
    Epoch: 2 	Training Loss: 33.441739 	Validation Loss: 30.549556
    Epoch: 3 	Training Loss: 30.410467 	Validation Loss: 28.980548
    Epoch: 4 	Training Loss: 28.103745 	Validation Loss: 26.466495
    Epoch: 5 	Training Loss: 26.096422 	Validation Loss: 25.030080
    Epoch: 6 	Training Loss: 24.164691 	Validation Loss: 24.016260
    Epoch: 7 	Training Loss: 22.486316 	Validation Loss: 21.297213
    Epoch: 8 	Training Loss: 21.019347 	Validation Loss: 20.847245
    Epoch: 9 	Training Loss: 19.674915 	Validation Loss: 19.561787
    Epoch: 10 	Training Loss: 18.569554 	Validation Loss: 19.161279
    Epoch: 11 	Training Loss: 17.338444 	Validation Loss: 18.754486
    Epoch: 12 	Training Loss: 16.313604 	Validation Loss: 17.663940
    Epoch: 13 	Training Loss: 15.244958 	Validation Loss: 17.191757
    Epoch: 14 	Training Loss: 14.236570 	Validation Loss: 17.347870
    Epoch: 15 	Training Loss: 13.233929 	Validation Loss: 17.197369
    Epoch: 16 	Training Loss: 12.332780 	Validation Loss: 17.899901
    Epoch: 17 	Training Loss: 11.477178 	Validation Loss: 16.836718
    Epoch: 18 	Training Loss: 10.524045 	Validation Loss: 16.791954
    Epoch: 19 	Training Loss: 9.667919 	Validation Loss: 17.603604
    Epoch: 20 	Training Loss: 8.968887 	Validation Loss: 18.133368
    Epoch: 21 	Training Loss: 8.172313 	Validation Loss: 18.145785
    Epoch: 22 	Training Loss: 7.446161 	Validation Loss: 18.876433
    Epoch: 23 	Training Loss: 6.840206 	Validation Loss: 18.421874
    Epoch: 24 	Training Loss: 6.165971 	Validation Loss: 18.937731
    Epoch: 25 	Training Loss: 5.730384 	Validation Loss: 19.711346
    Epoch: 26 	Training Loss: 5.146225 	Validation Loss: 19.878864
    Epoch: 27 	Training Loss: 4.777133 	Validation Loss: 21.688155
    Epoch: 28 	Training Loss: 4.323573 	Validation Loss: 21.270596
    Epoch: 29 	Training Loss: 4.000249 	Validation Loss: 24.816025
    
    In [37]:
    print(model_qat)
    Out [37]:
    Second_Net(
      (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (fc1): Linear(in_features=1024, out_features=512, bias=True)
      (fc2): Linear(in_features=512, out_features=64, bias=True)
      (fc3): Linear(in_features=64, out_features=10, bias=True)
      (dropout): Dropout(p=0.2, inplace=False)
      (quant): QuantStub()
      (dequant): DeQuantStub()
    )
    
    In [38]:
    # track test loss
    test_loss = 0.0
    class_correct = list(0.0 for i in range(10))
    class_total = list(0.0 for i in range(10))
    quantized_model = torch.ao.quantization.convert(model_qat.eval(), inplace=False)
    model_quantized=quantized_model.eval()
    # iterate over test data
    for data, target in test_loader:
        # move tensors to GPU if CUDA is available
        if train_on_gpu:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model_quantized(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # update test loss
        test_loss += loss.item() * data.size(0)
        # convert output probabilities to predicted class
        _, pred = torch.max(output, 1)
        # compare predictions to true label
        correct_tensor = pred.eq(target.data.view_as(pred))
        correct = (
            np.squeeze(correct_tensor.numpy())
            if not train_on_gpu
            else np.squeeze(correct_tensor.cpu().numpy())
        )
        # calculate test accuracy for each object class
        for i in range(batch_size):
            label = target.data[i]
            class_correct[label] += correct[i].item()
            class_total[label] += 1
    
    # average test loss
    test_loss = test_loss / len(test_loader)
    print("Test Loss: {:.6f}\n".format(test_loss))
    
    for i in range(10):
        if class_total[i] > 0:
            print(
                "Test Accuracy of %5s: %2d%% (%2d/%2d)"
                % (
                    classes[i],
                    100 * class_correct[i] / class_total[i],
                    np.sum(class_correct[i]),
                    np.sum(class_total[i]),
                )
            )
        else:
            print("Test Accuracy of %5s: N/A (no training examples)" % (classes[i]))
    
    print(
        "\nTest Accuracy (Overall): %2d%% (%2d/%2d)"
        % (
            100.0 * np.sum(class_correct) / np.sum(class_total),
            np.sum(class_correct),
            np.sum(class_total),
        )
    )
    Out [38]:
    Test Loss: 24.669794
    
    Test Accuracy of airplane: 73% (737/1000)
    Test Accuracy of automobile: 73% (730/1000)
    Test Accuracy of  bird: 62% (622/1000)
    Test Accuracy of   cat: 47% (472/1000)
    Test Accuracy of  deer: 76% (768/1000)
    Test Accuracy of   dog: 64% (645/1000)
    Test Accuracy of  frog: 83% (832/1000)
    Test Accuracy of horse: 78% (782/1000)
    Test Accuracy of  ship: 76% (769/1000)
    Test Accuracy of truck: 83% (833/1000)
    
    Test Accuracy (Overall): 71% (7190/10000)
    
    In [39]:
    quantized_model=model_quantized
    compare()
    Out [39]:
    Test Loss: 15.936195
    
    Test Loss quantized: 24.669794
    
    Test Accuracy of airplane: 78% (789/1000)
    Test Accuracy of automobile: 89% (898/1000)
    Test Accuracy of  bird: 57% (572/1000)
    Test Accuracy of   cat: 60% (605/1000)
    Test Accuracy of  deer: 73% (733/1000)
    Test Accuracy of   dog: 63% (635/1000)
    Test Accuracy of  frog: 76% (767/1000)
    Test Accuracy of horse: 75% (751/1000)
    Test Accuracy of  ship: 83% (831/1000)
    Test Accuracy of truck: 76% (762/1000)
    
    Test Accuracy (Overall): 73% (7343/10000)
    Test Loss: 0.049340
    
    Test Accuracy of airplane: 73% (737/1000)
    Test Accuracy of automobile: 73% (730/1000)
    Test Accuracy of  bird: 62% (622/1000)
    Test Accuracy of   cat: 47% (472/1000)
    Test Accuracy of  deer: 76% (768/1000)
    Test Accuracy of   dog: 64% (645/1000)
    Test Accuracy of  frog: 83% (832/1000)
    Test Accuracy of horse: 78% (782/1000)
    Test Accuracy of  ship: 76% (769/1000)
    Test Accuracy of truck: 83% (833/1000)
    
    Test Accuracy (Overall): 71% (7190/10000)
    
     
    
    the non quantized model performed better for the class airplane with accuracy equal to 73.7
    the non quantized model performed better for the class automobile with accuracy equal to 73.0
    the quantized model performed better for the class bird with accuracy equal to 62.2
    the non quantized model performed better for the class cat with accuracy equal to 47.2
    the quantized model performed better for the class deer with accuracy equal to 76.8
    the quantized model performed better for the class dog with accuracy equal to 64.5
    the quantized model performed better for the class frog with accuracy equal to 83.2
    the quantized model performed better for the class horse with accuracy equal to 78.2
    the non quantized model performed better for the class ship with accuracy equal to 76.9
    the quantized model performed better for the class truck with accuracy equal to 83.3
    

    Even with quantization aware training, the model's accuracy dropped significantly, we can observe a significant loss in the accuracy of the model. We can try to matigate that by retraining the quantized model.

    Exercise 3: working with pre-trained models.

    PyTorch offers several pre-trained models https://pytorch.org/vision/0.8/models.html
    We will use ResNet50 trained on ImageNet dataset (https://www.image-net.org/index.php). Use the following code with the files imagenet-simple-labels.json that contains the imagenet labels and the image dog.png that we will use as test.

    In [89]:
    import json
    from PIL import Image
    
    # Choose an image to pass through the model
    test_image = "dog.png"
    
    # Configure matplotlib for pretty inline plots
    #%matplotlib inline
    #%config InlineBackend.figure_format = 'retina'
    
    # Prepare the labels
    with open("imagenet-simple-labels.json") as f:
        labels = json.load(f)
    
    # First prepare the transformations: resize the image to what the model was trained on and convert it to a tensor
    data_transform = transforms.Compose(
        [
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ]
    )
    # Load the image
    
    image = Image.open(test_image)
    plt.imshow(image), plt.xticks([]), plt.yticks([])
    
    # Now apply the transformation, expand the batch dimension, and send the image to the GPU
    # image = data_transform(image).unsqueeze(0).cuda()
    image = data_transform(image).unsqueeze(0)
    
    # Download the model if it's not there already. It will take a bit on the first run, after that it's fast
    model = models.resnet50(pretrained=True)
    # Send the model to the GPU
    # model.cuda()
    # Set layers such as dropout and batchnorm in evaluation mode
    model.eval()
    
    # Get the 1000-dimensional model output
    out = model(image)
    # Find the predicted class
    print("Predicted class is: {}".format(labels[out.argmax()]))
    Out [89]:
    Predicted class is: Golden Retriever
    

    Experiments:

    Study the code and the results obtained. Possibly add other images downloaded from the internet.

    What is the size of the model? Quantize it and then check if the model is still able to correctly classify the other images.

    Experiment with other pre-trained CNN models.

    In [90]:
    print(models.resnet50(pretrained=True))
    Out [90]:
    ResNet(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
      )
      (layer2): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
        (3): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
      )
      (layer3): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
        (3): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
        (4): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
        (5): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
      )
      (layer4): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplace=True)
        )
      )
      (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
      (fc): Linear(in_features=2048, out_features=1000, bias=True)
    )
    

    Model size

    In [91]:
    print_size_of_model(model, label="fp32")
    Out [91]:
    model:  fp32  	 Size (KB): 102523.238
    
    Out [91]:
    102523238

    Model Quantized

    In [92]:
    import torch.quantization
    quantized_model = torch.quantization.quantize_dynamic(model, dtype=torch.qint8)
    print_size_of_model(quantized_model, label="int8")
    Out [92]:
    model:  int8  	 Size (KB): 96379.996
    
    Out [92]:
    96379996

    Testing with model quantized

    In [93]:
    
    # Set layers such as dropout and batchnorm in evaluation mode
    quantized_model.eval()
    
    # Get the 1000-dimensional model output
    out = quantized_model(image)
    # Find the predicted class
    print("Predicted class is: {}".format(labels[out.argmax()]))
    Out [93]:
    Predicted class is: Golden Retriever
    

    Testing with a new image

    In [101]:
    import json
    from PIL import Image
    
    # Choose an image to pass through the model
    test_image = "cat.jpg"
    
    # Configure matplotlib for pretty inline plots
    #%matplotlib inline
    #%config InlineBackend.figure_format = 'retina'
    
    # Prepare the labels
    with open("imagenet-simple-labels.json") as f:
        labels = json.load(f)
    
    # First prepare the transformations: resize the image to what the model was trained on and convert it to a tensor
    data_transform = transforms.Compose(
        [
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ]
    )
    # Load the image
    
    image = Image.open(test_image)
    plt.imshow(image), plt.xticks([]), plt.yticks([])
    
    # Now apply the transformation, expand the batch dimension, and send the image to the GPU
    # image = data_transform(image).unsqueeze(0).cuda()
    image = data_transform(image).unsqueeze(0)
    
    # Download the model if it's not there already. It will take a bit on the first run, after that it's fast
    model = models.resnet50(pretrained=True)
    quantized_model = torch.quantization.quantize_dynamic(model, dtype=torch.qint8)
    # Send the model to the GPU
    # model.cuda()
    # Set layers such as dropout and batchnorm in evaluation mode
    quantized_model.eval()
    
    # Get the 1000-dimensional model output
    out = quantized_model(image)
    # Find the predicted class
    print("Predicted class is: {}".format(labels[out.argmax()]))
    Out [101]:
    Predicted class is: tabby cat
    

    Experiment with Rnest18

    In [104]:
    import json
    from PIL import Image
    
    # Choose an image to pass through the model
    test_image2= "cat.jpg"
    
    # Configure matplotlib for pretty inline plots
    #%matplotlib inline
    #%config InlineBackend.figure_format = 'retina'
    
    # Prepare the labels
    with open("imagenet-simple-labels.json") as f:
        labels = json.load(f)
    
    # First prepare the transformations: resize the image to what the model was trained on and convert it to a tensor
    data_transform = transforms.Compose(
        [
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ]
    )
    # Load the image
    
    image = Image.open(test_image2)
    plt.imshow(image), plt.xticks([]), plt.yticks([])
    
    # Now apply the transformation, expand the batch dimension, and send the image to the GPU
    # image = data_transform(image).unsqueeze(0).cuda()
    image = data_transform(image).unsqueeze(0)
    
    # Download the model if it's not there already. It will take a bit on the first run, after that it's fast
    model = models.resnet18(pretrained=True)
    # Send the model to the GPU
    # model.cuda()
    # Set layers such as dropout and batchnorm in evaluation mode
    model.eval()
    
    # Get the 1000-dimensional model output
    out = model(image)
    # Find the predicted class
    print("Predicted class is: {}".format(labels[out.argmax()]))
    Out [104]:
    Predicted class is: tabby cat
    

    Even with ResNet18 we can see that we get always the correct prediction

    Exercise 4: Transfer Learning

    For this work, we will use a pre-trained model (ResNet18) as a descriptor extractor and will refine the classification by training only the last fully connected layer of the network. Thus, the output layer of the pre-trained network will be replaced by a layer adapted to the new classes to be recognized which will be in our case ants and bees. Download and unzip in your working directory the dataset available at the address :

    https://download.pytorch.org/tutorial/hymenoptera_data.zip

    Execute the following code in order to display some images of the dataset.

    In [105]:
    import os
    
    import matplotlib.pyplot as plt
    import numpy as np
    import torch
    import torchvision
    from torchvision import datasets, transforms
    
    # Data augmentation and normalization for training
    # Just normalization for validation
    data_transforms = {
        "train": transforms.Compose(
            [
                transforms.RandomResizedCrop(
                    224
                ),  # ImageNet models were trained on 224x224 images
                transforms.RandomHorizontalFlip(),  # flip horizontally 50% of the time - increases train set variability
                transforms.ToTensor(),  # convert it to a PyTorch tensor
                transforms.Normalize(
                    [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
                ),  # ImageNet models expect this norm
            ]
        ),
        "val": transforms.Compose(
            [
                transforms.Resize(256),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
            ]
        ),
    }
    
    data_dir = "hymenoptera_data"
    # Create train and validation datasets and loaders
    image_datasets = {
        x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
        for x in ["train", "val"]
    }
    dataloaders = {
        x: torch.utils.data.DataLoader(
            image_datasets[x], batch_size=4, shuffle=True, num_workers=0
        )
        for x in ["train", "val"]
    }
    dataset_sizes = {x: len(image_datasets[x]) for x in ["train", "val"]}
    class_names = image_datasets["train"].classes
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    
    # Helper function for displaying images
    def imshow(inp, title=None):
        """Imshow for Tensor."""
        inp = inp.numpy().transpose((1, 2, 0))
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
    
        # Un-normalize the images
        inp = std * inp + mean
        # Clip just in case
        inp = np.clip(inp, 0, 1)
        plt.imshow(inp)
        if title is not None:
            plt.title(title)
        plt.pause(0.001)  # pause a bit so that plots are updated
        plt.show()
    
    
    # Get a batch of training data
    inputs, classes = next(iter(dataloaders["train"]))
    
    # Make a grid from batch
    out = torchvision.utils.make_grid(inputs)
    
    imshow(out, title=[class_names[x] for x in classes])
    
    
    out [105]:

    Now, execute the following code which uses a pre-trained model ResNet18 having replaced the output layer for the ants/bees classification and performs the model training by only changing the weights of this output layer.

    In [106]:
    import copy
    import os
    import time
    
    import matplotlib.pyplot as plt
    import numpy as np
    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torchvision
    from torch.optim import lr_scheduler
    from torchvision import datasets, transforms
    
    # Data augmentation and normalization for training
    # Just normalization for validation
    data_transforms = {
        "train": transforms.Compose(
            [
                transforms.RandomResizedCrop(
                    224
                ),  # ImageNet models were trained on 224x224 images
                transforms.RandomHorizontalFlip(),  # flip horizontally 50% of the time - increases train set variability
                transforms.ToTensor(),  # convert it to a PyTorch tensor
                transforms.Normalize(
                    [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
                ),  # ImageNet models expect this norm
            ]
        ),
        "val": transforms.Compose(
            [
                transforms.Resize(256),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
            ]
        ),
    }
    
    data_dir = "hymenoptera_data"
    # Create train and validation datasets and loaders
    image_datasets = {
        x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
        for x in ["train", "val"]
    }
    dataloaders = {
        x: torch.utils.data.DataLoader(
            image_datasets[x], batch_size=4, shuffle=True, num_workers=4
        )
        for x in ["train", "val"]
    }
    dataset_sizes = {x: len(image_datasets[x]) for x in ["train", "val"]}
    class_names = image_datasets["train"].classes
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    
    # Helper function for displaying images
    def imshow(inp, title=None):
        """Imshow for Tensor."""
        inp = inp.numpy().transpose((1, 2, 0))
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
    
        # Un-normalize the images
        inp = std * inp + mean
        # Clip just in case
        inp = np.clip(inp, 0, 1)
        plt.imshow(inp)
        if title is not None:
            plt.title(title)
        plt.pause(0.001)  # pause a bit so that plots are updated
        plt.show()
    
    
    # Get a batch of training data
    # inputs, classes = next(iter(dataloaders['train']))
    
    # Make a grid from batch
    # out = torchvision.utils.make_grid(inputs)
    
    # imshow(out, title=[class_names[x] for x in classes])
    # training
    
    
    def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
        since = time.time()
    
        best_model_wts = copy.deepcopy(model.state_dict())
        best_acc = 0.0
    
        epoch_time = []  # we'll keep track of the time needed for each epoch
    
        for epoch in range(num_epochs):
            epoch_start = time.time()
            print("Epoch {}/{}".format(epoch + 1, num_epochs))
            print("-" * 10)
    
            # Each epoch has a training and validation phase
            for phase in ["train", "val"]:
                if phase == "train":
                    scheduler.step()
                    model.train()  # Set model to training mode
                else:
                    model.eval()  # Set model to evaluate mode
    
                running_loss = 0.0
                running_corrects = 0
    
                # Iterate over data.
                for inputs, labels in dataloaders[phase]:
                    inputs = inputs.to(device)
                    labels = labels.to(device)
    
                    # zero the parameter gradients
                    optimizer.zero_grad()
    
                    # Forward
                    # Track history if only in training phase
                    with torch.set_grad_enabled(phase == "train"):
                        outputs = model(inputs)
                        _, preds = torch.max(outputs, 1)
                        loss = criterion(outputs, labels)
    
                        # backward + optimize only if in training phase
                        if phase == "train":
                            loss.backward()
                            optimizer.step()
    
                    # Statistics
                    running_loss += loss.item() * inputs.size(0)
                    running_corrects += torch.sum(preds == labels.data)
    
                epoch_loss = running_loss / dataset_sizes[phase]
                epoch_acc = running_corrects.double() / dataset_sizes[phase]
    
                print("{} Loss: {:.4f} Acc: {:.4f}".format(phase, epoch_loss, epoch_acc))
    
                # Deep copy the model
                if phase == "val" and epoch_acc > best_acc:
                    best_acc = epoch_acc
                    best_model_wts = copy.deepcopy(model.state_dict())
    
            # Add the epoch time
            t_epoch = time.time() - epoch_start
            epoch_time.append(t_epoch)
            print()
    
        time_elapsed = time.time() - since
        print(
            "Training complete in {:.0f}m {:.0f}s".format(
                time_elapsed // 60, time_elapsed % 60
            )
        )
        print("Best val Acc: {:4f}".format(best_acc))
    
        # Load best model weights
        model.load_state_dict(best_model_wts)
        return model, epoch_time
    
    
    # Download a pre-trained ResNet18 model and freeze its weights
    model = torchvision.models.resnet18(pretrained=True)
    for param in model.parameters():
        param.requires_grad = False
    
    # Replace the final fully connected layer
    # Parameters of newly constructed modules have requires_grad=True by default
    num_ftrs = model.fc.in_features
    model.fc = nn.Linear(num_ftrs, 2)
    # Send the model to the GPU
    model = model.to(device)
    # Set the loss function
    criterion = nn.CrossEntropyLoss()
    
    # Observe that only the parameters of the final layer are being optimized
    optimizer_conv = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
    model, epoch_time = train_model(
        model, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=10
    )
    
    Out [106]:
    Epoch 1/10
    ----------
    
    Out [106]:
    C:\Users\Maha Kosksi\AppData\Roaming\Python\Python311\site-packages\torch\optim\lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
      warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
    
    Out [106]:
    train Loss: 0.5899 Acc: 0.6885
    val Loss: 0.3176 Acc: 0.8693
    
    Epoch 2/10
    ----------
    train Loss: 0.4949 Acc: 0.7541
    val Loss: 0.2099 Acc: 0.9216
    
    Epoch 3/10
    ----------
    train Loss: 0.5228 Acc: 0.7746
    val Loss: 0.2384 Acc: 0.9216
    
    Epoch 4/10
    ----------
    train Loss: 0.5035 Acc: 0.8074
    val Loss: 0.1737 Acc: 0.9542
    
    Epoch 5/10
    ----------
    train Loss: 0.5170 Acc: 0.7828
    val Loss: 0.2585 Acc: 0.9150
    
    Epoch 6/10
    ----------
    train Loss: 0.4609 Acc: 0.8197
    val Loss: 0.3260 Acc: 0.8758
    
    Epoch 7/10
    ----------
    train Loss: 0.4133 Acc: 0.8115
    val Loss: 0.2076 Acc: 0.9412
    
    Epoch 8/10
    ----------
    train Loss: 0.3287 Acc: 0.8525
    val Loss: 0.2227 Acc: 0.9216
    
    Epoch 9/10
    ----------
    train Loss: 0.4231 Acc: 0.7828
    val Loss: 0.2084 Acc: 0.9346
    
    Epoch 10/10
    ----------
    train Loss: 0.3747 Acc: 0.8279
    val Loss: 0.1865 Acc: 0.9477
    
    Training complete in 6m 15s
    Best val Acc: 0.954248
    

    Experiments: Study the code and the results obtained.

    Modify the code and add an "eval_model" function to allow the evaluation of the model on a test set (different from the learning and validation sets used during the learning phase). Study the results obtained.

    Now modify the code to replace the current classification layer with a set of two layers using a "relu" activation function for the middle layer, and the "dropout" mechanism for both layers. Renew the experiments and study the results obtained.

    Apply ther quantization (post and quantization aware) and evaluate impact on model size and accuracy.

    We observe a study growth in accuracy and decline in loss through the epochs.At the end,we obtained a good accuracy equal to 94%

    Adding the test dataset We acquired the test set from the following link: https://www.kaggle.com/datasets/lys620/ants-and-beesWe, after that we add the test set to the repo.

    In [114]:
    import copy
    import os
    import time
    
    import matplotlib.pyplot as plt
    import numpy as np
    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torchvision
    from torch.optim import lr_scheduler
    from torchvision import datasets, transforms
    
    # Data augmentation and normalization for training
    # Just normalization for validation
    data_transforms = {
        "train": transforms.Compose(
            [
                transforms.RandomResizedCrop(
                    224
                ),  # ImageNet models were trained on 224x224 images
                transforms.RandomHorizontalFlip(),  # flip horizontally 50% of the time - increases train set variability
                transforms.ToTensor(),  # convert it to a PyTorch tensor
                transforms.Normalize(
                    [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
                ),  # ImageNet models expect this norm
            ]
        ),
        "val": transforms.Compose(
            [
                transforms.Resize(256),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
            ]
        ),
        "test": transforms.Compose(
            [
                transforms.Resize(256),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
            ]
        ),
    }
    
    data_dir = "hymenoptera_data"
    # Create train and validation datasets and loaders
    image_datasets = {
        x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
        for x in ["train", "val","test"]
    }
    dataloaders = {
        x: torch.utils.data.DataLoader(
            image_datasets[x], batch_size=4, shuffle=True, num_workers=4
        )
        for x in ["train", "val","test"]
    }
    dataset_sizes = {x: len(image_datasets[x]) for x in ["train", "val"]}
    class_names = image_datasets["train"].classes
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    
    # Helper function for displaying images
    def imshow(inp, title=None):
        """Imshow for Tensor."""
        inp = inp.numpy().transpose((1, 2, 0))
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
    
        # Un-normalize the images
        inp = std * inp + mean
        # Clip just in case
        inp = np.clip(inp, 0, 1)
        plt.imshow(inp)
        if title is not None:
            plt.title(title)
        plt.pause(0.001)  # pause a bit so that plots are updated
        plt.show()
    
    
    # Get a batch of training data
    # inputs, classes = next(iter(dataloaders['train']))
    
    # Make a grid from batch
    # out = torchvision.utils.make_grid(inputs)
    
    # imshow(out, title=[class_names[x] for x in classes])
    # training
    
    
    def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
        since = time.time()
    
        best_model_wts = copy.deepcopy(model.state_dict())
        best_acc = 0.0
    
        epoch_time = []  # we'll keep track of the time needed for each epoch
    
        for epoch in range(num_epochs):
            epoch_start = time.time()
            print("Epoch {}/{}".format(epoch + 1, num_epochs))
            print("-" * 10)
    
            # Each epoch has a training and validation phase
            for phase in ["train", "val"]:
                if phase == "train":
                    scheduler.step()
                    model.train()  # Set model to training mode
                else:
                    model.eval()  # Set model to evaluate mode
    
                running_loss = 0.0
                running_corrects = 0
    
                # Iterate over data.
                for inputs, labels in dataloaders[phase]:
                    inputs = inputs.to(device)
                    labels = labels.to(device)
    
                    # zero the parameter gradients
                    optimizer.zero_grad()
    
                    # Forward
                    # Track history if only in training phase
                    with torch.set_grad_enabled(phase == "train"):
                        outputs = model(inputs)
                        _, preds = torch.max(outputs, 1)
                        loss = criterion(outputs, labels)
    
                        # backward + optimize only if in training phase
                        if phase == "train":
                            loss.backward()
                            optimizer.step()
    
                    # Statistics
                    running_loss += loss.item() * inputs.size(0)
                    running_corrects += torch.sum(preds == labels.data)
    
                epoch_loss = running_loss / dataset_sizes[phase]
                epoch_acc = running_corrects.double() / dataset_sizes[phase]
    
                print("{} Loss: {:.4f} Acc: {:.4f}".format(phase, epoch_loss, epoch_acc))
    
                # Deep copy the model
                if phase == "val" and epoch_acc > best_acc:
                    best_acc = epoch_acc
                    best_model_wts = copy.deepcopy(model.state_dict())
    
            # Add the epoch time
            t_epoch = time.time() - epoch_start
            epoch_time.append(t_epoch)
            print()
    
        time_elapsed = time.time() - since
        print(
            "Training complete in {:.0f}m {:.0f}s".format(
                time_elapsed // 60, time_elapsed % 60
            )
        )
        print("Best val Acc: {:4f}".format(best_acc))
    
        # Load best model weights
        model.load_state_dict(best_model_wts)
        return model, epoch_time
    
    
    # Download a pre-trained ResNet18 model and freeze its weights
    model = torchvision.models.resnet18(pretrained=True)
    for param in model.parameters():
        param.requires_grad = False
    
    # Replace the final fully connected layer
    # Parameters of newly constructed modules have requires_grad=True by default
    num_ftrs = model.fc.in_features
    model.fc = nn.Linear(num_ftrs, 2)
    # Send the model to the GPU
    model = model.to(device)
    # Set the loss function
    criterion = nn.CrossEntropyLoss()
    
    # Observe that only the parameters of the final layer are being optimized
    optimizer_conv = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
    model, epoch_time = train_model(
        model, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=10
    )
    Out [114]:
    Epoch 1/10
    ----------
    train Loss: 0.5808 Acc: 0.7008
    val Loss: 0.2968 Acc: 0.8889
    
    Epoch 2/10
    ----------
    train Loss: 0.5647 Acc: 0.7377
    val Loss: 0.1818 Acc: 0.9412
    
    Epoch 3/10
    ----------
    train Loss: 0.5458 Acc: 0.7869
    val Loss: 0.2528 Acc: 0.9150
    
    Epoch 4/10
    ----------
    train Loss: 0.5978 Acc: 0.7459
    val Loss: 0.2750 Acc: 0.9085
    
    Epoch 5/10
    ----------
    train Loss: 0.5053 Acc: 0.7746
    val Loss: 0.1689 Acc: 0.9477
    
    Epoch 6/10
    ----------
    train Loss: 0.3449 Acc: 0.8484
    val Loss: 0.2807 Acc: 0.8954
    
    Epoch 7/10
    ----------
    train Loss: 0.3465 Acc: 0.8361
    val Loss: 0.1707 Acc: 0.9542
    
    Epoch 8/10
    ----------
    train Loss: 0.3620 Acc: 0.8443
    val Loss: 0.1705 Acc: 0.9412
    
    Epoch 9/10
    ----------
    train Loss: 0.3452 Acc: 0.8320
    val Loss: 0.1705 Acc: 0.9477
    
    Epoch 10/10
    ----------
    train Loss: 0.3531 Acc: 0.8279
    val Loss: 0.1655 Acc: 0.9542
    
    Training complete in 6m 7s
    Best val Acc: 0.954248
    

    Test function

    In [122]:
    def test_model(model,criterion,optimizer):
        was_training = model.training
        model.eval()
        
        class_correct = list(0.0 for i in range(2))
        class_total = list(0.0 for i in range(2))
    
        with torch.no_grad():
            for i, (inputs, labels) in enumerate(dataloaders['test']):
                inputs = inputs.to(device)
                labels = labels.to(device)
    
                outputs = model(inputs)
                _, preds = torch.max(outputs, 1)
                        
                correct_tensor = preds.eq(labels.data.view_as(preds))
                correct = (
                    np.squeeze(correct_tensor.numpy())
                    if not train_on_gpu
                    else np.squeeze(correct_tensor.cpu().numpy())
                )
                # calculate test accuracy for each object class
                for i in range(3):
                    label = labels.data[i]
                    class_correct[label] += correct[i].item()
                    class_total[label] += 1
    
            model.train(mode=was_training)
        
        for i in range(2):
            if class_total[i] > 0:
                print(
                    "Test Accuracy of %5s: %2d%% (%2d/%2d)"
                    % (
                        class_names[i],
                        100 * class_correct[i] / class_total[i],
                        np.sum(class_correct[i]),
                        np.sum(class_total[i]),
                    )
                )
            else:
                print("Test Accuracy of %5s: N/A (no training examples)" % (class_names[i]))
    
        print(
            "\nTest Accuracy (Overall): %2d%% (%2d/%2d)"
            % (
                100.0 * np.sum(class_correct) / np.sum(class_total),
                np.sum(class_correct),
                np.sum(class_total),
            )
        )
    In [123]:
    ##test the model on the dataset used for test part 
    
    test_model(model,criterion,optimizer_conv)
    Out [123]:
    Test Accuracy of  ants: 100% (19/19)
    Test Accuracy of  bees: 98% (61/62)
    
    Test Accuracy (Overall): 98% (80/81)
    

    Modification of the FC

    In [124]:
    model = torchvision.models.resnet18(pretrained=True)
    for param in model.parameters():
        param.requires_grad = False
    
    # Replace the final fully connected layer
    # Parameters of newly constructed modules have requires_grad=True by default
    num_ftrs = model.fc.in_features
    model.fc = nn.Sequential(
              nn.Linear(num_ftrs, 10),
              nn.ReLU(),
              nn.Dropout(0.4),
              nn.Linear(10, 2),
              nn.Dropout(0.4)
            )
    # Send the model to the GPU
    model = model.to(device)
    # Set the loss function
    criterion = nn.CrossEntropyLoss()
    
    # Observe that only the parameters of the final layer are being optimized
    optimizer_conv = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
    model, epoch_time = train_model(
        model, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=10
    )
    Out [124]:
    Epoch 1/10
    ----------
    train Loss: 0.7118 Acc: 0.5615
    val Loss: 0.6862 Acc: 0.5294
    
    Epoch 2/10
    ----------
    train Loss: 0.6555 Acc: 0.5656
    val Loss: 0.5576 Acc: 0.8105
    
    Epoch 3/10
    ----------
    train Loss: 0.6732 Acc: 0.5410
    val Loss: 0.5467 Acc: 0.8889
    
    Epoch 4/10
    ----------
    train Loss: 0.6386 Acc: 0.6148
    val Loss: 0.5397 Acc: 0.8824
    
    Epoch 5/10
    ----------
    train Loss: 0.6324 Acc: 0.6230
    val Loss: 0.5316 Acc: 0.8824
    
    Epoch 6/10
    ----------
    train Loss: 0.6388 Acc: 0.6025
    val Loss: 0.4614 Acc: 0.8954
    
    Epoch 7/10
    ----------
    train Loss: 0.5687 Acc: 0.6803
    val Loss: 0.4190 Acc: 0.9150
    
    Epoch 8/10
    ----------
    train Loss: 0.5868 Acc: 0.6598
    val Loss: 0.4122 Acc: 0.9085
    
    Epoch 9/10
    ----------
    train Loss: 0.6109 Acc: 0.5943
    val Loss: 0.4159 Acc: 0.9150
    
    Epoch 10/10
    ----------
    train Loss: 0.5935 Acc: 0.6516
    val Loss: 0.4049 Acc: 0.9216
    
    Training complete in 6m 7s
    Best val Acc: 0.921569
    
    In [125]:
    test_model(model,criterion,optimizer_conv)
    Out [125]:
    Test Accuracy of  ants: 100% (17/17)
    Test Accuracy of  bees: 92% (59/64)
    
    Test Accuracy (Overall): 93% (76/81)
    

    Post Quantization

    In [126]:
    import os
    
    def print_size_of_model(model, label=""):
        torch.save(model.state_dict(), "temp.p")
        size = os.path.getsize("temp.p")
        print("model: ", label, " \t", "Size (KB):", size / 1e3)
        os.remove("temp.p")
        return size
    
    
    print_size_of_model(model, "fp32")
    Out [126]:
    model:  fp32  	 Size (KB): 44797.562
    
    Out [126]:
    44797562
    In [127]:
    import torch.quantization
    
    
    quantized_model = torch.quantization.quantize_dynamic(model, dtype=torch.qint8)
    print_size_of_model(quantized_model, "int8")
    test_model(quantized_model,criterion,optimizer_conv)
    Out [127]:
    model:  int8  	 Size (KB): 44783.654
    Test Accuracy of  ants: 100% (18/18)
    Test Accuracy of  bees: 95% (60/63)
    
    Test Accuracy (Overall): 96% (78/81)
    

    We can observe that the test accuracy has been decreased.

    Aware Quantization

    In [128]:
    class QuantizedResNet18(nn.Module):
        def __init__(self, model_fp32):
    
            super(QuantizedResNet18, self).__init__()
            # QuantStub converts tensors from floating point to quantized.
            # This will only be used for inputs.
            self.quant = torch.quantization.QuantStub()
            # DeQuantStub converts tensors from quantized to floating point.
            # This will only be used for outputs.
            self.dequant = torch.quantization.DeQuantStub()
            # FP32 model
            self.model_fp32 = model_fp32
    
        def forward(self, x):
            # manually specify where tensors will be converted from floating
            # point to quantized in the quantized model
            x = self.quant(x)
            x = self.model_fp32(x)
            # manually specify where tensors will be converted from quantized
            # to floating point in the quantized model
            x = self.dequant(x)
            return x
    In [129]:
    import copy
    import torch.quantization.quantize_fx as quantize_fx
    model = torchvision.models.resnet18(pretrained=True)
    
    model_fp=QuantizedResNet18(model)
    
    model_fp.train()
    model_to_quantize = copy.deepcopy(model_fp)
    model.qconfig = torch.quantization.get_default_qat_qconfig("qnnpack")
    model_qat = torch.quantization.prepare_qat(model_fp, inplace=False)
    # quantization aware training goes here
    model_qat = torch.quantization.convert(model_qat.eval(), inplace=False)
    n_epochs=30
    criterion = nn.CrossEntropyLoss()  # specify loss function
    optimizer = optim.SGD(model_qat.parameters(), lr=0.01)  # specify optimizer
    optimizer_conv = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
    model, epoch_time = train_model(
        model, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=10
    )
    Out [129]:
    C:\Users\Maha Kosksi\AppData\Roaming\Python\Python311\site-packages\torch\ao\quantization\utils.py:317: UserWarning: must run observer before calling calculate_qparams. Returning default values.
      warnings.warn(
    
    Out [129]:
    Epoch 1/10
    ----------
    train Loss: 1.7641 Acc: 0.5574
    val Loss: 0.3584 Acc: 0.8889
    
    Epoch 2/10
    ----------
    train Loss: 0.6037 Acc: 0.7254
    val Loss: 0.2845 Acc: 0.9150
    
    Epoch 3/10
    ----------
    train Loss: 0.4551 Acc: 0.7910
    val Loss: 0.2543 Acc: 0.9216
    
    Epoch 4/10
    ----------
    train Loss: 0.5546 Acc: 0.8033
    val Loss: 0.2647 Acc: 0.9216
    
    Epoch 5/10
    ----------
    train Loss: 0.5776 Acc: 0.7705
    val Loss: 0.3787 Acc: 0.8758
    
    Epoch 6/10
    ----------
    train Loss: 0.3114 Acc: 0.8689
    val Loss: 0.2264 Acc: 0.9346
    
    Epoch 7/10
    ----------
    train Loss: 0.3441 Acc: 0.8607
    val Loss: 0.2690 Acc: 0.9281
    
    Epoch 8/10
    ----------
    train Loss: 0.4496 Acc: 0.8197
    val Loss: 0.2658 Acc: 0.9346
    
    Epoch 9/10
    ----------
    train Loss: 0.4247 Acc: 0.8279
    val Loss: 0.2423 Acc: 0.9346
    
    Epoch 10/10
    ----------
    train Loss: 0.3851 Acc: 0.8484
    val Loss: 0.2451 Acc: 0.9477
    
    Training complete in 9m 15s
    Best val Acc: 0.947712
    

    Optional

    Try this at home!!

    Pytorch offers a framework to export a given CNN to your selfphone (either android or iOS). Have a look at the tutorial https://pytorch.org/mobile/home/

    The Exercise consists in deploying the CNN of Exercise 4 in your phone and then test it on live.

    Author

    Alberto BOSIO - Ph. D.