Part 1
In this section we use a GAN, trained on the MNIST dataset, to generate fake handwritten digits.
Changes
- Following the tutorial, the first change was to the nc variable, which sets the number of image channels: it is set to 1 because MNIST images are grayscale.
- To import the dataset we use torchvision.datasets.MNIST and, for the same reason, change the transformation transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) to transforms.Normalize((0.5,), (0.5,)) so that it matches the number of channels.
- The device variable is now defined with an additional if clause for ARM Mac GPUs.
Results
The loss plot shows unexpected behavior: the generator loss increases during the first ~50 iterations while the discriminator loss stays near zero, and there is a peak around iteration ~250. That said, the overall result is acceptable, with some of the generated images recognizable as digits.
Part 2
Question 1
Knowing the input and output images will be 256x256, what will be the dimension of the encoded vector x8?
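The number of channels of x8 is hard-coded in the network, so x8 always has 512 channels. Its spatial size depends on the input: if each of the eight encoder stages halves the resolution, as in the standard pix2pix U-Net, a 256x256 input is reduced to 1x1, so x8 has shape 512x1x1 rather than 512x512.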
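As a sanity check on the spatial size of x8, the arithmetic can be sketched in a few lines. This assumes, hypothetically, that x1 to x8 are produced by eight stride-2 convolutions with a 4x4 kernel and padding 1, as in a pix2pix-style encoder; the actual layer parameters should be checked against the network definition.

```python
def conv_out_size(n: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Spatial output size of a convolution applied to an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


def encoder_sizes(n: int, stages: int = 8) -> list:
    """Spatial sizes of x1..x8 under the assumed stride-2 encoder."""
    sizes = []
    for _ in range(stages):
        n = conv_out_size(n)
        sizes.append(n)
    return sizes


print(encoder_sizes(256))  # [128, 64, 32, 16, 8, 4, 2, 1]
```

Under these assumptions the eighth stage outputs a 1x1 feature map, so only the channel dimension (512) carries information in x8.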
Question 2
As you can see, U-net has an encoder-decoder architecture with skip connections. Explain why it works better than a traditional encoder-decoder.
In a standard encoder-decoder architecture, the decoder only sees the compressed bottleneck representation, so it may be unable to reconstruct fine details of the image, leading to information loss. U-net adds skip connections that pass each encoder feature map directly to the decoder stage of the same resolution, which lets the decoder recover details and spatial information from the source image.
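To illustrate the mechanism, here is a minimal sketch of a skip connection in PyTorch. TinyUNet is a hypothetical two-level network written only for this example, not the U-net used in this part; its layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn


class TinyUNet(nn.Module):
    """Hypothetical two-level U-Net, used only to illustrate a skip connection."""

    def __init__(self, channels: int = 3, features: int = 8):
        super().__init__()
        # Encoder: each stage halves the spatial resolution.
        self.enc1 = nn.Conv2d(channels, features, kernel_size=4, stride=2, padding=1)
        self.enc2 = nn.Conv2d(features, features * 2, kernel_size=4, stride=2, padding=1)
        # Decoder: each stage doubles the spatial resolution.
        self.dec2 = nn.ConvTranspose2d(features * 2, features, kernel_size=4, stride=2, padding=1)
        # The last stage receives dec2's output concatenated with enc1's output,
        # hence features * 2 input channels.
        self.dec1 = nn.ConvTranspose2d(features * 2, channels, kernel_size=4, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = torch.relu(self.enc1(x))    # resolution H/2
        e2 = torch.relu(self.enc2(e1))   # resolution H/4 (bottleneck)
        d2 = torch.relu(self.dec2(e2))   # back to H/2
        d2 = torch.cat([d2, e1], dim=1)  # skip connection: reuse encoder features
        return torch.tanh(self.dec1(d2))  # back to H, values in [-1, 1]


x = torch.zeros(1, 3, 32, 32)
y = TinyUNet()(x)
print(y.shape)  # torch.Size([1, 3, 32, 32])
```

Without the torch.cat line, the last decoder stage would only see information that passed through the bottleneck; with it, the stage also has direct access to the higher-resolution encoder features.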