From 27903bcbf1eb534cadffecffbe2b17634d2b1d7e Mon Sep 17 00:00:00 2001
From: Benyahia Mohammed Oussama <mohammed.benyahia@etu.ec-lyon.fr>
Date: Mon, 31 Mar 2025 12:20:05 +0000
Subject: [PATCH] Edit README.md

---
 README.md | 222 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 114 insertions(+), 108 deletions(-)

diff --git a/README.md b/README.md
index c19e834..9b8fa61 100644
--- a/README.md
+++ b/README.md
@@ -71,125 +71,131 @@ To enhance image generation and reduce ambiguities between similar digits (e.g.,
 - [DCGAN Tutorial](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html)
 - [MNIST Dataset](https://pytorch.org/vision/stable/generated/torchvision.datasets.MNIST.html#torchvision.datasets.MNIST)
 
-## Part 2: Conditional GAN (cGAN) with U-Net
 
-### **Generator**
-In the cGAN architecture, the generator chosen is a U-Net.
 
-#### **U-Net Overview:**
-- A U-Net takes an image as input and outputs another image.
-- It consists of two main parts: an encoder and a decoder.
-  - The encoder reduces the image dimension to extract main features.
-  - The decoder reconstructs the image using these features.
-- Unlike a simple encoder-decoder model, U-Net has skip connections that link encoder layers to corresponding decoder layers. These allow the decoder to use both high-frequency and low-frequency information.
+## Part 2: Conditional GAN (cGAN) with U-Net
 
-#### **Architecture & Implementation:**
-The encoder takes a colored picture (3 channels: RGB), processes it through a series of convolutional layers, and encodes the features. The decoder then reconstructs the image using transposed convolutional layers, utilizing skip connections to enhance details.
+### **Generator**
+In the cGAN architecture, the generator chosen is a U-Net.
 
-
+#### **U-Net Overview:**
+- A U-Net takes an image as input and outputs another image.
+- It consists of two main parts: an encoder and a decoder.
+  - The encoder reduces the image dimension to extract main features.
+  - The decoder reconstructs the image using these features.
+- Unlike a simple encoder-decoder model, U-Net has skip connections that link encoder layers to corresponding decoder layers. These allow the decoder to use both high-frequency and low-frequency information.
+
+#### **Architecture & Implementation:**
+The encoder takes a colored picture (3 channels: RGB), processes it through a series of convolutional layers, and encodes the features. The decoder then reconstructs the image using transposed convolutional layers, utilizing skip connections to enhance details.
+
+
 
-### **Question:**
-Knowing that the input and output images have a shape of 256x256 with 3 channels, what will be the dimension of the feature map "x8"?
+### **Question:**
+Knowing that the input and output images have a shape of 256x256 with 3 channels, what will be the dimension of the feature map "x8"?
 
-**Answer:** The dimension of the feature map x8 is **[numBatch, 512, 1, 1]**.
+**Answer:** The dimension of the feature map x8 is **[numBatch, 512, 32, 32]**.
 
-### **Question:**
-Why are skip connections important in the U-Net architecture?
+### **Question:**
+Why are skip connections important in the U-Net architecture?
 
-#### **Explanation:**
-Skip connections link encoder and decoder layers, improving the model in several ways:
-- **Preserving Spatial Resolution:** Helps retain fine details that may be lost during encoding.
-- **Preventing Information Loss:** Transfers important features from the encoder to the decoder.
-- **Improving Performance:** Combines high-level and low-level features for better reconstruction.
-- **Mitigating Vanishing Gradient:** Eases training by allowing gradient flow through deeper layers.
+#### **Explanation:**
+Skip connections link encoder and decoder layers, improving the model in several ways (see the code sketch after this list):
+- **Preserving Spatial Resolution:** Helps retain fine details that may be lost during encoding.
+- **Preventing Information Loss:** Transfers important features from the encoder to the decoder.
+- **Improving Performance:** Combines high-level and low-level features for better reconstruction.
+- **Mitigating Vanishing Gradient:** Eases training by allowing gradient flow through deeper layers.
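+
+To make the role of the skip connections concrete, here is a minimal PyTorch sketch of one encoder stage and one decoder stage (illustrative layer names and channel sizes, not the exact code used in this repository):
+
+```python
+import torch
+import torch.nn as nn
+
+class Down(nn.Module):
+    """One encoder stage: a stride-2 convolution that halves the spatial size."""
+    def __init__(self, in_ch, out_ch):
+        super().__init__()
+        self.block = nn.Sequential(
+            nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
+            nn.BatchNorm2d(out_ch),
+            nn.LeakyReLU(0.2, inplace=True),
+        )
+
+    def forward(self, x):
+        return self.block(x)
+
+class Up(nn.Module):
+    """One decoder stage: a transposed convolution that doubles the spatial size,
+    followed by concatenation with the matching encoder feature map (skip connection)."""
+    def __init__(self, in_ch, out_ch):
+        super().__init__()
+        self.block = nn.Sequential(
+            nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
+            nn.BatchNorm2d(out_ch),
+            nn.ReLU(inplace=True),
+        )
+
+    def forward(self, x, skip):
+        x = self.block(x)
+        return torch.cat([x, skip], dim=1)  # reuse encoder features in the decoder
+
+# Two-level toy example on a 3-channel 256x256 input
+down1, down2, up1 = Down(3, 64), Down(64, 128), Up(128, 64)
+x = torch.randn(1, 3, 256, 256)
+x1 = down1(x)    # [1, 64, 128, 128]
+x2 = down2(x1)   # [1, 128, 64, 64]
+y = up1(x2, x1)  # [1, 128, 128, 128]: upsampled features concatenated with the skip
+```
+
+Concatenating the encoder feature map doubles the number of channels seen by the next decoder layer, which is why the decoder convolutions of a U-Net expect twice as many input channels as their mirror-image encoder stages.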
 
----
+---
+
+### **Discriminator**
+In the cGAN architecture, we use a **PatchGAN** discriminator instead of a traditional binary classifier.
+
+#### **PatchGAN Overview:**
+- Instead of classifying the entire image as real or fake, PatchGAN classifies **N × N patches** of the image.
+- The size **N** depends on the number of convolutional layers in the network:
+
+| Layers | Patch Size |
+|--------|------------|
+| 1 | 16×16 |
+| 2 | 34×34 |
+| 3 | 70×70 |
+| 4 | 142×142 |
+| 5 | 286×286 |
+| 6 | 574×574 |
+
+For this project, we use a **70×70 PatchGAN**.
+
+
+
+### **Question:**
+How many learnable parameters does this neural network have?
+
+1. **conv1:**
+   - Input channels: 6
+   - Output channels: 64
+   - Kernel size: 4×4
+   - Parameters in conv1 = (4×4×6+1(bias))×64 = **6208**
+
+2. **conv2:**
+   - Weights: 4 × 4 × 64 × 128 = **131072**
+   - Biases: **128**
+   - BatchNorm: (scale + shift) for 128 channels: 2 × 128 = **256**
+   - Parameters in conv2: **131072 + 128 + 256 = 131456**
+
+3. **conv3:**
+   - Weights: 4 × 4 × 128 × 256 = **524288**
+   - Biases: **256**
+   - BatchNorm: (scale + shift) for 256 channels: 2 × 256 = **512**
+   - Parameters in conv3: **524288 + 256 + 512 = 525056**
+
+4. **conv4:**
+   - Weights: 4 × 4 × 256 × 512 = **2097152**
+   - Biases: **512**
+   - BatchNorm: (scale + shift) for 512 channels: 2 × 512 = **1024**
+   - Parameters in conv4: **2097152 + 512 + 1024 = 2098688**
+
+5. **out:**
+   - Weights: 4 × 4 × 512 × 1 = **8192**
+   - Biases: **1**
+   - Parameters in out: **8192 + 1 = 8193**
+
+**Total Learnable Parameters:**
+
+**6208 + 131456 + 525056 + 2098688 + 8193 = 2,769,601**
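+
+The total can be double-checked programmatically. The sketch below builds a PatchGAN-style discriminator with the layer sizes listed above (the stride and padding values are assumptions and do not change the parameter count) and counts its learnable parameters:
+
+```python
+import torch.nn as nn
+
+# Assumed PatchGAN-style discriminator matching the breakdown above:
+# conv1 has no BatchNorm, conv2-conv4 are each followed by BatchNorm2d.
+discriminator = nn.Sequential(
+    nn.Conv2d(6, 64, kernel_size=4, stride=2, padding=1),    # conv1: 6208
+    nn.LeakyReLU(0.2, inplace=True),
+    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # conv2 + BN: 131456
+    nn.BatchNorm2d(128),
+    nn.LeakyReLU(0.2, inplace=True),
+    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1), # conv3 + BN: 525056
+    nn.BatchNorm2d(256),
+    nn.LeakyReLU(0.2, inplace=True),
+    nn.Conv2d(256, 512, kernel_size=4, stride=1, padding=1), # conv4 + BN: 2098688
+    nn.BatchNorm2d(512),
+    nn.LeakyReLU(0.2, inplace=True),
+    nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),   # out: 8193
+)
+
+total = sum(p.numel() for p in discriminator.parameters() if p.requires_grad)
+print(total)  # 2769601
+```
+
+Only convolution weights, biases, and BatchNorm scale/shift parameters are counted; the BatchNorm running statistics are buffers, not learnable parameters.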
+
+---
+
+### **Results Comparison: 100 vs. 200 Epochs**
+
+#### **1. Training Performance**
+- **100 Epochs:**
+  - The generator produces images that resemble the target facades.
+  - Some fine details may be missing, and slight noise is present.
+- **200 Epochs:**
+  - The generated facades have more details and refined structures.
+  - Improved high-frequency details make outputs closer to target images.
+  - Less noise, but minor artifacts may still exist.
+
+#### **2. Evaluation Performance**
+- **100 Epochs:**
+  - Struggles with realistic facades on unseen masks.
+  - Noticeable noise, but sometimes less than the 200-epoch model.
+  - Some structures exist but lack consistency.
+- **200 Epochs:**
+  - Overfits to training data, leading to poor generalization.
+  - Instead of realistic facades, it reuses training patches, causing noisy outputs.
-### **Discriminator**
-In the cGAN architecture, we use a **PatchGAN** discriminator instead of a traditional binary classifier.
-
-#### **PatchGAN Overview:**
-- Instead of classifying the entire image as real or fake, PatchGAN classifies **N × N patches** of the image.
-- The size **N** depends on the number of convolutional layers in the network:
-
-| Layers | Patch Size |
-|--------|------------|
-| 1 | 16×16 |
-| 2 | 34×34 |
-| 3 | 70×70 |
-| 4 | 142×142 |
-| 5 | 286×286 |
-| 6 | 574×574 |
-
-For this project, we use a **70×70 PatchGAN**.
-
-
-
-question : how many learnable parameters this neural network has ?:
-
-1. conv1:
-- Input channels: 6
-- Output channels: 64
-- Kernel size: 4*4
-- Parameters in conv1 = (4×4×6+1(bais))×64=6208
-
-2. conv2:
-- Weights: 4 × 4 × 64 × 128 = 131072
-- Biases: 128
-- BatchNorm: (scale + shift) for 128 channels: 2 × 128 = 256
-- Parameters in conv2: 131072 + 128 + 256 = 131456
-
-3. conv3:
-- Weights: 4 × 4 × 128 × 256 = 524288
-- Biases: 256
-- BatchNorm: (scale + shift) for 256 channels: 2 × 256 = 512
-- Parameters in conv3: 524288 + 256 + 512= 525056
-
-4. conv4:
-- Weights: 4 × 4 × 256 × 512 = 2097152
-- Biases: 512
-- BatchNorm: (scale + shift) for 512 channels: 2 × 512 = 1024
-- Parameters in conv4: 2097152+512+1,024=2098688
-
-5. out:
-- Weights: 4 × 4 × 512 × 1 = 8192
-- Biases: 1
-- Parameters in out: 8192 + 1=8193
-
-**Total Learnable Parameters**
-
-**6,208 + 131,456 + 525,056 + 2,098,688 + 8,193 = 2,769,601**
-
-### **Results Comparison: 100 vs. 200 Epochs**
-
-#### **1. Training Performance**
-- **100 Epochs:**
-  - The generator produces images that resemble the target facades.
-  - Some fine details may be missing, and slight noise is present.
-- **200 Epochs:**
-  - The generated facades have more details and refined structures.
-  - Improved high-frequency details make outputs closer to target images.
-  - Less noise, but minor artifacts may still exist.
-
-#### **2. Evaluation Performance**
-- **100 Epochs:**
-  - Struggles with realistic facades on unseen masks.
-  - Noticeable noise, but sometimes less than the 200-epoch model.
-  - Some structures exist but lack consistency.
-- **200 Epochs:**
-  - Overfits to training data, leading to poor generalization.
-  - Instead of realistic facades, it reuses training patches, causing noisy outputs.
-
-#### Conclusion & Observations
-- **Improved Detail at 200 Epochs:** Better training mask generation.
-- **Overfitting Issue:** Generalization is poor beyond 100 epochs.
-- **Limited Dataset Size (378 Images):** Restricts model’s diversity and quality.
-
-#### Example image of training set at 100 and 200 epochs:
-
-#### Example images of evaluation set at 100 and 200 epochs:
+
+#### **Conclusion & Observations**
+- **Improved Detail at 200 Epochs:** Better training mask generation.
+- **Overfitting Issue:** Generalization is poor beyond 100 epochs.
+- **Limited Dataset Size (378 Images):** Restricts model’s diversity and quality.
+
+#### **Example image of training set at 100 and 200 epochs:**
+
+#### **Example images of evaluation set at 100 and 200 epochs:**
 
 ## Part 3: Diffusion Models
--
GitLab