From 27903bcbf1eb534cadffecffbe2b17634d2b1d7e Mon Sep 17 00:00:00 2001
From: Benyahia Mohammed Oussama <mohammed.benyahia@etu.ec-lyon.fr>
Date: Mon, 31 Mar 2025 12:20:05 +0000
Subject: [PATCH] Edit README.md

---
 README.md | 222 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 114 insertions(+), 108 deletions(-)

diff --git a/README.md b/README.md
index c19e834..9b8fa61 100644
--- a/README.md
+++ b/README.md
@@ -71,125 +71,131 @@ To enhance image generation and reduce ambiguities between similar digits (e.g.,
 - [DCGAN Tutorial](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html)
 - [MNIST Dataset](https://pytorch.org/vision/stable/generated/torchvision.datasets.MNIST.html#torchvision.datasets.MNIST)
 
-## Part 2: Conditional GAN (cGAN) with U-Net
 
-### **Generator**
-In the cGAN architecture, the generator chosen is a U-Net.
 
-#### **U-Net Overview:**
-- A U-Net takes an image as input and outputs another image.
-- It consists of two main parts: an encoder and a decoder.
-  - The encoder reduces the image dimension to extract main features.
-  - The decoder reconstructs the image using these features.
-- Unlike a simple encoder-decoder model, U-Net has skip connections that link encoder layers to corresponding decoder layers. These allow the decoder to use both high-frequency and low-frequency information.
+## Part 2: Conditional GAN (cGAN) with U-Net
 
-#### **Architecture & Implementation:**
-The encoder takes a colored picture (3 channels: RGB), processes it through a series of convolutional layers, and encodes the features. The decoder then reconstructs the image using transposed convolutional layers, utilizing skip connections to enhance details.
+### **Generator**
+In the cGAN architecture, the generator chosen is a U-Net.
 
-
+#### **U-Net Overview:**
+- A U-Net takes an image as input and outputs another image.
+- It consists of two main parts: an encoder and a decoder.
+  - The encoder reduces the image dimension to extract main features.
+  - The decoder reconstructs the image using these features.
+- Unlike a simple encoder-decoder model, U-Net has skip connections that link encoder layers to corresponding decoder layers. These allow the decoder to use both high-frequency and low-frequency information.
+
+#### **Architecture & Implementation:**
+The encoder takes a colored picture (3 channels: RGB), processes it through a series of convolutional layers, and encodes the features. The decoder then reconstructs the image using transposed convolutional layers, utilizing skip connections to enhance details.
+
+
 
-### **Question:**
-Knowing that the input and output images have a shape of 256x256 with 3 channels, what will be the dimension of the feature map "x8"?
+### **Question:**
+Knowing that the input and output images have a shape of 256x256 with 3 channels, what will be the dimension of the feature map "x8"?
 
-**Answer:** The dimension of the feature map x8 is **[numBatch, 512, 1, 1]**.
+**Answer:** The dimension of the feature map x8 is **[numBatch, 512, 32, 32]**.
 
-### **Question:**
-Why are skip connections important in the U-Net architecture?
+### **Question:**
+Why are skip connections important in the U-Net architecture?
 
-#### **Explanation:**
-Skip connections link encoder and decoder layers, improving the model in several ways:
-- **Preserving Spatial Resolution:** Helps retain fine details that may be lost during encoding.
-- **Preventing Information Loss:** Transfers important features from the encoder to the decoder.
-- **Improving Performance:** Combines high-level and low-level features for better reconstruction.
-- **Mitigating Vanishing Gradient:** Eases training by allowing gradient flow through deeper layers.
+#### **Explanation:**
+Skip connections link encoder and decoder layers, improving the model in several ways (see the code sketch after this list):
+- **Preserving Spatial Resolution:** Helps retain fine details that may be lost during encoding.
+- **Preventing Information Loss:** Transfers important features from the encoder to the decoder.
+- **Improving Performance:** Combines high-level and low-level features for better reconstruction.
+- **Mitigating Vanishing Gradient:** Eases training by allowing gradient flow through deeper layers.
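+
+To make the role of the skip connections concrete, here is a minimal PyTorch sketch of one encoder stage and one decoder stage (illustrative layer names and channel sizes, not the exact code used in this repository):
+
+```python
+import torch
+import torch.nn as nn
+
+class Down(nn.Module):
+    """One encoder stage: a stride-2 convolution that halves the spatial size."""
+    def __init__(self, in_ch, out_ch):
+        super().__init__()
+        self.block = nn.Sequential(
+            nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
+            nn.BatchNorm2d(out_ch),
+            nn.LeakyReLU(0.2, inplace=True),
+        )
+
+    def forward(self, x):
+        return self.block(x)
+
+class Up(nn.Module):
+    """One decoder stage: a transposed convolution that doubles the spatial size,
+    followed by concatenation with the matching encoder feature map (skip connection)."""
+    def __init__(self, in_ch, out_ch):
+        super().__init__()
+        self.block = nn.Sequential(
+            nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
+            nn.BatchNorm2d(out_ch),
+            nn.ReLU(inplace=True),
+        )
+
+    def forward(self, x, skip):
+        x = self.block(x)
+        return torch.cat([x, skip], dim=1)  # reuse encoder features in the decoder
+
+# Two-level toy example on a 3-channel 256x256 input
+down1, down2, up1 = Down(3, 64), Down(64, 128), Up(128, 64)
+x = torch.randn(1, 3, 256, 256)
+x1 = down1(x)    # [1, 64, 128, 128]
+x2 = down2(x1)   # [1, 128, 64, 64]
+y = up1(x2, x1)  # [1, 128, 128, 128]: upsampled features concatenated with the skip
+```
+
+Concatenating the encoder feature map doubles the number of channels seen by the next decoder layer, which is why the decoder convolutions of a U-Net expect twice as many input channels as their mirror-image encoder stages.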
 
----
+---
+
+### **Discriminator**
+In the cGAN architecture, we use a **PatchGAN** discriminator instead of a traditional binary classifier.
+
+#### **PatchGAN Overview:**
+- Instead of classifying the entire image as real or fake, PatchGAN classifies **N × N patches** of the image.
+- The size **N** depends on the number of convolutional layers in the network:
+
+| Layers | Patch Size |
+|--------|------------|
+| 1 | 16×16 |
+| 2 | 34×34 |
+| 3 | 70×70 |
+| 4 | 142×142 |
+| 5 | 286×286 |
+| 6 | 574×574 |
+
+For this project, we use a **70×70 PatchGAN**.
+
+
+
+### **Question:**
+How many learnable parameters does this neural network have?
+
+1. **conv1:**
+   - Input channels: 6
+   - Output channels: 64
+   - Kernel size: 4×4
+   - Parameters in conv1 = (4×4×6+1(bias))×64 = **6208**
+
+2. **conv2:**
+   - Weights: 4 × 4 × 64 × 128 = **131072**
+   - Biases: **128**
+   - BatchNorm: (scale + shift) for 128 channels: 2 × 128 = **256**
+   - Parameters in conv2: **131072 + 128 + 256 = 131456**
+
+3. **conv3:**
+   - Weights: 4 × 4 × 128 × 256 = **524288**
+   - Biases: **256**
+   - BatchNorm: (scale + shift) for 256 channels: 2 × 256 = **512**
+   - Parameters in conv3: **524288 + 256 + 512 = 525056**
+
+4. **conv4:**
+   - Weights: 4 × 4 × 256 × 512 = **2097152**
+   - Biases: **512**
+   - BatchNorm: (scale + shift) for 512 channels: 2 × 512 = **1024**
+   - Parameters in conv4: **2097152 + 512 + 1024 = 2098688**
+
+5. **out:**
+   - Weights: 4 × 4 × 512 × 1 = **8192**
+   - Biases: **1**
+   - Parameters in out: **8192 + 1 = 8193**
+
+**Total Learnable Parameters:**
+
+**6208 + 131456 + 525056 + 2098688 + 8193 = 2,769,601**
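+
+The total can be double-checked programmatically. The sketch below builds a PatchGAN-style discriminator with the layer sizes listed above (the stride and padding values are assumptions and do not change the parameter count) and counts its learnable parameters:
+
+```python
+import torch.nn as nn
+
+# Assumed PatchGAN-style discriminator matching the breakdown above:
+# conv1 has no BatchNorm, conv2-conv4 are each followed by BatchNorm2d.
+discriminator = nn.Sequential(
+    nn.Conv2d(6, 64, kernel_size=4, stride=2, padding=1),    # conv1: 6208
+    nn.LeakyReLU(0.2, inplace=True),
+    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # conv2 + BN: 131456
+    nn.BatchNorm2d(128),
+    nn.LeakyReLU(0.2, inplace=True),
+    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1), # conv3 + BN: 525056
+    nn.BatchNorm2d(256),
+    nn.LeakyReLU(0.2, inplace=True),
+    nn.Conv2d(256, 512, kernel_size=4, stride=1, padding=1), # conv4 + BN: 2098688
+    nn.BatchNorm2d(512),
+    nn.LeakyReLU(0.2, inplace=True),
+    nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),   # out: 8193
+)
+
+total = sum(p.numel() for p in discriminator.parameters() if p.requires_grad)
+print(total)  # 2769601
+```
+
+Only convolution weights, biases, and BatchNorm scale/shift parameters are counted; the BatchNorm running statistics are buffers, not learnable parameters.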
+
+---
+
+### **Results Comparison: 100 vs. 200 Epochs**
+
+#### **1. Training Performance**
+- **100 Epochs:**
+  - The generator produces images that resemble the target facades.
+  - Some fine details may be missing, and slight noise is present.
+- **200 Epochs:**
+  - The generated facades have more details and refined structures.
+  - Improved high-frequency details make outputs closer to target images.
+  - Less noise, but minor artifacts may still exist.
+
+#### **2. Evaluation Performance**
+- **100 Epochs:**
+  - Struggles with realistic facades on unseen masks.
+  - Noticeable noise, but sometimes less than the 200-epoch model.
+  - Some structures exist but lack consistency.
+- **200 Epochs:**
+  - Overfits to training data, leading to poor generalization.
+  - Instead of realistic facades, it reuses training patches, causing noisy outputs.
-### **Discriminator**
-In the cGAN architecture, we use a **PatchGAN** discriminator instead of a traditional binary classifier.
-
-#### **PatchGAN Overview:**
-- Instead of classifying the entire image as real or fake, PatchGAN classifies **N × N patches** of the image.
-- The size **N** depends on the number of convolutional layers in the network:
-
-| Layers | Patch Size |
-|--------|------------|
-| 1 | 16×16 |
-| 2 | 34×34 |
-| 3 | 70×70 |
-| 4 | 142×142 |
-| 5 | 286×286 |
-| 6 | 574×574 |
-
-For this project, we use a **70×70 PatchGAN**.
-
-
-
-question : how many learnable parameters this neural network has ?:
-
-1. conv1:
-- Input channels: 6
-- Output channels: 64
-- Kernel size: 4*4
-- Parameters in conv1 = (4×4×6+1(bais))×64=6208
-
-2. conv2:
-- Weights: 4 × 4 × 64 × 128 = 131072
-- Biases: 128
-- BatchNorm: (scale + shift) for 128 channels: 2 × 128 = 256
-- Parameters in conv2: 131072 + 128 + 256 = 131456
-
-3. conv3:
-- Weights: 4 × 4 × 128 × 256 = 524288
-- Biases: 256
-- BatchNorm: (scale + shift) for 256 channels: 2 × 256 = 512
-- Parameters in conv3: 524288 + 256 + 512= 525056
-
-4. conv4:
-- Weights: 4 × 4 × 256 × 512 = 2097152
-- Biases: 512
-- BatchNorm: (scale + shift) for 512 channels: 2 × 512 = 1024
-- Parameters in conv4: 2097152+512+1,024=2098688
-
-5. out:
-- Weights: 4 × 4 × 512 × 1 = 8192
-- Biases: 1
-- Parameters in out: 8192 + 1=8193
-
-**Total Learnable Parameters**
-
-**6,208 + 131,456 + 525,056 + 2,098,688 + 8,193 = 2,769,601**
-
-### **Results Comparison: 100 vs. 200 Epochs**
-
-#### **1. Training Performance**
-- **100 Epochs:**
-  - The generator produces images that resemble the target facades.
-  - Some fine details may be missing, and slight noise is present.
-- **200 Epochs:**
-  - The generated facades have more details and refined structures.
-  - Improved high-frequency details make outputs closer to target images.
-  - Less noise, but minor artifacts may still exist.
-
-#### **2. Evaluation Performance**
-- **100 Epochs:**
-  - Struggles with realistic facades on unseen masks.
-  - Noticeable noise, but sometimes less than the 200-epoch model.
-  - Some structures exist but lack consistency.
-- **200 Epochs:**
-  - Overfits to training data, leading to poor generalization.
-  - Instead of realistic facades, it reuses training patches, causing noisy outputs.
-
-#### Conclusion & Observations
-- **Improved Detail at 200 Epochs:** Better training mask generation.
-- **Overfitting Issue:** Generalization is poor beyond 100 epochs.
-- **Limited Dataset Size (378 Images):** Restricts model’s diversity and quality.
-
-#### Example image of training set at 100 and 200 epochs:
-
-#### Example images of evaluation set at 100 and 200 epochs:
+
+#### **Conclusion & Observations**
+- **Improved Detail at 200 Epochs:** Better training mask generation.
+- **Overfitting Issue:** Generalization is poor beyond 100 epochs.
+- **Limited Dataset Size (378 Images):** Restricts model’s diversity and quality.
+
+#### **Example image of training set at 100 and 200 epochs:**
+
+#### **Example images of evaluation set at 100 and 200 epochs:**
 
 ## Part 3: Diffusion Models
--
GitLab