From f787c29c20af0894cac1104af39c0a923e9f7f9c Mon Sep 17 00:00:00 2001
From: Benyahia Mohammed Oussama <mohammed.benyahia@etu.ec-lyon.fr>
Date: Tue, 1 Apr 2025 22:02:37 +0000
Subject: [PATCH] Edit README.md

---
 README.md | 84 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 45 insertions(+), 39 deletions(-)
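
The forward diffusion process and noise scheduler described in Part 3 of the README can be illustrated with a small, dependency-free sketch. It assumes a linear beta schedule over 1000 timesteps, a common DDPM default; the README does not say which schedule the project uses. The names `linear_betas`, `cumulative_alphas` and `add_noise` are hypothetical helpers written for this example, not functions from the project or from the diffusers library.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear schedule: variance of the noise added at each timestep."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta[s]) for s <= t (share of signal left)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng):
    """Jump to timestep t: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [signal * x + sigma * e for x, e in zip(pixels, noise)]
    return noisy, noise  # the sampled noise is what the network learns to predict


betas = linear_betas()
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
image = [0.0, 0.5, 1.0, -1.0]  # stand-in for flattened, normalized MNIST pixels
noisy, noise = add_noise(image, 500, alpha_bars, rng)
```

Because `alpha_bars` shrinks toward zero as `t` grows, later timesteps keep almost none of the original image, which matches the "pure noise" end state the README describes.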

diff --git a/README.md b/README.md
index 4a8ae90..7d4476c 100644
--- a/README.md
+++ b/README.md
@@ -200,71 +200,77 @@ How many learnable parameters does this neural network have?
 
 ## Part 3: Diffusion Models
 
-Diffusion models are a fascinating category of generative models that focus on iteratively transforming random noise into realistic data. The reverse diffusion process starts from noisy data, and with the help of a trained neural network, it gradually denoises the data, ultimately generating high-quality, detailed images. These models have been gaining popularity due to their ability to surpass GANs in generating diverse and high-quality images.
+Diffusion models are generative models that turn random noise into realistic data through an iterative denoising process. The reverse diffusion process begins with a noisy image and gradually refines it into a high-quality image using a trained neural network. These models have gained significant attention because they can match or outperform GANs in generating diverse, high-fidelity images.
 
 ### Overview of Diffusion Models
-In this project, we focus on **DDPMs** (Denoising Diffusion Probabilistic Models), which are widely used for predicting image noise. The core idea is to progressively apply noise over several timesteps, then train a neural network to reverse this process. By doing so, the model learns to predict and remove the noise of the image, gradually denoising it.
+
+This project explores **Denoising Diffusion Probabilistic Models (DDPMs)**, which generate images by learning to predict the noise in a corrupted image. The fundamental idea is to progressively add noise to training images over multiple timesteps and then train a neural network to reverse this process. The model learns to predict the noise present at each timestep so that it can be removed step by step until a clean image remains.
 
 ### The Diffusion Process
-- **Forward Diffusion Process**: Starting with a real image, noise is gradually added at each timestep. The amount of noise increases with each step, leading to a more noisy image as the timesteps progress. At the maximum timestep, the image is essentially pure noise.
-  
-- **Reverse Diffusion Process**: In this step, a neural network is trained to reverse the noise process by predicting the noise at each timestep. This allows the model to denoise images step-by-step, ultimately generating a clean image from random noise.
+
+- **Forward Diffusion Process**: Noise is incrementally added to an image over a series of timesteps. As the number of steps increases, the image becomes progressively noisier until it reaches a state of pure noise.
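+- **Reverse Diffusion Process**: A neural network is trained to undo the noise addition process by predicting and removing noise step by step. During training, the target is the noise that was added to a known image; at generation time, the same network turns pure noise into a new, clean image.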
 
 ### Noise Scheduler
-To control the diffusion process, we create a **noise scheduler**. This scheduler defines how much noise is added to the image at each timestep. The model is trained on the MNIST dataset, which was also used in Part 1 of this project.
 
-### Architecture for Diffusion Model: 
-**UNet2DModel (Diffusion Model)**  
-This UNet is designed for denoising diffusion probabilistic models (DDPMs), which progressively remove noise from images. The key differences include:
+To regulate the diffusion process, we use a **noise scheduler**, which defines the amount of noise added at each timestep. The model is trained on the MNIST dataset, as introduced in Part 1 of this project.
+
+### Architecture for Diffusion Model: UNet2DModel
 
-- **Time Conditioning**: The time_proj and time_embedding modules encode timesteps, which are crucial for diffusion models to learn the progressive denoising process.
-- **ResNet Blocks Instead of Simple Conv Layers**: Each downsampling and upsampling step includes ResNetBlock2D, which has GroupNorm + SiLU (Swish) activation, making it more robust than standard convolution layers.
-- **SiLU (Swish) Activation**: Used instead of LeakyReLU/ReLU, offering smoother gradients.
-- **GroupNorm Instead of BatchNorm**: More stable for diffusion-based models.
+The UNet2DModel is specifically designed for diffusion-based denoising tasks. Key architectural features include:
+
+- **Time Conditioning**: The `time_proj` and `time_embedding` modules encode timestep information, crucial for learning the progressive denoising process.
+- **ResNet Blocks**: Instead of plain convolution layers, each downsampling and upsampling stage uses `ResnetBlock2D` blocks with **GroupNorm** and **SiLU (Swish)** activation, which makes the network more robust.
+- **SiLU Activation**: Chosen over LeakyReLU or ReLU for smoother gradients.
+- **GroupNorm**: Normalizes over groups of channels within each sample, so it does not depend on batch statistics and trains more stably than BatchNorm in diffusion models.
 
 ### Training the Model
-We will train the diffusion model on the MNIST dataset using the **diffusers** library, which provides tools for training and using diffusion models. We will compare the results of training for different epochs and assess the quality of the generated images.
 
-**Bonus**: Try also training **UNet** by inputting the timestep embedding with the noised image.  
-This UNet is used for predicting the noise of an image by using the noised image and its corresponding timestep, typically for image-to-image translation tasks. Key characteristics are:
+We train the diffusion model on the MNIST dataset using the **diffusers** library. We then compare images generated after different numbers of training epochs to assess their quality.
 
-- **Encoder-Decoder Structure**: Uses downsampling (down1 to down7) with Conv2D + BatchNorm + LeakyReLU layers and upsampling (up7 to up1) with ConvTranspose2D + BatchNorm + ReLU.
-- **Skip Connections**: Each downsampling layer has a corresponding upsampling layer that concatenates feature maps (e.g., up6 receives outputs from down6).
-- **Dropout in Some Layers**: Helps regularize training.
-- **LeakyReLU Activation in Downsampling**: Helps with learning stable representations.
+**Bonus**: We also evaluate a standard **UNet Model** by inputting the timestep embedding along with the noisy image. This UNet follows an encoder-decoder structure and is typically used for image-to-image translation.
 
-### Comparison of the UNet Architecture and UNet2DModel (Diffusion Model)
+### UNet Model Architecture
 
-| Feature                     | UNet Model                                    | Diffusion UNet (DDPM)                             |
-|-----------------------------|-----------------------------------------------|--------------------------------------------------|
-| **Task**                    | Image-to-image translation                    | Image denoising (diffusion)                      |
-| **Downsampling**             | Strided Conv2D + BatchNorm + LeakyReLU       | ResNet Blocks + GroupNorm + SiLU                 |
-| **Upsampling**               | Transpose Conv2D                             | Interpolation + Conv2D                           |
-| **Activation**               | LeakyReLU (down), ReLU (up), Tanh (output)    | SiLU (Swish)                                     |
-| **Normalization**            | BatchNorm                                     | GroupNorm                                        |
-| **Skip Connections**         | Yes                                           | Yes                                              |
-| **Time Embedding**           | No                                            | Yes                                              |
+The UNet Model, in contrast to the diffusion model, aims to predict and remove noise in a single step. Key features include:
 
-### Results
+- **Encoder-Decoder Structure**: Employs Conv2D + BatchNorm + LeakyReLU for downsampling and ConvTranspose2D + BatchNorm + ReLU for upsampling.
+- **Skip Connections**: Each downsampling layer has a corresponding upsampling layer that concatenates feature maps.
+- **Dropout Layers**: Enhance regularization to prevent overfitting.
+- **LeakyReLU Activation**: Used in the downsampling path; its small negative slope keeps gradients flowing for negative inputs.
 
-Here are the visual results from:
+### Comparison: UNet Model vs. Diffusion UNet (DDPM)
 
-**Diffusion U-Net2D**  
+| Feature                 | UNet Model                                 | Diffusion UNet (DDPM)            |
+| ----------------------- | ------------------------------------------ | -------------------------------- |
+| **Primary Task**        | Image-to-image translation                 | Image denoising via diffusion    |
+| **Downsampling**        | Strided Conv2D + BatchNorm + LeakyReLU     | ResNet Blocks + GroupNorm + SiLU |
+| **Upsampling**          | Transpose Conv2D                           | Interpolation + Conv2D           |
+| **Activation Function** | LeakyReLU (down), ReLU (up), Tanh (output) | SiLU (Swish)                     |
+| **Normalization**       | BatchNorm                                  | GroupNorm                        |
+| **Skip Connections**    | Yes                                        | Yes                              |
+| **Time Embedding**      | No                                         | Yes                              |
+
+### Results
+
+**Diffusion UNet2D Model**
 ![Diffusion U-Net Example result](images/diffuse_denoise_mnist.png)
 
-**UNet Model**  
+
+**UNet Model**
 ![UNet Example result](images/unet_denoise_mnist.png)
 
-### Comparison of Noise Prediction Models for Image Denoising
-This section compares the UNet Diffusion model and the UNet Model in terms of their performance for image denoising. Both models leverage the UNet architecture, but they differ significantly in their approach.
 
-**UNet Diffusion Model**: This model operates iteratively, gradually denoising the image over multiple steps. While it provides high-quality results, it is computationally expensive due to the repeated noise addition and removal process, making it slower for real-time applications.
+### Analysis: Noise Prediction and Image Denoising Performance
+
+Both models leverage the UNet architecture but employ different strategies for noise removal:
 
-**UNet Model**: In contrast, the UNet Model operates in a single step, using standard image-to-image translation techniques to generate denoised images. This allows for faster inference, making it more suitable for real-time applications. However, the UNet Model struggles to predict the noise effectively, which leads to incomplete denoising. As a result, the denoised images are often not fully cleaned and may still contain visible noise, resulting in unrecognizable content.
+- **Diffusion UNet2D Model**: Works iteratively, progressively removing noise in multiple steps. This allows it to generate high-quality images but at the cost of increased computational complexity.
+- **UNet Model**: Predicts and removes noise in a single step, making it significantly faster. However, it struggles to accurately predict the noise pattern, leading to incomplete denoising and residual artifacts in the final output.
 
 ### Conclusion
 
-The **UNet2DModel (Diffusion Model)** excels in denoising quality due to its iterative process, which allows for more accurate and efficient noise removal. However, it is computationally expensive and less suited for real-time applications. On the other hand, the **UNet Model** is more efficient in terms of speed, making it suitable for time-sensitive tasks, but it fails to effectively predict and remove noise. This leads to suboptimal denoising, where the images are not adequately cleaned, and the content remains partially distorted. This explains the difference in results: the **UNet Diffusion model** produces cleaner, artifact-free images, while the **UNet Model** struggles with noise removal, leaving visible artifacts in the output.
+The **UNet2DModel (Diffusion Model)** provides superior denoising quality thanks to its iterative refinement process, making it well suited to high-quality image generation. However, its high computational cost limits its use in real-time scenarios. The **UNet Model**, by contrast, is computationally efficient and offers faster inference, but its denoising is weaker: residual noise leaves some digits unrecognizable.
 
 
 ## Part 4: What About Those Beautiful Images?
-- 
GitLab