Diffusion models are a fascinating category of generative models that focus on iteratively transforming random noise into realistic data. The reverse diffusion process starts from noisy data, and with the help of a trained neural network, it gradually denoises the data, ultimately generating high-quality, detailed images. These models have been gaining popularity due to their ability to surpass GANs in generating diverse and high-quality images.
### Overview of Diffusion Models
In this project, we focus on **DDPMs** (Denoising Diffusion Probabilistic Models), which are widely used for image generation via noise prediction. The core idea is to progressively apply noise over several timesteps, then train a neural network to reverse this process. In doing so, the model learns to predict the noise in an image so it can be removed, gradually denoising it.
### The Diffusion Process
- **Forward Diffusion Process**: Starting with a real image, noise is gradually added at each timestep. The amount of noise increases with each step, so the image becomes progressively noisier. At the maximum timestep, the image is essentially pure noise.
- **Reverse Diffusion Process**: Here, a neural network is trained to reverse the noising process by predicting the noise at each timestep. This allows the model to denoise images step-by-step, ultimately generating a clean image from random noise (the closed-form math behind this is sketched below).
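For reference, here is a minimal sketch of the math in standard DDPM notation (noise schedule $\beta_t$, $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$). The forward process has a closed form that noises an image to any timestep in a single shot:

$$
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
$$

The network $\epsilon_\theta(x_t, t)$ is then trained to recover $\epsilon$ from $x_t$ and $t$, which is exactly the noise-prediction objective described above.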
### Noise Scheduler
To control the diffusion process, we create a **noise scheduler**. This scheduler defines how much noise is added to the image at each timestep. The model is trained on the MNIST dataset, which was also used in Part 1 of this project.
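As a minimal sketch of such a scheduler (assuming the DDPMScheduler from the diffusers library; shapes are illustrative for MNIST):

```python
import torch
from diffusers import DDPMScheduler

# Linear beta schedule over 1000 timesteps, as in the original DDPM paper.
scheduler = DDPMScheduler(num_train_timesteps=1000, beta_schedule="linear")

x0 = torch.randn(8, 1, 28, 28)           # stand-in for a batch of normalized MNIST images
noise = torch.randn_like(x0)             # Gaussian noise to add
t = torch.randint(0, 1000, (8,))         # a random timestep for each image
xt = scheduler.add_noise(x0, noise, t)   # noised images; noisier for larger t
```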
### Architecture for Diffusion Model
**UNet2DModel (Diffusion Model)**
This UNet is designed for denoising diffusion probabilistic models (DDPMs), which progressively remove noise from images. The key differences from the plain UNet include the following (a configuration sketch follows the list):
- **Time Conditioning**: The time_proj and time_embedding modules encode timesteps, which are crucial for diffusion models to learn the progressive denoising process.
- **ResNet Blocks Instead of Simple Conv Layers**: Each downsampling and upsampling step includes ResnetBlock2D, which uses GroupNorm + SiLU (Swish) activation, making it more robust than standard convolution layers.
- **SiLU (Swish) Activation**: Used instead of LeakyReLU/ReLU, offering smoother gradients.
- **GroupNorm Instead of BatchNorm**: More stable for diffusion-based models.
### Training the Model
We will train the diffusion model on the MNIST dataset using the **diffusers** library, which provides tools for training and using diffusion models. We will compare the results of training for different epochs and assess the quality of the generated images.
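A condensed sketch of the training loop, reusing the model and scheduler from the sketches above (batch size, learning rate, and normalization are illustrative assumptions):

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Normalize MNIST to [-1, 1], the range diffusion models conventionally use.
tfm = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
loader = DataLoader(datasets.MNIST("data", train=True, download=True, transform=tfm),
                    batch_size=128, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)                               # the UNet2DModel defined above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for x, _ in loader:
    x = x.to(device)
    noise = torch.randn_like(x)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (x.shape[0],), device=device)
    noisy = scheduler.add_noise(x, noise, t)   # forward diffusion (closed form)
    pred = model(noisy, t).sample              # network predicts the added noise
    loss = F.mse_loss(pred, noise)             # standard DDPM training objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```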
**Bonus**: Try also training the plain **UNet** by feeding it the timestep embedding together with the noised image.
This UNet, originally an image-to-image translation architecture, is used here to predict the noise in an image from the noised image and its corresponding timestep. Key characteristics are:
- **Encoder-Decoder Structure**: Uses downsampling (down1 to down7) with Conv2D + BatchNorm + LeakyReLU layers and upsampling (up7 to up1) with ConvTranspose2D + BatchNorm + ReLU.
- **Skip Connections**: Each downsampling layer has a corresponding upsampling layer that concatenates feature maps (e.g., up6 receives outputs from down6).
- **Dropout in Some Layers**: Helps regularize training.
- **LeakyReLU Activation in Downsampling**: Helps with learning stable representations.
- **No Built-in Time Embedding**: Unlike UNet2DModel, this architecture has no native timestep conditioning, which is why the bonus setup feeds the timestep in alongside the noised image (one possible scheme is sketched below).
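One simple (and admittedly crude) way to sketch this: encode the timestep as an extra image-sized input channel for the plain UNet. The timestep_channel helper below is hypothetical, and the UNet's first convolution is assumed to be widened to accept two input channels:

```python
import math
import torch

def timestep_channel(t, height=28, width=28):
    # Hypothetical helper: map each integer timestep to a scalar in [-1, 1]
    # and broadcast it into an extra image-sized channel.
    emb = torch.sin(t.float() * math.pi / 1000.0)
    return emb.view(-1, 1, 1, 1).expand(-1, 1, height, width)

# noisy: (B, 1, 28, 28) noised batch;  t: (B,) integer timesteps
x_in = torch.cat([noisy, timestep_channel(t)], dim=1)  # (B, 2, 28, 28)
pred_noise = unet(x_in)  # plain UNet whose first conv takes 2 input channels
```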
### Comparison of the UNet Architecture and the UNet2DModel (Diffusion Model)
### Results
Here are the visual results from:
**Diffusion U-Net2D**


**UNet Model**


### Comparison of Noise Prediction Models for Image Denoising
This section compares the diffusion UNet2DModel and the plain UNet Model in terms of their denoising performance. Both leverage the UNet architecture, but they differ significantly in approach.
**UNet Diffusion Model**: This model operates iteratively, gradually denoising the image over many steps. It produces high-quality results, but the many network passes required at inference time make it computationally expensive and slow for real-time applications.
**UNet Model**: In contrast, the UNet Model operates in a single step, using standard image-to-image translation techniques to produce a denoised image. This allows much faster inference, making it better suited to real-time applications. However, it struggles to predict the noise accurately, leading to incomplete denoising: the outputs often retain visible noise and can be unrecognizable.
In short, the **UNet2DModel (Diffusion Model)** excels in denoising quality because its iterative process allows more accurate noise removal, at the cost of heavy computation. The **UNet Model** is far faster, but its one-shot noise estimate is too coarse, leaving the images partially distorted with visible artifacts. This explains the difference in the results above.
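The speed gap is easiest to see at inference time. A sketch, reusing the model and scheduler from the training sketch above (unet_single_pass is a hypothetical stand-in for the trained plain UNet):

```python
import torch

# Diffusion UNet2DModel: iterative sampling, one network call per timestep.
scheduler.set_timesteps(1000)
x = torch.randn(16, 1, 28, 28, device=device)      # start from pure noise
for t in scheduler.timesteps:
    with torch.no_grad():
        eps = model(x, t).sample                   # predict the noise at this step
    x = scheduler.step(eps, t, x).prev_sample      # remove a little of it

# Plain UNet: a single forward pass per batch -- fast, but as noted above its
# one-shot noise estimate is often too coarse to fully clean the image.
with torch.no_grad():
    denoised = unet_single_pass(noisy_batch)       # hypothetical one-step model
```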
### Conclusion
In this section, we outlined the architecture and training process for a diffusion model built on UNet2DModel. This model is trained to perform image denoising, progressively refining noisy images into clean ones. We compared it with the plain UNet, highlighting the key differences and how each is tailored to its task.
By leveraging diffusion models, we aim to generate highly detailed and diverse images that surpass traditional GANs, especially on noisy data and image-to-image tasks. We will continue training the model and assessing its performance in the following steps.