Commit 05f5366a authored by Delorme Antonin

Delete ReadMe.md

parent dd15a89c
# Image classification
Legrand Frédéric - Project for the Deep Learning Course at ECL
# Usage
To download and run the project, open a terminal in the desired folder and follow these steps:
- Download the project
- Enter the project folder
- Add the data folder to the project folder (manual step, not covered by the commands below)
- Install the necessary dependencies
- Enter the tests folder
- Check that the code is functional with pytest (all 8 tests should pass)
```
git clone https://gitlab.ec-lyon.fr/flegrand/image-classification.git
cd image-classification
pip install -r requirements.txt
cd tests
pytest
```
If all tests are successful, the project has all the necessary dependencies and everything should run normally.
# Description
This section of the readme goes through all the code and explains it.
## Prepare the CIFAR dataset
All the code for this section is in the read_cifar.py file.
Unit tests are in tests/test_read_cifar.py.
Running pytest in the tests folder checks that the functions behave as expected.
### 1-
The data folder contains the cifar-10-batches-py folder with the data used for the image classification.
We focus on the CIFAR-10 image dataset.
### 2-
read_cifar_batch takes a file path and returns the corresponding data and labels.
The data is converted to float32 so that it can be used in floating-point numpy computations.
Pixel values are integers between 0 and 255 and labels are integers between 0 and 9, so the labels are stored as int64.
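```python
import pickle
import numpy as np

def read_cifar_batch(filePath: str):
    with open(filePath, 'rb') as fo:
        batch = pickle.load(fo, encoding='bytes')  # renamed from `dict` to avoid shadowing the builtin
    # float32 for the pixel data and int64 for the labels, as described above
    return (np.array(batch[b'data']).astype('float32'), np.array(batch[b'labels']).astype('int64'))
```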
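As a quick sanity check, the sketch below writes a tiny fake batch in the same pickle layout (byte-string keys `b'data'` and `b'labels'`) and reads it back. The helper `write_fake_batch` and the batch size of 4 are assumptions made for illustration, and `read_cifar_batch` is restated only so that the block runs on its own; this is not the project's test code.

```python
import os
import pickle
import tempfile

import numpy as np


def read_cifar_batch(file_path: str):
    # Restated from the section above so this sketch is self-contained.
    with open(file_path, 'rb') as fo:
        batch = pickle.load(fo, encoding='bytes')
    data = np.array(batch[b'data']).astype('float32')
    labels = np.array(batch[b'labels']).astype('int64')
    return data, labels


def write_fake_batch(file_path: str, num_images: int = 4):
    # Hypothetical helper: mimics the CIFAR-10 layout (3072 uint8 values per image).
    rng = np.random.default_rng(0)
    batch = {
        b'data': rng.integers(0, 256, size=(num_images, 3072), dtype=np.uint8),
        b'labels': [int(x) for x in rng.integers(0, 10, size=num_images)],
    }
    with open(file_path, 'wb') as fo:
        pickle.dump(batch, fo)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'fake_batch')
    write_fake_batch(path)
    data, labels = read_cifar_batch(path)

print(data.shape, data.dtype)      # one row of 3072 values per image
print(labels.shape, labels.dtype)  # one label per image
```

Each row holds 3072 values because a CIFAR-10 image is 32 x 32 pixels with 3 colour channels.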
### 3-
read_cifar takes a folder path and returns the data and labels of all six batches (the test batch and the five data batches) by calling the previous read_cifar_batch function on each file.
```python
def read_cifar(folderPath: str):
    # Start with the test batch, then append data_batch_1 to data_batch_5
    data, labels = read_cifar_batch(folderPath+"/test_batch")
    for i in range(1,6):
        tempData, tempLabels = read_cifar_batch(folderPath+"/data_batch_"+str(i))
        labels = np.concatenate((labels,tempLabels), axis=0)
        data = np.concatenate((data, tempData), axis=0)
    return data, labels
```
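In the standard CIFAR-10 python archive each of the six batch files holds 10,000 images, so the result should have 60,000 rows, with the test batch first. The sketch below checks the same concatenation order on a temporary folder of tiny fake batches. `make_fake_folder` and the batch size of 3 are assumptions for illustration, both reader functions are restated so the block runs on its own, and `os.path.join` is used here in place of string concatenation.

```python
import os
import pickle
import tempfile

import numpy as np


def read_cifar_batch(file_path: str):
    with open(file_path, 'rb') as fo:
        batch = pickle.load(fo, encoding='bytes')
    return (np.array(batch[b'data']).astype('float32'),
            np.array(batch[b'labels']).astype('int64'))


def read_cifar(folder_path: str):
    # Test batch first, then data_batch_1 .. data_batch_5, as in the code above.
    data, labels = read_cifar_batch(os.path.join(folder_path, 'test_batch'))
    for i in range(1, 6):
        temp_data, temp_labels = read_cifar_batch(
            os.path.join(folder_path, f'data_batch_{i}'))
        data = np.concatenate((data, temp_data), axis=0)
        labels = np.concatenate((labels, temp_labels), axis=0)
    return data, labels


def make_fake_folder(folder_path: str, per_batch: int = 3):
    # Hypothetical helper: every image in a batch is labelled with the batch
    # index (0 for test_batch, i for data_batch_i) so the order is visible.
    names = ['test_batch'] + [f'data_batch_{i}' for i in range(1, 6)]
    for index, name in enumerate(names):
        batch = {
            b'data': np.full((per_batch, 3072), index, dtype=np.uint8),
            b'labels': [index] * per_batch,
        }
        with open(os.path.join(folder_path, name), 'wb') as fo:
            pickle.dump(batch, fo)


with tempfile.TemporaryDirectory() as tmp:
    make_fake_folder(tmp)
    data, labels = read_cifar(tmp)

print(data.shape)  # (18, 3072)
print(labels)      # [0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5]
```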
### 4-
split_dataset takes a numpy array of data, the corresponding labels and a split ratio strictly between 0 and 1.
The function draws a random permutation of the sample indices with np.random.shuffle.
The data and the labels are then reordered with the same permutation, so each sample keeps its label.
The first split * num_samples shuffled samples (rounded down) form the training set and the remaining ones form the test set.
```python
def split_dataset(data: np.ndarray, labels: list, split: float):
    if not 0 < split < 1:
        raise ValueError('Split is not a float between 0 and 1')
    # Generate random indices for shuffling the data
    num_samples = len(labels)
    indices = np.arange(num_samples)
    np.random.shuffle(indices)
    # Shuffle the data and labels using the random indices
    shuffled_data = data[indices]
    shuffled_labels = np.array(labels)[indices]
    # Calculate the split index based on the given split ratio
    split_index = int(split * num_samples)
    # Split the data and labels into training and testing sets
    data_train, data_test = shuffled_data[:split_index], shuffled_data[split_index:]
    labels_train, labels_test = shuffled_labels[:split_index], shuffled_labels[split_index:]
    return data_train, data_test, labels_train, labels_test
```
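A small usage sketch may help show what the function returns. The toy data, the fixed seed and the compact restatement of `split_dataset` below are assumptions made so that the example is repeatable and runs on its own; the logic follows the steps described above.

```python
import numpy as np


def split_dataset(data: np.ndarray, labels: np.ndarray, split: float):
    # Compact restatement of the function above, for a self-contained example.
    if not 0 < split < 1:
        raise ValueError('Split is not a float between 0 and 1')
    indices = np.arange(len(labels))
    np.random.shuffle(indices)
    split_index = int(split * len(labels))
    train_idx, test_idx = indices[:split_index], indices[split_index:]
    labels = np.array(labels)
    return data[train_idx], data[test_idx], labels[train_idx], labels[test_idx]


np.random.seed(0)  # assumption: fixed seed, only to make the example repeatable
labels = np.arange(10)
data = np.stack((2 * labels, 2 * labels + 1), axis=1)  # row i is [2i, 2i + 1]

data_train, data_test, labels_train, labels_test = split_dataset(data, labels, 0.8)

print(data_train.shape, data_test.shape)  # (8, 2) (2, 2)
# Each row still sits next to its own label after shuffling
print(np.array_equal(data_train[:, 0], 2 * labels_train))  # True
```

Because data and labels are indexed with the same shuffled indices, the pairing between a sample and its label is preserved in both subsets.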
## k-nearest neighbors
All the code for this section is in the knn.py file.
Functions have unit tests in the tests folder.
Running pytest in the tests folder checks that the functions behave as expected.
### 1-
distance_matrix takes two matrices in which each row represents an image.
It returns the Euclidean (L2) distance from each image in the first set to each image in the second set.
The result is therefore a numpy array of shape (number_of_items_in_the_first_set, number_of_items_in_the_second_set).
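The computation relies on expanding the squared norm, which lets all pairs be handled with matrix operations. Writing $a_i$ for row $i$ of the first matrix and $b_j$ for row $j$ of the second (symbols introduced here only for illustration), entry $(i, j)$ of the result is:

```math
d_{ij} = \lVert a_i - b_j \rVert_2
       = \sqrt{\lVert a_i \rVert_2^2 + \lVert b_j \rVert_2^2 - 2\, a_i \cdot b_j}
```

The two squared norms correspond to the row-wise sums of squares, and the term $a_i \cdot b_j$ is entry $(i, j)$ of the product of the first matrix with the transpose of the second.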
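```python
def distance_matrix(matrix1 : np.ndarray, matrix2 : np.ndarray):
    # Squared L2 norm of every row, kept as column vectors for broadcasting
    sum_of_squares_matrix1 = np.sum(np.square(matrix1), axis=1, keepdims=True)
    sum_of_squares_matrix2 = np.sum(np.square(matrix2), axis=1, keepdims=True)
    dot_product = np.dot(matrix1, matrix2.T)
    # Clamp at 0: rounding can make a squared distance slightly negative, and np.sqrt would then return NaN
    squared_dists = np.maximum(sum_of_squares_matrix1 + sum_of_squares_matrix2.T - 2 * dot_product, 0)
    dists = np.sqrt(squared_dists)
    return dists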
```