From 05f5366a44f3d9a909586035e3e00a429b9a2427 Mon Sep 17 00:00:00 2001
From: Delorme Antonin <antonin.delorme@etu.ec-lyon.fr>
Date: Fri, 10 Nov 2023 19:38:05 +0000
Subject: [PATCH] Delete ReadMe.md

---
 ReadMe.md | 121 ------------------------------------------------------
 1 file changed, 121 deletions(-)
 delete mode 100644 ReadMe.md

diff --git a/ReadMe.md b/ReadMe.md
deleted file mode 100644
index 6889b26..0000000
--- a/ReadMe.md
+++ /dev/null
@@ -1,121 +0,0 @@
# Image classification

Legrand Frédéric - Project for the Deep Learning Course at ECL

# Usage

To download and run the project, open a terminal in the desired folder and run the commands below. They will:
- Download the project
- Enter the project folder
- Install the necessary dependencies
- Enter the tests folder
- Check that the code is functional with pytest (all 8 tests should pass)

Note that the data folder (containing the CIFAR batches) must be added to the project folder manually before running the tests.

```
git clone https://gitlab.ec-lyon.fr/flegrand/image-classification.git
cd image-classification
pip install -r requirements.txt
cd tests
pytest
```

If all tests are successful, the project has all the necessary dependencies and everything should run normally.

# Description

This section of the readme goes through all the code and explains it.

## Prepare the CIFAR dataset

All the code for this section is in the read_cifar.py file.
Unit tests have been created in tests/test_read_cifar.py.
Running pytest in the tests folder will verify that the algorithms work as expected.

### 1-

The data folder contains the cifar-10-batches-py folder with the relevant data for the image classification.
We focus on the CIFAR image dataset.

### 2-

read_cifar_batch takes in a file path and returns the corresponding data and labels.
We import the data as float64 values so that we can use numpy operations on it.
Since data values are between 0 and 255 and labels between 0 and 9, the labels can safely be imported as int64.
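Before looking at the function itself, the batch-file layout can be illustrated with a small synthetic stand-in (hypothetical values — a real batch holds 10000 rows of 3072 pixel bytes under the same b'data'/b'labels' keys):

```python
import os
import pickle
import tempfile

import numpy as np

# Synthetic stand-in for a CIFAR batch file: same dict layout,
# but only 4 tiny "images" of 3 values each.
fake_batch = {b'data': [[0, 128, 255], [1, 2, 3], [10, 20, 30], [40, 50, 60]],
              b'labels': [0, 9, 3, 7]}

with tempfile.NamedTemporaryFile(suffix='.pkl', delete=False) as f:
    pickle.dump(fake_batch, f)
    path = f.name

# Same unpickle-and-cast pattern as read_cifar_batch below.
with open(path, 'rb') as fo:
    d = pickle.load(fo, encoding='bytes')
data = np.array(d[b'data']).astype('float64')
labels = np.array(d[b'labels']).astype('int64')
os.unlink(path)

print(data.dtype, labels.dtype, data.shape)  # float64 int64 (4, 3)
```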

```python
import pickle

import numpy as np


def read_cifar_batch(filePath: str):
    # Unpickle one CIFAR batch and return (data, labels) as numpy arrays.
    with open(filePath, 'rb') as fo:
        batch = pickle.load(fo, encoding='bytes')
    return (np.array(batch[b'data']).astype('float64'),
            np.array(batch[b'labels']).astype('int64'))
```

### 3-

read_cifar takes in a folder path and returns the corresponding data and labels by making use of the previous read_cifar_batch function.

```python
def read_cifar(folderPath: str):
    # Start with the test batch, then append the five training batches.
    data, labels = read_cifar_batch(folderPath + "/test_batch")
    for i in range(1, 6):
        tempData, tempLabels = read_cifar_batch(folderPath + "/data_batch_" + str(i))
        labels = np.concatenate((labels, tempLabels), axis=0)
        data = np.concatenate((data, tempData), axis=0)
    return data, labels
```

### 4-

split_dataset takes in a numpy array that represents the data, the corresponding labels and a split percentage.

The function draws a random permutation of the indices with np.random.shuffle, then reorders data and labels in the same way.

The first split_index samples form the training set and the remainder the test set, so that the training set matches the requested split percentage.

```python
def split_dataset(data: np.ndarray, labels: list, split: float):
    if not 0 < split < 1:
        raise ValueError('Split is not a float between 0 and 1')

    # Generate random indices for shuffling the data
    num_samples = len(labels)
    indices = np.arange(num_samples)
    np.random.shuffle(indices)

    # Shuffle the data and labels using the random indices
    shuffled_data = data[indices]
    shuffled_labels = np.array(labels)[indices]

    # Calculate the split index based on the given split ratio
    split_index = int(split * num_samples)

    # Split the data and labels into training and testing sets
    data_train, data_test = shuffled_data[:split_index], shuffled_data[split_index:]
    labels_train, labels_test = shuffled_labels[:split_index], shuffled_labels[split_index:]

    return data_train, data_test, labels_train, labels_test
```

## k-nearest neighbors

All the code for this section is in the knn.py file.
Functions have unit tests in the tests folder.
Running pytest in the tests folder will verify that the algorithms work as expected.

### 1-

distance_matrix takes in two matrices which represent sets of pictures.
It returns the L2 (Euclidean) distance from each image in the first set to each image in the second set.

Because of this the result is a numpy array of size: number_of_items_in_the_first_set * number_of_items_in_the_second_set

```python
def distance_matrix(matrix1: np.ndarray, matrix2: np.ndarray):
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs at once
    sum_of_squares_matrix1 = np.sum(np.square(matrix1), axis=1, keepdims=True)
    sum_of_squares_matrix2 = np.sum(np.square(matrix2), axis=1, keepdims=True)

    dot_product = np.dot(matrix1, matrix2.T)

    dists = np.sqrt(sum_of_squares_matrix1 + sum_of_squares_matrix2.T - 2 * dot_product)
    return dists
```
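As a quick sanity check (a sketch with small random matrices; the shapes and seed are arbitrary choices, not from the project), the vectorised identity used above can be compared against a naive per-pair norm:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 5))  # 3 "images" of 5 features each
B = rng.random((4, 5))  # 4 "images"

# Vectorised all-pairs distances, same identity as distance_matrix
s1 = np.sum(np.square(A), axis=1, keepdims=True)  # shape (3, 1)
s2 = np.sum(np.square(B), axis=1, keepdims=True)  # shape (4, 1)
dists = np.sqrt(s1 + s2.T - 2 * A @ B.T)          # shape (3, 4)

# Naive double loop over every (a, b) pair for comparison
naive = np.array([[np.linalg.norm(a - b) for b in B] for a in A])

print(dists.shape, np.allclose(dists, naive))  # (3, 4) True
```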