Delorme Antonin / IA_Image classsification_TD1 — commit 05f5366a, authored 1 year ago by Delorme Antonin (parent dd15a89c): Delete ReadMe.md — 1 changed file, 0 additions, 121 deletions.
# Image classification
Legrand Frédéric - Project for the Deep Learning Course at ECL
# Usage
To download and run the project, open a terminal in the desired folder and run the commands below. They will:
- Download the project
- Enter the project folder
- Add the data folder to the project folder
- Install the necessary dependencies
- Enter the tests folder
- Check that the code is functional with pytest (all 8 tests should pass)
```
git clone https://gitlab.ec-lyon.fr/flegrand/image-classification.git
cd image-classification
pip install -r requirements.txt
cd tests
pytest
```
If all tests are successful, the project has all the necessary dependencies and everything should run normally.
# Description
This section of the readme goes through all the code and explains it.
## Prepare the CIFAR dataset
All the code for this section is in the read_cifar.py file.
Unit tests have been created in tests/test_read_cifar.py
Running pytest in tests folder will provide proof that algorithms work as expected.
### 1-
The data folder contains the cifar-10-batches-py folder with the relevant data for image classification.
We focus on the CIFAR-10 image dataset.
### 2-
read_cifar_batch takes in a file path and returns the corresponding data and labels.
We import the data as float64 values so that we can use numpy operations on it.
Since the data values range from 0 to 255 and the labels from 0 to 9, we can import the labels as int64.
```
import pickle
import numpy as np

def read_cifar_batch(filePath: str):
    with open(filePath, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return (np.array(dict[b'data']).astype('float64'),
            np.array(dict[b'labels']).astype('int64'))
```
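A quick way to check read_cifar_batch without downloading the real dataset is to pickle a small synthetic batch with the same b'data'/b'labels' layout (the batch size and label values below are invented for this sketch):

```python
import pickle
import tempfile
import numpy as np

def read_cifar_batch(filePath: str):
    # Load one pickled CIFAR batch and return (data, labels) as numpy arrays.
    with open(filePath, 'rb') as fo:
        batch = pickle.load(fo, encoding='bytes')
    return (np.array(batch[b'data']).astype('float64'),
            np.array(batch[b'labels']).astype('int64'))

# Synthetic stand-in for a CIFAR batch: 4 images of 3072 bytes each.
fake_batch = {
    b'data': np.random.randint(0, 256, size=(4, 3072), dtype=np.uint8),
    b'labels': [0, 3, 7, 9],
}
with tempfile.NamedTemporaryFile(suffix='.pkl', delete=False) as f:
    pickle.dump(fake_batch, f)
    path = f.name

data, labels = read_cifar_batch(path)
print(data.shape, data.dtype, labels.dtype)   # (4, 3072) float64 int64
```

On a real batch file, data has shape (10000, 3072) and labels has shape (10000,).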
### 3-
read_cifar takes in a folder path and returns the corresponding data and labels by making use of the previous read_cifar_batch function.
```
def read_cifar(folderPath: str):
    data, labels = read_cifar_batch(folderPath + "/test_batch")
    for i in range(1, 6):
        tempData, tempLabels = read_cifar_batch(folderPath + "/data_batch_" + str(i))
        labels = np.concatenate((labels, tempLabels), axis=0)
        data = np.concatenate((data, tempData), axis=0)
    return data, labels
```
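The concatenation logic can be exercised on a toy folder that mimics the expected file names (test_batch, data_batch_1..5); the two-image batch size here is made up for the sketch:

```python
import os
import pickle
import tempfile
import numpy as np

def read_cifar_batch(filePath: str):
    with open(filePath, 'rb') as fo:
        batch = pickle.load(fo, encoding='bytes')
    return (np.array(batch[b'data']).astype('float64'),
            np.array(batch[b'labels']).astype('int64'))

def read_cifar(folderPath: str):
    # Start with the test batch, then append the five training batches.
    data, labels = read_cifar_batch(folderPath + "/test_batch")
    for i in range(1, 6):
        tempData, tempLabels = read_cifar_batch(folderPath + "/data_batch_" + str(i))
        labels = np.concatenate((labels, tempLabels), axis=0)
        data = np.concatenate((data, tempData), axis=0)
    return data, labels

# Build a toy folder with the expected file names, 2 images per batch.
folder = tempfile.mkdtemp()
for name in ["test_batch"] + ["data_batch_" + str(i) for i in range(1, 6)]:
    batch = {b'data': np.zeros((2, 3072), dtype=np.uint8), b'labels': [0, 1]}
    with open(os.path.join(folder, name), 'wb') as f:
        pickle.dump(batch, f)

data, labels = read_cifar(folder)
print(data.shape)   # (12, 3072): 6 batches of 2 images each
```

With the real CIFAR-10 folder, the result is the full 60000-image dataset.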
### 4-
split_dataset takes in a numpy array representing the data, the corresponding labels, and a split percentage.
The function draws a random permutation of the indices with np.random.shuffle, then applies the same permutation to both data and labels so that each sample stays paired with its label.
The first split fraction of the shuffled samples forms the training set; the remaining samples form the test set.
```
def split_dataset(data: np.array, labels: list, split: float):
    if not 0 < split < 1:
        raise ValueError('Split is not a float between 0 and 1')
    # Generate random indices for shuffling the data
    num_samples = len(labels)
    indices = np.arange(num_samples)
    np.random.shuffle(indices)
    # Shuffle the data and labels using the random indices
    shuffled_data = data[indices]
    shuffled_labels = np.array(labels)[indices]
    # Calculate the split index based on the given split ratio
    split_index = int(split * num_samples)
    # Split the data and labels into training and testing sets
    data_train, data_test = shuffled_data[:split_index], shuffled_data[split_index:]
    labels_train, labels_test = shuffled_labels[:split_index], shuffled_labels[split_index:]
    return data_train, data_test, labels_train, labels_test
```
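The shared-permutation trick can be sanity-checked on toy data: applying the same index permutation to both arrays keeps every row paired with its label, and an 80/20 split of 10 samples yields 8 training and 2 test samples (the arrays below are invented for the sketch):

```python
import numpy as np

# Toy dataset where row i is [i, i] and its label is i,
# so the pairing is easy to verify after shuffling.
data = np.stack([np.arange(10), np.arange(10)], axis=1)
labels = np.arange(10)

indices = np.arange(len(labels))
np.random.shuffle(indices)

# Apply the SAME permutation to both arrays.
shuffled_data = data[indices]
shuffled_labels = labels[indices]

split_index = int(0.8 * len(labels))
data_train, data_test = shuffled_data[:split_index], shuffled_data[split_index:]
labels_train, labels_test = shuffled_labels[:split_index], shuffled_labels[split_index:]

print(len(data_train), len(data_test))   # 8 2
# Every row still matches its label after the shared permutation.
print(all(row[0] == lab for row, lab in zip(shuffled_data, shuffled_labels)))  # True
```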
## k-nearest neighbors
All the code for this section is in the knn.py file.
Functions have unit tests in the tests folder
Running pytest in tests folder will provide proof that algorithms work as expected.
### 1-
distance_matrix takes in two matrices, each row of which represents an image.
It returns the L2 (Euclidean) distance from each image in the first set to each image in the second set.
The result is therefore a numpy array of shape number_of_items_in_the_first_set × number_of_items_in_the_second_set.
```
def distance_matrix(matrix1: np.array, matrix2: np.array):
    sum_of_squares_matrix1 = np.sum(np.square(matrix1), axis=1, keepdims=True)
    sum_of_squares_matrix2 = np.sum(np.square(matrix2), axis=1, keepdims=True)
    dot_product = np.dot(matrix1, matrix2.T)
    dists = np.sqrt(sum_of_squares_matrix1 + sum_of_squares_matrix2.T - 2 * dot_product)
    return dists
```
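distance_matrix relies on the expansion ||a − b||² = ||a||² + ||b||² − 2·a·b. A small check against a naive double loop (random matrices and sizes invented for this sketch) confirms the vectorized version agrees:

```python
import numpy as np

def distance_matrix(matrix1: np.array, matrix2: np.array):
    # Per-row squared norms, kept 2-D so broadcasting builds the full grid.
    sum_of_squares_matrix1 = np.sum(np.square(matrix1), axis=1, keepdims=True)
    sum_of_squares_matrix2 = np.sum(np.square(matrix2), axis=1, keepdims=True)
    dot_product = np.dot(matrix1, matrix2.T)
    return np.sqrt(sum_of_squares_matrix1 + sum_of_squares_matrix2.T - 2 * dot_product)

rng = np.random.default_rng(0)
a = rng.random((5, 64))
b = rng.random((7, 64))

fast = distance_matrix(a, b)
# Naive reference: one norm per (image, image) pair.
naive = np.array([[np.linalg.norm(x - y) for y in b] for x in a])

print(fast.shape)                 # (5, 7)
print(np.allclose(fast, naive))   # True
```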