Instead of the MSE loss, we prefer to use a binary cross-entropy loss. We also want to replace the last activation layer of the network with a softmax layer.
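As a quick illustration of these two changes, here is a hypothetical NumPy sketch (not part of the required API): `z` stands for the last layer's pre-activations and `y_one_hot` for the one-hot encoded labels.

```python
import numpy as np

def softmax(z):
    # Shift by the row-wise max for numerical stability before exponentiating.
    e = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)

def cross_entropy(a2, y_one_hot, eps=1e-12):
    # Mean negative log-likelihood of the true classes over the batch.
    return -np.mean(np.sum(y_one_hot * np.log(a2 + eps), axis=1))
```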
10. Write the function `one_hot` taking an (n)-D array as parameter and returning the corresponding (n+1)-D one-hot matrix.
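A possible NumPy sketch, assuming the input contains integer class labels starting at 0:

```python
import numpy as np

def one_hot(labels):
    labels = np.asarray(labels, dtype=int)
    n_classes = int(labels.max()) + 1
    # Indexing the identity matrix with the labels selects the matching
    # one-hot rows and appends one dimension of size n_classes.
    return np.eye(n_classes)[labels]
```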
11. Write the function `learn_once_cross_entropy` taking as parameters:
- `w1`, `b1`, `w2` and `b2` the weights and biases of the network,
- `data` a matrix of shape (`batch_size` x `d_in`),
- `labels_train` a vector of size `batch_size`, and
...
The function must use a cross-entropy loss, and the last layer of the network must be a softmax. We admit that $`\frac{\partial C}{\partial Z^{(2)}} = A^{(2)} - Y`$, where $`Y`$ is a one-hot vector encoding the label.
The function must return:
- `w1`, `b1`, `w2` and `b2` the updated weights and biases of the network,
- `loss` the loss, for monitoring purposes.
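A minimal sketch of such a step, assuming the same two-layer architecture as in the MSE questions with a sigmoid hidden activation, a softmax output layer, and a `learning_rate` parameter; variable names beyond the listed parameters are illustrative:

```python
import numpy as np

def learn_once_cross_entropy(w1, b1, w2, b2, data, labels_train, learning_rate):
    n = data.shape[0]
    labels = np.asarray(labels_train, dtype=int)
    y = np.eye(w2.shape[1])[labels]                  # one-hot targets (batch_size x d_out)

    # Forward pass: sigmoid hidden layer, softmax output layer.
    z1 = data @ w1 + b1
    a1 = 1.0 / (1.0 + np.exp(-z1))
    z2 = a1 @ w2 + b2
    a2 = np.exp(z2 - z2.max(axis=1, keepdims=True))
    a2 /= a2.sum(axis=1, keepdims=True)

    # Cross-entropy loss, averaged over the batch, for monitoring.
    loss = -np.mean(np.sum(y * np.log(a2 + 1e-12), axis=1))

    # Backward pass, using the given dC/dZ2 = A2 - Y (averaged over the batch).
    dz2 = (a2 - y) / n
    dw2, db2 = a1.T @ dz2, dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ w2.T) * a1 * (1 - a1)               # sigmoid derivative
    dw1, db1 = data.T @ dz1, dz1.sum(axis=0, keepdims=True)

    # One gradient descent step.
    w1, b1 = w1 - learning_rate * dw1, b1 - learning_rate * db1
    w2, b2 = w2 - learning_rate * dw2, b2 - learning_rate * db2
    return w1, b1, w2, b2, loss
```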
12. Write the function `train_mlp` taking as parameters:
- `w1`, `b1`, `w2` and `b2` the weights and biases of the network,
- `data_train` a matrix of shape (`batch_size` x `d_in`),
- `labels_train` a vector of size `batch_size`,
...
that performs `num_epoch` training steps and returns:
- `w1`, `b1`, `w2` and `b2` the updated weights and biases of the network,
- `train_accuracies` the list of train accuracies across epochs as a list of floats.
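One possible shape for this loop, assuming the elided parameters include `learning_rate` and `num_epoch`, that `learn_once_cross_entropy` from question 11 is in scope, and that accuracy is recomputed on the training set after each epoch; `predict` is a hypothetical helper, not part of the required API:

```python
import numpy as np

def predict(w1, b1, w2, b2, data):
    # Forward pass only; softmax is monotonic, so argmax over z2 is sufficient.
    a1 = 1.0 / (1.0 + np.exp(-(data @ w1 + b1)))
    return np.argmax(a1 @ w2 + b2, axis=1)

def train_mlp(w1, b1, w2, b2, data_train, labels_train, learning_rate, num_epoch):
    train_accuracies = []
    for _ in range(num_epoch):
        # One full-batch gradient step per epoch.
        w1, b1, w2, b2, loss = learn_once_cross_entropy(
            w1, b1, w2, b2, data_train, labels_train, learning_rate)
        predictions = predict(w1, b1, w2, b2, data_train)
        train_accuracies.append(float(np.mean(predictions == labels_train)))
    return w1, b1, w2, b2, train_accuracies
```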
13. Write the function `test_mlp` taking as parameters:
- `w1`, `b1`, `w2` and `b2` the weights and biases of the network,
- `data_test` a matrix of shape (`batch_size` x `d_in`), and
- `labels_test` a vector of size `batch_size`,
that tests the network on the test set and returns:
- `test_accuracy` the testing accuracy.
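A possible sketch, under the same architectural assumptions as above (sigmoid hidden layer, softmax output):

```python
import numpy as np

def test_mlp(w1, b1, w2, b2, data_test, labels_test):
    # Forward pass with the trained weights; no parameter update here.
    a1 = 1.0 / (1.0 + np.exp(-(data_test @ w1 + b1)))
    predictions = np.argmax(a1 @ w2 + b2, axis=1)
    # Fraction of test samples whose predicted class matches the label.
    test_accuracy = float(np.mean(predictions == labels_test))
    return test_accuracy
```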
14. Write the function `run_mlp_training` taking as parameters:
- `data_train`, `labels_train`, `data_test`, `labels_test`, the training and testing data,
- `d_h` the number of neurons in the hidden layer,
- `learning_rate` the learning rate, and
- `num_epoch` the number of training epochs,
that trains an MLP classifier and returns the training accuracies across epochs as a list of floats and the final testing accuracy as a float.
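A minimal sketch tying the previous functions together; the weight initialization shown here (small random weights, zero biases) is just one common choice, not something imposed by the statement:

```python
import numpy as np

def run_mlp_training(data_train, labels_train, data_test, labels_test,
                     d_h, learning_rate, num_epoch):
    d_in = data_train.shape[1]
    d_out = int(np.max(labels_train)) + 1            # number of classes

    rng = np.random.default_rng(0)
    w1 = rng.normal(0.0, 0.1, size=(d_in, d_h))      # small random weights
    b1 = np.zeros((1, d_h))
    w2 = rng.normal(0.0, 0.1, size=(d_h, d_out))
    b2 = np.zeros((1, d_out))

    w1, b1, w2, b2, train_accuracies = train_mlp(
        w1, b1, w2, b2, data_train, labels_train, learning_rate, num_epoch)
    test_accuracy = test_mlp(w1, b1, w2, b2, data_test, labels_test)
    return train_accuracies, test_accuracy
```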
15. For `split_factor=0.9`, `d_h=64`, `learning_rate=0.1` and `num_epoch=100`, plot the evolution of learning accuracy across learning epochs. Save the graph as an image named `mlp.png` in the `results` directory.
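A possible way to produce that figure, assuming the train/test arrays have already been obtained from the earlier dataset split with `split_factor=0.9` and that `run_mlp_training` above is in scope:

```python
import os
import matplotlib.pyplot as plt

# data_train, labels_train, data_test, labels_test come from the earlier split (split_factor=0.9).
train_accuracies, test_accuracy = run_mlp_training(
    data_train, labels_train, data_test, labels_test,
    d_h=64, learning_rate=0.1, num_epoch=100)

os.makedirs("results", exist_ok=True)                # create the results directory if needed
plt.plot(range(1, len(train_accuracies) + 1), train_accuracies)
plt.xlabel("Epoch")
plt.ylabel("Training accuracy")
plt.title(f"MLP training accuracy (final test accuracy: {test_accuracy:.3f})")
plt.savefig(os.path.join("results", "mlp.png"))
```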