diff --git a/README.md b/README.md
index 514c68ee37b9888377d2a2e7382b8ba8c0a02f79..69ca44265021726dbe8437a446474a055ec4ac52 100644
--- a/README.md
+++ b/README.md
@@ -177,9 +177,7 @@ We also need that the last activation layer of the network to be a softmax layer
     - `labels_train` a vector of size `batch_size`, and
     - `learning_rate` the learning rate,
 
-    that perform one gradient descent step using a binary cross-entropy loss.
-    We admit that $`\frac{\partial C}{\partial Z^{(2)}} = A^{(2)} - Y`$, where $`Y`$ is a one-hot vector encoding the label.
-    The function must return:
+    that perform one gradient descent step using a binary cross-entropy loss. We admit that $`\frac{\partial C}{\partial Z^{(2)}} = A^{(2)} - Y`$, where $`Y`$ is a one-hot vector encoding the label. The function must return:
     - `w1`, `b1`, `w2` and `b2` the updated weights and biases of the network,
     - `loss` the loss, for monitoring purpose.
 13. Write the function `train_mlp` taking as parameters:
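
For reference, the gradient step described in the README text touched by this hunk could look roughly like the NumPy sketch below. It is only an illustration under several assumptions that the hunk does not confirm: the function name `gradient_step_sketch` and the parameter name `data_train` are hypothetical, the hidden layer is assumed to use a sigmoid activation, biases are assumed to have shape `(1, d)`, and `labels_train` is assumed to hold integer class indices. The loss computed here is the categorical cross-entropy averaged over the batch, since that is the loss for which a softmax output gives $`\frac{\partial C}{\partial Z^{(2)}} = A^{(2)} - Y`$ per sample.

```python
import numpy as np


def softmax(z):
    # Subtract the row-wise maximum before exponentiating for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def one_hot(labels, n_classes):
    # Turn integer class indices into one-hot rows (one row per sample).
    y = np.zeros((labels.shape[0], n_classes))
    y[np.arange(labels.shape[0]), labels] = 1.0
    return y


def gradient_step_sketch(w1, b1, w2, b2, data_train, labels_train, learning_rate):
    """One gradient descent step for a 2-layer MLP (hypothetical helper).

    Assumed shapes: data_train (batch_size, d_in), w1 (d_in, d_h), b1 (1, d_h),
    w2 (d_h, d_out), b2 (1, d_out), labels_train (batch_size,) of integers.
    """
    batch_size = data_train.shape[0]

    # Forward pass.
    a0 = data_train
    z1 = a0 @ w1 + b1
    a1 = 1.0 / (1.0 + np.exp(-z1))  # assumption: sigmoid hidden activation
    z2 = a1 @ w2 + b2
    a2 = softmax(z2)

    # Cross-entropy against the one-hot targets, averaged over the batch.
    y = one_hot(labels_train, w2.shape[1])
    loss = -np.mean(np.sum(y * np.log(a2 + 1e-12), axis=1))

    # Backward pass, starting from dC/dZ2 = A2 - Y (divided by the batch size
    # because the loss is a batch average).
    dz2 = (a2 - y) / batch_size
    dw2 = a1.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    da1 = dz2 @ w2.T
    dz1 = da1 * a1 * (1.0 - a1)  # derivative of the sigmoid
    dw1 = a0.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # Gradient descent update; the input arrays are not modified in place.
    w1 = w1 - learning_rate * dw1
    b1 = b1 - learning_rate * db1
    w2 = w2 - learning_rate * dw2
    b2 = b2 - learning_rate * db2

    return w1, b1, w2, b2, loss
```

The returned `loss` is the value before the update, which is what one would typically log for monitoring when calling the step repeatedly from a training loop.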