First of all, let's focus on the backpropagation of the gradient with an example.
The weight matrix of the layer $`L`$ is denoted $`W^{(L)}`$. The bias vector of the layer $`L`$ is denoted $`B^{(L)}`$. We choose the sigmoid function, denoted $`\sigma`$, as the activation function. The output vector of the layer $`L`$ before activation is denoted $`Z^{(L)}`$. The output vector of the layer $`L`$ after activation is denoted $`A^{(L)}`$. By convention, we denote the network input vector by $`A^{(0)}`$. Thus $`Z^{(L+1)} = W^{(L+1)}A^{(L)} + B^{(L+1)}`$ and $`A^{(L+1)} = \sigma\left(Z^{(L+1)}\right)`$. Let's consider a network with one hidden layer. Thus, the output is $`\hat{Y} = A^{(2)}`$.
Let $`Y`$ be the target (desired output). We use mean squared error (MSE) as the cost function. Thus, the cost is $`C = \frac{1}{N_{out}}\sum_{i=1}^{N_{out}} (\hat{y_i} - y_i)^2`$.
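
As a concrete illustration of these notations, here is a minimal NumPy sketch of the forward pass and the MSE cost for a network with one hidden layer. The layer sizes, variable names and random initialisation below are assumptions made for the example, not part of the lab code:

```python
import numpy as np

def sigma(x):
    # Sigmoid activation function
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 4, 3, 2           # arbitrary sizes, for illustration only
A0 = rng.normal(size=(n_in, 1))           # network input, A^(0)
Y = rng.normal(size=(n_out, 1))           # target (desired output)

W1 = rng.normal(size=(n_hidden, n_in));  B1 = np.zeros((n_hidden, 1))
W2 = rng.normal(size=(n_out, n_hidden)); B2 = np.zeros((n_out, 1))

Z1 = W1 @ A0 + B1                         # Z^(1) = W^(1) A^(0) + B^(1)
A1 = sigma(Z1)                            # A^(1) = sigma(Z^(1))
Z2 = W2 @ A1 + B2                         # Z^(2) = W^(2) A^(1) + B^(2)
A2 = sigma(Z2)                            # Y_hat = A^(2)

C = np.mean((A2 - Y) ** 2)                # MSE cost
```
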
1. We admit that for the sigmoid function $`\sigma`$, the derivative is $`\sigma' = \sigma \times (1-\sigma)`$.
2. Express $`\frac{\partial C}{\partial A^{(2)}}`$, i.e. the vector of $`\frac{\partial C}{\partial a^{(2)}_i}`$ as a function of $`A^{(2)}`$ and $`Y`$.
3. Using the chain rule, express $`\frac{\partial C}{\partial Z^{(2)}}`$, i.e. the vector of $`\frac{\partial C}{\partial z^{(2)}_i}`$ as a function of $`\frac{\partial C}{\partial A^{(2)}}`$ and $`A^{(2)}`$.
4. Similarly, express $`\frac{\partial C}{\partial W^{(2)}}`$, i.e. the matrix of $`\frac{\partial C}{\partial w^{(2)}_{i,j}}`$ as a function of $`\frac{\partial C}{\partial Z^{(2)}}`$ and $`A^{(1)}`$.
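
Once you have derived the expressions in questions 2 to 4, one way to validate them is a numerical gradient check: perturb each weight, recompute the cost, and compare the finite-difference estimate with your analytic $`\frac{\partial C}{\partial W^{(2)}}`$. Below is a minimal, self-contained sketch of such a check; the helper names, sizes and `eps` value are assumptions made for the illustration:

```python
import numpy as np

def sigma(x):
    # Sigmoid activation function
    return 1.0 / (1.0 + np.exp(-x))

def cost(W2, W1, B1, B2, A0, Y):
    # Forward pass followed by the MSE cost of this section
    A1 = sigma(W1 @ A0 + B1)
    A2 = sigma(W2 @ A1 + B2)
    return np.mean((A2 - Y) ** 2)

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 3, 2
A0, Y = rng.normal(size=(n_in, 1)), rng.normal(size=(n_out, 1))
W1, B1 = rng.normal(size=(n_hidden, n_in)), np.zeros((n_hidden, 1))
W2, B2 = rng.normal(size=(n_out, n_hidden)), np.zeros((n_out, 1))

eps = 1e-6
dC_dW2_num = np.zeros_like(W2)
for i in range(W2.shape[0]):
    for j in range(W2.shape[1]):
        Wp, Wm = W2.copy(), W2.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        # Central finite difference approximates dC/dw^(2)_{i,j}
        dC_dW2_num[i, j] = (cost(Wp, W1, B1, B2, A0, Y)
                            - cost(Wm, W1, B1, B2, A0, Y)) / (2 * eps)

# dC_dW2_num can then be compared (e.g. with np.allclose) to the analytic
# matrix dC/dW^(2) obtained in question 4.
```
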