Let's consider a network with a hidden layer.
The weight matrix of layer $L$ is denoted $W^{(L)}$. The bias vector of layer $L$ is denoted $B^{(L)}$. We choose the sigmoid function, denoted $\sigma$, as the activation function. The output vector of layer $L$ before activation is denoted $Z^{(L)}$. The output vector of layer $L$ after activation is denoted $A^{(L)}$. By convention, $A^{(0)}$ denotes the network input vector. Thus $Z^{(L+1)} = W^{(L+1)}A^{(L)} + B^{(L+1)}$ and $A^{(L+1)} = \sigma\left(Z^{(L+1)}\right)$. In our example, the output is $\hat{Y} = A^{(2)}$.
Let $Y$ be the desired output. We use mean squared error (MSE) as the cost function. Thus, the cost is $C = \frac{1}{N_{out}}\sum_{i=1}^{N_{out}} (\hat{y_i} - y_i)^2$.
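
Written out for this two-layer example (this simply instantiates the definitions above), the forward pass and the cost are:

```math
\begin{aligned}
Z^{(1)} &= W^{(1)} A^{(0)} + B^{(1)}, & A^{(1)} &= \sigma\left(Z^{(1)}\right),\\
Z^{(2)} &= W^{(2)} A^{(1)} + B^{(2)}, & \hat{Y} = A^{(2)} &= \sigma\left(Z^{(2)}\right),\\
C &= \frac{1}{N_{out}} \sum_{i=1}^{N_{out}} \left(\hat{y_i} - y_i\right)^2.
\end{aligned}
```
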
1. Prove that $\sigma' = \sigma \times (1-\sigma)$.
2. Express $\frac{\partial C}{\partial A^{(2)}}$, i.e. the vector of $\frac{\partial C}{\partial a^{(2)}_i}$, as a function of $A^{(2)}$ and $Y$.
3. Using the chain rule, express $\frac{\partial C}{\partial Z^{(2)}}$, i.e. the vector of $\frac{\partial C}{\partial z^{(2)}_i}$, as a function of $\frac{\partial C}{\partial A^{(2)}}$ and $A^{(2)}$.
4. Similarly, express $\frac{\partial C}{\partial W^{(2)}}$, i.e. the matrix of $\frac{\partial C}{\partial w^{(2)}_{i,j}}$, as a function of $\frac{\partial C}{\partial Z^{(2)}}$ and $A^{(1)}$ (the sketch after this list can be used to check your expressions numerically).
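
Below is a minimal NumPy sketch of the forward pass and cost, together with a finite-difference check that can be used to verify the gradient expressions derived in questions 2-4. The dimensions, variable names, and random initialization are illustrative assumptions, not part of the exercise.

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, a0):
    # Forward pass: Z(L+1) = W(L+1) A(L) + B(L+1), A(L+1) = sigma(Z(L+1))
    W1, B1, W2, B2 = params
    a1 = sigmoid(W1 @ a0 + B1)
    a2 = sigmoid(W2 @ a1 + B2)   # network output Y_hat = A(2)
    return a2

def cost(params, a0, y):
    # MSE cost: C = (1/N_out) * sum_i (y_hat_i - y_i)^2
    return np.mean((forward(params, a0) - y) ** 2)

def numerical_grad(f, x, eps=1e-6):
    # Central finite differences, entry by entry, for checking hand-derived gradients.
    g = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + eps
        f_plus = f()
        x[idx] = old - eps
        f_minus = f()
        x[idx] = old
        g[idx] = (f_plus - f_minus) / (2 * eps)
    return g

# Illustrative dimensions: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W1, B1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, B2 = rng.normal(size=(2, 4)), rng.normal(size=2)
a0, y = rng.normal(size=3), rng.normal(size=2)
params = (W1, B1, W2, B2)

# dC/dW(2) by finite differences; compare against your expression from question 4.
dC_dW2 = numerical_grad(lambda: cost(params, a0, y), W2)
print(dC_dW2)
```
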