
I am new to neural networks. I am studying backpropagation and have come across different references. For a layer $k$, some references state that the error $\delta_j^k$ for neuron $j$ in the $k$th layer is

$$ \delta_j^k = \dfrac{\partial E}{\partial a_j^k} $$

while some other references state

$$ \delta_j^k = \dfrac{\partial E}{\partial z_j^k}, $$ where $z^k = W^k a^{(k-1)} + b^k$. Andrew Ng in his courses introduced this as $$ \delta^k = (W^{(k+1)})^T \delta^{(k+1)} \mathbin{.*} \sigma'(z^{(k)}), $$ which confused me. Which one is correct?


1 Answer


The answer to your question very much depends on the resource that you are using (i.e. there is no real right or wrong).

When using the notation $\boldsymbol{z}^{(l)} = \boldsymbol{W}^{(l)} \boldsymbol{a}^{(l-1)} + \boldsymbol{b}^{(l)}$ and $\boldsymbol{a}^{(l)} = \phi\bigl(\boldsymbol{z}^{(l)}\bigr)$, the error given by $$\boldsymbol{\delta}^{(l)} = {\boldsymbol{W}^{(l+1)}}^\mathsf{T} \boldsymbol{\delta}^{(l+1)} \odot \phi'\bigl(\boldsymbol{z}^{(l)}\bigr)$$ is the derivative w.r.t. the pre-activations, i.e. $\frac{\partial E}{\partial \boldsymbol{z}^{(l)}}$. (Note: I took the liberty to use $\phi$ for the activation function and $\odot$ to denote the Hadamard (i.e. element-wise) product.)
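To make this concrete, here is a minimal numerical sketch (a toy setup of my own, not taken from any particular course): a tiny $3 \to 4 \to 2$ sigmoid network with squared-error loss, where the recursion above reproduces $\frac{\partial E}{\partial \boldsymbol{z}^{(1)}}$ obtained by finite differences. All names and layer sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(z):        # sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

def phi_prime(z):  # derivative of the sigmoid
    s = phi(z)
    return s * (1.0 - s)

# a tiny 3 -> 4 -> 2 network with random weights
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x, y = rng.normal(size=3), rng.normal(size=2)

def loss_from_z1(z1):
    # run the remainder of the network from the first pre-activation
    a2 = phi(W2 @ phi(z1) + b2)
    return 0.5 * np.sum((a2 - y) ** 2)  # squared-error loss

# forward pass
z1 = W1 @ x + b1
z2 = W2 @ phi(z1) + b2
a2 = phi(z2)

# backward pass: delta^(l) = dE/dz^(l)
delta2 = (a2 - y) * phi_prime(z2)         # output layer (squared error)
delta1 = (W2.T @ delta2) * phi_prime(z1)  # the recursion above

# finite-difference check of dE/dz^(1)
eps = 1e-6
numeric = np.array([
    (loss_from_z1(z1 + eps * e) - loss_from_z1(z1 - eps * e)) / (2 * eps)
    for e in np.eye(z1.size)
])
print(np.allclose(delta1, numeric, atol=1e-6))  # True
```

Note that the seed $(a_2 - y) \odot \phi'(z_2)$ at the output layer is specific to the squared-error loss assumed here.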

When considering the gradient w.r.t. the activations, i.e. $\frac{\partial E}{\partial \boldsymbol{a}^{(l)}}$, the error would be $$\boldsymbol{d}^{(l)} = {\boldsymbol{W}^{(l+1)}}^\mathsf{T} \Bigl(\boldsymbol{d}^{(l+1)} \odot \phi'\bigl(\boldsymbol{z}^{(l+1)}\bigr)\Bigr).$$ The difference is subtle, but the recursion combines the terms differently: the derivative of the activation function now enters at layer $l+1$, inside the multiplication with the weights.
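To see how the two conventions relate (again a hypothetical toy check, with the same kind of setup as above): by the chain rule through $\boldsymbol{a}^{(l)} = \phi(\boldsymbol{z}^{(l)})$, the two errors differ by an element-wise factor, $\boldsymbol{\delta}^{(l)} = \boldsymbol{d}^{(l)} \odot \phi'(\boldsymbol{z}^{(l)})$, which is why both definitions ultimately yield the same weight gradients.

```python
import numpy as np

rng = np.random.default_rng(1)
phi = lambda z: 1.0 / (1.0 + np.exp(-z))       # sigmoid
phi_prime = lambda z: phi(z) * (1.0 - phi(z))  # its derivative

W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x, y = rng.normal(size=3), rng.normal(size=2)

z1 = W1 @ x + b1
z2 = W2 @ phi(z1) + b2
a2 = phi(z2)

# pre-activation errors: delta^(l) = dE/dz^(l)
delta2 = (a2 - y) * phi_prime(z2)
delta1 = (W2.T @ delta2) * phi_prime(z1)

# activation errors: d^(l) = dE/da^(l)
d2 = a2 - y                       # output layer, squared-error loss
d1 = W2.T @ (d2 * phi_prime(z2))  # the recursion above

# the two conventions differ exactly by the factor phi'(z^(l))
print(np.allclose(delta1, d1 * phi_prime(z1)))  # True
```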

However, you will rarely find the latter expression, because (almost) everyone uses the gradients w.r.t. the pre-activations. For some intuition as to why that is, I refer to this answer I gave to another question.
