I'm looking for a VERY DETAILED derivation of the backpropagation algorithm for neural networks, specifically the step below.
I'm working from Michael Nielsen's excellent derivation, but I struggle to understand the step between formula (40):
$$\delta_j^L = \dfrac { \partial C} {\partial {z_j^L}} $$
and formula (41):
$$\delta_j^L = \sum_k \dfrac { \partial C} {\partial {a_k^L}} \dfrac {\partial {a_k^L}} {\partial {z_j^L}} $$
which then gives (I do understand this last step):
$$ \delta_j^L = \dfrac { \partial C} {\partial {a_j^L}} \dfrac {\partial {a_j^L}} {\partial {z_j^L}} $$
I suppose it's linked to the chain rule: I've seen the multivariable chain rule written as a sum over two variables, but not the kind of sum over $k$ that appears in the formula above.
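To make sure I'm at least reading the sum correctly, here is a small numerical sketch I put together (NumPy, with a made-up quadratic cost and a sigmoid output activation, so the values are just illustrative) checking that the direct derivative $\partial C / \partial z_j^L$ from (40) agrees with the sum over $k$ in (41):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(a, y):
    # made-up quadratic cost: C = 1/2 * sum_k (a_k - y_k)^2
    return 0.5 * np.sum((a - y) ** 2)

# small made-up output layer with 3 neurons
z = np.array([0.5, -1.2, 0.3])   # weighted inputs z^L
y = np.array([1.0, 0.0, 1.0])    # target outputs
eps = 1e-6
j = 1                            # the neuron we differentiate with respect to

# --- formula (40): dC/dz_j computed directly by finite differences ---
z_plus = z.copy();  z_plus[j] += eps
z_minus = z.copy(); z_minus[j] -= eps
delta_direct = (cost(sigmoid(z_plus), y) - cost(sigmoid(z_minus), y)) / (2 * eps)

# --- formula (41): sum over k of dC/da_k * da_k/dz_j ---
a = sigmoid(z)
dC_da = a - y                    # dC/da_k for the quadratic cost
da_dz_j = (sigmoid(z_plus) - sigmoid(z_minus)) / (2 * eps)  # da_k/dz_j for every k
delta_sum = np.sum(dC_da * da_dz_j)

print(delta_direct, delta_sum)   # the two values agree to ~1e-9
```

In this check only the $k = j$ term is non-zero (the sigmoid acts elementwise), which matches the final step I already understand; it's the jump from (40) to the sum over $k$ in (41) that I can't justify on paper.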
Any help?