I'm learning about backpropagation in neural networks for the first time; we're using stochastic gradient descent.
The lecture provides incomplete detail on computing the derivatives for the final layer.
We have the following chain-rule expression for the partial derivative of the error with respect to a given weight:
$$ \frac{\partial \mathrm{e}(\mathbf{w})}{\partial w_{i j}^{(l)}}=\frac{\partial \mathrm{e}(\mathbf{w})}{\partial s_j^{(l)}} \times \frac{\partial s_j^{(l)}}{\partial w_{i j}^{(l)}} $$
$s$ stands for the signal (the sum of the weights $w$ times the inputs $x$ from the previous layer). The second partial derivative, that of the signal with respect to the weight, is simple: $\frac{\partial s_j^{(l)}}{\partial w_{i j}^{(l)}}=x_i^{(l-1)}$.
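To be explicit about the notation (this is just how I understand it, in case my indexing is off):

$$ s_j^{(l)}=\sum_{k} w_{k j}^{(l)}\, x_k^{(l-1)}, $$

so only the $k=i$ term depends on $w_{i j}^{(l)}$, which is where $\frac{\partial s_j^{(l)}}{\partial w_{i j}^{(l)}}=x_i^{(l-1)}$ comes from.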
So we only need $\frac{\partial \mathrm{e}(\mathbf{w})}{\partial s_j^{(l)}}=\delta_j^{(l)}$.
For the final layer, $l=L$ and $j=1$, so I'm trying to solve for $\delta_1^{(L)}=\frac{\partial \mathrm{e}(\mathbf{w})}{\partial s_1^{(L)}}$.
The error function is $\mathrm{e}(\mathbf{w})=\left(x_1^{(L)}-y_n\right)^2$, where $x_1^{(L)}=\theta\left(s_1^{(L)}\right)$.
Our $\theta$ is tanh, so $\theta^{\prime}(s)=1-\theta^2(s)$.
How do I calculate $\delta_1^{(L)}$? I'm guessing it's just another application of the chain rule, but I want to make sure I get it right before proceeding with backpropagation on the earlier layers.
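Here is my attempt, spelled out, in case someone can confirm or correct it:

$$ \delta_1^{(L)}=\frac{\partial \mathrm{e}(\mathbf{w})}{\partial s_1^{(L)}}=\frac{\partial \mathrm{e}(\mathbf{w})}{\partial x_1^{(L)}} \times \frac{\partial x_1^{(L)}}{\partial s_1^{(L)}}=2\left(x_1^{(L)}-y_n\right)\,\theta^{\prime}\left(s_1^{(L)}\right)=2\left(x_1^{(L)}-y_n\right)\left(1-\left(x_1^{(L)}\right)^2\right). $$

To sanity-check that guess numerically, I put together a small finite-difference test (just a throwaway sketch with arbitrary values for the signal and target, assuming a single tanh output unit):

```python
import numpy as np

s = 0.7   # arbitrary signal value at the single output node
y = 0.3   # arbitrary target y_n

def error(s_val):
    x = np.tanh(s_val)       # x_1^(L) = theta(s_1^(L)), theta = tanh
    return (x - y) ** 2      # e(w) = (x_1^(L) - y_n)^2

# My guessed delta: 2 * (x - y) * (1 - x^2)
x = np.tanh(s)
delta_guess = 2.0 * (x - y) * (1.0 - x ** 2)

# Central finite-difference approximation of de/ds at the same point
eps = 1e-6
delta_numeric = (error(s + eps) - error(s - eps)) / (2.0 * eps)

print(delta_guess, delta_numeric)  # these should agree closely if my formula is right
```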