$\begingroup$

Let's say that I have an error function, E, such that: $$ E_p = \frac{1}{2} \sum_{j}(\delta_{pj})^2$$

If I take the partial derivative like this:

$$\frac{\partial{E_p}}{\partial{\delta_{pj}}} $$

will I get:

$$ \frac{\partial{E_p}}{\partial{\delta_{pj}}} = \delta_{pj}$$?

And if I take the partial derivative like this:

$$\frac{\partial{E_p}}{\partial{\delta_{p}}} $$ (notice that I've dropped the subscript, j),

will I get:

$$ \frac{\partial{E_p}}{\partial{\delta_{p}}} = \sum_{j}(\delta_{pj}) $$ ?

I'm trying to understand the effect that taking a partial derivative has on a summation involving a vector (especially when there are different indexes used in the derivatives) because I've had trouble finding information about that subject in particular. Maybe I just don't know what to search for, so if someone has a link to an article that will help me understand this matter, I will greatly appreciate it.

$\endgroup$

1 Answer

$\begingroup$

$E_p$ is defined via a sum and let me use a dummy variable $k$ for that index of summation.

$$ E_p=\frac{1}{2}\sum_{k}(\delta_{pk})^2 $$

Now fix a particular value of $j$ and differentiate, applying the chain rule term by term.

$$ \frac{\partial E_p}{\partial \delta_{pj}}=\frac{1}{2}\sum_{k}2\delta_{pk}\frac{\partial \delta_{pk}}{\partial \delta_{pj}}= \sum_{k}\delta_{pk}\frac{\partial \delta_{pk}}{\partial \delta_{pj}} $$

The components $\delta_{p1},\delta_{p2},\ldots$ are independent variables, so we can write

$$ \frac{\partial \delta_{pk}}{\partial \delta_{pj}}=\Delta_{kj} $$

I use the nonstandard notation $\Delta_{kj}$ for the Kronecker delta, since the symbol $\delta$ is already taken. The Kronecker delta equals $1$ when $k=j$ and $0$ when $k\neq j$, so only the $k=j$ term survives when we sum:

$$ \frac{\partial E_p}{\partial \delta_{pj}}=\sum_{k}\delta_{pk}\Delta_{kj}=\delta_{pj} $$
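If you want to convince yourself numerically, here is a minimal sketch that compares a central finite-difference estimate of $\partial E_p/\partial \delta_{pj}$ with $\delta_{pj}$ itself. The names `error` and `partial_fd`, the step size `h`, and the sample values in `delta_p` are my own choices for illustration, not anything from the question.

```python
def error(delta):
    """E_p = 1/2 * sum_k delta_k^2 for a single pattern p."""
    return 0.5 * sum(d * d for d in delta)


def partial_fd(f, x, j, h=1e-6):
    """Central finite-difference estimate of the partial derivative of f
    with respect to component j of x (all other components held fixed)."""
    up = list(x)
    up[j] += h
    down = list(x)
    down[j] -= h
    return (f(up) - f(down)) / (2 * h)


# Arbitrary illustrative values for (delta_p1, delta_p2, delta_p3).
delta_p = [0.5, -1.25, 2.0]

for j, d in enumerate(delta_p):
    # Each estimate should agree with delta_pj up to rounding error.
    print(j, partial_fd(error, delta_p, j), d)
```

Holding every component except the $j$-th fixed is exactly what the Kronecker delta encodes in the derivation.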

What trips up a lot of people on questions like this is using the symbol $j$ in two ways: as a dummy variable in the sum, and as a particular value when you differentiate with respect to $\delta_{pj}$. Overloading the symbol this way causes confusion.
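It may help to write the sum out with no dummy index at all. As an illustration only, assume there are exactly three components and differentiate with respect to the second one:

$$ E_p=\frac{1}{2}\left(\delta_{p1}^2+\delta_{p2}^2+\delta_{p3}^2\right), \qquad \frac{\partial E_p}{\partial \delta_{p2}}=\frac{1}{2}\left(0+2\delta_{p2}+0\right)=\delta_{p2} $$

The terms in $\delta_{p1}$ and $\delta_{p3}$ are constants as far as $\delta_{p2}$ is concerned, so they drop out and a single term remains.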

For your second question, I am not sure what you mean by $\delta_{p}$. If you mean the vector $\delta_{p}=(\delta_{p1},\delta_{p2},\ldots)$, then your notation reflects a Russian style of writing a gradient:

$$ \frac{\partial E_p}{\partial \delta_p}=\sum_{j}\frac{\partial E_p}{\partial \delta_{pj}}\hat{e}_j $$

where $\hat{e}_j$ is a unit vector oriented in the $\delta_{pj}$ direction. Using the previous result,

$$ \frac{\partial E_p}{\partial \delta_p}=\sum_{j}\delta_{pj}\hat{e}_j=\delta_p $$
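Under the gradient interpretation, a quick numerical sketch shows the difference between this result and the scalar sum proposed in the question. The helper names `error` and `gradient_fd`, the step size `h`, and the sample values are assumptions of mine for illustration.

```python
def error(delta):
    """E_p = 1/2 * sum_k delta_k^2 for a single pattern p."""
    return 0.5 * sum(d * d for d in delta)


def gradient_fd(f, x, h=1e-6):
    """Central finite-difference estimate of the gradient of f at x,
    returned as a list with one partial derivative per component."""
    grad = []
    for j in range(len(x)):
        up = list(x)
        up[j] += h
        down = list(x)
        down[j] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


# Arbitrary illustrative values for (delta_p1, delta_p2, delta_p3).
delta_p = [0.5, -1.25, 2.0]

grad = gradient_fd(error, delta_p)   # a vector, one entry per component
component_sum = sum(delta_p)         # the scalar suggested in the question

print("gradient:", grad)
print("sum of components:", component_sum)
```

The gradient has one entry per component and matches `delta_p` entry by entry, whereas the sum of the components is a single number.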

So under this reading the result is the vector $\delta_p$ itself, not the scalar $\sum_{j}\delta_{pj}$. But perhaps you have something else in mind for the symbol $\delta_p$ and for how to define differentiation with respect to it. Hope that helps.

$\endgroup$
