Taking partial derivative of matrix

Question

Let's say that I have an error function, E, such that: $$ E_p = \frac{1}{2} \sum_{j}(\delta_{pj})^2$$

If I take the partial derivative like this:

$$\frac{\partial{E_p}}{\partial{\delta_{pj}}} $$

will I get:

$$ \frac{\partial{E_p}}{\partial{\delta_{pj}}} = \delta_{pj}$$?

And if I take the partial derivative like this:

$$\frac{\partial{E_p}}{\partial{\delta_{p}}} $$ (notice that I've dropped the subscript, j),

will I get:

$$ \frac{\partial{E_p}}{\partial{\delta_{p}}} = \sum_{j}(\delta_{pj}) $$ ?

I'm trying to understand the effect that taking a partial derivative has on a summation involving a vector (especially when there are different indexes used in the derivatives) because I've had trouble finding information about that subject in particular. Maybe I just don't know what to search for, so if someone has a link to an article that will help me understand this matter, I will greatly appreciate it.

Tucker · Accepted Answer · 2017-06-10 22:57:56Z

$E_p$ is defined via a sum, so let me use a dummy variable $k$ for the index of summation.

$$ E_p=\frac{1}{2}\sum_{k}(\delta_{pk})^2 $$

Now fix a particular value of $j$ and differentiate.

$$ \frac{\partial E_p}{\partial \delta_{pj}}=\frac{1}{2}\sum_{k}2\delta_{pk}\frac{\partial \delta_{pk}}{\partial \delta_{pj}}= \sum_{k}\delta_{pk}\frac{\partial \delta_{pk}}{\partial \delta_{pj}} $$

$\delta_{p1},\delta_{p2},\cdots$ are independent variables, so we can write

$$ \frac{\partial \delta_{pk}}{\partial \delta_{pj}}=\Delta_{kj} $$

I use the nonstandard notation $\Delta_{kj}$ for the Kronecker delta, since $\delta$ is already taken here. The Kronecker delta is equal to $1$ when $k=j$ and $0$ when $k\neq j$. With this, the sum collapses to a single term:

$$ \frac{\partial E_p}{\partial \delta_{pj}}=\sum_{k}\delta_{pk}\Delta_{kj}=\delta_{pj} $$
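As a concrete check of the steps above, here is the same computation written out for a sum with only three terms and $j=2$. The choice of three terms is just for illustration and is not part of your question.

$$ E_p=\frac{1}{2}\left(\delta_{p1}^2+\delta_{p2}^2+\delta_{p3}^2\right),\qquad \frac{\partial E_p}{\partial \delta_{p2}}=\frac{1}{2}\left(0+2\delta_{p2}+0\right)=\delta_{p2} $$

Only the term whose index matches $j$ survives, which is exactly what the Kronecker delta encodes.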

What trips up a lot of people on questions like this is using the symbol $j$ in two ways: as a dummy variable in the sum and as a particular value when you differentiate with respect to $\delta_{pj}$. Overloading the symbol this way causes confusion.

For your second question, I am not sure what you mean by $\delta_{p}$. If you mean the vector $\delta_{p}=(\delta_{p1},\delta_{p2},\cdots)$, then your notation reflects a Russian style of writing a gradient.

$$ \frac{\partial E_p}{\partial \delta_p}=\sum_{j}\frac{\partial E_p}{\partial \delta_{pj}}\hat{e}_j $$

where $\hat{e}_j$ is a unit vector oriented in the $\delta_{pj}$ direction. Using the previous result,

$$ \frac{\partial E_p}{\partial \delta_p}=\sum_{j}\delta_{pj}\hat{e}_j=\delta_p $$

but perhaps you have something else in mind for the symbol $\delta_p$ and how to define differentiation with respect to that symbol. Hope that helps.
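If you want to convince yourself numerically, a finite-difference check is one way to do it. The sketch below is only an illustration under my gradient interpretation of $\delta_p$: the function names `error` and `numerical_gradient` and the sample values in `delta_p` are made up for this example, not taken from your question.

```python
def error(delta):
    """E_p = 1/2 * sum_k delta_pk^2 for a single pattern p."""
    return 0.5 * sum(d * d for d in delta)


def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of each partial derivative of f at x."""
    grad = []
    for j in range(len(x)):
        up = list(x)
        down = list(x)
        up[j] += h      # perturb only component j, holding the others fixed
        down[j] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


delta_p = [0.5, -1.0, 2.0]  # arbitrary illustrative values (an assumption)
grad = numerical_gradient(error, delta_p)

print(grad)          # each entry should be close to the matching entry of delta_p
print(sum(delta_p))  # the scalar sum from the question, which is a different object
```

Each partial derivative comes out approximately equal to the corresponding component $\delta_{pj}$, so the gradient is the vector $\delta_p$ itself and not the scalar $\sum_j \delta_{pj}$.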
