$\begingroup$

I do not understand this seemingly simple equality:

$$\frac{\partial h_t}{\partial w_\textrm{h}}= \frac{\partial f(x_{t},h_{t-1},w_\textrm{h})}{\partial w_\textrm{h}} +\frac{\partial f(x_{t},h_{t-1},w_\textrm{h})}{\partial h_{t-1}} \frac{\partial h_{t-1}}{\partial w_\textrm{h}}.$$

where $$h_t = f(x_t, h_{t-1}, w_\textrm{h}).$$

To me, the second term in the sum looks superfluous, since we are just replacing $h_t$ with its value. So we should have either

$$\frac{\partial h_t}{\partial w_\textrm{h}}= \frac{\partial f(x_{t},h_{t-1},w_\textrm{h})}{\partial w_\textrm{h}}$$

or

$$\frac{\partial h_t}{\partial w_\textrm{h}}=\frac{\partial f(x_{t},h_{t-1},w_\textrm{h})}{\partial h_{t-1}} \frac{\partial h_{t-1}}{\partial w_\textrm{h}},$$

where the second one is a simple expansion with the chain rule.

What am I missing? Thanks.

$\endgroup$
  • 1
    $\begingroup$ This classical notation is terrible. In such ambiguous situations careful writers use a more modern notation, such as $f_j$ for the partial derivative of $f$ with respect to its $j^{\text{th}}$ argument: such clarification will immediately exhibit the equation as the multivariate chain rule. Regardless, in what sense are we "replacing $h_t$ with its value"? $\endgroup$ Commented Jun 5 at 14:52
  • 1
    $\begingroup$ @whuber Thank you for pointing that out. I was confused by the notation. Now I see that $h_t$ depends on $w_\textrm{h}$ through both the second and third arguments of $f$, which is why we take the total derivative and get two terms (three if $x_t$ depended on $w_\textrm{h}$, which it obviously does not). $\endgroup$ Commented Jun 5 at 15:19
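
A quick numerical check may help make the two-term formula concrete. The sketch below assumes a scalar recurrence $f(x, h, w) = \tanh(w h + x)$, which is purely an illustrative choice and not from the question; the function names and the input values are likewise hypothetical. It compares the full recursion (direct term plus the term propagated through $h_{t-1}$) with a central finite difference of the unrolled recurrence, and also shows what is left if only the direct term is kept.

```python
import math

# Assumed toy recurrence (not from the question): h_t = tanh(w * h_{t-1} + x_t)
def f(x, h_prev, w):
    return math.tanh(w * h_prev + x)

# Partial derivative of f with respect to its third argument (w), h_prev held fixed
def df_dw(x, h_prev, w):
    return (1.0 - math.tanh(w * h_prev + x) ** 2) * h_prev

# Partial derivative of f with respect to its second argument (h_prev), w held fixed
def df_dh(x, h_prev, w):
    return (1.0 - math.tanh(w * h_prev + x) ** 2) * w

def unroll(xs, w, h0=0.0):
    """Run the recurrence over all inputs and return the final hidden state."""
    h = h0
    for x in xs:
        h = f(x, h, w)
    return h

def total_grad(xs, w, h0=0.0):
    """Total derivative dh_T/dw using both terms of the recursion."""
    h, dh = h0, 0.0  # h0 does not depend on w, so its derivative is 0
    for x in xs:
        dh = df_dw(x, h, w) + df_dh(x, h, w) * dh
        h = f(x, h, w)
    return dh

def direct_term_only(xs, w, h0=0.0):
    """Only the first term at the last step: treats h_{T-1} as a constant."""
    h = h0
    for x in xs[:-1]:
        h = f(x, h, w)
    return df_dw(xs[-1], h, w)

xs = [0.5, -0.3, 0.8]  # arbitrary illustrative inputs
w = 0.7
eps = 1e-6
numeric = (unroll(xs, w + eps) - unroll(xs, w - eps)) / (2 * eps)

print("finite difference:", numeric)
print("both terms       :", total_grad(xs, w))
print("direct term only :", direct_term_only(xs, w))
```

Under these assumptions the two-term recursion should agree with the finite difference, while the direct term alone should not, because changing $w_\textrm{h}$ also changes $h_{t-1}$.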
