$\begingroup$

I am reading Kingma and Ba's paper introducing the Adam optimizer, and I was looking over the derivation for the second moment estimate:

[Image: second moment estimate derivation from the paper]

I noticed that, in going from the second to the third equation in the image, they evaluate the sum of a finite geometric series. The formula for that sum is:

[Image: formula for the sum of a finite geometric series]

But they don't seem to multiply by the first term of the series. What am I missing? If this is some kind of approximation, why is it allowed/favorable?

$\endgroup$

1 Answer
$\begingroup$

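With the change of index $j=t-i$, the sum becomes a geometric series whose first term is $\beta_2^0=1$, so the first-term factor is simply $1$ and nothing is being approximated: $$ (1-\beta_2)\sum_{i=1}^t\beta_2^{t-i}=(1-\beta_2)\sum_{j=0}^{t-1}\beta_2^j= \sum_{j=0}^{t-1}\left(\beta_2^j-\beta_2^{j+1}\right)=1-\beta_2^t, $$ where the last equality holds because the sum telescopes. Equivalently, the standard formula $a\,\frac{1-r^n}{1-r}$ with $a=1$, $r=\beta_2$, $n=t$ gives $\sum_{j=0}^{t-1}\beta_2^j=\frac{1-\beta_2^t}{1-\beta_2}$, and the prefactor $(1-\beta_2)$ cancels the denominator.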
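As a quick sanity check, the identity can also be verified numerically. The sketch below is only illustrative: the function names `lhs` and `rhs` and the sample values of $\beta_2$ and $t$ are my own choices, not anything taken from the paper.

```python
import math

def lhs(beta2: float, t: int) -> float:
    """Direct evaluation of (1 - beta2) * sum_{i=1}^{t} beta2^(t - i)."""
    return (1 - beta2) * sum(beta2 ** (t - i) for i in range(1, t + 1))

def rhs(beta2: float, t: int) -> float:
    """Closed form 1 - beta2^t obtained from the telescoping sum."""
    return 1 - beta2 ** t

# Compare both sides for a few illustrative (assumed) values.
for beta2 in (0.5, 0.9, 0.999):
    for t in (1, 10, 100):
        assert math.isclose(lhs(beta2, t), rhs(beta2, t), rel_tol=1e-9)
        print(beta2, t, lhs(beta2, t), rhs(beta2, t))
```

The two columns agree up to floating-point rounding, which is consistent with the first term of the reindexed series being $1$.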

$\endgroup$
