Update the average of a continuous sequence of numbers in constant time

Question

How can you add and subtract numbers in an average without having to iterate through the entire list?

This is useful in many situations, for example continuously calculating the average of the last X values in a stream, combining two averages, or updating a rating based on a new user vote.

This is called incremental averaging and was answered on Math.SE.
– Dan Dascalescu, Commented Jan 13, 2020 at 4:03

Peter Cordes · Accepted Answer · 2020-10-07 23:38:02Z

It is indeed possible to manipulate single values in an average in constant time, O(1).

The following function adds a number to an average. average is the current average, size is the current number of values in the average, and value is the number to add to the average:

double addToAverage(double average, int size, double value)
{
    return (size * average + value) / (size + 1);
}

Likewise, the following function removes a number from the average:

double subtractFromAverage(double average, int size, double value)
{
    // if (size == 1) return 0;       // wrong but then adding a value "works"
    // if (size == 1) return NAN;     // mathematically proper
    // assert(size > 1);              // debug-mode check
    // if(size < 2) throw(...)        // always check
    return (size * average - value) / (size - 1);
}

You might consider returning 0 as the average of a set of size 0 just so adding a value back in will give that value as the average. But if you want to consider it a bug to ever reduce your set to size 0, returning NAN will propagate that to future uses, making it more visible. See also "What is the arithmetic mean of an empty sequence?": you might want to just noisily report the error on the spot, or throw a C++ exception (not just raise an FP exception) if it's a bug for this to ever happen.

If you don't special-case it, you'll probably get +Inf or -Inf from x / 0. with non-zero x, unless the value you remove is exactly equal to the current average; then you'll get 0. / 0. => NaN.
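As a rough sketch of one way to handle the empty case, the mean and its count can be bundled together so the two can't get out of sync. `RunningMean` is a hypothetical name, not something from the answer, and resetting to an empty state when the last value is removed is just one of the policies discussed above (returning NaN or throwing are equally defensible).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the answer): keeps the mean and the count
// together so callers cannot pass a mismatched size.
struct RunningMean {
    double mean = 0.0;       // only meaningful while count > 0
    std::size_t count = 0;

    void add(double value) {
        // Same formula as addToAverage.
        mean = (count * mean + value) / (count + 1);
        ++count;
    }

    void remove(double value) {
        assert(count > 0 && "remove() called on an empty RunningMean");
        if (count == 1) {
            // Removing the last value: reset instead of dividing by zero,
            // so a later add() yields that value as the mean.
            mean = 0.0;
            count = 0;
            return;
        }
        // Same formula as subtractFromAverage.
        mean = (count * mean - value) / (count - 1);
        --count;
    }
};
```

Note that `remove` trusts the caller: it cannot verify that `value` was ever actually added.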

You can also combine these functions to easily replace a number. This is very convenient if you are calculating the average of the last X numbers in an array/stream.

double replaceInAverage(double average, int size, double oldValue, double newValue)
{
    return (size * average - oldValue + newValue) / size;
}
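For the "last X values of a stream" case, something like the following could work. It is a minimal sketch: `WindowedMean` and its members are made-up names, and it assumes you can afford to keep the window contents in a circular buffer so the oldest value is known when it gets evicted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class: mean of the last `capacity` values of a stream.
class WindowedMean {
public:
    explicit WindowedMean(std::size_t capacity) : buf_(capacity, 0.0) {
        assert(capacity > 0);
    }

    // Feed one value; returns the mean of the values currently in the window.
    double push(double value) {
        if (filled_ < buf_.size()) {
            // Window still filling up: same formula as addToAverage.
            mean_ = (filled_ * mean_ + value) / (filled_ + 1);
            ++filled_;
        } else {
            // Window full: same formula as replaceInAverage,
            // evicting the oldest value stored at buf_[next_].
            mean_ = (filled_ * mean_ - buf_[next_] + value) / filled_;
        }
        buf_[next_] = value;
        next_ = (next_ + 1) % buf_.size();
        return mean_;
    }

private:
    std::vector<double> buf_;  // circular buffer holding the window contents
    std::size_t next_ = 0;     // slot to overwrite next (the oldest once full)
    std::size_t filled_ = 0;   // number of values currently in the window
    double mean_ = 0.0;
};
```

On a very long stream the rounding error of each update can accumulate, so it may be worth recomputing the mean from the buffer every so often.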

It is also possible to calculate the total average of two averages in constant time:

double addAveragesTogether(double averageA, int sizeA, double averageB, int sizeB)
{
    return (sizeA * averageA + sizeB * averageB) / (sizeA + sizeB);
}
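If either side can be empty, the combined formula hits the same 0 / 0 problem as removal. A possible way around it, sketched with a hypothetical `Summary` struct and `merge` function (neither is from the answer), is to return the other side unchanged when one side has no values:

```cpp
#include <cassert>

// Hypothetical summary of a batch: its mean and how many values produced it.
struct Summary {
    double mean;
    long count;
};

// Merge two summaries in O(1) using the same weighted formula as
// addAveragesTogether. If one side is empty, the other is returned as-is,
// so an empty side's (meaningless) mean never enters the arithmetic.
Summary merge(Summary a, Summary b) {
    assert(a.count >= 0 && b.count >= 0);
    if (a.count == 0) return b;
    if (b.count == 0) return a;
    long total = a.count + b.count;
    return Summary{(a.count * a.mean + b.count * b.mean) / total, total};
}
```

For example, merging a batch of 3 values averaging 4.0 with a batch of 1 value averaging 2.0 gives a mean of 3.5 over 4 values.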

While addToAverage is correct, note that precision errors are likely to be smaller when using this alternative formula.
subtractFromAverage would throw an error if size is 1. I would add if (oldSize == 1) return 0;
@Yousif: I'm not sure silently returning 0 is better for all use-cases. If anything, NaN would be more appropriate. (The current code will actually return +-Inf which is not good either, unless average == value to get 0. / 0. => NaN). I guess the advantage to returning 0 is that adding to the average will set the average to that.
Also note that FP division is pretty expensive; this is still generally worth it but not as cheap as just adding and multiplying. (If size is a compile-time constant, you could do double inverse = 1. / size; but that might not be exact and could accumulate error over repeated use.)

c z · Accepted Answer · 2021-11-08 09:45:40Z

The typical way already mentioned is:

( n * a + v ) / (n + 1);

Where n is our old count, a is our old average, and v is our new value.

However, the n * a part will ultimately overflow as n gets bigger, especially if a itself is large. To avoid this use:

a + ( v - a ) / (n + 1)

As n increases we do lose some precision - naturally we are modifying a by successively smaller amounts. Batching values can mitigate the problem, but is probably overkill for most tasks.
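To illustrate the difference, here is a small sketch that computes a mean both ways. The function names are made up for this example, and it assumes ordinary IEEE double arithmetic (no fast-math style compiler options). With values near the top of the double range, the n * a form overflows to infinity while the a + (v - a) / (n + 1) form stays finite.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean via (n * a + v) / (n + 1): the intermediate n * a + v can overflow.
double naiveMean(const std::vector<double>& values) {
    double a = 0.0;
    std::size_t n = 0;
    for (double v : values) {
        a = (n * a + v) / (n + 1);
        ++n;
    }
    return a;
}

// Mean via a + (v - a) / (n + 1): never forms the product n * a.
double incrementalMean(const std::vector<double>& values) {
    double a = 0.0;
    std::size_t n = 0;
    for (double v : values) {
        a += (v - a) / (n + 1);
        ++n;
    }
    return a;
}
```

Both functions return 0.0 for an empty input, which is an arbitrary choice made for this sketch.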

edited Nov 8, 2021 at 9:45

answered Dec 4, 2018 at 17:36

5 Comments

JannisW Over a year ago

If someone is interested why the second equation works as well, you can find a nice explanation here: math.stackexchange.com/a/1836447/709688

Barnack Over a year ago

but is there an alternative for removal and replacement as well?

Peter Cordes Over a year ago

Note that floating point keeps the same relative accuracy at all scales, so multiplying and then dividing by similar-sized numbers doesn't lose much precision; there's only a problem if it actually overflows past DBL_MAX, about 1.79769e+308 which is extremely huge. The other major numerical problem is adding a small number to a big number with n*a + v or a + v/n. If v/n is less than 1ULP of a, adding it won't even flip the low bit of the mantissa of a. i.e. if |v| < |a|/2^53 or so. Even if v is not quite that small, you can still be losing most of its precision.
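The absorption effect described in this comment is easy to reproduce. The helper below is hypothetical and assumes IEEE 754 doubles; around 1e17 adjacent doubles are 16 apart, so adding 1.0 changes nothing, while adding 64.0 does.

```cpp
#include <cassert>

// Hypothetical helper: true if adding v to a leaves a unchanged,
// i.e. v is too small relative to a to affect the stored result.
bool isAbsorbed(double a, double v) {
    // volatile forces the sum to be rounded to a real double before comparing.
    volatile double sum = a + v;
    return sum == a;
}
```

So `isAbsorbed(1e17, 1.0)` is true and `isAbsorbed(1e17, 64.0)` is false under those assumptions.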

c z Over a year ago

@PeterCordes Yes, this compares equation 2 to recalculating the average from scratch. Equation 1 still has the same problem though - as n*a approaches MAX then n*a + v = n*a. Recalculating the average using a suitable datatype will always be better, but isn't always possible (or necessary), as in the OP's case.

c z Over a year ago

@Barnack To remove an item from the average, remove the effect of that item from the current average, i.e. a-(v-a)/(n-1). (where n and a represent the number of items and average before the removal of v).
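Written out as code, the removal formula from this comment looks like the first function below. The replacement variant is not stated in the thread; it is derived here by rearranging replaceInAverage, since (n * a - old + new) / n = a + (new - old) / n. Both function names are made up for this sketch.

```cpp
#include <cassert>

// Remove v from an average a of n items without forming n * a.
// n and a describe the state before the removal.
double removeFromAverageDelta(double a, int n, double v) {
    assert(n > 1);  // removing the last item would divide by zero
    return a - (v - a) / (n - 1);
}

// Replace oldValue with newValue in an average a of n items.
// Derived from replaceInAverage; the count does not change.
double replaceInAverageDelta(double a, int n, double oldValue, double newValue) {
    assert(n > 0);
    return a + (newValue - oldValue) / n;
}
```

For instance, removing 9 from {2, 4, 9} (average 5) gives 3, and replacing 3 with 12 in {3, 6, 9} (average 6) gives 9.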
