26

How can you add and subtract numbers in an average without having to iterate through the entire list?

This can be very useful in many situations. For example to continuously calculate the average of the last X values in a stream, adding two averages together, and updating a rating based on a new user vote.

1

2 Answers 2

62

It is indeed possible to manipulate single values in an average in constant time, O(1).

The following function adds a number to an average. average is the current average, size is the current number of values in the average, and value is the number to add to the average:

double addToAverage(double average, int size, double value) { return (size * average + value) / (size + 1); } 

Likewise, the following function removes a number from the average:

double subtractFromAverage(double average, int size, double value) { // if (size == 1) return 0; // wrong but then adding a value "works" // if (size == 1) return NAN; // mathematically proper // assert(size > 1); // debug-mode check // if(size < 2) throw(...) // always check return (size * average - value) / (size - 1); } 

You might consider returning 0 as the average of a set of size 0 just so adding a value back in will give that value as the average. But if you want to consider it a bug to ever reduce your set to size 0, returning NAN will propagate that to future uses, making it more visible. But see What is the arithmetic mean of an empty sequence? - you might want to just noisily report the error on the spot, or throw a C++ exception (not just raise an FP exception) if it's a bug for this to ever happen.

If you don't special case it, you'll probably get + or -Inf, from a x / 0. with non-zero x, unless the value you remove is exactly equal to the current average; then you'll get 0. / 0. => NaN.


You can also combine these functions to easily replace a number. This is very convenient if you are calculating the average of the last X numbers in an array/stream.

double replaceInAverage(double average, int size, double oldValue, double newValue) { return (size * average - oldvalue + newValue) / size; } 

It is also possible to calculate the total average of two averages in constant time:

double addAveragesTogether(double averageA, int sizeA, double averageB, int sizeB) { return (sizeA * averageA + sizeB * averageB) / (sizeA + sizeB); } 
Sign up to request clarification or add additional context in comments.

4 Comments

While addToAverage is correct, note that precision errors are likely to be smaller when using this alternative formula.
subtractFromAverage would throw an error if size is 1. I would add if (oldSize == 1) return 0;
@Yousif: I'm not sure silently returning 0 is better for all use-cases. If anything, NaN would be more appropriate. (The current code will actually return +-Inf which is not good either, unless average == value to get 0. / 0. => NaN). I guess the advantage to returning 0 is that adding to the average will set the average to that.
Also note that FP division is pretty expensive; this is still generally worth it but not as cheap as just adding and multiplying. (If size is a compile-time constant, you could do double inverse = 1. / size; but that might not be exact and could accumulate error over repeated use.)
31

The typical way already mentioned is:

( n * a + v ) / (n + 1); 

Where n is our old count, a is our old average, and v is our new value.

However, the n * a part will ultimately overflow as n gets bigger, especially if a itself is large. To avoid this use:

a + ( v - a ) / (n + 1) 

As n increases we do lose some precision - naturally we are modifying a by successively smaller amounts. Batching values can mitigate the problem, but is probably overkill for most tasks.

5 Comments

If someone is interested why the second equation works as well, you can find a nice explanation here: math.stackexchange.com/a/1836447/709688
but is there an alternative for removal and replacement as well?
Note that floating point keeps the same relative accuracy at all scales, so multiplying and then dividing by similar-sized numbers doesn't lose much precision; there's only a problem if it actually overflows past DBL_MAX, about 1.79769e+308 which is extremely huge. The other major numerical problem is adding a small number to a big number with n*a + v or a + v/n. If v/n is less than 1ULP of a, adding it won't even flip the low bit of the mantissa of a. i.e. if |v| < |a|/2^53 or so. Even if v is not quite that small, you can still be losing most of its precision.
@PeterCordes Yes, this compares equation 2 to recalculating the average from scratch. Equation 1 still has the same problem though - as n*a approaches MAX then n*a + v = n*a. Recalculating the average using a suitable datatype will always be better, but isn't always possible (or necessary), as in the OP's case.
@Barnack To remove an item from the average, remove the effect of that item from the current average, i.e. a-(v-a)/(n-1). (where n and a represent the number of items and average before the removal of v).

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.