Finding the average of a list

Question

How do I find the arithmetic mean of a list in Python? For example:

[1, 2, 3, 4] ⟶ 2.5

sum(L) / float(len(L)). Handle empty lists in caller code, e.g. if not L: ...
– n611x007, Commented Nov 2, 2015 at 12:12
@mitch: it's not a matter of whether you can afford installing numpy. numpy is a whole world in itself. It's whether you actually need numpy. Installing numpy, a 16mb C extension, just to calculate a mean would be very impractical for someone not using it for other things.
– n611x007, Commented Nov 2, 2015 at 12:15
Instead of installing the whole numpy package just for avg/mean, on Python 3 we can get this done with the statistics module via "from statistics import mean". On Python 2.7 or earlier, the statistics module can be downloaded (src: hg.python.org/cpython/file/default/Lib/statistics.py, doc: docs.python.org/dev/library/statistics.html) and used directly.
– 25mhz, Commented Jul 18, 2016 at 4:48
Possible duplicate of Calculating arithmetic mean (average) in Python
– Ravindra S, Commented May 23, 2017 at 16:21

Herms · Accepted Answer · 2022-07-27 06:37:32Z

912

For Python 3.8+, use statistics.fmean for numerical stability with floats. (Fast.)

For Python 3.4+, use statistics.mean for numerical stability with floats. (Slower.)

xs = [15, 18, 2, 36, 12, 78, 5, 6, 9]

import statistics
statistics.mean(xs)  # = 20.11111111111111

For older versions of Python 3, use

sum(xs) / len(xs)

For Python 2, convert len to a float to get float division:

sum(xs) / float(len(xs))

edited Jul 27, 2022 at 6:37

user3064538

answered Jan 27, 2012 at 21:00

Herms



8 Comments

Carla Dessi Over a year ago

as i said, i'm new to this, i was thinking i'd have to make it with a loop or something to count the amount of numbers in it, i didn't realise i could just use the length. this is the first thing i've done with python..

Foo Bar User Over a year ago

What if the sum is a massive number that won't fit in an int/float?

Arseniy Over a year ago

@FooBarUser then you should calculate k = 1.0/len(l) and then reduce with an explicit start value, so that the first element is scaled too: reduce(lambda x, y: x + y * k, l, 0.0)

n611x007 Over a year ago

downvoted because I cannot see why reduce and lambda should be at the top of a question about average calculation

Jules Gagnon-Marchand Over a year ago

He should really be using sum though, as Guido says to try really hard to avoid reduce.
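On the overflow concern raised in the comments above: Python ints are arbitrary precision, so only a float sum can blow up (to inf). A minimal sketch of one way around it, assuming you are fine with a plain loop, is to update the mean incrementally so the full sum is never held. The name running_mean is just illustrative, not something from the answers here.

```python
def running_mean(values):
    """Return the mean of an iterable without ever holding the full sum."""
    mean = 0.0
    count = 0
    for value in values:
        count += 1
        # Move the current mean toward the new value by 1/count of the gap.
        mean += (value - mean) / count
    if count == 0:
        raise ValueError("running_mean() requires at least one value")
    return mean


print(running_mean([1, 2, 3, 4]))  # 2.5
```

With two values of 1e308 the plain float sum overflows to inf, while this loop stays at 1e308. The trade-off is one division per element, so it is slower than sum(xs) / len(xs).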


Mateen Ulhaq · Accepted Answer · 2022-07-17 08:05:59Z

610

xs = [15, 18, 2, 36, 12, 78, 5, 6, 9]
sum(xs) / len(xs)

edited Jul 17, 2022 at 8:05

Mateen Ulhaq


answered Jan 27, 2012 at 21:01

yprez


3 Comments

lahjaton_j Over a year ago

As a C++ programmer, that is neat as hell and float is not ugly at all!

Steinfeld Over a year ago

If you want to reduce some numbers after decimal point. This might come in handy: float('%.2f' % float(sum(l) / len(l)))

yprez Over a year ago

@Steinfeld I don't think conversion to string is the best way to go here. You can achieve the same in a cleaner way with round(result, 2).
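A small sketch of the two options discussed in these comments, assuming Python 3.6+ for the f-string: round() keeps a number, while formatting gives a string that is only meant for display.

```python
xs = [15, 18, 2, 36, 12, 78, 5, 6, 9]
average = sum(xs) / len(xs)

rounded = round(average, 2)   # still a float, rounded to 2 decimal places
as_text = f"{average:.2f}"    # a string, for display only

print(rounded)  # 20.11
print(as_text)  # 20.11
```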

Mateen Ulhaq · Accepted Answer · 2022-07-17 08:11:17Z

329

Use numpy.mean:

xs = [15, 18, 2, 36, 12, 78, 5, 6, 9]

import numpy as np
print(np.mean(xs))

edited Jul 17, 2022 at 8:11

Mateen Ulhaq


answered Jan 28, 2012 at 3:59

Akavall


5 Comments

L. Amber O'Hearn Over a year ago

That's strange. I would have assumed this would be much more efficient, but it appears to take 8 times as long on a random list of floats than simply sum(l)/len(l)

L. Amber O'Hearn Over a year ago

Oh, but np.array(l).mean() is much faster.

Akavall Over a year ago

@L.AmberO'Hearn, I just timed it and np.mean(l) and np.array(l).mean() are about the same speed, and sum(l)/len(l) is about twice as fast. I used l = list(np.random.rand(1000)). Of course, both numpy methods become much faster if l is a numpy.array.

n611x007 Over a year ago

well, unless that's the sole reason for installing numpy. installing a 16mb C package of whatever fame for mean calculation looks very strange on this scale.

Elias Over a year ago

Also it's better to use np.nanmean(l) in order to avoid issues with NAN and zero divisions

Mateen Ulhaq · Accepted Answer · 2022-07-17 08:38:16Z

246

For Python 3.4+, use mean() from the new statistics module to calculate the average:

from statistics import mean

xs = [15, 18, 2, 36, 12, 78, 5, 6, 9]
mean(xs)

edited Jul 17, 2022 at 8:38

Mateen Ulhaq


answered Jan 12, 2014 at 6:34

Marwan Alsabbagh


3 Comments

Serge Stroobandt Over a year ago

This is the most elegant answer because it employs a standard library module which is available since python 3.4.

Antti Haapala Over a year ago

And it is numerically stabler

user3064538 Over a year ago

And it produces a nicer error if you accidentally pass in an empty list: statistics.StatisticsError: mean requires at least one data point, instead of the more cryptic ZeroDivisionError: division by zero from the sum(x) / len(x) solution.
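To see the difference in empty-list behaviour for yourself, here is a rough sketch. The helper name describe_empty_failure is made up for this illustration; it only reports which exception type each approach raises.

```python
import statistics


def describe_empty_failure(func):
    """Call func on an empty list and return the name of the exception raised."""
    try:
        func([])
    except Exception as exc:
        return type(exc).__name__
    return None


print(describe_empty_failure(statistics.mean))               # StatisticsError
print(describe_empty_failure(lambda xs: sum(xs) / len(xs)))  # ZeroDivisionError
```

StatisticsError is a subclass of ValueError, so callers that already catch ValueError will handle it as well.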

Asclepius · Accepted Answer · 2022-02-10 03:22:50Z

51

Why would you use reduce() for this when Python has a perfectly cromulent sum() function?

print sum(l) / float(len(l))

(The float() is necessary in Python 2 to force Python to do a floating-point division.)
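For comparison, a short Python 3 sketch: there, / is always true division and // is floor division, which is roughly what / did for two ints in Python 2.

```python
xs = [15, 18, 2, 36, 12, 78, 5, 6, 9]

true_mean = sum(xs) / len(xs)      # true division, gives a float
floored_mean = sum(xs) // len(xs)  # floor division, drops the fraction

print(true_mean)     # 20.11111111111111
print(floored_mean)  # 20
```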

edited Feb 10, 2022 at 3:22

Asclepius


answered Jan 27, 2012 at 21:02

kindall


2 Comments

RolfBly Over a year ago

For those of us new to the word 'cromulent'

user3064538 Over a year ago

float() is not necessary on Python 3.

Chetan Sharma · Accepted Answer · 2017-05-11 08:22:37Z

There is a statistics library if you are using python >= 3.4

https://docs.python.org/3/library/statistics.html

You can use its mean function like this. Say you have a list of numbers whose mean you want to find:

numbers = [11, 13, 12, 15, 17]  # avoid naming it "list", which shadows the built-in
import statistics as s
s.mean(numbers)

It has other functions too, such as stdev, variance, mode, harmonic mean, and median, which are also useful.

Mateen Ulhaq · Accepted Answer · 2022-07-17 08:41:49Z

EDIT:

I added two other ways to get the average of a list (which are relevant only for Python 3.8+). Here is the comparison that I made:

import timeit
import statistics
import numpy as np
from functools import reduce
import pandas as pd
import math

LIST_RANGE = 10
NUMBERS_OF_TIMES_TO_TEST = 10000

l = list(range(LIST_RANGE))

def mean1():
    return statistics.mean(l)

def mean2():
    return sum(l) / len(l)

def mean3():
    return np.mean(l)

def mean4():
    return np.array(l).mean()

def mean5():
    return reduce(lambda x, y: x + y / float(len(l)), l, 0)

def mean6():
    return pd.Series(l).mean()

def mean7():
    return statistics.fmean(l)

def mean8():
    return math.fsum(l) / len(l)

for func in [mean1, mean2, mean3, mean4, mean5, mean6, mean7, mean8]:
    print(f"{func.__name__} took: ", timeit.timeit(stmt=func, number=NUMBERS_OF_TIMES_TO_TEST))

These are the results I got:

mean1 took: 0.09751558300000002
mean2 took: 0.005496791999999973
mean3 took: 0.07754683299999998
mean4 took: 0.055743208000000044
mean5 took: 0.018134082999999968
mean6 took: 0.6663848750000001
mean7 took: 0.004305374999999945
mean8 took: 0.003203333000000086

Interesting! looks like math.fsum(l) / len(l) is the fastest way, then statistics.fmean(l), and only then sum(l) / len(l). Nice!

Thank you @Asclepius for showing me these two other ways!

OLD ANSWER:

In terms of efficiency and speed, these are the results that I got testing the other answers:

# test mean calculation
import timeit
import statistics
import numpy as np
from functools import reduce
import pandas as pd

LIST_RANGE = 10
NUMBERS_OF_TIMES_TO_TEST = 10000

l = list(range(LIST_RANGE))

def mean1():
    return statistics.mean(l)

def mean2():
    return sum(l) / len(l)

def mean3():
    return np.mean(l)

def mean4():
    return np.array(l).mean()

def mean5():
    return reduce(lambda x, y: x + y / float(len(l)), l, 0)

def mean6():
    return pd.Series(l).mean()

for func in [mean1, mean2, mean3, mean4, mean5, mean6]:
    print(f"{func.__name__} took: ", timeit.timeit(stmt=func, number=NUMBERS_OF_TIMES_TO_TEST))

and the results:

mean1 took: 0.17030245899968577
mean2 took: 0.002183011999932205
mean3 took: 0.09744236000005913
mean4 took: 0.07070840100004716
mean5 took: 0.022754742999950395
mean6 took: 1.6689282460001778

so clearly the winner is: sum(l) / len(l)

I tried these timings with a list of length 100000000: mean2 < 1s; mean3,4 ~ 8s; mean5,6 ~ 27s; mean1 ~1minute. I find this surprising, would have expected numpy to be best with a large list, but there you go! Seems there's a problem with the statistics package!! (this was python 3.8 on a mac laptop, no BLAS as far as I know).
Incidentally, if I convert l into an np.array first, np.mean takes ~.16s, so about 6x faster than sum(l)/len(l). Conclusion: if you're doing lots of calculations, best do everything in numpy.
@drevicko see mean4, this is what I do there... I guess that if it is already an np.array then it makes sense to use np.mean, but if you have a list then you should use sum(l) / len(l)
Exactly! It also depends on what you'll be doing with it later. In my work I'm typically doing a series of calculations, so it makes sense to convert to numpy at the start and leverage numpy's fast underlying libraries.
@AlonGouldman Great. I urge showing each speed in 1/1000 of a second (as an integer), otherwise the number is hard to read. For example, 170, 2, 97, etc. This should make it so much more easily readable. Please let me know if this is done, and I will check.

Maxime Chéramy · Accepted Answer · 2014-02-06 10:58:22Z

19

Instead of casting to float, you can add 0.0 to the sum:

def avg(l):
    return sum(l, 0.0) / len(l)

answered Feb 6, 2014 at 10:58

Maxime Chéramy



Andrew Clark · Accepted Answer · 2012-01-27 21:17:32Z

11

sum(l) / float(len(l)) is the right answer, but just for completeness you can compute an average with a single reduce:

>>> reduce(lambda x, y: x + y / float(len(l)), l, 0)
20.111111111111114

Note that this can result in a slight rounding error:

>>> sum(l) / float(len(l))
20.111111111111111
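The rounding drift comes from rounding after every single addition. A minimal sketch of the effect, assuming only the standard library: an explicit left-to-right loop is compared against math.fsum, which rounds once at the end. The loop is written out by hand because newer Python versions also improve the accuracy of the built-in sum() for floats, so sum() itself may not show the drift.

```python
import math

values = [0.1] * 10

# Plain left-to-right accumulation: each addition rounds, and the errors add up.
naive_total = 0.0
for value in values:
    naive_total += value

# math.fsum tracks the lost low-order bits and rounds only once at the end.
exact_total = math.fsum(values)

print(naive_total == 1.0)         # False
print(exact_total == 1.0)         # True
print(exact_total / len(values))  # 0.1
```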

answered Jan 27, 2012 at 21:17

Andrew Clark


3 Comments

Johan Lundberg Over a year ago

I get that this is just for fun but returning 0 for an empty list may not be the best thing to do

Andrew Clark Over a year ago

@JohanLundberg - You could replace the 0 with False as the last argument to reduce() which would give you False for an empty list, otherwise the average as before.

EndermanAPM Over a year ago

@AndrewClark why do you force float on len?

Andrea Rastelli · Accepted Answer · 2020-05-11 12:24:29Z

I tried the options above, but they didn't work for me. Try this:

from statistics import mean

n = [11, 13, 15, 17, 19]
print(n)
print(mean(n))

This worked on Python 3.5.

U13-Forward · Accepted Answer · 2018-10-17 01:03:05Z

Or use pandas's Series.mean method:

pd.Series(sequence).mean()

Demo:

>>> import pandas as pd
>>> l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
>>> pd.Series(l).mean()
20.11111111111111

From the docs:

Series.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

And here are the docs for this:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mean.html

And the whole documentation:

https://pandas.pydata.org/pandas-docs/stable/10min.html

This isn't a pandas question, so it seems excessive to import such a heavy library for a simple operation like finding the mean.

darch · Accepted Answer · 2015-02-16 22:37:58Z

5

I had a similar question to solve in one of Udacity's problems. Instead of using a built-in function, I coded:

def list_mean(n):
    summing = float(sum(n))
    count = float(len(n))
    if n == []:
        return False
    return float(summing/count)

It is much longer than usual, but for a beginner it's quite challenging.

edited Feb 16, 2015 at 22:37

darch


answered Feb 16, 2015 at 22:27

Paulo YC


4 Comments

wsysuper Over a year ago

Good. Every other answer didn't notice the empty list hazard!

kindall Over a year ago

Returning False (equivalent to the integer 0) is just about the worst possible way to handle this error. Better to catch the ZeroDivisionError and raise something better (perhaps ValueError).

MatTheWhale Over a year ago

@kindall how is a ValueError any better than a ZeroDivisionError? The latter is more specific, plus it seems a bit unnecessary to catch an arithmetic error only to re-throw a different one.

kindall Over a year ago

Because ZeroDivisionError is only useful if you know how the calculation is being done (i.e., that a division by the length of the list is involved). If you don't know that, it doesn't tell you what the problem is with the value you passed in. Whereas your new exception can include that more specific information.
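A sketch of what kindall describes in these comments, under the assumption that ValueError is the exception you want callers to see. The function name and the message text are illustrative only.

```python
def mean(numbers):
    """Return the arithmetic mean, raising ValueError for an empty list."""
    try:
        return sum(numbers) / len(numbers)
    except ZeroDivisionError:
        # Re-raise with a message about the input, not about the arithmetic.
        raise ValueError("mean() requires at least one number") from None


print(mean([1, 2, 3, 4]))  # 2.5
```

The "from None" part hides the original ZeroDivisionError from the traceback, so the caller only sees the message about the empty input.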

Andres · Accepted Answer · 2016-07-17 02:10:06Z

5

as a beginner, I just coded this:

L = [15, 18, 2, 36, 12, 78, 5, 6, 9]
total = 0

def average(numbers):
    total = sum(numbers)
    total = float(total)
    return total / len(numbers)

print average(L)

edited Jul 17, 2016 at 2:10

Andres


answered Jan 18, 2016 at 5:22

AlmoDev


2 Comments

fralau Over a year ago

Bravo: IMHO, sum(l)/len(l) is by far the most elegant answer (no need to make type conversions in Python 3).

xilpex Over a year ago

There is no need to store the values in variables or use global variables.

Asclepius · Accepted Answer · 2022-02-10 03:08:07Z

If you wanted to get more than just the mean (aka average) you might check out scipy stats:

from scipy import stats

l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
print(stats.describe(l))

# DescribeResult(nobs=9, minmax=(2, 78), mean=20.11111111111111,
#                variance=572.3611111111111, skewness=1.7791785448425341,
#                kurtosis=1.9422716419666397)

SingleNegationElimination · Accepted Answer · 2012-01-27 21:04:26Z

In order to use reduce for taking a running average, you'll need to track the total and also the number of elements seen so far. Since that count is not an element of the list, you'll also have to pass reduce an extra argument to fold into.

>>> l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
>>> running_average = reduce(lambda aggr, elem: (aggr[0] + elem, aggr[1]+1), l, (0.0,0))
>>> running_average
(181.0, 9)
>>> running_average[0]/running_average[1]
20.111111111111111

Superpaul · Accepted Answer · 2015-09-08 05:24:57Z

Both approaches give the same result on integers, and agree to at least 10 decimal places on floats. With long floating-point values, however, the two results can differ. Which approach to use depends on what you want to achieve.

>>> l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
>>> print reduce(lambda x, y: x + y, l) / len(l)
20
>>> sum(l)/len(l)
20

Floating values

>>> print reduce(lambda x, y: x + y, l) / float(len(l))
20.1111111111
>>> print sum(l)/float(len(l))
20.1111111111

@Andrew Clark was correct in his statement.

Paul Rooney · Accepted Answer · 2020-07-28 22:02:23Z

suppose that

x = [
    [-5.01, -5.43, 1.08, 0.86, -2.67, 4.94, -2.51, -2.25, 5.56, 1.03],
    [-8.12, -3.48, -5.52, -3.78, 0.63, 3.29, 2.09, -2.13, 2.86, -3.33],
    [-3.68, -3.54, 1.66, -4.11, 7.39, 2.08, -2.59, -6.94, -2.26, 4.33],
]

Note that x has dimensions 3 x 10. If you need the mean of each row, you can type this:

theMean = np.mean(x, axis=1)

Don't forget to import numpy as np.

user1871712 · Accepted Answer · 2012-12-07 06:56:27Z

2

l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
l = map(float,l)
print '%.2f' %(sum(l)/len(l))

edited Dec 7, 2012 at 6:56

answered Dec 4, 2012 at 5:47

user1871712


1 Comment

Chris Koston Over a year ago

Inefficient. It converts all elements to float before adding them. It's faster to convert just the length.

Integraty_dev · Accepted Answer · 2019-06-13 09:04:47Z

Find the average of a list using the following Python code (note the true division /, since floor division // would drop the fractional part):

l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
print(sum(l) / len(l))

Try this; it's easy.

RussS · Accepted Answer · 2012-01-27 21:03:03Z

print reduce(lambda x, y: x + y, l)/(len(l)*1.0)

or like posted previously

sum(l)/(len(l)*1.0)

The 1.0 is to make sure you get a floating point division

reubano · Accepted Answer · 2016-01-12 13:32:15Z

Combining a couple of the above answers, I've come up with the following which works with reduce and doesn't assume you have L available inside the reducing function:

from functools import reduce  # reduce is no longer a built-in on Python 3
from operator import truediv

L = [15, 18, 2, 36, 12, 78, 5, 6, 9]

def sum_and_count(x, y):
    try:
        return (x[0] + y, x[1] + 1)
    except TypeError:
        return (x + y, 2)

truediv(*reduce(sum_and_count, L))  # returns 20.11111111111111

Taylan · Accepted Answer · 2016-04-20 20:30:11Z

I want to add just another approach

import itertools, operator
list(itertools.accumulate(l, operator.add)).pop(-1) / len(l)

Serhii Zelenchuk · Accepted Answer · 2022-06-03 19:04:48Z

A simple solution is avemedi-lib:

pip install avemedi_lib

Then include it in your script:

from avemedi_lib.functions import average, get_median, get_median_custom

test_even_array = [12, 32, 23, 43, 14, 44, 123, 15]
test_odd_array = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Getting average value of list items
print(average(test_even_array))  # 38.25

# Getting median value for ordered or unordered numbers list
print(get_median(test_even_array))  # 27.5
print(get_median(test_odd_array))  # 5

# You can use your own sorted and your count functions
a = sorted(test_even_array)
n = len(a)
print(get_median_custom(a, n))  # 27.5

Enjoy.

cottontail · Accepted Answer · 2023-12-09 02:27:37Z

Unlike statistics.mean(), statistics.fmean() works for a list of objects with different numeric types. For example:

from decimal import Decimal
import statistics

data = [1, 4.5, Decimal('3.5')]

statistics.mean(data)   # TypeError
statistics.fmean(data)  # OK

This is because under the hood, mean() uses statistics._sum() which returns a data type to convert the mean into (and Decimal is not on Python's number hierarchy), while fmean() uses math.fsum() which just adds the numbers up (which is also much faster than built-in sum() function).

One consequence of this is that fmean() always returns a float (because averaging involves division) while mean() could return a different type depending on the number types in the data. The following example shows that mean() can return different types while for the same lists, fmean() returns 3.0, a float for all of them.

from fractions import Fraction
import statistics

statistics.mean([2, Fraction(4,1)])  # Fraction(3, 1)  <--- fractions.Fraction
statistics.mean([2, 4.0])            # 3.0             <--- float
statistics.mean([2, 4])              # 3               <--- int

Also, unlike sum(data)/len(data), fmean() (and mean()) works not just on lists but on general iterables such as generators as well. This is useful, if your data is massive and/or you need to perform off-the-cuff filtering before computing the mean.

For example, if a list has NaN values, averaging it returns NaN. If you want to average the list while ignoring NaN values, you can filter out the NaN values and pass a generator to fmean:

data = [1, 2, float('nan')]
statistics.fmean(x for x in data if x==x)  # 1.5

Note that numpy has a function (numpy.nanmean()) that does the same job.

import numpy as np
np.nanmean(data)  # 1.5
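If the x == x comparison in the NaN filter above feels too cryptic, one possible alternative (a sketch using only the standard library, Python 3.8+ for fmean) is to spell the test out with math.isnan:

```python
import math
import statistics

data = [1, 2, float("nan")]

# math.isnan states the intent more directly than the x == x trick.
clean_mean = statistics.fmean(x for x in data if not math.isnan(x))

print(clean_mean)  # 1.5
```

If every value is NaN, the generator is empty and fmean raises StatisticsError, so that case still needs handling in the caller.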
