717

I have a Python script which takes as input a list of integers, which I need to work with four integers at a time. Unfortunately, I don't have control of the input, or I'd have it passed in as a list of four-element tuples. Currently, I'm iterating over it this way:

for i in range(0, len(ints), 4):
    # dummy op for example code
    foo += ints[i] * ints[i + 1] + ints[i + 2] * ints[i + 3]

It looks a lot like "C-think", though, which makes me suspect there's a more pythonic way of dealing with this situation. The list is discarded after iterating, so it needn't be preserved. Perhaps something like this would be better?

while ints:
    foo += ints[0] * ints[1] + ints[2] * ints[3]
    ints[0:4] = []

Still doesn't quite "feel" right, though. :-/

Update: With the release of Python 3.12, I've changed the accepted answer. For anyone who has not (or cannot) make the jump to Python 3.12 yet, I encourage you to check out the previous accepted answer or any of the other excellent, backwards-compatible answers below.

Related question: How do you split a list into evenly sized chunks in Python?

5 Comments
  • 4
    Your code does not work if the list size is not a multiple of four. Commented Jan 12, 2009 at 3:03
  • 6
I'm extend()ing the list so that its length is a multiple of four before it gets this far. Commented Jan 12, 2009 at 3:44
  • 6
    @ΤΖΩΤΖΙΟΥ — The questions are very similar, but not quite duplicate. It's "split into any number of chunks of size N" vs. "split into N chunks of any size". :-) Commented Jul 21, 2011 at 18:16
  • 3
    possible duplicate of How do you split a list into evenly sized chunks in Python? Commented Jun 23, 2012 at 15:23
  • 1
    Does this answer your question? How do you split a list into evenly sized chunks? Commented Apr 10, 2022 at 9:58

40 Answers

624
def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

Works with any sequence:

text = "I am a very, very helpful text" for group in chunker(text, 7): print(repr(group),) # 'I am a ' 'very, v' 'ery hel' 'pful te' 'xt' print('|'.join(chunker(text, 10))) # I am a ver|y, very he|lpful text animals = ['cat', 'dog', 'rabbit', 'duck', 'bird', 'cow', 'gnu', 'fish'] for group in chunker(animals, 3): print(group) # ['cat', 'dog', 'rabbit'] # ['duck', 'bird', 'cow'] # ['gnu', 'fish'] 

11 Comments

@Carlos Crasborn's version works for any iterable (not just sequences, as the code above does); it is concise and probably just as fast or even faster. Though it might be a bit obscure (unclear) for people unfamiliar with the itertools module.
Note that chunker returns a generator. To get a list instead, replace the generator expression with a list comprehension: return [seq[pos:pos + size] for pos in range(0, len(seq), size)].
Instead of writing a function that builds and then returns a generator, you could also write a generator directly, using yield: for pos in xrange(0, len(seq), size): yield seq[pos:pos + size] (see the sketch after these comments). I'm not sure whether internally this would be handled differently in any relevant aspect, but it might be even a tiny bit clearer.
Note this works only for sequences that support item access by index and won't work for generic iterators, because they may not support the __getitem__ method.
@smci the chunker() function above is a generator - it returns a generator expression
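Spelled out, the direct generator version suggested in the comments looks like this (a minimal sketch; range on Python 3 replaces the comment's xrange):

def chunker(seq, size):
    # yield successive slices of the sequence instead of
    # returning a generator expression
    for pos in range(0, len(seq), size):
        yield seq[pos:pos + size]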
440

Modified from the Recipes section of Python's itertools docs:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

Example

grouper('ABCDEFGHIJ', 3, 'x') # --> 'ABC' 'DEF' 'GHI' 'Jxx' 

Note: on Python 2 use izip_longest instead of zip_longest.
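For readers puzzled by the [iter(iterable)] * n idiom (see the comments below), a minimal illustration: all n arguments to zip_longest are the same iterator object, so each output tuple pulls n successive values from the one underlying stream.

from itertools import zip_longest

it = iter('ABCDEFGHIJ')
args = [it, it, it]  # three references to the *same* iterator
print(list(zip_longest(*args, fillvalue='x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'H', 'I'), ('J', 'x', 'x')]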

21 Comments

Finally got a chance to play around with this in a python session. For those who are as confused as I was, this is feeding the same iterator to izip_longest multiple times, causing it to consume successive values of the same sequence rather than striped values from separate sequences. I love it!
What's the best way to filter back out the fillvalue? ([item for item in items if item is not fillvalue] for items in grouper(iterable))?
I suspect that the performance of this grouper recipe for 256k sized chunks will be very poor, because izip_longest will be fed 256k arguments.
In several places commenters say "when I finally worked out how this worked...." Maybe a bit of explanation is required. Particularly the list of iterators aspect.
Is there a way to use this but without the None filling up the last chunk?
237
chunk_size = 4
for i in range(0, len(ints), chunk_size):
    chunk = ints[i:i+chunk_size]
    # process chunk of size <= chunk_size

4 Comments

How does it behave if len(ints) is not a multiple of chunk_size?
@AnnaVopureta chunk will have 1, 2 or 3 elements for the last batch of elements. See this question about why slice indices can be out of bounds.
Upvoted the solution that doesn't rely on itertools. It's nice to have a solution that works with Python out of the box.
great answer for its simplicity! it also works quite nicely for a dictionary/list comprehension: {f"chunk_{i}": ints[i:i+chunk_size] for i in range(0, len(ints), chunk_size)}
41

Since Python 3.8 you can use the walrus := operator and itertools.islice.

from itertools import islice

list_ = [i for i in range(10, 100)]

def chunker(it, size):
    iterator = iter(it)
    while chunk := list(islice(iterator, size)):
        print(chunk)

In [2]: chunker(list_, 10)
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69]
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89]
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

Comments

36

As of Python 3.12, the itertools module gains a batched function that specifically covers iterating over batches of an input iterable, where the final batch may be incomplete (each batch is a tuple). Per the example code given in the docs:

>>> from itertools import batched
>>> for batch in batched('ABCDEFG', 3):
...     print(batch)
...
('A', 'B', 'C')
('D', 'E', 'F')
('G',)

Performance notes:

The implementation of batched, like that of all itertools functions to date, is at the C layer, so it's capable of optimizations Python-level code cannot match, e.g.

  • On each pull of a new batch, it proactively allocates a tuple of precisely the correct size (for all but the last batch), instead of building the tuple up element by element with amortized growth causing multiple reallocations (the way a solution calling tuple on an islice does)
  • It only needs to look up the .__next__ function of the underlying iterator once per batch, not n times per batch (the way a zip_longest((iter(iterable),) * n)-based approach does)
  • The check for the end case is a simple C level NULL check (trivial, and required to handle possible exceptions anyway)
  • Handling the end case is a C goto followed by a direct realloc (no copying into a smaller tuple) down to the already-known final size, since it tracks how many elements it has successfully pulled. There is no need for the complex dance zip_longest-based solutions require: creating a sentinel for use as the fillvalue, doing Python-level if/else checks on each batch to see if it's incomplete, and, for the final batch, searching for where the fillvalue first appeared in order to build the cut-down tuple.

Between all these advantages, it should massively outperform any Python-level solution (even highly optimized ones that push most or all of the per-item work to the C layer), regardless of whether the input iterable is long or short, and regardless of the batch size or the size of the final (possibly incomplete) batch. zip_longest-based solutions using guaranteed-unique fillvalues for safety are the best possible choice for almost all cases when itertools.batched is not available, but they can suffer in pathological cases of "few large batches, with the final batch mostly, but not completely, filled", especially pre-3.10, when bisect can't be used to optimize slicing off the fillvalues from an O(n) linear search down to an O(log n) binary search. batched avoids that search entirely, so it won't experience pathological cases at all.
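For anyone who cannot upgrade yet, the itertools docs also give a rough pure-Python equivalent of batched (the same semantics, without the C-layer advantages above). A close paraphrase, renamed here only to avoid shadowing the real thing on 3.12+; the walrus operator requires 3.8+:

from itertools import islice

def batched_compat(iterable, n):
    # batched_compat('ABCDEFG', 3) --> ('A','B','C') ('D','E','F') ('G',)
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch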

2 Comments

It's good to see this functionality coming to the standard library! I'll have to make this the accepted answer once 3.12 releases and starts becoming widely available. 🙂
@BenBlank: It's released! And it works as advertised! Happy they finally made this a built-in, it's hard to optimize well for the general case with other tools at the Python layer.
32
import itertools

def chunks(iterable, size):
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, size))

# though this will throw ValueError if the length of ints
# isn't a multiple of four:
for x1, x2, x3, x4 in chunks(ints, 4):
    foo += x1 + x2 + x3 + x4

for chunk in chunks(ints, 4):
    foo += sum(chunk)

Another way:

import itertools

def chunks2(iterable, size, filler=None):
    it = itertools.chain(iterable, itertools.repeat(filler, size - 1))
    chunk = tuple(itertools.islice(it, size))
    while len(chunk) == size:
        yield chunk
        chunk = tuple(itertools.islice(it, size))

# x2, x3 and x4 could get the value 0 if the length is not
# a multiple of 4.
for x1, x2, x3, x4 in chunks2(ints, 4, 0):
    foo += x1 + x2 + x3 + x4

8 Comments

+1 for using generators, seems like the most "pythonic" out of all suggested solutions
It's rather long and clumsy for something so easy, which isn't very pythonic at all. I prefer S. Lott's version
@zenazn: this will work on generator instances, slicing won't
In addition to working properly with generators and other non-sliceable iterators, the first solution also doesn't require a "filler" value if the final chunk is smaller than size, which is sometimes desirable.
Also +1 for generators. Other solutions require a len call and so don't work on other generators.
27

If you don't mind using an external package you could use iteration_utilities.grouper from iteration_utilities 1. It supports all iterables (not just sequences):

from iteration_utilities import grouper

seq = list(range(20))
for group in grouper(seq, 4):
    print(group)

which prints:

(0, 1, 2, 3)
(4, 5, 6, 7)
(8, 9, 10, 11)
(12, 13, 14, 15)
(16, 17, 18, 19)

In case the length isn't a multiple of the group size, it also supports filling (padding the incomplete last group) or truncating (discarding the incomplete last group):

from iteration_utilities import grouper

seq = list(range(17))

for group in grouper(seq, 4):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)
# (16,)

for group in grouper(seq, 4, fillvalue=None):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)
# (16, None, None, None)

for group in grouper(seq, 4, truncate=True):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)

Benchmarks

I also decided to compare the run-times of a few of the mentioned approaches. It's a log-log plot, grouping lists of varying size into groups of 10 elements. For qualitative results: lower means faster:

[Benchmark plot: log-log scale, list size versus runtime for each approach; lower is faster.]

At least in this benchmark, iteration_utilities.grouper performs best, followed by the approach of Craz.

The benchmark was created with simple_benchmark 1. The code used to run this benchmark was:

import itertools
from itertools import zip_longest

import iteration_utilities
import simple_benchmark

def consume_all(it):
    return iteration_utilities.consume(it, None)

b = simple_benchmark.BenchmarkBuilder()

@b.add_function()
def grouper(l, n):
    return consume_all(iteration_utilities.grouper(l, n))

def Craz_inner(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

@b.add_function()
def Craz(iterable, n, fillvalue=None):
    return consume_all(Craz_inner(iterable, n, fillvalue))

def nosklo_inner(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

@b.add_function()
def nosklo(seq, size):
    return consume_all(nosklo_inner(seq, size))

def SLott_inner(ints, chunk_size):
    for i in range(0, len(ints), chunk_size):
        yield ints[i:i+chunk_size]

@b.add_function()
def SLott(ints, chunk_size):
    return consume_all(SLott_inner(ints, chunk_size))

def MarkusJarderot1_inner(iterable, size):
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, size))

@b.add_function()
def MarkusJarderot1(iterable, size):
    return consume_all(MarkusJarderot1_inner(iterable, size))

def MarkusJarderot2_inner(iterable, size, filler=None):
    it = itertools.chain(iterable, itertools.repeat(filler, size - 1))
    chunk = tuple(itertools.islice(it, size))
    while len(chunk) == size:
        yield chunk
        chunk = tuple(itertools.islice(it, size))

@b.add_function()
def MarkusJarderot2(iterable, size):
    return consume_all(MarkusJarderot2_inner(iterable, size))

@b.add_arguments()
def argument_provider():
    for exp in range(2, 20):
        size = 2**exp
        yield size, simple_benchmark.MultiArgument([[0] * size, 10])

r = b.run()

1 Disclaimer: I'm the author of the libraries iteration_utilities and simple_benchmark.

1 Comment

This module is great. Makes me feel like I'm cheating as a programmer. Thanks for posting it here.
19

I needed a solution that would also work with sets and generators. I couldn't come up with anything very short and pretty, but it's quite readable at least.

def chunker(seq, size):
    res = []
    for el in seq:
        res.append(el)
        if len(res) == size:
            yield res
            res = []
    if res:
        yield res

List:

>>> list(chunker([i for i in range(10)], 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Set:

>>> list(chunker(set([i for i in range(10)]), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Generator:

>>> list(chunker((i for i in range(10)), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Comments

19

The more-itertools package has a chunked function which does exactly that:

import more_itertools

for s in more_itertools.chunked(range(9), 4):
    print(s)

Prints

[0, 1, 2, 3]
[4, 5, 6, 7]
[8]

chunked returns the items in a list. If you'd prefer iterables, use ichunked.
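A quick ichunked sketch (it yields lazy sub-iterables rather than lists, so each chunk is materialized here with list()):

import more_itertools

for s in more_itertools.ichunked(range(9), 4):
    print(list(s))
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8]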

Comments

18

The ideal solution for this problem works with iterators (not just sequences). It should also be fast.

This is the solution provided by the documentation for itertools:

def grouper(n, iterable, fillvalue=None):
    # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)

Using IPython's %timeit on my MacBook Air, I get 47.5 us per loop.

However, this really doesn't work for me since the results are padded to be even sized groups. A solution without the padding is slightly more complicated. The most naive solution might be:

def grouper(size, iterable):
    i = iter(iterable)
    while True:
        out = []
        try:
            for _ in range(size):
                out.append(i.next())
        except StopIteration:
            yield out
            break
        yield out

Simple, but pretty slow: 693 us per loop

The best solution I could come up with uses islice for the inner loop:

def grouper(size, iterable):
    it = iter(iterable)
    while True:
        group = tuple(itertools.islice(it, None, size))
        if not group:
            break
        yield group

With the same dataset, I get 305 us per loop.

Unable to get a pure solution any faster than that, I provide the following solution with an important caveat: if your input data has instances of fillvalue in it, you could get a wrong answer.

def grouper(n, iterable, fillvalue=None):
    # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    # itertools.zip_longest on Python 3
    for x in itertools.izip_longest(*args, fillvalue=fillvalue):
        if x[-1] is fillvalue:
            yield tuple(v for v in x if v is not fillvalue)
        else:
            yield x

I really don't like this answer, but it is significantly faster: 124 us per loop.

4 Comments

You can reduce runtime for recipe #3 by ~10-15% by moving it to the C layer (omitting itertools imports; map must be Py3 map or imap): def grouper(n, it): return takewhile(bool, map(tuple, starmap(islice, repeat((iter(it), n))))). Your final function can be made less brittle by using a sentinel: get rid of the fillvalue argument; add a first line fillvalue = object(), then change the if check to if i[-1] is fillvalue: and the line it controls to yield tuple(v for v in i if v is not fillvalue). Guarantees no value in iterable can be mistaken for the filler value.
BTW, big thumbs up on #4. I was about to post my optimization of #3 as a better answer (performance-wise) than what had been posted so far, but with the tweak to make it reliable, resilient #4 runs over twice as fast as optimized #3; I did not expect a solution with Python level loops (and no theoretical algorithmic differences AFAICT) to win. I assume #3 loses due to the expense of constructing/iterating islice objects (#3 wins if n is relatively large, e.g. number of groups is small, but that's optimizing for an uncommon case), but I didn't expect it to be quite that extreme.
For #4, the first branch of the conditional is only ever taken on the last iteration (the final tuple). Instead of reconstituting the final tuple all over again, cache the modulo of the length of the original iterable at the top and use that to slice off the unwanted padding from izip_longest on the final tuple: yield i[:modulo]. Also, for the args variable, tuple it instead of a list: args = (iter(iterable),) * n. Shaves a few more clock cycles off. Last, if we ignore fillvalue and assume None, the conditional can become if None in i for even more clock cycles.
@Kumba: Your first suggestion assumes the input has known length. If it's an iterator/generator, not a collection with known length, there is nothing to cache. There's no real reason to use such an optimization anyway; you're optimizing the uncommon case (the last yield), while the common case is unaffected.
12
from itertools import izip_longest

def chunker(iterable, chunksize, filler):
    return izip_longest(*[iter(iterable)]*chunksize, fillvalue=filler)

2 Comments

A readable way to do it is stackoverflow.com/questions/434287/…
Note that in Python 3 izip_longest is replaced by zip_longest
11

Since nobody's mentioned it yet, here's a zip() solution:

>>> def chunker(iterable, chunksize):
...     return zip(*[iter(iterable)]*chunksize)

It works only if your sequence's length is always divisible by the chunk size or you don't care about a trailing chunk if it isn't.

Example:

>>> s = '1234567890'
>>> chunker(s, 3)
[('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9')]
>>> chunker(s, 4)
[('1', '2', '3', '4'), ('5', '6', '7', '8')]
>>> chunker(s, 5)
[('1', '2', '3', '4', '5'), ('6', '7', '8', '9', '0')]

Or using itertools.izip to return an iterator instead of a list:

>>> from itertools import izip
>>> def chunker(iterable, chunksize):
...     return izip(*[iter(iterable)]*chunksize)

Padding can be fixed using @ΤΖΩΤΖΙΟΥ's answer:

>>> from itertools import chain, izip, repeat
>>> def chunker(iterable, chunksize, fillvalue=None):
...     it = chain(iterable, repeat(fillvalue, chunksize-1))
...     args = [it] * chunksize
...     return izip(*args)

1 Comment

This one is devilishly clever...
11

Similar to other proposals, but not exactly identical, I like doing it this way, because it's simple and easy to read:

it = iter([1, 2, 3, 4, 5, 6, 7, 8, 9])
for chunk in zip(it, it, it, it):
    print chunk

>>> (1, 2, 3, 4)
>>> (5, 6, 7, 8)

This way you won't get the last partial chunk. If you want to get (9, None, None, None) as last chunk, just use izip_longest from itertools.
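On Python 3, that padded variant would look like this (a minimal sketch):

from itertools import zip_longest

it = iter([1, 2, 3, 4, 5, 6, 7, 8, 9])
for chunk in zip_longest(it, it, it, it):
    print(chunk)
# (1, 2, 3, 4)
# (5, 6, 7, 8)
# (9, None, None, None)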

5 Comments

can be improved with zip(*([it]*4))
@Jean-François Fabre: from a readability point of view I do not see it as an improvement. And it's also marginally slower. It's an improvement if you are golfing, which I'm not.
no I'm not golfing, but what if you have 10 arguments? I read that construct in some official page.but of course I can't seem to find it right now :)
@Jean-François Fabre: if I have 10 arguments, or a variable number of arguments, it's an option, but I'd rather write: zip(*(it,)*10)
right! that's what I read. not the list stuff that I've made up :)
7

Another approach would be to use the two-argument form of iter: iter(callable, sentinel) calls the callable repeatedly until it returns the sentinel. Here the callable pulls the next chunk with islice, and an empty tuple signals that the iterator is exhausted:

from itertools import islice

def group(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

This can be adapted easily to use padding (this is similar to Markus Jarderot’s answer):

from itertools import islice, chain, repeat

def group_pad(it, size, pad=None):
    it = chain(iter(it), repeat(pad))
    return iter(lambda: tuple(islice(it, size)), (pad,) * size)

These can even be combined for optional padding:

_no_pad = object()

def group(it, size, pad=_no_pad):
    if pad is _no_pad:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(pad))
        sentinel = (pad,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)

1 Comment

preferable because you have the option to omit the padding!
4

If the list is large, the highest-performing way to do this will be to use a generator:

def get_chunk(iterable, chunk_size):
    result = []
    for item in iterable:
        result.append(item)
        if len(result) == chunk_size:
            yield tuple(result)
            result = []
    if len(result) > 0:
        yield tuple(result)

for x in get_chunk([1,2,3,4,5,6,7,8,9,10], 3):
    print x

(1, 2, 3)
(4, 5, 6)
(7, 8, 9)
(10,)

4 Comments

(I think that MizardX's itertools suggestion is functionally equivalent to this.)
(Actually, on reflection, no I don't. itertools.islice returns an iterator, but it doesn't use an existing one.)
It is nice and simple, but for some reason, even without the conversion to tuple, it is 4-7 times slower than the accepted grouper method on iterable = range(100000000) and chunk sizes up to 10000.
However, in general I would recommend this method, because the accepted one can be extremely slow when checking for last item is slow docs.python.org/3/library/itertools.html#itertools.zip_longest
4

Using little functions and things really doesn't appeal to me; I prefer to just use slices:

data = [...]
chunk_size = 10000  # or whatever
chunks = [data[i:i+chunk_size] for i in xrange(0, len(data), chunk_size)]
for chunk in chunks:
    ...
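As one of the comments below suggests, swapping the list comprehension for a generator expression keeps the same slicing approach while avoiding materializing all chunks at once (a sketch):

chunks = (data[i:i+chunk_size] for i in range(0, len(data), chunk_size))
for chunk in chunks:
    ...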

2 Comments

nice but no good for an indefinite stream which has no known len. you can do a test with itertools.repeat or itertools.cycle.
Also, eats up memory because of using a [...for...] list comprehension to physically build a list instead of using a (...for...) generator expression which would just care about the next element and spare memory
4

Using map() instead of zip() fixes the padding issue in J.F. Sebastian's answer:

>>> def chunker(iterable, chunksize):
...     return map(None, *[iter(iterable)]*chunksize)

Example:

>>> s = '1234567890'
>>> chunker(s, 3)
[('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9'), ('0', None, None)]
>>> chunker(s, 4)
[('1', '2', '3', '4'), ('5', '6', '7', '8'), ('9', '0', None, None)]
>>> chunker(s, 5)
[('1', '2', '3', '4', '5'), ('6', '7', '8', '9', '0')]

1 Comment

This is better handled with itertools.izip_longest (Py2)/itertools.zip_longest (Py3); this use of map is doubly-deprecated, and not available in Py3 (you can't pass None as the mapper function, and it stops when the shortest iterable is exhausted, not the longest; it doesn't pad).
3

A one-liner, ad hoc solution to iterate over a list x in chunks of size 4:

for a, b, c, d in zip(x[0::4], x[1::4], x[2::4], x[3::4]):
    ... do something with a, b, c and d ...

Comments

3

To avoid all conversions to a list, import itertools and:

>>> for k, g in itertools.groupby(xrange(35), lambda x: x/10):
...     k
...     list(g)

Produces:

...
0
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
2
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
3
[30, 31, 32, 33, 34]
>>>

I checked groupby and it doesn't convert to a list or use len, so I think this will delay resolution of each value until it is actually used. Sadly none of the available answers (at this time) seemed to offer this variation.

Obviously, if you need to handle each item in turn, nest a for loop over g:

for k, g in itertools.groupby(xrange(35), lambda x: x/10):
    for i in g:
        pass  # do what you need to do with individual items
    # now do what you need to do with the whole group

My specific interest in this was the need to consume a generator to submit changes in batches of up to 1000 to the gmail API:

messages = a_generator_which_would_not_be_smart_as_a_list
for idx, batch in groupby(messages, lambda x: x/1000):
    batch_request = BatchHttpRequest()
    for message in batch:
        batch_request.add(self.service.users().messages().modify(
            userId='me', id=message['id'], body=msg_labels))
    http = httplib2.Http()
    self.credentials.authorize(http)
    batch_request.execute(http=http)

3 Comments

What if the list you are chunking is something other than a sequence of ascending integers?
@PaulMcGuire see groupby; given a function to describe order then elements of the iterable can be anything, right?
Yes, I'm familiar with groupby. But if messages were the letters "ABCDEFG", then groupby(messages, lambda x: x/3) would give you a TypeError (for trying to divide a string by an int), not 3-letter groupings. Now if you did groupby(enumerate(messages), lambda x: x[0]/3) you might have something. But you didn't say that in your post.
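As that last comment suggests, pairing groupby with enumerate generalizes this approach to any iterable, not just runs of ascending integers (a minimal sketch):

from itertools import groupby

def chunker(iterable, size):
    # group items by their index, integer-divided by the chunk size
    for _, group in groupby(enumerate(iterable), lambda pair: pair[0] // size):
        yield [item for _, item in group]

print(list(chunker('ABCDEFG', 3)))
# [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]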
3

Unless I missed something, the following simple solution with generator expressions has not been mentioned. It assumes that both the size and the number of chunks are known (which is often the case), and that no padding is required:

def chunks(it, n, m):
    """Make an iterator over the first m chunks of size n."""
    it = iter(it)
    # Chunks are presented as tuples.
    return (tuple(next(it) for _ in range(n)) for _ in range(m))

Comments

2

In your second method, I would advance to the next group of 4 by doing this:

ints = ints[4:] 
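Applied to the loop from the question, that would read (a small sketch):

while ints:
    foo += ints[0] * ints[1] + ints[2] * ints[3]
    ints = ints[4:]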

However, I haven't done any performance measurement so I don't know which one might be more efficient.

Having said that, I would usually choose the first method. It's not pretty, but that's often a consequence of interfacing with the outside world.

Comments

2

With NumPy it's simple (note that reshape requires the total number of elements to be a multiple of the row width):

import numpy as np

ints = np.array([1, 2, 3, 4, 5, 6, 7, 8])
for int1, int2 in ints.reshape(-1, 2):
    print(int1, int2)

output:

1 2
3 4
5 6
7 8

Comments

2

I never want my chunks padded, so that requirement is essential. I find that the ability to work on any iterable is also a requirement. Given that, I decided to extend the accepted answer, https://stackoverflow.com/a/434411/1074659.

Performance takes a slight hit in this approach if padding is not wanted, due to the need to compare and filter the padded values. However, for large chunk sizes this utility is very performant.

#!/usr/bin/env python3
from itertools import zip_longest

_UNDEFINED = object()

def chunker(iterable, chunksize, fillvalue=_UNDEFINED):
    """
    Collect data into chunks and optionally pad it.

    Performance worsens as `chunksize` approaches 1.

    Inspired by:
        https://docs.python.org/3/library/itertools.html#itertools-recipes
    """
    args = [iter(iterable)] * chunksize
    chunks = zip_longest(*args, fillvalue=fillvalue)
    yield from (
        filter(lambda val: val is not _UNDEFINED, chunk)
        if chunk[-1] is _UNDEFINED
        else chunk
        for chunk in chunks
    ) if fillvalue is _UNDEFINED else chunks

Comments

2
def chunker(iterable, n):
    """Yield iterable in chunk sizes.

    >>> chunks = chunker('ABCDEF', n=4)
    >>> next(chunks)
    ['A', 'B', 'C', 'D']
    >>> next(chunks)
    ['E', 'F']
    """
    it = iter(iterable)
    while True:
        chunk = []
        for i in range(n):
            try:
                chunk.append(next(it))
            except StopIteration:
                # 'raise StopIteration' inside a generator is a RuntimeError
                # on Python 3.7+ (PEP 479); return instead, and don't yield
                # an empty trailing chunk
                if chunk:
                    yield chunk
                return
        yield chunk

if __name__ == '__main__':
    import doctest
    doctest.testmod()

Comments

1

Yet another answer, the advantages of which are:

1) Easily understandable
2) Works on any iterable, not just sequences (some of the above answers will choke on filehandles)
3) Does not load the chunk into memory all at once
4) Does not make a chunk-long list of references to the same iterator in memory
5) No padding of fill values at the end of the list

That being said, I haven't timed it so it might be slower than some of the more clever methods, and some of the advantages may be irrelevant given the use case.

def chunkiter(iterable, size):
    def inneriter(first, iterator, size):
        yield first
        for _ in xrange(size - 1):
            yield iterator.next()
    it = iter(iterable)
    while True:
        yield inneriter(it.next(), it, size)

In [2]: i = chunkiter('abcdefgh', 3)
In [3]: for ii in i:
   ...:     for c in ii:
   ...:         print c,
   ...:     print ''
   ...:
a b c
d e f
g h

Update:
A couple of drawbacks, due to the fact that the inner and outer loops are pulling values from the same iterator:
1) continue doesn't work as expected in the outer loop - it just continues on to the next item rather than skipping a chunk. However, this doesn't seem like a problem as there's nothing to test in the outer loop.
2) break doesn't work as expected in the inner loop - control will wind up in the inner loop again with the next item in the iterator. To skip whole chunks, either wrap the inner iterator (ii above) in a tuple, e.g. for c in tuple(ii), or set a flag and exhaust the iterator.

Comments

1
def group_by(iterable, size):
    """Group an iterable into lists that don't exceed the size given.

    >>> list(group_by([1,2,3,4,5], 2))
    [[1, 2], [3, 4], [5]]
    """
    sublist = []
    for index, item in enumerate(iterable):
        if index > 0 and index % size == 0:
            yield sublist
            sublist = []
        sublist.append(item)
    if sublist:
        yield sublist

1 Comment

+1, it omits padding; yours and bcoughlan's are very similar
1

You can use the partition or chunks functions from the funcy library:

from funcy import partition

for a, b, c, d in partition(4, ints):
    foo += a * b * c * d

These functions also have iterator versions ipartition and ichunks, which will be more efficient in this case.
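For example, with the lazy variant named above (a sketch, assuming a funcy version that ships ipartition):

from funcy import ipartition

# same grouping as partition, but evaluated lazily
for a, b, c, d in ipartition(4, ints):
    foo += a * b * c * d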

You can also peek at their implementation.

Comments

1

Regarding the solution given by J.F. Sebastian here:

def chunker(iterable, chunksize):
    return zip(*[iter(iterable)]*chunksize)

It's clever, but it has one disadvantage: it always returns tuples. How do you get a string instead?
Of course you can write ''.join(chunker(...)), but the temporary tuple is constructed anyway.

You can get rid of the temporary tuple by writing your own zip, like this:

class IteratorExhausted(Exception):
    pass

def translate_StopIteration(iterable, to=IteratorExhausted):
    for i in iterable:
        yield i
    # StopIteration would get ignored because this is a generator,
    # but a custom exception can leave the generator.
    raise to

def custom_zip(*iterables, reductor=tuple):
    iterators = tuple(map(translate_StopIteration, iterables))
    while True:
        try:
            yield reductor(next(i) for i in iterators)
        except IteratorExhausted:  # when any of the iterators gets exhausted
            break

Then

def chunker(data, size, reductor=tuple):
    return custom_zip(*[iter(data)]*size, reductor=reductor)

Example usage:

>>> for i in chunker('12345', 2):
...     print(repr(i))
...
('1', '2')
('3', '4')
>>> for i in chunker('12345', 2, ''.join):
...     print(repr(i))
...
'12'
'34'

1 Comment

Not a critique meant for you to change your answer, but rather a comment: Code is a liability. The more code you write the more space you create for bugs to hide. From this point of view, rewriting zip instead of using the existing one seems not to be the best idea.
1

I like this approach. It feels simple, not magical, supports all iterable types, and doesn't require imports.

def chunk_iter(iterable, chunk_size):
    it = iter(iterable)
    while True:
        # build the chunk with an explicit loop: letting next(it) raise
        # StopIteration inside a generator expression would be a
        # RuntimeError on Python 3.7+ (PEP 479)
        chunk = []
        for _ in range(chunk_size):
            try:
                chunk.append(next(it))
            except StopIteration:
                break
        if not chunk:
            return
        yield tuple(chunk)

Comments

1

Quite pythonic here (you may also inline the body of the split_groups function):

import itertools

def split_groups(iter_in, group_size):
    return ((x for _, x in item)
            for _, item in itertools.groupby(enumerate(iter_in),
                                             key=lambda x: x[0] // group_size))

for x, y, z, w in split_groups(range(16), 4):
    foo += x * y + z * w

Comments
