263

I wrote myself a utility to break a list into batches of a given size. I would like to know whether Apache Commons already provides a utility for this.

public static <T> List<List<T>> getBatches(List<T> collection, int batchSize) {
    int i = 0;
    List<List<T>> batches = new ArrayList<List<T>>();
    while (i < collection.size()) {
        int nextInc = Math.min(collection.size() - i, batchSize);
        List<T> batch = collection.subList(i, i + nextInc);
        batches.add(batch);
        i = i + nextInc;
    }
    return batches;
}

Please let me know if there is an existing utility that does the same.

3
  • 5
    Not sure this is off-topic. The question is not "what library does this" but "how can I do this with apache common utils". Commented Aug 16, 2018 at 12:29
  • 1
    @FlorianF I agree with you. This question and its answers are very useful, and it could be well saved with a small edit. It was a lazy action to close it hastily. Commented Sep 24, 2018 at 8:15
  • Found useful blog post with nice class and benchmarks here : e.printstacktrace.blog/… Commented Nov 20, 2019 at 11:10

23 Answers

376

Check out Lists.partition(java.util.List, int) from Google Guava:

Returns consecutive sublists of a list, each of the same size (the final list may be smaller). For example, partitioning a list containing [a, b, c, d, e] with a partition size of 3 yields [[a, b, c], [d, e]] -- an outer list containing two inner lists of three and two elements, all in the original order.


5 Comments

link partition documentation and link code example
For apache common users, the function is also available: commons.apache.org/proper/commons-collections/apidocs/org/…
If you are working with a list, I use the "Apache Commons Collections 4" library. It has a partition method in the ListUtils class: ... int targetSize = 100; List<Integer> largeList = ... List<List<Integer>> output = ListUtils.partition(largeList, targetSize); This method is adapted from code.google.com/p/guava-libraries
Thank you. I can't believe how hard this is to do in Java.
I checked both of these methods with a benchmark (via Guava and via AtomicInteger + groupingBy). Unexpectedly, Guava's way is the winner (up to 8 times faster).
133
+250

In case you want to produce a Java-8 stream of batches, you can try the following code:

public static <T> Stream<List<T>> batches(List<T> source, int length) {
    if (length <= 0)
        throw new IllegalArgumentException("length = " + length);
    int size = source.size();
    if (size <= 0)
        return Stream.empty();
    int fullChunks = (size - 1) / length;
    return IntStream.range(0, fullChunks + 1).mapToObj(
        n -> source.subList(n * length, n == fullChunks ? size : (n + 1) * length));
}

public static void main(String[] args) {
    List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14);

    System.out.println("By 3:");
    batches(list, 3).forEach(System.out::println);

    System.out.println("By 4:");
    batches(list, 4).forEach(System.out::println);
}

Output:

By 3:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10, 11, 12]
[13, 14]
By 4:
[1, 2, 3, 4]
[5, 6, 7, 8]
[9, 10, 11, 12]
[13, 14]

3 Comments

How do I break, continue or return in this approach?
@Tyr1on In main(String[]), instead of Stream.forEach(Consumer) you use Stream.iterator() and make a basic for loop.
How can one apply this approach to an infinite stream? Like this one: Stream.generate(Math::random).
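The two questions in the comments above (early exit and infinite streams) can both be handled by batching an Iterator instead of a List. The following is only a sketch under stated assumptions: BatchingIterator is a hypothetical helper name, not something from the answer or from any library, and it copies elements into new lists rather than returning subList views.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.stream.Stream;

// Hypothetical helper (not from the answer above): groups the elements of any
// Iterator into lists of at most batchSize elements, pulling them on demand.
class BatchingIterator<T> implements Iterator<List<T>> {
    private final Iterator<T> source;
    private final int batchSize;

    BatchingIterator(Iterator<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize = " + batchSize);
        }
        this.source = source;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public List<T> next() {
        if (!source.hasNext()) {
            throw new NoSuchElementException();
        }
        // Fill one batch; the last batch may be shorter than batchSize.
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext() && batch.size() < batchSize) {
            batch.add(source.next());
        }
        return batch;
    }

    public static void main(String[] args) {
        // An infinite stream 1, 2, 3, ... consumed through its iterator.
        Iterator<Integer> naturals = Stream.iterate(1, n -> n + 1).iterator();
        Iterator<List<Integer>> batches = new BatchingIterator<>(naturals, 3);
        while (batches.hasNext()) {
            List<Integer> batch = batches.next();
            if (batch.get(0) > 7) {
                break; // an ordinary break works because this is a plain loop
            }
            System.out.println(batch);
        }
    }
}
```

Because batches are only built when next() is called, the loop can stop at any point without the source ever being fully consumed.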
51

Use Apache Commons ListUtils.partition.

org.apache.commons.collections4.ListUtils.partition(final List<T> list, final int size) 

1 Comment

for (List<Screen> chunk : ListUtils.partition(bigList, 50)) { repository.batchUpdate(chunk); }
34

Here is a simple solution for Java 8+:

public static <T> Collection<List<T>> prepareChunks(List<T> inputList, int chunkSize) {
    AtomicInteger counter = new AtomicInteger();
    return inputList.stream()
        .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / chunkSize))
        .values();
}

1 Comment

Doesn't the documentation discourage using side effect operations in streams?
28

With Java 9 you can use IntStream.iterate() with hasNext condition. So you can simplify the code of your method to this:

public static <T> List<List<T>> getBatches(List<T> collection, int batchSize) {
    return IntStream.iterate(0, i -> i < collection.size(), i -> i + batchSize)
        .mapToObj(i -> collection.subList(i, Math.min(i + batchSize, collection.size())))
        .collect(Collectors.toList());
}

Using {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, the result of getBatches(numbers, 4) will be:

[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]] 

Comments

23

Another approach is to use Collectors.groupingBy of indices and then map the grouped indices to the actual elements:

final List<Integer> numbers = range(1, 12)
    .boxed()
    .collect(toList());
System.out.println(numbers);

final List<List<Integer>> groups = range(0, numbers.size())
    .boxed()
    .collect(groupingBy(index -> index / 4))
    .values()
    .stream()
    .map(indices -> indices
        .stream()
        .map(numbers::get)
        .collect(toList()))
    .collect(toList());
System.out.println(groups);

Output:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]

4 Comments

@Sebien This does work for the general case. The groupingBy is done on the elements of the IntStream.range, not the list elements. See e.g. ideone.com/KYBc7h.
What is the range() function?
I used the range function from here: import static java.util.stream.IntStream.range;
Generics based solution based on this can be found here: stackoverflow.com/a/77929672/337666
12

I came up with this one:

private static <T> List<List<T>> partition(Collection<T> members, int maxSize) {
    List<List<T>> res = new ArrayList<>();
    List<T> internal = new ArrayList<>();

    for (T member : members) {
        internal.add(member);
        if (internal.size() == maxSize) {
            res.add(internal);
            internal = new ArrayList<>();
        }
    }
    if (internal.isEmpty() == false) {
        res.add(internal);
    }
    return res;
}

1 Comment

Nice solution, you also can simplify internal.isEmpty() == false to !internal.isEmpty()
9

Here is an example:

final AtomicInteger counter = new AtomicInteger();
final int partitionSize = 3;
final List<Object> list = new ArrayList<>();
list.add("A");
list.add("B");
list.add("C");
list.add("D");
list.add("E");

final Collection<List<Object>> subLists = list.stream()
    .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / partitionSize))
    .values();
System.out.println(subLists);

Input: [A, B, C, D, E]

Output: [[A, B, C], [D, E]]

You can find examples here: https://e.printstacktrace.blog/divide-a-list-to-lists-of-n-size-in-Java-8/

Comments

6

The following example demonstrates chunking of a List:

package de.thomasdarimont.labs;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitIntoChunks {

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
        List<List<Integer>> chunks = chunk(ints, 4);

        System.out.printf("Ints: %s%n", ints);
        System.out.printf("Chunks: %s%n", chunks);
    }

    public static <T> List<List<T>> chunk(List<T> input, int chunkSize) {
        int inputSize = input.size();
        int chunkCount = (int) Math.ceil(inputSize / (double) chunkSize);

        Map<Integer, List<T>> map = new HashMap<>(chunkCount);
        List<List<T>> chunks = new ArrayList<>(chunkCount);

        for (int i = 0; i < inputSize; i++) {
            // The first element of each chunk creates the chunk and registers it in order.
            map.computeIfAbsent(i / chunkSize, (ignore) -> {
                List<T> chunk = new ArrayList<>();
                chunks.add(chunk);
                return chunk;
            }).add(input.get(i));
        }
        return chunks;
    }
}

Output:

Ints: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Chunks: [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]

Comments

5

There was another question that was closed as being a duplicate of this one, but if you read it closely, it's subtly different. So in case someone (like me) actually wants to split a list into a given number of almost equally sized sublists, then read on.

I simply ported the algorithm described here to Java.

@Test
public void shouldPartitionListIntoAlmostEquallySizedSublists() {
    List<String> list = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
    int numberOfPartitions = 3;

    List<List<String>> split = IntStream.range(0, numberOfPartitions).boxed()
        .map(i -> list.subList(
            partitionOffset(list.size(), numberOfPartitions, i),
            partitionOffset(list.size(), numberOfPartitions, i + 1)))
        .collect(toList());

    assertThat(split, hasSize(numberOfPartitions));
    assertEquals(list.size(), split.stream().flatMap(Collection::stream).count());
    assertThat(split, hasItems(Arrays.asList("a", "b", "c"), Arrays.asList("d", "e"), Arrays.asList("f", "g")));
}

private static int partitionOffset(int length, int numberOfPartitions, int partitionIndex) {
    return partitionIndex * (length / numberOfPartitions) + Math.min(partitionIndex, length % numberOfPartitions);
}
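For readers without JUnit and Hamcrest at hand, the same offset formula can be tried in a plain main method. This is only a sketch: the class name AlmostEqualPartitions and the wrapper method split are made up for illustration, while partitionOffset is copied from the test above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name; only partitionOffset comes from the answer above.
class AlmostEqualPartitions {

    // Start offset of partition partitionIndex: every partition gets
    // length / numberOfPartitions elements, and the first
    // length % numberOfPartitions partitions get one extra element each.
    static int partitionOffset(int length, int numberOfPartitions, int partitionIndex) {
        return partitionIndex * (length / numberOfPartitions)
                + Math.min(partitionIndex, length % numberOfPartitions);
    }

    // Hypothetical wrapper: returns subList views, so no elements are copied.
    static <T> List<List<T>> split(List<T> list, int numberOfPartitions) {
        if (numberOfPartitions <= 0) {
            throw new IllegalArgumentException("numberOfPartitions = " + numberOfPartitions);
        }
        List<List<T>> result = new ArrayList<>(numberOfPartitions);
        for (int i = 0; i < numberOfPartitions; i++) {
            result.add(list.subList(
                    partitionOffset(list.size(), numberOfPartitions, i),
                    partitionOffset(list.size(), numberOfPartitions, i + 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        // Offsets for 7 elements in 3 partitions are 0, 3, 5, 7.
        System.out.println(split(list, 3)); // prints [[a, b, c], [d, e], [f, g]]
    }
}
```

Note that when the list has fewer elements than partitions, the trailing partitions are empty lists.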

Comments

5

Similar to the OP's code, without streams or libraries, but more concise:

public <T> List<List<T>> getBatches(List<T> collection, int batchSize) {
    List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < collection.size(); i += batchSize) {
        batches.add(collection.subList(i, Math.min(i + batchSize, collection.size())));
    }
    return batches;
}

Comments

4

Using various cheats from the web, I came to this solution:

int[] count = new int[1];
final int CHUNK_SIZE = 500;
Map<Integer, List<Long>> chunkedUsers = users.stream().collect(
    Collectors.groupingBy(
        // Post-increment so that the first element has index 0;
        // incrementing first would leave bucket 0 one element short.
        user -> Math.floorDiv(count[0]++, CHUNK_SIZE)
    )
);

We use count to mimic a normal collection index.
Then, we group the collection elements in buckets, using the algebraic quotient as bucket number.
The final map contains as key the bucket number, as value the bucket itself.

You can then easily do an operation on each of the buckets with:

chunkedUsers.values().forEach( ... ); 
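The bucketing idea described above (bucket number = element index divided by the chunk size) can also be written without a mutable counter by streaming over the indices. This is a minimal sketch, not the answer's own code: the class name BucketByIndex and method name chunk are invented for illustration, and a TreeMap is used here only so that the buckets print in ascending order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical class and method names, chosen for this illustration only.
class BucketByIndex {

    // Groups elements by (index / chunkSize); the key is the bucket number.
    static <T> Map<Integer, List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize = " + chunkSize);
        }
        return IntStream.range(0, items.size())
                .boxed()
                .collect(Collectors.groupingBy(
                        index -> index / chunkSize,   // integer quotient = bucket number
                        TreeMap::new,                 // keeps bucket numbers sorted
                        Collectors.mapping(items::get, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> users = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        // Indices 0..2 go to bucket 0, 3..5 to bucket 1, and 6 to bucket 2.
        System.out.println(chunk(users, 3)); // prints {0=[a, b, c], 1=[d, e, f], 2=[g]}
    }
}
```

Since items.get(index) is called for every element, this variant is best suited to lists with fast random access such as ArrayList.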

1 Comment

Could use an AtomicInteger for count.
4

You can use the code below to split the list into batches.

Iterable<List<T>> batchIds = Iterables.partition(list, batchSize); 

You need to import the Google Guava library to use the code above.

Comments

4

Note that List#subList() returns a view of the underlying collection, which can result in unexpected consequences when editing the smaller lists - the edits will reflect in the original collection or may throw ConcurrentModificationException.
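A small demonstration of this behaviour, using only java.util. The class name SubListViewDemo and the helper copyOfRange are made up for this sketch; copying the sublist into a new ArrayList is one common way to detach a batch from the original list, at the cost of an extra allocation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class, not taken from any answer in this thread.
class SubListViewDemo {

    // Returns an independent copy of source[from, to) instead of a view.
    static <T> List<T> copyOfRange(List<T> source, int from, int to) {
        return new ArrayList<>(source.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> original = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6));

        // Writing through the view changes the original list.
        List<Integer> view = original.subList(0, 3);
        view.set(0, 99);
        System.out.println(original); // prints [99, 2, 3, 4, 5, 6]

        // Writing to a copy leaves the original list untouched.
        List<Integer> copy = copyOfRange(original, 3, 6);
        copy.set(0, -1);
        System.out.println(original); // prints [99, 2, 3, 4, 5, 6]

        // A structural change to the original invalidates the existing view.
        original.add(7);
        try {
            view.size();
        } catch (ConcurrentModificationException e) {
            System.out.println("view is no longer usable");
        }
    }
}
```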

Comments

2
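In the question's code, the line

List<T> batch = collection.subList(i, i + nextInc);

can be replaced with

List<T> batch = collection.subList(i, i = i + nextInc);

Java evaluates arguments left to right, so the old value of i is passed as the start index and i is advanced in the same statement. The separate i = i + nextInc; line must then be removed.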

Comments

2

Here's a solution using vanilla java and the super secret modulo operator :)

Given the content/order of the chunks doesn't matter, this would be the easiest approach. (When preparing stuff for multi-threading it usually doesn't matter, which elements are processed on which thread for example, just need an equal distribution).

public static <T> List<T>[] chunk(List<T> input, int chunkCount) {
    List<T>[] chunks = new List[chunkCount];

    for (int i = 0; i < chunkCount; i++) {
        chunks[i] = new LinkedList<T>();
    }

    for (int i = 0; i < input.size(); i++) {
        chunks[i % chunkCount].add(input.get(i));
    }

    return chunks;
}

Usage:

List<String> list = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h", "i", "j");

List<String>[] chunks = chunk(list, 4);

for (List<String> chunk : chunks) {
    System.out.println(chunk);
}

Output:

[a, e, i]
[b, f, j]
[c, g]
[d, h]

Comments

2

Below is a solution using Java 8 Streams:

// Sample input
List<String> input = new ArrayList<String>();
IntStream.range(1, 999).forEach((num) -> {
    input.add("" + num);
});

// Identify the number of batches
int BATCH_SIZE = 10;
int multiples = input.size() / BATCH_SIZE;
if (input.size() % BATCH_SIZE != 0) {
    multiples = multiples + 1;
}

// Process each batch
IntStream.range(0, multiples).forEach((indx) -> {
    List<String> batch = input.stream()
        .skip(indx * BATCH_SIZE)
        .limit(BATCH_SIZE)
        .collect(Collectors.toList());
    System.out.println("Batch Items:" + batch);
});

Comments

2

If someone is looking for a Kotlin version, here it is:

list.chunked(size)

or, equivalently (windowed defaults to step = 1, which yields overlapping sliding windows rather than batches, so the step must be set explicitly):

list.windowed(size, step = size, partialWindows = true)

I once had this as an interview question and wrote the following =D

fun <T> batch(list: List<T>, limit: Int): List<List<T>> {
    val result = ArrayList<List<T>>()
    var batch = ArrayList<T>()
    for (i in list) {
        batch.add(i)
        if (batch.size == limit) {
            result.add(batch)
            batch = ArrayList()
        }
    }
    if (batch.isNotEmpty()) {
        result.add(batch)
    }
    return result
}

2 Comments

Upvoted for the chunked.
Pointing out that chunked/windowed is not lazy and isn't backed up by the original list (Compared to Lists.partition). It will immediately create a new list with all the many inner lists.
0

Another approach to solving this:

public class CollectionUtils {

    /**
     * Splits the collection into lists with the given batch size.
     * @param collection the collection to split into batches
     * @param batchsize size of each batch
     * @param <T> element type, preserved from input to output
     * @return nested list
     */
    public static <T> List<List<T>> makeBatch(Collection<T> collection, int batchsize) {

        List<List<T>> totalArrayList = new ArrayList<>();
        List<T> tempItems = new ArrayList<>();

        Iterator<T> iterator = collection.iterator();

        for (int i = 0; i < collection.size(); i++) {
            tempItems.add(iterator.next());
            if ((i + 1) % batchsize == 0) {
                totalArrayList.add(tempItems);
                tempItems = new ArrayList<>();
            }
        }

        if (tempItems.size() > 0) {
            totalArrayList.add(tempItems);
        }

        return totalArrayList;
    }
}

Comments

0

A one-liner in Java 8 would be:

import static java.util.function.Function.identity;
import static java.util.stream.Collectors.*;

private static <T> Collection<List<T>> partition(List<T> xs, int size) {
    return IntStream.range(0, xs.size())
        .boxed()
        .collect(collectingAndThen(toMap(identity(), xs::get), Map::entrySet))
        .stream()
        .collect(groupingBy(x -> x.getKey() / size, mapping(Map.Entry::getValue, toList())))
        .values();
}

Comments

0

A solution with generics, based on the answer here: https://stackoverflow.com/a/41500804/337666

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.IntStream.range;

import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

public class CollectionUtils {

    public static <T> List<List<T>> partition(Collection<T> input, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("Invalid batch size of: " + size + ". Size should be greater than zero");
        }

        @SuppressWarnings("unchecked")
        T[] inputArray = (T[]) input.toArray();

        return range(0, input.size())
            .boxed()
            .collect(groupingBy(index -> index / size))
            .values()
            .stream()
            .map(indices -> indices
                .stream()
                .map(x -> inputArray[x])
                .collect(Collectors.toList()))
            .collect(Collectors.toList());
    }
}

Comments

0

Since Java 22 (with preview features enabled; the API was finalized in Java 24), you can use the Gatherer API:

List<String> yourList = ...
List<List<String>> chunks = yourList
    .stream()
    .gather(Gatherers.windowFixed(10))
    .toList();

Comments

-1

import com.google.common.collect.Lists;

List<List<T>> batches = Lists.partition(list, batchSize);

Use Lists.partition(list, batchSize). You need to import Lists from the Google Guava package (com.google.common.collect.Lists).

It returns a List<List<T>> in which every sublist has batchSize elements, except possibly the last one, which may be smaller.

1 Comment

You can also use the list's own subList(startIndex, endIndex) method to break the list at the required indices.
