Stream doesn't preserve the order after grouping

Question

I have a List name availableSeats I am sorting and grouping by the blockIndex property like below:

availableSeats.stream() .sorted(Comparator.comparing(SeatedTicketAssignment::getBlockIndex)) .collect(Collectors.groupingBy(SeatedTicketAssignment::getBlockIndex)) .forEach((block, blockAssignments) -> { //Rest of the code }

The problem is that the result of grouping by is not sorted by the blockIndex.

Rogue · Accepted Answer · 2017-09-05 22:44:55Z

50

Keep in mind, Collectors#groupingBy(Function) will return a HashMap, which does not guarantee order. If you want the ordering to be in the order that the aggregate identity (your i % 2 == 0 result) shows up, then you can use LinkedHashMap:

.collect(Collectors.groupingBy(i -> i % 2 == 0, LinkedHashMap::new, Collectors.toList()))

Would return a LinkedHashMap<Boolean, List<SeatedTicketAssignment>> (since your collector is grouped by a boolean). Additionally, since the list utilized by the collector is an ArrayList, it should preserve the iteration order of the stream relative to the list.

edited Sep 5, 2017 at 22:44

answered Aug 8, 2017 at 1:00

Rogue

11.5k6 gold badges51 silver badges74 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

Holger Over a year ago

If the key type is Boolean, you can use partitioningBy and get a map that has an intrinsic order, however, where in the question does that key function appear? Since the order is that of the previous sorting (by the same value) step, removing the sort operation and collecting into a TreeMap would be much simpler…

Sviatlana Over a year ago

Thaks for saving my time.

Cristiano Over a year ago

@Holger, the list created by the third argument of groupingBy is supposed to preserve the entered order of the initial list?

Rogue Over a year ago

@Cristiano It returns an ArrayList, so relative to whatever order the stream was in before for a sequential stream.

Holger Over a year ago

@Cristiano the toList() collector preserves the order. But the final result is a Map containing up to two groups. When grouping into a LinkedHashMap, the map’s order depends on the first element. When grouping into a TreeMap, it will always be false, true. When using partitioningBy(i -> i % 2 == 0), the order also will always be false, true, but this has not been documented. Further, when using partitioningBy(i -> i % 2 == 0) both keys will always exist even when not appearing in the stream (then, with an empty list). This missing information has been added to the doc in Java 9

SergGr · Accepted Answer · 2017-08-08 01:07:54Z

Unfortunately Stream API implementation is not aware of the fact, that the stream you pass is already sorted by what you need so "grouping" is actually trivial. Thus it uses the default way which is essentially similar to this SO answer i.e. create a Map for groups and fill it with elements of the stream. And by default the Map implementation that is used is the HashMap (see code here) which is good for performance reasons but bad for your goal because HashMap doesn't preserve the order of the keys than first sorting.

It might seems a bit unlucky that Group By is implemented in Stream API only as a "collector" so you can't first group and then sort in a one-liner. But this seems intentional: there is no way to implement Group By without fully materializing result so it can't be lazy and thus has to be a collector. @Rogue provided a nice trick with LinkedHashMap but to me it is to bound to implementation details. Still I would write a few more lines of code and first group and then sort the entries of the list (i.e. actually grouped HashMap) by key. Most probably it would be faster.

Why sorting afterwards, when you can sort while grouping? Collectors.groupingBy(SeatedTicketAssignment::getBlockIndex, TreeMap::new, Collectors.toList()), if that’s not a one-liner, I don’t know…
@Holger, it really depends on the data. The issue is that TreeMap as any other sorting has O(N*log(N)) complexity against O(N) for HashMap. If groupping just reduce number of elements just 2 of 3 time TreeMap would probably win, but if we group in something like sqrt(N) buckets, first grouping and then sorting will be faster. Another point is that IMHO this as Rogue answer relies on implementation details too much which is bad (Example: would you guarantee, that the order will always be preserved if later someone change the code to parallel streams?). Thus I prefer explicit code.
Rogue’s answer does not rely on implementation details. Preserving the encounter order is guaranteed, regardless of whether you use a parallel stream or not. But his answer, using LinkedHashMap, still relies on a preceding sort operation, which not only has an O(n log n) worst case, but also needs a temporary O(n) storage operation before the collection. Plus the actual collect operation. So collecting directly into a TreeMap without a preceding sort operation will likely be faster.
As a fun fact, even if you don’t need a concurrent map here, if you are looking for a concurrent map that is intrinsically sorted, ConcurrentSkipListMap is there for you since Java 6…
groupingByConcurrent is an unordered collector, so it does not maintain the stream’s original encounter order, if there ever was one, but if you use ConcurrentSkipListMap as target, it will sort the inserted elements anyway and that sorted order is what you are interested in.

Matt Timmermans · Accepted Answer · 2017-08-08 01:18:36Z

Since the groupingBy collector does not require sorted input, you can sort the groups after collection. This will be faster than sorting items first, anyway, assuming that there are fewer groups than items:

availableSeats.stream() .collect(Collectors.groupingBy(SeatedTicketAssignment::getBlockIndex)) .entrySet().stream() .sorted(Comparator.comparing(Map.Entry::getKey)) .forEach(mapEntry -> { //Rest of the code }

Collectives™ on Stack Overflow

Stream doesn't preserve the order after grouping

3 Answers 3

5 Comments

11 Comments

1 Comment

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

5 Comments

11 Comments

1 Comment

Linked

Related