Split a List into chunks by an Element

Question

I have a collection of objects (Pos) with this model:

    public class Pos {
        private String beforeChangement;
        private String type;
        private String afterChangement;
    }

The list of objects looks like this:

    [
      Pos(beforeChangement=Découvrez, type=VER, afterChangement=découvrir),
      Pos(beforeChangement=un, type=DET, afterChangement=un),
      Pos(beforeChangement=large, type=ADJ, afterChangement=large),
      Pos(beforeChangement=., type=SENT, afterChangement=.),
      Pos(beforeChangement=Livraison, type=NOM, afterChangement=livraison),
      Pos(beforeChangement=et, type=KON, afterChangement=et),
      Pos(beforeChangement=retour, type=NOM, afterChangement=retour),
      Pos(beforeChangement=., type=SENT, afterChangement=.),
      Pos(beforeChangement=achetez, type=VER, afterChangement=acheter),
      Pos(beforeChangement=gratuitement, type=ADV, afterChangement=gratuitement),
      Pos(beforeChangement=., type=SENT, afterChangement=.),
      Pos(beforeChangement=allez, type=VER, afterChangement=aller),
      Pos(beforeChangement=faites, type=VER, afterChangement=faire),
      Pos(beforeChangement=vite, type=ADV, afterChangement=vite),
      Pos(beforeChangement=chers, type=ADJ, afterChangement=cher),
      Pos(beforeChangement=clients, type=NOM, afterChangement=client),
      Pos(beforeChangement=., type=SENT, afterChangement=.)
    ]

I want to split this list wherever the field beforeChangement or afterChangement equals ".", to get a list of lists (List<List<SOP>>) in this format:

    [
      [Pos(beforeChangement=Découvrez, type=VER, afterChangement=découvrir),
       Pos(beforeChangement=un, type=DET, afterChangement=un),
       Pos(beforeChangement=large, type=ADJ, afterChangement=large)],
      [Pos(beforeChangement=Livraison, type=NOM, afterChangement=livraison),
       Pos(beforeChangement=et, type=KON, afterChangement=et),
       Pos(beforeChangement=retour, type=NOM, afterChangement=retour)],
      [Pos(beforeChangement=achetez, type=VER, afterChangement=acheter),
       Pos(beforeChangement=gratuitement, type=ADV, afterChangement=gratuitement)],
      [Pos(beforeChangement=allez, type=VER, afterChangement=aller),
       Pos(beforeChangement=faites, type=VER, afterChangement=faire),
       Pos(beforeChangement=vite, type=ADV, afterChangement=vite),
       Pos(beforeChangement=chers, type=ADJ, afterChangement=cher),
       Pos(beforeChangement=clients, type=NOM, afterChangement=client)]
    ]

It is like performing an inverse flatMap: I want a List of arrays or Lists (chunks), obtained by splitting on the objects whose field is the String ".".

Do you have any idea how to do this using Streams?

Thank you guys

Youcef LAIDANI · Accepted Answer · 2018-08-09 09:52:15Z

Hmm, I would solve your problem with a simple loop like this:

    List<List<Pos>> result = new ArrayList<>();
    List<Pos> part = new ArrayList<>();
    for (Pos pos : listPos) {
        if (pos.getBeforeChangement().equals(".") || pos.getAfterChangement().equals(".")) {
            result.add(part);         // separator found: add the current sub-list to the result
            part = new ArrayList<>(); // and start a new sub-list
        } else {
            part.add(pos);            // otherwise add the Pos object to the current sub-list
        }
    }
    // If listPos does not end with a "." element, the last part must still be added
    if (!part.isEmpty()) {
        result.add(part);
    }

Note: the question is not entirely clear. Your object class is named SOP in one place and Pos in another; which one is correct? My answer is based on public class Pos{..} rather than public class SOP{..}.

Take a look at the Ideone demo.

Thank you for your answer. I made a correction: Pos and SOP are the same.
There's a significant problem with this solution: when the list of Pos does not end with ".", your code will skip the entire last sentence.
No @TomaszLinkowski, check the output in the question: the objects containing "." are not included in the result. You can also compare the output in the question with the output of the demo mentioned in my answer.
You're right that - according to the question - the objects containing . should be skipped (I did not notice it, and my answer does not do it). But what I mean is that if the input list did not end with a "dot"-object, your code is simply skipping the entire last sublist instead of either including such sublist or throwing an error. See this clone of your snippet, where I removed the last line from listPos. Note that "allez", "faites", "vite", "chers", "clients" is missing from the output.
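
The trailing-sentence case discussed in these comments can be checked with a small self-contained sketch of the loop approach. The Pos constructor, the getters, the toString and the class name SplitByPeriodLoop are assumptions made for illustration; they are not part of the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitByPeriodLoop {

    // Minimal stand-in for the question's Pos class (constructor, getters and toString are assumed).
    static class Pos {
        private final String beforeChangement;
        private final String type;
        private final String afterChangement;

        Pos(String beforeChangement, String type, String afterChangement) {
            this.beforeChangement = beforeChangement;
            this.type = type;
            this.afterChangement = afterChangement;
        }

        String getBeforeChangement() { return beforeChangement; }
        String getType() { return type; }
        String getAfterChangement() { return afterChangement; }

        @Override
        public String toString() { return beforeChangement; }
    }

    static boolean isPeriod(Pos pos) {
        return ".".equals(pos.getBeforeChangement()) || ".".equals(pos.getAfterChangement());
    }

    // Splits the input at every "." element; the "." elements themselves are dropped.
    static List<List<Pos>> splitByPeriod(List<Pos> input) {
        List<List<Pos>> result = new ArrayList<>();
        List<Pos> part = new ArrayList<>();
        for (Pos pos : input) {
            if (isPeriod(pos)) {
                result.add(part);
                part = new ArrayList<>();
            } else {
                part.add(pos);
            }
        }
        // Keep the last sentence even when the input does not end with "."
        if (!part.isEmpty()) {
            result.add(part);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Pos> input = Arrays.asList(
                new Pos("Livraison", "NOM", "livraison"),
                new Pos("et", "KON", "et"),
                new Pos("retour", "NOM", "retour"),
                new Pos(".", "SENT", "."),
                new Pos("allez", "VER", "aller"),
                new Pos("vite", "ADV", "vite")); // note: no trailing "."
        System.out.println(splitByPeriod(input));
        // prints [[Livraison, et, retour], [allez, vite]]
    }
}
```

Note that with this loop two consecutive "." elements produce an empty sub-list in the result; whether that is desirable depends on the data.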

Vlad Bochenin · Accepted Answer · 2018-08-09 10:01:07Z

With the StreamEx library you can use the groupRuns method to split a list into a list of lists.

For example:

    List<List<Pos>> collect = StreamEx.of(originalList.stream())
        .groupRuns((p1, p2) -> !(".".equals(p2.beforeChangement) || ".".equals(p2.afterChangement)))
        .collect(Collectors.toList());

The groupRuns method returns a stream of lists. In the example above, every list after the first starts with a "." element.

You can filter out these elements later. For example using map method:

    StreamEx.of(originalList.stream())
        // returns a stream of lists that still contain the '.' elements
        .groupRuns((p1, p2) -> !(".".equals(p2.beforeChangement) || ".".equals(p2.afterChangement)))
        .map(l -> l.stream()
            // filter out the '.' elements
            .filter(p -> !(".".equals(p.beforeChangement) || ".".equals(p.afterChangement)))
            .collect(Collectors.toList()))
        // filter out empty lists
        .filter(l -> !l.isEmpty())
        .collect(Collectors.toList());

As far as I understand, though, this code will place the dots in separate lists, right?
Instead of map + filter related to dots (and an extra collect there), I would simply use the following: .filter(l -> !isPeriod(l.get(0))) where boolean isPeriod(Pos pos) { return ".".equals(pos.beforeChangement) || ".".equals(pos.afterChangement); }
@TomaszLinkowski your example filters out the whole list if its first element is a '.' element.
Now that I have read the code more carefully, I understand that the periods are not placed into separate lists. I got confused because I thought the predicate in groupRuns was (p1, p2) -> !isPeriod(p1) && !isPeriod(p2), while in fact it is (p1, p2) -> !isPeriod(p2). This is a slightly strange condition, though, because it means the periods go at the beginnings of the lists. However, you're right that my filtering proposal wouldn't work. Instead, I would change the predicate to (p1, p2) -> !isPeriod(p1), and then remove the last element from each list using peek if it matches isPeriod.

Tomasz Linkowski · Accepted Answer · 2018-08-09 10:18:17Z

Well, I would be conservative here, and I wouldn't use Streams (although it's possible).

The following snippet does what you need:

    List<Pos> posList; // the input list, assumed to be initialized elsewhere
    List<List<Pos>> result = new ArrayList<>();
    boolean startNewSentence = true;
    for (Pos pos : posList) {
        if (startNewSentence) {
            result.add(new ArrayList<>()); // open a new sentence
        }
        startNewSentence = isPeriod(pos);
        if (!startNewSentence) {
            result.get(result.size() - 1).add(pos); // the periods themselves are skipped
        }
    }

where:

    boolean isPeriod(Pos pos) {
        return ".".equals(pos.getBeforeChangement()) || ".".equals(pos.getAfterChangement());
    }

PS. Note there's no such word as "changement" in English. The noun from verb "change" is also "change".

@Dr.Mza I updated the code so that the elements with periods are not included in the result.
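
For readers who do want Streams without an extra library, here is one possible sketch using only the JDK. It is not the approach of any answer in this thread: it collects the indices of the "." elements with IntStream and then takes the sub-lists between neighbouring indices. The Pos constructor, getters, toString and the class name SplitByPeriodStreams are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SplitByPeriodStreams {

    // Minimal stand-in for the question's Pos class (constructor, getters and toString are assumed).
    static class Pos {
        private final String beforeChangement;
        private final String type;
        private final String afterChangement;

        Pos(String beforeChangement, String type, String afterChangement) {
            this.beforeChangement = beforeChangement;
            this.type = type;
            this.afterChangement = afterChangement;
        }

        String getBeforeChangement() { return beforeChangement; }
        String getType() { return type; }
        String getAfterChangement() { return afterChangement; }

        @Override
        public String toString() { return beforeChangement; }
    }

    static boolean isPeriod(Pos pos) {
        return ".".equals(pos.getBeforeChangement()) || ".".equals(pos.getAfterChangement());
    }

    static List<List<Pos>> splitByPeriod(List<Pos> input) {
        // Boundaries: -1, the index of every "." element, and input.size().
        int[] bounds = IntStream.concat(
                IntStream.of(-1),
                IntStream.concat(
                        IntStream.range(0, input.size()).filter(i -> isPeriod(input.get(i))),
                        IntStream.of(input.size())))
                .toArray();

        // Each pair of neighbouring boundaries delimits one chunk; the "." elements are excluded.
        // The chunks are subList views of the input, and empty chunks are dropped.
        return IntStream.range(0, bounds.length - 1)
                .mapToObj(i -> input.subList(bounds[i] + 1, bounds[i + 1]))
                .filter(chunk -> !chunk.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Pos> input = Arrays.asList(
                new Pos("Livraison", "NOM", "livraison"),
                new Pos("et", "KON", "et"),
                new Pos("retour", "NOM", "retour"),
                new Pos(".", "SENT", "."),
                new Pos("allez", "VER", "aller"),
                new Pos("vite", "ADV", "vite"));
        System.out.println(splitByPeriod(input));
        // prints [[Livraison, et, retour], [allez, vite]]
    }
}
```

This relies on index access, so it suits random-access lists such as ArrayList. Unlike the loops above, it drops empty chunks (for example between two consecutive "." elements).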

Dongfang Qu · Accepted Answer · 2018-08-09 09:08:58Z

Collectors.groupingBy() may help you.

Ashishkumar Singh · Accepted Answer · 2018-08-09 09:21:53Z

Let's say your list of SOP objects is named listSOP. Then:

    List<SOP> listSOP = new ArrayList<>();
    // ... populate your list ...
    Map<String, List<SOP>> map = listSOP.stream()
        .collect(Collectors.groupingBy(SOP::getBeforeChangement));

This should return a Map of type <String(BeforeChangement), List<SOP>>.

Here getBeforeChangement is the getter method in your SOP class, which should return the value of the variable beforeChangement.

This will not work: it will group together all the SOPs with the same beforeChangement instead of partitioning the original list into ordered sublists.
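
The point of this comment can be seen in a small sketch. Here groupingBy is applied to two short "sentences" that share the word "un"; the Pos constructor and getter, and the class name GroupingByDemo, are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingByDemo {

    // Minimal stand-in for the question's class (constructor and getter are assumed).
    static class Pos {
        private final String beforeChangement;
        private final String type;
        private final String afterChangement;

        Pos(String beforeChangement, String type, String afterChangement) {
            this.beforeChangement = beforeChangement;
            this.type = type;
            this.afterChangement = afterChangement;
        }

        String getBeforeChangement() { return beforeChangement; }
    }

    // Groups by the value of beforeChangement, as in the answer above.
    static Map<String, List<Pos>> groupByBefore(List<Pos> input) {
        return input.stream().collect(Collectors.groupingBy(Pos::getBeforeChangement));
    }

    public static void main(String[] args) {
        List<Pos> input = Arrays.asList(
                new Pos("un", "DET", "un"),
                new Pos(".", "SENT", "."),
                new Pos("un", "DET", "un"),
                new Pos(".", "SENT", "."));
        Map<String, List<Pos>> groups = groupByBefore(input);
        System.out.println(groups.size());            // prints 2
        System.out.println(groups.get("un").size());  // prints 2
        System.out.println(groups.get(".").size());   // prints 2
    }
}
```

The two occurrences of "un" come from different sentences but end up in the same group, and both "." elements form a group of their own, so the sentence boundaries are lost.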
