I have been curious about the difference between Collection.parallelStream() and Collection.stream().parallel(). According to the Javadocs, parallelStream() returns a possibly parallel stream, whereas stream().parallel() returns an equivalent stream that is parallel. In my own testing, I have found no differences. Where does the difference between these two methods lie? Is one implementation more time-efficient than the other? Thanks.

  • Short answer: no difference. Long answer: noooooooo difference. They're just aliases for each other. Commented May 5, 2017 at 18:13
  • I just found something, but I do not know whether it represents a difference between them. Please see stackoverflow.com/questions/44013553/… Commented May 17, 2017 at 0:24

3 Answers

43

Even if they behave the same at the moment, there is a difference, at least in their documentation, as you correctly pointed out; as far as I can tell, that difference might be exploited in the future.

At the moment the parallelStream method is defined in the Collection interface as:

    default Stream<E> parallelStream() {
        return StreamSupport.stream(spliterator(), true);
    }

Being a default method, it can be overridden by implementations (and that is what the inner classes of java.util.Collections actually do).

That hints that even if the default method returns a parallel Stream, there could be collections that override this method to return a non-parallel Stream. That is probably why the documentation is worded the way it is.

At the same time even if parallelStream returns a sequential stream - it is still a Stream, and then you could easily call parallel on it:

    Collections.some()
        .parallelStream() // actually sequential
        .parallel()       // force it to be parallel

At least for me, this looks weird.
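To make that scenario concrete, here is a minimal sketch of a collection that overrides parallelStream() to hand back a sequential stream. SequentialOnlyCollection and OverrideDemo are hypothetical names made up for this illustration; I am not aware of a JDK collection that behaves this way.

```java
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical collection, for illustration only: it opts out of parallelism.
class SequentialOnlyCollection<E> extends AbstractCollection<E> {
    private final List<E> items;

    SequentialOnlyCollection(List<E> items) {
        this.items = items;
    }

    @Override
    public Iterator<E> iterator() {
        return items.iterator();
    }

    @Override
    public int size() {
        return items.size();
    }

    // The Javadoc only promises a "possibly parallel" stream,
    // so returning the sequential stream is allowed.
    @Override
    public Stream<E> parallelStream() {
        return stream();
    }
}

class OverrideDemo {
    public static void main(String[] args) {
        SequentialOnlyCollection<Integer> c =
                new SequentialOnlyCollection<>(Arrays.asList(1, 2, 3));

        System.out.println(c.parallelStream().isParallel());            // false
        System.out.println(c.parallelStream().parallel().isParallel()); // true
        System.out.println(c.stream().parallel().isParallel());         // true
    }
}
```

So the caller can always override the collection's choice by calling parallel(), which is exactly the odd-looking chain shown above.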

It seems that the documentation should somehow state that after calling parallelStream there should be no reason to call parallel again to force that - since it might be useless or even bad for the processing.

EDIT

For anyone reading this - please read the comments by Holger also; it covers cases beyond what I said in this answer.


7 Comments

It’s hard to imagine a scenario where it is reasonable that a stream source denies parallel processing for the stream, without knowing what actual operations will be chained. Perhaps, if it is rolling its own implementation of the Stream API, but then, it is in full control of what happens when .parallel() is called…
Well, an empty collection would never benefit from parallel processing, however, since there is no guarantee about the number of threads anyway, there is no harm in still letting isParallel() return true. Likewise, a stream over a singleton collection without a flatMap operation will never use a second thread, still, there is no harm in isParallel() returning true. This also applies to all streams whose underlying spliterator returns null in trySplit; without something like sorted or flatMap, it doesn’t matter whether you call .parallel(); it won’t have any effect.
Just thinking it to its end, if you have a source that doesn’t support any parallel processing, there is still no reason why thatSource.parallelStream().sorted().otherOps() should be forbidden to use parallel processing, once the internally used array has been populated, as at that point, the processing is entirely detached from the source. I have the feeling that the sentence stems from a time when the interaction between the source and the stream implementation was still in motion.
@Holger there are times I don't actually get your answers, but certainly try to come to those to revise them... by internal array you meant the sorted operation here that would copy the elements from the Spliterator to the Sink (array or ArrayList) right? Sorry to bring this up so late
Yes, exactly, the temporary storage used by sorted(), whether array or not, is independent from the source collection, so it can be processed in parallel, even if the source tries to deny it. Likewise, if a spliterator’s trySplit method returns null, the Stream implementation could turn to a buffering strategy like AbstractSpliterator.trySplit does. Actually, it would make more sense if the Stream did this instead of AbstractSpliterator
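A small sketch of the point made in these comments: a stream can report isParallel() == true even when its source cannot be split at all. NoSplitSpliterator and NoSplitDemo are hypothetical names invented for this example; it implements Spliterator directly (rather than extending AbstractSpliterator) so that trySplit really does return null.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical source that yields 0..end-1 and refuses to split.
class NoSplitSpliterator implements Spliterator<Integer> {
    private int next = 0;
    private final int end;

    NoSplitSpliterator(int end) {
        this.end = end;
    }

    @Override
    public boolean tryAdvance(Consumer<? super Integer> action) {
        if (next >= end) {
            return false;
        }
        action.accept(next++);
        return true;
    }

    // Returning null tells the stream framework there is nothing to hand off.
    @Override
    public Spliterator<Integer> trySplit() {
        return null;
    }

    @Override
    public long estimateSize() {
        return end - next;
    }

    @Override
    public int characteristics() {
        return ORDERED | SIZED;
    }
}

class NoSplitDemo {
    public static void main(String[] args) {
        // The second argument requests a parallel stream.
        System.out.println(
                StreamSupport.stream(new NoSplitSpliterator(5), true).isParallel()); // true

        List<Integer> result = StreamSupport.stream(new NoSplitSpliterator(5), true)
                .collect(Collectors.toList());
        System.out.println(result); // [0, 1, 2, 3, 4]
    }
}
```

The flag is set and the result is correct; whether more than one thread ever touches the elements is up to the stream implementation, which is why the sketch does not assert anything about threads.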
10

There is no difference between Collection.parallelStream() and Collection.stream().parallel(). They will both divide the stream to the extent that the underlying spliterator will allow, and they will both run using the common ForkJoinPool, ForkJoinPool.commonPool() (unless already running inside another one).
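One way to check this yourself is sketched below, assuming a plain ArrayList as the source. EquivalenceDemo and its helper methods are names made up for this example. It prints which threads handled the elements; the exact thread names and how many show up depend on your machine, so no output is claimed for those lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

class EquivalenceDemo {
    // Builds a list of the integers 0..n-1.
    static List<Integer> numbers(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Consumes the stream and records the names of the threads that processed elements.
    static Set<String> workerThreads(Stream<Integer> stream) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        stream.forEach(x -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        List<Integer> list = numbers(100_000);

        System.out.println(list.parallelStream().isParallel());     // true
        System.out.println(list.stream().parallel().isParallel());  // true

        // Typically the calling thread plus common-pool workers in both cases.
        System.out.println(workerThreads(list.parallelStream()));
        System.out.println(workerThreads(list.stream().parallel()));
    }
}
```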

Comments

4
    import java.util.ArrayList;
    import java.util.List;

    class Employee {
        String name;
        int salary;

        public Employee(String name, int salary) {
            this.name = name;
            this.salary = salary;
        }

        public int getSalary() {
            return salary;
        }

        public void setSalary(int salary) {
            this.salary = salary;
        }
    }

    class ParallelStream {
        public static void main(String[] args) {
            long t1, t2;
            List<Employee> eList = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                eList.add(new Employee("A", 20000));
                eList.add(new Employee("B", 3000));
                eList.add(new Employee("C", 15002));
                eList.add(new Employee("D", 7856));
                eList.add(new Employee("E", 200));
                eList.add(new Employee("F", 50000));
            }

            // Sequential stream: count and elapsed time
            t1 = System.currentTimeMillis();
            System.out.println("Sequential Stream Count?= " + eList.stream().filter(e -> e.getSalary() > 15000).count());
            t2 = System.currentTimeMillis();
            System.out.println("Sequential Stream Time Taken?= " + (t2 - t1) + "\n");

            // Parallel stream via parallelStream(): count and elapsed time
            t1 = System.currentTimeMillis();
            System.out.println("Parallel Stream Count?= " + eList.parallelStream().filter(e -> e.getSalary() > 15000).count());
            t2 = System.currentTimeMillis();
            System.out.println("Parallel Stream Time Taken?= " + (t2 - t1));

            // Parallel stream via stream().parallel(): count and elapsed time
            t1 = System.currentTimeMillis();
            System.out.println("stream().parallel() Count?= " + eList.stream().parallel().filter(e -> e.getSalary() > 15000).count());
            t2 = System.currentTimeMillis();
            System.out.println("stream().parallel() Time Taken?= " + (t2 - t1));
        }
    }

I tried all three approaches, .stream(), .parallelStream() and .stream().parallel(), with the same number of records and measured the time taken by each.

Here is the output:

    Sequential Stream Count?= 300
    Sequential Stream Time Taken?= 18

    Parallel Stream Count?= 300
    Parallel Stream Time Taken?= 6
    stream().parallel() Count?= 300
    stream().parallel() Time Taken?= 1

I am not sure, but according to this output the time taken by stream().parallel() is 1/6th of that taken by parallelStream().

Suggestions from any experts are still most welcome.

4 Comments

If you run each stream separately, you will see almost the same time for all of them.
@sagar-gangwal Try changing the loop size from 100 to 5_000_000; then you will see a different result, like this: Sequential Stream Count?= 15000000, Sequential Stream Time Taken?= 102; Parallel Stream Count?= 15000000, Parallel Stream Time Taken?= 64; stream().parallel() Count?= 15000000, stream().parallel() Time Taken?= 97
Remember that Java Runtime uses JIT, so benchmarking things like this is a bit trickier.
Results with loop size 100,000. 1st call: Sequential Stream Count?= 300000, Sequential Stream Time Taken?= 52; Parallel Stream Count?= 300000, Parallel Stream Time Taken?= 40; stream().parallel() Count?= 300000, stream().parallel() Time Taken?= 29. 2nd call: Sequential Stream Count?= 300000, Sequential Stream Time Taken?= 25; Parallel Stream Count?= 300000, Parallel Stream Time Taken?= 47; stream().parallel() Count?= 300000, stream().parallel() Time Taken?= 37
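Following up on the JIT remark, here is a rough sketch of how the measurement could be made less sensitive to run order: warm every variant up first, then average over several runs. WarmupTiming, the round counts and the helper names are assumptions made for this example, and a harness such as JMH would be the more trustworthy tool for real numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

class WarmupTiming {
    static final int WARMUP_ROUNDS = 20;   // assumed value, not tuned
    static final int MEASURED_ROUNDS = 20; // assumed value, not tuned

    // Written to after every run so the JIT cannot discard the work.
    static volatile long sink;

    // Same six salaries as in the answer, repeated 'repetitions' times.
    static List<Integer> buildSalaries(int repetitions) {
        int[] salaries = {20000, 3000, 15002, 7856, 200, 50000};
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < repetitions; i++) {
            for (int s : salaries) {
                list.add(s);
            }
        }
        return list;
    }

    static long count(Stream<Integer> stream) {
        return stream.filter(s -> s > 15000).count();
    }

    // Average time per run in nanoseconds, measured after a warm-up phase.
    static long averageNanos(Supplier<Stream<Integer>> source) {
        for (int i = 0; i < WARMUP_ROUNDS; i++) {
            sink = count(source.get());
        }
        long total = 0;
        for (int i = 0; i < MEASURED_ROUNDS; i++) {
            long start = System.nanoTime();
            sink = count(source.get());
            total += System.nanoTime() - start;
        }
        return total / MEASURED_ROUNDS;
    }

    public static void main(String[] args) {
        List<Integer> data = buildSalaries(100_000);
        System.out.println("stream():            " + averageNanos(data::stream) + " ns");
        System.out.println("parallelStream():    " + averageNanos(data::parallelStream) + " ns");
        System.out.println("stream().parallel(): " + averageNanos(() -> data.stream().parallel()) + " ns");
    }
}
```

With warm-up in place, the two parallel variants should be hard to tell apart, since in the one-shot version whichever one runs first pays for class loading and compilation.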
