I have been curious about the difference between Collection.parallelStream() and Collection.stream().parallel(). According to the Javadocs, parallelStream() tries to return a parallel stream, whereas stream().parallel() returns a parallel stream. In my own testing, I have found no differences. Where does the difference between these two methods lie? Is one implementation more time-efficient than the other? Thanks.
- Short answer: no difference. Long answer: noooooooo difference. They're just aliases for each other. (Louis Wasserman, commented May 5, 2017 at 18:13)
- I just found something, but I do not know whether it shows a difference between them. Please see stackoverflow.com/questions/44013553/… (zhuguowei, commented May 17, 2017 at 0:24)
3 Answers
Even if they act the same at the moment, there is a difference - at least in their documentation, as you correctly pointed out; that might be exploited in the future as far as I can tell.
At the moment the parallelStream method is defined in the Collection interface as:

    default Stream<E> parallelStream() {
        return StreamSupport.stream(spliterator(), true);
    }

Being a default method, it could be overridden in implementations (and that's what the inner classes of Collections actually do).
That hints that even if the default method returns a parallel Stream, there could be Collections that override this method to return a non-parallel Stream. That is the reason the documentation is probably the way it is.
At the same time, even if parallelStream returns a sequential stream, it is still a Stream, and then you could easily call parallel on it:

    Collections.some()
        .parallelStream() // actually sequential
        .parallel()       // force it to be parallel

At least for me, this looks weird.
It seems that the documentation should somehow state that after calling parallelStream there should be no reason to call parallel again to force that - since it might be useless or even bad for the processing.
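To make the scenario above concrete, here is a minimal sketch of a collection that overrides parallelStream() to return a sequential stream, which the Javadoc allows. SequentialOnlyCollection and ParallelFlagDemo are hypothetical names invented for this illustration; they are not JDK classes, and the sketch only inspects the isParallel() flag rather than how work is actually scheduled.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

public class ParallelFlagDemo {

    /** Hypothetical collection that opts out of parallelism. */
    public static final class SequentialOnlyCollection<E> extends AbstractCollection<E> {
        private final List<E> items;

        public SequentialOnlyCollection(List<E> items) {
            this.items = new ArrayList<>(items);
        }

        @Override
        public Iterator<E> iterator() {
            return items.iterator();
        }

        @Override
        public int size() {
            return items.size();
        }

        // Overrides the default method and hands back a sequential stream.
        @Override
        public Stream<E> parallelStream() {
            return stream();
        }
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4);
        SequentialOnlyCollection<Integer> custom = new SequentialOnlyCollection<>(list);

        // Default behaviour: both spellings report a parallel stream.
        System.out.println(list.parallelStream().isParallel());              // true
        System.out.println(list.stream().parallel().isParallel());           // true

        // The overriding collection returns a sequential stream...
        System.out.println(custom.parallelStream().isParallel());            // false
        // ...but the caller can still switch it to parallel afterwards.
        System.out.println(custom.parallelStream().parallel().isParallel()); // true
    }
}
```

So stream().parallel() always sets the parallel flag, while parallelStream() leaves the decision to the collection. Whether the pipeline then actually uses more than one thread is a separate question.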
EDIT
For anyone reading this - please read the comments by Holger as well; they cover cases beyond what I said in this answer.
7 Comments
- …Stream API, but then, it is in full control of what happens when .parallel() is called…
- …isParallel() return true. Likewise, a stream over a singleton collection without a flatMap operation will never use a second thread; still, there is no harm in isParallel() returning true. This also applies to all streams whose underlying spliterator returns null in trySplit; without something like sorted or flatMap, it doesn’t matter whether you call .parallel(); it won’t have any effect.
- …that Source.parallelStream().sorted().otherOps() should be forbidden to use parallel processing, once the internally used array has been populated, as at that point, the processing is entirely detached from the source. I have the feeling that the sentence stems from a time when the interaction between the source and the stream implementation was still in motion.
- …sorted operation here that would copy the elements from the Spliterator to the Sink (array or ArrayList), right? Sorry to bring this up so late.
- …sorted(), whether array or not, is independent from the source collection, so it can be processed in parallel, even if the source tries to deny it. Likewise, if a spliterator’s trySplit method returns null, the Stream implementation could turn to a buffering strategy like AbstractSpliterator.trySplit does. Actually, it would make more sense if the Stream did this instead of AbstractSpliterator…

    import java.util.ArrayList;
    import java.util.List;

    class Employee {
        String name;
        int salary;

        public int getSalary() { return salary; }

        public void setSalary(int salary) { this.salary = salary; }

        public Employee(String name, int salary) {
            this.name = name;
            this.salary = salary;
        }
    }

    class ParallelStream {
        public static void main(String[] args) {
            long t1, t2;
            List<Employee> eList = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                eList.add(new Employee("A", 20000));
                eList.add(new Employee("B", 3000));
                eList.add(new Employee("C", 15002));
                eList.add(new Employee("D", 7856));
                eList.add(new Employee("E", 200));
                eList.add(new Employee("F", 50000));
            }

            // Sequential stream: count the matches and time the call
            t1 = System.currentTimeMillis();
            System.out.println("Sequential Stream Count?= " + eList.stream().filter(e -> e.getSalary() > 15000).count());
            t2 = System.currentTimeMillis();
            System.out.println("Sequential Stream Time Taken?= " + (t2 - t1) + "\n");

            // Parallel stream created with parallelStream()
            t1 = System.currentTimeMillis();
            System.out.println("Parallel Stream Count?= " + eList.parallelStream().filter(e -> e.getSalary() > 15000).count());
            t2 = System.currentTimeMillis();
            System.out.println("Parallel Stream Time Taken?= " + (t2 - t1));

            // Parallel stream created with stream().parallel()
            t1 = System.currentTimeMillis();
            System.out.println("stream().parallel() Count?= " + eList.stream().parallel().filter(e -> e.getSalary() > 15000).count());
            t2 = System.currentTimeMillis();
            System.out.println("stream().parallel() Time Taken?= " + (t2 - t1));
        }
    }

I tried all three ways, .stream(), .parallelStream() and .stream().parallel(), with the same number of records, and measured the time taken by each approach.
Here is the output:

    Sequential Stream Count?= 300
    Sequential Stream Time Taken?= 18

    Parallel Stream Count?= 300
    Parallel Stream Time Taken?= 6
    stream().parallel() Count?= 300
    stream().parallel() Time Taken?= 1

I am not sure, but according to this output, the time taken by stream().parallel() is 1/6th of that taken by parallelStream().
Suggestions from experts are still very welcome.
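One caveat about measurements like the ones above: each pipeline is timed only once, so JIT warm-up and any first-use start-up cost of the shared fork-join pool may be charged to whichever pipeline happens to run first. Below is a rough sketch of a slightly fairer comparison that warms up first and takes the best of several runs. The names StreamTimingSketch, salaries and bestOfNanos are made up for this illustration, the warm-up and run counts are arbitrary, and a proper benchmark harness such as JMH would still be more trustworthy than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

public class StreamTimingSketch {

    // Written on every run so the JIT cannot discard the computed counts.
    static volatile long sink;

    /** Builds a list that repeats the six salary values used in the answer. */
    public static List<Integer> salaries(int repetitions) {
        int[] values = {20000, 3000, 15002, 7856, 200, 50000};
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < repetitions; i++) {
            for (int v : values) {
                list.add(v);
            }
        }
        return list;
    }

    /** Runs the task untimed a few times, then returns the fastest timed run in nanoseconds. */
    public static long bestOfNanos(LongSupplier task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            sink = task.getAsLong();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = task.getAsLong();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        List<Integer> data = salaries(100_000);

        LongSupplier sequential = () -> data.stream().filter(s -> s > 15000).count();
        LongSupplier parallelStream = () -> data.parallelStream().filter(s -> s > 15000).count();
        LongSupplier streamParallel = () -> data.stream().parallel().filter(s -> s > 15000).count();

        System.out.println("stream():            " + bestOfNanos(sequential, 20, 20) + " ns");
        System.out.println("parallelStream():    " + bestOfNanos(parallelStream, 20, 20) + " ns");
        System.out.println("stream().parallel(): " + bestOfNanos(streamParallel, 20, 20) + " ns");
    }
}
```

With a setup like this, any remaining gap between parallelStream() and stream().parallel() can be judged against run-to-run noise instead of against a single millisecond reading.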
4 Comments
- Sequential Stream Count?= 15000000
  Sequential Stream Time Taken?= 102
  Parallel Stream Count?= 15000000
  Parallel Stream Time Taken?= 64
  stream().parallel() Count?= 15000000
  stream().parallel() Time Taken?= 97
- Sequential Stream Count?= 300000
  Sequential Stream Time Taken?= 52
  Parallel Stream Count?= 300000
  Parallel Stream Time Taken?= 40
  stream().parallel() Count?= 300000
  stream().parallel() Time Taken?= 29
  2nd call:
  Sequential Stream Count?= 300000
  Sequential Stream Time Taken?= 25
  Parallel Stream Count?= 300000
  Parallel Stream Time Taken?= 47
  stream().parallel() Count?= 300000
  stream().parallel() Time Taken?= 37