Unexpected parallelstream performance in Java 8

Question

I experienced a performance issue when using the stream created using the spliterator() over an Iterable. ie., like StreamSupport.stream(integerList.spliterator(), true). Wanted to prove this over a normal collection. Please see below some benchmark results.

Question: Why does the parallel stream created from an iterable much slower than the stream created from an ArrayList or an IntStream ?

From a range

 public void testParallelFromIntRange() { long start = System.nanoTime(); IntStream stream = IntStream.rangeClosed(1, Integer.MAX_VALUE).parallel(); System.out.println("Is Parallel: "+stream.isParallel()); stream.forEach(ParallelStreamSupportTest::calculate); long end = System.nanoTime(); System.out.println("ParallelStream from range Takes : " + TimeUnit.MILLISECONDS.convert((end - start), TimeUnit.NANOSECONDS) + " milli seconds"); }

Is Parallel: true
ParallelStream from range Takes : 490 milli seconds

From an Iterable

 public void testParallelFromIterable() { Set<Integer> integerList = ContiguousSet.create(Range.closed(1, Integer.MAX_VALUE), DiscreteDomain.integers()); long start = System.nanoTime(); Stream<Integer> stream = StreamSupport.stream(integerList.spliterator(), true); System.out.println("Is Parallel: " + stream.isParallel()); stream.forEach(ParallelStreamSupportTest::calculate); long end = System.nanoTime(); System.out.println("ParallelStream from Iterable Takes : " + TimeUnit.MILLISECONDS.convert((end - start), TimeUnit.NANOSECONDS) + " milli seconds"); }

Is Parallel: true
ParallelStream from Iterable Takes : 12517 milli seconds

And the so trivial calculate method.

public static Integer calculate(Integer input) { return input + 2; }

Does ContinguousSet actually have proper spliterator? Guava is not optimised for Java 8 whereas the JDK obviously is. — Boris the Spider
– Boris the Spider, Commented Oct 27, 2014 at 22:05
Does ContiguousSet.spliterator() return a sized spliterator, or one of unknown size? That might affect how the data is split among threads. You can use Spliterators.spliterator(Iterator, long, int) to create a spliterator from an iterator and the size of the set. — David Conrad
– David Conrad, Commented Oct 27, 2014 at 22:48
Where does an Iterable come into play in the second example? If there is one in there, note that an Iterable is inherently sequential, since the only access to elements is via hasNext/next. — Stuart Marks
– Stuart Marks, Commented Oct 28, 2014 at 0:02

Brian Goetz · Accepted Answer · 2014-10-28 03:31:49Z

Not all spliterators are created equally. One of the tasks of a spliterator is to decompose the source into two parts, that can be processed in parallel. A good spliterator will divide the source roughly in half (and will be able to continue to do so recursively.)

Now, imagine you are writing a spliterator for a source that is only described by an Iterator. What quality of decomposition can you get? Basically, all you can do is divide the source into "first" and "rest". That's about as bad as it gets. The result is a computation tree that is very "right-heavy".

The spliterator that you get from a data structure has more to work with; it knows the layout of the data, and can use that to give better splits, and therefore better parallel performance. The spliterator for ArrayList can always divide in half, and retains knowledge of exactly how much data is in each half. That's really good. The spliterator from a balanced tree can get good distribution (since each half of the tree has roughly half the elements), but isn't quite as good as the ArrayList spliterator because it doesn't know the exact sizes. The spliterator for a LinkedList is about as bad as it gets; all it can do is (first, rest). And the same for deriving a spliterator from an iterator.

Now, all is not necessarily lost; if the work per element is high, you can overcome bad splitting. But if you're doing a small amount of work per element, you'll be limited by the quality of splits from your spliterator.

Misha · Accepted Answer · 2014-10-28 09:23:37Z

There are several problems with your benchmark.

Stream<Integer> cannot be compared to IntStream because of boxing overhead.
You aren't doing anything with the result of the calculation, which makes it hard to know whether the code is actually being run
You are benchmarking with System.nanoTime instead of using a proper benchmarking tool.

Here's a JMH-based benchmark:

import com.google.common.collect.ContiguousSet; import com.google.common.collect.DiscreteDomain; import com.google.common.collect.Range; import java.util.stream.IntStream; import java.util.stream.Stream; import org.openjdk.jmh.annotations.Benchmark; import org.openjdk.jmh.runner.Runner; import org.openjdk.jmh.runner.RunnerException; import org.openjdk.jmh.runner.options.OptionsBuilder; public class Ranges { final static int SIZE = 10_000_000; @Benchmark public long intStream() { Stream<Integer> st = IntStream.rangeClosed(1, SIZE).boxed(); return st.parallel().mapToInt(x -> x).sum(); } @Benchmark public long contiguousSet() { ContiguousSet<Integer> cs = ContiguousSet.create(Range.closed(1, SIZE), DiscreteDomain.integers()); Stream<Integer> st = cs.stream(); return st.parallel().mapToInt(x -> x).sum(); } public static void main(String[] args) throws RunnerException { new Runner( new OptionsBuilder() .include(".*Ranges.*") .forks(1) .warmupIterations(5) .measurementIterations(5) .build() ).run(); } }

And the output:

Benchmark Mode Samples Score Score error Units b.Ranges.contiguousSet thrpt 5 13.540 0.924 ops/s b.Ranges.intStream thrpt 5 27.047 5.119 ops/s

So IntStream.range is about twice as fast as ContiguousSet, which is perfectly reasonable, given that ContiguousSet doesn't implement its own Spliterator and uses the default from Set

Thank you for the tip about JMH. I was not aware of its existence.

Collectives™ on Stack Overflow

Unexpected parallelstream performance in Java 8

2 Answers 2

Comments

1 Comment

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

1 Comment

Related