
I am just starting to learn about streams and parallelism in Java, and I was wondering why a normal for loop takes less time than a parallel IntStream at adding items to an array.

package parallel;

import java.util.stream.IntStream;

public class Parallel {
    public static void main(String[] args) {
        final int[] intArray = new int[100000];
        long startTime = System.currentTimeMillis();
        IntStream.range(0, 100000).parallel().forEach(i -> intArray[i] = i);
        long endTime = System.currentTimeMillis();
        System.out.println("Parallel time: " + (endTime - startTime));

        final int[] intArray2 = new int[100000];
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        startTime = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) {
            intArray2[i] = i;
        }
        endTime = System.currentTimeMillis();
        System.out.println("Non parallel time: " + (endTime - startTime));
    }
}

I'm getting results like this:

Parallel time: 110

Non parallel time: 7

  • First, benchmarks are notoriously difficult to perform accurately, and microbenchmarks are worse than useless. I will point out that your array has already been constructed and you loop and set a value, while the parallel forEach doesn't start with a preinitialised value. Commented Nov 10, 2014 at 6:50
  • There are two different arrays being filled (intArray and intArray2), so both cases start from a "clean" array. Commented Nov 10, 2014 at 7:00
  • The problem is your measurement methodology, full stop. The biggest problem with this measurement approach is that it prints a number, which humans are tempted to interpret as having meaning. It does not. Commented Dec 19, 2019 at 17:06

1 Answer


The operation you perform for each element, a simple assignment, is very fast. In the parallel version, you pay a lot of overhead for starting the multiple threads that handle the operations. This alone will likely take longer than the entire loop of very simple operations takes when run non-parallel.

Also, in the non-parallel version, the values are written to the array linearly, an access pattern that CPU architectures have had single-thread/single-core optimizations for quite a while, as have compilers and intermediate compilers (which convert code like C to assembly). In the parallel version, though, you might get conflicts as each thread writes to the same array (at different positions, but possibly still on the same cache line), and as several threads access different parts of the array, you might also get cache misses, which slow things down.

With a more expensive operation per element, the overhead of the parallel version becomes small compared to the total cost, which will in the end result in faster execution than the non-parallel case.
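To illustrate, here is a rough sketch (not a rigorous benchmark; the class name and the naive primality test are my own choices, not from the question) of a case where the per-element work is expensive enough that, on a multi-core machine, the parallel stream typically wins:

```java
import java.util.stream.IntStream;

public class CostlyOpBenchmark {
    // A deliberately expensive per-element operation: naive trial-division primality test.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int n = 2_000_000;
        // Warm up both code paths so the JIT has compiled them before timing.
        for (int w = 0; w < 3; w++) {
            IntStream.range(0, n).parallel().filter(CostlyOpBenchmark::isPrime).count();
            IntStream.range(0, n).filter(CostlyOpBenchmark::isPrime).count();
        }

        long t0 = System.nanoTime();
        long parallelCount = IntStream.range(0, n).parallel()
                .filter(CostlyOpBenchmark::isPrime).count();
        long t1 = System.nanoTime();
        long sequentialCount = IntStream.range(0, n)
                .filter(CostlyOpBenchmark::isPrime).count();
        long t2 = System.nanoTime();

        System.out.println("Parallel:   " + (t1 - t0) / 1_000_000 + " ms, count=" + parallelCount);
        System.out.println("Sequential: " + (t2 - t1) / 1_000_000 + " ms, count=" + sequentialCount);
    }
}
```

Even this is only a sketch; for numbers you can actually trust, use a harness such as JMH rather than hand-rolled timing.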


3 Comments

I doubt it would cause so much difference, though; a flawed benchmark is probably part of the explanation too.
Sure, it should be executed a lot of times after a decent JVM warmup and then averaged, as otherwise there will be too much random noise in the results. But even with a correct setup, my guess is that with the above code the parallel one would still be slower (just based on personal experience; a correct benchmark would prove me right or wrong ;) ).
You suggest that parallel().forEach would distribute the work across threads interleaving i values, e.g. each of 4 threads doing i += 4 from different starting points. That would make the interface unusably bad for doing anything with an array; surely a sane implementation would break the range up into contiguous sub-ranges of i values so each thread only touches some of the cache lines. Thread startup overhead and flawed benchmarking (no warmup for the JVM, CPU frequency, or array page faults) are still enough to explain a big perf difference for a simple memset loop over only 400kB.
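As an aside on filling arrays: the JDK already ships an API built on exactly this kind of contiguous splitting, Arrays.parallelSetAll (Java 8+). A minimal sketch (the class and method names here are my own, for illustration):

```java
import java.util.Arrays;

public class ParallelFill {
    // Fills an array in parallel from an index -> value generator. The index
    // range is split into contiguous chunks, so each worker thread writes to
    // its own run of the array rather than interleaving with other threads.
    static int[] filled(int n) {
        int[] a = new int[n];
        Arrays.parallelSetAll(a, i -> i);
        return a;
    }

    public static void main(String[] args) {
        int[] a = filled(100_000);
        System.out.println(a[0] + " " + a[99_999]); // prints: 0 99999
    }
}
```

For an array this small, the sequential Arrays.setAll (or a plain loop) will still likely be faster, for the overhead reasons discussed in the answer.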
