Why is Arrays.copyOf 2 times faster than System.arraycopy for small arrays?

Question

I was recently playing with some benchmarks and found very interesting results that I can't explain right now. Here is the benchmark:

@BenchmarkMode(Mode.Throughput) @Fork(1) @State(Scope.Thread) @Warmup(iterations = 10, time = 1, batchSize = 1000) @Measurement(iterations = 10, time = 1, batchSize = 1000) public class ArrayCopy { @Param({"1","5","10","100", "1000"}) private int size; private int[] ar; @Setup public void setup() { ar = new int[size]; for (int i = 0; i < size; i++) { ar[i] = i; } } @Benchmark public int[] SystemArrayCopy() { final int length = size; int[] result = new int[length]; System.arraycopy(ar, 0, result, 0, length); return result; } @Benchmark public int[] javaArrayCopy() { final int length = size; int[] result = new int[length]; for (int i = 0; i < length; i++) { result[i] = ar[i]; } return result; } @Benchmark public int[] arraysCopyOf() { final int length = size; return Arrays.copyOf(ar, length); } }

Result:

Benchmark (size) Mode Cnt Score Error Units ArrayCopy.SystemArrayCopy 1 thrpt 10 52533.503 ± 2938.553 ops/s ArrayCopy.SystemArrayCopy 5 thrpt 10 52518.875 ± 4973.229 ops/s ArrayCopy.SystemArrayCopy 10 thrpt 10 53527.400 ± 4291.669 ops/s ArrayCopy.SystemArrayCopy 100 thrpt 10 18948.334 ± 929.156 ops/s ArrayCopy.SystemArrayCopy 1000 thrpt 10 2782.739 ± 184.484 ops/s ArrayCopy.arraysCopyOf 1 thrpt 10 111665.763 ± 8928.007 ops/s ArrayCopy.arraysCopyOf 5 thrpt 10 97358.978 ± 5457.597 ops/s ArrayCopy.arraysCopyOf 10 thrpt 10 93523.975 ± 9282.989 ops/s ArrayCopy.arraysCopyOf 100 thrpt 10 19716.960 ± 728.051 ops/s ArrayCopy.arraysCopyOf 1000 thrpt 10 1897.061 ± 242.788 ops/s ArrayCopy.javaArrayCopy 1 thrpt 10 58053.872 ± 4955.749 ops/s ArrayCopy.javaArrayCopy 5 thrpt 10 49708.647 ± 3579.826 ops/s ArrayCopy.javaArrayCopy 10 thrpt 10 48111.857 ± 4603.024 ops/s ArrayCopy.javaArrayCopy 100 thrpt 10 18768.866 ± 445.238 ops/s ArrayCopy.javaArrayCopy 1000 thrpt 10 2462.207 ± 126.549 ops/s

So there are two strange things here:

Arrays.copyOf is 2 times faster than System.arraycopy for small arrays (1,5,10 size). However, on a large array of size 1000 Arrays.copyOf becomes almost 2 times slower. I know that both methods are intrinsics, so I would expect the same performance. Where does this difference come from?
Manual copy for a 1-element array is faster than System.arraycopy. It's not clear to me why. Does anybody know?

VM version: JDK 1.8.0_131, VM 25.131-b11

Since copyOf internally uses arraycopy, your benchmark is the problem. — Andreas
– Andreas, Commented Jun 11, 2017 at 18:42
@Andreas You are not right. Arrays.copyOf is JVM intrinsic. Once the method is JIT-compiled, Java code in Arrays.java is not executed at all. — apangin
– apangin, Commented Jun 11, 2017 at 19:54
@Andreas Do you see any particular problem with the benchmark? It wisely uses JMH framework to avoid common benchmarking pitfalls. — apangin
– apangin, Commented Jun 11, 2017 at 19:56
@apangin That's a distinction without a difference. The same applies to any Java code. That doesn't make any arbitrary method a 'JVM intrinsic'. — user207421
– user207421, Commented Jun 11, 2017 at 23:38
@EJP OK, I'll reword. JIT compiler does not look at the bytecode of Arrays.copyOf, because it has internal knowledge of what this method should do. — apangin
– apangin, Commented Jun 12, 2017 at 0:27

hagrawal7777 · Accepted Answer · 2021-11-10 06:28:59Z

Your System.arraycopy benchmark is not semantically equivalent to Arrays.copyOf.

It will be if you replace

 System.arraycopy(ar, 0, result, 0, length);

with

 System.arraycopy(ar, 0, result, 0, Math.min(ar.length, length));

With this change, the performance of both benchmarks will also become similar.

Why is the first variant slower then?

Without knowing how length relates to ar.length JVM needs to perform additional bounds check and be prepared to throw IndexOutOfBoundsException when length > ar.length.
This also breaks the optimization to eliminate redundant zeroing. You know, every allocated array must be initialized with zeros. However, JIT can avoid zeroing if it sees that the array is filled right after creation. But -prof perfasm clearly shows that the original System.arraycopy benchmark spends a significant amount of time clearing the allocated array:
```
 0,84% 0x000000000365d35f: shr $0x3,%rcx 0,06% 0x000000000365d363: add $0xfffffffffffffffe,%rcx 0,69% 0x000000000365d367: xor %rax,%rax 0x000000000365d36a: shl $0x3,%rcx 21,02% 0x000000000365d36e: rep rex.W stos %al,%es:(%rdi) ;*newarray 
```

Manual copy appeared faster for small arrays, because unlike System.arraycopy it does not perform any runtime calls to VM functions.

Aren't ar.length and length the same? What effect does Math.min(ar.length, length) have?
@EJP I'm not sure what compiler you mean, but HotSpot JIT compiler (actually, both of them - C1 and C2) definitely knows about System.arraycopy() and translates the call to a graph of IR nodes. Data-flow analysis then helps to reduce this graph.
@apangin thanks! Do you know why copyOf becomes slower on larger arrays?
@EJP Who made the term disappear? 'JIT' is still officially used at HotSpot website, bug tracker, OpenJDK mailing lists and in public presentations by Oracle engineers.

Collectives™ on Stack Overflow

Why is Arrays.copyOf 2 times faster than System.arraycopy for small arrays?

1 Answer 1

10 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

10 Comments

Linked

Related