95

I was surprised to see in the Java source that System.arraycopy is a native method.

Of course the reason is because it's faster. But what native tricks is the code able to employ that make it faster?

Why not just loop over the original array and copy each pointer to the new array - surely this isn't that slow and cumbersome?

6 Answers 6

88

In native code, it can be done with a single memcpy / memmove, as opposed to n distinct copy operations. The difference in performance is substantial.

Sign up to request clarification or add additional context in comments.

8 Comments

Actually, only some subcases of arraycopy could be implemented using memcpy / memmove. Others require a runtime type check for each element copied.
@Stephen C, interesting - why is that?
@Péter Török - consider copying from an Object[] populated with String objects to a String[]. See last paragraph of java.sun.com/javase/6/docs/api/java/lang/…
Peter, Object[] and byte[] + char[] are the most often copied ones, none of them requires an explicit type check. The compiler is smart enough NOT to check unless needed and virtually in 99.9% of the case it's not. Funny part is the small sized copies (less than a cache line) are quite dominant, so "memcpy" for small sized stuff being fast is truly important.
@jainilvachhani both memcpy and memmove are O(n), however cos of f.e. simd optimizations they are few times faster, so you may say they are O(n/x), where x is dependant on optimizations used in these functions
|
16

It can't be written in Java. Native code is able to ignore or elide the difference between arrays of Object and arrays of primitives. Java can't do that, at least not efficiently.

And it can't be written with a single memcpy(), because of the semantics required by overlapping arrays.

8 Comments

Fine, so memmove then. Although I don't think it makes much difference in context of this question.
Not memmove() either, see @Stephen C's comments on another answer.
Saw that already, since that happened to be my own answer ;-) But thanks anyway.
@Geek Arrays that overlap. If the source and target arrays and the same and only the offsets are different, the behaviour is carefully specified, and memcpy() does not comply.
It can't be written in Java? Couldn't one write one generic method to handle subclasses of Object, and then one for each of the primitive types?
|
12

It is, of course, implementation dependent.

HotSpot will treat it as an "intrinsic" and insert code at the call site. That is machine code, not slow old C code. This also means the problems with the signature of the method largely go away.

A simple copy loop is simple enough that obvious optimisations can be applied to it. For instance loop unrolling. Exactly what happens is again implementation dependent.

1 Comment

this is a very decent answer :), esp. the mentioning the intrinsics. w/o them simple iteration might be faster since it's usually unrolled anyways by the JIT
4

In my own tests System.arraycopy() for copying multiple dimension arrays is 10 to 20 times faster than interleaving for loops:

float[][] foo = mLoadMillionsOfPoints(); // result is a float[1200000][9] float[][] fooCpy = new float[foo.length][foo[0].length]; long lTime = System.currentTimeMillis(); System.arraycopy(foo, 0, fooCpy, 0, foo.length); System.out.println("native duration: " + (System.currentTimeMillis() - lTime) + " ms"); lTime = System.currentTimeMillis(); for (int i = 0; i < foo.length; i++) { for (int j = 0; j < foo[0].length; j++) { fooCpy[i][j] = foo[i][j]; } } System.out.println("System.arraycopy() duration: " + (System.currentTimeMillis() - lTime) + " ms"); for (int i = 0; i < foo.length; i++) { for (int j = 0; j < foo[0].length; j++) { if (fooCpy[i][j] != foo[i][j]) { System.err.println("ERROR at " + i + ", " + j); } } } 

This prints:

System.arraycopy() duration: 1 ms loop duration: 16 ms 

1 Comment

Even though this question is old, just for the record: This is NOT a fair benchmark (let alone the question if such a benchmark would make sense in the first place). System.arraycopy does a shallow copy (only the references to the inner float[]s are copied), whereas your nested for-loops performs a deep copy (float by float). A change to fooCpy[i][j] will be reflected in foo using System.arraycopy, but won't be using the nested for-loops.
4

There are a few reasons:

  1. The JIT is unlikely to generate as efficient low level code as a manually written C code. Using low level C can enable a lot of optimizations that are close to impossible to do for a generic JIT compiler.

    See this link for some tricks and speed comparisons of hand written C implementations (memcpy, but the principle is the same): Check this Optimizing Memcpy improves speed

  2. The C version is pretty much independant of the type and size of the array members. It is not possible to do the same in java since there is no way to get the array contents as a raw block of memory (eg. pointer).

7 Comments

Java code can get optimised. In fact what actually happens is machine code is generated which is more efficient than the C.
I agree that sometimes JITed code will be better localy optimised since it knows which processor it is runnimg on. However, since it is "just in time" it will never be able to use all those non-local optimisations that take longer to execute. Also, it will never be able to match the hand crafted C code (which could also take the processor in account and partially negate the JIT advantages, either by compilng for a specific processor or by some kind of runtime check).
I think that the Sun JIT compiler team would dispute many of those points. For instance, I believe that HotSpot does global optimization to remove unnecessary method dispatching, and there's no reason why a JIT cannot generate processor specific code. Then there is the point that a JIT compiler can do branch optimization based on the execution behavior of the current application run.
@Stephen C - excelent point about the branch optimisations, aldough you could also perform static performance profiling with C/C++ compilers to achieve the similar effect. I also think that the hotspot has 2 modes of operation - desktop applications will not use all of the available optimizations to achieve a reasonable startup time, while the server applications will be optimized more aggressively. All in all, you get some advantages, but you also loose some.
System.arrayCopy is not implemented using C, which sort of invalidates this answer
|
1

Late to the party. In my opinion, System.arraycopy is native mainly due to the distinction between primitive types and class types in Java. It's impossible to write a single method to deal with both, say int array, and String array.

Plus, System.arraycopy was introduced way before the pseudo generic type.

I find no one links a native implementation. In OpenJDK, the arraycopy Java method is implemented to eventually call copy_conjoint_atomic() which copies the arrary with a (while) loop, or call C++ memmove() if the element type is so-called primitive.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.