
I have these two pieces of code, in Java and C++, which are supposed to do the same thing.

My intuition was that the size (and also the content) of the object code would be the same for R1 and R2. That is the case for C++ (a difference of 4 bytes when compiled without -O1). The difference is bigger for the Java bytecode (R2 is longer), which surprises me.

Maybe I'm not looking at the right things and my question might not be relevant, but is it normal that the Java bytecode stays so "close" to the source code, and does it mean that it's always more "efficient"/"optimized" to write everything in one line instead of using local variables?

C++

```cpp
int A(int a) { return 0; }
int B(int b) { return 0; }
int C(int c) { return 0; }
int D(int d) { return 0; }

int R1() { return A(B(C(3)+D(3))); }

int R2() {
    int d = D(3);
    int c = C(3);
    int b = B(c + d);
    return A(b);
}

// Then R1() and R2() are called in main()
```

Java

```java
class MyClass {
    static int A(int a) { return 0; }
    static int B(int b) { return 0; }
    static int C(int c) { return 0; }
    static int D(int d) { return 0; }

    static int R1() { return A(B(C(3)+D(3))); }

    static int R2() {
        int d = D(3);
        int c = C(3);
        int b = B(c + d);
        return A(b);
    }

    // Then R1() and R2() are called in main()
}
```

When I compile both of them (g++ 9.4 with -O1, and javac 11.0.17) and disassemble R1 and R2, I get this:

C++ (g++ -O1 prog.cpp)

```
R1:
    1251: f3 0f 1e fa       endbr64
    1255: 53                push   %rbx
    1256: bf 03 00 00 00    mov    $0x3,%edi
    125b: e8 9d ff ff ff    callq  11fd <_Z1Ci>
    1260: 89 c3             mov    %eax,%ebx
    1262: bf 03 00 00 00    mov    $0x3,%edi
    1267: e8 bb ff ff ff    callq  1227 <_Z1Di>
    126c: 8d 3c 03          lea    (%rbx,%rax,1),%edi
    126f: e8 5f ff ff ff    callq  11d3 <_Z1Bi>
    1274: 89 c7             mov    %eax,%edi
    1276: e8 2e ff ff ff    callq  11a9 <_Z1Ai>
    127b: 5b                pop    %rbx
    127c: c3                retq

R2: <exact same as R1>
```

Java (javap -c MyClass)

```
static int R1();
  Code:
     0: iconst_3
     1: invokestatic  #8   // Method C:(I)I
     4: iconst_3
     5: invokestatic  #9   // Method D:(I)I
     8: iadd
     9: invokestatic  #10  // Method B:(I)I
    12: invokestatic  #11  // Method A:(I)I
    15: ireturn

static int R2();
  Code:
     0: iconst_3
     1: invokestatic  #9   // Method D:(I)I
     4: istore_0
     5: iconst_3
     6: invokestatic  #8   // Method C:(I)I
     9: istore_1
    10: iload_1
    11: iload_0
    12: iadd
    13: invokestatic  #10  // Method B:(I)I
    16: istore_2
    17: iload_2
    18: invokestatic  #11  // Method A:(I)I
    21: ireturn
```
3 Comments

  • Unrelated: Amazing what a difference a few compiler versions can make: godbolt.org/z/z55x5M13P Commented Jan 31, 2023 at 21:30
  • @user4581301 That has nothing to do with the compiler version. GCC 9.4 produces the exact same output. It only calls the functions if they are not defined in the same compilation unit. Commented Jan 31, 2023 at 21:33
  • A fair point. Remove the ability to see what A and friends do and the output is identical. Well, almost identical. Good enough for me. Commented Jan 31, 2023 at 21:35

2 Answers


No. The detail you're missing is where the optimisation happens.

In C-land, the application that reads your source code (gcc, for example) is the one that does most of the optimization (though more and more is done by the CPU itself, in its pipeline and microcode translation engines; not that there's a heck of a lot you can do to affect that as a programmer). Hence, it's that application (gcc) that has an -O (optimization level) option, and it's that application that potentially churns through a ton of CPU cycles analysing your code to death.

In java-land, that is not how it works. javac is on rails: the spec decrees exactly what it should generate, pretty much down to the byte. Contrast that with C-land, where the spec is stacked to the gills with 'mays' and 'coulds' and compilers have a ton of leeway, notably including the bit width of your basic 'word'. Java locks all of that down in the spec, regardless of the bit width of the CPU architecture you end up running on.
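The few transformations javac does apply are the ones the spec itself mandates, such as folding compile-time constant expressions. A minimal sketch (the class and method names here are made up for illustration):

```java
// Sketch: javac performs almost no optimization, but it does fold
// compile-time constant expressions, as the Java Language Specification requires.
class ConstFold {
    // Compiles to "iconst_5; ireturn" -- the addition happens at compile time.
    static int five() { return 2 + 3; }

    // Method calls are NOT folded or inlined by javac; the bytecode keeps
    // every invokestatic, exactly as in the R1/R2 example above.
    static int id(int x) { return x; }
    static int alsoFive() { return id(2) + id(3); }

    public static void main(String[] args) {
        System.out.println(five() + " " + alsoFive());
    }
}
```

Running `javap -c ConstFold` shows the difference: `five` contains no `iadd` at all, while `alsoFive` keeps both calls and the addition.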

The optimization is done by java.exe, the runtime. And its approach is, as a rule, more effective than a C compiler's can ever be, because unlike a C compiler, the runtime gets to observe 'live behaviour'. That's why C compilers tend to have a lot of 'hinting' systems, where you can inform the compiler about what you suspect the runtime behaviour will be.

All modern JVMs work by first running the code inefficiently (that seemingly naive code javac produced, which you saw with javap -c), and in fact even more inefficiently than that, because the JVM adds a bunch of hooks for bookkeeping. For example, the JVM tracks how often a method is invoked, how long it takes to run, and, for each if, how often each branch is taken. All of that makes it run even slower at first.

The JVM does this because for 99% (literally) of all programs out there, about 99% (again literally, not an exaggeration) of CPU and memory is spent on less than 1% of the code; the trick is predicting which 1% that is. With all that bookkeeping, java doesn't have to predict: it knows. It then analyses the bytecode of that hot 1% and uses the runtime behaviour observed so far to generate fine-tuned machine code. For example, java can lay out the machine code so that the most often taken branch path falls through without jumping, and it isn't a prediction: java.exe knows which path is taken most often, whereas gcc has to guess, optionally assisted by branch hints the programmer adds to the source.

That's just one of thousands of places where java.exe can apply machine code optimization.
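You can watch this process happen: the standard HotSpot flag `-XX:+PrintCompilation` logs each method as the JIT compiles it. A minimal sketch (the class name `HotLoop` is made up; the exact log format varies between JVM versions):

```java
// Sketch: watch HotSpot decide which methods are "hot".
// Run with:  java -XX:+PrintCompilation HotLoop
class HotLoop {
    static int work(int x) { return x * 31 + 7; }

    public static void main(String[] args) {
        long sum = 0;
        // After a few thousand iterations, the invocation counters trip and
        // PrintCompilation logs a line showing work(int) -- and eventually
        // main itself -- being compiled to machine code.
        for (int i = 0; i < 1_000_000; i++) {
            sum += work(i);
        }
        System.out.println(sum);
    }
}
```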

That still means ~99% of the code runs very inefficiently. But, given that this takes less than 1% of CPU/memory, it just doesn't matter.

Java is slower than C due to various factors, but 'optimizing instructions' is not one of them:

  • Java cannot use architecture-specific features that leak into the language model, such as the 80-bit floating-point registers of the old x87 coprocessor. Project Valhalla is trying to fix that.
  • More generally java has a hard time interfacing directly with arch/OS-local low-level API.
  • That bookkeeping, and the garbage collector, tend to have 'ramp up time'. They start off slow and become faster over time. Vs. C code which pretty much springs into existence running as fast as it ever will.
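The ramp-up effect from that last bullet is easy to observe with a crude timing loop. This is only a sketch (use JMH for real measurements; absolute numbers depend on the machine), but the first pass typically runs interpreted while later passes run JIT-compiled code:

```java
// Sketch: rough illustration of JIT warm-up. NOT a rigorous benchmark.
class WarmUp {
    static long sumTo(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) s += i;
        return s;
    }

    public static void main(String[] args) {
        // Pass 0 usually runs (mostly) interpreted; by the later passes the
        // JIT has compiled sumTo, and the per-pass time drops sharply.
        for (int pass = 0; pass < 5; pass++) {
            long t0 = System.nanoTime();
            long s = sumTo(50_000_000);
            long t1 = System.nanoTime();
            System.out.printf("pass %d: %d ns (sum=%d)%n", pass, t1 - t0, s);
        }
    }
}
```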

Naturally then, java is neither popular nor a good idea for simple command line one-off tools like ls or /bin/true. Java is fantastic at performance when writing, say, a web request responder. Those run for a long time and that hotspot process really helps there.


4 Comments

There used to be server and client HotSpot engines, tweaked for fast throughput on servers and fast startup times on clients. Do they still exist?
For JVMs as per the letter of that word, no. However, if you squint your eyes a bit, android is a client-oriented JVM engine. GraalVM is a trick to get fast startup times, and the more modular nature of modern JVMs (including jlink to 'treeshake' the JVM down even further) is the answer. However, almost nobody uses these tools: Java is rarely used in places where 'startup' time matters. When your average major software vendor ships desktop apps in electron, I think we can safely conclude the world at large stopped caring.
@Queeg the separation between “client” and “server” has become obsolete, as both converged to the same implementation that will only differ in the defaults for some configuration options. E.g., how long the JVM will wait before compiling a method.
@rzwitserloot thanks for your detailed answer :) Do you know of articles/books/etc. that explain this in more depth?

You are likely comparing apples and oranges.

While the C++ compiler produces machine code, the Java compiler produces bytecode to run on a virtual machine. The virtual machine either interprets it or uses a just-in-time compiler to translate it into machine code.

I doubt you used the JIT and looked at the produced machine code.

2 Comments

Absolutely, but my point was about "what is happening with Java" more than the comparison between the two languages :)
What you are missing are all the optimization steps performed by the JIT. But you compare to an already-optimized C++ output.
