Speed difference for single line String concatenation

Question

So I've been lead to believe that using the "+" operator to append Strings on a single line was just as efficient as using a StringBuilder (and definitely much nicer on the eyes). Today though I was having some speed issues with a Logger that was appending variables and strings, it was using a "+" operator. So I made a quick test case and to my surprise found that using a StringBuilder was quicker!

The basics are I used the average of 20 runs for each number of appends, with 4 different methods(shown below).

Results, times (in milliseconds)

 # of Appends 10^1 10^2 10^3 10^4 10^5 10^6 10^7 StringBuilder(capacity) 0.65 1.25 2 11.7 117.65 1213.25 11570 StringBuilder() 0.7 1.2 2.4 12.15 122 1253.7 12274.6 "+" operator 0.75 0.95 2.35 12.35 127.2 1276.5 12483.4 String.format 4.25 13.1 13.25 71.45 730.6 7217.15 -

Graph of percentage Difference from the fastest algorithm.

% Difference in String timings

I've checked out the byte code, it's different for each string comparison method.

Here is what I'm using for the methods, and you can see the whole test class here.

public static String stringSpeed1(float a, float b, float c, float x, float y, float z){ StringBuilder sb = new StringBuilder(72).append("[").append(a).append(",").append(b).append(",").append(c).append("]["). append(x).append(",").append(y).append(",").append(z).append("]"); return sb.toString(); } public static String stringSpeed2(float a, float b, float c, float x, float y, float z){ StringBuilder sb = new StringBuilder().append("[").append(a).append(",").append(b).append(",").append(c).append("]["). append(x).append(",").append(y).append(",").append(z).append("]"); return sb.toString(); } public static String stringSpeed3(float a, float b, float c, float x, float y, float z){ return "["+a+","+b+","+c+"]["+x+","+y+","+z+"]"; } public static String stringSpeed4(float a, float b, float c, float x, float y, float z){ return String.format("[%f,%f,%f][%f,%f,%f]", a,b,c,x,y,z); }

I've now tried with floats, ints, and strings. All of which show more or less the same time difference.

Questions

The "+" operator is clearly not becoming the same byte code, and the time is very different from the optimal. So what gives?
The behavior of the algorithms betwen 100 and 10000 number of appends is very odd to me, so does anyone have an explanation?

When you run these, are you optimizing? Frequently operators and function calls, when not optimized, can behave oddly in memory and cause odd performance anomalies that are not there when properly optimized. — MobA11y
– MobA11y, Commented May 23, 2013 at 19:46
So I'm using the standard default compiling on my computer, Mac. java version "1.6.0_45". I'm not using any particular optimization flags — greedybuddha
– greedybuddha, Commented May 23, 2013 at 19:47

MobA11y · Accepted Answer · 2013-05-23 23:04:12Z

I did not like two things about your test case. First, you ran all the tests within the same process. When dealing with "large" (ambiguous I know), but when dealing with anything where how your proces interracts with memory is your primary concern, you should always benchmark in a separate run. Just the fact that we've spun up the garbage collect can effect results from earlier runs. The way you factored your results kind of confused me. What I did was take each on individual runs and knocked a zero off of the number of times I ran it. I also let it run for a number of "reps", timing each rep. Then printed out the number of milliseconds each run took. Here is my code:

import java.util.Random; public class blah { public static void main(String[] args){ stringComp(); } private static void stringComp() { int SIZE = 1000000; int NUM_REPS = 5; for(int j = 0; j < NUM_REPS; j++) { Random r = new Random(); float f; long start = System.currentTimeMillis(); for (int i=0;i<SIZE;i++){ f = r.nextFloat(); stringSpeed3(f,f,f,f,f,f); } System.out.print((System.currentTimeMillis() - start)); System.out.print(", "); } } public static String stringSpeed1(float a, float b, float c, float x, float y, float z){ StringBuilder sb = new StringBuilder(72).append("[").append(a).append(",").append(b).append(",").append(c).append("]["). append(x).append(",").append(y).append(",").append(z).append("]"); return sb.toString(); } public static String stringSpeed2(float a, float b, float c, float x, float y, float z){ StringBuilder sb = new StringBuilder().append("[").append(a).append(",").append(b).append(",").append(c).append("]["). append(x).append(",").append(y).append(",").append(z).append("]"); return sb.toString(); } public static String stringSpeed3(float a, float b, float c, float x, float y, float z){ return "["+a+","+b+","+c+"]["+x+","+y+","+z+"]"; } public static String stringSpeed4(float a, float b, float c, float x, float y, float z){ return String.format("[%f,%f,%f][%f,%f,%f]", a,b,c,x,y,z); } }

Now my results:

stringSpeed1(SIZE = 10000000): 11548, 11305, 11362, 11275, 11279 stringSpeed2(SIZE = 10000000): 12386, 12217, 12242, 12237, 12156 stringSpeed3(SIZE = 10000000): 12313, 12016, 12073, 12127, 12038 stringSpeed1(SIZE = 1000000): 1292, 1164, 1170, 1168, 1172 stringSpeed2(SIZE = 1000000): 1364, 1228, 1230, 1224, 1223 stringSpeed3(SIZE = 1000000): 1370, 1229, 1227, 1229, 1230 stringSpeed1(SIZE = 100000): 246, 115, 115, 116, 113 stringSpeed2(SIZE = 100000): 255, 122, 123, 123, 121 stringSpeed3(SIZE = 100000): 257, 123, 129, 124, 125 stringSpeed1(SIZE = 10000): 113, 25, 14, 13, 13 stringSpeed2(SIZE = 10000): 118, 23, 24, 16, 14 stringSpeed3(SIZE = 10000): 120, 24, 16, 17, 14 //This run SIZE is very interesting. stringSpeed1(SIZE = 1000): 55, 22, 8, 6, 4 stringSpeed2(SIZE = 1000): 54, 23, 7, 4, 3 stringSpeed3(SIZE = 1000): 58, 23, 7, 4, 4 stringSpeed1(SIZE = 100): 6, 6, 6, 6, 6 stringSpeed2(SIZE = 100): 6, 6, 5, 6, 6 stirngSpeed3(SIZE = 100): 8, 6, 7, 6, 6

As you can see from my results, on values that are in the "middle ranges" each consecutive rep gets faster. This, I believe, is explained by the JVM getting running and grabbing onto the memory it needs. As the "size" increases this effect is not allowed to take over, because there is too much memory for the garbage collector to let go of, and for the process to latch back onto. Also, when you're doing a "repetitive" benchmark like this, when most of your process can exist in lower levels of cache, rather than in RAM, your process is even more senstive to branch predictors. These are very smart, and would catch on to what your process is doing, and I imagine the JVM amplifies this. This also helps explain why the values on initial loops are slower, and why the way you were approaching benchmarking this was a poor solution. This is why I think your results for values that aren't "large" are skewed and seem odd. Then as the "memory footprint" of your benchmark increased this branch prediction has less effect (percentage wise) than the large strings you were appending being shifted around in RAM.

Simplified conclusion: Your results for "large" runs are reasonably valid, and seem similar to mine(though I still don't completely understand how you got your results, but the percentages seem to line up well in comparison). However, your results for smaller runs are not valid, because of the nature of your test.

I think you're onto something with the branch predictors, I figured it was the gc skewing the smaller runs, but I really love your drop from run 1,2 to 3... that says volumes to me. As for the larger runs, you are showing the same effect, the methods are not equivalent
You are postulating many assumptions here, most of which are definitively wrong. The most obvious reason for the speed-up for lower batch sizes is that the JVM starts with interpreting the byte code, after a few runs the byte code is compiled into native machine code (causing a one time penalty, but a significant speedup on subsequent runs) and after more runs, the JVM may even recompile the byte code several times using different optimization strategies based on collected run time statistics.
Also, I'm not sure what you mean by how I got my results. I just totaled the times for each repetition, similar to what you did but with +=, then divided by the number of reps at the end.
@jarnbjo: What I meant by the way you factored your results(poor wording) was in your percentages. I didn't understand your approach to presenting your results, I looked at it again and it makes more sense now!
@jarnbjo: I did not mean my hypothesis for my conclusions to be difinitive(note the "I believe"), I was sharing results and guessing as to the reason. I'm more of a C++ guy than a java guy. Your explanation seems reasonable, though I doubt branch prediction and memory management have nothing to do with it, and I did mention that the JVM was probably amplifying these effects, I just don't know the innards of the JVM to put that to more concise terms. It is likely a combination of the two explanations

jarnbjo · Accepted Answer · 2013-05-23 20:31:28Z

3

The Java Language Specification does not specify how the string concatenation is performed, but I doubt that your compiler does anything but the equivalent of:

new StringBuilder("["). append(a). append(","). append(b). append(","). append(c). append("]["). append(x). append(","). append(y). append(","). append(z). append("]"). toString();

You can use "javap -c ..." to decompile your class file and verify this.

If you measure any significant and repetitive difference in runtime between your methods, I would much rather assume that the garbage collector is running at different times, than that there is any actual, significant performance difference. Creating the StringBuilders with different initial capacities may of course have some impact, but it should be insignificant compared to the effort required to e.g. format the floats.

answered May 23, 2013 at 20:31

jarnbjo

34.4k7 gold badges75 silver badges95 bronze badges

6 Comments

greedybuddha Over a year ago

I have decompiled the code, and the "+" operator is different than the StringBuilder(), though I didnt try against StringBuilder("[");, ill look at that

jarnbjo Over a year ago

Can you add the output of javap to your question?

greedybuddha Over a year ago

I linked it in the question as well. gist.github.com/parnell/5c34f244e6f4588cc9ac. All of them are fairly similar, but the few differences seem to start causing noticeable differences at large volume runs.

jarnbjo Over a year ago

If I'm not completely wrong, based on the byte code, your stringSpeed5 method is implemented as StringBuilder sb = new StringBuilder("[")... ; return sb.toString(); The equivalent method with string concatenation would have been String s = "[" + ...; return s;. If you don't use a local variable, but implement the 5th method directly as return new StringBuilder("[")...toString(); you should get the same byte code.

greedybuddha Over a year ago

So that looks like the magic. Between the "new StringBuilder("[");" and returning directly that finally makes the byte code equivalent. This means two things to me. 1) the answer to the other question is still wrong, as the original poster assigns a new StringBuilder(capacity) and returns it, which doesn't lead to equivalent bytecode. 2) I'm very disappointed with the java compiler not being smart enough to make assign a new object and immediately returning it just returning it... I was sure that was something it automatically did.

|

Collectives™ on Stack Overflow

Speed difference for single line String concatenation

2 Answers 2

5 Comments

6 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

5 Comments

6 Comments

Linked

Related