Do modern C++ compilers inline functions which are called exactly once?

Question

As in, say my header file is:

class A { void Complicated(); };

And my source file

void A::Complicated() { ...really long function... }

Can I split the source file into

void DoInitialStuff(pass necessary vars by ref or value) { ... }
void HandleCaseA(pass necessary vars by ref or value) { ... }
void HandleCaseB(pass necessary vars by ref or value) { ... }
void FinishUp(pass necessary vars by ref or value) { ... }

void A::Complicated() {
    ...
    DoInitialStuff(...);
    switch ...
        HandleCaseA(...)
        HandleCaseB(...)
    ...
    FinishUp(...)
}

Entirely for readability and without any fear of impact in terms of performance?

Maybe, maybe not. The compiler programmer might be your best bet, depending upon what compiler you are using.
– DumbCoder, Commented Aug 17, 2011 at 16:44
None of this even happens to be in a loop? Exactly how much time do you hope to gain from avoiding the overhead of a couple of function calls? A nanosecond? — UncleBens
– UncleBens, Commented Aug 17, 2011 at 16:45
Small functions that get called a lot benefit from inlining. A function that's sufficiently larger than the function call overhead will not benefit from inlining, so I wouldn't worry about it. — Gabe
– Gabe, Commented Aug 17, 2011 at 16:45
Declare your internal functions as static to give them file scope. They may be inlined even if you don't do this. But if they are not static, they will have to be exported, which means a non-inlined version will have to be generated even if it's never used. — Conspicuous Compiler
– Conspicuous Compiler, Commented Aug 17, 2011 at 16:47
@UncleBens: Yes it could be in a loop. I am just saying in the code the function would only be referenced once. — Cookie
– Cookie, Commented Aug 17, 2011 at 16:50

Steve Jessop · Accepted Answer · 2011-08-17 17:41:17Z

You should mark the functions static so that the compiler knows they are local to that translation unit.

Without static the compiler cannot assume (barring LTO / WPA) that the function is only called once, so it is less likely to inline it.

Demonstration using the LLVM Try Out page.

That said, code for readability first; micro-optimizations (and such tweaking is a micro-optimization) should come only after performance measurements.

Example:

#include <cstdio>

static void foo(int i) {
    int m = i % 3;
    printf("%d %d", i, m);
}

int main(int argc, char* argv[]) {
    for (int i = 0; i != argc; ++i) {
        foo(i);
    }
}

Produces with static:

; ModuleID = '/tmp/webcompile/_27689_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

@.str = private constant [6 x i8] c"%d %d\00" ; <[6 x i8]*> [#uses=1]

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
entry:
  %cmp4 = icmp eq i32 %argc, 0 ; <i1> [#uses=1]
  br i1 %cmp4, label %for.end, label %for.body

for.body: ; preds = %for.body, %entry
  %0 = phi i32 [ %inc, %for.body ], [ 0, %entry ] ; <i32> [#uses=3]
  %rem.i = srem i32 %0, 3 ; <i32> [#uses=1]
  %call.i = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([6 x i8]* @.str, i64 0, i64 0), i32 %0, i32 %rem.i) nounwind ; <i32> [#uses=0]
  %inc = add nsw i32 %0, 1 ; <i32> [#uses=2]
  %exitcond = icmp eq i32 %inc, %argc ; <i1> [#uses=1]
  br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body, %entry
  ret i32 0
}

declare i32 @printf(i8* nocapture, ...) nounwind

Without static:

; ModuleID = '/tmp/webcompile/_27859_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

@.str = private constant [6 x i8] c"%d %d\00" ; <[6 x i8]*> [#uses=1]

define void @foo(int)(i32 %i) nounwind {
entry:
  %rem = srem i32 %i, 3 ; <i32> [#uses=1]
  %call = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([6 x i8]* @.str, i64 0, i64 0), i32 %i, i32 %rem) ; <i32> [#uses=0]
  ret void
}

declare i32 @printf(i8* nocapture, ...) nounwind

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
entry:
  %cmp4 = icmp eq i32 %argc, 0 ; <i1> [#uses=1]
  br i1 %cmp4, label %for.end, label %for.body

for.body: ; preds = %for.body, %entry
  %0 = phi i32 [ %inc, %for.body ], [ 0, %entry ] ; <i32> [#uses=3]
  %rem.i = srem i32 %0, 3 ; <i32> [#uses=1]
  %call.i = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([6 x i8]* @.str, i64 0, i64 0), i32 %0, i32 %rem.i) nounwind ; <i32> [#uses=0]
  %inc = add nsw i32 %0, 1 ; <i32> [#uses=2]
  %exitcond = icmp eq i32 %inc, %argc ; <i1> [#uses=1]
  br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body, %entry
  ret i32 0
}

I think you ought to put them in an unnamed namespace, according to "principle".
Clang inlined it in both the static and nonstatic versions in your example... I don't think there's a significant difference in whether or not it inlines the call. There's a difference in whether or not the function's code will be emitted or not.
@GMan: I favor static because I find it more readable for a human being. An unnamed namespace requires the reader to remember that they are inside one, and that isn't obvious. static does not require this "context".
@Matthieu I must be missing something. I am looking at your demo, and it has inlined the call. There is no call to foo in the entirety of main.
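On the static versus unnamed namespace point raised in the comments above, here is a minimal sketch of both spellings side by side. The function names (CountVowels, CountSpaces, Summarize) are hypothetical and exist only for this illustration. In C++11 and later both forms give the helper internal linkage, so the compiler can see every call site within the translation unit; whether it actually inlines either helper is still its own decision.

```cpp
#include <string>

// Option 1: `static` gives the helper internal linkage.
// (Hypothetical helper, named only for this sketch.)
static int CountVowels(const std::string& text) {
    int count = 0;
    for (char c : text) {
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u') {
            ++count;
        }
    }
    return count;
}

// Option 2: an unnamed namespace also gives internal linkage (C++11 and later).
namespace {
int CountSpaces(const std::string& text) {
    int count = 0;
    for (char c : text) {
        if (c == ' ') {
            ++count;
        }
    }
    return count;
}
}  // namespace

// The only externally visible function. Each helper has exactly one call site.
int Summarize(const std::string& text) {
    return CountVowels(text) * 100 + CountSpaces(text);
}
```

Calling Summarize("a e i") counts three vowels and two spaces. Because neither helper is visible outside the file, the compiler does not have to keep a standalone copy for other translation units if it does inline the call.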

Blindy · Accepted Answer · 2011-08-17 16:44:44Z

7

Depends on aliasing (pointers to that function) and function length (a large function inlined in a branch could throw the other branch out of cache, thus hurting performance).
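A rough sketch of the aliasing point, with hypothetical names (Twice, DirectCall, GetCallback, CallThrough). When the function is only called directly, the compiler sees the single call site. Once its address is taken and handed out, an out-of-line body generally has to exist, and a call through the pointer can only be inlined if the compiler can prove which function the pointer refers to.

```cpp
// Hypothetical helper with internal linkage.
static int Twice(int x) { return 2 * x; }

// Direct call: the single call site is visible, so the compiler is free
// to inline Twice here if it judges that worthwhile.
int DirectCall(int x) { return Twice(x); }

using IntFn = int (*)(int);

// Address taken: the pointer escapes this function, so an out-of-line
// copy of Twice generally has to be emitted.
IntFn GetCallback() { return &Twice; }

// Indirect call: inlining depends on the compiler proving what fn points to.
int CallThrough(IntFn fn, int x) { return fn(x); }
```

Both DirectCall(21) and CallThrough(GetCallback(), 21) compute the same value; only the code the compiler is able to generate may differ.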

Let the compiler worry about that, you worry about your code :)

answered Aug 17, 2011 at 16:44


14 Comments

Cookie Over a year ago

Could a compiler also split up longer functions itself into smaller functions?

Oliver Charlesworth Over a year ago

The OP is asking about the comparison between writing the code as above, and writing all the code inline manually. So I guess he/she wants an answer that explains whether breaking a function up will ever have a downside.

Blindy Over a year ago

That sounds a bit less likely, what would be the benefit? I think you're more likely to find the compiler copy your blocks over and over one after the other :)

Oliver Charlesworth Over a year ago

@Blindy: Your answer says "don't worry about it", which isn't quite the same thing! There are clearly cases where function-call overhead will significantly impact performance; the OP wants to know whether he'll ever be at risk of incurring this cost if he writes his code for clarity.

Oliver Charlesworth Over a year ago

@Blindy: If this code forms part of the inner-loop that iterates millions of times a second, then yes, it really does matter.

Mark Ransom · Accepted Answer · 2011-08-17 16:56:48Z

7

A complicated function is likely to have its speed dominated by the operations within the function; the overhead of a function call won't be noticeable even if it isn't inlined.

You don't have much control over the inlining of a function; the best way to know is to try it and find out.

A compiler's optimizer might be more effective with shorter pieces of code, so you might find it getting faster even if it's not inlined.
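One way to "try it and find out" is to time both shapes of the code, or to inspect the generated assembly (for example with the -S option of GCC or Clang). Below is a minimal timing sketch using std::chrono. All names (Step, RunSplit, RunManual, MillisecondsFor) are hypothetical, and the arithmetic is an arbitrary stand-in for real work; a serious measurement would need repeated runs and optimization enabled.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper that is referenced from exactly one place.
static std::uint64_t Step(std::uint64_t x) {
    return x * 6364136223846793005ULL + 1442695040888963407ULL;
}

// Version written for readability: the loop body calls the helper.
std::uint64_t RunSplit(int iterations) {
    std::uint64_t state = 1;
    for (int i = 0; i < iterations; ++i) {
        state = Step(state);
    }
    return state;
}

// Version with the helper's body written out by hand.
std::uint64_t RunManual(int iterations) {
    std::uint64_t state = 1;
    for (int i = 0; i < iterations; ++i) {
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
    }
    return state;
}

// Times a single call of fn(iterations) and stores its result in *result.
template <typename Fn>
double MillisecondsFor(Fn fn, int iterations, std::uint64_t* result) {
    const auto start = std::chrono::steady_clock::now();
    *result = fn(iterations);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling MillisecondsFor(RunSplit, n, &a) and MillisecondsFor(RunManual, n, &b) with the same n should yield identical results in a and b. If the two timings are indistinguishable in an optimized build, the split costs nothing measurable for that code.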

edited Aug 17, 2011 at 16:56

answered Aug 17, 2011 at 16:47


5 Comments

Oliver Charlesworth Over a year ago

And what about a simple function?

Mark Ransom Over a year ago

@Oli, simple functions are more likely to be automatically inlined anyway.

phkahler Over a year ago

Don't have much control? That's why there is an "inline" keyword... Simple functions can not be inlined unless they are declared static. Or more specifically a non-inlined version must be available to outside callers. Also, if a large function is broken into 15 small ones (like OP wants to do), that will be 15 times the call overhead.

Bill Over a year ago

@phkahler: The inline keyword doesn't force the compiler to inline the function.

phkahler Over a year ago

@Bill: Right, it's just a suggestion which most compilers listen to (unlike the old register keyword). The fact that it exists and is used refutes the poster's second sentence that says you don't have much control.
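To make the "how much control" exchange above concrete, here is a sketch of the knobs that exist. The standard inline keyword is at most a hint. GCC and Clang additionally offer the always_inline and noinline attributes, and MSVC offers __forceinline and __declspec(noinline). The macro names SKETCH_FORCE_INLINE and SKETCH_NO_INLINE and the function names are hypothetical, and even the "force" spellings are documented as requests that the compiler can decline in some situations.

```cpp
// Hypothetical wrapper macros over compiler-specific spellings.
#if defined(_MSC_VER)
#  define SKETCH_FORCE_INLINE __forceinline
#  define SKETCH_NO_INLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#  define SKETCH_NO_INLINE __attribute__((noinline))
#else
#  define SKETCH_FORCE_INLINE inline
#  define SKETCH_NO_INLINE
#endif

// Standard keyword: a hint at most, the optimizer decides.
inline int AddHint(int a, int b) { return a + b; }

// Strong request to inline (compiler-specific).
SKETCH_FORCE_INLINE int AddForced(int a, int b) { return a + b; }

// Request to keep the call out of line (compiler-specific).
SKETCH_NO_INLINE int AddOutOfLine(int a, int b) { return a + b; }

// All three produce the same value; only the generated code may differ.
int Sum3(int a, int b, int c) {
    return AddOutOfLine(AddForced(a, b), AddHint(c, 0));
}
```

These attributes are best reserved for cases where measurement shows the optimizer's default choice is hurting, which fits the advice given elsewhere in this thread.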

Mark B · Accepted Answer · 2011-08-17 17:05:11Z

If you split up your code into logical groupings the compiler will do what it deems best: If it's short and easy, the compiler should inline it and the result is the same. If however the code is complicated, making an extra function call might actually be faster than doing all the work inlined, so you leave the compiler the option to do that too. On top of all that, the logically split code can be far easier for a maintainer to grok and avoid future bugs.

If the function is called only once, I am unsure that we would see a speed-up from not inlining the code... unless you actually want to skip the function call, and having the function inlined would therefore trash your instruction cache. Is it current?
My thought was that by breaking the code down into logical function blocks the compiler may be able to utilize registers more intelligently, among other possibilities I can't even imagine.

Benoît · Accepted Answer · 2011-08-17 20:46:45Z

I suggest you create a helper class to break your complicated function into method calls, much like you were proposing, but without the long, boring and unreadable task of passing arguments to each and every one of these smaller functions. Pass these arguments only once by making them member variables of the helper class.
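A minimal sketch of what such a helper class could look like. Everything here is hypothetical: the class name ComplicatedHelper, its members, and the toy arithmetic stand in for whatever the real function does. The step names mirror the ones used in the question.

```cpp
#include <vector>

// Hypothetical helper: the shared state is passed once, to the constructor,
// instead of being threaded through every step as arguments.
class ComplicatedHelper {
public:
    ComplicatedHelper(const std::vector<int>& input, int threshold)
        : input_(input), threshold_(threshold), total_(0) {}

    // Runs the steps in order and returns the final result.
    int Run() {
        DoInitialStuff();
        if (total_ > threshold_) {
            HandleCaseA();
        } else {
            HandleCaseB();
        }
        return FinishUp();
    }

private:
    void DoInitialStuff() {
        for (int value : input_) {
            total_ += value;
        }
    }
    void HandleCaseA() { total_ -= threshold_; }
    void HandleCaseB() { total_ *= 2; }
    int FinishUp() const { return total_ + 1; }

    const std::vector<int>& input_;  // must outlive the helper
    int threshold_;
    int total_;
};

// Stand-in for A::Complicated(): builds the helper and runs it once.
int Complicated(const std::vector<int>& input, int threshold) {
    return ComplicatedHelper(input, threshold).Run();
}
```

Since the member functions are defined inside the class, they are implicitly inline, and each step has a single call site in Run(). The compiler therefore has the same information it would have with static free functions.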

Don't focus on optimization at this point; make sure your code is readable and you'll be fine 99% of the time.
