11

As in, say my header file is:

class A { void Complicated(); };

And my source file

void A::Complicated() { ...really long function... } 

Can I split the source file into

void DoInitialStuff(pass necessary vars by ref or value) { ... }
void HandleCaseA(pass necessary vars by ref or value) { ... }
void HandleCaseB(pass necessary vars by ref or value) { ... }
void FinishUp(pass necessary vars by ref or value) { ... }

void A::Complicated() {
    ...
    DoInitialStuff(...);
    switch ...
        HandleCaseA(...)
        HandleCaseB(...)
    ...
    FinishUp(...)
}

Entirely for readability and without any fear of impact in terms of performance?

8
  • 1
    Maybe, maybe not. The compiler programmer might be your best bet, depending upon what compiler you are using. Commented Aug 17, 2011 at 16:44
  • 2
    None of this even happens to be in a loop? Exactly how much time do you hope to gain from avoiding the overhead of a couple of function calls? A nanosecond? Commented Aug 17, 2011 at 16:45
  • 2
    Small functions that get called a lot benefit from inlining. A function that's sufficiently larger than the function call overhead will not benefit from inlining, so I wouldn't worry about it. Commented Aug 17, 2011 at 16:45
  • 11
    Declare your internal functions as static to give them file scope. They may be inlined even if you don't do this. But if they are not static, they will have to be exported, which means a non-inlined version will have to be generated even if it's never used. Commented Aug 17, 2011 at 16:47
  • 1
    @UncleBens: Yes it could be in a loop. I am just saying in the code the function would only be referenced once. Commented Aug 17, 2011 at 16:50

5 Answers 5

11

You should mark the functions static so that the compiler knows they are local to that translation unit.

Without static the compiler cannot assume (barring LTO / WPA) that the function is only called once, so is less likely to inline it.

Demonstration using the LLVM Try Out page.

That said, code for readability first; micro-optimizations (and such tweaking is a micro-optimization) should only come after performance measurements.


Example:

#include <cstdio>

static void foo(int i) {
    int m = i % 3;
    printf("%d %d", i, m);
}

int main(int argc, char* argv[]) {
    for (int i = 0; i != argc; ++i) {
        foo(i);
    }
}

Produces with static:

; ModuleID = '/tmp/webcompile/_27689_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

@.str = private constant [6 x i8] c"%d %d\00" ; <[6 x i8]*> [#uses=1]

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
entry:
  %cmp4 = icmp eq i32 %argc, 0 ; <i1> [#uses=1]
  br i1 %cmp4, label %for.end, label %for.body

for.body: ; preds = %for.body, %entry
  %0 = phi i32 [ %inc, %for.body ], [ 0, %entry ] ; <i32> [#uses=3]
  %rem.i = srem i32 %0, 3 ; <i32> [#uses=1]
  %call.i = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([6 x i8]* @.str, i64 0, i64 0), i32 %0, i32 %rem.i) nounwind ; <i32> [#uses=0]
  %inc = add nsw i32 %0, 1 ; <i32> [#uses=2]
  %exitcond = icmp eq i32 %inc, %argc ; <i1> [#uses=1]
  br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body, %entry
  ret i32 0
}

declare i32 @printf(i8* nocapture, ...) nounwind

Without static:

; ModuleID = '/tmp/webcompile/_27859_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

@.str = private constant [6 x i8] c"%d %d\00" ; <[6 x i8]*> [#uses=1]

define void @foo(int)(i32 %i) nounwind {
entry:
  %rem = srem i32 %i, 3 ; <i32> [#uses=1]
  %call = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([6 x i8]* @.str, i64 0, i64 0), i32 %i, i32 %rem) ; <i32> [#uses=0]
  ret void
}

declare i32 @printf(i8* nocapture, ...) nounwind

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
entry:
  %cmp4 = icmp eq i32 %argc, 0 ; <i1> [#uses=1]
  br i1 %cmp4, label %for.end, label %for.body

for.body: ; preds = %for.body, %entry
  %0 = phi i32 [ %inc, %for.body ], [ 0, %entry ] ; <i32> [#uses=3]
  %rem.i = srem i32 %0, 3 ; <i32> [#uses=1]
  %call.i = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([6 x i8]* @.str, i64 0, i64 0), i32 %0, i32 %rem.i) nounwind ; <i32> [#uses=0]
  %inc = add nsw i32 %0, 1 ; <i32> [#uses=2]
  %exitcond = icmp eq i32 %inc, %argc ; <i1> [#uses=1]
  br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body, %entry
  ret i32 0
}

4 Comments

I think you ought to put them in an unnamed namespace, according to "principle".
Clang inlined it in both the static and nonstatic versions in your example... I don't think there's a significant difference in whether or not it inlines the call. There's a difference in whether or not the function's code will be emitted or not.
@GMan: I favor static because I find it more readable for a human being. An unnamed namespace requires the reader to remember they are inside one, and that isn't obvious. static does not require this "context".
@Matthieu I must be missing something. I am looking at your demo, and it has inlined the call. There is no call to foo in the entirety of main.
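To make the two options from this comment thread concrete, here is a minimal sketch of both ways to give a helper internal linkage. The helper names (SumPositives, CountNegatives) and the changed signature of A::Complicated are made up for illustration and are not from the question. Either form tells the compiler that no other translation unit can call the helper.

```cpp
#include <vector>

// Hypothetical stand-in for the class in the question; the real one
// declares `void Complicated()`.
class A {
public:
    int Complicated(const std::vector<int>& values);
};

// Option 1: `static` gives the helper internal linkage.
static int SumPositives(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) {
        if (v > 0) total += v;
    }
    return total;
}

// Option 2: an unnamed namespace also gives internal linkage.
namespace {
int CountNegatives(const std::vector<int>& values) {
    int count = 0;
    for (int v : values) {
        if (v < 0) ++count;
    }
    return count;
}
}  // namespace

// The public member function reads as a short outline of the work.
int A::Complicated(const std::vector<int>& values) {
    return SumPositives(values) - CountNegatives(values);
}
```

Whether either helper actually gets inlined is still up to the optimizer; the linkage only removes the need to keep an externally visible copy around.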
7

Depends on aliasing (pointers to that function) and function length (a large function inlined in a branch could throw the other branch out of cache, thus hurting performance).

Let the compiler worry about that; you worry about your code :)

14 Comments

Could a compiler also split up longer functions itself into smaller functions?
The OP is asking about the comparison between writing the code as above, and writing all the code inline manually. So I guess he/she wants an answer that explains whether breaking a function up will ever have a downside.
That sounds a bit less likely, what would be the benefit? I think you're more likely to find the compiler copy your blocks over and over one after the other :)
@Blindy: Your answer says "don't worry about it", which isn't quite the same thing! There are clearly cases where function-call overhead will significantly impact performance; the OP wants to know whether he'll ever be at risk of incurring this cost if he writes his code for clarity.
@Blindy: If this code forms part of the inner-loop that iterates millions of times a second, then yes, it really does matter.
7

A complicated function is likely to have its speed dominated by the operations within the function; the overhead of a function call won't be noticeable even if it isn't inlined.

You don't have much control over the inlining of a function; the best way to know is to try it and find out.

A compiler's optimizer might be more effective with shorter pieces of code, so you might find it getting faster even if it's not inlined.
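As a rough sketch of "try it and find out", the snippet below times a monolithic loop against the same loop split into a static helper, using std::chrono::steady_clock. All names here (Step, Monolithic, Split, TimeNanoseconds) and the arithmetic are invented for illustration. A real measurement would need optimization enabled, repeated runs, and realistic inputs.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: one step of some arbitrary arithmetic.
static std::uint64_t Step(std::uint64_t x) {
    return x * 2862933555777941757ULL + 3037000493ULL;
}

// Everything written out in one function.
std::uint64_t Monolithic(int iterations) {
    std::uint64_t x = 1;
    for (int i = 0; i < iterations; ++i) {
        x = x * 2862933555777941757ULL + 3037000493ULL;
    }
    return x;
}

// The same work, with the loop body moved into the helper.
std::uint64_t Split(int iterations) {
    std::uint64_t x = 1;
    for (int i = 0; i < iterations; ++i) {
        x = Step(x);
    }
    return x;
}

// Runs `work` once, stores its result in `result` so the call has a
// visible effect, and returns the elapsed wall-clock time in nanoseconds.
template <typename F>
long long TimeNanoseconds(F work, std::uint64_t& result) {
    const auto start = std::chrono::steady_clock::now();
    result = work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling TimeNanoseconds([] { return Split(1000000); }, result) and the Monolithic equivalent a few times each gives a first impression. If the two numbers are indistinguishable, the split cost nothing that matters.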

5 Comments

And what about a simple function?
@Oli, simple functions are more likely to be automatically inlined anyway.
Don't have much control? That's why there is an "inline" keyword... Simple functions can not be inlined unless they are declared static. Or more specifically a non-inlined version must be available to outside callers. Also, if a large function is broken into 15 small ones (like OP wants to do), that will be 15 times the call overhead.
@phkahler: The inline keyword doesn't force the compiler to inline the function.
@Bill: Right, it's just a suggestion which most compilers listen to (unlike the old register keyword). The fact that it exists and is used refutes the poster's second sentence that says you don't have much control.
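For reference, a small sketch of the kinds of control being argued about here. The standard inline keyword is only a hint as far as inlining goes, while GCC/Clang and MSVC offer compiler-specific extensions that request or forbid inlining more strongly. The macro names FORCE_INLINE and NEVER_INLINE and the three functions are made up for this example.

```cpp
// Map hypothetical macros onto each compiler's own extension.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#define NEVER_INLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#define NEVER_INLINE __attribute__((noinline))
#else
// Unknown compiler: fall back to the plain hint and no restriction.
#define FORCE_INLINE inline
#define NEVER_INLINE
#endif

// Standard keyword: a hint the optimizer is free to ignore.
inline int Hinted(int x) { return x + 1; }

// Compiler extension: a much stronger request to inline.
FORCE_INLINE int Forced(int x) { return x * 2; }

// Compiler extension: ask for a real call to be kept.
NEVER_INLINE int Outlined(int x) { return x - 3; }

int Combine(int x) { return Outlined(Forced(Hinted(x))); }
```

Since these extensions differ between compilers, it seems safer to reach for them only after a profile shows that a specific call matters.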
0

If you split up your code into logical groupings the compiler will do what it deems best: If it's short and easy, the compiler should inline it and the result is the same. If however the code is complicated, making an extra function call might actually be faster than doing all the work inlined, so you leave the compiler the option to do that too. On top of all that, the logically split code can be far easier for a maintainer to grok and avoid future bugs.

2 Comments

If the function is called only once, I doubt we would see a speedup from not inlining the code... unless the call is usually skipped and inlining the function would therefore trash your instruction cache. Is that common?
My thought was that by breaking the code down into logical function blocks the compiler may be able to utilize registers more intelligently, among other possibilities I can't even imagine.
0

I suggest you create a helper class to break your complicated function into method calls, much like you were proposing, but without the long, boring and unreadable task of passing arguments to each and every one of these smaller functions. Pass these arguments only once by making them member variables of the helper class.

Don't focus on optimization at this point. Make sure your code is readable and you'll be fine 99% of the time.
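A minimal sketch of that helper-class idea, with a made-up task (counting even and odd values) standing in for the real work. The step names mirror the question; ComplicatedHelper, its members, and the changed signature of A::Complicated are assumptions for illustration only.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the class in the question.
class A {
public:
    std::string Complicated(const std::vector<int>& input);
};

namespace {

// Holds the shared state once, so the steps need no long parameter lists.
class ComplicatedHelper {
public:
    explicit ComplicatedHelper(const std::vector<int>& input) : input_(input) {}

    std::string Run() {
        DoInitialStuff();
        for (int v : input_) {
            if (v % 2 == 0) {
                HandleCaseA();
            } else {
                HandleCaseB();
            }
        }
        return FinishUp();
    }

private:
    void DoInitialStuff() { evens_ = 0; odds_ = 0; }
    void HandleCaseA() { ++evens_; }
    void HandleCaseB() { ++odds_; }
    std::string FinishUp() const {
        return std::to_string(evens_) + " even, " + std::to_string(odds_) + " odd";
    }

    const std::vector<int>& input_;  // must outlive the helper
    int evens_ = 0;
    int odds_ = 0;
};

}  // namespace

std::string A::Complicated(const std::vector<int>& input) {
    return ComplicatedHelper(input).Run();
}
```

Because the helper lives in an unnamed namespace in the source file, the header stays unchanged and the compiler can still see every call site.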
