C++ / VS2008: Performance of Macros vs. Inline functions

Question

All,

I'm writing some performance sensitive code, including a 3d vector class that will be doing lots of cross-products. As a long-time C++ programmer, I know all about the evils of macros and the various benefits of inline functions. I've long been under the impression that inline functions should be approximately the same speed as macros. However, in performance testing macro vs inline functions, I've come to an interesting discovery that I hope is the result of me making a stupid mistake somewhere: the macro version of my function appears to be over 8 times as fast as the inline version!

First, a ridiculously trimmed down version of a simple vector class:

    class Vector3d {
    public:
        double m_tX, m_tY, m_tZ;

        Vector3d() : m_tX(0), m_tY(0), m_tZ(0) {}
        Vector3d(const double &tX, const double &tY, const double &tZ):
            m_tX(tX), m_tY(tY), m_tZ(tZ) {}

        static inline void CrossAndAssign ( const Vector3d& cV1, const Vector3d& cV2, Vector3d& cV )
        {
            cV.m_tX = cV1.m_tY * cV2.m_tZ - cV1.m_tZ * cV2.m_tY;
            cV.m_tY = cV1.m_tZ * cV2.m_tX - cV1.m_tX * cV2.m_tZ;
            cV.m_tZ = cV1.m_tX * cV2.m_tY - cV1.m_tY * cV2.m_tX;
        }

    #define FastVectorCrossAndAssign(cV1,cV2,cVOut) { \
        cVOut.m_tX = cV1.m_tY * cV2.m_tZ - cV1.m_tZ * cV2.m_tY; \
        cVOut.m_tY = cV1.m_tZ * cV2.m_tX - cV1.m_tX * cV2.m_tZ; \
        cVOut.m_tZ = cV1.m_tX * cV2.m_tY - cV1.m_tY * cV2.m_tX; }
    };

Here's my sample benchmarking code:

    Vector3d right;
    Vector3d forward(1.0, 2.2, 3.6);
    Vector3d up(3.2, 1.4, 23.6);

    clock_t start = clock();
    for (long l=0; l < 100000000; l++)
    {
        Vector3d::CrossAndAssign(forward, up, right); // static inline version
    }
    clock_t end = clock();

    std::cout << end - start << std::endl;

    clock_t start2 = clock();
    for (long l=0; l<100000000; l++)
    {
        FastVectorCrossAndAssign(forward, up, right); // macro version
    }
    clock_t end2 = clock();

    std::cout << end2 - start2 << std::endl;

The end result: With optimizations turned completely off, the inline version takes 3200 ticks, and the macro version 500 ticks... With optimization turned on (/O2, maximize speed, and other speed tweaks), I can get the inline version down to 1100 ticks, which is better but still not the same.

So I appeal to all of you: is this really true? Have I made a stupid mistake somewhere? Or are inline functions really this much slower -- and if so, why?

Yes, changing code and not checking that it produces the same result is the mother of stupid mistakes.
– Hans Passant, Commented Sep 28, 2010 at 7:24
Question: you did perform the tests with optimizations enabled, right? It is customary for compilers not to inline everything in debug builds, because an inlined function does not appear in the stack frame, making it harder to debug.
– Matthieu M., Commented Sep 28, 2010 at 7:45
"With optimizations turned completely off, the inline version takes [longer]". Well, what do you expect when you turn off inlining?
– sbi, Commented Sep 28, 2010 at 8:57

Sjoerd · Accepted Answer · 2010-09-29 23:42:55Z

NOTE: After this answer was posted, the original question was edited to remove this problem. I'll leave the answer up, as it is instructive on several levels.

The loops differ in what they do!

If we manually expand the macro, we get:

    for (long l=0; l<100000000; l++)
        right.m_tX = forward.m_tY * up.m_tZ - forward.m_tZ * up.m_tY;
        right.m_tY = forward.m_tZ * up.m_tX - forward.m_tX * up.m_tZ;
        right.m_tZ = forward.m_tX * up.m_tY - forward.m_tY * up.m_tX;

Note the absence of curly brackets. So the compiler sees this as:

    for (long l=0; l<100000000; l++)
    {
        right.m_tX = forward.m_tY * up.m_tZ - forward.m_tZ * up.m_tY;
    }
    right.m_tY = forward.m_tZ * up.m_tX - forward.m_tX * up.m_tZ;
    right.m_tZ = forward.m_tX * up.m_tY - forward.m_tY * up.m_tX;

Only the first of the three assignments runs inside the loop; the other two run once, after it. That makes it obvious why the second loop is so much faster.

Update: This is also a good example of why macros are evil :)

Oh, thank you for giving this perfect example why one should always use braces, even for one-liner bodies. Absolutely +1, eagle eye!
I wouldn't say that macros are evil per se, though. They bite you when you're careless (like, not wrapping a multiple-line macro in do { ... } while (0)).
@DevSolar: why wrap a macro in do { ... } while (0) when { ... } works perfectly? Is it that important to force the user to put a semicolon after it?
@Matthieu M.: Yes, it is. 1) Omitting the semicolon results in a compiler error, forcing the macro call to mimic a proper function call. (This makes it easier to change the macro into a function later on.) But more importantly, 2) try using your { ... } macro in the if part of an if ... else. Suddenly you must not put the semicolon... Also see c-faq.com/cpp/multistmt.html
Submitter response: Ack, in optimizing my posted code for readability I removed what seemed like extraneous braces. In my real code, the braces are there, and the loop does do exactly what you would expect it should do. I've updated the sample. So unfortunately, this is not the answer.

Philipp · Accepted Answer · 2010-09-28 06:35:47Z

Please note that the inline keyword is only a hint to the compiler. If you turn optimizations off, the compiler may not inline the function at all. You should go to Project Settings/C++/Optimization/ and make sure Optimization is turned on. What setting have you used for "Inline Function Expansion"?

Turning on Full Optimization, both my functions are returning a time of 0, so I suspect the entire loops are being optimized out because they don't do anything useful. I'll have to play around with this some more.
You might use the results (for example, add them all up) and output the final sum afterwards, or something like that.

justin · Accepted Answer · 2010-09-28 06:48:30Z

It also depends on optimizations and compiler settings. Also look for your compiler's support for an always-inline/force-inline declaration. Inlining is as fast as a macro.

By default, the keyword is a hint; force inline/always inline (for the most part) returns control to the programmer, restoring the original intention of the keyword.

Finally, gcc (for example) can be directed to inform you when such a function is not inlined as directed.

Necrolis · Accepted Answer · 2010-09-28 06:52:27Z

Apart from what Philipp mentioned, if you're using MSVC, you can use __forceinline, or with gcc the equivalent __attribute__((always_inline)), to correct the problems with inlining.

However, there is another possible problem lurking: a macro re-evaluates its arguments at every point where they appear in the expansion, so if you call the macro like so:

FastVectorCrossAndAssign(getForward(), up, right);

it will expand to:

    right.m_tX = getForward().m_tY * up.m_tZ - getForward().m_tZ * up.m_tY;
    right.m_tY = getForward().m_tZ * up.m_tX - getForward().m_tX * up.m_tZ;
    right.m_tZ = getForward().m_tX * up.m_tY - getForward().m_tY * up.m_tX;

Not what you want when you're concerned with speed :) (especially if getForward() isn't a lightweight function, or does some incrementing on each call. If it's an inline function, the compiler might reduce the number of calls, provided it isn't volatile, but that still won't fix everything.)
