
I have a function that must return true or false depending on the value of a member enum representing the type of an operator.

I'm wondering which of the following options would be the fastest, as I'm not sure what implicit optimizations the compiler is going to do, if any.

    inline bool isBinaryOper( void ) const // Fastest I assume.
    {
        static const bool arr[] =
        {
            true,  // E_PLUS
            true,  // E_MINUS
            true,  // E_MULTIPLY
            true,  // E_DIVIDE
            false, // E_LPARENT
            false, // E_RPARENT
            false, // E_COMMA
            false  // E_SEMICOLON
        };
        return arr[ size_t( this->_eType ) ]; // Assuming values are valid indexes.
    }

Or :

    inline bool isBinaryOper( void ) const
    {
        switch ( this->_eType )
        {
            case E_PLUS :      return true;
            case E_MINUS :     return true;
            case E_MULTIPLY :  return true;
            case E_DIVIDE :    return true;
            case E_LPARENT :   return false;
            case E_RPARENT :   return false;
            case E_COMMA :     return false;
            case E_SEMICOLON : return false;
            default : ...
        };
    }

Or, which I guess is very similar to the previous one :

    inline bool isBinaryOper( void ) const
    {
        if ( this->_eType == E_PLUS )
            return true;
        else if ( this->_eType == E_MINUS )
            return true;
        // etc...
    }

Which one would be the fastest, and why?

  • You might be able to make this even faster if those E_ constants were all powers of two and could combine into a bitmask. Then: return !(this->_eType & (E_LPARENT | E_RPARENT | ...)) (a minimal sketch of this idea follows the comments). Commented May 14, 2014 at 15:08
  • @tadman Thanks, I'll consider this option. I'm not sure yet how many operators I'll have, but I guess there should be enough bits. Commented May 14, 2014 at 15:10
  • You could add a boolean field and initialize it at the same time you first set _eType, then simply: inline bool isBinaryOper() const { return _isBinary; }. Commented May 14, 2014 at 15:15
  • @BSH That's totally what I need, thanks. Commented May 14, 2014 at 15:17
  • If you use the array and then later change the enum, e.g. add a new value or re-arrange it, your code will break. The switch statement will either still work for re-arranged values or at least produce a meaningful error if there is a default case. If not, the compiler should issue a warning. So I would use the more maintainable one, which is the switch. Commented May 14, 2014 at 16:26
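For reference, here is a minimal sketch of the bitmask idea from the first comment. The power-of-two enum values and the Token wrapper class are assumptions for illustration, not the original definitions:

    // Sketch only: the enum is redefined with power-of-two values so that
    // "is this a binary operator?" becomes a single mask test.
    enum eOperType
    {
        E_PLUS      = 1 << 0,
        E_MINUS     = 1 << 1,
        E_MULTIPLY  = 1 << 2,
        E_DIVIDE    = 1 << 3,
        E_LPARENT   = 1 << 4,
        E_RPARENT   = 1 << 5,
        E_COMMA     = 1 << 6,
        E_SEMICOLON = 1 << 7
    };

    class Token // hypothetical holder of _eType, for illustration
    {
    public:
        explicit Token( eOperType eType ) : _eType( eType ) {}

        inline bool isBinaryOper( void ) const
        {
            // True for everything outside the "not a binary operator" mask.
            return !( this->_eType & ( E_LPARENT | E_RPARENT | E_COMMA | E_SEMICOLON ) );
        }

    private:
        eOperType _eType;
    };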

4 Answers


This question strikes me as being an instance of premature optimization, but for what it's worth, I'd go with the switch statement even though it is likely to be slightly slower, because:

  1. You're not going to notice the slowdown.

  2. Assuming you fill in the default: case, the switch implementation protects you against invalid data or changes to the enum definition, which will simplify debugging (see the sketch after this list).

  3. Both gcc and clang (and probably other good compilers) will optimize the switch into either a binary search or a jump table, depending on how the alternatives are ordered and the precise characteristics of the target platform. In neither case will it simply do a linear sequence of checks against each possible value, like the if ... else if ... else if ... option, which is almost certainly the slowest.

    That saves you from thinking about how to order the alternatives, particularly since you might need various boolean functions with different orderings. Unless you are an expert in computer architectures, you can reasonably assume that your compiler understands it better.
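As a minimal sketch of point 2, here is the same switch from the question with the default filled in; the assert is just one possible error-handling choice (it needs <cassert>), an exception or error code would work too:

    inline bool isBinaryOper( void ) const
    {
        switch ( this->_eType )
        {
            case E_PLUS :
            case E_MINUS :
            case E_MULTIPLY :
            case E_DIVIDE :
                return true;
            case E_LPARENT :
            case E_RPARENT :
            case E_COMMA :
            case E_SEMICOLON :
                return false;
            default :
                // Catches corrupted values and enumerators that were added
                // to the enum later but not handled here yet.
                assert( !"isBinaryOper: unhandled operator type" );
                return false;
        }
    }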


21 Comments

Ugh, I hate comments like this. There is nothing at all wrong with understanding what your language compiler is doing under the covers so that you can write more efficient code.
@JonathanWood: It's precisely because I have some understanding of what my compiler is doing under the covers that I am happy to live with it optimising my code.
Yes, I'm also asking this to better understand what's implicitly happening, not only for performance. And yes, it is premature optimization; I'll probably save a lot more time on other things, but that doesn't forbid me from asking about it.
@rici: That doesn't address what I said.
@Virus721: In that case, look at my point #3 :). Or use the -S option to gcc or clang to actually see what code is produced. With your alternatives arranged so that the trues are all before the falses, you'll probably get two conditional branches, one predictable (out of range) and the other one not. If you rearrange the options so that true and false alternate, you'll probably get a jump table, which is much less predictable but still fast enough.

Using the value as an index into an array is considerably faster than a switch statement.

Your second and third code blocks would perform about equally. But the first one quickly gets the index and uses it to access the desired array element. That would be my preference; however, you'll probably want to add error checking as well to ensure the argument is within the expected range.
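A sketch of what that error checking could look like, reusing the table from the question (returning false for out-of-range values is just one possible policy; an assert or exception would work too):

    inline bool isBinaryOper( void ) const
    {
        static const bool arr[] =
        {
            true,  // E_PLUS
            true,  // E_MINUS
            true,  // E_MULTIPLY
            true,  // E_DIVIDE
            false, // E_LPARENT
            false, // E_RPARENT
            false, // E_COMMA
            false  // E_SEMICOLON
        };

        // Guard against values that fall outside the table before indexing it.
        if ( size_t( this->_eType ) >= sizeof( arr ) / sizeof( arr[ 0 ] ) )
            return false;

        return arr[ size_t( this->_eType ) ];
    }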

18 Comments

Do you have figures or reasoning to back up that claim? I don't see that an array lookup would necessarily be faster than a jump table (which is a typical implementation for a dense switch statement).
@JonathanWood Thanks for your answer. I had the design of a hash table in mind for the first option, but I was more concerned about the static local variable. Is there a test performed when entering a function with static local variable initializations, or any other implicit work?
A typical implementation of an array lookup would simply add the index times the size of each element to the base address of the array. No comparisons or loops needed. The switch or if statements would require comparing each item. The array index would be many times faster.
@Virus721: There is no performance hit for accessing static data. In fact, in some cases it might be faster as you don't need to find it through class pointers.
@JonathanWood Why would the switch require comparing each item? A switch is designed with a jump-table in mind using the enum as index.

If your enumeration is partitioned so that all the values that return true come before all the ones that return false, then you can do this:

inline bool isBinaryOper() const { return this->_eType < E_LPARENT; } 
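If you do rely on that ordering, a compile-time check along these lines (C++11 static_assert; sketch only) would turn a later re-arrangement of the enum into a build error instead of a silent bug:

    // Document the ordering assumption: all binary operators must sort
    // before E_LPARENT for the comparison below to stay correct.
    static_assert( E_PLUS < E_LPARENT && E_MINUS < E_LPARENT &&
                   E_MULTIPLY < E_LPARENT && E_DIVIDE < E_LPARENT,
                   "binary operators must precede E_LPARENT in the enum" );

    inline bool isBinaryOper() const { return this->_eType < E_LPARENT; }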

1 Comment

Thanks for your answer, but I'll have more than one table like this. It might be ordered in one table but not in the other, so I'm not sure this is going to be possible. Though splitting a switch into multiple parts like this, to make some kind of binary search, might help.

I would say that array lookup is very likely to be the most efficient thing. There's simply no "fat" in it for an optimizing compiler to trim off.

Of course, the table will most likely be placed in a different segment (.rdata instead of .text), and it will therefore eat up additional cache lines. However, the chance that you will see any negative effect from that is negligible.

Of course, a compiler is likely to implement a switch with dense case values as a table lookup. That will give a huge improvement over the 'naïve' cascaded-if implementation. However, there is no guarantee that it will be done in the most straightforward way.


A very simple quick-and-dirty experiment corroborates my reasoning:

    #include <stdio.h>
    #include <time.h>

    enum E { E0, E1, E2, E3, E4, E5, E6, E7, };

    bool f1(E x)
    {
        if (x > E7 || x < E0)
            throw "ohbadbad";
        static const bool t[] = { true, true, true, true, false, false, false, false, };
        return t[x];
    }

    bool f2(E x)
    {
        switch (x)
        {
        case E0: return true;
        case E1: return true;
        case E2: return true;
        case E3: return true;
        case E4: return false;
        case E5: return false;
        case E6: return false;
        case E7: return false;
        default: throw "ohbadbad";
        }
    }

    int main(int argc, char* argv[])
    {
        bool (*f)(E) = (argc > 1 && argv[1][0] == 's') ? f2 : f1;

        clock_t t = clock();
        int r = 0;
        for (int i = 0; i < 10000; ++i)
            for (int j = 0; j < 100000; ++j)
                r += f((E)(j & E7));

        printf("%d %I64d\n", r, __int64(clock() - t));
        return 0;
    }

Compiled with MSVC++ 16 for both x86 and x64 (with the -O2 option), f1 gives a clock more than 3 times better than f2.

Analyzing the object code, it is very easy to see why: the switch is indeed implemented using a table - but it is a table of labels. The code fetches an address from the table and then jumps to that address. One branch effectively does return 0, the other return 1. Not only is that an unnecessary step, it also results in frequent branch mispredictions.

3 Comments

You could reformulate the switch statement to something like bool f2(E x) { bool result = false; switch (x) { case E0: result = true; break; /* ... */ } return result; } (a complete version is sketched after these comments). Could you run this with your compiler? I just don't have one here right now.
@Jens, for x86 the difference became 35% instead of three times. Then, because f2 does not check its argument for validity, I also omitted the argument check from f1 to make the two functions fully equivalent. The difference rose insignificantly, to 36%. For x64 it is only 20%, though!
The check is probably eliminated by branch prediction.
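For completeness, a full version of the reformulation suggested in the first comment might look like this (untested sketch; whether it actually changes the generated code is compiler-dependent):

    bool f2b( E x ) // reformulated variant of f2 that returns from a single point
    {
        bool result = false;
        switch ( x )
        {
        case E0:
        case E1:
        case E2:
        case E3:
            result = true;
            break;
        case E4:
        case E5:
        case E6:
        case E7:
            break; // result stays false
        default:
            throw "ohbadbad";
        }
        return result;
    }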
