Unexpected performance results with recursion

Question

I wonder why i get unexpected performances with these two pairs of obvious examples of recursion.

The same recursive function is faster inside a struct (rec2 VS rec1) and the same recursive template function is faster with a dummy parameter (rec4 VS rec3) !

Are the C++ functions faster with more arguments ?!

Here is the code tried :

#include <QDebug> #include <QElapsedTimer> constexpr std::size_t N = 28; std::size_t counter = 0; // non template function which take 1 argument void rec1(std::size_t depth) { ++counter; if ( depth < N ) { rec1(depth + 1); rec1(depth + 1); } } // non template member which take 2 arguments (implicit this) struct A { void rec2(std::size_t depth) { ++counter; if ( depth < N ) { rec2(depth + 1); rec2(depth + 1); } } }; // template function which take 0 argument template <std::size_t D> void rec3() { ++counter; rec3<D - 1>(); rec3<D - 1>(); } template <> void rec3<0>() { ++counter; } // template function which take 1 (dummy) argument struct Foo { int x; }; template <std::size_t D> void rec4(Foo x) { ++counter; rec4<D - 1>(x); rec4<D - 1>(x); } template <> void rec4<0>(Foo x) { ++counter; } int main() { QElapsedTimer t; t.start(); rec1(0); qDebug() << "time1" << t.elapsed(); qDebug() << "counter" << counter; counter = 0; A a; t.start(); a.rec2(0); qDebug()<< "time2" << t.elapsed(); qDebug()<< "counter" << counter; counter = 0; t.start(); rec3<N>(); qDebug()<< "time3" << t.elapsed(); qDebug()<< "counter" << counter; counter = 0; t.start(); rec4<N>(Foo()); qDebug()<< "time4" << t.elapsed(); qDebug()<< "counter" << counter; qDebug() << "fin"; return 0; }

I get this output :

time1 976 counter 536870911 time2 341 counter 536870911 time3 241 counter 536870911 time4 201 counter 536870911 fin

I have : Windows 8.1/i7 3630QM/latest Qt chaintool/c++14 enabled

calling-convention of member-functions (thiscall) and free-standing (cdecl) function are different ,which may account for time differences between them. — engf-010
– engf-010, Commented Aug 18, 2015 at 23:02
@templatetypedef : I don't know what "optimization level" is, but i use the release mode inside Qt creator, with all the default settings. — Cevik
– Cevik, Commented Aug 18, 2015 at 23:28
@engf-010 : maybe... I also tried to put the first function inside a namespace but it's the same. Can we have a cdecl function inside a namespace too ? — Cevik
– Cevik, Commented Aug 18, 2015 at 23:30
@cevik: yes ,calling conventions only specify how parameters are passed ,namespaces is just an fancy naming method ,so those things are unrelated. — engf-010
– engf-010, Commented Aug 18, 2015 at 23:39

1201ProgramAlarm · Accepted Answer · 2015-08-19 23:39:56Z

I was finally able to look at this on Visual Studio 2015 Community. Examining the disassembly of the compiled code, rec1 and rec2 are recursive. They are very similar in the generated code, and although rec2 has more instructions it runs slightly faster. rec3 and rec4 both generate a series of functions for all the different values of D in the template parameter, and in this case the compiler has eliminated many of the function calls, eliminated others, and added a larger value to count. (For example, rec4<10> just adds 2047 to count and returns.)

So the performance difference you see is mostly due to how the compiler optimizes each version, with slight differences in how the code flows thru the CPU also a factor.

My results (time measured in seconds), compiled with /Ox /O2:

time1 1.03411 counter 536870911 time2 0.970455 counter 536870911 time3 0.000866 counter 536870911 time4 0.000804 counter 536870911

Ok, thanks you. In fact, i need to call a specific function for each depth. So i expect the optimisation to be less significant if i add a call through a function pointer array.

Collectives™ on Stack Overflow

Unexpected performance results with recursion

1 Answer 1

1 Comment

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Related