
I was playing around with profiling things in C++ and came across something I don't have an explanation for: the first function call made in main always takes significantly longer than any subsequent call. This makes me think the first call carries some kind of overhead, but I'm not sure where it would come from, so if anyone has any insight into why this happens, I'd appreciate it. Also, I'm using Clang++.

A simple code snippet that shows this is:

    #include <iostream>
    #include <time.h>

    const int size = 1000000;

    using namespace std;

    void foo() {
        int arr[size];
        for (int i = 0; i < size; i++) {
            arr[i] = 1;
        }
    }

    int main() {
        for (int i = 0; i < 10; i++) {
            clock_t t = clock();
            foo();
            t = clock() - t;
            cout << "cycles: " << t << endl;
        }
        return 0;
    }

and the output shows that the first call always takes about 2000 more cycles than the rest:

    cycles: 5185
    cycles: 3049
    cycles: 2981
    cycles: 2830
    cycles: 2851
    cycles: 2767
    cycles: 2694
    cycles: 2570
    cycles: 2517
    cycles: 2490
  • I suspect that the OS lazily loading in code pages is responsible for this. Commented Apr 1, 2021 at 21:24
  • Unless this is compiled with optimizations enabled none of this data is meaningful. Commented Apr 1, 2021 at 21:25
  • @tadam I tried with optimizations both off and on, and on a few different compilers, and they gave me similar results (see the sketch after these comments). Commented Apr 1, 2021 at 21:50
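
For reference, a minimal sketch of one way to keep the timing meaningful when optimizations are on: the version below marks the array volatile (my own choice here, not something suggested in the comments), so the stores count as observable behaviour and even -O2 has to keep the loop.

    #include <iostream>
    #include <time.h>

    const int size = 1000000;
    using namespace std;

    void foo() {
        // Stores to a volatile object are observable behaviour,
        // so the optimizer cannot delete this loop.
        volatile int arr[size];
        for (int i = 0; i < size; i++) {
            arr[i] = 1;
        }
    }

    int main() {
        for (int i = 0; i < 10; i++) {
            clock_t t = clock();
            foo();
            t = clock() - t;
            // clock() reports clock ticks, not CPU cycles
            cout << "ticks: " << t << endl;
        }
        return 0;
    }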

1 Answer


My thought is that on the first function call the function's code itself gets loaded into memory and caches such as the L1 cache, and only then executed. By the next call the code is already in the cache, so the CPU won't have to spend cycles fetching it and can just run the function again. This is why the first call to foo takes longer than the rest.
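
One way to test this idea, as a sketch rather than anything definitive: make one untimed warm-up call to foo before the measured loop. If the spike disappears from the first measured iteration, the extra cost was a one-time effect that the warm-up call has already paid.

    #include <iostream>
    #include <time.h>

    const int size = 1000000;
    using namespace std;

    void foo() {
        int arr[size];
        for (int i = 0; i < size; i++) {
            arr[i] = 1;
        }
    }

    int main() {
        foo();  // untimed warm-up: any one-time cost is paid here
        for (int i = 0; i < 10; i++) {
            clock_t t = clock();
            foo();
            t = clock() - t;
            cout << "ticks: " << t << endl;
        }
        return 0;
    }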

Also, note that calling another function whose code sits next to foo in memory can bring foo in as well, since the OS loads whole pages of memory at a time. I've tested this and it only sometimes works, so take it with a grain of salt; where the compiler and linker end up placing each function's code is effectively non-deterministic from our point of view. So it's possible for the function's code to already be in memory or cache before the first time you call it, but if not, it takes a few extra cycles to bring it in before the CPU actually runs the code.
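
A rough sketch of that experiment, with the caveat that the empty helper bar and its placement are my own assumption; nothing guarantees the linker puts bar and foo on the same code page, which is presumably why the effect only shows up sometimes. Compile without optimizations, as in the question, so the empty call isn't removed.

    #include <iostream>
    #include <time.h>

    const int size = 1000000;
    using namespace std;

    // Defined right next to foo in the hope (not a guarantee) that the
    // linker places both functions in the same code page.
    void bar() {}

    void foo() {
        int arr[size];
        for (int i = 0; i < size; i++) {
            arr[i] = 1;
        }
    }

    int main() {
        bar();  // may fault in / warm the page that also holds foo's code
        for (int i = 0; i < 10; i++) {
            clock_t t = clock();
            foo();
            t = clock() - t;
            cout << "ticks: " << t << endl;
        }
        return 0;
    }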


9 Comments

Also, the CPU is usually close to idle. When you start a tight loop, it will ramp up its clock speed, so the machine actually gets faster after the first few calls.
This is not an answer, this is speculation.
Your results look like you're timing unoptimized code. The optimizer should recognize that no observable result is used and could optimize all of it away. However, unoptimized code tends to be big and bloated, which also supports your hypothesis of a cache effect loading the function. There is also the branch predictor in the CPU, which may come into play by "learning" your loop during the first call.
@JoshuaSegal Are you sure? godbolt.org/z/M96fEvx9d
It's not the small function code but the memory associated with int arr[size]. Once cached, subsequent accesses are quicker.
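
Building on that last comment, here is a sketch (my addition, not code from the thread) of how one could check the page-fault explanation on a POSIX system such as Linux or macOS: count minor page faults around each call with getrusage. If the explanation holds, nearly all of the faults land on the first iteration, when the roughly 4 MB stack frame backing int arr[size] is first touched.

    #include <iostream>
    #include <time.h>
    #include <sys/resource.h>   // getrusage (POSIX only)

    const int size = 1000000;
    using namespace std;

    void foo() {
        int arr[size];
        for (int i = 0; i < size; i++) {
            arr[i] = 1;
        }
    }

    // Minor page faults taken by this process so far.
    long minor_faults() {
        rusage ru{};
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_minflt;
    }

    int main() {
        for (int i = 0; i < 10; i++) {
            long before = minor_faults();
            clock_t t = clock();
            foo();
            t = clock() - t;
            long after = minor_faults();
            cout << "ticks: " << t
                 << "  minor page faults: " << (after - before) << endl;
        }
        return 0;
    }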
