2

I run following two codes in C using GCC compiler.

#include <stdio.h> #include <time.h> int main() { int i, j, k, l, step, count = 0; double duration; float multi; const clock_t begin_time = clock(); for (step = 1; step < 10000; step++) for (k = 1; k < 27000; k++) { for (i = 1; i < 5; i++) for (j = 1; j < 9; j++) { count++; // INSTRUCTION-1 } }; duration = (double) (clock() - begin_time) / CLOCKS_PER_SEC; printf("C program count = %d \n", count); printf("clock = %f \n", duration); } 

The second code is as follows:

#include <stdio.h> #include <time.h> int main() { int i, j, k, l, step, count = 0; double duration; float multi; const clock_t begin_time = clock(); for (step = 1; step < 10000; step++) for (k = 1; k < 27000; k++) { for (i = 1; i < 5; i++) for (j = 1; j < 9; j++) { count++; // INSTRUCTION-1 multi = 9.56587458 * 8.547458748; // INSTRUCTION-2 } }; duration = (double) (clock() - begin_time) / CLOCKS_PER_SEC; printf("C program count = %d \n", count); printf("clock = %f \n", duration); } 

Both the codes are almost same. The only difference is that the first code has only one instruction inside the loop, whereas the second code has two instructions inside the loop. Therefore, I was expecting the second code should take longer to execute. However, to my surprise, The execution time of the first code was 22.45 seconds, whereas for the second code was 17.96 seconds. Why is the second code executed faster than the first code, even if it involves significantly more computations?

CPU used was Intel Xeon E5-2670V 2.5 GHz 2 CPU-IvyBridge (20-cores), if this information is relevant.

15
  • 6
    For starters, you compiled with optimization disabled. Without optimization, the compiler is in a “dumb” mode and is not expected to produce efficient output. Commented Dec 11, 2020 at 19:08
  • 2
    How many times did you run your benchmark? Was your computer completely unoccupied during the test? Commented Dec 11, 2020 at 19:08
  • 3
    Specify the specific version of GCC you used and all the command-line switches. Commented Dec 11, 2020 at 19:09
  • 2
    I downloaded, compiled [with -O3], and ran both versions. The disassembly was identical. The execution time was: 0.000002 Note that in the 2nd example, the optimizer could [and probably did] detect that multi = 9.56587458 * 8.547458748; was fixed [and loop invariant], so it would have migrated it out. Because it was set but not used, that would explain the code elimination. Indeed, if you had compiled with warnings (e.g. -Wall), the compiler would flag the multi = ... statement. Commented Dec 11, 2020 at 19:26
  • 2
    From the first comment (by Eric), you're compiling without optimization, so it's a bit pointless to worry about performance. Although the tests should be run several times and take the minimum time, I ran the unoptimized programs twice. The times: code1: 19.430483 code2: 20.511513 and code1: 19.435285 code2: 18.116939. So, we got conflicting results on two runs. As you surmised, a 19 second test is too long to quibble about the variations (unrelated system loading / time slicing can overshadow the results). With -O0, the only asm diff was two additional movss in 2nd code Commented Dec 11, 2020 at 20:23

2 Answers 2

3

If you look at the assembly output at https://godbolt.org/z/136qsc

This function:

double multiply(void) { return 9.56587458 * 8.547458748; ; } 

doesn't include a multiply instruction, or the constants 9.56587458 or 8.547458748. The compiler notices that the result can be calculated at compile time, so there's no reason to have that code in the output or to execute it at runtime. You aren't doing what you think you are doing with your two examples, so it makes sense that the one that adds no additional complexity isn't significantly slower.

Sign up to request clarification or add additional context in comments.

1 Comment

You are correct, but your observation does not explain why the second function is significantly faster than the first one.
3

If you compile with optimizations enabled (otherwise all performance tests do not have any sense ), the compiler will optimize all of your loops out and it will only assign the count with compile-time calculated value. Double multiplication will be optimized out (as you do not use the result at all) https://godbolt.org/z/9MebGc

But even if we change the program to force loops the expected difference will be very small (because loops will have much more instructions than the multiplication of two doubles). And it actually is:

void test(volatile double x, volatile double y) { volatile unsigned i, j, k, l, step, count = 0; double duration; double multi = 0.0; const clock_t begin_time = clock(); for (step = 1; step < 1000; step++) for (k = 1; k < 27000; k++) { for (i = 1; i < 5; i++) for (j = 1; j < 9; j++) { count++; // INSTRUCTION-1 multi += x * y; // INSTRUCTION-2 } }; duration = (double) (clock() - begin_time) / CLOCKS_PER_SEC; printf("C program count = %d \n", count); printf("clock = %f \n", duration); printf("%f\n", multi); } void test2(volatile double x, volatile double y) { volatile unsigned i, j, k, l, step, count = 0; double duration; double multi; const clock_t begin_time = clock(); for (step = 1; step < 1000; step++) for (k = 1; k < 27000; k++) { for (i = 1; i < 5; i++) for (j = 1; j < 9; j++) { count++; // INSTRUCTION-1 } }; duration = (double) (clock() - begin_time) / CLOCKS_PER_SEC; printf("C program count = %d \n", count); printf("clock = %f \n", duration); printf("%f\n", multi); } int main() { srand(time(NULL)); test((double)rand()/rand(), (double)rand()/rand()); test2((double)rand()/rand(), (double)rand()/rand()); } 

and the result:

C program count = 863104032 clock = 1.467764 962658070.362825 C program count = 863104032 clock = 1.413076 0.000000 

https://godbolt.org/z/hh4n18

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.