I ran the following two programs in C, compiled with GCC.

    #include <stdio.h>
    #include <time.h>

    int main()
    {
        int i, j, k, l, step, count = 0;
        double duration;
        float multi;
        const clock_t begin_time = clock();

        for (step = 1; step < 10000; step++)
            for (k = 1; k < 27000; k++) {
                for (i = 1; i < 5; i++)
                    for (j = 1; j < 9; j++) {
                        count++; // INSTRUCTION-1
                    }
            };
        duration = (double) (clock() - begin_time) / CLOCKS_PER_SEC;
        printf("C program count = %d \n", count);
        printf("clock = %f \n", duration);
    }

The second code is as follows:

    #include <stdio.h>
    #include <time.h>

    int main()
    {
        int i, j, k, l, step, count = 0;
        double duration;
        float multi;
        const clock_t begin_time = clock();

        for (step = 1; step < 10000; step++)
            for (k = 1; k < 27000; k++) {
                for (i = 1; i < 5; i++)
                    for (j = 1; j < 9; j++) {
                        count++; // INSTRUCTION-1
                        multi = 9.56587458 * 8.547458748; // INSTRUCTION-2
                    }
            };
        duration = (double) (clock() - begin_time) / CLOCKS_PER_SEC;
        printf("C program count = %d \n", count);
        printf("clock = %f \n", duration);
    }

The two programs are almost identical. The only difference is that the first has one instruction inside the loop, whereas the second has two. I therefore expected the second program to take longer to execute. To my surprise, however, the first program took 22.45 seconds and the second took 17.96 seconds. Why does the second program run faster than the first, even though it involves significantly more computation?
The CPU used was an Intel Xeon E5-2670V 2.5 GHz, 2 CPUs, Ivy Bridge (20 cores), in case this information is relevant.
-O3], and ran both versions. The disassembly was identical. The execution time was: 0.000002

Note that in the 2nd example, the optimizer could [and probably did] detect that multi = 9.56587458 * 8.547458748; was fixed [and loop invariant], so it would have migrated it out. Because it was set but not used, that would explain the code elimination. Indeed, if you had compiled with warnings (e.g. -Wall), the compiler would flag the multi = ... statement.

code1: 19.430483 code2: 20.511513 and code1: 19.435285 code2: 18.116939. So, we got conflicting results on two runs. As you surmised, a 19 second test is too long to quibble about the variations (unrelated system loading / time slicing can overshadow the results).

With -O0, the only asm diff was two additional movss in 2nd code