2

I prepared two code samples to show that a thread doing calculations on int variables runs faster than a thread doing the same calculations on double variables.

The only difference between the two programs is that the first uses only integers and the second uses only doubles.

The time difference between them is almost 30%.

The reason might be very simple, but can anyone explain the possible reason(s)?

Note: please ignore the logic of the code; it was written only as a demo.

Using integers:

    #include <stdio.h>
    #include <pthread.h>

    pthread_t pth1,pth2,pth3,pth4;

    void *threadfunc1(void *parm)
    {
        int i,j,k,l;
        j = 0;
        k = 0;
        l = 5;
        for (i = 0; i < 5000000; i ++) {
            j = k + 152;
            k = j + 21;
            l = j + k + (j * 5) + (k * 2) + (l * 3);
            j = k + ((l + j)/ k) + j + k + (l / k);
            j = 0;
            k = 0;
            l = 5;
        }
        printf("Completed Thread 1\n");
        return NULL ;
    }

    void *threadfunc2(void *parm)
    {
        int i,j,k,l;
        j = 0;
        k = 0;
        l = 5;
        for (i = 0; i < 5000000; i ++) {
            j = k + 152;
            k = j + 21;
            l = j + k + (j * 5) + (k * 2) + (l * 3);
            j = k + ((l + j)/ k) + j + k + (l / k);
            j = 0;
            k = 0;
            l = 5;
        }
        printf("Completed Thread 2\n");
        return NULL ;
    }

    int main ()
    {
        pthread_create(&pth1, NULL, threadfunc1, "foo");
        pthread_create(&pth2, NULL, threadfunc2, "foo");
        pthread_join( pth1, NULL);
        pthread_join( pth2, NULL);
        return 1;
    }

Using double:

    #include <stdio.h>
    #include <pthread.h>

    pthread_t pth1,pth2,pth3,pth4;

    void *threadfunc1(void *parm)
    {
        double i,j,k,l;
        j = 0;
        k = 0;
        l = 5;
        for (i = 0; i < 5000000; i ++) {
            j = k + 152;
            k = j + 21;
            l = j + k + (j * 5) + (k * 2) + (l * 3);
            j = k + ((l + j)/ k) + j + k + (l / k);
            j = 0;
            k = 0;
            l = 5;
        }
        printf("Completed Thread 1\n");
        return NULL ;
    }

    void *threadfunc2(void *parm)
    {
        double i,j,k,l;
        j = 0;
        k = 0;
        l = 5;
        for (i = 0; i < 5000000; i ++) {
            j = k + 152;
            k = j + 21;
            l = j + k + (j * 5) + (k * 2) + (l * 3);
            j = k + ((l + j)/ k) + j + k + (l / k);
            j = 0;
            k = 0;
            l = 5;
        }
        printf("Completed Thread 2\n");
        return NULL ;
    }

    int main ()
    {
        pthread_create(&pth1, NULL, threadfunc1, "foo");
        pthread_create(&pth2, NULL, threadfunc2, "foo");
        pthread_join( pth1, NULL);
        pthread_join( pth2, NULL);
        return 1;
    }
  • 4
    This has nothing to do with threads. Floating-point operations are simply much slower than integral operations. Commented Dec 2, 2013 at 6:35
  • ok. I was wondering it is in the case of threads only. Let me get the benchmark for simple code without thread. thanks @JonathonReinhart Commented Dec 2, 2013 at 6:36
  • 1
    stackoverflow.com/questions/2550281/… Commented Dec 2, 2013 at 6:37
  • yes got that @JonathonReinhart and @ jeyaram found the difference in benchmark with simple code also. thanks Commented Dec 2, 2013 at 6:40
  • 1
    Unless l + j is a[n integer] multiple of k, the expression (l + j)/ k has completely different meaning when the types of l, j, and k are floating point types as opposed to integer types. Commented Dec 2, 2013 at 6:47
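
The last comment can be illustrated with a small sketch. The helper names quotient_int and quotient_double are made up for this illustration and do not appear in the question; the sample values are the ones j, k and l hold inside the question's loop just before the division.

```c
#include <assert.h>

/* Hypothetical helpers, not part of the question's code. */

/* Integer version: '/' discards the fractional part (truncates toward zero). */
int quotient_int(int l, int j, int k)
{
    assert(k != 0);   /* integer division by zero is undefined behavior */
    return (l + j) / k;
}

/* Floating-point version: '/' keeps the fractional part. */
double quotient_double(double l, double j, double k)
{
    return (l + j) / k;
}
```

With l = 1446, j = 152 and k = 173, the integer helper returns 9, while the double helper returns roughly 9.237. So the two programs in the question are not computing the same values, apart from any difference in speed.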

2 Answers

2

The difference comes from the use of floating point. For example, take a look at the following simple program:

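    #include <stdlib.h>
    #include <stdio.h>

    /* TYPE must be defined as int or double, e.g. with -DTYPE=int */
    int main(int argc, char *argv[])
    {
        TYPE i, s = 0;

        for (i = 0; i < 100; i++) {
            s += i;
        }
        /* cast so the format specifier is valid for either TYPE */
        printf("Sum=%.0f\n", (double)s);
        return 0;
    }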

Compile it with gcc -o main main.c, once with TYPE defined as an integer type ("fixed", left) and once as double (right), for example by adding -DTYPE=int or -DTYPE=double, and look at the disassembly of main(): [image: fixed vs float, no optimization] The arrows mark the for loop in main. The target is an x86 processor.

With gcc -O3 -o main main.c, fixed point still wins: [image]

Thus fixed point is preferable for high-speed computation if the algorithm allows it. The situation remains almost the same if double is replaced with float.

Moreover, some processors have no floating-point hardware at all and rely on optimized emulation libraries (for instance, the TI C64x+ family). In that case the performance difference between fixed and floating point will be roughly 10x.
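
For readers unfamiliar with fixed point beyond plain integers, here is a minimal sketch of one common convention, Q16.16, where a 32-bit integer carries 16 fractional bits. The type name q16_16 and the helper names are assumptions made for this illustration; they do not come from any particular library, and a real DSP toolchain may provide its own types and intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q16.16 type: value = raw / 65536. */
typedef int32_t q16_16;
#define Q_FRAC_BITS 16
#define Q_ONE ((q16_16)1 << Q_FRAC_BITS)

/* Convert a small integer to Q16.16; the range check avoids overflow. */
q16_16 q_from_int(int32_t x)
{
    assert(x >= -32768 && x <= 32767);
    return (q16_16)(x * Q_ONE);
}

/* Multiply two Q16.16 values using only integer arithmetic.
   The 64-bit intermediate keeps the full product before rescaling;
   the caller must make sure the result still fits in 32 bits. */
q16_16 q_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) / Q_ONE);
}

/* Convert back to double, only for inspecting results. */
double q_to_double(q16_16 a)
{
    return (double)a / Q_ONE;
}
```

For example, q_mul(Q_ONE / 2, q_from_int(3)) represents 0.5 * 3 and yields the raw value for 1.5. The arithmetic itself never touches the floating-point unit, which is the point of the technique on hardware without one.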


6 Comments

Why do you have to use legacy target?
@AkiSuihkonen I meant that I used an x86-compatible processor, not exactly an 8086 or a similar processor. Do you think it would be better to correct it?
The stack based FP processor is probably slower (having 80 bit internal precision) than its xmm based counterpart.
@AkiSuihkonen Undoubtedly, SSE can be advantageous, but can the compiler add it to the program code by itself? I think intrinsics have to be used for that. In addition, it is not very convenient to use them in loops like the one in the question.
My gcc 4.6.3 on x64 produces SSE instructions by default.
0

Floating-point arithmetic operations take more CPU cycles than integer operations, because the hardware is much (much, much) more complex.

This has nothing to do with threads.

Also, most processors have more parallel execution resources for integers than for floating point, because integer operations are generally used more often.
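
To check the "nothing to do with threads" point on your own machine, a single-threaded timing sketch along these lines may help. It reuses the arithmetic from the question, but the function names (int_work, double_work, run_benchmark) and the volatile seed are assumptions added here; the volatile read is only there to discourage the compiler from folding the loop into a constant. The timings depend on compiler, flags and CPU, so no particular ratio is implied.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Integer variant of the loop body from the question. */
int int_work(int n)
{
    volatile int seed = 0;   /* re-read every iteration on purpose */
    int i, j = 0, k, l;
    for (i = 0; i < n; i++) {
        k = seed;
        l = 5;
        j = k + 152;
        k = j + 21;
        l = j + k + (j * 5) + (k * 2) + (l * 3);
        j = k + ((l + j) / k) + j + k + (l / k);
    }
    return j;
}

/* Same arithmetic with double operands. */
double double_work(int n)
{
    volatile double seed = 0;
    int i;
    double j = 0, k, l;
    for (i = 0; i < n; i++) {
        k = seed;
        l = 5;
        j = k + 152;
        k = j + 21;
        l = j + k + (j * 5) + (k * 2) + (l * 3);
        j = k + ((l + j) / k) + j + k + (l / k);
    }
    return j;
}

/* Time both variants with clock() and print CPU seconds for each. */
void run_benchmark(int n)
{
    assert(n > 0);   /* zero iterations would measure nothing */
    clock_t t0 = clock();
    int ri = int_work(n);
    clock_t t1 = clock();
    double rd = double_work(n);
    clock_t t2 = clock();
    printf("int:    result %d, %.3f s\n", ri, (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("double: result %f, %.3f s\n", rd, (double)(t2 - t1) / CLOCKS_PER_SEC);
}
```

Calling run_benchmark(5000000) from main matches the iteration count in the question. If the gap shows up here as well, without any pthread calls, threading is not the cause.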
