Why is a thread doing int calculations faster than a thread doing double calculations? [duplicate]

Question

I prepared two sample programs to show that a thread doing calculations on int variables runs faster than a thread doing the same calculations on double variables.

The only difference between the two programs is that the first uses only integers and the second uses only doubles.

The time difference between them is almost 30%.

The reason might be very simple or basic, but can anyone explain the possible cause(s)?

Note: please ignore the logic of the code; it was prepared only as a demo.

Using integer:

    #include <stdio.h>
    #include <pthread.h>

    pthread_t pth1,pth2,pth3,pth4;

    void *threadfunc1(void *parm)
    {
        int i,j,k,l;
        j = 0;
        k = 0;
        l = 5;
        for (i = 0; i < 5000000; i ++) {
            j = k + 152;
            k = j + 21;
            l = j + k + (j * 5) + (k * 2) + (l * 3);
            j = k + ((l + j)/ k) + j + k + (l / k);
            j = 0;
            k = 0;
            l = 5;
        }
        printf("Completed Thread 1\n");
        return NULL ;
    }

    void *threadfunc2(void *parm)
    {
        int i,j,k,l;
        j = 0;
        k = 0;
        l = 5;
        for (i = 0; i < 5000000; i ++) {
            j = k + 152;
            k = j + 21;
            l = j + k + (j * 5) + (k * 2) + (l * 3);
            j = k + ((l + j)/ k) + j + k + (l / k);
            j = 0;
            k = 0;
            l = 5;
        }
        printf("Completed Thread 2\n");
        return NULL ;
    }

    int main ()
    {
        pthread_create(&pth1, NULL, threadfunc1, "foo");
        pthread_create(&pth2, NULL, threadfunc2, "foo");
        pthread_join( pth1, NULL);
        pthread_join( pth2, NULL);
        return 1;
    }

Using double:

    #include <stdio.h>
    #include <pthread.h>

    pthread_t pth1,pth2,pth3,pth4;

    void *threadfunc1(void *parm)
    {
        double i,j,k,l;
        j = 0;
        k = 0;
        l = 5;
        for (i = 0; i < 5000000; i ++) {
            j = k + 152;
            k = j + 21;
            l = j + k + (j * 5) + (k * 2) + (l * 3);
            j = k + ((l + j)/ k) + j + k + (l / k);
            j = 0;
            k = 0;
            l = 5;
        }
        printf("Completed Thread 1\n");
        return NULL ;
    }

    void *threadfunc2(void *parm)
    {
        double i,j,k,l;
        j = 0;
        k = 0;
        l = 5;
        for (i = 0; i < 5000000; i ++) {
            j = k + 152;
            k = j + 21;
            l = j + k + (j * 5) + (k * 2) + (l * 3);
            j = k + ((l + j)/ k) + j + k + (l / k);
            j = 0;
            k = 0;
            l = 5;
        }
        printf("Completed Thread 2\n");
        return NULL ;
    }

    int main ()
    {
        pthread_create(&pth1, NULL, threadfunc1, "foo");
        pthread_create(&pth2, NULL, threadfunc2, "foo");
        pthread_join( pth1, NULL);
        pthread_join( pth2, NULL);
        return 1;
    }

This has nothing to do with threads. Floating-point operations are simply much slower than integral operations.
– Jonathon Reinhart, Commented Dec 2, 2013 at 6:35
OK. I was wondering whether this happens only with threads. Let me benchmark simple code without threads. Thanks @JonathonReinhart
– Vishwadeep Singh, Commented Dec 2, 2013 at 6:36
Yes, got it, @JonathonReinhart and @jeyaram. I found the same difference in a benchmark with simple code as well. Thanks.
– Vishwadeep Singh, Commented Dec 2, 2013 at 6:40
Unless l + j is a[n integer] multiple of k, the expression (l + j)/ k has a completely different meaning when the types of l, j, and k are floating-point types as opposed to integer types.
– R.. GitHub STOP HELPING ICE, Commented Dec 2, 2013 at 6:47

Michael · Accepted Answer · 2013-12-02 08:12:26Z

The difference comes from the use of floating point. For example, consider the following simple program:

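    #include <stdlib.h>
    #include <stdio.h>

    /* TYPE selects the type under test (e.g. gcc -DTYPE=double); defaults to int. */
    #ifndef TYPE
    #define TYPE int
    #endif

    int main(int argc, char *argv[])
    {
        TYPE i, s = 0;

        for (i = 0; i < 100; i++) {
            s += i;
        }
        /* Cast so that %d is valid whether TYPE is int or double. */
        printf("Sum=%d\n", (int)s);
        return 0;
    }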

Compile it with gcc -o main main.c and look at the disassembly of its main() function with TYPE defined as an integer type ("fixed", left) and as double (right):

[Image: fixed vs float, no optimization]

The arrows mark the for(){} loop in main. The target is an x86 processor.

With gcc -O3 -o main main.c, fixed point still wins:

[Image: disassembly at -O3]

Thus fixed point is preferable for high-speed computation when the algorithm allows it. The situation remains almost the same if double is replaced with float.

Moreover, some processors have no floating-point hardware at all and rely on specially optimized emulation libraries (for instance, the TI C64x+ family). In that case the performance difference between fixed and floating point is about 10x.

@AkiSuihkonen I meant that I used an x86-compatible processor, not specifically an 8086 or a similar processor. Do you think it would be better to correct it?
The stack-based FP processor (with its 80-bit internal precision) is probably slower than its xmm-based counterpart.
@AkiSuihkonen Undoubtedly SSE can be advantageous, but can the compiler itself add it to the program code? I think intrinsics have to be used for that. In addition, it is not very convenient to use in loops like the one in the question.

egur · Accepted Answer · 2013-12-02 07:26:44Z

Floating-point arithmetic operations take more CPU cycles than integer operations; the hardware is much (much, much) more complex.

This has nothing to do with threads.

Also, most processors have more parallel execution resources for integers than for floating point, because integer operations are generally used more often.
