Recently, I read a post on Stack Overflow about finding integers that are perfect squares. As I wanted to play with this, I wrote the following small program:

    PROGRAM PERFECT_SQUARE
      IMPLICIT NONE
      INTEGER*8 :: N, M, NTOT
      LOGICAL :: IS_SQUARE

      N=Z'D0B03602181'
      WRITE(*,*) IS_SQUARE(N)

      NTOT=0
      DO N=1,1000000000
        IF (IS_SQUARE(N)) THEN
          NTOT=NTOT+1
        END IF
      END DO
      WRITE(*,*) NTOT ! should find 31622 squares
    END PROGRAM

    LOGICAL FUNCTION IS_SQUARE(N)
      IMPLICIT NONE
      INTEGER*8 :: N, M

      ! check if negative
      IF (N.LT.0) THEN
        IS_SQUARE=.FALSE.
        RETURN
      END IF

      ! check if ending 4 bits belong to (0,1,4,9)
      M=IAND(N,15)
      IF (.NOT.(M.EQ.0 .OR. M.EQ.1 .OR. M.EQ.4 .OR. M.EQ.9)) THEN
        IS_SQUARE=.FALSE.
        RETURN
      END IF

      ! truncate sqrt(n) to an integer and check whether it squares back to n
      M=DINT(SQRT(DBLE(N)))
      IF (M**2.NE.N) THEN
        IS_SQUARE=.FALSE.
        RETURN
      END IF

      IS_SQUARE=.TRUE.
      RETURN
    END FUNCTION

When compiling with gfortran -O2, the running time is 4.437 seconds; with -O3 it is 2.657 seconds. Then I thought that compiling with ifort -O2 could be faster, since it might have a faster SQRT function, but the running time turned out to be 9.026 seconds, and the same with ifort -O3. I tried to analyze it using Valgrind, and the Intel-compiled program indeed executes many more instructions.
My question is why? Is there a way to find out where exactly the difference comes from?
EDITS:
- gfortran version 4.6.2 and ifort version 12.0.2
- times are obtained from running time ./a.out and are the real/user times (sys was always almost 0)
- this is on Linux x86_64, both gfortran and ifort are 64-bit builds
- ifort inlines everything, gfortran only at -O3, but the latter assembly code is simpler than that of ifort, which uses xmm registers a lot
- fixed line of code, added NTOT=0 before loop, should fix issue with other gfortran versions
When the IF statement that checks the last 4 bits is removed, gfortran takes about 4 times as long (10-11 seconds). This is to be expected, since that check throws out about 75% of the numbers and so avoids computing the SQRT for them. ifort, on the other hand, takes only slightly longer. My guess is that something goes wrong when ifort tries to optimize the IF statement.
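As a quick sanity check on the 75% figure, here is a minimal sketch that lists which residues modulo 16 a perfect square can have. It is not part of the benchmark, and the program and variable names are only illustrative. Since K and K+16 give the same residue, trying K = 0..15 covers every case.

```fortran
PROGRAM RESIDUES_MOD_16
  IMPLICIT NONE
  INTEGER :: K, R, NPASS
  LOGICAL :: SEEN(0:15)

  ! Mark every residue that K**2 can take modulo 16.
  SEEN = .FALSE.
  DO K = 0, 15
    SEEN(MOD(K*K, 16)) = .TRUE.
  END DO

  ! Count and print the residues that a square can actually have.
  NPASS = 0
  DO R = 0, 15
    IF (SEEN(R)) THEN
      NPASS = NPASS + 1
      WRITE(*,*) 'possible residue: ', R
    END IF
  END DO
  WRITE(*,*) 'residues passing the filter: ', NPASS, ' of 16'
END PROGRAM RESIDUES_MOD_16
```

Only the residues 0, 1, 4 and 9 are marked, i.e. 4 of 16, so for uniformly spread inputs the bit test rejects 12/16 = 75% of the candidates before SQRT is ever called.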
EDIT2:
I tried ifort version 12.1.2.273 and it is much faster, so it looks like they fixed that.
time <program> for each one? And were these 32-bit builds or 64-bit builds?
valgrind --tool=callgrind --dump-instr=yes also gives the assembly code, but that's really complex (many differences) and depends on the level of optimization.