I get a 512^3 array representing a Temperature distribution from a simulation (written in Fortran). The array is stored in a binary file that's about 1/2G in size. I need to know the minimum, maximum and mean of this array and as I will soon need to understand Fortran code anyway, I decided to give it a go and came up with the following very easy routine.
integer gridsize,unit,j real mini,maxi double precision mean gridsize=512 unit=40 open(unit=unit,file='T.out',status='old',access='stream',& form='unformatted',action='read') read(unit=unit) tmp mini=tmp maxi=tmp mean=tmp do j=2,gridsize**3 read(unit=unit) tmp if(tmp>maxi)then maxi=tmp elseif(tmp<mini)then mini=tmp end if mean=mean+tmp end do mean=mean/gridsize**3 close(unit=unit) This takes about 25 seconds per file on the machine I use. That struck me as being rather long and so I went ahead and did the following in Python:
import numpy mmap=numpy.memmap('T.out',dtype='float32',mode='r',offset=4,\ shape=(512,512,512),order='F') mini=numpy.amin(mmap) maxi=numpy.amax(mmap) mean=numpy.mean(mmap) Now, I expected this to be faster of course, but I was really blown away. It takes less than a second under identical conditions. The mean deviates from the one my Fortran routine finds (which I also ran with 128-bit floats, so I somehow trust it more) but only on the 7th significant digit or so.
How can numpy be so fast? I mean you have to look at every entry of an array to find these values, right? Am I doing something very stupid in my Fortran routine for it to take so much longer?
EDIT:
To answer the questions in the comments:
- Yes, also I ran the Fortran routine with 32-bit and 64-bit floats but it had no impact on performance.
- I used
iso_fortran_envwhich provides 128-bit floats. - Using 32-bit floats my mean is off quite a bit though, so precision is really an issue.
- I ran both routines on different files in different order, so the caching should have been fair in the comparison I guess ?
- I actually tried open MP, but to read from the file at different positions at the same time. Having read your comments and answers this sounds really stupid now and it made the routine take a lot longer as well. I might give it a try on the array operations but maybe that won't even be necessary.
- The files are actually 1/2G in size, that was a typo, Thanks.
- I will try the array implementation now.
EDIT 2:
I implemented what @Alexander Vogt and @casey suggested in their answers, and it is as fast as numpy but now I have a precision problem as @Luaan pointed out I might get. Using a 32-bit float array the mean computed by sum is 20% off. Doing
... real,allocatable :: tmp (:,:,:) double precision,allocatable :: tmp2(:,:,:) ... tmp2=tmp mean=sum(tmp2)/size(tmp) ... Solves the issue but increases computing time (not by very much, but noticeably). Is there a better way to get around this issue? I couldn't find a way to read singles from the file directly to doubles. And how does numpy avoid this?
Thanks for all the help so far.
minval(),maxval(), andsum()? Furthermore, you are mixing IO with the operations in Fortran, but not in Python - that is not a fair comparison ;-)