7
$\begingroup$

I'm doing something where the most sensible approach seems to be to open a file and write to it as I process data from some large tables. Old school. While testing the program, I remembered the low-level write statements I used to use for this kind of thing in FORTRAN and how fast they seemed to be, and I wondered how WriteString compares, so I did a little test. It took 10.875 seconds to write 1,000,000 records, each consisting of the record number and "Hello World". That seems pretty fast, but how does it compare to other languages? I use only Mathematica these days, so I can't easily do any comparisons.

funit = OpenWrite[NotebookDirectory[] <> "writetest.csv"];
Timing[Do[WriteString[funit, i, ",Hello World\n"];, {i, 1, 1000000}]]
(* {10.875000, Null} *)
$\endgroup$
5
  • $\begingroup$ Can you please post plain code and not an image. $\endgroup$ Commented Feb 7, 2014 at 6:08
  • $\begingroup$ You can probably do a little bit better with Scan[WriteString[funit, #, ",Hello World\n"] &, Range[1000000]] $\endgroup$ Commented Feb 7, 2014 at 6:25
  • $\begingroup$ Unless you need to write as it happens, it is probably faster to build a list of outputs and write it out in one go. $\endgroup$ Commented Feb 7, 2014 at 7:03
  • $\begingroup$ Mathematica has much worse performance here not because of write commands, but the loop (and perhaps type conversion). I have tried Timing[(StringJoin@ConstantArray["Hello World\n", 1000000]) >> "/tmp/writetest2.csv"], which takes less than a second. But I haven't found a fast way to output the line number at the same time. $\endgroup$ Commented Feb 7, 2014 at 9:59
  • $\begingroup$ I'd like to see more of these types of questions :) thanks @George $\endgroup$ Commented Mar 24, 2016 at 20:52

2 Answers 2

8
$\begingroup$

Update: Here is a summary of all results in a small table, to make them easy to compare. Thanks to george2079 for adding the C++ and Python results (maybe I'll do Java later). Results are in seconds; lower is better. Note that Fortran was run on a virtual machine (VBox).


Grid[{
  {"Mathematica", "Matlab (elapsed)", Column[{"Fortran", "(Virtual machine)"}, Alignment -> Center], "C++", "Python"},
  {Grid[{{"Timing", "AbsoluteTiming", "Command Line"}, {6, 8.9, 7.3}}],
   9.2,
   Grid[{{"elapsed", "CPU_TIME"}, {0.5, 0.25}}],
   0.06, 0.44}
 }, Frame -> All]

Original answer

I am no expert in any of these, so there might be better ways to do this in Matlab and Fortran, but this is what I get. Everything was run on the same PC under Windows 7. The Linux system is a 32-bit Linux Mint VM installed on top of Windows.

Mathematica 9.01, 64 bit windows 7

funit = OpenWrite[NotebookDirectory[] <> "writetest.csv"];
Timing[Do[WriteString[funit, i, ",Hello World\n"];, {i, 1, 1000000}]]
Close[funit]
(* {5.912438, Null} *)
(* {5.928038, Null} *)
(* {6.006038, Null} *)

Version using AbsoluteTiming based on comment below

funit = OpenWrite[NotebookDirectory[] <> "writetest.csv"];
AbsoluteTiming[Do[WriteString[funit, i, ",Hello World\n"];, {i, 1, 1000000}]]
Close[funit]
(* {9.009644, Null} *)
(* {8.890629, Null} *)
(* {8.866126, Null} *)

Matlab 2013a, 32bit, on windows 7 64 bit

% w.m file
if (~isdeployed)
    cd(fileparts(which(mfilename)));
end
fid = fopen('writetest.csv', 'W'); % notice, W and not w, faster
tic;
for i = 1:1000000
    fprintf(fid, '%s\n', 'Hello World');
end
toc
fclose(fid);

result

EDU>> w
Elapsed time is 9.321961 seconds.
EDU>> w
Elapsed time is 9.265512 seconds.
EDU>> w
Elapsed time is 9.297699 seconds.


gfortran version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu9)

>gfortran -v
Target: i686-linux-gnu
gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu9)

program w
  implicit none
  integer, parameter :: out = 10
  integer :: i
  real :: start, finish
  call cpu_time(start)
  open(out, file='writetest.csv', status='replace', action='write')
  DO i = 1,1000000
    write(out,*) ',Hello World'
  END DO
  call cpu_time(finish)
  print '("Time = ",f6.3," seconds.")',finish-start
end program w

>gfortran -Wextra -Wall -pedantic -fcheck=all -fwhole-file w.f90
>./a.out
Time = 0.272 seconds.
>./a.out
Time = 0.260 seconds.
>./a.out
Time = 0.244 seconds.


I am using CPU_TIME to measure the Fortran CPU time. Its documentation says:

Returns a REAL value representing the elapsed CPU time in seconds. This is useful for testing segments of code to determine execution time.

Based on a comment below, I redid the Fortran timing. Within Fortran I am only familiar with CPU_TIME, but Linux itself has the command /usr/bin/time, so the run below measures the whole program from the shell.

program w
  integer :: i
  integer, parameter :: out = 10
  open(out, file='writetest.csv', status='replace', action='write')
  DO i = 1,1000000
    write(out,*) ',Hello World'
  END DO
end program w

result

>gfortran -Wextra -Wall -pedantic -fcheck=all -fwhole-file w.f90
>time ./a.out
real 0m0.523s   % this is the total ELAPSED wall clock time
user 0m0.024s
sys  0m0.240s
>time ./a.out
real 0m0.486s
user 0m0.048s
sys  0m0.200s
>time ./a.out
real 0m0.502s
user 0m0.048s
sys  0m0.196s

So the whole Fortran program took about 0.5 seconds of wall-clock time. That is roughly twice the CPU_TIME figure from earlier, while user plus sys comes to about 0.25 seconds, which matches it.
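The same CPU-versus-elapsed distinction can be seen in any language. As a minimal sketch in Python (not part of the benchmark above; the function name `time_write_loop` and the smaller record count are illustrative assumptions), both clocks can be read around the same write loop:

```python
import os
import tempfile
import time


def time_write_loop(path, n_records=100_000):
    """Write n_records lines to path; return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()  # elapsed (wall-clock) time
    cpu_start = time.process_time()   # user + system CPU time of this process
    with open(path, "w") as f:
        for _ in range(n_records):
            f.write(",Hello World\n")
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        wall, cpu = time_write_loop(os.path.join(tmp, "writetest.csv"))
    print(f"wall = {wall:.3f} s, cpu = {cpu:.3f} s")
```

Time spent waiting on the disk shows up in the wall-clock figure but not in the CPU figure, so the two can differ noticeably for I/O-heavy loops.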

Mathematica Timing is

evaluates expr, and returns a list of the time in seconds used, together with the result obtained

Mathematica AbsoluteTiming

evaluates expr, returning a list of the absolute number of seconds in real time that have elapsed, together with the result obtained

and Matlab's tic/toc.

tic starts a stopwatch timer to measure performance. The function records the internal time at execution of the tic command. Display the elapsed time with the toc function.

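Taking the Fortran CPU_TIME figure (about 0.25 seconds) as the baseline, Fortran is about 24 times faster than Mathematica (Timing, about 5.9 seconds) and about 38 times faster than Matlab (tic/toc, about 9.3 seconds). Note that this compares CPU time with Matlab's elapsed time. Using wall-clock time on both sides (about 0.5 seconds for Fortran), the factor is closer to 18 or 19 for both.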
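The ratios can be reproduced with a few lines of arithmetic. This is only a sketch using rounded values from the runs above; the dictionary keys and the helper name `speedup` are illustrative, not part of any benchmark code:

```python
# Times in seconds, rounded from the runs reported in this answer.
times = {
    "mathematica_cpu": 5.9,    # Timing
    "mathematica_wall": 8.9,   # AbsoluteTiming
    "matlab_wall": 9.3,        # tic/toc
    "fortran_cpu": 0.25,       # CPU_TIME
    "fortran_wall": 0.5,       # "real" reported by the shell
}


def speedup(slow, fast):
    """How many times longer `slow` took than `fast`."""
    return times[slow] / times[fast]


# Ratios against the Fortran CPU time.
print(f"Mathematica vs Fortran (CPU):  {speedup('mathematica_cpu', 'fortran_cpu'):.1f}x")
print(f"Matlab vs Fortran (CPU):       {speedup('matlab_wall', 'fortran_cpu'):.1f}x")

# Like-for-like ratios using wall-clock time on both sides.
print(f"Mathematica vs Fortran (wall): {speedup('mathematica_wall', 'fortran_wall'):.1f}x")
print(f"Matlab vs Fortran (wall):      {speedup('matlab_wall', 'fortran_wall'):.1f}x")
```

With the rounded 0.25 second baseline the Matlab ratio comes out nearer 37 than 38; it reaches 38 only if the fastest Fortran run (0.244 seconds) is used, which shows how sensitive these factors are to the choice of run.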

Will try C++ later if I can or someone else can try.

$\endgroup$
6
  • 1
    $\begingroup$ I think absolute timing would be more appropriate here. I don't think I/O is counted under CPU time. $\endgroup$ Commented Feb 7, 2014 at 11:40
  • $\begingroup$ @Ajasja added version using AbsoluteTiming $\endgroup$ Commented Feb 7, 2014 at 12:23
  • $\begingroup$ use time() for the fortran test $\endgroup$ Commented Feb 7, 2014 at 12:40
  • $\begingroup$ @george2079 added timing using /usr/bin/time $\endgroup$ Commented Feb 7, 2014 at 14:10
  • 1
    $\begingroup$ How can one trust these comparisons when they're done on different systems? Obviously hard drive speed can dramatically affect these timings. $\endgroup$ Commented Feb 8, 2014 at 0:10
3
$\begingroup$

C++:

 #include <fstream>

 // main needs an explicit int return type in standard C++
 int main() {
     std::ofstream f("test.csv");
     for (int i = 0; i < 1000000; ++i) f << ",Hello World\n";
 }

/usr/bin/time: 0.03 user, 0.02 sys, 0.06 elapsed

python:

 f = open('test.csv', 'w')
 for i in range(1000000):
     f.write(',Hello World\n')

0.39 user, 0.051 sys, 0.44 elapsed
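The loop above leaves out the record number that the question's test writes. A minimal sketch, assuming the question's record format of a number followed by the text, that times both a per-record write and the build-everything-then-write-once approach suggested in the comments (the function names are illustrative and no timings are claimed here):

```python
import os
import tempfile
import time


def write_per_record(path, n):
    """Write one record per call, each prefixed with its record number."""
    with open(path, "w") as f:
        for i in range(1, n + 1):
            f.write(f"{i},Hello World\n")


def write_batched(path, n):
    """Build all records in memory, then write them with a single call."""
    with open(path, "w") as f:
        f.write("".join(f"{i},Hello World\n" for i in range(1, n + 1)))


def timed(fn, *args):
    """Return the wall-clock seconds taken by fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1_000_000
    with tempfile.TemporaryDirectory() as tmp:
        for fn in (write_per_record, write_batched):
            path = os.path.join(tmp, fn.__name__ + ".csv")
            print(f"{fn.__name__}: {timed(fn, path, n):.3f} s")
```

Python file objects are buffered by default, so the gap between the two variants may be small; how much batching helps in other environments depends on how much overhead each individual write call carries.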

Mathematica as a command line kernel script:

 f = OpenWrite["test.csv"];
 Do[WriteString[f, ",Hello World"], {1000000}];

/usr/bin/time math -script test.m

4.08 user, 2.3 sys, 7.3 elapsed

(About 1 second elapsed without the Do loop, just startup and opening the file.) Incidentally, if I put Timing[] around the loop it reports 5.68 s, so that seems consistent.

$\endgroup$
1
  • $\begingroup$ For C++, the I/O performance will depend much more on the libraries used and the particular implementation than on the language. I remember that on Windows XP I used to get slightly different results with cstdio and iostream using MinGW, significantly different results using MinGW and Visual Studio, and different results again on Linux on the computer. $\endgroup$ Commented Feb 7, 2014 at 18:44
