
I have a C app (Visual Studio 2010) running on a 64-bit Windows 7 machine with dual Xeon chips, meaning 12 physical and 24 logical cores, and 192 GB of RAM.

The app has 24 threads (each with its own logical core) doing calculations and filling a different part of a massive C structure. When all the threads are finished (and they are perfectly balanced, so they complete at the same time), the structure is about 60 GB.

(I have control over the hardware setup, so I will be using six 2 TB drives in RAID 0, which means the physical write limit will be approximately 6x one drive's average sequential write speed, or about 2 GB/second.)

What is the most efficient way to get this to disk? Obviously, the I/O time will dwarf the compute time. From my research on this topic, it seems like write() (as opposed to fwrite()) is the way to go. But what other optimizations can I do on the software side, in terms of setting buffer sizes, etc.? Would mmap be more efficient?

  • Please add a tag for the language you want to write in; that helps others find this question easily. Commented Dec 9, 2011 at 18:34
  • How long does the computation take? Commented Dec 9, 2011 at 18:45
  • I see a mmap tag. Is that available for your system? Commented Dec 9, 2011 at 18:47
  • Just write it. It will be quickly copied to the file system cache with a memory-to-memory copy. From there it will be written to disk, long after your program has exited. You've got plenty of RAM. Commented Dec 9, 2011 at 18:50
  • My mistake about mmap; I didn't realize it is not available when using Visual C (and I'm using C, not C++). The computation takes about 0.5 seconds. Commented Dec 9, 2011 at 19:20

2 Answers


mmap(), or Boost's memory-mapped files, is almost always the best approach. The OS is smarter than you; let it worry about what to cache!

You didn't say which OS, but on Linux, madvise() (or the equivalent Boost hints) can really boost performance.


1 Comment

+1. Always, always let somebody else sweat as many details as possible!

It is hard to judge what is best for your situation.

The first optimization to make is to preallocate the file. That way your file system does not need to keep extending its size, which saves some disk operations. However, avoid writing actual zeros to disk; just set the length.

Then you have a choice between mmap and write. This also depends on the operating system you use. On Unix I would try both mmap and pwrite. pwrite is useful because each of your threads can write at its desired file offset without fighting over a shared file position.

mmap could be good because, instead of copying into the file cache, your threads would write directly into it. 60 GB is probably too large to mmap in one piece, so each thread will likely need its own mmap window onto the file, which it can move around.

On Windows you would probably want to try overlapped (asynchronous) I/O. That can only be done with Win32 API calls.

2 Comments

Windows has the equivalent of mmap (CreateFileMapping, MapViewOfFile), and it's likely to be good for the same reasons Zan listed.
And for the same reasons (it's what the OS itself uses), mapped files perform well on Windows too. Plus, Windows can map a file on a network drive. Unix didn't used to be able to mmap over NFS; has that changed?
