I have a C app (Visual Studio 2010, Windows 7 64-bit) running on a machine with dual Xeon chips, meaning 12 physical and 24 logical cores, and 192 GB of RAM. EDIT: The OS is Windows 7, 64-bit.
The app has 24 threads (each on its own logical core) doing calculations and filling a different part of one massive C structure. When all the threads are finished (they are perfectly balanced, so they complete at the same time), the structure is about 60 gigabytes.
(I have control over the hardware setup, so I am going to use six 2 TB drives in RAID 0. That means the physical write limit will be approximately 6x the average sequential write speed of a single drive, or about 2 GB/second.)
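Using only the figures above, and assuming the array actually reaches its ideal RAID 0 throughput (real arrays usually fall somewhat short), the implied per-drive speed and the best-case time to write the finished structure work out as:

```latex
\[
  \frac{2\ \mathrm{GB/s}}{6\ \text{drives}} \approx 333\ \mathrm{MB/s\ per\ drive},
  \qquad
  t_{\min} \approx \frac{60\ \mathrm{GB}}{2\ \mathrm{GB/s}} = 30\ \mathrm{s}
\]
```

So even in the ideal case the dump takes on the order of half a minute, and that is the budget any choice of buffering or mmap is competing against.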
What is the most efficient way to get this to disk? Obviously, the I/O time will dwarf the compute time. From my research on this topic, it seems like write() (as opposed to fwrite()) is the way to go. But what other optimizations can I do on the software side, in terms of setting buffer sizes, etc.? Would mmap be more efficient?
About the mmap tag: is that available on your system?