I have a very large disk drive (2TB), but not very much RAM (8GB). I'd like to be able to run some big data experiments on a large file (~200GB) that exists on my disk's filesystem. I understand that will be very expensive in terms of disk bandwidth, but I don't mind the high I/O usage.

How could I load this huge file into a C++ array, so that I can perform reads and writes to the file at locations of my choosing? Does mmap work for this purpose? What parameter options should I be using? I don't want to trigger the OOM killer at any point while my program runs.

I know that mmap supports file-backed and anonymous mappings, but I'm not entirely sure which to use. And should I use a private or a shared mapping?

1 Answer

It only makes sense to use a file-backed mapping to mmap a file; an anonymous mapping is, by definition, not backed by any file. If you want writes to the mapped memory to be carried back to the file, you need a shared mapping (with a private mapping, modified pages are copy-on-write and your changes never reach the file). With a file-backed, shared mapping you don't need to worry about the OOM killer: the pages are backed by the file itself, so under memory pressure the kernel can write dirty pages back and evict clean ones rather than needing swap. As long as your process is 64-bit, there's no problem with simply mapping the entire file into memory. (And even if you weren't 64-bit, the problem would be lack of address space, not lack of RAM, so the OOM killer still wouldn't affect you; the mmap call would just fail.)
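
A minimal sketch of what that looks like on a 64-bit Linux/POSIX system (the file name and the sample accesses are just placeholders, not part of your setup):

```cpp
// Map an existing large file read/write as a shared, file-backed mapping,
// then treat it like a big in-memory byte array.
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstdint>

int main() {
    const char* path = "huge.dat";           // placeholder: your ~200GB file
    int fd = open(path, O_RDWR);
    if (fd == -1) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) == -1) { perror("fstat"); return 1; }
    size_t length = static_cast<size_t>(st.st_size);

    // MAP_SHARED: writes to the mapping are carried through to the file.
    // PROT_READ | PROT_WRITE: allow both reads and writes in place.
    void* addr = mmap(nullptr, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    auto* data = static_cast<uint8_t*>(addr);
    data[0] ^= 0xFF;                          // example in-place write
    uint8_t last = data[length - 1];          // example read near the end
    (void)last;

    // Optionally flush dirty pages to disk before unmapping.
    msync(addr, length, MS_SYNC);
    munmap(addr, length);
    close(fd);
    return 0;
}
```

The kernel pages data in lazily as you touch it and writes dirty pages back on its own schedule (or when you call msync), so physical RAM only ever holds the working set, not the whole 200GB.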
