I am starting a small project for a key-value store, in C++. I am wondering how C++ std streams compare to mmap in terms of scalability and performance. How does using ifstream::seekg on a file that wouldn't fit in RAM compare to using mmap/lseek?

  • Why don't you write a small test and see? In any case, there are a lot of variables: portability, distribution, the actual problem to solve, and so on. Commented Nov 29, 2015 at 15:37
  • What kind of data? What size? What computer? Commented Nov 29, 2015 at 15:46
  • This is basically covered in stackoverflow.com/questions/5588605/mmap-vs-read, although iostreams add overhead on top of read. Commented Nov 29, 2015 at 15:58

1 Answer

Ultimately, every Linux user-land application goes through system calls (see syscalls(2)), and the C++ I/O library is no exception.

With great care, mmap and madvise (or lseek + read and posix_fadvise) can be more efficient than C++ streams, which themselves use read and other syscalls(2). But misusing syscalls (e.g. calling read with too small a buffer) can give catastrophic performance.
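To make the two approaches concrete, here is a minimal sketch (Linux/POSIX only) that fetches the same byte range once with ifstream::seekg and once with mmap plus madvise. The helper names read_with_stream and read_with_mmap are made up for this illustration, and MADV_RANDOM is only a guess at a key-value store's access pattern. It shows the mechanics and says nothing about which one is faster on your data.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: read up to `len` bytes at `offset` with C++ streams.
std::string read_with_stream(const std::string& path, std::size_t offset,
                             std::size_t len) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    in.seekg(static_cast<std::streamoff>(offset));
    std::string out(len, '\0');
    in.read(&out[0], static_cast<std::streamsize>(len));
    // gcount() is the number of bytes actually read (may be short at EOF).
    out.resize(static_cast<std::size_t>(in.gcount()));
    return out;
}

// Hypothetical helper: read the same range through a read-only mapping.
std::string read_with_mmap(const std::string& path, std::size_t offset,
                           std::size_t len) {
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return {};
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0 ||
        offset >= static_cast<std::size_t>(st.st_size)) {
        close(fd);
        return {};
    }
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    void* base = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (base == MAP_FAILED) return {};
    // Assumption: lookups are random, so ask the kernel not to read ahead.
    madvise(base, size, MADV_RANDOM);
    const std::size_t n = std::min(len, size - offset);
    assert(offset + n <= size);
    std::string out(static_cast<const char*>(base) + offset, n);
    munmap(base, size);
    return out;
}
```

A real store would keep the mapping (or the stream) open across lookups rather than reopening the file on every call, and on a 32-bit system a file larger than the address space would have to be mapped in windows.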

Also, Linux has a very good page cache, which holds parts of recently accessed file data. Performance also depends on the file system, on the hardware (SSDs and mechanical hard disks are different beasts), and on the rest of the computer.

Maybe you should not reinvent the wheel, and should instead use sqlite, or gdbm, or redis, or mongodb, or postgresql, or memcached, etc.

Performance and trade-offs depend strongly on the actual use (a single 4 GB log file on your laptop is not the same as petabytes of video or genomics data in a datacenter). So benchmark, and note that many tools like the ones I mentioned can be tuned.
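As a starting point for such a benchmark, here is a rough sketch that scans a file with read(2) using a caller-chosen buffer size, and reports the bytes read, the number of read calls, and the wall-clock time. ReadStats and scan_file are names invented for this example, and the posix_fadvise hint assumes sequential access. Keep in mind that timings on a file already sitting in the page cache will look very different from a cold run.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct ReadStats {
    std::size_t bytes = 0;  // total bytes read
    std::size_t calls = 0;  // number of read(2) calls issued
    double seconds = 0.0;   // wall-clock time for the scan
};

// Scan the whole file sequentially with a buffer of `buf_size` bytes.
ReadStats scan_file(const std::string& path, std::size_t buf_size) {
    ReadStats s;
    if (buf_size == 0) return s;
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return s;
    // Advisory only: tell the kernel we intend to read sequentially.
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    std::vector<char> buf(buf_size);
    const auto t0 = std::chrono::steady_clock::now();
    for (;;) {
        const ssize_t n = read(fd, buf.data(), buf.size());
        ++s.calls;
        if (n <= 0) break;  // 0 is end of file, negative is an error
        s.bytes += static_cast<std::size_t>(n);
    }
    const auto t1 = std::chrono::steady_clock::now();
    s.seconds = std::chrono::duration<double>(t1 - t0).count();
    close(fd);
    return s;
}
```

Calling scan_file on the same file with a buffer size of 1 and then of 65536 makes the cost of tiny reads visible: the byte totals match, but the call count (and usually the time) differs by orders of magnitude.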
