Java: are there situations where disk is as fast as memory?

Question

I'm writing some code to access an inverted index. I have two interchangeable classes that perform the reads on the index. One reads the index from disk, buffering part of it. The other loads the index completely into memory as a byte[][] (the index size is around 7 GB) and reads from this two-dimensional array. One would expect better performance with the whole index in memory, but my measurements show that working with the index on disk is as fast as having it in memory. (The time spent loading the index into memory isn't counted in the measurements.)

Why is this happening? Any ideas?

Further information: I've run the code with HPROF enabled. In both the "on disk" and "in memory" cases, the most heavily used code is NOT the code directly related to the reads. Also, as far as my (limited) understanding goes, the GC profiler doesn't show any GC-related issue.

UPDATE #1: I've instrumented my code to monitor I/O times. Most of the seeks in memory take 0-2000 ns, while most of the seeks on disk take 1000-3000 ns. The second figure seems too low to me. Is it due to disk caching by Linux? Is there a way to exclude disk caching for benchmarking purposes?
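
The kind of instrumentation described in UPDATE #1 could look roughly like the sketch below. It is not the asker's code: the class name SeekTimer, the 1000 ns bucket width, the 64-byte record size, and the temporary 1 MiB file are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Hypothetical helper: times random seek+read calls and buckets them by latency.
class SeekTimer {
    static final int BUCKET_NS = 1_000; // assumed bucket width in nanoseconds
    static final int BUCKETS = 10;      // the last bucket collects everything slower

    // Maps an elapsed time to a histogram bucket index.
    static int bucketOf(long elapsedNs) {
        long idx = Math.max(0, elapsedNs) / BUCKET_NS;
        return (int) Math.min(idx, BUCKETS - 1);
    }

    // Performs `reads` random reads of `recordSize` bytes and returns the latency histogram.
    static long[] timeDiskReads(Path file, int reads, int recordSize, long seed) throws IOException {
        long[] histogram = new long[BUCKETS];
        byte[] record = new byte[recordSize];
        Random rnd = new Random(seed);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long maxOffset = raf.length() - recordSize;
            for (int i = 0; i < reads; i++) {
                long offset = (long) (rnd.nextDouble() * maxOffset);
                long start = System.nanoTime();
                raf.seek(offset);
                raf.readFully(record);
                histogram[bucketOf(System.nanoTime() - start)]++;
            }
        }
        return histogram;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("index", ".bin");
        try {
            Files.write(tmp, new byte[1 << 20]); // stand-in for the real index file
            long[] h = timeDiskReads(tmp, 10_000, 64, 42L);
            for (int i = 0; i < h.length; i++) {
                System.out.printf("%5d ns and up: %d%n", i * BUCKET_NS, h[i]);
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

A file that has just been written is almost certainly served from the page cache, so the numbers this prints show the caching effect under discussion rather than real disk latency.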

UPDATE #2: I've graphed the response time for every request to the index. The lines for memory and for disk match almost exactly. I've done some further tests using the O_DIRECT flag to open the file (thanks to JNA!), and in that case the disk version of the code is (obviously) slower than the memory version. So I'm concluding that the "problem" was caused by Linux's aggressive disk caching, which is pretty amazing.
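
The O_DIRECT test above was done through JNA. On JDK 10 or later, something similar may be possible without native code via com.sun.nio.file.ExtendedOpenOption.DIRECT. The sketch below is an alternative under that assumption, not the approach actually used here; the class name DirectRead and its helper methods are hypothetical. Direct I/O requires the file position, the transfer size, and the buffer address to be aligned to the file store's block size, and not every file system supports it (opening the channel can fail, for example on tmpfs).

```java
import com.sun.nio.file.ExtendedOpenOption;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: reads a byte range while asking the OS to bypass its page cache.
class DirectRead {
    // Rounds an offset down to the nearest multiple of blockSize.
    static long alignDown(long offset, long blockSize) {
        return offset - (offset % blockSize);
    }

    // Rounds a length up to the nearest multiple of blockSize.
    static long alignUp(long length, long blockSize) {
        return ((length + blockSize - 1) / blockSize) * blockSize;
    }

    // Reads `length` bytes starting at `offset` using one aligned direct read.
    static byte[] read(Path file, long offset, int length) throws IOException {
        int block = Math.toIntExact(Files.getFileStore(file).getBlockSize());
        long start = alignDown(offset, block);
        int skip = (int) (offset - start);
        int span = Math.toIntExact(alignUp(skip + (long) length, block));

        // Over-allocate so that a block-aligned slice of `span` bytes fits.
        ByteBuffer buf = ByteBuffer.allocateDirect(span + block - 1).alignedSlice(block);
        buf.limit(span);

        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, ExtendedOpenOption.DIRECT)) {
            int n = ch.read(buf, start);
            if (n < skip + length) {
                throw new IOException("short read: got " + n + " bytes");
            }
        }

        // Copy out only the requested range, dropping the alignment padding.
        buf.flip();
        buf.position(skip);
        byte[] out = new byte[length];
        buf.get(out);
        return out;
    }
}
```

A call such as DirectRead.read(indexPath, 5000, 64) would then read one aligned block from the device and return the 64 requested bytes, which makes the per-read cost of the real disk visible in a benchmark.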

UPDATE #3: http://www.nicecode.eu/java-streams-for-direct-io/

The memory version might be slowed down by garbage collections if you are close to your maximum heap size - have you monitored GCs?
– assylias, Commented Mar 19, 2013 at 18:01
Two possibilities: 1) OS caches disk reads 2) the code performance is not actually constrained by the speed of data access.
– Cyrille Ka, Commented Mar 19, 2013 at 18:01
Even slowed down by GC, RAM is still faster than disc (although it depends on what kind of disc we're talking about...).
– m0skit0, Commented Mar 19, 2013 at 18:02
You also might be swapping to disk as a result of allocating more heap than physical memory. Hard to tell without profiling.
– Mel Nicholson, Commented Mar 19, 2013 at 18:02
Both working "on disk" or "in memory", the most used code it's NOT the one directly related to the reads. So... you have your answer, no?
– Cyrille Ka, Commented Mar 19, 2013 at 18:05
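
One way to follow up on the GC question raised in these comments is to read the JVM's own collector counters around the benchmark. The sketch below uses the standard java.lang.management API; the class name GcProbe and the allocation loop standing in for the real workload are assumptions for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports how much GC time accumulates while a workload runs.
class GcProbe {
    // Sums the accumulated collection time (ms) over all collectors.
    // A collector may report -1 when the value is undefined; those are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Runs the workload and returns the GC time (ms) that accumulated during it.
    static long gcMillisDuring(Runnable workload) {
        long before = totalGcMillis();
        workload.run();
        return totalGcMillis() - before;
    }

    public static void main(String[] args) {
        long gcMs = gcMillisDuring(() -> {
            // Placeholder workload: short-lived allocations to provoke collections.
            long sink = 0;
            for (int i = 0; i < 200_000; i++) {
                byte[] junk = new byte[1024];
                sink += junk.length;
            }
            System.out.println("allocated bytes: " + sink);
        });
        System.out.println("GC time during workload: " + gcMs + " ms");
    }
}
```

If the in-memory variant shows a large GC share here and the on-disk variant does not, GC pressure would be a plausible reason the two look alike.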

Jon Skeet · Accepted Answer · 2013-03-19 18:02:16Z

Three possibilities off the top of my head:

1. The operating system is already keeping all of the index file in memory via its file system cache. (I'd still expect an overhead, mind you.)
2. The index isn't the bottleneck of the code you're testing.
3. Your benchmarking methodology isn't quite right. (It can be very hard to do benchmarking well.)

The middle option seems the most likely to me.

If memory is faster than disk, and the code performs the same number of reads for memory and for disk, shouldn't the memory version be faster?
@MatteoCatena: Yes. But if you don't perform many reads, but you spend a lot of time doing other things, then the difference may get lost in the noise.
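
A back-of-the-envelope sketch of that last point, with made-up numbers: assume 10 index reads per request, 1000 ns per read from memory, 2000 ns per read from cached disk, and 1 ms of unrelated work per request. The class name ReadShare and all of these figures are illustrative assumptions, not measurements.

```java
// Hypothetical model: how much a slower read path changes the total request time.
class ReadShare {
    // Total time for one request: `reads` index reads plus unrelated work.
    static long requestNs(int reads, long readNs, long otherNs) {
        return reads * readNs + otherNs;
    }

    // Ratio of the request time with slow reads to the request time with fast reads.
    static double slowdown(int reads, long fastReadNs, long slowReadNs, long otherNs) {
        return (double) requestNs(reads, slowReadNs, otherNs)
                / requestNs(reads, fastReadNs, otherNs);
    }

    public static void main(String[] args) {
        // Reads dominate: doubling the read cost doubles the request time.
        System.out.printf("no other work:   %.3f%n", slowdown(10, 1_000, 2_000, 0));
        // Reads are a small share: the same doubling is barely visible.
        System.out.printf("1 ms other work: %.3f%n", slowdown(10, 1_000, 2_000, 1_000_000));
    }
}
```

Under these assumptions the requests differ by about 1% (1,020,000 ns against 1,010,000 ns), which is easily smaller than run-to-run variation in a benchmark.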

m0skit0 · Accepted Answer · 2013-03-19 18:02:12Z

No, disk can never be as fast as RAM (RAM is on the order of 100,000 times faster than magnetic discs). Most likely the OS is mapping your file into memory for you.

Matteo Over a year ago

Can you be more detailed in your answer, please? It seems strange to me that the OS would cache a 7GB file in RAM.

m0skit0 Over a year ago

Of course I don't mean the whole file, but the OS might be preloading data into buffers while your process is not executing, anticipating your reads.

Matteo Over a year ago

Is there a way to confirm this?

m0skit0 Over a year ago

Checking the OS source code (if available). Maybe some profiling tools can give you more insight as well. The question is: are you sure your issue lies here? Check Jon Skeet's answer, especially no. 2.

Matteo Over a year ago

Do you know any way to exclude disk caching when benchmarking on Linux?
