
I have the following 10000000x2 matrix:

0        0
1        1
2        2
..       ..
9999999  9999999

Now I want to save this matrix to an int[][] array:

import com.google.common.base.Stopwatch;

static void memory(int size) throws Exception {
    System.out.println("Memory");
    Stopwatch s = Stopwatch.createStarted();
    int[][] l = new int[size][2];
    for (int i = 0; i < size; i++) {
        l[i][0] = i;
        l[i][1] = i;
    }
    System.out.println("Keeping " + size + " rows in-memory: " + s.stop());
}

public static void main(String[] args) throws Exception {
    int size = 10000000;
    memory(size);
    memory(size);
    memory(size);
    memory(size);
    memory(size);
}

The output:

Keeping 10000000 rows in-memory: 2,945 s
Keeping 10000000 rows in-memory: 408,1 ms
Keeping 10000000 rows in-memory: 761,5 ms
Keeping 10000000 rows in-memory: 543,7 ms
Keeping 10000000 rows in-memory: 408,2 ms

Now I want to save this matrix to disk:

import com.google.common.base.Stopwatch;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;

static void file(int size, int fileIndex) throws Exception {
    Stopwatch s = Stopwatch.createStarted();
    FileOutputStream outputStream = new FileOutputStream("D:\\file" + fileIndex);
    BufferedOutputStream buf = new BufferedOutputStream(outputStream);
    for (int i = 0; i < size; i++) {
        buf.write(bytes(i));
        buf.write(bytes(i));
    }
    buf.close();
    outputStream.close();
    System.out.println("Writing " + size + " rows: " + s.stop());
}

// bytes() is not shown in the question; presumably it converts an int
// to a 4-byte array, e.g. via Guava:
static byte[] bytes(int i) {
    return com.google.common.primitives.Ints.toByteArray(i);
}

public static void main(String[] args) throws Exception {
    int size = 10000000;
    file(size, 1);
    file(size, 2);
    file(size, 3);
    file(size, 4);
    file(size, 5);
}

The output:

Writing 10000000 rows: 715,8 ms
Writing 10000000 rows: 636,6 ms
Writing 10000000 rows: 614,6 ms
Writing 10000000 rows: 598,0 ms
Writing 10000000 rows: 611,9 ms

Shouldn't saving to memory be much faster?

  • You're not taking into account that all modern operating systems have caches, so when you're writing to a file, it doesn't necessarily mean that the physical disk is going to be touched right away. Commented Jul 31, 2014 at 6:38
  • The OutputStream is buffered, so it's only writing to memory until the buffer is full before writing it to disk... You could try flushing the buffer on each iteration or getting rid of it altogether... Commented Jul 31, 2014 at 6:39
  • You are not writing to the file directly. You are writing to a stream which is in memory. It will then be written to the hard disk asynchronously. Commented Jul 31, 2014 at 6:40
  • @MadProgrammer The default buffer size is just 8K. Surely the OP is writing plenty more than that. Commented Jul 31, 2014 at 6:50
  • The comments are correct that both the buffering in BufferedOutputStream and the OS buffer cache will mask the latency of the physical disk write. I'll just add that calling buf.flush() and then outputStream.getChannel().force(true) after the writes would force the write to go all the way to the physical disk. Most applications wouldn't do this, but it's useful if you have a requirement for a durable write. Commented Jul 31, 2014 at 7:05

2 Answers


As said in the comments, you're not measuring anything useful. The JVM caches the write operation in its own memory, then flushes it to the operating system, which caches it in its memory before finally writing it to disk at some point.
You're only measuring the time it takes the JVM to cache the data in its own memory (which is all you can measure from Java).

Anyway, you shouldn't bother with such micro-optimisations.
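To illustrate the point, here is a hedged sketch of what a comment above suggested: a variant of the question's file() method (names durableFile and DurableWrite are mine, and the byte conversion is done inline rather than via the question's unshown bytes() helper) that flushes the JVM buffer and then calls FileChannel.force(true), so the timing should include the actual transfer to the device rather than just the copy into the JVM's buffer:

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.nio.ByteBuffer;

public class DurableWrite {
    // Same writes as the question's file() method, but the buffer is
    // flushed and the channel forced, so the measured time includes the
    // physical disk write (force(true) also flushes file metadata).
    static long durableFile(int size, File target) throws Exception {
        long start = System.nanoTime();
        FileOutputStream outputStream = new FileOutputStream(target);
        BufferedOutputStream buf = new BufferedOutputStream(outputStream);
        for (int i = 0; i < size; i++) {
            byte[] b = ByteBuffer.allocate(4).putInt(i).array();
            buf.write(b);  // first column
            buf.write(b);  // second column
        }
        buf.flush();                            // JVM buffer -> OS cache
        outputStream.getChannel().force(true);  // OS cache -> device
        buf.close();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("durable", ".bin");
        f.deleteOnExit();
        long nanos = durableFile(1_000_000, f);
        System.out.println("Durable write of 1000000 rows: "
                + nanos / 1_000_000 + " ms");
        // Each row is two 4-byte ints, so the file is 8 bytes per row.
        System.out.println("File size: " + f.length());
    }
}
```

On a machine with a spinning disk, this version should come out noticeably slower than the buffered run in the question, since nothing can hide the device latency any more.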


5 Comments

Optimizing the writing of a huge amount of data to disk is not exactly "micro".
@MarkoTopolnik true, but worrying about whether to write to an in-memory cache designed to handle that write to disk, or to build your own cache, is...
But how could the file test be faster?! BufferedOutputStream surely does a little more than assigning an int to an array slot. This answer does not explain the outcome.
@usr: because the memory test needs much more memory, which needs to be allocated from the OS, which might involve swapping other stuff to disk first, depending on the state of the machine. First iterations of anything only provide reliable results if you control your machine environment very carefully, and wall clock time measurements are seldom reproducible if anything else is running in the background (which is almost always the case with every non-ancient operating system).
@GuntramBlohm your comment is at least starting to explain the behavior. It's not that this answer contributes nothing, but it falls short of creating an understanding of what is going on.

Your hard drive and operating system employ write buffering so that your system can continue operation in the face of multiple concurrent tasks (for example, programs reading and writing the disk). This can (and sometimes does) lead to data loss in the event of power failure on desktop-class machines. Servers and laptops can also experience the issue (but usually employ sophisticated technology called a battery to mitigate the chances). Anyway, on Linux you might have to run fsck, and on Windows chkdsk, when it happens.
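When you do need the write to survive a power failure, Java can ask the OS to drain its write cache for the file. A minimal sketch (the class and method names here are mine, purely for illustration) using FileDescriptor.sync(), which maps to fsync on POSIX systems:

```java
import java.io.File;
import java.io.FileOutputStream;

public class SyncExample {
    // Writes the data and then asks the OS to flush its write cache for
    // this file to the device; sync() blocks until the flush completes,
    // so a power failure right afterwards should not lose the data.
    static void writeAndSync(File target, byte[] data) throws Exception {
        try (FileOutputStream out = new FileOutputStream(target)) {
            out.write(data);
            out.getFD().sync();  // fsync on POSIX; blocks until done
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("sync", ".bin");
        f.deleteOnExit();
        writeAndSync(f, new byte[]{1, 2, 3, 4});
        System.out.println("Synced " + f.length() + " bytes");
    }
}
```

Note this only asks the OS to write out its cache; some drives have their own volatile caches that may still reorder or delay the physical write.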

