6

Using the following code as a benchmark, the system can write 10,000 rows to disk in a fraction of a second:

    void withSync() {
        int f = open("/tmp/t8", O_RDWR | O_CREAT, 0644); /* O_CREAT requires a mode argument */
        lseek(f, 0, SEEK_SET);
        int records = 10 * 1000;
        clock_t ustart = clock();
        for (int i = 0; i < records; i++) {
            write(f, "012345678901234567890123456789", 30);
            fsync(f);
        }
        clock_t uend = clock();
        close(f);
        printf("sync() seconds:%lf writes per second:%lf\n",
               ((double) (uend - ustart)) / (CLOCKS_PER_SEC),
               ((double) records) / ((double) (uend - ustart)) / (CLOCKS_PER_SEC));
    }

With the above code, 10,000 records are written and flushed out to disk in a fraction of a second; the output is below:

sync() seconds:0.006268 writes per second:0.000002 

In the Java version, it takes over 4 seconds to write 10,000 records. Is this just a limitation of Java, or am I missing something?

    public void testFileChannel() throws IOException {
        RandomAccessFile raf = new RandomAccessFile(new File("/tmp/t5"), "rw");
        FileChannel c = raf.getChannel();
        c.force(true);
        ByteBuffer b = ByteBuffer.allocateDirect(64 * 1024);
        long s = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            b.clear();
            b.put("012345678901234567890123456789".getBytes());
            b.flip();
            c.write(b);
            c.force(false);
        }
        long e = System.currentTimeMillis();
        raf.close();
        System.out.println("With flush " + (e - s));
    }

Returns this:

With flush 4263 

Please help me understand the correct/fastest way to write records to disk in Java.

Note: I am using the RandomAccessFile class in combination with a ByteBuffer as ultimately we need random read/write access on this file.

2
  • Your comparison isn't fair. You are using a ByteBuffer and calling .getBytes() in the Java version. If your idea is to test performance for your application then this is okay. But to compare to C this is unfair as you are doing different things. Commented Nov 9, 2012 at 7:07
  • 1
    It's more than fair. Using a ByteBuffer and .getBytes is actually faster (in my tests, on my machine at least) than doing it in Java any other way. If you have other suggestions on how to do random access in Java, I am very open to hearing them. Thanks! Commented Nov 9, 2012 at 11:40

4 Answers

5

Actually, I am surprised that test is not slower. The behavior of force is OS-dependent, but broadly it forces the data to disk. If you have an SSD you might achieve 40K writes per second, but with an HDD you won't. The C example clearly isn't committing the data to disk, as even the fastest SSD cannot perform more than 235K IOPS. (The manufacturers guarantee it won't go faster than that :D)

If you need the data committed to disk on every write, you can expect it to be slow and entirely dependent on the speed of your hardware. If you just need the data flushed to the OS, so that no data is lost if the program crashes but the OS does not, you can write without force. A faster option is to use memory-mapped files. This gives you random access without a system call for each record.
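A minimal sketch of the memory-mapped approach (the path /tmp/t9, the class name, and the single force() at the end are my assumptions, not part of the answer): each record write is a plain memory store into the mapped region, and the region is flushed to disk once rather than per record.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedWrite {
    public static void main(String[] args) throws IOException {
        byte[] record = "012345678901234567890123456789".getBytes();
        int records = 10_000;
        try (RandomAccessFile raf = new RandomAccessFile("/tmp/t9", "rw");
             FileChannel ch = raf.getChannel()) {
            // Map the whole region once; subsequent puts are memory stores,
            // not system calls.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0,
                                          (long) records * record.length);
            long start = System.nanoTime();
            for (int i = 0; i < records; i++) {
                map.put(record);
            }
            // Flush the mapped region to disk once, after all records.
            map.force();
            long end = System.nanoTime();
            System.out.println("mapped " + (end - start) / 1e6 + " ms");
        }
    }
}
```

Note the durability trade-off this thread discusses: with a single force() at the end, records written since the last flush can be lost on an OS crash, though not on a mere JVM crash.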

I have a library, Java Chronicle, which can read/write 5-20 million records per second with a latency of 80 ns, in text or binary formats, with random access, and can be shared between processes. It is only this fast because it does not commit the data to disk on every record, but you can verify that if the JVM crashes at any point, no data written to the chronicle is lost.


9 Comments

I would expect that flush is pushing the buffer out to the OS. Without the flush it might be buffering a few lines at a time.
Your suggestions make a lot of sense! The geek in me wants to find a way to confirm this for sure... Maybe some tests that involve the power cable being disconnected (:
Try polling the file size and see in what multiples it grows.
If the data must be committed to disk then you want the sync system call: linux.die.net/man/2/sync
From the OS/X fsync(2) man page: "For applications that require tighter guarantees about the integrity of their data, Mac OS X provides the F_FULLFSYNC fcntl. The F_FULLFSYNC fcntl asks the drive to flush all buffered data to permanent storage. Applications, such as databases, that require a strict ordering of writes should use F_FULLFSYNC to ensure that their data is written in the order they expect. Please see fcntl(2) for more detail."
1

This code is more similar to what you wrote in C. Takes only 5 msec on my machine. If you really need to flush after every write, it takes about 60 msec. Your original code took about 11 seconds on this machine. BTW, closing the output stream also flushes.

    public static void testFileOutputStream() throws IOException {
        OutputStream os = new BufferedOutputStream(new FileOutputStream("/tmp/fos"));
        byte[] bytes = "012345678901234567890123456789".getBytes();
        long s = System.nanoTime();
        for (int i = 0; i < 10000; i++) {
            os.write(bytes);
        }
        long e = System.nanoTime();
        os.close();
        System.out.println("outputstream " + (e - s) / 1e6);
    }
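For the case where every record must be durable before the call returns, a variant of the above (the path /tmp/fos-sync and the class name are hypothetical) drops the BufferedOutputStream and calls FileDescriptor.sync() after each write, which is the closest Java analogue of the fsync in the C loop:

```java
import java.io.FileOutputStream;
import java.io.IOException;

public class SyncedWrite {
    public static void main(String[] args) throws IOException {
        byte[] bytes = "012345678901234567890123456789".getBytes();
        try (FileOutputStream fos = new FileOutputStream("/tmp/fos-sync")) {
            long s = System.nanoTime();
            for (int i = 0; i < 10_000; i++) {
                fos.write(bytes);
                // FileDescriptor.sync() blocks until the data has been
                // handed to the underlying device, like fsync(2) in C.
                fos.getFD().sync();
            }
            long e = System.nanoTime();
            System.out.println("synced " + (e - s) / 1e6 + " ms");
        }
    }
}
```

Expect this to be disk-bound rather than CPU-bound: on an HDD it can take seconds, in line with the numbers discussed in this thread.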

4 Comments

Turning off flushing makes the code above execute in about 0.15 seconds on my machine (: The software we write needs to be able to guarantee that when it says data was saved, it really was saved.
So, with flushing, it's still only 60 msec... BTW, fflush does not actually write out to disk. What is your time with fsync for the C version? fsync is similar to os.getFD().sync() when you remove the BufferedOutputStream decoration. Syncing is really slow though: the test then takes 6 seconds here.
Either way, this method doesn't support random file access. Using fsync doesn't slow down the C code significantly.
@Jacob From the manpage of fsync: Note that while fsync() will flush all data from the host to the drive (i.e. the "permanent storage device"), the drive itself may not physically write the data to the platters for quite some time and it may be written in an out-of-order sequence. You need to call fcntl with F_FULLFSYNC to be sure.
0

The Java equivalent of fputs is file.write("012345678901234567890123456789");. You are calling 4 functions in Java and just 1 in C, so the delay seems obvious.

6 Comments

That's not the reason it's 5 orders of magnitude slower. Something else is causing a massive slowdown.
I appreciate your effort to reply, however my tests indicate using write() then flush() or other DirectFileAccess methods are marginally slower. Either way, we are talking about stuff that is disk bound not CPU bound. I can find no java code that is faster than this.
@dave: Virtual Machine vs compiled ;)
@DavidRF: Well, if that's your view, put it in your answer. Though I still don't think your average JVM is the reason for a 5-orders-of-magnitude slowdown. Java programs are 10,000x slower than C? Quick, tell the world not to write another line of Java!!! Either that or you are wrong. I'm going with the latter (because of Occam's razor).
But really, 4 seconds vs 0.001 seconds to write 10,000 records to a random access file? That's 4,000 times slower! Is Java really that bad?
0

I think this is most similar to your C version. I think the direct buffers in your Java example are causing many more buffer copies than the C version. This takes about 2.2s on my (old) box.

    public static void testFileChannelSimple() throws IOException {
        RandomAccessFile raf = new RandomAccessFile(new File("/tmp/t5"), "rw");
        FileChannel c = raf.getChannel();
        c.force(true);
        byte[] bytes = "012345678901234567890123456789".getBytes();
        long s = System.currentTimeMillis();
        for (int i = 0; i < 10000; i++) {
            raf.write(bytes);
            c.force(true);
        }
        long e = System.currentTimeMillis();
        raf.close();
        System.out.println("With flush " + (e - s));
    }

Comments