13

As the title says, I'm looking for the fastest possible way to write integer arrays to files. The arrays will vary in size, and will realistically contain anywhere between 2500 and 25 000 000 ints.

Here's the code I'm presently using:

    DataOutputStream writer = new DataOutputStream(
            new BufferedOutputStream(new FileOutputStream(filename)));
    for (int d : data)
        writer.writeInt(d);

Given that DataOutputStream has a method for writing arrays of bytes, I've tried converting the int array to a byte array like this:

    private static byte[] integersToBytes(int[] values) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(baos);
        for (int i = 0; i < values.length; ++i) {
            dos.writeInt(values[i]);
        }
        return baos.toByteArray();
    }

and like this:

    private static byte[] integersToBytes2(int[] src) {
        int srcLength = src.length;
        byte[] dst = new byte[srcLength << 2];
        for (int i = 0; i < srcLength; i++) {
            int x = src[i];
            int j = i << 2;
            dst[j++] = (byte) ((x >>> 0) & 0xff);
            dst[j++] = (byte) ((x >>> 8) & 0xff);
            dst[j++] = (byte) ((x >>> 16) & 0xff);
            dst[j++] = (byte) ((x >>> 24) & 0xff);
        }
        return dst;
    }

Both seem to give a minor speed increase, about 5%. I've not tested them rigorously enough to confirm that.

Are there any techniques that will speed up this file write operation, or relevant guides to best practice for Java IO write performance?

3
  • 2
    How do you want the file contents to be formatted, exactly? Commented Dec 5, 2010 at 12:55
  • Inlining the code yourself will make code that hasn't warmed up faster. However, if you run the test for 5-10 seconds, you will see whether this has made a real improvement (as the JVM will do this for you). Commented Dec 5, 2010 at 13:17
  • @Karl just a sequence of ints with no formatting. Commented Dec 5, 2010 at 14:35

6 Answers

26

I had a look at three options:

  1. Using DataOutputStream;
  2. Using ObjectOutputStream (for Serializable objects, which int[] is); and
  3. Using FileChannel.

The results are

    DataOutputStream wrote 1,000,000 ints in 3,159.716 ms
    ObjectOutputStream wrote 1,000,000 ints in 295.602 ms
    FileChannel wrote 1,000,000 ints in 110.094 ms

So the NIO version is the fastest. It also has the advantage of allowing edits, meaning you can easily change one int whereas the ObjectOutputStream would require reading the entire array, modifying it and writing it out to file.

Code follows:

    import java.io.Closeable;
    import java.io.DataOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.util.Random;

    public class IntWriteBenchmark {
        private static final int NUM_INTS = 1000000;

        interface IntWriter {
            void write(int[] ints);
        }

        public static void main(String[] args) {
            int[] ints = new int[NUM_INTS];
            Random r = new Random();
            for (int i = 0; i < NUM_INTS; i++) {
                ints[i] = r.nextInt();
            }
            time("DataOutputStream", new IntWriter() {
                public void write(int[] ints) {
                    storeDO(ints);
                }
            }, ints);
            time("ObjectOutputStream", new IntWriter() {
                public void write(int[] ints) {
                    storeOO(ints);
                }
            }, ints);
            time("FileChannel", new IntWriter() {
                public void write(int[] ints) {
                    storeFC(ints);
                }
            }, ints);
        }

        // Times a single call to the given writer and prints the elapsed milliseconds.
        private static void time(String name, IntWriter writer, int[] ints) {
            long start = System.nanoTime();
            writer.write(ints);
            long end = System.nanoTime();
            double ms = (end - start) / 1000000d;
            System.out.printf("%s wrote %,d ints in %,.3f ms%n", name, ints.length, ms);
        }

        private static void storeOO(int[] ints) {
            ObjectOutputStream out = null;
            try {
                out = new ObjectOutputStream(new FileOutputStream("object.out"));
                out.writeObject(ints);
            } catch (IOException e) {
                throw new RuntimeException(e);
            } finally {
                safeClose(out);
            }
        }

        private static void storeDO(int[] ints) {
            DataOutputStream out = null;
            try {
                out = new DataOutputStream(new FileOutputStream("data.out"));
                for (int anInt : ints) {
                    // writeInt emits all four bytes; write(int) would emit only the low byte
                    out.writeInt(anInt);
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            } finally {
                safeClose(out);
            }
        }

        private static void storeFC(int[] ints) {
            RandomAccessFile out = null;
            try {
                // A READ_WRITE mapping needs a readable channel; the channel of a
                // FileOutputStream is write-only and throws NonReadableChannelException.
                out = new RandomAccessFile("fc.out", "rw");
                FileChannel file = out.getChannel();
                ByteBuffer buf = file.map(FileChannel.MapMode.READ_WRITE, 0, 4L * ints.length);
                for (int i : ints) {
                    buf.putInt(i);
                }
                file.close();
            } catch (IOException e) {
                throw new RuntimeException(e);
            } finally {
                safeClose(out);
            }
        }

        private static void safeClose(Closeable out) {
            try {
                if (out != null) {
                    out.close();
                }
            } catch (IOException e) {
                // do nothing
            }
        }
    }
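The "allowing edits" point above can be illustrated with a positional write. This is only a minimal sketch, assuming the file holds raw big-endian ints with no header (the layout produced by DataOutputStream.writeInt and by a default-order ByteBuffer); the class and method names IntFilePatch, setInt and getInt are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper: patches a single int inside a file of raw big-endian ints.
final class IntFilePatch {

    /** Overwrites the int at the given index without touching the rest of the file. */
    static void setInt(String path, long index, int value) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
             FileChannel ch = raf.getChannel()) {
            ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
            buf.putInt(value);
            buf.flip();
            long pos = index * Integer.BYTES;
            // A positional write may be partial, so loop until the buffer is drained.
            while (buf.hasRemaining()) {
                pos += ch.write(buf, pos);
            }
        }
    }

    /** Reads the int at the given index back (big-endian, like writeInt). */
    static int getInt(String path, long index) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(index * Integer.BYTES);
            return raf.readInt();
        }
    }
}
```

With an ObjectOutputStream file, the same change would mean deserializing the whole array, modifying it and writing it out again.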

6 Comments

Nice tests, but I get an error with the FileChannel: java.nio.channels.NonReadableChannelException. Do you know why?
I used @dacwe's method to write to the FileChannel, modified code is here pastebin.com/HhpcS7HX
I get the same exception. Any idea why?
The problem is that the code tries to read and write through a write-only object; FileOutputStream supports only writing. The code should use a RandomAccessFile opened with "rw" instead.
Another little bug is that out.write(anInt); in a DataOutputStream writes a byte, not an integer. With integers the performance could be still worse. On the other hand, you should wrap the FileOutputStream in a BufferedOutputStream.
7

I would use FileChannel from the nio package together with a ByteBuffer. On my computer, this approach seems to give 2 to 4 times better write performance.

Output from the program (times in milliseconds):

    normal time: 2555
    faster time: 765

This is the program:

    import java.io.BufferedOutputStream;
    import java.io.DataOutputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.IntBuffer;
    import java.nio.channels.FileChannel;
    import java.util.Random;

    public class Test {

        public static void main(String[] args) throws IOException {
            // create a test buffer
            ByteBuffer buffer = createBuffer();

            long start = System.currentTimeMillis();
            {
                // do the first test (the normal way of writing files)
                normalToFile(new File("first"), buffer.asIntBuffer());
            }
            long middle = System.currentTimeMillis();
            {
                // use the faster nio stuff
                fasterToFile(new File("second"), buffer);
            }
            long done = System.currentTimeMillis();

            // print the result
            System.out.println("normal time: " + (middle - start));
            System.out.println("faster time: " + (done - middle));
        }

        private static void fasterToFile(File file, ByteBuffer buffer) throws IOException {
            FileChannel fc = null;
            try {
                fc = new FileOutputStream(file).getChannel();
                fc.write(buffer);
            } finally {
                if (fc != null)
                    fc.close();
                buffer.rewind();
            }
        }

        private static void normalToFile(File file, IntBuffer buffer) throws IOException {
            DataOutputStream writer = null;
            try {
                writer = new DataOutputStream(new BufferedOutputStream(
                        new FileOutputStream(file)));
                while (buffer.hasRemaining())
                    writer.writeInt(buffer.get());
            } finally {
                if (writer != null)
                    writer.close();
                buffer.rewind();
            }
        }

        // Fills a 100 MB heap buffer (25,000,000 ints) with reproducible random values.
        private static ByteBuffer createBuffer() {
            ByteBuffer buffer = ByteBuffer.allocate(4 * 25000000);
            Random r = new Random(1);
            while (buffer.hasRemaining())
                buffer.putInt(r.nextInt());
            buffer.rewind();
            return buffer;
        }
    }

3 Comments

Can you re-test using a direct memory buffer? That should make the write faster (otherwise the data has to be copied to a direct buffer first).
Also try a BufferedOutputStream with a 64K buffer size.
Thanks, the FileChannel approach is much faster.
5

Benchmarks should be repeated every once in a while, shouldn't they? :) After fixing some bugs and adding my own writing variant, here are the results I get when running the benchmark on an ASUS ZenBook UX305 running Windows 10 (times given in seconds):

    Running tests...
                                    0       1       2
    Buffered DataOutputStream    8,14    8,46    8,30
    FileChannel alt2             1,55    1,18    1,12
    ObjectOutputStream           9,60   10,41   11,68
    FileChannel                  1,49    1,20    1,21
    FileChannel alt              5,49    4,58    4,66

And here are the results running on the same computer but with Arch Linux and the order of the write methods switched:

    Running tests...
                                    0       1       2
    Buffered DataOutputStream   31,16    6,29    7,26
    FileChannel                  1,07    0,83    0,82
    FileChannel alt2             1,25    1,71    1,42
    ObjectOutputStream           3,47    5,39    4,40
    FileChannel alt              2,70    3,27    3,46

Each test wrote an 800 MB file. The unbuffered DataOutputStream took way too long, so I excluded it from the benchmark.

As seen, writing using a file channel still beats the crap out of all other methods, but it matters a lot whether the byte buffer is memory-mapped or not. Without memory-mapping the file channel write took 3-5 seconds:

    var bb = ByteBuffer.allocate(4 * ints.length);
    for (int i : ints)
        bb.putInt(i);
    bb.flip();
    try (var fc = new FileOutputStream("fcalt.out").getChannel()) {
        fc.write(bb);
    }

With memory-mapping, the time was reduced to between 0.8 and 1.5 seconds:

    try (var fc = new RandomAccessFile("fcalt2.out", "rw").getChannel()) {
        var bb = fc.map(READ_WRITE, 0, 4 * ints.length);
        bb.asIntBuffer().put(ints);
    }

But note that the results are order-dependent, especially on Linux. It appears that the memory-mapped method doesn't write the data in full, but rather hands the job off to the OS and returns before it is completed. Whether that behaviour is desirable or not depends on the situation.
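If returning before the data has reached the disk is not acceptable, MappedByteBuffer.force() asks the OS to write the mapped region out to the storage device before the method returns. A minimal sketch of the memory-mapped variant with that extra call, assuming the same raw int layout; the class and method names are hypothetical, and the forced flush will likely cost some of the measured speed-up:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper: memory-mapped write that waits for the flush.
final class MappedIntWriter {

    static void writeAndForce(String path, int[] ints) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
             FileChannel fc = raf.getChannel()) {
            // 4L avoids int overflow when computing the byte size of large arrays.
            MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_WRITE, 0, 4L * ints.length);
            bb.asIntBuffer().put(ints);
            // Request that the changes be written to the storage device now.
            bb.force();
        }
    }
}
```

A single mapping is limited to Integer.MAX_VALUE bytes, so arrays larger than roughly 536 million ints would need several mappings.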

Memory-mapping can also lead to OutOfMemory problems, so it is not always the right tool to use. See: Prevent OutOfMemory when using java.nio.MappedByteBuffer.

Here is my version of the benchmark code: https://gist.github.com/bjourne/53b7eabc6edea27ffb042e7816b7830b


3

I think you should consider using file channels (the java.nio library) instead of plain streams (java.io). A good starting point is this interesting discussion: Java NIO FileChannel versus FileOutputstream performance / usefulness

and the relevant comments below.

Cheers!


3

The main improvement you can make when writing an int[] is to either:

  • Increase the buffer size. The default size is right for most streams, but file access can be faster with a larger buffer. This could yield a 10-20% improvement.

  • Use NIO and a direct buffer. This allows you to write 32-bit values without converting to bytes. This may yield a 5% improvement.
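Both suggestions can be sketched as follows. The 64 KiB buffer size is an assumption taken as a starting point rather than a tuned value, and the class and method names are hypothetical. Both methods produce the same big-endian file that DataOutputStream.writeInt would.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper showing the two options from the list above.
final class BufferedIntWriters {
    // Assumed buffer size; measure before settling on a value.
    private static final int BUFFER_BYTES = 64 * 1024;

    /** Option 1: plain streams with a larger buffer than the default. */
    static void writeBuffered(String path, int[] data) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(path), BUFFER_BYTES))) {
            for (int d : data) {
                out.writeInt(d);
            }
        }
    }

    /** Option 2: NIO with a reusable direct buffer, filled chunk by chunk. */
    static void writeDirect(String path, int[] data) throws IOException {
        ByteBuffer bytes = ByteBuffer.allocateDirect(BUFFER_BYTES);
        IntBuffer intsView = bytes.asIntBuffer(); // shares storage with bytes
        try (FileOutputStream fos = new FileOutputStream(path);
             FileChannel fc = fos.getChannel()) {
            int offset = 0;
            while (offset < data.length) {
                int n = Math.min(intsView.capacity(), data.length - offset);
                intsView.clear();
                intsView.put(data, offset, n); // bulk copy of 32-bit values
                bytes.clear();
                bytes.limit(n * Integer.BYTES);
                while (bytes.hasRemaining()) {
                    fc.write(bytes);
                }
                offset += n;
            }
        }
    }
}
```

Reusing one fixed-size direct buffer keeps the off-heap memory use constant regardless of the array length.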

BTW: You should be able to write at least 10 million int values per second. With disk caching you can increase this to 200 million per second.


0

An array is Serializable - can't you just use writer.writeObject(data);? That's definitely going to be faster than individual writeInt calls.

If you have other requirements on the output data format than retrieval into int[], that's a different question.
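A minimal sketch of that round trip, assuming the writer is an ObjectOutputStream (DataOutputStream has no writeObject method); the class and method names are hypothetical:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical helper: stores and restores an int[] via Java serialization.
final class IntArraySerialization {

    static void save(String path, int[] data) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(path)))) {
            out.writeObject(data); // writes a stream header, the array class and the elements
        }
    }

    static int[] load(String path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(path)))) {
            return (int[]) in.readObject();
        }
    }
}
```

The resulting file carries serialization metadata, so it can only be read back conveniently through ObjectInputStream.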

1 Comment

writeObject has significant overhead and uses writeInt in the end. It is a much friendlier way to write objects, though, and I suspect it is a better choice in most situations.
