43

I made a method that takes a File and a String. It replaces the file with a new file with that string as its contents.

This is what I made:

public static void Save(File file, String textToSave) {
    file.delete();
    try {
        BufferedWriter out = new BufferedWriter(new FileWriter(file));
        out.write(textToSave);
        out.close();
    } catch (IOException e) {
    }
}

However, it is painfully slow; it sometimes takes over a minute.

How can I efficiently write large files, from tens of thousands up to maybe a million characters?

  • Deleting the file is unnecessary; you're overwriting it anyway. Commented Jan 1, 2011 at 23:21
  • How much of the time is CPU time and how much is I/O ("system") time? For large files, creating the huge textToSave string might dominate the time. Commented Jan 1, 2011 at 23:34
  • Not directly related to your question: you might consider restructuring the out.close() statement so that it can be done in a finally block. In case an error is thrown on write, it would still close. Commented Jan 2, 2011 at 0:36
  • Don't ignore your IOException; that can lead to your program failing in mysterious ways. Commented Jan 2, 2011 at 10:34
  • Rather than deleting the file before writing, or overwriting it directly, I would recommend writing to a temporary file, then renaming it over the old file afterwards. That means you don't risk replacing your old file with something corrupt if the I/O fails halfway through (see the sketch below). Commented Jan 2, 2011 at 12:08
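A minimal sketch of the write-to-temp-then-rename approach from the last comment, also using try-with-resources (Java 7+) so the writer is closed even if the write fails, as the earlier finally-block comment suggests. The class and method names here are made up for illustration:

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public final class SafeSave {
    // Hypothetical helper, not from the question: write to a temp file first,
    // then move it over the target, so a failed write cannot leave the
    // original file half-overwritten.
    public static void saveSafely(File file, String textToSave) throws IOException {
        File tmp = new File(file.getAbsolutePath() + ".tmp");
        // try-with-resources closes the writer even if write() throws
        try (BufferedWriter out = new BufferedWriter(new FileWriter(tmp))) {
            out.write(textToSave);
        }
        Files.move(tmp.toPath(), file.toPath(), StandardCopyOption.REPLACE_EXISTING);
    }
}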

7 Answers

28

Make sure you allocate a large enough buffer:

BufferedWriter out = new BufferedWriter(new FileWriter(file), 32768); 

What sort of OS are you running on? That can make a big difference too. However, taking a minute to write out a file of less-than-enormous size sounds like a system problem. On Linux or other *ix systems, you can use things like strace to see if the JVM is making lots of unnecessary system calls. (A very long time ago, Java I/O was pretty dumb and would make insane numbers of low-level write() system calls if you weren't careful, but when I say "a long time ago" I mean 1998 or so.)

Edit: note that the situation of a Java program writing a simple file in a simple way, and yet being really slow, is inherently odd. Can you tell whether the CPU is heavily loaded while the file is being written? It shouldn't be; a task like this should produce almost no CPU load.


8 Comments

Agreed. He might even be able to know the buffer size needed in advance, since he is taking the String as a param: textToSave.getBytes().length
@Rocky Madden Yeah, that's a good point. However, dumping a string through the Java IO libraries should be pretty fast almost any way you do it.
getBytes() can be very expensive just to tune a buffer. I suggest you just make it 256K and not worry about it.
-1 because if you're writing a single huge string, you don't even need a character buffer - you could pass it to the FileWriter directly, and it would process it in a single batch. It might be worth having a buffer at the byte level (using OutputStreamWriter + BufferedOutputStream + FileOutputStream), because character encoding is done with a buffer whose size you don't control, and which I believe is quite small. But not at the character level.
Good answer. It turned out the reason it was writing so slowly was actually not the writing method, but the fact that I used such a long String. It was the computing of the String that took so long; the writing didn't take as much time. My solution was to write the file in pieces rather than all at once, so the String to write never became huge (roughly as sketched below). Using your ideas helped as well.
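A rough sketch of the piecewise approach the asker describes in the last comment: compute and write the output in chunks through the buffered writer instead of building one enormous String first. The names and the Iterable parameter are illustrative, not from the original code:

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class PiecewiseSave {
    // Sketch only: stream the output piece by piece so no huge String
    // is ever materialized in memory.
    public static void savePieces(File file, Iterable<String> pieces) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(file), 32768)) {
            for (String piece : pieces) {
                out.write(piece);  // the writer batches these into large chunks
            }
        }
    }
}

With this shape, the cost of building the text is spread across the writes, and peak memory stays proportional to the largest piece rather than the whole file.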
25

A simple test for you

char[] chars = new char[100 * 1024 * 1024];
Arrays.fill(chars, 'A');
String text = new String(chars);
long start = System.nanoTime();
BufferedWriter bw = new BufferedWriter(new FileWriter("/tmp/a.txt"));
bw.write(text);
bw.close();
long time = System.nanoTime() - start;
System.out.println("Wrote " + chars.length * 1000L / time + " MB/s.");

Prints

Wrote 135 MB/s. 


5

You could look into Java's NIO capabilities. It may support what you want to do.

Java NIO FileChannel versus FileOutputstream performance / usefulness
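For context, a minimal sketch of writing a String through a FileChannel; the charset, open options, and names are my choices, not from the linked answer:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class NioSave {
    // Sketch only: encode the whole String once and hand the bytes to the
    // channel, looping because write() may not consume the buffer in one call.
    public static void save(String path, String textToSave) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(textToSave.getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = FileChannel.open(Paths.get(path),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }
}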


3

Try using memory-mapped files:

byte[] bytes = textToSave.getBytes();  // size the mapping in bytes, not chars
FileChannel rwChannel = new RandomAccessFile("textfile.txt", "rw").getChannel();
ByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0, bytes.length);
wrBuf.put(bytes);
rwChannel.close();


2

I created two approaches to write big files and ran the program on a Windows 7, 64-bit, 8 GB RAM machine with JDK 8; the results are below.
In both cases, a 180 MB file is created containing the numbers from 1 to 20 million (2 crore in the Indian system), one per line.

The Java program's memory grows gradually to about 600 MB.

First output

Approach = approach-1 (Using FileWriter)
Completed file writing in milli seconds = 4521

Second output

Approach = approach-2 (Using FileChannel and ByteBuffer)
Completed file writing in milli seconds = 3590

One observation: I calculate the position (the pos variable) in approach 2. If I comment that out, only the last chunk is visible in the file, because every write overwrites position 0, but the time drops to nearly 2000 milliseconds.

Attaching code.

import java.io.FileWriter;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.TimeUnit;

public class TestLargeFile {

    public static void main(String[] args) {
        writeBigFile();
    }

    private static void writeBigFile() {
        System.out.println("--------writeBigFile-----------");
        long nanoTime = System.nanoTime();
        String fn = "big-file.txt";
        boolean approach1 = false;
        System.out.println("Approach = " + (approach1 ? "approach-1" : "approach-2"));
        int numLines = 20_000_000;
        try {
            if (approach1) {
                // Approach 1 -- for 2 crore lines takes 4.5 seconds with 180 MB file size
                approach1(fn, numLines);
            } else {
                // Approach 2 -- for 2 crore lines takes nearly 2 to 2.5 seconds with 180 MB file size
                approach2(fn, numLines);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("Completed file writing in milli seconds = "
                + TimeUnit.MILLISECONDS.convert(System.nanoTime() - nanoTime, TimeUnit.NANOSECONDS));
    }

    private static void approach2(String fn, int numLines) throws IOException {
        StringBuilder sb = new StringBuilder();
        FileChannel rwChannel = new RandomAccessFile(fn, "rw").getChannel();
        ByteBuffer wrBuf;
        int pos = 0;
        for (int i = 1; i <= numLines; i++) {
            sb.append(i).append(System.lineSeparator());
            if (i % 100000 == 0) {
                // map the next region of the file and write this chunk there;
                // sb.length() (chars) equals the byte count here because the
                // content is plain ASCII
                wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, pos, sb.length());
                pos += sb.length();
                wrBuf.put(sb.toString().getBytes());
                sb = new StringBuilder();
            }
        }
        if (sb.length() > 0) {
            wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, pos, sb.length());
            wrBuf.put(sb.toString().getBytes());
        }
        rwChannel.close();
    }

    private static void approach1(String fn, int numLines) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= numLines; i++) {
            sb.append(i).append(System.lineSeparator());
        }
        FileWriter fileWriter = new FileWriter(fn);
        fileWriter.write(sb.toString());
        fileWriter.flush();
        fileWriter.close();
    }
}


0

This solution creates a 20 GB file containing the string "ABCD...89\n" written 10 * 200 million times, using Java NIO. Write performance on a MacBook Pro (14-inch from 2021, M1 Pro, SSD AP1024R) is around 5.1 GB/s.

The code follows:

public static void main(String[] args) throws IOException {
    long number_of_lines = 1024 * 1024 * 200;
    int repeats = 10;
    byte[] buffer = "ABCD...89\n".getBytes();
    FileChannel rwChannel = FileChannel.open(Path.of("textfile.txt"),
            StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    // prepare buffer (~2 GB: 10 bytes per line)
    ByteBuffer wrBuf = ByteBuffer.allocate(buffer.length * (int) number_of_lines);
    for (int i = 0; i < number_of_lines; i++)
        wrBuf.put(buffer);
    long t1 = System.currentTimeMillis();
    // The buffer is full here (position == limit), so the first write() transfers
    // nothing; flip() then rewinds it, and the remaining iterations plus the
    // loop below perform the ten full writes.
    for (int i = 0; i < repeats; i++) {
        rwChannel.write(wrBuf);
        wrBuf.flip();
    }
    while (wrBuf.hasRemaining()) {
        rwChannel.write(wrBuf);
    }
    long t2 = System.currentTimeMillis();
    System.out.println("Time: " + (t2 - t1));
    System.out.println("Speed: " + ((double) number_of_lines * buffer.length * 10 / (1024 * 1024))
            / ((t2 - t1) / (double) 1000) + " MB/s");
}


-3

In Java, BufferedWriter is very slow: use the native methods directly, and call them as little as possible (give them as much data per call as you can).

try {
    FileOutputStream out = new FileOutputStream(file);
    out.write(content);  // content: the text as bytes, e.g. textToSave.getBytes()
    out.close();
} catch (Throwable e) {
    D.error(e);  // D: the poster's own logging helper
}

Also, deleting the file can take a while (maybe it is being copied to the recycle bin first). Just overwrite the file, like in the above code.

3 Comments

I have not had experience with BufferedWriter being "very slow" at all, and I've been writing server-side Java code for a really long time. I don't think it's what I'd use if I had some very serious mega-throughput application, maybe, but it's not that bad; how could it be?
likewise, I have never seen a call to File#delete() move a file to a recycle bin. Delete means delete.
@Pointy: Yes, it probably was "a long time ago" that I traced the Java file writes through the MS debugger to see the inane number of system calls it was making on my machine.