
I am coding a little Java-based tool to process mysqldump files, which can become quite large (up to a gigabyte for now). I am using this code to read and process the file:

BufferedReader reader = getReader();
BufferedWriter writer = getWriter();
char[] charBuffer = new char[CHAR_BUFFER_SIZE];
int readCharCount;
StringBuffer buffer = new StringBuffer();
while ((readCharCount = reader.read(charBuffer)) > 0) {
    buffer.append(charBuffer, 0, readCharCount);
    // processing goes here
}

What is a good size for the charBuffer? At the moment it is set to 1000, but my code will run with an arbitrary size, so what is best practice or can this size be calculated depending on the file size?
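For reference, here is a minimal harness I could use to experiment, as a sketch: it times the same read loop at several buffer sizes against a throwaway temp file (the file contents, sizes tried, and class name are my own assumptions standing in for the real dump).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical harness: times the read loop at several charBuffer sizes
// against a generated temp file (a real mysqldump file would go here).
public class BufferSizeTest {

    // Read the whole file with the given buffer size; returns chars read.
    static long readAll(Path file, int bufferSize) throws IOException {
        long total = 0;
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            char[] charBuffer = new char[bufferSize];
            int readCharCount;
            while ((readCharCount = reader.read(charBuffer)) > 0) {
                total += readCharCount;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Self-contained setup: 1 MiB of dummy ASCII instead of a real dump.
        Path tmp = Files.createTempFile("dump", ".sql");
        byte[] data = new byte[1 << 20];
        java.util.Arrays.fill(data, (byte) 'a');
        Files.write(tmp, data);

        for (int size : new int[] {1000, 4096, 8192, 65536}) {
            long start = System.nanoTime();
            long chars = readAll(tmp, size);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println(size + " -> " + chars + " chars in " + micros + " us");
        }
        Files.delete(tmp);
    }
}
```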

Thanks in advance, greetings Philipp

  • Oracle's BufferedReader already uses a default buffer of 8192. Commented Sep 25, 2013 at 14:31
  • I don't know if there's a standard for this, as it will depend on your available memory. I would recommend experimenting with different sizes to see how they affect your performance. Commented Sep 25, 2013 at 14:31
  • @StormeHawke AFAIK the best values are 4096 and 8192; which value to use depends entirely on your hard drive speed. Commented Sep 25, 2013 at 14:32
  • @SotiriosDelimanolis But then you get the wrong idea. Neither the OpenJDK nor the HotSpot source code is authoritative: if the size is not specified in the Javadoc, then you must not assume it will always be 8192; note that JRockit or the IBM JVM may use something else. Commented Sep 25, 2013 at 14:34
  • @NFE Just like Luiggi Mendoza stated, it depends on the implementation. I was referring to Oracle JDK 7's implementation. OpenJDK also seems to use that size. Commented Sep 25, 2013 at 14:43

1 Answer


It should always be a power of 2. The optimal value depends on the OS and disk format. In code I've seen, 4096 is often used, but in general, the bigger the better.

Also, there are better ways to load a file into memory.
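For instance, since a mysqldump file is line-oriented, one such alternative is to stream it line by line with Files.lines and let the JDK manage the buffer; the class name and the INSERT-counting example here are my own illustration, not part of the original question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Illustrative alternative: process the dump as a stream of lines
// instead of managing a char[] buffer by hand.
public class DumpProcessor {

    // Example processing step: count the INSERT statements in the dump.
    public static long countInsertLines(Path dump) throws IOException {
        try (Stream<String> lines = Files.lines(dump)) {
            return lines.filter(l -> l.startsWith("INSERT INTO")).count();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countInsertLines(Paths.get(args[0])));
    }
}
```

The try-with-resources around the stream matters: Files.lines holds the file open until the stream is closed.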


3 Comments

More than a power of 2, it should be a power of 1024.
I tried a bunch of values that were powers of 2, and all ran fine. But except for very small values I could not see any extraordinary performance gains, though I guess that is due to my implementation.
It could be due to a lot of factors. Many OSes and disk controllers are smart enough to read blocks ahead of time. When that occurs, your buffer size doesn't matter much, except for the cost of round-tripping to the OS API.
