
I have a large text file with 20 million lines of text. When I read the file using the following program, it works just fine, and in fact I can read much larger files with no memory problems.

public static void main(String[] args) throws IOException {
    File tempFile = new File("temp.dat");
    String tempLine = null;
    BufferedReader br = null;
    int lineCount = 0;
    try {
        br = new BufferedReader(new FileReader(tempFile));
        while ((tempLine = br.readLine()) != null) {
            lineCount += 1;
        }
    } catch (Exception e) {
        System.out.println("br error: " + e.getMessage());
    } finally {
        br.close();
        System.out.println(lineCount + " lines read from file");
    }
}

However if I need to append some records to this file before reading it, the BufferedReader consumes a huge amount of memory (I have just used Windows task manager to monitor this, not very scientific I know but it demonstrates the problem). The amended program is below, which is the same as the first one, except I am appending a single record to the file first.

public static void main(String[] args) throws IOException {
    File tempFile = new File("temp.dat");
    PrintWriter pw = null;
    try {
        pw = new PrintWriter(new BufferedWriter(new FileWriter(tempFile, true)));
        pw.println(" ");
    } catch (Exception e) {
        System.out.println("pw error: " + e.getMessage());
    } finally {
        pw.close();
    }
    String tempLine = null;
    BufferedReader br = null;
    int lineCount = 0;
    try {
        br = new BufferedReader(new FileReader(tempFile));
        while ((tempLine = br.readLine()) != null) {
            lineCount += 1;
        }
    } catch (Exception e) {
        System.out.println("br error: " + e.getMessage());
    } finally {
        br.close();
        System.out.println(lineCount + " lines read from file");
    }
}

Below is a screenshot of Windows task manager; the large bump in the line shows the memory consumption when I run the second version of the program.

[task manager screenshot]

So I was able to read this file without running out of memory. But I have much larger files with more than 50 million records, and those hit an out of memory exception when I run this program against them. Can someone explain why the first version of the program works fine on files of any size, but the second program behaves so differently and ends in failure? I am running on Windows 7 with:

java version "1.7.0_05"
Java(TM) SE Runtime Environment (build 1.7.0_05-b05)
Java HotSpot(TM) Client VM (build 23.1-b03, mixed mode, sharing)

  • Is it the BufferedReader that takes all the memory? I'd rather suspect it's the FileWriter doing this. Commented Aug 30, 2012 at 17:39
  • Is there a reason for adding a BufferedWriter into the mix? Do you still get the same problem if you do new PrintWriter(new FileWriter(...))? Commented Aug 30, 2012 at 17:51
  • (Nothing to do with the question, but I have to point out that you could get an NPE in the finally block. The way to deal with this is to use Java SE 7's try-with-resources, or with Java SE 6 use separate try blocks for the finally and catch and avoid the use of nulls.) Commented Aug 30, 2012 at 17:54
  • I've tested the second version on a file of about 1.3 GB with more than 30 million lines and it runs fine. Heap consumption about 60 MB. Java 6 / Linux x86 Commented Aug 30, 2012 at 18:08
  • Are we sure it's actually the heap which is huge? Appending to the file may require all sorts of rearrangement on disk, which will be cached in RAM. Commented Aug 30, 2012 at 18:13

6 Answers

1

You can start the JVM with the VM option

-XX:+HeapDumpOnOutOfMemoryError 

This writes a heap dump to a file when an OutOfMemoryError occurs, and the dump can then be analysed to find leak suspects.

Use a '+' to enable an option and a '-' to disable it.

If you are using Eclipse, the Memory Analyzer plugin (MAT) can take heap dumps from running VMs and offers some nice analyses for leak suspects etc.



0

Each time you execute the following Java statement, you create a brand new object:

tempLine = br.readLine() 

I believe each call to readLine() creates a new String object, which is left on the heap once the next iteration reassigns tempLine.

Therefore, since GC isn't being run constantly, thousands of objects can be left on the heap within seconds.

Some people say it's a bad idea to call System.gc() every 1000 lines or so, but I would be curious whether that fixes your issue. Also, you could run this statement after each line to mark the object as eligible for garbage collection:

tempLine = null; 
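A minimal sketch of what this answer suggests (class and file names are illustrative; note that System.gc() is only a hint that the JVM may ignore):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class GcHintReader {

    // Reads a file line by line, nulling the reference after each line
    // and hinting a GC every 1000 lines, as the answer suggests.
    static int countLinesWithGcHints(String fileName) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(fileName));
        String tempLine;
        int lineCount = 0;
        try {
            while ((tempLine = br.readLine()) != null) {
                lineCount++;
                tempLine = null;            // drop the reference explicitly
                if (lineCount % 1000 == 0) {
                    System.gc();            // a hint only; usually unnecessary
                }
            }
        } finally {
            br.close();
        }
        return lineCount;
    }
}
```

In practice the local reference is overwritten on the next loop iteration anyway, so the explicit nulling changes little; the sketch just makes the suggestion concrete.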

2 Comments

I don't think that's the problem. When I run the readonly version of the program, the BufferedReader works just fine with no memory problems at all. The problem only occurs when I precede the reading of the file with a section which appends a line to the file using a printwriter.
What is your line count on the exception? Also, if you use JDK 1.6.0_22 or higher, I believe you get a multithreaded garbage collector and I am curious what behavior you get with that? Also, doesn't BufferedWriter allow you to increase the buffer size? Alternative: try using InputStreamReader and FileInputStream to read and then store the data in a char, then just write that char using a FileOutputStream.
0
 pw = new PrintWriter(new BufferedWriter(new FileWriter(tempFile, true))); 

Did you try not using a BufferedWriter? If you're appending a few lines to the end, maybe you don't need a buffer? If you do, consider using a byte array (collections or StringBuilder). Finally, did you try the same in Java 1.6_32? It might be a bug in the new version of one of the Writers.

Can you print the free memory before and after pw.close()?

System.out.println("before wr close :" + Runtime.getRuntime().freeMemory()); 

and similarly after the writer's close and after the reader's close.
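That measurement can be sketched as below; the helper name is made up here, and Runtime.getRuntime().freeMemory() is the standard JDK call being used:

```java
public class MemProbe {

    // Snapshot of the current free heap memory, in bytes.
    // Note: this is only free memory within the currently allocated
    // heap, so consecutive readings can move in either direction.
    static long freeHeap() {
        return Runtime.getRuntime().freeMemory();
    }

    public static void main(String[] args) {
        System.out.println("before wr close: " + freeHeap());
        // ... pw.close(); would go here ...
        System.out.println("after wr close:  " + freeHeap());
    }
}
```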


0

It could be because you don't have a linefeed/carriage return in your file at all. In this case, readLine() tries to create just one single string out of your file, which is probably running out of memory.

Javadoc of readLine():

Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.

1 Comment

That's not the problem unfortunately, the files are all delimited properly, and I'm getting the correct line counts as I parse the files.
0

Have you tried:

A) creating a new File instance for the reading, but pointing to the same file, and B) reading an entirely different file in the second part.

I'm wondering whether the File object is still somehow attached to the PrintWriter, or whether the OS is doing something funny with the file handles. Those tests should show you where to focus.
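Variant A might be sketched like this (a minimal, self-contained illustration; the class name and helper are made up here). The append uses one File instance and the read goes through a brand-new File instance pointing at the same path:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class SeparateFileTest {

    // Count lines via a BufferedReader over the given File instance.
    static int countLines(File f) throws IOException {
        int lineCount = 0;
        try (BufferedReader br = new BufferedReader(new FileReader(f))) {
            while (br.readLine() != null) {
                lineCount++;
            }
        }
        return lineCount;
    }

    public static void main(String[] args) throws IOException {
        // Append through one File instance...
        File writeFile = new File("temp.dat");
        try (PrintWriter pw = new PrintWriter(new FileWriter(writeFile, true))) {
            pw.println(" ");
        }
        // ...then read through a brand-new File instance, same path on disk.
        System.out.println(countLines(new File("temp.dat")) + " lines read from file");
    }
}
```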

This doesn't look to be a problem with the code, and your logic for thinking it shouldn't break seems sound, so it's got to be some underlying functionality.

1 Comment

Thanks @Glen Lamb, I think your suggestions make a lot of sense. However I had already spent too much time on this issue and finally decided to do it another way which avoided this problem altogether. If I ever get time to return to it, I'll post any results I get.
-3

You'll need to start Java with a bigger heap. Try -Xmx1024m as a parameter on the java command.

Basically, you're going to need more memory than the size of the file.
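To check what heap limit is actually in effect after such a flag, one can print Runtime.getRuntime().maxMemory() (a standard JDK call) from inside the program; the class name here is illustrative:

```java
public class HeapInfo {

    // Maximum amount of memory the JVM will attempt to use, in bytes.
    // With -Xmx1024m this should report roughly 1024 MB.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + (maxHeapBytes() / (1024 * 1024)) + " MB");
    }
}
```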

1 Comment

Can you explain why I need a bigger heap for the 2nd program but not the 1st? The 1st version of the program works just fine, and uses a very small heap size. The BufferedReader processes the file 1 line at a time so it shouldn't need much memory at all?
