Removing duplicate lines from a text file

I have a text file that is sorted alphabetically, with around 94,000 lines of names (one name per line, text only, no punctuation).

Example:

Alice
Bob
Simon
Simon
Tom

Each line takes the same form: the first letter is capitalized, and there are no accented letters.

My code:

try {
    BufferedReader br = new BufferedReader(new FileReader("orderedNames.txt"));
    PrintWriter out = new PrintWriter(new BufferedWriter(
            new FileWriter("sortedNoDuplicateNames.txt", true)));
    ArrayList<String> textToTransfer = new ArrayList();
    String previousLine = "";
    String current = "";

    //Load first line into previous line
    previousLine = br.readLine();
    //Add first line to the transfer list
    textToTransfer.add(previousLine);

    while ((current = br.readLine()) != previousLine && current != null) {
        textToTransfer.add(current);
        previousLine = current;
    }

    int index = 0;
    for (int i = 0; i < textToTransfer.size(); i++) {
        out.println(textToTransfer.get(i));
        System.out.println(textToTransfer.get(i));
        index++;
    }
    System.out.println(index);
} catch (Exception e) {
    e.printStackTrace();
}

From what I understand: the first line of the file is read and loaded into the previousLine variable, as I intended; current is set to the second line of the file we're reading from; current is then compared against the previous line and against null; and if it's not the same as the last line and it's not null, we add it to the array-list.

previousLine is then set to current's value, so the next readLine() can assign a new value to current and the while loop can keep comparing.

I cannot see what is wrong with this. If a duplicate is found, surely the loop should break?

Sorry in advance if it turns out to be something stupid.
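
For reference, here is a minimal sketch of the kind of single-pass dedupe this seems to be aiming for (the class name RemoveDuplicates and the overwrite-mode output are my assumptions, not from the code above). Two things differ from the original: string contents are compared with .equals(), since != on two Strings compares object references and is effectively always true here, and a duplicate is skipped rather than used to end the loop, because in a sorted file duplicates are adjacent but the loop still has to reach the lines after them.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class RemoveDuplicates {
    public static void main(String[] args) {
        try (BufferedReader br = new BufferedReader(new FileReader("orderedNames.txt"));
             PrintWriter out = new PrintWriter(new BufferedWriter(
                     // Overwrite rather than append, so reruns don't duplicate output
                     new FileWriter("sortedNoDuplicateNames.txt")))) {
            String previousLine = null; // nothing written yet
            String current;
            while ((current = br.readLine()) != null) {
                // equals() compares string contents; != compares references.
                // The input is sorted, so duplicates are adjacent: skip any
                // line equal to the one just written, otherwise write it.
                if (!current.equals(previousLine)) {
                    out.println(current);
                    previousLine = current;
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Writing each kept line as it is encountered also avoids buffering all 94,000 names in an ArrayList before output.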