First, you are running out of memory because all rows are being added to a list.
Second, you are using String.split(), which is extremely slow.
Third, never try to process CSV by writing your own parsing code, as there are many edge cases in this format (you need to handle escaped delimiters, quotes, etc.).
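To make the third point concrete, here is a minimal sketch using only the JDK. The class name `NaiveSplitDemo`, the helper `naiveSplit`, and the sample record are made up for illustration; they are not from your code or from any library.

```java
import java.util.Arrays;

public class NaiveSplitDemo {

    // Hypothetical helper: splits on every comma and ignores CSV quoting rules.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        // One record with three fields; the middle field is quoted
        // because it contains the delimiter.
        String line = "1,\"Smith, John\",London";

        String[] fields = naiveSplit(line);

        // The quoted field is cut in two, so 4 pieces come back instead of 3.
        System.out.println(fields.length + " fields: " + Arrays.toString(fields));
    }
}
```

The quoted value is broken at the comma inside the quotes, and the quote characters are left in the data. A proper CSV parser treats the quoted section as a single value.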
The solution is to use a library for that, such as univocity-parsers. You should be able to read 1 million rows in less than a second.
To parse, just do this:
    import java.io.File;
    import com.univocity.parsers.common.IterableResult;
    import com.univocity.parsers.common.ParsingContext;
    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;

    public static IterableResult<String[], ParsingContext> readCSV(String filePath) {
        File file = new File(filePath);

        // configure the parser here. By default all values are trimmed
        CsvParserSettings parserSettings = new CsvParserSettings();

        // create the parser
        CsvParser parser = new CsvParser(parserSettings);

        // create an iterable over rows. This will not load everything into memory.
        IterableResult<String[], ParsingContext> rows = parser.iterate(file);

        return rows;
    }
Now you can use your method like this:
    public static void main(String... args) {
        IterableResult<String[], ParsingContext> rows = readCSV("c:/path/to/input.csv");

        try {
            for (String[] row : rows) {
                // process the rows however you want
            }
        } finally {
            // the parser closes itself, but if an error occurs while processing
            // the rows (outside the control of the iterator), stop the parser here.
            rows.getContext().stop();
        }
    }
This is just an example of how you can use the parser, but there are many different ways to use it.
Now for writing, you can do this:
    public static void main(String... args) {
        // this is your output file
        File output = new File("c:/path/to/output.csv");

        // configure the writer if you need to
        CsvWriterSettings settings = new CsvWriterSettings();

        // create the writer. Here we write to a file
        CsvWriter writer = new CsvWriter(output, settings);

        // get the row iterator
        IterableResult<String[], ParsingContext> rows = readCSV("c:/path/to/input.csv");

        try {
            // do whatever you need to the rows here
            for (String[] row : rows) {
                // then write each one to the output
                writer.writeRow(row);
            }
        } finally {
            // cleanup
            rows.getContext().stop();
            writer.close();
        }
    }
If all you want is to read the data, modify it and write it back to another file, you can just do this:
    public static void main(String... args) throws IOException {
        CsvParserSettings parserSettings = new CsvParserSettings();
        parserSettings.setProcessor(new AbstractRowProcessor() {
            @Override
            public void rowProcessed(String[] row, ParsingContext context) {
                // modify the row data here.
            }
        });

        CsvWriterSettings writerSettings = new CsvWriterSettings();
        CsvRoutines routines = new CsvRoutines(parserSettings, writerSettings);

        FileReader input = new FileReader("c:/path/to/input.csv");
        FileWriter output = new FileWriter("c:/path/to/output.csv");

        routines.parseAndWrite(input, output);
    }
Hope this helps.
Disclaimer: I'm the author of this library. It's open source and free (Apache 2.0 license).
forandwhileloops: dividedList.add(removeWhiteSpace(row[0].split(",")));