
I have a parser that works fine on smaller files of approx. 60,000 lines or less, but I have to parse a CSV file with over 10 million lines, and this method just isn't working: it hangs for 10 seconds every 100,000 lines, and I assume it's the split method. Is there a faster way to parse data from a CSV into a string array?

Code in question:

```java
String[][] events = new String[rows][columns];
Scanner sc = new Scanner(csvFileName);
int j = 0;
while (sc.hasNext()) {
    events[j] = sc.nextLine().split(",");
    j++;
}
```
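As a lighter-weight sketch (assuming plain, unquoted fields), `BufferedReader.readLine()` avoids `Scanner`'s regex-driven tokenizing, which is a common source of this kind of slowdown. The class and method names below are just for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class FastCsvRead {
    // Reads lines with BufferedReader and splits each one;
    // the -1 limit keeps trailing empty fields instead of dropping them.
    static List<String[]> readAll(BufferedReader reader) throws IOException {
        List<String[]> rows = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            rows.add(line.split(",", -1));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a FileReader over the real CSV file.
        String csv = "a,b,c\n1,2,3\n";
        List<String[]> rows = readAll(new BufferedReader(new StringReader(csv)));
        System.out.println(rows.size() + " rows; first cell = " + rows.get(0)[0]);
    }
}
```

Note also that `new Scanner(String)` scans the string's characters, not a file with that name, so the snippet above presumably passes a `File` rather than a `String` path.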
  • Are you sure you want to keep that many entries in memory at the same time? Commented Jun 14, 2015 at 8:01

3 Answers


Your code won't parse CSV files reliably. What if a value contains a ',' or a line separator? It is also very slow.
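To illustrate the failure mode, a quoted field containing a comma defeats a plain `split(",")`. A minimal demo, assuming RFC 4180-style quoting:

```java
public class SplitPitfall {
    public static void main(String[] args) {
        // RFC 4180 allows commas inside quoted fields,
        // so this line has three logical fields:
        String line = "1,\"Doe, John\",42";
        String[] parts = line.split(",");
        // Naive split breaks the quoted field apart,
        // yielding 4 pieces instead of 3.
        System.out.println(parts.length); // prints 4
    }
}
```

A correct parser has to track quote state across commas (and across line breaks), which is exactly what the libraries below do.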

Get uniVocity-parsers to parse your files. It is 3 times faster than Apache Commons CSV, has many more features, and we use it to process files with billions of rows.

To parse all rows into a list of Strings:

```java
CsvParserSettings settings = new CsvParserSettings(); // lots of options here, check the documentation
CsvParser parser = new CsvParser(settings);
List<String[]> allRows = parser.parseAll(new FileReader(new File("path/to/input.csv")));
```

Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).



As a rule of thumb, using libraries is usually more efficient than in-house development. Several libraries provide reading and parsing of CSV files; one of the more popular is Apache Commons CSV.



You might want to try a library I've just released: sesseltjonna-csv

It dynamically generates a CSV parser plus databinding at runtime using ASM, for improved performance.

