
I used an InputStream, but when parsing, a "," inside a quoted column is treated as a column separator. For example, for the input abc, xyz, "m,n" the parsed output is abc, xyz, m, n. Here m and n are treated as separate columns.

  • Perhaps java.io.StreamTokenizer is a possibility. Or a scanner generator like JFlex. You'd have to know how to set them up for the grammar of a CSV file, though; they're not "out-of-the-box" solutions. Commented Sep 13, 2017 at 10:31
  • What is the data structure of your file and what should you do with the results after parsing? How much memory can the program consume? Commented Sep 13, 2017 at 10:37
  • You don't need much memory to parse CSV. What you need memory for is to store it all. Solution: don't. Process it a line at a time. Commented Sep 13, 2017 at 12:01
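The quoting problem from the question can also be handled without a library. Below is a minimal, hand-rolled sketch of a quote-aware field splitter (the class and method names are hypothetical, not from any library mentioned here): commas inside double quotes stay part of the field, and doubled quotes are treated as escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLineSplitter {
    // Splits one CSV line into fields. A ',' inside double quotes is kept
    // as part of the field instead of starting a new column.
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // a doubled quote inside a quoted field is an escaped quote
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"');
                    i++;
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

With this, the line abc,xyz,"m,n" splits into three fields, with m,n staying together as one column. A real CSV file can contain embedded newlines inside quoted fields, which a per-line splitter like this does not handle; the libraries in the answers below do.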

2 Answers 2


There are many third-party CSV parsing libraries, such as:

  1. uniVocity-parsers

  2. Apache Commons CSV

  3. OpenCSV

  4. Super CSV

I am using the uniVocity CSV parser, which is very fast and can automatically detect the separator in rows. You can look through the CSV libraries listed above.



I really like the Apache Commons CSVParser. This is almost verbatim from their user guide:

Reader reader = new FileReader("input.csv");
final CSVParser parser = new CSVParser(reader, CSVFormat.DEFAULT);
try {
    for (final CSVRecord record : parser) {
        final String string = record.get("SomeColumn");
        ...
    }
} finally {
    parser.close();
    reader.close();
}

This is simple, configurable and line-oriented.

You could configure it like this:

final CSVParser parser = new CSVParser(reader, CSVFormat.DEFAULT.withHeader().withDelimiter(';')); 

For the record, this configuration is unnecessary, as CSVFormat.DEFAULT works exactly the way you want it to.

This would be my first attempt, to see whether it fits into memory. If it doesn't, could you be a little more specific about your low-memory requirements?

3 Comments

Thanks for replying. CSVParser loads the whole file into memory, and that is a problem. If the file size is 1 GB, then the memory consumption is already around 1 GB.
@somey CSVParser can do both: reading everything into memory, and reading record by record. See commons.apache.org/proper/commons-csv/apidocs/index.html
@somey how do you parse it? That part of the code could be reading things into memory too. Can you please show us how you do it? Also, you could connect jvisualvm and see what exactly consumes that much memory. Maybe a GC run is needed?
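As the comments suggest, the fix for the 1 GB problem is to process records one at a time instead of collecting them all. A minimal sketch of that pattern with plain java.io (class and file names are hypothetical, for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class StreamingCsvReader {
    // Reads a CSV file line by line. Only the current line is held in
    // memory, so heap usage stays roughly constant regardless of file size.
    public static long countRows(String path) throws IOException {
        long rows = 0;
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // process the record here instead of adding it to a list
                rows++;
            }
        }
        return rows;
    }
}
```

The same streaming approach works with Commons CSV, since iterating a CSVParser pulls records lazily from the underlying Reader; the memory blow-up only happens if you collect every record into a collection yourself.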
