
I want to parse a large CSV file as quickly and efficiently as possible.

Currently, I am using the openCSV library to parse my CSV file, but it takes approximately 10 seconds to parse a file with 10,776 records and 24 headings, and I want to parse a CSV file with millions of records.

<dependency>
    <groupId>com.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>4.1</version>
</dependency>

I am parsing with the openCSV library using the code snippet below.

public <T> List<T> convertStreamtoObject(InputStream inputStream, Class<T> clazz) throws IOException {
    HeaderColumnNameMappingStrategy<T> ms = new HeaderColumnNameMappingStrategy<>();
    ms.setType(clazz);
    // try-with-resources closes the reader (and the wrapped stream) even if parsing throws
    try (Reader reader = new InputStreamReader(inputStream)) {
        CsvToBean<T> cb = new CsvToBeanBuilder<T>(reader)
                .withType(clazz)
                .withMappingStrategy(ms)
                .withSkipLines(0)
                .withSeparator('|')
                .withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS)
                .withThrowExceptions(true)
                .build();
        return cb.parse();
    }
}

I am looking for suggestions for another way to parse a CSV file with millions of records in less time.
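One thing worth checking before switching libraries: CsvToBean.parse() materializes the entire result as a single list, so for millions of rows the memory footprint and GC pressure can dominate the run time. In openCSV 4.x, CsvToBean is Iterable, so the beans can be consumed one at a time instead. A minimal sketch of that approach, reusing the builder settings from the question (the Consumer callback and class name are illustrative, not from the original code):

import com.opencsv.bean.CsvToBean;
import com.opencsv.bean.CsvToBeanBuilder;
import com.opencsv.bean.HeaderColumnNameMappingStrategy;
import com.opencsv.enums.CSVReaderNullFieldIndicator;

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.function.Consumer;

public class StreamingOpenCsv {

    // Consumes beans one at a time instead of materializing the whole list,
    // so memory use stays flat regardless of the number of records.
    public static <T> void streamBeans(InputStream in, Class<T> clazz, Consumer<T> consumer) throws IOException {
        HeaderColumnNameMappingStrategy<T> ms = new HeaderColumnNameMappingStrategy<>();
        ms.setType(clazz);
        try (Reader reader = new InputStreamReader(in)) {
            CsvToBean<T> csvToBean = new CsvToBeanBuilder<T>(reader)
                    .withType(clazz)
                    .withMappingStrategy(ms)
                    .withSeparator('|')
                    .withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS)
                    .withThrowExceptions(true)
                    .build();
            for (T bean : csvToBean) { // lazy, row-by-row iteration
                consumer.accept(bean);
            }
        }
    }
}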

--- Update ---

Reader reader = new InputStreamReader(in);
CSVParser csvParser = new CSVParser(reader, CSVFormat.DEFAULT
        .withFirstRecordAsHeader()
        .withDelimiter('|')
        .withIgnoreHeaderCase()
        .withTrim());
List<CSVRecord> recordList = csvParser.getRecords();
for (CSVRecord csvRecord : recordList) {
    csvRecord.get("headername");
}
  • Try BufferedInputStreamReader Commented Jun 5, 2019 at 5:29
  • @K.Nicholas I’m very sure that openCSV is smart enough to enable buffering one way or another if needed. Commented Jun 5, 2019 at 9:32
  • 2
    @K.Nicholas but you are the one who supposed to use BufferedInputStreamReader, which doesn’t gain anything, unless you assume that openCSV fails to enable buffering on its own. I just looked it up, this.br = (reader instanceof BufferedReader ? (BufferedReader) reader : new BufferedReader(reader));, so the OP doesn’t need to test with any buffered stream or reader, openCSV does already do that… Commented Jun 5, 2019 at 13:23
  • 1
    @K.Nicholas what is better, letting the OP try something that’s predictably no solution, or no answer at all? I don’t know, whether a better performance is possible in the OP’s case and where the bottleneck lies. That’s what profiling tools are for. Perhaps, it’s not the I/O but the Reflection magic that converts the CSV lines to instances of the Class argument. Perhaps, a different library performs better. Not enough information to answer that. The only thing that can be said for sure, is that additional buffering won’t help. Commented Jun 5, 2019 at 13:34
  • 1
    I added an Answer to this original of your duplicate Question. I used Apache Commons CSV to write and read/parse a million rows. The rows were similar to what you describe: 24 columns of an integer, an Instant, and 22 UUID columns as canonical hex strings. Takes 10 seconds to merely read the 850 meg file, and another two to parse the cell values back to objects. Doing ten thousand took about half a second versus the 10 seconds your reported, a time savings of 20-fold faster. Commented Jun 6, 2019 at 5:09
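Before reaching for a full profiler, a crude way to separate I/O cost from parsing cost is to time a plain line-by-line read of the same file and compare it with the full parse time. A minimal sketch, assuming a local file (the path is hypothetical):

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadTimingSketch {

    // Times how long it takes just to stream the file line by line, with no
    // parsing at all; if this is close to the full parse time, I/O dominates,
    // otherwise the bean conversion is the bottleneck.
    public static void main(String[] args) throws IOException {
        Path file = Paths.get("records.csv"); // hypothetical path
        long start = System.nanoTime();
        long lines = 0;
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            while (reader.readLine() != null) {
                lines++;
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(lines + " lines read in " + elapsedMs + " ms");
    }
}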

1 Answer


Using Apache Commons CSV:

Reader reader = new InputStreamReader(in);
CSVParser csvParser = new CSVParser(reader, CSVFormat.DEFAULT
        .withFirstRecordAsHeader()
        .withDelimiter('|')
        .withIgnoreHeaderCase()
        .withTrim());
List<CSVRecord> recordList = csvParser.getRecords();
for (CSVRecord csvRecord : recordList) {
    csvRecord.get("headername");
}
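A caveat with the snippet above: getRecords() loads every record into memory at once, which works against you on files with millions of rows. CSVParser is itself Iterable<CSVRecord>, so the records can be streamed instead. A minimal sketch, reusing the same format settings ("headername" is the placeholder column name from the question):

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class CommonsCsvStreaming {

    // Iterates records lazily instead of calling getRecords(), so memory
    // use stays constant regardless of file size; try-with-resources
    // closes the parser and the reader when iteration finishes.
    public static void stream(InputStream in) throws IOException {
        try (Reader reader = new InputStreamReader(in);
             CSVParser csvParser = new CSVParser(reader, CSVFormat.DEFAULT
                     .withFirstRecordAsHeader()
                     .withDelimiter('|')
                     .withIgnoreHeaderCase()
                     .withTrim())) {
            for (CSVRecord csvRecord : csvParser) { // lazy, row-by-row
                csvRecord.get("headername"); // placeholder column name
            }
        }
    }
}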