13

I'm trying to load two big CSV files into Node.js; the first one is 257,597 KB and the second one 104,330 KB. I'm using the filesystem (fs) and csv modules. Here's my code:

fs.readFile('path/to/my/file.csv', (err, data) => {
  if (err) {
    console.error(err)
  } else {
    csv.parse(data, (err, dataParsed) => {
      if (err) {
        console.error(err)
      } else {
        myData = dataParsed
        console.log('csv loaded')
      }
    })
  }
})

And after ages (1-2 hours) it just crashes with this error message:

<--- Last few GCs --->

[1472:0000000000466170]  4366473 ms: Mark-sweep 3935.2 (4007.3) -> 3935.2 (4007.3) MB, 5584.4 / 0.0 ms  last resort GC in old space requested
[1472:0000000000466170]  4371668 ms: Mark-sweep 3935.2 (4007.3) -> 3935.2 (4007.3) MB, 5194.3 / 0.0 ms  last resort GC in old space requested

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 000002BDF12254D9 <JSObject>
    1: stringSlice(aka stringSlice) [buffer.js:590] [bytecode=000000810336DC91 offset=94](this=000003512FC822D1 <undefined>,buf=0000007C81D768B9 <Uint8Array map = 00000352A16C4D01>,encoding=000002BDF1235F21 <String[4]: utf8>,start=0,end=263778854)
    2: toString [buffer.js:664] [bytecode=000000810336D8D9 offset=148](this=0000007C81D768B9 <Uint8Array map = 00000352A16C4D01>,encoding=000002BDF1...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: node::DecodeWrite
 2: node_module_register
 3: v8::internal::FatalProcessOutOfMemory
 4: v8::internal::FatalProcessOutOfMemory
 5: v8::internal::Factory::NewRawTwoByteString
 6: v8::internal::Factory::NewStringFromUtf8
 7: v8::String::NewFromUtf8
 8: std::vector<v8::CpuProfileDeoptFrame,std::allocator<v8::CpuProfileDeoptFrame> >::vector<v8::CpuProfileDeoptFrame,std::allocator<v8::CpuProfileDeoptFrame> >
 9: v8::internal::wasm::SignatureMap::Find
10: v8::internal::Builtins::CallableFor
11: v8::internal::Builtins::CallableFor
12: v8::internal::Builtins::CallableFor
13: 00000081634043C1

The bigger file gets loaded, but Node runs out of memory on the other one. It's probably easy to allocate more memory, but the main issue here is the loading time, which seems very long given the size of the files. So what is the correct way to do it? For comparison, Python loads these CSVs really fast with pandas (3-5 seconds).
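(For what it's worth, allocating more memory would presumably just mean raising V8's heap limit with the --max-old-space-size flag, something like the line below with a placeholder script name, but that wouldn't fix the slowness.)

node --max-old-space-size=8192 load-csv.js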

5 Answers

25

Streams work perfectly; it took only 3-5 seconds:

var fs = require('fs')
var csv = require('csv-parser')
var data = []

fs.createReadStream('path/to/my/data.csv')
  .pipe(csv())
  .on('data', function (row) {
    data.push(row)
  })
  .on('end', function () {
    console.log('Data loaded')
  })

2 Comments

The read stream is also breaking.
I think the data array here will end up storing every row of the file, so it ultimately holds the whole file in one variable anyway. Instead of that, the user can perform some task with the data directly as it streams, for example a DB operation.
14

fs.readFile will load the entire file into memory, but fs.createReadStream will read the file in chunks of the size you specify.

This will prevent it from running out of memory.
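A minimal sketch of that idea, with an arbitrary 64 KB chunk size set via the highWaterMark option and a hypothetical file path:

const fs = require('fs');

// Read the file in 64 KB chunks instead of loading it all at once.
const stream = fs.createReadStream('path/to/my/file.csv', {
  highWaterMark: 64 * 1024 // chunk size in bytes
});

let bytes = 0;
stream.on('data', (chunk) => {
  bytes += chunk.length; // chunk is a Buffer of at most 64 KB
});
stream.on('end', () => {
  console.log('Done, read ' + bytes + ' bytes');
});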


4

You may want to stream the CSV, instead of reading it all at once:
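A minimal sketch of that approach, assuming a recent version of the csv-parse package and a hypothetical file path:

const fs = require('fs');
const { parse } = require('csv-parse');

const rows = [];

fs.createReadStream('path/to/my/file.csv')
  .pipe(parse({ columns: true, skip_empty_lines: true })) // one object per row, keyed by header
  .on('data', (row) => {
    rows.push(row); // or process each row here instead of accumulating
  })
  .on('end', () => {
    console.log('csv loaded, ' + rows.length + ' rows');
  })
  .on('error', (err) => {
    console.error(err);
  });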

1 Comment

Beware: I tried to use csv-parse once, but I was not able to throttle the readable event; the parser read really fast and I had to allocate a lot of RAM for it. That could be tricky for CSV files around 1 GB... If I had to retry, I would look for a Promise-based library, or one able to handle a promise/callback.
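If throttling is the issue, one possible workaround (a sketch, not necessarily what was tried here) is to consume the parser with for await...of, so the stream is paused while each row is handled:

const fs = require('fs');
const { parse } = require('csv-parse');

async function processCsv(filepath) {
  const parser = fs.createReadStream(filepath).pipe(parse({ columns: true }));

  // Async iteration pauses the stream while each row is awaited,
  // so memory stays bounded even for very large files.
  for await (const row of parser) {
    await handleRow(row); // hypothetical per-row work, e.g. a DB insert
  }
}

// hypothetical placeholder for whatever should happen with each row
async function handleRow(row) {
  console.log(row);
}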
0
const fs = require('fs');
const { parse } = require('csv-parse'); // csv-parse provides the parse() stream used below
// journeyHeaders, journeyValidation, Journey and logger are assumed to be defined elsewhere in the project.

const parseOptions = (chunkSize, count) => {
  let parseObjList = [];
  for (let i = 0; i < (count / chunkSize); i++) {
    const from_line = (i * chunkSize) + 1;
    const to_line = (i + 1) * chunkSize;
    let parseObj = {
      delimiter: ',',
      from_line: from_line,
      to_line: to_line,
      skip_empty_lines: true
    };
    parseObjList.push(parseObj);
  }
  return parseObjList;
}

function parseJourney(filepath) {
  let chunksize = 10;
  const count = fs.readFileSync(filepath, 'utf8').split('\n').length - 1;
  const parseObjList = parseOptions(chunksize, count);
  for (let i = 0; i < parseObjList.length; i++) {
    fs.createReadStream(filepath)
      .pipe(parse(parseObjList[i]))
      .on('data', function (row) {
        let journey_object = {};
        if (journeyValidation(row)) {
          journeyHeaders.forEach((columnName, idx) => {
            journey_object[columnName] = row[idx];
          });
          logger.info(journey_object);
          Journey.create(journey_object).catch(error => {
            logger.error(error);
          });
        } else {
          logger.error('Incorrect data type in this row: ' + row);
        }
      })
      .on('end', function () {
        logger.info('finished');
      })
      .on('error', function (error) {
        logger.error(error.message);
      });
  }
}

Call the function by passing the file path to it:

parseJourney('./filePath.csv') 


0
const fs = require('fs');
const csv = require('csv-parser');
const database = require('./your-database-module'); // Replace with your database module

const data = [];

fs.createReadStream('file.csv')
  .pipe(csv())
  .on('data', async (row) => {
    data.push(row);
    if (data.length > 5 && data.length < 10) {
      if (row['Subscription Date'].includes('2020')) {
        // Perform CRUD operation with database
        try {
          // Example: Insert row into database
          await database.insertRow(row); // Replace with your database insert operation
          console.log('Row inserted:', row);
        } catch (error) {
          console.error('Error inserting row:', error);
        }
      }
    }
  })
  .on('end', () => {
    console.log('Data loaded');
  });

This reads the data from the stream, checks for some specific values, and performs a CRUD operation with the database for the matching rows.

