13

I'm trying to load two big CSV files into Node.js; the first one is 257,597 KB and the second one 104,330 KB. I'm using the filesystem (fs) and csv modules. Here's my code:

fs.readFile('path/to/my/file.csv', (err, data) => {
  if (err) {
    console.error(err)
  } else {
    csv.parse(data, (err, dataParsed) => {
      if (err) {
        console.error(err)
      } else {
        myData = dataParsed
        console.log('csv loaded')
      }
    })
  }
})

And after ages (1-2 hours) it just crashes with this error message:

<--- Last few GCs --->

[1472:0000000000466170]  4366473 ms: Mark-sweep 3935.2 (4007.3) -> 3935.2 (4007.3) MB, 5584.4 / 0.0 ms  last resort GC in old space requested
[1472:0000000000466170]  4371668 ms: Mark-sweep 3935.2 (4007.3) -> 3935.2 (4007.3) MB, 5194.3 / 0.0 ms  last resort GC in old space requested

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 000002BDF12254D9 <JSObject>
    1: stringSlice(aka stringSlice) [buffer.js:590] [bytecode=000000810336DC91 offset=94](this=000003512FC822D1 <undefined>,buf=0000007C81D768B9 <Uint8Array map = 00000352A16C4D01>,encoding=000002BDF1235F21 <String[4]: utf8>,start=0,end=263778854)
    2: toString [buffer.js:664] [bytecode=000000810336D8D9 offset=148](this=0000007C81D768B9 <Uint8Array map = 00000352A16C4D01>,encoding=000002BDF1...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: node::DecodeWrite
 2: node_module_register
 3: v8::internal::FatalProcessOutOfMemory
 4: v8::internal::FatalProcessOutOfMemory
 5: v8::internal::Factory::NewRawTwoByteString
 6: v8::internal::Factory::NewStringFromUtf8
 7: v8::String::NewFromUtf8
 8: std::vector<v8::CpuProfileDeoptFrame,std::allocator<v8::CpuProfileDeoptFrame> >::vector<v8::CpuProfileDeoptFrame,std::allocator<v8::CpuProfileDeoptFrame> >
 9: v8::internal::wasm::SignatureMap::Find
10: v8::internal::Builtins::CallableFor
11: v8::internal::Builtins::CallableFor
12: v8::internal::Builtins::CallableFor
13: 00000081634043C1

The bigger file gets loaded, but Node runs out of memory on the other one. It's probably easy to allocate more memory, but the main issue here is the loading time, which seems very long given the size of the files. So what is the correct way to do it? For comparison, Python loads these CSVs really fast with pandas (3-5 seconds).
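(For what it's worth, allocating more memory would presumably just mean raising V8's heap limit with the --max-old-space-size flag, something like the line below with a placeholder script name, but that wouldn't fix the slowness.)

node --max-old-space-size=8192 load-csv.js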

5 Answers

25

Streams work perfectly; it took only 3-5 seconds:

var fs = require('fs')
var csv = require('csv-parser')
var data = []

fs.createReadStream('path/to/my/data.csv')
  .pipe(csv())
  .on('data', function (row) {
    data.push(row)
  })
  .on('end', function () {
    console.log('Data loaded')
  })

2 Comments

The read stream is also breaking.
I think the data array here will end up storing every row of the file, so it ultimately holds the whole file in one variable anyway. Instead of that, the user can perform some task with the data directly as it streams, for example a DB operation.
14

fs.readFile will load the entire file into memory, but fs.createReadStream will read the file in chunks of the size you specify.

This will prevent it from running out of memory.
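A minimal sketch of that idea, with an arbitrary 64 KB chunk size set via the highWaterMark option and a hypothetical file path:

const fs = require('fs');

// Read the file in 64 KB chunks instead of loading it all at once.
const stream = fs.createReadStream('path/to/my/file.csv', {
  highWaterMark: 64 * 1024 // chunk size in bytes
});

let bytes = 0;
stream.on('data', (chunk) => {
  bytes += chunk.length; // chunk is a Buffer of at most 64 KB
});
stream.on('end', () => {
  console.log('Done, read ' + bytes + ' bytes');
});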


4

You may want to stream the CSV, instead of reading it all at once:
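A minimal sketch of that approach, assuming a recent version of the csv-parse package and a hypothetical file path:

const fs = require('fs');
const { parse } = require('csv-parse');

const rows = [];

fs.createReadStream('path/to/my/file.csv')
  .pipe(parse({ columns: true, skip_empty_lines: true })) // one object per row, keyed by header
  .on('data', (row) => {
    rows.push(row); // or process each row here instead of accumulating
  })
  .on('end', () => {
    console.log('csv loaded, ' + rows.length + ' rows');
  })
  .on('error', (err) => {
    console.error(err);
  });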

1 Comment

Beware: I tried to use csv-parse once, but I was not able to throttle the readable event; the parser read really fast and I had to allocate a lot of RAM for it. That could be tricky for CSV files around 1 GB... If I had to retry, I would look for a Promise-based library, or one able to handle a promise/callback.
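If throttling is the issue, one possible workaround (a sketch, not necessarily what was tried here) is to consume the parser with for await...of, so the stream is paused while each row is handled:

const fs = require('fs');
const { parse } = require('csv-parse');

async function processCsv(filepath) {
  const parser = fs.createReadStream(filepath).pipe(parse({ columns: true }));

  // Async iteration pauses the stream while each row is awaited,
  // so memory stays bounded even for very large files.
  for await (const row of parser) {
    await handleRow(row); // hypothetical per-row work, e.g. a DB insert
  }
}

// hypothetical placeholder for whatever should happen with each row
async function handleRow(row) {
  console.log(row);
}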
0
const fs = require('fs');
const { parse } = require('csv-parse'); // csv-parse provides the parse() stream used below
// journeyHeaders, journeyValidation, Journey and logger are assumed to be defined elsewhere in the project.

const parseOptions = (chunkSize, count) => {
  let parseObjList = [];
  for (let i = 0; i < (count / chunkSize); i++) {
    const from_line = (i * chunkSize) + 1;
    const to_line = (i + 1) * chunkSize;
    let parseObj = {
      delimiter: ',',
      from_line: from_line,
      to_line: to_line,
      skip_empty_lines: true
    };
    parseObjList.push(parseObj);
  }
  return parseObjList;
}

function parseJourney(filepath) {
  let chunksize = 10;
  const count = fs.readFileSync(filepath, 'utf8').split('\n').length - 1;
  const parseObjList = parseOptions(chunksize, count);
  for (let i = 0; i < parseObjList.length; i++) {
    fs.createReadStream(filepath)
      .pipe(parse(parseObjList[i]))
      .on('data', function (row) {
        let journey_object = {};
        if (journeyValidation(row)) {
          journeyHeaders.forEach((columnName, idx) => {
            journey_object[columnName] = row[idx];
          });
          logger.info(journey_object);
          Journey.create(journey_object).catch(error => {
            logger.error(error);
          });
        } else {
          logger.error('Incorrect data type in this row: ' + row);
        }
      })
      .on('end', function () {
        logger.info('finished');
      })
      .on('error', function (error) {
        logger.error(error.message);
      });
  }
}

Call the function by passing the file path to it:

parseJourney('./filePath.csv') 


0
const fs = require('fs');
const csv = require('csv-parser');
const database = require('./your-database-module'); // Replace with your database module

const data = [];

fs.createReadStream('file.csv')
  .pipe(csv())
  .on('data', async (row) => {
    data.push(row);
    if (data.length > 5 && data.length < 10) {
      if (row['Subscription Date'].includes('2020')) {
        // Perform CRUD operation with database
        try {
          // Example: Insert row into database
          await database.insertRow(row); // Replace with your database insert operation
          console.log('Row inserted:', row);
        } catch (error) {
          console.error('Error inserting row:', error);
        }
      }
    }
  })
  .on('end', () => {
    console.log('Data loaded');
  });

This reads the data from the stream, checks for some specific values, and performs a CRUD operation with the database for the matching rows.

