
uhop/stream-json


stream-json

stream-json is a micro-library of Node.js stream components for creating custom JSON processing pipelines with a minimal memory footprint. It can parse JSON files far exceeding available memory. Even individual data items (keys, strings, and numbers) can be streamed piece-wise. A SAX-inspired event-based API is included.
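The sketch below is a minimal, dependency-free round trip that shows what a SAX-like token stream looks like. `toTokens()` flattens an in-memory value into token objects, and `assemble()` rebuilds the value from those tokens one at a time using an explicit stack. Both functions are hypothetical illustrations, not stream-json's Disassembler or Assembler. The token names and the convention of carrying numbers as strings follow the project's documented token format, but treat the exact shape as an assumption. The sketch only emits packed values and does not produce piece-wise chunk tokens.

```javascript
// Hypothetical helper (not the library's Disassembler): flattens a
// JSON-compatible value into packed token objects. Token names mirror the
// stream-json token format; numbers are carried as strings (assumption).
function toTokens(value, out = []) {
  if (value === null) {
    out.push({name: 'nullValue', value: null});
  } else if (Array.isArray(value)) {
    out.push({name: 'startArray'});
    for (const item of value) toTokens(item, out);
    out.push({name: 'endArray'});
  } else if (typeof value === 'object') {
    out.push({name: 'startObject'});
    for (const [key, item] of Object.entries(value)) {
      out.push({name: 'keyValue', value: key});
      toTokens(item, out);
    }
    out.push({name: 'endObject'});
  } else if (typeof value === 'string') {
    out.push({name: 'stringValue', value});
  } else if (typeof value === 'number') {
    out.push({name: 'numberValue', value: String(value)});
  } else if (typeof value === 'boolean') {
    out.push({name: value ? 'trueValue' : 'falseValue', value});
  }
  return out;
}

// Hypothetical helper (not the library's Assembler): rebuilds a value from
// packed tokens, keeping only a stack of open containers in memory.
function assemble(tokens) {
  const stack = [];   // saved {current, key} of enclosing containers
  let current = null; // container being filled; null means top level
  let key = null;     // pending object key
  let result;
  const add = value => {
    if (current === null) result = value;
    else if (Array.isArray(current)) current.push(value);
    else current[key] = value;
  };
  for (const token of tokens) {
    switch (token.name) {
      case 'startObject':
      case 'startArray': {
        const container = token.name === 'startObject' ? {} : [];
        add(container);             // attach to the parent first
        stack.push({current, key}); // then descend into it
        current = container;
        key = null;
        break;
      }
      case 'endObject':
      case 'endArray':
        ({current, key} = stack.pop()); // return to the parent container
        break;
      case 'keyValue':
        key = token.value;
        break;
      case 'numberValue':
        add(parseFloat(token.value));
        break;
      case 'stringValue':
      case 'nullValue':
      case 'trueValue':
      case 'falseValue':
        add(token.value);
        break;
    }
  }
  return result;
}
```

Because `assemble()` only needs the current token plus the stack of open containers, the same logic can consume tokens as they arrive from a stream instead of from an array.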

Components:

  • Parser — streaming JSON parser producing a SAX-like token stream.
    • Optionally packs keys, strings, and numbers (controlled separately).
    • The main module creates a parser decorated with emit().
  • Filters edit a token stream:
    • Pick — selects matching subobjects, ignoring the rest.
    • Replace — substitutes matching subobjects with a replacement.
    • Ignore — removes matching subobjects entirely.
    • Filter — filters subobjects while preserving the JSON shape.
  • Streamers assemble tokens into JavaScript objects:
    • StreamValues — streams successive JSON values (for JSON Streaming or after pick()).
    • StreamArray — streams elements of a top-level array.
    • StreamObject — streams top-level properties of an object.
  • Essentials:
    • Assembler — reconstructs JavaScript objects from tokens (EventEmitter).
    • Disassembler — converts JavaScript objects into a token stream.
    • Stringer — converts a token stream back into JSON text.
    • Emitter — re-emits tokens as named events.
  • Utilities:
    • emit() — attaches token events to any stream.
    • withParser() — creates parser + component pipelines.
    • Batch — groups items into arrays.
    • Verifier — validates JSON text, pinpoints errors.
    • FlexAssembler — Assembler with custom containers (Map, Set, etc.) at specific paths.
    • Utf8Stream — sanitizes multibyte UTF-8 input.
  • JSONL (JSON Lines / NDJSON):
    • jsonl/Parser — parses JSONL into {key, value} objects. Faster than parser({jsonStreaming: true}) + streamValues() when items fit in memory.
    • jsonl/Stringer — serializes objects to JSONL text. Faster than disassembler() + stringer().
  • JSONC (JSON with Comments):
    • jsonc/Parser — streaming JSONC parser with comment and whitespace tokens.
    • jsonc/Stringer — converts JSONC token streams back to text.
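The core idea behind the JSONL components (one JSON value per line, emitted as `{key, value}`) can be sketched with Node's built-in `readline` module. `readJsonl()` is a hypothetical helper, not the library's `jsonl/Parser`. Using a 0-based running counter as `key` and skipping blank lines are choices made for this sketch only.

```javascript
const {Readable} = require('stream');
const readline = require('readline');

// Hypothetical reader: yields {key, value} for each non-empty line of a
// JSONL stream. key is a 0-based item counter (a choice of this sketch).
async function* readJsonl(input) {
  const lines = readline.createInterface({input, crlfDelay: Infinity});
  let key = 0;
  for await (const line of lines) {
    if (!line.trim()) continue; // tolerate blank lines in this sketch
    yield {key: key++, value: JSON.parse(line)};
  }
}

// Example source: chunks may split a line; readline reassembles it.
// readJsonl(Readable.from(['{"a":1}\n[2,', '3]\n']))
```

Each line is parsed whole with `JSON.parse`, so this approach only works when every individual item fits in memory, the same trade-off stated for `jsonl/Parser`.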

All components are building blocks for custom data processing pipelines. They can be combined with each other and with custom code via stream-chain.
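Custom code usually enters a pipeline as a plain function. The sketch below shows one way such a function can be wrapped into an object-mode stream using only Node's `stream` module. `fromFunction()` is hypothetical and does not describe how stream-chain is implemented. In this sketch the function may be sync or async, and a `null` or `undefined` result drops the item, the same filtering style the Introduction example relies on.

```javascript
const {Transform} = require('stream');

// Hypothetical adapter: wraps a plain (sync or async) function into an
// object-mode Transform. null/undefined results are dropped.
function fromFunction(fn) {
  return new Transform({
    objectMode: true,
    transform(chunk, encoding, callback) {
      Promise.resolve()
        .then(() => fn(chunk)) // also catches synchronous throws
        .then(value => {
          if (value !== null && value !== undefined) this.push(value);
          callback();
        }, callback); // forward errors to the stream
    }
  });
}
```

Stages built this way can be placed between any object-mode streams, for example with Node's `stream.pipeline()`.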

Distributed under the New BSD license.

Introduction

```js
const {chain} = require('stream-chain');
const {parser} = require('stream-json');
const {pick} = require('stream-json/filters/pick.js');
const {ignore} = require('stream-json/filters/ignore.js');
const {streamValues} = require('stream-json/streamers/stream-values.js');

const fs = require('fs');
const zlib = require('zlib');

const pipeline = chain([
  fs.createReadStream('sample.json.gz'),
  zlib.createGunzip(),
  parser(),
  pick({filter: 'data'}),
  ignore({filter: /\b_meta\b/i}),
  streamValues(),
  data => {
    const value = data.value;
    // keep data only for the accounting department
    return value && value.department === 'accounting' ? data : null;
  }
]);

let counter = 0;
pipeline.on('data', () => ++counter);
pipeline.on('end', () =>
  console.log(`The accounting department has ${counter} employees.`));
```

See the full documentation in the wiki.

Companion projects:

  • stream-csv-as-json streams huge CSV files in a format compatible with stream-json: rows as arrays of string values. If a header row is used, it can stream rows as objects with named fields.

Installation

```sh
npm install --save stream-json
# or: yarn add stream-json
```

Use

The library is organized as small composable components based on Node.js streams and events. The source code is compact — read it to understand how things work and to build your own components.
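As an example of building your own component, here is a minimal sketch of a batching stage in the spirit of the Batch utility listed under Utilities. `batchOf()` is hypothetical and written against Node's `Transform` API only. It is not the library's source. It collects objects into arrays of up to `size` items and flushes the remainder when the input ends.

```javascript
const {Transform} = require('stream');

// Hypothetical component: groups incoming objects into arrays of up to
// `size` items; a final partial batch is emitted when input ends.
function batchOf(size) {
  let buffer = [];
  return new Transform({
    objectMode: true,
    transform(chunk, encoding, callback) {
      buffer.push(chunk);
      if (buffer.length >= size) {
        this.push(buffer); // emit a full batch
        buffer = [];
      }
      callback();
    },
    flush(callback) {
      if (buffer.length) this.push(buffer); // emit the leftover items
      callback();
    }
  });
}
```

Batching like this is typically useful in front of sinks that prefer bulk operations, such as database inserts.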

Bug reports, simplifications, and new generic components are welcome — open a ticket or pull request.

Release History

  • 2.0.0 major rewrite: functional API based on stream-chain 3.x, bundled TypeScript definitions. New: JSONC parser/stringer, FlexAssembler. See Migrating from 1.x to 2.x.
  • 1.9.1 fixed a race condition in the Disassembler stream implementation. Thx, Noam Okman.
  • 1.9.0 fixed a slight deviation from the JSON standard. Thx, Peter Burns.
  • 1.8.0 added an option to indicate/ignore JSONL errors. Thx, AK.
  • 1.7.5 fixed a stringer bug with ASCII control symbols. Thx, Kraicheck.

The full history is in the wiki: Release history.

