
JSON to SQLite Converter πŸš€

A powerful browser-based tool to convert large JSON files to SQLite databases entirely client-side. Perfect for GitHub Pages deployment!

License: MIT · Next.js · TypeScript · Privacy First · No Server Required

✨ Features

  • πŸ”’ 100% Private: Your data never leaves your device - everything runs in the browser
  • ⚑ Streaming Architecture: Handles large JSON files (1GB+) without memory crashes
  • πŸš€ Fast Processing: Batched inserts with transactions for optimal performance
  • 🎯 Smart Schema Detection: Automatically detects columns and types from your data
  • πŸ”§ Customizable: Configure table name, batch size, and schema sample size
  • πŸ“Š Real-time Progress: Live progress tracking with detailed statistics
  • πŸ’Ύ Zero Cost: No server required - perfect for GitHub Pages
  • 🎨 Beautiful UI: Modern, responsive interface with smooth animations

🌟 Why This Implementation is Special

This isn't just another JSON converter - it's a production-grade streaming data pipeline running entirely in your browser. Here's what sets it apart:

🌊 True Streaming (SAX-style)

Unlike traditional converters that load the entire file into RAM, this implementation uses a streaming parser that processes data in chunks:

  • Chunk-based processing: Files are read in 64KB chunks via the File API
  • Incremental parsing: JSON objects are parsed as chunks arrive using @streamparser/json
  • Batch inserts: Objects are written to SQLite in configurable batches (default: 1000 rows)
  • Memory efficiency: A 1GB file can be converted on a device with only 2GB RAM without crashes

Technical Flow: File β†’ 64KB Chunks β†’ Stream Parser β†’ Objects β†’ Batch Buffer (1000 rows) β†’ SQLite
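The chunking stage of this flow can be sketched as a pure function. This is an illustrative helper (`chunkify` is not from the repository's code); the real tool does the equivalent with `File.slice()` in 64KB steps:

```typescript
// Illustrative sketch of the chunking stage: split a byte buffer into
// fixed-size chunks, as the tool does with the File API in 64KB steps.
// `chunkify` is a hypothetical helper, not code from this repository.
const CHUNK_SIZE = 64 * 1024; // 64KB, matching the tool's default

function chunkify(data: Uint8Array, chunkSize: number = CHUNK_SIZE): Uint8Array[] {
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    // subarray creates a view into the original buffer, so no bytes are copied
    chunks.push(data.subarray(offset, offset + chunkSize));
  }
  return chunks;
}
```

Because each chunk is a view rather than a copy, memory stays bounded by the chunk size regardless of file size.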

πŸš€ Zero UI Blocking

The entire processing pipeline runs in a Web Worker (separate thread), meaning:

  • Main thread stays free: The UI remains completely responsive during conversion
  • No freezing: You can interact with the page, view logs, and monitor progress in real-time
  • Module Worker: Uses modern ES6 modules with native import statements
  • True parallelism: Parser and database operations run independently of UI rendering

πŸ”„ Dynamic Schema Evolution

The converter doesn't require predefined schemas - it discovers and adapts on the fly:

  • Runtime column discovery: New fields like configs_extraData_sharpnessDenoise are detected during processing
  • On-the-fly schema updates: Columns are created dynamically using ALTER TABLE statements
  • Automatic backfilling: Previously inserted rows get NULL values for new columns
  • Nested object flattening: Deeply nested structures are automatically flattened to SQL columns
  • Type inference: Automatically detects INTEGER, REAL, and TEXT types from sample data

Example: If row 5,000 introduces a new field, the table schema updates automatically without reprocessing.
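A minimal sketch of that evolution step, assuming hypothetical helper names (this is not the repository's implementation): compare a flattened row against the known column set and emit the `ALTER TABLE` statements the worker would need to run.

```typescript
// Hypothetical sketch of dynamic column addition: given the known column
// set and a newly flattened row, produce the ALTER TABLE statements needed
// to evolve the schema. Helper names are illustrative, not from the codebase.
function newColumnStatements(
  table: string,
  known: Set<string>,
  row: Record<string, unknown>
): string[] {
  const statements: string[] = [];
  for (const [column, value] of Object.entries(row)) {
    if (!known.has(column)) {
      // Infer a SQLite type from the JS value (TEXT is the safe fallback)
      const type =
        typeof value === "number"
          ? (Number.isInteger(value) ? "INTEGER" : "REAL")
          : typeof value === "boolean" ? "INTEGER" : "TEXT";
      statements.push(`ALTER TABLE ${table} ADD COLUMN ${column} ${type}`);
      known.add(column);
    }
  }
  return statements;
}
```

Rows inserted before the `ALTER TABLE` simply read as NULL for the new column, which is SQLite's default behavior for added columns.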

πŸ” Privacy First

Your data stays 100% local - no servers, no uploads, no tracking:

  • Client-side only: All processing happens in your browser's JavaScript engine
  • WASM sandbox: SQLite runs in WebAssembly with no external access
  • No network calls: Data never leaves your device (after initial library loads)
  • Perfect for sensitive data: Medical records, financial data, personal information - all stays private
  • Offline capable: Full Progressive Web App (PWA) support - works completely offline after first visit
  • Install as app: Can be installed on your device and used like a native application

πŸ†š Comparison with Traditional Approaches

| Feature | This Tool | Traditional Server-Based | Python/CLI Tools |
|---|---|---|---|
| Privacy | ✅ Data never uploaded | ❌ Data sent to server | ✅ Local processing |
| Setup | ✅ Zero setup (just open URL) | ❌ Server required | ❌ Install Python + deps |
| Large Files | ✅ Streaming (1GB+ files) | ⚠️ Upload limits | ✅ Can handle large files |
| UI Blocking | ✅ Web Worker (responsive) | N/A | ❌ Blocks terminal |
| Cost | ✅ Free (GitHub Pages) | 💰 Server hosting costs | ✅ Free |
| Platform | ✅ Any modern browser | ⚠️ Server dependent | ❌ OS-specific install |
| Schema Evolution | ✅ Dynamic discovery | ⚠️ Often needs upfront schema | ⚠️ Varies by tool |
| Accessibility | ✅ Just share a link | ❌ Need server access | ❌ Need tool installed |

πŸ—οΈ Architecture

This tool implements a streaming "Bucket Brigade" architecture with three independent stages:

Data Flow Pipeline

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Main Thread β”‚ πŸ“ File Selection & UI β”‚ (React) β”‚ πŸ“Š Progress Display β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜ βš™οΈ Configuration β”‚ postMessage β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Web Worker β”‚ 🌊 Streaming JSON Parser (@streamparser/json) β”‚ (Module Type) β”‚ πŸ” Schema Detection & Evolution β”‚ β”‚ πŸ’Ύ SQLite Operations (SQL.js/WASM) β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ πŸ“¦ Batched Transaction Inserts β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ SQLite Database β”‚ πŸ’Ώ In-Memory WASM Database β”‚ (Binary) β”‚ ⬇️ Exported as .sqlite file β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ 

Processing Stages

  1. File Streaming (Main Thread)

    • Reads file in 64KB chunks using the File API
    • Sends raw chunk buffers to worker via postMessage
    • Non-blocking reads keep UI responsive
  2. Stream Parsing (Web Worker)

    • Uses @streamparser/json for true SAX-style parsing
    • Parses JSON incrementally without loading entire file
    • Handles partial objects across chunk boundaries
    • Emits complete objects for processing
  3. Schema Evolution (Web Worker)

    • Scans first N objects (default: 100) to build initial schema
    • Flattens nested objects into underscore-notation columns (user_address_city)
    • Detects data types (INTEGER, REAL, TEXT)
    • Dynamically adds columns when new fields appear
    • Backfills existing rows with NULL for new columns
  4. Batch Writing (Web Worker)

    • Buffers objects into batches (default: 1000 rows)
    • Wraps each batch in a SQLite transaction
    • Executes parameterized INSERT statements
    • Dramatically faster than individual inserts
  5. Export & Download (Main Thread)

    • Worker sends completed database as binary array
    • Main thread creates downloadable Blob
    • User downloads .sqlite file
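The batch-writing stage (step 4) can be sketched as plain SQL generation. This is an illustrative helper, not the worker's actual code; the real worker executes the resulting statements through SQL.js with bound parameters:

```typescript
// Sketch of the batch-write step: one parameterized INSERT per row, with
// the whole batch wrapped in a single transaction. `batchInsertSql` is a
// hypothetical helper that only builds the SQL strings.
function batchInsertSql(table: string, columns: string[], rowCount: number): string[] {
  const placeholders = columns.map(() => "?").join(", ");
  const insert = `INSERT INTO ${table} (${columns.join(", ")}) VALUES (${placeholders})`;
  // Grouping inserts inside BEGIN/COMMIT is what makes batching fast:
  // SQLite syncs once per transaction instead of once per row.
  return ["BEGIN TRANSACTION", ...Array(rowCount).fill(insert), "COMMIT"];
}
```

Wrapping inserts in a transaction is the main performance lever here, since SQLite commits once per batch instead of once per row.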

Technology Stack

  • Next.js 14: React framework with static export for GitHub Pages
  • TypeScript: Type-safe development with full IDE support
  • Tailwind CSS: Modern, responsive styling
  • SQL.js (1.10.3): SQLite compiled to WebAssembly for browser execution
  • @streamparser/json: Streaming JSON parser for true SAX-style parsing
  • Web Workers (Module): ES6 module worker with native imports
  • File API: Browser-native file reading without server upload
  • WebAssembly: Native-speed SQLite execution in browser sandbox

πŸš€ Quick Start

GitHub Pages Deployment

This repository is configured for automatic deployment to GitHub Pages:

  1. Fork this repository
  2. Go to Settings β†’ Pages
  3. Set Source to "GitHub Actions"
  4. Push to the main branch - the site will deploy automatically
  5. Visit https://yourusername.github.io/json-to-sqlite/

The deployment workflow runs automatically on every push to the main branch.

Local Development

```bash
# Clone the repository
git clone https://github.com/andreisugu/json-to-sqlite.git
cd json-to-sqlite

# Install dependencies
npm install

# Start development server, then open http://localhost:3000
npm run dev

# Build for production
npm run build
```

πŸ“– Usage

  1. Select JSON File: Click "Choose JSON File" and select your JSON file
  2. Configure Options:
    • Table Name: Name for your SQLite table (default: "data")
    • Schema Sample Size: Number of objects to scan for schema (default: 100)
    • Batch Size: Rows per transaction for performance (default: 1000)
  3. Start Conversion: Click "Start Conversion" and wait for processing
  4. Download: Once complete, download your SQLite database

πŸ’‘ Use Cases

This tool is perfect for:

πŸ₯ Sensitive Data Processing

  • Healthcare: Convert patient records to SQLite without HIPAA concerns
  • Financial: Process transaction data without uploading to servers
  • Legal: Handle confidential documents with complete privacy
  • Personal: Your diary, photos metadata, or browsing history stays local

πŸ“Š Data Analysis & Research

  • API exports: Convert API responses to queryable databases
  • Log analysis: Transform JSON logs into SQLite for SQL queries
  • Data migration: Move data between systems via universal SQLite format
  • Research data: Process survey results or experiment data offline

πŸš€ Development & Testing

  • Mock data: Convert JSON fixtures to SQLite test databases
  • Prototype databases: Quick database creation from JSON samples
  • Data exploration: Use SQL to explore complex JSON structures
  • CI/CD: Generate test databases in GitHub Actions (no server needed)

🌍 Offline & Low-Connectivity Scenarios

  • Field research: Convert data on laptops without internet
  • Remote locations: Process data where cloud access is limited
  • Air-gapped systems: Works on systems isolated from networks
  • Bandwidth constrained: No upload/download of large files to servers

πŸ“ Supported JSON Formats

The tool works with JSON arrays of objects:

```json
[
  { "id": 1, "name": "John Doe", "email": "john@example.com", "age": 30, "active": true },
  { "id": 2, "name": "Jane Smith", "email": "jane@example.com", "age": 25, "active": false }
]
```

Nested Objects

Nested objects are automatically flattened:

```json
{
  "user": {
    "name": "John",
    "address": { "city": "New York", "zip": "10001" }
  }
}
```

Becomes columns: user_name, user_address_city, user_address_zip

Arrays

Arrays are stored as JSON strings in the database.
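Both rules — underscore flattening for nested objects and JSON serialization for arrays — can be sketched in a few lines. `flatten` is an illustrative helper, not the repository's implementation:

```typescript
// Sketch of the flattening rules described above: nested objects become
// underscore-separated columns, and arrays are serialized to JSON strings.
// `flatten` is a hypothetical helper, not the repository's implementation.
function flatten(obj: Record<string, unknown>, prefix = ""): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    const column = prefix ? `${prefix}_${key}` : key;
    if (Array.isArray(value)) {
      out[column] = JSON.stringify(value); // arrays stored as JSON text
    } else if (value !== null && typeof value === "object") {
      // Recurse into nested objects, carrying the prefix down
      Object.assign(out, flatten(value as Record<string, unknown>, column));
    } else {
      out[column] = value; // primitives (and null) map directly to a column
    }
  }
  return out;
}
```

Applied to the nested example above, this yields the columns `user_name`, `user_address_city`, and `user_address_zip`.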

βš™οΈ Configuration

Batch Size

  • Small files (<10MB): 500-1000 rows
  • Medium files (10-100MB): 1000-2000 rows
  • Large files (>100MB): 2000-5000 rows

Larger batch sizes = faster processing but more memory usage.

Schema Sample Size

  • Consistent data: 50-100 objects
  • Variable data: 200-500 objects
  • Highly variable: 500-1000 objects

More samples = better schema detection but slower startup.

πŸ”§ Technical Details

Memory Management

The tool uses multiple strategies to handle files larger than available RAM:

  1. Chunked Reading: Files are read in 64KB chunks via FileReader.readAsArrayBuffer()
  2. Streaming Parsing: @streamparser/json library parses JSON incrementally using SAX-style events
  3. Batched Inserts: Rows are buffered and inserted in configurable batches (default: 1000 rows)
  4. Worker Threads: Heavy processing isolated in Web Worker to prevent main thread blocking
  5. Buffer Management: Efficient string buffer handles incomplete objects across chunk boundaries
  6. Transaction Batching: SQLite transactions group inserts for 50-100x performance improvement
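The batching strategy (points 3 and 6) can be sketched as a small buffer class. This is an illustrative sketch, not the worker's actual code; in the real pipeline the flush callback runs a SQLite transaction:

```typescript
// Sketch of the batch buffer: rows accumulate until the batch size is hit,
// then the whole batch is handed to a flush callback (in the real worker,
// a SQLite transaction). Illustrative class, not the actual implementation.
class BatchBuffer<T> {
  private rows: T[] = [];
  public flushes = 0;

  constructor(
    private batchSize: number,
    private onFlush: (batch: T[]) => void
  ) {}

  push(row: T): void {
    this.rows.push(row);
    if (this.rows.length >= this.batchSize) this.flush();
  }

  // Called automatically when full, and once more at end-of-stream
  flush(): void {
    if (this.rows.length === 0) return;
    this.onFlush(this.rows);
    this.rows = [];
    this.flushes++;
  }
}
```

Since the buffer never holds more than one batch of parsed objects, peak memory stays proportional to the batch size, not the file size.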

Memory Footprint (approximate, measured with Chrome DevTools):

  • Streaming parser overhead: ~10-20MB
  • Batch buffer: ~5-10MB (1000 objects)
  • SQLite in-memory database: Size of actual data + indexes
  • Total overhead: ~15-30MB (parser + buffer), plus the resulting database size

Schema Detection & Evolution

The schema system adapts dynamically as data is processed:

Initial Schema Building (First N objects):

  • Scans sample objects to discover all fields
  • Flattens nested objects using underscore notation (user_address_city)
  • Detects types based on JavaScript typeof and value patterns
  • Creates initial SQLite table with discovered columns

Dynamic Column Addition Process:

When a new field appears in row 5,000:

  1. Detect the new field (e.g., configs_extraData_sharpnessDenoise)
  2. Infer its type from the value
  3. Execute: ALTER TABLE data ADD COLUMN configs_extraData_sharpnessDenoise TEXT
  4. Continue processing (existing rows automatically read as NULL)

Type Detection Rules:

  • Numbers that are integers β†’ INTEGER type
  • Numbers with decimal points β†’ REAL type
  • Boolean values β†’ INTEGER type (stored as 0 for false, 1 for true)
  • Everything else β†’ TEXT type (including objects stored as JSON strings, arrays, null)
  • Type conflicts resolve to TEXT
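These rules can be sketched as two small functions. Helper names are hypothetical, not from the codebase:

```typescript
// Sketch of the type detection rules listed above, including booleans
// stored as INTEGER (0/1) and conflicts resolving to TEXT.
// `inferType` and `mergeTypes` are illustrative names, not from the codebase.
type SqlType = "INTEGER" | "REAL" | "TEXT";

function inferType(value: unknown): SqlType {
  if (typeof value === "number") {
    return Number.isInteger(value) ? "INTEGER" : "REAL";
  }
  if (typeof value === "boolean") return "INTEGER"; // stored as 0 or 1
  return "TEXT"; // strings, objects/arrays (as JSON strings), null
}

// When the same column has been seen with conflicting types, fall back to TEXT
function mergeTypes(a: SqlType, b: SqlType): SqlType {
  return a === b ? a : "TEXT";
}
```

Falling back to TEXT on conflict is the conservative choice: SQLite stores any value as text without loss, at the cost of losing numeric typing for that column.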

Performance

Typical performance on modern hardware (M1/M2 Mac, Ryzen 5000+, i7-11th gen+):

| File Size | Objects | Time | Speed |
|---|---|---|---|
| 10MB | ~10K | 5-10s | ~1MB/s |
| 100MB | ~100K | 30-60s | ~1.7MB/s |
| 500MB | ~500K | 2-5min | ~1.7-4MB/s |
| 1GB | ~1M | 5-10min | ~1.7-3MB/s |

Performance Factors:

Positive factors (increase speed):

  • βœ… Larger batch sizes (but use more memory)
  • βœ… Flat object structures (vs deeply nested)
  • βœ… Fewer columns in schema
  • βœ… Modern hardware (faster CPU/better browser engine)

Limiting factors (decrease speed):

  • ❌ Frequent schema changes (ALTER TABLE operations)
  • ❌ Very complex nested objects
  • ❌ Low memory conditions

Optimization Tips:

  • Use 2000-5000 batch size for files > 100MB
  • Reduce schema sample size if data is consistent
  • Close other browser tabs to free memory
  • Use Chrome/Edge for best performance (V8 engine optimizations)

⚠️ Limitations

Browser Memory

  • Maximum file size: ~1-2GB (depends on available browser memory)
  • Resulting database: Must fit in memory (~1-2GB)
  • Larger files may crash the browser tab

Browser Compatibility

  • βœ… Chrome 90+
  • βœ… Firefox 88+
  • βœ… Safari 14+
  • βœ… Edge 90+
  • ❌ Internet Explorer (not supported)

Requires:

  • Web Workers (module type support)
  • File API
  • WebAssembly
  • ES6 modules
  • Service Workers (for offline functionality)
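A simple capability check for these requirements could look like the sketch below. The helper takes an environment object so it stays testable; in the browser you would pass `globalThis` (note that `serviceWorker` actually lives on `navigator`). This is a hypothetical helper, not code from the repository:

```typescript
// Hypothetical feature check for the requirements listed above. Pass an
// environment object (e.g., globalThis merged with navigator) and get back
// the names of any missing features. Not the repository's actual code.
const REQUIRED_FEATURES = ["Worker", "File", "WebAssembly", "serviceWorker"] as const;

function missingFeatures(env: Record<string, unknown>): string[] {
  // A feature counts as present if the key exists on the environment object
  return REQUIRED_FEATURES.filter((name) => !(name in env));
}
```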

Progressive Web App (PWA) Support

This application is a full Progressive Web App with complete offline functionality:

Features:

  • βœ… Service Worker: Automatically caches all resources for offline use
  • βœ… Web App Manifest: Can be installed as a standalone app on any device
  • βœ… Offline-First: Works completely offline after first visit
  • βœ… Smart Caching: Network-first for HTML, cache-first for static assets
  • βœ… CDN Caching: External dependencies (SQL.js, JSON parser) are cached locally
  • βœ… Auto-Update: New versions are detected and installed automatically

How It Works:

  1. On first visit, the service worker caches all essential resources
  2. CDN dependencies (SQL.js, streaming parser, WASM) are cached
  3. Subsequent visits load instantly from cache
  4. Works completely offline - no internet needed after first load
  5. Updates are downloaded in the background and applied on next visit

Installation:

  • On mobile: Use "Add to Home Screen" from browser menu
  • On desktop: Look for install prompt in address bar or browser menu
  • Once installed, the app behaves like a native application

Cached Resources:

  • Application code (HTML, CSS, JavaScript)
  • Icons and manifest
  • Database worker script
  • External libraries: SQL.js, @streamparser/json
  • SQLite WebAssembly binary

This means you can:

  • πŸ”’ Convert sensitive files with zero network access
  • ✈️ Use the tool while traveling without internet
  • πŸ”οΈ Work in remote locations without connectivity
  • πŸš€ Experience instant loading after first visit
  • πŸ’Ύ Install as a dedicated app on your device

Module Workers: A Technical Achievement

This project uses ES6 Module Workers, which was non-trivial to implement:

The Challenge: Traditional web workers use importScripts(), which doesn't support modern ES modules. To use import statements for sql.js and @streamparser/json, we needed module workers.

The Solution:

On the main thread:

```js
// Main thread creates a module worker
const worker = new Worker('/workers/db-worker.js', { type: 'module' });
```

Inside the worker (actual code from db-worker.js):

```js
// Worker can use native ES6 imports
import initSqlJs from 'https://esm.sh/sql.js@1.10.3';
import { JSONParser } from 'https://esm.sh/@streamparser/json@0.0.22';
```

Security Note: The CDN imports shown are the actual implementation used in production. While convenient for quick deployment, using CDN dependencies has security implications:

  • βœ… Pros: Easy setup, no bundling needed, automatic caching
  • ⚠️ Cons: Dependency on third-party CDN uptime, potential supply chain risk

For production deployments with sensitive data or stricter security requirements, consider:

  • Hosting libraries locally within your repository
  • Using Subresource Integrity (SRI) hashes to verify CDN resources
  • Implementing Content Security Policy (CSP) headers
  • Reviewing and auditing the library source code

Benefits:

  • βœ… Modern import syntax instead of importScripts()
  • βœ… Direct use of ES module libraries
  • βœ… Better code organization and dependency management
  • βœ… Type-safe imports with TypeScript
  • βœ… Leverages CDN module conversion (esm.sh)

This allows the worker to use cutting-edge libraries while keeping the main thread completely free for UI operations.

πŸ› Troubleshooting

Debugging

The app includes comprehensive console logging. Open your browser's developer console (F12) to see:

  • Detailed processing information
  • Object structure and flattening
  • Schema detection steps
  • Batch insertion progress
  • Any errors or warnings

See DEBUGGING.md for a complete debugging guide.

"Out of Memory" Error

  • Reduce batch size
  • Try a smaller file
  • Close other tabs
  • Use a browser with more available memory

Slow Processing

  • Increase batch size
  • Reduce schema sample size
  • Ensure no other heavy processes are running

Invalid JSON

  • Ensure your JSON is valid (use a validator)
  • Check for unescaped special characters
  • Verify the JSON is an array of objects

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • SQL.js - SQLite compiled to WebAssembly
  • Inspired by the need for privacy-focused data processing tools

πŸ“§ Contact

Andrei Șugubete - @andreisugu

Project Link: https://github.com/andreisugu/json-to-sqlite


Made with ❀️ for the privacy-conscious developer community
