
JSON to SQLite Converter πŸš€

A powerful browser-based tool to convert large JSON files to SQLite databases entirely client-side. Perfect for GitHub Pages deployment!

License: MIT · Next.js · TypeScript · Privacy First · No Server Required

✨ Features

  • πŸ”’ 100% Private: Your data never leaves your device - everything runs in the browser
  • ⚑ Streaming Architecture: Handles large JSON files (1GB+) without memory crashes
  • πŸš€ Fast Processing: Batched inserts with transactions for optimal performance
  • 🎯 Smart Schema Detection: Automatically detects columns and types from your data
  • πŸ”§ Customizable: Configure table name, batch size, and schema sample size
  • πŸ“Š Real-time Progress: Live progress tracking with detailed statistics
  • πŸ’Ύ Zero Cost: No server required - perfect for GitHub Pages
  • 🎨 Beautiful UI: Modern, responsive interface with smooth animations

🌟 Why This Implementation is Special

This isn't just another JSON converter - it's a production-grade streaming data pipeline running entirely in your browser. Here's what sets it apart:

🌊 True Streaming (SAX-style)

Unlike traditional converters that load the entire file into RAM, this implementation uses a streaming parser that processes data in chunks:

  • Chunk-based processing: Files are read in 64KB chunks via the File API
  • Incremental parsing: JSON objects are parsed as chunks arrive using @streamparser/json
  • Batch inserts: Objects are written to SQLite in configurable batches (default: 1000 rows)
  • Memory efficiency: A 1GB file can be converted on a device with only 2GB RAM without crashes

Technical Flow: File β†’ 64KB Chunks β†’ Stream Parser β†’ Objects β†’ Batch Buffer (1000 rows) β†’ SQLite
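The chunking stage of this flow can be sketched as a pure function. This is an illustrative helper (`chunkify` is not from the repository's code); the real tool does the equivalent with `File.slice()` in 64KB steps:

```typescript
// Illustrative sketch of the chunking stage: split a byte buffer into
// fixed-size chunks, as the tool does with the File API in 64KB steps.
// `chunkify` is a hypothetical helper, not code from this repository.
const CHUNK_SIZE = 64 * 1024; // 64KB, matching the tool's default

function chunkify(data: Uint8Array, chunkSize: number = CHUNK_SIZE): Uint8Array[] {
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    // subarray creates a view into the original buffer, so no bytes are copied
    chunks.push(data.subarray(offset, offset + chunkSize));
  }
  return chunks;
}
```

Because each chunk is a view rather than a copy, memory stays bounded by the chunk size regardless of file size.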

πŸš€ Zero UI Blocking

The entire processing pipeline runs in a Web Worker (separate thread), meaning:

  • Main thread stays free: The UI remains completely responsive during conversion
  • No freezing: You can interact with the page, view logs, and monitor progress in real-time
  • Module Worker: Uses modern ES6 modules with native import statements
  • True parallelism: Parser and database operations run independently of UI rendering

πŸ”„ Dynamic Schema Evolution

The converter doesn't require predefined schemas - it discovers and adapts on the fly:

  • Runtime column discovery: New fields like configs_extraData_sharpnessDenoise are detected during processing
  • On-the-fly schema updates: Columns are created dynamically using ALTER TABLE statements
  • Automatic backfilling: Previously inserted rows get NULL values for new columns
  • Nested object flattening: Deeply nested structures are automatically flattened to SQL columns
  • Type inference: Automatically detects INTEGER, REAL, and TEXT types from sample data

Example: If row 5,000 introduces a new field, the table schema updates automatically without reprocessing.
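A minimal sketch of that evolution step, assuming hypothetical helper names (this is not the repository's implementation): compare a flattened row against the known column set and emit the `ALTER TABLE` statements the worker would need to run.

```typescript
// Hypothetical sketch of dynamic column addition: given the known column
// set and a newly flattened row, produce the ALTER TABLE statements needed
// to evolve the schema. Helper names are illustrative, not from the codebase.
function newColumnStatements(
  table: string,
  known: Set<string>,
  row: Record<string, unknown>
): string[] {
  const statements: string[] = [];
  for (const [column, value] of Object.entries(row)) {
    if (!known.has(column)) {
      // Infer a SQLite type from the JS value (TEXT is the safe fallback)
      const type =
        typeof value === "number"
          ? (Number.isInteger(value) ? "INTEGER" : "REAL")
          : typeof value === "boolean" ? "INTEGER" : "TEXT";
      statements.push(`ALTER TABLE ${table} ADD COLUMN ${column} ${type}`);
      known.add(column);
    }
  }
  return statements;
}
```

Rows inserted before the `ALTER TABLE` simply read as NULL for the new column, which is SQLite's default behavior for added columns.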

πŸ” Privacy First

Your data stays 100% local - no servers, no uploads, no tracking:

  • Client-side only: All processing happens in your browser's JavaScript engine
  • WASM sandbox: SQLite runs in WebAssembly with no external access
  • No network calls: Data never leaves your device (after initial library loads)
  • Perfect for sensitive data: Medical records, financial data, personal information - all stays private
  • Offline capable: Full Progressive Web App (PWA) support - works completely offline after first visit
  • Install as app: Can be installed on your device and used like a native application

πŸ†š Comparison with Traditional Approaches

| Feature | This Tool | Traditional Server-Based | Python/CLI Tools |
|---|---|---|---|
| Privacy | ✅ Data never uploaded | ❌ Data sent to server | ✅ Local processing |
| Setup | ✅ Zero setup (just open URL) | ❌ Server required | ❌ Install Python + deps |
| Large Files | ✅ Streaming (1GB+ files) | ⚠️ Upload limits | ✅ Can handle large files |
| UI Blocking | ✅ Web Worker (responsive) | N/A | ❌ Blocks terminal |
| Cost | ✅ Free (GitHub Pages) | 💰 Server hosting costs | ✅ Free |
| Platform | ✅ Any modern browser | ⚠️ Server dependent | ❌ OS-specific install |
| Schema Evolution | ✅ Dynamic discovery | ⚠️ Often needs upfront schema | ⚠️ Varies by tool |
| Accessibility | ✅ Just share a link | ❌ Need server access | ❌ Need tool installed |

πŸ—οΈ Architecture

This tool implements a streaming "Bucket Brigade" architecture with three independent stages:

Data Flow Pipeline

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Main Thread β”‚ πŸ“ File Selection & UI β”‚ (React) β”‚ πŸ“Š Progress Display β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜ βš™οΈ Configuration β”‚ postMessage β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Web Worker β”‚ 🌊 Streaming JSON Parser (@streamparser/json) β”‚ (Module Type) β”‚ πŸ” Schema Detection & Evolution β”‚ β”‚ πŸ’Ύ SQLite Operations (SQL.js/WASM) β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ πŸ“¦ Batched Transaction Inserts β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ SQLite Database β”‚ πŸ’Ώ In-Memory WASM Database β”‚ (Binary) β”‚ ⬇️ Exported as .sqlite file β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ 

Processing Stages

  1. File Streaming (Main Thread)

    • Reads file in 64KB chunks using the File API
    • Sends raw chunk buffers to worker via postMessage
    • Non-blocking reads keep UI responsive
  2. Stream Parsing (Web Worker)

    • Uses @streamparser/json for true SAX-style parsing
    • Parses JSON incrementally without loading entire file
    • Handles partial objects across chunk boundaries
    • Emits complete objects for processing
  3. Schema Evolution (Web Worker)

    • Scans first N objects (default: 100) to build initial schema
    • Flattens nested objects into underscore-notation columns (user_address_city)
    • Detects data types (INTEGER, REAL, TEXT)
    • Dynamically adds columns when new fields appear
    • Backfills existing rows with NULL for new columns
  4. Batch Writing (Web Worker)

    • Buffers objects into batches (default: 1000 rows)
    • Wraps each batch in a SQLite transaction
    • Executes parameterized INSERT statements
    • Dramatically faster than individual inserts
  5. Export & Download (Main Thread)

    • Worker sends completed database as binary array
    • Main thread creates downloadable Blob
    • User downloads .sqlite file
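The batch-writing stage (step 4) can be sketched as plain SQL generation. This is an illustrative helper, not the worker's actual code; the real worker executes the resulting statements through SQL.js with bound parameters:

```typescript
// Sketch of the batch-write step: one parameterized INSERT per row, with
// the whole batch wrapped in a single transaction. `batchInsertSql` is a
// hypothetical helper that only builds the SQL strings.
function batchInsertSql(table: string, columns: string[], rowCount: number): string[] {
  const placeholders = columns.map(() => "?").join(", ");
  const insert = `INSERT INTO ${table} (${columns.join(", ")}) VALUES (${placeholders})`;
  // Grouping inserts inside BEGIN/COMMIT is what makes batching fast:
  // SQLite syncs once per transaction instead of once per row.
  return ["BEGIN TRANSACTION", ...Array(rowCount).fill(insert), "COMMIT"];
}
```

Wrapping inserts in a transaction is the main performance lever here, since SQLite commits once per batch instead of once per row.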

Technology Stack

  • Next.js 14: React framework with static export for GitHub Pages
  • TypeScript: Type-safe development with full IDE support
  • Tailwind CSS: Modern, responsive styling
  • SQL.js (1.10.3): SQLite compiled to WebAssembly for browser execution
  • @streamparser/json: Streaming JSON parser for true SAX-style parsing
  • Web Workers (Module): ES6 module worker with native imports
  • File API: Browser-native file reading without server upload
  • WebAssembly: Native-speed SQLite execution in browser sandbox

πŸš€ Quick Start

GitHub Pages Deployment

This repository is configured for automatic deployment to GitHub Pages:

  1. Fork this repository
  2. Go to Settings β†’ Pages
  3. Set Source to "GitHub Actions"
  4. Push to the main branch - the site will deploy automatically
  5. Visit https://yourusername.github.io/json-to-sqlite/

The deployment workflow runs automatically on every push to the main branch.

Local Development

```bash
# Clone the repository
git clone https://github.com/andreisugu/json-to-sqlite.git
cd json-to-sqlite

# Install dependencies
npm install

# Start development server, then open http://localhost:3000
npm run dev

# Build for production
npm run build
```

πŸ“– Usage

  1. Select JSON File: Click "Choose JSON File" and select your JSON file
  2. Configure Options:
    • Table Name: Name for your SQLite table (default: "data")
    • Schema Sample Size: Number of objects to scan for schema (default: 100)
    • Batch Size: Rows per transaction for performance (default: 1000)
  3. Start Conversion: Click "Start Conversion" and wait for processing
  4. Download: Once complete, download your SQLite database

πŸ’‘ Use Cases

This tool is perfect for:

πŸ₯ Sensitive Data Processing

  • Healthcare: Convert patient records to SQLite without HIPAA concerns
  • Financial: Process transaction data without uploading to servers
  • Legal: Handle confidential documents with complete privacy
  • Personal: Your diary, photos metadata, or browsing history stays local

πŸ“Š Data Analysis & Research

  • API exports: Convert API responses to queryable databases
  • Log analysis: Transform JSON logs into SQLite for SQL queries
  • Data migration: Move data between systems via universal SQLite format
  • Research data: Process survey results or experiment data offline

πŸš€ Development & Testing

  • Mock data: Convert JSON fixtures to SQLite test databases
  • Prototype databases: Quick database creation from JSON samples
  • Data exploration: Use SQL to explore complex JSON structures
  • CI/CD: Generate test databases in GitHub Actions (no server needed)

🌍 Offline & Low-Connectivity Scenarios

  • Field research: Convert data on laptops without internet
  • Remote locations: Process data where cloud access is limited
  • Air-gapped systems: Works on systems isolated from networks
  • Bandwidth constrained: No upload/download of large files to servers

πŸ“ Supported JSON Formats

The tool works with JSON arrays of objects:

```json
[
  { "id": 1, "name": "John Doe", "email": "john@example.com", "age": 30, "active": true },
  { "id": 2, "name": "Jane Smith", "email": "jane@example.com", "age": 25, "active": false }
]
```

Nested Objects

Nested objects are automatically flattened:

```json
{
  "user": {
    "name": "John",
    "address": { "city": "New York", "zip": "10001" }
  }
}
```

Becomes columns: user_name, user_address_city, user_address_zip

Arrays

Arrays are stored as JSON strings in the database.
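Both rules — underscore flattening for nested objects and JSON serialization for arrays — can be sketched in a few lines. `flatten` is an illustrative helper, not the repository's implementation:

```typescript
// Sketch of the flattening rules described above: nested objects become
// underscore-separated columns, and arrays are serialized to JSON strings.
// `flatten` is a hypothetical helper, not the repository's implementation.
function flatten(obj: Record<string, unknown>, prefix = ""): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    const column = prefix ? `${prefix}_${key}` : key;
    if (Array.isArray(value)) {
      out[column] = JSON.stringify(value); // arrays stored as JSON text
    } else if (value !== null && typeof value === "object") {
      // Recurse into nested objects, carrying the prefix down
      Object.assign(out, flatten(value as Record<string, unknown>, column));
    } else {
      out[column] = value; // primitives (and null) map directly to a column
    }
  }
  return out;
}
```

Applied to the nested example above, this yields the columns `user_name`, `user_address_city`, and `user_address_zip`.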

βš™οΈ Configuration

Batch Size

  • Small files (<10MB): 500-1000 rows
  • Medium files (10-100MB): 1000-2000 rows
  • Large files (>100MB): 2000-5000 rows

Larger batch sizes = faster processing but more memory usage.

Schema Sample Size

  • Consistent data: 50-100 objects
  • Variable data: 200-500 objects
  • Highly variable: 500-1000 objects

More samples = better schema detection but slower startup.

πŸ”§ Technical Details

Memory Management

The tool uses multiple strategies to handle files larger than available RAM:

  1. Chunked Reading: Files are read in 64KB chunks via FileReader.readAsArrayBuffer()
  2. Streaming Parsing: @streamparser/json library parses JSON incrementally using SAX-style events
  3. Batched Inserts: Rows are buffered and inserted in configurable batches (default: 1000 rows)
  4. Worker Threads: Heavy processing isolated in Web Worker to prevent main thread blocking
  5. Buffer Management: Efficient string buffer handles incomplete objects across chunk boundaries
  6. Transaction Batching: SQLite transactions group inserts for 50-100x performance improvement
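The batching strategy (points 3 and 6) can be sketched as a small buffer class. This is an illustrative sketch, not the worker's actual code; in the real pipeline the flush callback runs a SQLite transaction:

```typescript
// Sketch of the batch buffer: rows accumulate until the batch size is hit,
// then the whole batch is handed to a flush callback (in the real worker,
// a SQLite transaction). Illustrative class, not the actual implementation.
class BatchBuffer<T> {
  private rows: T[] = [];
  public flushes = 0;

  constructor(
    private batchSize: number,
    private onFlush: (batch: T[]) => void
  ) {}

  push(row: T): void {
    this.rows.push(row);
    if (this.rows.length >= this.batchSize) this.flush();
  }

  // Called automatically when full, and once more at end-of-stream
  flush(): void {
    if (this.rows.length === 0) return;
    this.onFlush(this.rows);
    this.rows = [];
    this.flushes++;
  }
}
```

Since the buffer never holds more than one batch of parsed objects, peak memory stays proportional to the batch size, not the file size.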

Memory Footprint (approximate, measured with Chrome DevTools):

  • Streaming parser overhead: ~10-20MB
  • Batch buffer: ~5-10MB (1000 objects)
  • SQLite in-memory database: Size of actual data + indexes
  • Total overhead: ~15-30MB (parser + buffer), plus the resulting database size

Schema Detection & Evolution

The schema system adapts dynamically as data is processed:

Initial Schema Building (First N objects):

  • Scans sample objects to discover all fields
  • Flattens nested objects using underscore notation (user_address_city)
  • Detects types based on JavaScript typeof and value patterns
  • Creates initial SQLite table with discovered columns

Dynamic Column Addition Process:

When a new field appears in row 5,000:

  1. Detect the new field (e.g., configs_extraData_sharpnessDenoise)
  2. Infer its type from the value
  3. Execute: ALTER TABLE data ADD COLUMN configs_extraData_sharpnessDenoise TEXT
  4. Continue processing (existing rows automatically read as NULL)

Type Detection Rules:

  • Numbers that are integers β†’ INTEGER type
  • Numbers with decimal points β†’ REAL type
  • Boolean values β†’ INTEGER type (stored as 0 for false, 1 for true)
  • Everything else β†’ TEXT type (including objects stored as JSON strings, arrays, null)
  • Type conflicts resolve to TEXT
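These rules can be sketched as two small functions. Helper names are hypothetical, not from the codebase:

```typescript
// Sketch of the type detection rules listed above, including booleans
// stored as INTEGER (0/1) and conflicts resolving to TEXT.
// `inferType` and `mergeTypes` are illustrative names, not from the codebase.
type SqlType = "INTEGER" | "REAL" | "TEXT";

function inferType(value: unknown): SqlType {
  if (typeof value === "number") {
    return Number.isInteger(value) ? "INTEGER" : "REAL";
  }
  if (typeof value === "boolean") return "INTEGER"; // stored as 0 or 1
  return "TEXT"; // strings, objects/arrays (as JSON strings), null
}

// When the same column has been seen with conflicting types, fall back to TEXT
function mergeTypes(a: SqlType, b: SqlType): SqlType {
  return a === b ? a : "TEXT";
}
```

Falling back to TEXT on conflict is the conservative choice: SQLite stores any value as text without loss, at the cost of losing numeric typing for that column.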

Performance

Typical performance on modern hardware (M1/M2 Mac, Ryzen 5000+, i7-11th gen+):

| File Size | Objects | Time | Speed |
|---|---|---|---|
| 10MB | ~10K | 5-10s | ~1MB/s |
| 100MB | ~100K | 30-60s | ~1.7MB/s |
| 500MB | ~500K | 2-5min | ~1.7-4MB/s |
| 1GB | ~1M | 5-10min | ~1.7-3MB/s |

Performance Factors:

Positive factors (increase speed):

  • βœ… Larger batch sizes (but use more memory)
  • βœ… Flat object structures (vs deeply nested)
  • βœ… Fewer columns in schema
  • βœ… Modern hardware (faster CPU/better browser engine)

Limiting factors (decrease speed):

  • ❌ Frequent schema changes (ALTER TABLE operations)
  • ❌ Very complex nested objects
  • ❌ Low memory conditions

Optimization Tips:

  • Use 2000-5000 batch size for files > 100MB
  • Reduce schema sample size if data is consistent
  • Close other browser tabs to free memory
  • Use Chrome/Edge for best performance (V8 engine optimizations)

⚠️ Limitations

Browser Memory

  • Maximum file size: ~1-2GB (depends on available browser memory)
  • Resulting database: Must fit in memory (~1-2GB)
  • Larger files may crash the browser tab

Browser Compatibility

  • βœ… Chrome 90+
  • βœ… Firefox 88+
  • βœ… Safari 14+
  • βœ… Edge 90+
  • ❌ Internet Explorer (not supported)

Requires:

  • Web Workers (module type support)
  • File API
  • WebAssembly
  • ES6 modules
  • Service Workers (for offline functionality)
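A simple capability check for these requirements could look like the sketch below. The helper takes an environment object so it stays testable; in the browser you would pass `globalThis` (note that `serviceWorker` actually lives on `navigator`). This is a hypothetical helper, not code from the repository:

```typescript
// Hypothetical feature check for the requirements listed above. Pass an
// environment object (e.g., globalThis merged with navigator) and get back
// the names of any missing features. Not the repository's actual code.
const REQUIRED_FEATURES = ["Worker", "File", "WebAssembly", "serviceWorker"] as const;

function missingFeatures(env: Record<string, unknown>): string[] {
  // A feature counts as present if the key exists on the environment object
  return REQUIRED_FEATURES.filter((name) => !(name in env));
}
```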

Progressive Web App (PWA) Support

This application is a full Progressive Web App with complete offline functionality:

Features:

  • βœ… Service Worker: Automatically caches all resources for offline use
  • βœ… Web App Manifest: Can be installed as a standalone app on any device
  • βœ… Offline-First: Works completely offline after first visit
  • βœ… Smart Caching: Network-first for HTML, cache-first for static assets
  • βœ… CDN Caching: External dependencies (SQL.js, JSON parser) are cached locally
  • βœ… Auto-Update: New versions are detected and installed automatically

How It Works:

  1. On first visit, the service worker caches all essential resources
  2. CDN dependencies (SQL.js, streaming parser, WASM) are cached
  3. Subsequent visits load instantly from cache
  4. Works completely offline - no internet needed after first load
  5. Updates are downloaded in the background and applied on next visit

Installation:

  • On mobile: Use "Add to Home Screen" from browser menu
  • On desktop: Look for install prompt in address bar or browser menu
  • Once installed, the app behaves like a native application

Cached Resources:

  • Application code (HTML, CSS, JavaScript)
  • Icons and manifest
  • Database worker script
  • External libraries: SQL.js, @streamparser/json
  • SQLite WebAssembly binary

This means you can:

  • πŸ”’ Convert sensitive files with zero network access
  • ✈️ Use the tool while traveling without internet
  • πŸ”οΈ Work in remote locations without connectivity
  • πŸš€ Experience instant loading after first visit
  • πŸ’Ύ Install as a dedicated app on your device

Module Workers: A Technical Achievement

This project uses ES6 Module Workers, which was non-trivial to implement:

The Challenge: Traditional web workers use importScripts(), which doesn't support modern ES modules. To use import statements for sql.js and @streamparser/json, we needed module workers.

The Solution:

On the main thread:

```js
// Main thread creates a module worker
const worker = new Worker('/workers/db-worker.js', { type: 'module' });
```

Inside the worker (actual code from db-worker.js):

```js
// Worker can use native ES6 imports
import initSqlJs from 'https://esm.sh/sql.js@1.10.3';
import { JSONParser } from 'https://esm.sh/@streamparser/json@0.0.22';
```

Security Note: The CDN imports shown are the actual implementation used in production. While convenient for quick deployment, using CDN dependencies has security implications:

  • βœ… Pros: Easy setup, no bundling needed, automatic caching
  • ⚠️ Cons: Dependency on third-party CDN uptime, potential supply chain risk

For production deployments with sensitive data or stricter security requirements, consider:

  • Hosting libraries locally within your repository
  • Using Subresource Integrity (SRI) hashes to verify CDN resources
  • Implementing Content Security Policy (CSP) headers
  • Reviewing and auditing the library source code

Benefits:

  • βœ… Modern import syntax instead of importScripts()
  • βœ… Direct use of ES module libraries
  • βœ… Better code organization and dependency management
  • βœ… Type-safe imports with TypeScript
  • βœ… Leverages CDN module conversion (esm.sh)

This allows the worker to use cutting-edge libraries while keeping the main thread completely free for UI operations.

πŸ› Troubleshooting

Debugging

The app includes comprehensive console logging. Open your browser's developer console (F12) to see:

  • Detailed processing information
  • Object structure and flattening
  • Schema detection steps
  • Batch insertion progress
  • Any errors or warnings

See DEBUGGING.md for a complete debugging guide.

"Out of Memory" Error

  • Reduce batch size
  • Try a smaller file
  • Close other tabs
  • Use a browser with more available memory

Slow Processing

  • Increase batch size
  • Reduce schema sample size
  • Ensure no other heavy processes are running

Invalid JSON

  • Ensure your JSON is valid (use a validator)
  • Check for unescaped special characters
  • Verify the JSON is an array of objects

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • SQL.js - SQLite compiled to WebAssembly
  • Inspired by the need for privacy-focused data processing tools

πŸ“§ Contact

Andrei Șugubete - @andreisugu

Project Link: https://github.com/andreisugu/json-to-sqlite


Made with ❀️ for the privacy-conscious developer community
