A powerful browser-based tool to convert large JSON files to SQLite databases entirely client-side. Perfect for GitHub Pages deployment!
- **100% Private**: Your data never leaves your device - everything runs in the browser
- **Streaming Architecture**: Handles large JSON files (1GB+) without memory crashes
- **Fast Processing**: Batched inserts with transactions for optimal performance
- **Smart Schema Detection**: Automatically detects columns and types from your data
- **Customizable**: Configure table name, batch size, and schema sample size
- **Real-time Progress**: Live progress tracking with detailed statistics
- **Zero Cost**: No server required - perfect for GitHub Pages
- **Beautiful UI**: Modern, responsive interface with smooth animations
This isn't just another JSON converter - it's a production-grade streaming data pipeline running entirely in your browser. Here's what sets it apart:
Unlike traditional converters that load the entire file into RAM, this implementation uses a streaming parser that processes data in chunks:
- Chunk-based processing: Files are read in 64KB chunks via the File API
- Incremental parsing: JSON objects are parsed as chunks arrive using `@streamparser/json`
- Batch inserts: Objects are written to SQLite in configurable batches (default: 1000 rows)
- Memory efficiency: A 1GB file can be converted on a device with only 2GB RAM without crashes
Technical Flow: File → 64KB Chunks → Stream Parser → Objects → Batch Buffer (1000 rows) → SQLite
The entire processing pipeline runs in a Web Worker (separate thread), meaning:
- Main thread stays free: The UI remains completely responsive during conversion
- No freezing: You can interact with the page, view logs, and monitor progress in real-time
- Module Worker: Uses modern ES6 modules with native `import` statements
- True parallelism: Parser and database operations run independently of UI rendering
The converter doesn't require predefined schemas - it discovers and adapts on the fly:
- Runtime column discovery: New fields like `configs_extraData_sharpnessDenoise` are detected during processing
- On-the-fly schema updates: Columns are created dynamically using `ALTER TABLE` statements
- Automatic backfilling: Previously inserted rows get `NULL` values for new columns
- Nested object flattening: Deeply nested structures are automatically flattened to SQL columns
- Type inference: Automatically detects INTEGER, REAL, and TEXT types from sample data
Example: If row 5,000 introduces a new field, the table schema updates automatically without reprocessing.
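The dynamic-update flow can be sketched as follows. This is an illustrative helper, not the tool's actual code; for simplicity it adds every new column as TEXT rather than running full type inference.

```javascript
// Illustrative sketch of runtime column discovery (hypothetical helper).
// Tracks the columns seen so far and returns the ALTER TABLE statements
// a new row requires; new columns are added as TEXT for simplicity.
function newColumnStatements(knownColumns, row, table = 'data') {
  const statements = [];
  for (const key of Object.keys(row)) {
    if (!knownColumns.has(key)) {
      knownColumns.add(key);
      statements.push(`ALTER TABLE ${table} ADD COLUMN ${key} TEXT`);
    }
  }
  return statements;
}

const known = new Set(['id', 'name']);
const row5000 = { id: 5000, name: 'x', configs_extraData_sharpnessDenoise: 0.5 };
const stmts = newColumnStatements(known, row5000);
// stmts: ['ALTER TABLE data ADD COLUMN configs_extraData_sharpnessDenoise TEXT']
```

Because SQLite's `ALTER TABLE ... ADD COLUMN` leaves existing rows with NULL in the new column, no backfill pass over earlier rows is needed.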
Your data stays 100% local - no servers, no uploads, no tracking:
- Client-side only: All processing happens in your browser's JavaScript engine
- WASM sandbox: SQLite runs in WebAssembly with no external access
- No network calls: Data never leaves your device (after initial library loads)
- Perfect for sensitive data: Medical records, financial data, personal information - all stays private
- Offline capable: Full Progressive Web App (PWA) support - works completely offline after first visit
- Install as app: Can be installed on your device and used like a native application
| Feature | This Tool | Traditional Server-Based | Python/CLI Tools |
|---|---|---|---|
| Privacy | ✅ Data never uploaded | ❌ Data sent to server | ✅ Local processing |
| Setup | ✅ Zero setup (just open URL) | ❌ Server required | ❌ Install Python + deps |
| Large Files | ✅ Streaming (1GB+ files) | ✅ Can handle large files | Varies by tool |
| UI Blocking | ✅ Web Worker (responsive) | N/A | ❌ Blocks terminal |
| Cost | ✅ Free (GitHub Pages) | 💰 Server hosting costs | ✅ Free |
| Platform | ✅ Any modern browser | ❌ OS-specific install | ❌ Requires local install |
| Schema Evolution | ✅ Dynamic discovery | Varies | Varies |
| Accessibility | ✅ Just share a link | ❌ Need server access | ❌ Need tool installed |
This tool implements a streaming "Bucket Brigade" architecture with three independent stages:
```
┌───────────────────┐
│    Main Thread    │   File Selection & UI
│      (React)      │   Progress Display
└─────────┬─────────┘   Configuration
          │ postMessage
          ▼
┌───────────────────┐
│    Web Worker     │   Streaming JSON Parser (@streamparser/json)
│   (Module Type)   │   Schema Detection & Evolution
│                   │   SQLite Operations (SQL.js/WASM)
└─────────┬─────────┘   Batched Transaction Inserts
          │
          ▼
┌───────────────────┐
│  SQLite Database  │   In-Memory WASM Database
│     (Binary)      │   Exported as .sqlite file
└───────────────────┘
```

1. **File Streaming (Main Thread)**
   - Reads file in 64KB chunks using the File API
   - Sends raw chunk buffers to the worker via `postMessage`
   - Non-blocking reads keep the UI responsive

2. **Stream Parsing (Web Worker)**
   - Uses `@streamparser/json` for true SAX-style parsing
   - Parses JSON incrementally without loading the entire file
   - Handles partial objects across chunk boundaries
   - Emits complete objects for processing

3. **Schema Evolution (Web Worker)**
   - Scans first N objects (default: 100) to build initial schema
   - Flattens nested objects into underscore-notation columns (`user_address_city`)
   - Detects data types (INTEGER, REAL, TEXT)
   - Dynamically adds columns when new fields appear
   - Backfills existing rows with NULL for new columns

4. **Batch Writing (Web Worker)**
   - Buffers objects into batches (default: 1000 rows)
   - Wraps each batch in a SQLite transaction
   - Executes parameterized INSERT statements
   - Dramatically faster than individual inserts

5. **Export & Download (Main Thread)**
   - Worker sends the completed database as a binary array
   - Main thread creates a downloadable Blob
   - User downloads the .sqlite file
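The main thread tracks the worker's progress through `postMessage` events. The message shapes below are illustrative (not the tool's actual protocol); modeling the handler as a pure reducer makes the flow easy to test without a real Worker:

```javascript
// Hypothetical worker-to-main message protocol (names are illustrative).
// A reducer-style handler updates UI state for each incoming message.
function handleWorkerMessage(state, msg) {
  switch (msg.type) {
    case 'progress': // periodic stats while parsing/inserting
      return { ...state, rows: msg.rows, bytes: msg.bytes };
    case 'schema-update': // new columns discovered mid-stream
      return { ...state, columns: [...state.columns, ...msg.newColumns] };
    case 'done': // final database exported as a binary array
      return { ...state, finished: true, database: msg.database };
    default:
      return state;
  }
}

let state = { rows: 0, bytes: 0, columns: ['id'], finished: false, database: null };
state = handleWorkerMessage(state, { type: 'progress', rows: 1000, bytes: 65536 });
state = handleWorkerMessage(state, { type: 'schema-update', newColumns: ['user_address_city'] });
state = handleWorkerMessage(state, { type: 'done', database: new Uint8Array([1, 2, 3]) });
```

In the real app the reducer body would run inside `worker.onmessage`, with React state updates driving the progress display.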
- Next.js 14: React framework with static export for GitHub Pages
- TypeScript: Type-safe development with full IDE support
- Tailwind CSS: Modern, responsive styling
- SQL.js (1.10.3): SQLite compiled to WebAssembly for browser execution
- @streamparser/json: Streaming JSON parser for true SAX-style parsing
- Web Workers (Module): ES6 module worker with native imports
- File API: Browser-native file reading without server upload
- WebAssembly: Native-speed SQLite execution in browser sandbox
This repository is configured for automatic deployment to GitHub Pages:
- Fork this repository
- Go to Settings → Pages
- Set Source to "GitHub Actions"
- Push to the `main` branch - the site will deploy automatically
- Visit `https://yourusername.github.io/json-to-sqlite/`
The deployment workflow runs automatically on every push to the main branch.
```bash
# Clone the repository
git clone https://github.com/andreisugu/json-to-sqlite.git
cd json-to-sqlite

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build

# Open browser to http://localhost:3000
```

- Select JSON File: Click "Choose JSON File" and select your JSON file
- Configure Options:
- Table Name: Name for your SQLite table (default: "data")
- Schema Sample Size: Number of objects to scan for schema (default: 100)
- Batch Size: Rows per transaction for performance (default: 1000)
- Start Conversion: Click "Start Conversion" and wait for processing
- Download: Once complete, download your SQLite database
This tool is perfect for:
- Healthcare: Convert patient records to SQLite without HIPAA concerns
- Financial: Process transaction data without uploading to servers
- Legal: Handle confidential documents with complete privacy
- Personal: Your diary, photos metadata, or browsing history stays local
- API exports: Convert API responses to queryable databases
- Log analysis: Transform JSON logs into SQLite for SQL queries
- Data migration: Move data between systems via universal SQLite format
- Research data: Process survey results or experiment data offline
- Mock data: Convert JSON fixtures to SQLite test databases
- Prototype databases: Quick database creation from JSON samples
- Data exploration: Use SQL to explore complex JSON structures
- CI/CD: Generate test databases in GitHub Actions (no server needed)
- Field research: Convert data on laptops without internet
- Remote locations: Process data where cloud access is limited
- Air-gapped systems: Works on systems isolated from networks
- Bandwidth constrained: No upload/download of large files to servers
The tool works with JSON arrays of objects:
```json
[
  { "id": 1, "name": "John Doe", "email": "john@example.com", "age": 30, "active": true },
  { "id": 2, "name": "Jane Smith", "email": "jane@example.com", "age": 25, "active": false }
]
```

Nested objects are automatically flattened:
```json
{ "user": { "name": "John", "address": { "city": "New York", "zip": "10001" } } }
```

Becomes columns: `user_name`, `user_address_city`, `user_address_zip`
Arrays are stored as JSON strings in the database.
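The flattening behavior described above can be sketched as a small recursive helper (illustrative only, not the tool's actual code): nested objects become underscore-joined columns and arrays are serialized to JSON strings.

```javascript
// Minimal flattening sketch (hypothetical helper).
// Nested objects -> underscore-joined columns; arrays -> JSON strings.
function flatten(obj, prefix = '', out = {}) {
  for (const [key, value] of Object.entries(obj)) {
    const column = prefix ? `${prefix}_${key}` : key;
    if (Array.isArray(value)) {
      out[column] = JSON.stringify(value); // arrays stored as JSON text
    } else if (value !== null && typeof value === 'object') {
      flatten(value, column, out); // recurse into nested objects
    } else {
      out[column] = value; // scalars (and null) pass through
    }
  }
  return out;
}

const row = flatten({
  user: { name: 'John', address: { city: 'New York', zip: '10001' } },
  tags: ['a', 'b'],
});
// row: { user_name: 'John', user_address_city: 'New York',
//        user_address_zip: '10001', tags: '["a","b"]' }
```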
- Small files (<10MB): 500-1000 rows
- Medium files (10-100MB): 1000-2000 rows
- Large files (>100MB): 2000-5000 rows
Larger batch sizes = faster processing but more memory usage.
- Consistent data: 50-100 objects
- Variable data: 200-500 objects
- Highly variable: 500-1000 objects
More samples = better schema detection but slower startup.
The tool uses multiple strategies to handle files larger than available RAM:
- Chunked Reading: Files are read in 64KB chunks via `FileReader.readAsArrayBuffer()`
- Streaming Parsing: The `@streamparser/json` library parses JSON incrementally using SAX-style events
- Batched Inserts: Rows are buffered and inserted in configurable batches (default: 1000 rows)
- Worker Threads: Heavy processing isolated in Web Worker to prevent main thread blocking
- Buffer Management: Efficient string buffer handles incomplete objects across chunk boundaries
- Transaction Batching: SQLite transactions group inserts for 50-100x performance improvement
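The transaction-batching strategy can be sketched as follows. This is a hypothetical helper, not the tool's code; `exec` stands in for a SQL.js call such as `db.run`, recorded here so the batching pattern is testable without WASM:

```javascript
// Sketch of transaction-batched, parameterized inserts (hypothetical helper).
// Each batch is wrapped in BEGIN/COMMIT so SQLite fsyncs once per batch,
// not once per row - the source of the large speedup over individual inserts.
function insertBatch(exec, table, columns, rows) {
  const placeholders = columns.map(() => '?').join(', ');
  const sql = `INSERT INTO ${table} (${columns.join(', ')}) VALUES (${placeholders})`;
  exec('BEGIN TRANSACTION');
  for (const row of rows) {
    // Missing fields become NULL, matching the backfill behavior
    exec(sql, columns.map((c) => row[c] ?? null));
  }
  exec('COMMIT');
}

const log = [];
insertBatch((sql) => log.push(sql), 'data', ['id', 'name'], [
  { id: 1, name: 'a' },
  { id: 2 }, // name missing -> bound as NULL
]);
// log: ['BEGIN TRANSACTION', 'INSERT INTO data (id, name) VALUES (?, ?)',
//       'INSERT INTO data (id, name) VALUES (?, ?)', 'COMMIT']
```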
Memory Footprint (approximate, measured with Chrome DevTools):
- Streaming parser overhead: ~10-20MB
- Batch buffer: ~5-10MB (1000 objects)
- SQLite in-memory database: Size of actual data + indexes
- Total overhead: ~15-30MB (parser + buffer), plus the resulting database size
The schema system adapts dynamically as data is processed:
Initial Schema Building (First N objects):
- Scans sample objects to discover all fields
- Flattens nested objects using underscore notation (`user_address_city`)
- Detects types based on JavaScript `typeof` and value patterns
- Creates initial SQLite table with discovered columns
Dynamic Column Addition Process:
When a new field appears in row 5,000:

1. Detect the new field (e.g., `configs_extraData_sharpnessDenoise`)
2. Infer its type from the value
3. Execute: `ALTER TABLE data ADD COLUMN configs_extraData_sharpnessDenoise TEXT`
4. Continue processing (existing rows automatically have NULL)

Type Detection Rules:
- Numbers that are integers → INTEGER type
- Numbers with decimal points → REAL type
- Boolean values → INTEGER type (stored as 0 for false, 1 for true)
- Everything else → TEXT type (including objects stored as JSON strings, arrays, null)
- Type conflicts resolve to TEXT
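The rules above can be expressed as a small pure function (an illustrative sketch, not the tool's actual implementation):

```javascript
// Sketch of the type-detection rules (hypothetical helper).
function detectType(value) {
  if (typeof value === 'number') {
    return Number.isInteger(value) ? 'INTEGER' : 'REAL';
  }
  if (typeof value === 'boolean') return 'INTEGER'; // stored as 0/1
  return 'TEXT'; // strings, objects/arrays (as JSON strings), null
}

// Resolving a conflict across sampled rows: differing types fall back to TEXT.
function mergeTypes(a, b) {
  return a === b ? a : 'TEXT';
}

detectType(42);      // 'INTEGER'
detectType(3.14);    // 'REAL'
detectType(true);    // 'INTEGER'
detectType('hello'); // 'TEXT'
mergeTypes('INTEGER', 'REAL'); // 'TEXT'
```

Note that SQLite's type affinity is forgiving, so falling back to TEXT on conflict never loses data; it only changes how values sort and compare.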
Typical performance on modern hardware (M1/M2 Mac, Ryzen 5000+, i7-11th gen+):
| File Size | Objects | Time | Speed |
|---|---|---|---|
| 10MB | ~10K | 5-10s | ~1MB/s |
| 100MB | ~100K | 30-60s | ~1.7MB/s |
| 500MB | ~500K | 2-5min | ~1.7-4MB/s |
| 1GB | ~1M | 5-10min | ~1.7-3MB/s |
Performance Factors:
Positive factors (increase speed):
- ✅ Larger batch sizes (but use more memory)
- ✅ Flat object structures (vs deeply nested)
- ✅ Fewer columns in schema
- ✅ Modern hardware (faster CPU/better browser engine)
Limiting factors (decrease speed):
- ❌ Frequent schema changes (ALTER TABLE operations)
- ❌ Very complex nested objects
- ❌ Low memory conditions
Optimization Tips:
- Use 2000-5000 batch size for files > 100MB
- Reduce schema sample size if data is consistent
- Close other browser tabs to free memory
- Use Chrome/Edge for best performance (V8 engine optimizations)
- Maximum file size: ~1-2GB (depends on available browser memory)
- Resulting database: Must fit in memory (~1-2GB)
- Larger files may crash the browser tab
- ✅ Chrome 90+
- ✅ Firefox 88+
- ✅ Safari 14+
- ✅ Edge 90+
- ❌ Internet Explorer (not supported)
Requires:
- Web Workers (module type support)
- File API
- WebAssembly
- ES6 modules
- Service Workers (for offline functionality)
This application is a full Progressive Web App with complete offline functionality:
Features:
- ✅ Service Worker: Automatically caches all resources for offline use
- ✅ Web App Manifest: Can be installed as a standalone app on any device
- ✅ Offline-First: Works completely offline after first visit
- ✅ Smart Caching: Network-first for HTML, cache-first for static assets
- ✅ CDN Caching: External dependencies (SQL.js, JSON parser) are cached locally
- ✅ Auto-Update: New versions are detected and installed automatically
How It Works:
- On first visit, the service worker caches all essential resources
- CDN dependencies (SQL.js, streaming parser, WASM) are cached
- Subsequent visits load instantly from cache
- Works completely offline - no internet needed after first load
- Updates are downloaded in the background and applied on next visit
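The network-first/cache-first split can be sketched as a pure routing function (a hypothetical helper; the actual service worker logic may differ in detail):

```javascript
// Hypothetical routing of fetch requests to caching strategies, mirroring
// the description above: network-first for HTML documents so updates are
// picked up, cache-first for static assets and CDN libraries for speed.
function cacheStrategyFor(url, destination) {
  if (destination === 'document' || url.endsWith('.html')) {
    return 'network-first';
  }
  return 'cache-first'; // scripts, styles, WASM, CDN libraries
}

cacheStrategyFor('https://example.test/index.html', 'document'); // 'network-first'
cacheStrategyFor('https://esm.sh/sql.js@1.10.3', 'script');      // 'cache-first'
```

In a real service worker this function would run inside the `fetch` event handler, with `event.request.destination` supplying the second argument.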
Installation:
- On mobile: Use "Add to Home Screen" from browser menu
- On desktop: Look for install prompt in address bar or browser menu
- Once installed, the app behaves like a native application
Cached Resources:
- Application code (HTML, CSS, JavaScript)
- Icons and manifest
- Database worker script
- External libraries: SQL.js, @streamparser/json
- SQLite WebAssembly binary
This means you can:
- Convert sensitive files with zero network access
- Use the tool while traveling without internet
- Work in remote locations without connectivity
- Experience instant loading after first visit
- Install as a dedicated app on your device
This project uses ES6 Module Workers, which was non-trivial to implement:
The Challenge: Traditional web workers use importScripts(), which doesn't support modern ES modules. To use import statements for sql.js and @streamparser/json, we needed module workers.
The Solution:
```javascript
// Main thread creates module worker
const worker = new Worker('/workers/db-worker.js', { type: 'module' });

// Worker can use native ES6 imports (actual code from db-worker.js)
import initSqlJs from 'https://esm.sh/sql.js@1.10.3';
import { JSONParser } from 'https://esm.sh/@streamparser/json@0.0.22';
```

Security Note: The CDN imports shown are the actual implementation used in production. While convenient for quick deployment, using CDN dependencies has security implications:
- ✅ Pros: Easy setup, no bundling needed, automatic caching
- ⚠️ Cons: Dependency on third-party CDN uptime, potential supply chain risk

For production deployments with sensitive data or stricter security requirements, consider:
- Hosting libraries locally within your repository
- Using Subresource Integrity (SRI) hashes to verify CDN resources
- Implementing Content Security Policy (CSP) headers
- Reviewing and auditing the library source code
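As a concrete illustration of the CSP suggestion, a policy along these lines would restrict script and worker sources to the app itself plus esm.sh. This is an assumption sketched from the dependencies named above, not the project's actual configuration; adapt it to your deployment:

```
Content-Security-Policy: default-src 'self';
  script-src 'self' https://esm.sh;
  worker-src 'self';
  connect-src 'self' https://esm.sh;
  img-src 'self' data:
```

Note that `connect-src` must allow esm.sh because module workers fetch their imports over the network on first load; once the service worker has cached them, requests are served locally.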
Benefits:
- ✅ Modern `import` syntax instead of `importScripts()`
- ✅ Direct use of ES module libraries
- ✅ Better code organization and dependency management
- ✅ Type-safe imports with TypeScript
- ✅ Leverages CDN module conversion (esm.sh)
This allows the worker to use cutting-edge libraries while keeping the main thread completely free for UI operations.
The app now includes comprehensive console logging! Open your browser's developer console (F12) to see:
- Detailed processing information
- Object structure and flattening
- Schema detection steps
- Batch insertion progress
- Any errors or warnings
See DEBUGGING.md for a complete debugging guide.
- Reduce batch size
- Try a smaller file
- Close other tabs
- Use a browser with more available memory
- Increase batch size
- Reduce schema sample size
- Ensure no other heavy processes are running
- Ensure your JSON is valid (use a validator)
- Check for unescaped special characters
- Verify the JSON is an array of objects
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- SQL.js - SQLite compiled to WebAssembly
- Inspired by the need for privacy-focused data processing tools
Andrei Șugubete - @andreisugu
Project Link: https://github.com/andreisugu/json-to-sqlite
Made with ❤️ for the privacy-conscious developer community