A comprehensive CLI tool for extracting database schemas and data from MariaDB servers. Designed for efficient database migration, development environment setup, and backup operations with advanced features including foreign key preservation, selective data extraction, and progressive processing.
- Extract: Database metadata extraction with detailed table information
- DDL: Complete schema extraction with CREATE TABLE statements
- Dump: Traditional full database backup using mysqldump
- Data: Advanced selective data extraction with foreign key preservation
- Foreign key dependency resolution with topological sorting
- Configurable data sampling (percentage or fixed row counts)
- Progressive extraction with resume capability for large datasets
- Pattern-based table filtering and exclusion
- Batch processing for optimal performance
- Docker-based execution with zero local dependencies
```bash
# Clone the repository
git clone https://github.com/rafaelkamimura/mariadb-extractor.git
cd mariadb-extractor

# Build the Docker image
make build
```

```bash
# Requires Go 1.25 or later
go build -o mariadb-extractor
```

```bash
# Create configuration from template
make env-example

# Edit .env with your database credentials
vim .env
```

Example .env configuration:
```
MARIADB_HOST=your-database-host
MARIADB_PORT=3306
MARIADB_USER=your-username
MARIADB_PASSWORD=your-password
MARIADB_OUTPUT_PREFIX=extraction
```

```bash
# Recommended: Complete pipeline with sample data (fastest)
make pipeline

# Alternative: Full data extraction
make pipeline-full

# Custom: Specific databases and tables
make pipeline-custom ARGS="--databases db1,db2 --sample-tables users:1000"
```

The data command provides advanced selective extraction with foreign key preservation:
```bash
# Extract with 10% sampling
./mariadb-extractor data --all-user-databases --sample-percent 10

# Extract specific databases
./mariadb-extractor data --databases db1,db2 --exclude-tables "*_log,*_audit"

# Custom sampling per table
./mariadb-extractor data \
  --databases myapp \
  --sample-tables "users:1000,orders:5000" \
  --exclude-tables "*_history"

# Resume interrupted extraction
./mariadb-extractor data --resume extraction-id
```

| Flag | Description | Default |
|---|---|---|
| `--all-user-databases` | Extract all non-system databases | - |
| `--databases` | Comma-separated list of databases | - |
| `--exclude-tables` | Pattern-based table exclusion | - |
| `--sample-percent` | Global sampling percentage (0-100) | 0 |
| `--sample-tables` | Per-table row limits (`table:count`) | - |
| `--chunk-size` | Rows per chunk for large tables | 10000 |
| `--batch-size` | INSERT statement batch size | 100 |
| `--timeout` | Query timeout in seconds | 300 |
| `--resume` | Resume from previous extraction | - |
Extract complete database schemas:
```bash
# Extract all user databases
./mariadb-extractor ddl --all-user-databases

# Extract specific databases
./mariadb-extractor ddl --databases db1,db2
```

Output:
- `output/mariadb-ddl.md` - Formatted documentation
- `output/init-scripts/01-extracted-schema.sql` - Executable SQL script
Full database backup using mysqldump:
```bash
# Dump all user databases
./mariadb-extractor dump --all-user-databases

# Schema only
./mariadb-extractor dump --all-databases --schema-only

# Compressed output
./mariadb-extractor dump --all-databases --compress
```

Extract database and table metadata:
```bash
# Extract all databases
./mariadb-extractor extract --all-databases

# Generate JSON output
./mariadb-extractor extract --output metadata
```

| Target | Description |
|---|---|
| `make pipeline` | Complete pipeline with 10% sample data |
| `make pipeline-full` | Complete pipeline with full data |
| `make pipeline-custom` | Pipeline with custom extraction parameters |
| Target | Description |
|---|---|
| `make ddl` | Extract database schemas |
| `make setup-from-ddl` | Initialize local database with schema |
| `make extract-data-sample` | Extract 10% sample data |
| `make extract-data-full` | Extract complete data |
| `make seed-dev-data` | Import extracted data to local database |
| Target | Description |
|---|---|
| `make setup-dev` | Start local MariaDB and Adminer |
| `make dev-db-connect` | Connect to local database via CLI |
| `make status` | Show service and file status |
| `make clean` | Clean generated files and containers |
```bash
# 1. Configure connection
cp .env.example .env
vim .env

# 2. Run pipeline (DDL -> Setup -> Extract -> Seed)
make pipeline

# 3. Access database
# Web UI: http://localhost:8080
# CLI: make dev-db-connect
```

```bash
# Extract complete production data
make pipeline-full

# Or manually with custom settings
make ddl
make setup-from-ddl
make extract-data-custom ARGS="--databases prod_db --max-rows 50000"
make seed-dev-data
```

```bash
# Extract specific tables with sampling
docker run --rm \
  --env-file .env \
  -v $(pwd):/app/output \
  mariadb-extractor data \
  --databases users_db,orders_db \
  --sample-tables "users:10000,orders:50000,products:all" \
  --exclude-tables "*_temp,*_backup"
```

```
mariadb-extractor/
├── cmd/
│   ├── root.go       # CLI root command
│   ├── extract.go    # Metadata extraction
│   ├── ddl.go        # Schema extraction
│   ├── dump.go       # Full backup
│   └── data.go       # Selective data extraction
├── internal/
│   └── config/
│       └── env.go    # Environment configuration
├── output/           # Generated files
│   └── init-scripts/
│       └── *.sql     # Database initialization scripts
└── docker-compose.yml
```

The data command implements a sophisticated extraction pipeline:
- Schema Analysis: Discovers foreign key relationships
- Dependency Resolution: Topological sort for correct table ordering
- Extraction Planning: Optimizes based on table sizes and sampling
- Progressive Extraction: Chunks large tables with progress tracking
- Data Generation: Creates optimized INSERT statements
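The dependency-resolution step above can be sketched as a topological sort (Kahn's algorithm) over the foreign-key graph, so that referenced tables are always extracted before the tables that point at them. This is an illustrative sketch, not the tool's actual implementation; the `topoOrder` helper and table names are made up for the example.

```go
package main

import "fmt"

// topoOrder returns tables in dependency order (referenced tables first).
// deps maps each table to the tables it references via foreign keys.
func topoOrder(deps map[string][]string) ([]string, error) {
	indegree := map[string]int{}   // number of unextracted parents per table
	children := map[string][]string{} // reverse edges: parent -> referencing tables
	for t, parents := range deps {
		if _, ok := indegree[t]; !ok {
			indegree[t] = 0
		}
		for _, p := range parents {
			if _, ok := indegree[p]; !ok {
				indegree[p] = 0
			}
			children[p] = append(children[p], t)
			indegree[t]++
		}
	}
	var queue, order []string
	for t, d := range indegree {
		if d == 0 {
			queue = append(queue, t) // tables with no FK parents go first
		}
	}
	for len(queue) > 0 {
		t := queue[0]
		queue = queue[1:]
		order = append(order, t)
		for _, c := range children[t] {
			if indegree[c]--; indegree[c] == 0 {
				queue = append(queue, c)
			}
		}
	}
	if len(order) != len(indegree) {
		return nil, fmt.Errorf("cycle detected in foreign key graph")
	}
	return order, nil
}

func main() {
	// orders references users; order_items references orders and products
	deps := map[string][]string{
		"users":       nil,
		"products":    nil,
		"orders":      {"users"},
		"order_items": {"orders", "products"},
	}
	order, err := topoOrder(deps)
	fmt.Println(order, err)
}
```

Cyclic foreign keys (which MariaDB allows) cannot be linearized, which is one reason the generated dump is also wrapped in a `SET FOREIGN_KEY_CHECKS` guard.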
- Automatic dependency detection via `information_schema`
- Topological sorting ensures correct extraction order
- `SET FOREIGN_KEY_CHECKS=0/1` wrapper for safe imports
- Preserves referential integrity across sampled data
| Variable | Description | Default |
|---|---|---|
| `MARIADB_HOST` | Database host | localhost |
| `MARIADB_PORT` | Database port | 3306 |
| `MARIADB_USER` | Database username | - |
| `MARIADB_PASSWORD` | Database password | - |
| `MARIADB_OUTPUT_PREFIX` | Output file prefix | mariadb-extract |
| `MARIADB_TIMEOUT` | Query timeout (seconds) | 300 |
| `MARIADB_CHUNK_SIZE` | Rows per chunk | 10000 |
| `MARIADB_BATCH_SIZE` | Batch insert size | 100 |
- MariaDB: Local database instance (port 3307)
- Adminer: Web-based database management (port 8080)
- Extractor: Main application container
- Chunked Processing: Configurable chunk size for memory efficiency
- Batch Inserts: Reduces I/O with configurable batch sizes
- Progress Tracking: Resume capability for interrupted extractions
- Connection Pooling: Optimized database connections
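The batch-insert optimization above amounts to grouping row tuples into multi-row INSERT statements of at most `--batch-size` rows each. A minimal sketch, assuming pre-escaped value strings; the `batchInserts` helper is hypothetical, not the tool's code:

```go
package main

import (
	"fmt"
	"strings"
)

// batchInserts groups value tuples into multi-row INSERT statements of at
// most batchSize rows each, cutting statement count and file size.
// Values are assumed to be already SQL-escaped.
func batchInserts(table string, cols []string, rows [][]string, batchSize int) []string {
	var stmts []string
	for start := 0; start < len(rows); start += batchSize {
		end := start + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		var tuples []string
		for _, r := range rows[start:end] {
			tuples = append(tuples, "("+strings.Join(r, ", ")+")")
		}
		stmts = append(stmts, fmt.Sprintf("INSERT INTO %s (%s) VALUES %s;",
			table, strings.Join(cols, ", "), strings.Join(tuples, ", ")))
	}
	return stmts
}

func main() {
	rows := [][]string{{"1", "'a'"}, {"2", "'b'"}, {"3", "'c'"}}
	for _, s := range batchInserts("users", []string{"id", "name"}, rows, 2) {
		fmt.Println(s)
	}
	// Prints:
	// INSERT INTO users (id, name) VALUES (1, 'a'), (2, 'b');
	// INSERT INTO users (id, name) VALUES (3, 'c');
}
```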
- Percentage Sampling: Consistent sampling across all tables
- Fixed Row Counts: Specific limits per table
- Pattern Exclusion: Skip log, audit, and temporary tables
- Foreign Key Preservation: Maintains relationships in sampled data
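The per-table row limits behind `--sample-tables` can be sketched as a small spec parser: entries like `users:1000` become fixed limits, and `products:all` disables sampling for that table. The `parseSampleTables` helper and the `-1` sentinel for "all rows" are assumptions for this sketch, not the tool's internals.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSampleTables parses a spec like "users:1000,orders:5000,products:all"
// into per-table row limits. A limit of -1 means "all rows".
func parseSampleTables(spec string) (map[string]int, error) {
	limits := map[string]int{}
	if spec == "" {
		return limits, nil
	}
	for _, part := range strings.Split(spec, ",") {
		kv := strings.SplitN(part, ":", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("bad entry %q, want table:count", part)
		}
		if kv[1] == "all" {
			limits[kv[0]] = -1
			continue
		}
		n, err := strconv.Atoi(kv[1])
		if err != nil {
			return nil, fmt.Errorf("bad count in %q: %v", part, err)
		}
		limits[kv[0]] = n
	}
	return limits, nil
}

func main() {
	limits, err := parseSampleTables("users:1000,orders:5000,products:all")
	fmt.Println(limits, err)
}
```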
- `output/mariadb-ddl.md`: Human-readable schema documentation
- `output/init-scripts/01-extracted-schema.sql`: Complete DDL statements
- `output/data-extract.sql`: INSERT statements with data
- `data-extract.progress`: Resume tracking file
- `output/mariadb-extract.md`: Formatted database information
- `output/mariadb-extract.json`: Structured metadata
**Connection Timeout**
```bash
# Increase timeout
export MARIADB_TIMEOUT=600
```

**Foreign Key Errors**
- Handled automatically with `SET FOREIGN_KEY_CHECKS=0`
- Tables extracted in dependency order
**Large Dataset Memory Issues**
```bash
# Reduce chunk size
export MARIADB_CHUNK_SIZE=5000
```

**Resume Failed Extraction**
```bash
# Check progress file
ls *.progress

# Resume with ID
make extract-data-resume
```

```bash
# Install dependencies
go mod download

# Build binary
go build -o mariadb-extractor

# Run tests
go test ./...
```

```bash
# Build image
docker build -t mariadb-extractor .

# Run with local code
docker run --rm -v $(pwd):/app/output mariadb-extractor --help
```

- No hardcoded credentials in source code
- Environment-based configuration
- Secure password handling via temporary config files
- Optional table exclusion for sensitive data
- Pattern-based filtering for PII protection
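Pattern-based exclusion of the kind `--exclude-tables "*_log,*_audit"` uses can be sketched with Go's standard `path.Match` glob matching; the `excluded` helper is illustrative and may not match the tool's exact pattern semantics.

```go
package main

import (
	"fmt"
	"path"
)

// excluded reports whether a table name matches any exclusion pattern,
// e.g. "*_log" matches "access_log". Patterns use shell-style globs.
func excluded(table string, patterns []string) bool {
	for _, p := range patterns {
		if ok, _ := path.Match(p, table); ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"*_log", "*_audit", "user_pii"}
	fmt.Println(excluded("access_log", patterns)) // true
	fmt.Println(excluded("users", patterns))      // false
}
```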
- Docker Engine 20.10+
- Docker Compose 1.29+
- Go 1.25+
- MariaDB client tools (for dump command)
- Network access to MariaDB server
MIT License - See LICENSE file for details
Contributions are welcome. Please ensure:
- Code follows Go best practices
- Tests pass (`go test ./...`)
- Docker build succeeds
- Documentation is updated
For issues, feature requests, or questions, please open an issue on GitHub.