A comprehensive Python SDK for MatrixOne that provides a SQLAlchemy-like interface for database operations, vector search, fulltext search, snapshot management, PITR, restore operations, table cloning, and mo-ctl integration.
📚 Complete Documentation | 🚀 Quick Start Guide | 🧠 Vector Search & IVF Monitoring | 📖 API Reference
Note: This SDK implementation is generated and optimized through AI-assisted development, leveraging advanced code generation techniques to ensure comprehensive API coverage, type safety, and modern Python best practices.
- 🚀 High Performance: Optimized for MatrixOne database operations with connection pooling
- 🔄 Async Support: Full async/await support with AsyncClient for non-blocking operations
- 🧠 Vector Search: Advanced vector similarity search with HNSW and IVF indexing algorithms
- Support for f32 and f64 precision vectors
- Multiple distance metrics (L2, Cosine, Inner Product)
- ⭐ IVF Index Health Monitoring with `get_ivf_stats()` - critical for production!
- ⭐ IVF LIMIT BY RANK - fine-grained ranking control (pre/post/force modes)
- Configurable index parameters for performance tuning
- 🔍 Fulltext Search: Powerful fulltext indexing and search with BM25 and TF-IDF algorithms
- Natural language and boolean search modes
- Multi-column fulltext indexes
- Relevance scoring and ranking
- 📊 Metadata Analysis: Comprehensive table and column metadata analysis with statistics
- Table size and row count statistics
- Column data distribution analysis
- Index usage metrics
- 🔍 Secondary Index Verification: Verify consistency of secondary indexes with the main table
- Get all secondary index table names
- Get specific index table by index name
- Verify row counts across the main table and all indexes in a single query
- 📂 Stage Management: Comprehensive external stage operations for centralized data storage
- Create and manage stages (file system, S3, cloud storage)
- Load data from stages with automatic path resolution
- Stage-scoped load operations for convenience
- Transaction support for atomic stage operations
- 📥 Data Loading: Pandas-style bulk data loading with intuitive interface
- `read_csv()`, `read_json()`, `read_parquet()` methods (pandas-compatible API)
- Load from local files or external stages (`stage://` protocol)
- Pandas-compatible parameters (`sep`, `quotechar`, `skiprows`, `names`, `encoding`)
- Support for inline data loading and compression
- Transaction-aware loading for data consistency
- 📤 Data Export: Pandas-style data export with intuitive interface
- `to_csv()` and `to_jsonl()` methods (pandas-compatible API)
- Export to local files or external stages (`stage://` protocol)
- Support for raw SQL, SQLAlchemy select(), and MatrixOne queries
- Customizable CSV options (sep, quotechar, lineterminator)
- Transaction-aware exports for data consistency
- 📸 Snapshot Management: Create and manage database snapshots at multiple levels
- ⏰ Point-in-Time Recovery: PITR functionality for precise data recovery
- 🔄 Table Cloning: Clone databases and tables efficiently with data replication
- ⭐ SQLAlchemy-style statement builders (`clone_table()`, `clone_database()`)
- Snapshot-based point-in-time cloning
- Cross-tenant cloning with TO ACCOUNT support
- IF NOT EXISTS for idempotent operations
- 🌿 Branch Management: Git-style version control for databases and tables
- Create isolated branches for development and testing
- Compare differences between branches with count-only optimization
- Merge branches with conflict resolution strategies
- Point-in-time branching from snapshots
- ⭐ SQLAlchemy-style statement builders (`create_table_branch()`, `diff_table_branch()`, etc.)
- Advanced DIFF output: COUNT, LIMIT, FILE export, snapshot on both sides
- Cross-tenant branching with TO ACCOUNT support
- 👥 Account Management: Comprehensive user, role, and permission management
- 📊 Pub/Sub: Real-time publication and subscription support
- 🔧 Version Management: Automatic backend version detection and compatibility checking
- 🛡️ Type Safety: Full type hints support with comprehensive documentation
- 📚 SQLAlchemy Integration: Seamless SQLAlchemy integration with enhanced ORM features
- 🔗 Enhanced Query Building: Advanced query building with logical operations (logical_and, logical_or, logical_not)
- 🪝 Connection Hooks: Pre/post connection hooks for custom initialization logic
- 🛠️ mo-diag CLI Tool: Interactive diagnostic tool for MatrixOne database maintenance
- Index health monitoring and verification
- IVF/HNSW vector index status inspection
- Table statistics and metadata analysis
- Interactive shell with Tab completion and command history
- Non-interactive mode for scripting and automation
- Batch operations on tables and indexes
🚨 CRITICAL: Always use lowercase with underscores (snake_case) for column names!
MatrixOne does not support SQL-standard double-quoted identifiers in queries, which breaks camelCase column names when using the SQLAlchemy ORM.
```python
# ❌ DON'T: CamelCase column names (will fail in SELECT queries)
class User(Base):
    userName = Column(String(50))  # CREATE succeeds, SELECT fails!
    userId = Column(Integer)       # Will cause SQL syntax errors

# ✅ DO: Use lowercase with underscores (snake_case)
class User(Base):
    user_name = Column(String(50))  # Works perfectly
    user_id = Column(Integer)       # All operations succeed
```

Why this matters:
- ✅ CREATE TABLE works with both styles (uses backticks)
- ✅ INSERT works with both styles
- ❌ SELECT fails with camelCase (uses double quotes, not supported by MatrixOne)
Example of the problem:
```python
# CamelCase generates: SELECT "userName" FROM user  ❌ Fails!
# snake_case generates: SELECT user_name FROM user  ✅ Works!
```

Install from PyPI:

```bash
pip install matrixone-python-sdk
```

Install from Test PyPI:

```bash
pip install \
  --index-url https://test.pypi.org/simple/ \
  --extra-index-url https://pypi.org/simple/ \
  matrixone-python-sdk
```

Note: The `--extra-index-url` is required to install dependencies (PyMySQL, SQLAlchemy, etc.) from the official PyPI.
Install from source (venv):

```bash
git clone https://github.com/matrixorigin/matrixone.git
cd matrixone/clients/python

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
# venv\Scripts\activate

# Quick setup with Makefile (includes all dev dependencies)
make dev-setup

# Or manual setup
pip install -e '.[dev]'  # Includes pyarrow, Faker, pytest, etc.
```

Install from source (conda):

```bash
git clone https://github.com/matrixorigin/matrixone.git
cd matrixone/clients/python

# Create conda environment
conda create -n matrixone-dev python=3.10
conda activate matrixone-dev

# Quick setup with Makefile
make dev-setup

# Or manual setup
pip install -e '.[dev]'
```

Install from source (existing environment):

```bash
git clone https://github.com/matrixorigin/matrixone.git
cd matrixone/clients/python

# Quick setup with Makefile
make dev-setup

# Or manual setup
pip install -e '.[dev]'
```

Install test dependencies:

```bash
# Install test dependencies (includes pyarrow for Parquet support)
pip install -e '.[test]'

# Or use requirements files
pip install -r requirements-sqlalchemy20.txt  # SQLAlchemy 2.x
pip install -r requirements-sqlalchemy14.txt  # SQLAlchemy 1.4.x
```

The SDK includes comprehensive LOAD DATA functionality with support for:
- CSV/TSV files: Basic delimited file loading
- JSONLINE files: JSON Lines format with object/array structures
- Parquet files: Columnar format loading (requires `pyarrow`)
- Inline data: Direct string data loading
- Stage loading: Loading from named external stages
Dependencies for Load Data:
```bash
# Required for Parquet file support
pip install pyarrow>=10.0.0

# For generating test data
pip install Faker>=10.0.0

# Or install everything at once
pip install -e '.[dev]'  # Includes pyarrow, Faker, pytest, etc.
```

Quick start:

```python
from matrixone import Client

# Create and connect to MatrixOne
client = Client()
client.connect(
    host='localhost',
    port=6001,
    user='root',
    password='111',
    database='test'
)

# Execute queries
result = client.execute("SELECT 1 as test")
print(result.fetchall())

# Get backend version (auto-detected)
version = client.get_backend_version()
print(f"MatrixOne version: {version}")

client.disconnect()
```

📝 Connection Parameters
The `connect()` method requires keyword arguments (not positional):

- `database` - Required, no default value
- `host` - Default: `'localhost'`
- `port` - Default: `6001`
- `user` - Default: `'root'`
- `password` - Default: `'111'`

Minimal connection (uses all defaults):

```python
client.connect(database='test')
```

By default, all features (IVF, HNSW, fulltext) are automatically enabled via `on_connect=[ConnectionAction.ENABLE_ALL]`.
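A minimal sketch of passing `on_connect` explicitly; the import path for `ConnectionAction` is an assumption, so check the API reference for its exact location:

```python
from matrixone import Client
from matrixone.connection_hooks import ConnectionAction  # assumed import path

client = Client()
client.connect(
    database='test',
    # The documented default; pass a narrower list to enable only what you need
    on_connect=[ConnectionAction.ENABLE_ALL],
)
```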
Use `client.session()` for atomic transactions. All operations within a session succeed or fail together:
```python
from matrixone import Client
from sqlalchemy import select, insert, update, delete

client = Client()
client.connect(database='test')

# Basic transaction with automatic commit/rollback
with client.session() as session:
    # All operations are atomic
    session.execute(insert(User).values(name='Alice', age=30))
    session.execute(update(User).where(User.age < 18).values(status='minor'))

    # Query within transaction
    stmt = select(User).where(User.age > 25)
    result = session.execute(stmt)
    users = result.scalars().all()
    # Commits automatically on success, rolls back on error

client.disconnect()
```

Key Benefits:
- ✅ Atomic operations - all succeed or fail together
- ✅ Automatic rollback on errors
- ✅ Access to all MatrixOne managers (snapshots, clones, load_data, etc.)
- ✅ Full SQLAlchemy ORM support
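As a quick illustration of the rollback guarantee, raising inside the `with` block discards everything staged in that session (a sketch; `User` is the model defined in the surrounding examples):

```python
from sqlalchemy import insert

try:
    with client.session() as session:
        session.execute(insert(User).values(name='Carol', age=41))
        raise RuntimeError('simulated failure')  # aborts the transaction
except RuntimeError:
    pass  # the insert above was rolled back; nothing was committed
```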
Load bulk data from CSV, JSON, or Parquet files with pandas-compatible API:
```python
from matrixone import Client
from matrixone.orm import declarative_base, Column, Integer, String

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    email = Column(String(255))

client = Client()
client.connect(database='test')

# Create table
client.create_table(User)

# Basic CSV load (pandas-style)
client.load_data.read_csv('users.csv', table=User)

# CSV with header (pandas-style)
client.load_data.read_csv('users.csv', table=User, skiprows=1)

# Custom separator (pandas-style)
client.load_data.read_csv('users.txt', table=User, sep='|')

# Tab-separated (pandas-style)
client.load_data.read_csv('users.tsv', table=User, sep='\t')

# JSON Lines (pandas-style)
client.load_data.read_json('events.jsonl', table='events', lines=True, orient='records')

# Parquet (pandas-style)
client.load_data.read_parquet('data.parquet', table=User)

# Load from stage
client.load_data.read_csv('stage://s3_stage/users.csv', table=User)
client.load_data.read_csv_stage('s3_stage', 'users.csv', table=User)

# Inline data
csv_data = "1,Alice,alice@example.com\n2,Bob,bob@example.com\n"
client.load_data.read_csv(csv_data, table=User, inline=True)

client.disconnect()
```

Key Features:
- ✅ Pandas-compatible API (`read_csv()`, `read_json()`, `read_parquet()`)
- ✅ Pandas-style parameters (`sep`, `quotechar`, `skiprows`, `names`, `encoding`)
- ✅ Load from local files or external stages (`stage://` protocol)
- ✅ Support for compression, parallel loading, and inline data
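A short sketch combining several of the pandas-style parameters on a headerless, pipe-delimited file (the parameter semantics are assumed to mirror pandas):

```python
# Load a headerless, pipe-delimited file, naming the columns explicitly
client.load_data.read_csv(
    'users_noheader.txt',  # hypothetical file
    table=User,
    sep='|',
    names=['id', 'name', 'email'],  # column order in the file
    encoding='utf-8',
)
```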
Export query results to CSV or JSONL with pandas-compatible API:
```python
from matrixone import Client
from sqlalchemy import select

client = Client()
client.connect(database='test')

# Basic CSV export (pandas-style)
client.export.to_csv('/tmp/users.csv', "SELECT * FROM users")

# With custom separator
client.export.to_csv('/tmp/users.tsv', "SELECT * FROM users", sep='\t')

# Export with SQLAlchemy
stmt = select(User).where(User.age > 25)
client.export.to_csv('/tmp/adults.csv', stmt, sep='|')

# Export to JSONL
client.export.to_jsonl('/tmp/users.jsonl', "SELECT * FROM users")

# Export to external stage (using stage:// protocol)
client.export.to_csv('stage://s3_stage/backup.csv', stmt)

# Export to external stage (using convenience method)
client.export.to_csv_stage('s3_stage', 'backup2.csv', stmt)

# Export within transaction
with client.session() as session:
    stmt = select(Product).where(Product.stock > 0)
    session.export.to_csv('/tmp/in_stock.csv', stmt)

client.disconnect()
```

Key Features:
- ✅ Pandas-compatible API (`to_csv()`, `to_jsonl()`)
- ✅ Convenience methods for stage exports (`to_csv_stage()`, `to_jsonl_stage()`)
- ✅ Support for raw SQL, SQLAlchemy, and MatrixOne queries
- ✅ Export to local files or external stages (`stage://` protocol)
- ✅ Customizable CSV options (`sep`, `quotechar`, `lineterminator`)
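`to_jsonl_stage()` is listed above but not shown in the example; a sketch, assuming it mirrors the `to_csv_stage()` signature:

```python
# Export query results as JSON Lines directly to a named stage
client.export.to_jsonl_stage('s3_stage', 'users_backup.jsonl', "SELECT * FROM users")
```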
If you have existing SQLAlchemy code, you can wrap your sessions with MatrixOne features without refactoring:
```python
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from matrixone import Client
from matrixone.session import Session as MatrixOneSession

# Your existing SQLAlchemy setup
engine = create_engine('mysql+pymysql://root:111@localhost:6001/test')
SessionFactory = sessionmaker(bind=engine)
sqlalchemy_session = SessionFactory()

# Create MatrixOne client
mo_client = Client()
mo_client.connect(host='localhost', port=6001, user='root', password='111', database='test')

# Wrap your existing session with MatrixOne features
mo_session = MatrixOneSession(
    client=mo_client,
    wrap_session=sqlalchemy_session
)

try:
    # Your existing SQLAlchemy operations still work
    result = mo_session.execute("SELECT * FROM users")

    # Now you can also use MatrixOne-specific features
    mo_session.stage.create_s3('backup_stage', bucket='my-backups', path='')
    mo_session.snapshots.create('daily_backup', level='database')
    mo_session.load_data.read_csv('/data/users.csv', table='users')

    mo_session.commit()
finally:
    mo_session.close()
```

Use Cases:
- 🔄 Gradual migration from pure SQLAlchemy to MatrixOne
- 🏢 Adding MatrixOne features to existing enterprise applications
- 📦 Legacy code modernization without complete refactoring
- 🔧 Testing MatrixOne features alongside existing code
The SDK provides seamless SQLAlchemy integration with ORM-style operations:
```python
from matrixone import Client
from matrixone.orm import Base, Column, Integer, String, select, insert, update, delete

# Define ORM models
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    email = Column(String(255))
    age = Column(Integer)

client = Client()
client.connect(database='test')

# Create table from model
client.create_table(User)

# ORM-style INSERT
stmt = insert(User).values(name='John', email='john@example.com', age=30)
client.execute(stmt)

# ORM-style SELECT
stmt = select(User).where(User.age > 25)
result = client.execute(stmt)
for user in result.scalars():
    print(f"User: {user.name}, Age: {user.age}")

# ORM-style UPDATE
stmt = update(User).where(User.id == 1).values(email='newemail@example.com')
client.execute(stmt)

# ORM-style DELETE
stmt = delete(User).where(User.age < 18)
client.execute(stmt)

client.disconnect()
```

Recommended Practices:
- ✅ Use `session()` for multi-statement transactions
- ✅ Use ORM-style statements (`select`, `insert`, `update`, `delete`)
- ✅ Use `client.execute()` for single-statement operations
- ✅ Prefer SQLAlchemy statements over raw SQL strings
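The enhanced query building mentioned in the feature list (`logical_and`, `logical_or`, `logical_not`) composes filter conditions. A sketch, assuming the helpers are importable from `matrixone.orm`:

```python
from matrixone.orm import select, logical_and, logical_or, logical_not  # assumed import path

# Adults named Alice, or adults with a non-NULL email
stmt = select(User).where(
    logical_and(
        User.age >= 18,
        logical_or(
            User.name == 'Alice',
            logical_not(User.email == None),  # noqa: E711 - SQL-style NULL check
        ),
    )
)
result = client.execute(stmt)
```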
Full async/await support with AsyncClient and async sessions:
```python
import asyncio
from matrixone import AsyncClient
from sqlalchemy import select, insert, update

async def main():
    client = AsyncClient()
    await client.connect(database='test')

    # Basic async query
    result = await client.execute(select(User).where(User.age > 25))
    print(result.fetchall())

    # Async transaction
    async with client.session() as session:
        # All operations are atomic
        await session.execute(insert(User).values(name='Alice', age=30))
        await session.execute(update(User).where(User.age < 18).values(status='minor'))

        # Concurrent queries with asyncio.gather
        user_result, order_result = await asyncio.gather(
            session.execute(select(User)),
            session.execute(select(Order))
        )
        # Commits automatically

    await client.disconnect()

asyncio.run(main())
```

Async Benefits:
- ✅ Non-blocking I/O operations
- ✅ Concurrent execution with `asyncio.gather()`
- ✅ Perfect for FastAPI, aiohttp, and async frameworks
- ✅ Same transaction guarantees as sync version
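For example, a minimal FastAPI wiring (the FastAPI parts are illustrative and not part of the SDK; the result-row indexing is an assumption):

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI
from matrixone import AsyncClient

client = AsyncClient()

@asynccontextmanager
async def lifespan(app: FastAPI):
    await client.connect(database='test')  # connect once at startup
    yield
    await client.disconnect()              # clean shutdown

app = FastAPI(lifespan=lifespan)

@app.get("/users/count")
async def user_count():
    result = await client.execute("SELECT COUNT(*) FROM users")
    return {"count": result.fetchall()[0][0]}  # assumes rows are indexable tuples
```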
Create and manage database snapshots. Use `session()` for atomic snapshot + clone operations:
```python
from matrixone import SnapshotLevel

# Single snapshot operation
snapshot = client.snapshots.create(
    name='my_snapshot',
    level=SnapshotLevel.DATABASE,
    database='production'
)

# List snapshots
snapshots = client.snapshots.list()
for snap in snapshots:
    print(f"Snapshot: {snap.name}, Created: {snap.created_at}")

# Atomic snapshot + clone in transaction
with client.session() as session:
    # Create snapshot
    snapshot = session.snapshots.create(
        name='daily_backup',
        level=SnapshotLevel.DATABASE,
        database='production'
    )

    # Clone from snapshot atomically
    session.clone.clone_database(
        target_db='production_copy',
        source_db='production',
        snapshot_name='daily_backup'
    )
    # Both operations commit together
```

Vector search with HNSW and IVF indexes:

```python
from matrixone.sqlalchemy_ext import create_vector_column
from sqlalchemy import Column, BigInteger, String, Text
from matrixone.orm import declarative_base
import numpy as np

# Define vector table model
Base = declarative_base()

class Document(Base):
    __tablename__ = 'documents'
    # IMPORTANT: HNSW index requires BigInteger primary key
    id = Column(BigInteger, primary_key=True, autoincrement=True)
    title = Column(String(200))
    content = Column(Text)
    embedding = create_vector_column(384, precision='f32')  # 384-dim vector

# Create table using client API
client.create_table(Document)

# Enable and create HNSW index
client.vector_ops.enable_hnsw()
client.vector_ops.create_hnsw(
    'documents',  # table name - positional argument
    name='idx_embedding',
    column='embedding',
    m=16,
    ef_construction=200
)

# Insert document with vector using client API
client.insert(Document, {
    'title': 'AI Research Paper',
    'content': 'Advanced machine learning techniques...',
    'embedding': np.random.rand(384).tolist()
})

# Vector similarity search
query_vector = np.random.rand(384).tolist()
results = client.vector_ops.similarity_search(
    'documents',  # table name - positional argument
    vector_column='embedding',
    query_vector=query_vector,
    limit=5,
    distance_type='cosine'
)

# ⭐ CRITICAL: Monitor IVF index health for production
stats = client.vector_ops.get_ivf_stats('documents', 'embedding')
counts = stats['distribution']['centroid_count']
balance_ratio = max(counts) / min(counts) if min(counts) > 0 else float('inf')
print(f"Index Health: {len(counts)} centroids, balance ratio: {balance_ratio:.2f}")
if balance_ratio > 2.5:
    print("⚠️ Index needs rebuilding for optimal performance!")
```

Fulltext search with boolean modes:

```python
# Enable and create fulltext index using client API
client.fulltext_index.enable_fulltext()
client.fulltext_index.create(
    'documents',  # table name - positional argument
    name='ftidx_content',
    columns=['title', 'content']
)

# Boolean search with encourage (like natural language search)
from matrixone.sqlalchemy_ext.fulltext_search import boolean_match

results = client.query(
    Document.title,
    Document.content,
    boolean_match('title', 'content').encourage('machine learning techniques')
).execute()

# Boolean search with must/should/must_not operators
results = client.query(
    Document.title,
    Document.content,
    boolean_match('title', 'content')
    .must('machine')
    .must('learning')
    .must_not('basic')
).execute()
```

Load data from external sources using stages. Use `session()` for atomic multi-file loading:
```python
from matrixone.orm import Base, Column, Integer, String

# Define ORM model
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    email = Column(String(255))

# Create S3 stage using simple interface
client.stage.create_s3(
    name='data_stage',
    bucket='my-bucket',
    path='data/',
    aws_key_id='your_key',
    aws_secret_key='your_secret',
    region='us-east-1'
)

# Load data from stage using ORM model (pandas-style)
client.load_data.read_csv_stage('data_stage', 'users.csv', table=User)

# Atomic multi-file loading in transaction
with client.session() as session:
    # Create local stage
    session.stage.create_local('import_stage', '/data/imports/')

    # Load multiple files atomically (pandas-style)
    session.load_data.read_csv('/data/users.csv', table=User)
    session.load_data.read_csv('/data/orders.csv', table=Order)
    # All loads commit together
```

Stage Benefits:
- ✅ Centralized data source configuration
- ✅ Support for S3, local filesystem, and cloud storage
- ✅ Automatic path resolution
- ✅ Transaction support for atomic operations
```python
# Get table metadata with statistics
metadata_rows = client.metadata.scan(
    dbname='test',
    tablename='documents',
    columns='*'  # Get structured MetadataRow objects
)

for row in metadata_rows:
    print(f"Column: {row.col_name}")
    print(f"  Nulls: {row.null_cnt}")
    print(f"  Rows: {row.rows_cnt}")
    print(f"  Size: {row.origin_size}")

# Get table statistics
brief_stats = client.metadata.get_table_brief_stats(
    dbname='test',
    tablename='documents'
)
table_stats = brief_stats['documents']
print(f"Total rows: {table_stats['row_cnt']}")
print(f"Original size: {table_stats['original_size']} bytes")
print(f"Compressed size: {table_stats['compress_size']} bytes")
```

Secondary index verification:

```python
# Get all secondary index tables
index_tables = client.get_secondary_index_tables('my_table')
print(f"Found {len(index_tables)} secondary indexes")

# Get specific index by name
physical_table = client.get_secondary_index_table_by_name('my_table', 'idx_name')
print(f"Index 'idx_name' -> {physical_table}")

# Verify all indexes have same count as main table
try:
    count = client.verify_table_index_counts('my_table')
    print(f"✓ Verification passed! Row count: {count}")
except ValueError as e:
    print("✗ Index mismatch detected:")
    print(e)  # Error includes all count details for debugging
```

Feature availability and version checks:

```python
# Check if feature is available
if client.is_feature_available('snapshot_creation'):
    snapshot = client.snapshots.create('my_snapshot', 'cluster')
else:
    hint = client.get_version_hint('snapshot_creation')
    print(f"Feature not available: {hint}")

# Check version compatibility
if client.check_version_compatibility('3.0.0', '>='):
    print("Backend supports 3.0.0+ features")
```

The SDK automatically detects MatrixOne backend versions and handles compatibility:
- Development Version: `8.0.30-MatrixOne-v` → `999.0.0` (highest priority)
- Release Version: `8.0.30-MatrixOne-v3.0.0` → `3.0.0`
- Legacy Format: `MatrixOne 3.0.1` → `3.0.1`
```python
# Check if running a development version
if client.is_development_version():
    print("Running development version - all features available")
else:
    print(f"Running release version: {client.get_backend_version()}")
```

The mo-diag command-line tool provides an interactive shell for diagnosing and maintaining MatrixOne databases, with a special focus on vector indexes, secondary indexes, and table statistics.
After installing the SDK, mo-diag is automatically available as a command:
```bash
pip install matrixone-python-sdk
mo-diag --help
```
```bash
# Connect to default localhost
mo-diag --database test

# Connect to remote database
mo-diag --host 192.168.1.100 --port 6001 --user admin --password secret --database production
```

Interactive Features:
- 🔍 Tab Completion: Press `Tab` to auto-complete commands, table names, and database names
- ⬆️⬇️ Command History: Use arrow keys to browse command history
- 🔎 History Search: Press `Ctrl+R` to search command history
- 🎨 Colored Output: Clear visual feedback with syntax highlighting
- 💾 Persistent History: Command history saved to `~/.mo_diag_history`
Execute single commands directly for scripting and automation:
```bash
# Check IVF index status
mo-diag -d test -c "show_ivf_status"

# Get detailed table statistics
mo-diag -d test -c "show_table_stats my_table -a -d"

# Execute SQL query
mo-diag -d test -c "sql SELECT COUNT(*) FROM my_table"

# Flush table and all indexes
mo-diag -d test -c "flush_table my_table"
```

Manage Change Data Capture directly from the shell or scripts. The interactive commands (`cdc_tasks`, `cdc_task`, `cdc_create`, `cdc_drop`) now have matching non-interactive helpers via `mo-diag cdc`.
```bash
# Summaries (same as running cdc_tasks inside the shell)
mo-diag cdc show

# Inspect a task with watermark threshold control
mo-diag cdc show nightly_sync --details --threshold=5m

# Launch the guided creator with a forced mode
mo-diag cdc create --table-level

# Drop a task with explicit confirmation override
mo-diag cdc drop nightly_sync --force
```

`show_indexes <table> [database]`
- Display all indexes for a table including IVF, HNSW, Fulltext, and regular indexes
- Shows physical table names, index types, and statistics
- Includes object counts, row counts, and sizes for vector/fulltext indexes
```
MO-DIAG[test]> show_indexes ivf_health_demo_docs

📊 Secondary Indexes for 'test.ivf_health_demo_docs'

*************************** 1. row ***************************
Index Name:     idx_embedding_ivf_v2
Algorithm:      ivfflat
Table Type:     metadata
Physical Table: __mo_index_secondary_0199e725-0a7a-77b8-b689-ccdd0a33f581
Columns:        embedding
Statistics:
  - Objects: 1
  - Rows: 7
  - Compressed Size: 940 B
  - Original Size: 1.98 KB
*************************** 2. row ***************************
Index Name:     idx_embedding_ivf_v2
Algorithm:      ivfflat
Table Type:     centroids
Physical Table: __mo_index_secondary_0199e725-0a7b-706e-8f0a-a50edc3621a1
Columns:        embedding
Statistics:
  - Objects: 1
  - Rows: 17
  - Compressed Size: 3.09 KB
  - Original Size: 6.83 KB
*************************** 3. row ***************************
Index Name:     idx_embedding_ivf_v2
Algorithm:      ivfflat
Table Type:     entries
Physical Table: __mo_index_secondary_0199e725-0a7c-77f4-8d0b-48fd8258098a
Columns:        embedding
Statistics:
  - Objects: 1
  - Rows: 1,000
  - Compressed Size: 156.34 KB
  - Original Size: 176.07 KB

Total: 3 index tables (1 ivfflat with 3 physical tables)
```

Note: IVF indexes have 3 physical tables (metadata, centroids, entries)
`show_all_indexes [database]`
- Health report for all tables with secondary indexes in the database
- Row count consistency checks
- IVF/HNSW/Fulltext index status
```
MO-DIAG[test]> show_all_indexes

📊 Index Health Report for Database 'test':
========================================================================================================================
✓ HEALTHY (3 tables)
------------------------------------------------------------------------------------------------------------------------
Table Name                   | Indexes | Row Count      | Notes
------------------------------------------------------------------------------------------------------------------------
cms_all_content_chunk_info   | 3       | ✓ 32,712 rows  | -
demo_mixed_indexes           | 4       | ✓ 20 rows      | -
ivf_health_demo_docs         | 1       | 300 rows       | IVF: 17 centroids, 300 vectors
========================================================================================================================
Summary: ✓ 3 healthy tables
Total: 3 tables with indexes

💡 Tip: Use 'verify_counts <table>' or 'show_ivf_status' for detailed diagnostics
```

`verify_counts <table> [database]`
- Verify row count consistency between main table and all its secondary indexes
- Highlights any mismatches
```
MO-DIAG[test]> verify_counts my_table

📊 Row Count Verification for 'test.my_table'
════════════════════════════════════════════════════════════════════════════
Main table: 10,000 rows
────────────────────────────────────────────────────────────────────────────
✓ __mo_index_secondary_xxx: 10,000 rows
✓ __mo_index_unique_yyy: 10,000 rows
════════════════════════════════════════════════════════════════════════════
✅ PASSED: All index tables match (10,000 rows)
```

`show_ivf_status [database] [-v] [-t table]`
- Display IVF index building status and centroid distribution
- `-v`: Verbose mode with detailed centroid information
- `-t <table>`: Filter by specific table
```
MO-DIAG[test]> show_ivf_status

📊 IVF Index Status in 'test':
════════════════════════════════════════════════════════════════════════════
Table                | Index                | Column    | Centroids | Vectors | Balance | Status
────────────────────────────────────────────────────────────────────────────
ivf_health_demo_docs | idx_embedding_ivf_v2 | embedding | 17        | 1,000   | 2.35    | ✓ active
```

`show_table_stats <table> [database] [-t] [-a] [-d]`
- Display table metadata and statistics
- `-t`: Include tombstone statistics
- `-a`: Include all indexes (hierarchical view with `-d`)
- `-d`: Show detailed object lists
```
MO-DIAG[test]> show_table_stats ivf_health_demo_docs -a -d

📊 Detailed Table Statistics for 'test.ivf_health_demo_docs':
======================================================================================================================================================
Table: ivf_health_demo_docs
  Objects: 1 | Rows: 1,000 | Null: 0 | Original: 176.03 KB | Compressed: 156.24 KB
  Objects:
    Object Name                                | Rows | Null Cnt | Original Size | Compressed Size
    ----------------------------------------------------------------------------------------------
    0199e729-642e-71e0-b338-67c4980ee294_00000 | 1000 | 0        | 176.03 KB     | 156.24 KB

Index: idx_embedding_ivf_v2
  └─ (metadata): __mo_index_secondary_0199e725-0a7a-77b8-b689-ccdd0a33f581:272851
     Objects: 1 | Rows: 7 | Null: 0 | Original: 1.98 KB | Compressed: 940 B
     Objects:
       Object Name                                | Rows | Null Cnt | Original Size | Compressed Size
       -------------------------------------------------------------------------------------------
       0199e729-642a-7d36-ac37-0ae17325f7ec_00000 | 7    | 0        | 1.98 KB       | 940 B
  └─ (centroids): __mo_index_secondary_0199e725-0a7b-706e-8f0a-a50edc3621a1:272852
     Objects: 1 | Rows: 17 | Null: 0 | Original: 6.83 KB | Compressed: 3.09 KB
     Objects:
       Object Name                                | Rows | Null Cnt | Original Size | Compressed Size
       -------------------------------------------------------------------------------------------
       0199e729-6431-794d-9c13-d10f2e4a59e7_00000 | 17   | 0        | 6.83 KB       | 3.09 KB
  └─ (entries): __mo_index_secondary_0199e725-0a7c-77f4-8d0b-48fd8258098a:272853
     Objects: 1 | Rows: 1,000 | Null: 0 | Original: 696.11 KB | Compressed: 626.37 KB
     Objects:
       Object Name                                | Rows | Null Cnt | Original Size | Compressed Size
       -------------------------------------------------------------------------------------------
       0199e729-6438-7db1-bcb6-7b74c59b0aee_00000 | 1000 | 0        | 696.11 KB     | 626.37 KB
```

Note:
- Hierarchical structure: Table → Index → Physical Tables → Objects
- IVF index has 3 physical tables: metadata (7 rows), centroids (17 rows), entries (1,000 rows)
- Physical table name format: `table_name:table_id`
- Use the `-t` flag to include tombstone statistics for deleted/updated data
`flush_table <table> [database]`
- Flush main table and all its secondary index physical tables
- Includes IVF metadata/centroids/entries, HNSW, Fulltext, and regular indexes
- Requires sys user privileges
```
MO-DIAG[test]> flush_table ivf_health_demo_docs

🔄 Flushing table: test.ivf_health_demo_docs

✓ Main table flushed: ivf_health_demo_docs

📋 Found 3 index physical tables

Index: idx_embedding_ivf_v2
  ✓ metadata: __mo_index_secondary_0199e725-0a7a-77b8-b689-ccdd0a33f581
  ✓ centroids: __mo_index_secondary_0199e725-0a7b-706e-8f0a-a50edc3621a1
  ✓ entries: __mo_index_secondary_0199e725-0a7c-77f4-8d0b-48fd8258098a

📊 Summary:
  Main table: ✓ flushed
  Index tables: 3/3 flushed successfully
```

Note:
- Automatically discovers and flushes all index physical tables
- For IVF indexes: flushes metadata, centroids, and entries tables
- Requires sys user privileges
- Recommended before backups or after bulk operations
`tables [database]`

- List all tables in the current or specified database

`databases`

- List all databases (highlights the current database)

`use <database>`

- Switch to a different database

`sql <SQL statement>`

- Execute an arbitrary SQL query
```
MO-DIAG[test]> sql SELECT COUNT(*) FROM my_table

col0
----
10000

1 row(s) returned
```

`history [n | -c]`
- Show last n commands (default: 20)
- `-c`: Clear command history
`help [command]`
- Show help for all commands or specific command
`exit` / `quit`
- Exit the interactive shell
```
usage: mo-diag [-h] [--host HOST] [--port PORT] [--user USER] [--password PASSWORD]
               [--database DATABASE] [--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
               [--command COMMAND]

options:
  --host HOST          Database host (default: localhost)
  --port PORT          Database port (default: 6001)
  --user USER          Database user (default: root)
  --password PASSWORD  Database password (default: 111)
  --database DATABASE  Database name (optional)
  -d DATABASE          Short form of --database
  --log-level LEVEL    Logging level (default: ERROR)
  --command COMMAND    Execute single command and exit
  -c COMMAND           Short form of --command
```

Index inspection:

```bash
# Quick check on index status
mo-diag -d production -c "show_ivf_status -v"

# Detailed index inspection with physical table stats
mo-diag -d production -c "show_indexes my_vector_table"

# Verify centroid distribution balance
mo-diag -d production -c "show_ivf_status -t my_vector_table -v"
```

Consistency checks:

```bash
# Check row consistency
mo-diag -d test -c "verify_counts my_table"

# Get health report for all indexes
mo-diag -d test -c "show_all_indexes"

# Flush to sync if needed
mo-diag -d test -c "flush_table my_table"
```

Table statistics:

```bash
# Get table statistics with all indexes
mo-diag -d test -c "show_table_stats my_table -a"

# Detailed object-level analysis
mo-diag -d test -c "show_table_stats my_table -a -d"

# Include tombstone statistics
mo-diag -d test -c "show_table_stats my_table -a -t -d"
```

Scheduled health checks:

```bash
#!/bin/bash
# daily_index_check.sh
DATABASES=("prod_db1" "prod_db2" "prod_db3")

for db in "${DATABASES[@]}"; do
    echo "Checking $db..."
    mo-diag -d "$db" -c "show_all_indexes" >> /var/log/mo_diag_daily.log
    mo-diag -d "$db" -c "show_ivf_status" >> /var/log/mo_diag_daily.log
done
```

- Regular Health Checks: Run `show_all_indexes` daily to catch index issues early
- Monitor IVF Balance: Use `show_ivf_status -v` to ensure even centroid distribution
- Before Major Operations: Always `verify_counts` before bulk updates or migrations
- Production Debugging: Use non-interactive mode for logging and monitoring
- Tab Completion: Leverage Tab completion to avoid typos in table/database names
- Command History: Use `Ctrl+R` to quickly find and re-execute previous diagnostic commands
- Flush Regularly: If you notice index count mismatches, `flush_table` can help sync
Issue: Tab completion not working
- Solution: Ensure `prompt_toolkit>=3.0.0` is installed: `pip install prompt_toolkit`
Issue: "Unknown database" error
- Solution: Create the database first: `mo-diag -d test -c "sql CREATE DATABASE IF NOT EXISTS test"`
Issue: Permission denied for flush operations
- Solution: Connect as sys user or a user with sufficient privileges
Issue: IVF index shows 0 centroids
- Solution: The index might be empty or still building. Check with `show_table_stats <table> -a`
```python
# Create PITR for cluster
pitr = client.pitr.create_cluster_pitr(
    name='cluster_pitr',
    range_value=7,
    range_unit='d'
)
```
```python
from datetime import datetime

# Restore to a specific point in time
restore_point = datetime(2024, 1, 15, 10, 30, 0)
client.pitr.restore_to_time(restore_point)
```
### Account Management

```python
# Create user
user = client.account.create_user(
    username='newuser',
    password='password123',
    description='New user account'
)

# Create role
role = client.account.create_role(
    role_name='analyst',
    description='Data analyst role'
)

# Grant privileges
client.account.grant_privilege(
    user='newuser',
    role='analyst',
    privileges=['SELECT', 'INSERT']
)

# Create publication
publication = client.pubsub.create_publication(
    name='data_changes',
    tables=['users', 'orders'],
    description='User and order changes'
)

# Create subscription
subscription = client.pubsub.create_subscription(
    name='data_sync',
    publication_name='data_changes',
    target_tables=['users_backup', 'orders_backup']
)
```

### Stage Management

```python
# Create a local file system stage (simplified)
stage = client.stage.create_local('my_data_stage', './data/', comment='Main data import stage')

# Or use the full method with explicit URL
stage = client.stage.create('my_data_stage', 'file:///path/to/data/', comment='Main data import stage')

# Create an S3 stage with simplified credentials
s3_stage = client.stage.create_s3(
    'production_stage',
    'my-bucket/data/',
    aws_key='AKIAIOSFODNN7EXAMPLE',
    aws_secret='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    aws_region='us-east-1',
    comment='Production data stage'
)

# Or use environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
s3_stage = client.stage.create_s3('production_stage', 'my-bucket/data/')

# List all stages
stages = client.stage.list()
for stage in stages:
    print(f"{stage.name}: {stage.url}")

# Load data from stage (Method 1: via client.load_data, pandas-style)
client.load_data.read_csv_stage('my_data_stage', 'users.csv', table=User)

# Load data from stage (Method 2: via stage object - simpler!)
stage = client.stage.get('my_data_stage')
stage.load_csv('users.csv', User)
stage.load_json('events.jsonl', Event)
stage.load_parquet('orders.parq', Order)

# Batch load multiple files
results = stage.load_files({
    'users.csv': User,
    'events.jsonl': Event,
    'orders.parq': Order
})

# Modify stage
client.stage.alter('my_data_stage', comment='Updated comment')

# Drop stage
client.stage.drop('my_data_stage')
```

## Configuration

### Connection Parameters

```python
client = Client(
    connection_timeout=30,
    query_timeout=300,
    auto_commit=True,
    charset='utf8mb4',
    sql_log_mode='auto',        # 'off', 'simple', 'auto', 'full'
    slow_query_threshold=1.0
)
```

### Logging

```python
import logging

from matrixone import MatrixOneLogger

logger = MatrixOneLogger(
    level=logging.INFO,
    sql_log_mode='auto',        # 'off', 'simple', 'auto', 'full'
    slow_query_threshold=1.0,
    max_sql_display_length=500
)

client = Client(logger=logger)
```

The SDK provides comprehensive error handling with helpful messages:
```python
from matrixone.exceptions import (
    ConnectionError,
    QueryError,
    VersionError,
    SnapshotError
)

try:
    snapshot = client.snapshots.create('test', 'cluster')
except VersionError as e:
    print(f"Version compatibility error: {e}")
except SnapshotError as e:
    print(f"Snapshot operation failed: {e}")
```

Check out the examples/ directory for comprehensive usage examples:
Basic Examples:
- `example_01_basic_connection.py` - Basic database operations and connectivity
- `example_02_account_management.py` - User and role management
- `example_03_async_operations.py` - Async/await operations with AsyncClient
- `example_04_transaction_management.py` - Transaction handling and rollback
- `example_05_snapshot_restore.py` - Snapshot creation and restore operations
- `example_06_sqlalchemy_integration.py` - SQLAlchemy ORM integration
- `example_07_advanced_features.py` - Advanced SDK features
- `example_08_pubsub_operations.py` - Pub/Sub operations
- `example_09_logger_integration.py` - Logging configuration and integration
- `example_10_version_management.py` - Version compatibility management
- `example_11_matrixone_version_demo.py` - MatrixOne version detection
Vector Search Examples:
- `example_12_vector_basics.py` - Vector data types and basic distance functions
- `example_13_vector_indexes.py` - IVF and HNSW index management and performance
- `example_14_vector_search.py` - Vector similarity search operations
- `example_15_vector_advanced.py` - Advanced vector operations and optimization
ORM and Advanced Examples:
- `example_18_snapshot_orm.py` - Snapshot operations with ORM
- `example_19_sqlalchemy_style_orm.py` - SQLAlchemy-style ORM patterns
- `example_20_sqlalchemy_engine_integration.py` - SQLAlchemy engine integration
- `example_21_advanced_orm_features.py` - Advanced ORM features
- `example_22_unified_sql_builder.py` - Unified SQL query builder
- `example_24_query_update.py` - Query and update operations
- `example_25_metadata_operations.py` - Table metadata analysis and statistics
Load Data Examples:
- `example_23_load_data_operations.py` - Comprehensive LOAD DATA operations (CSV, TSV, JSONLINE, Parquet, Inline, Stage)
Stage Management Examples:
- `example_26_stage_operations.py` - External stage management and data loading from stages
Branch Builder Examples:
- `example_branch_builder.py` - SQLAlchemy-style branch statement builders (create, diff, merge)
Clone Builder Examples:
- `example_clone_builder.py` - SQLAlchemy-style clone statement builders (table, database)
Specialized Examples:
- `example_connection_hooks.py` - Connection hooks for custom initialization
- `example_dynamic_logging.py` - Dynamic logging configuration
- `example_ivf_stats_complete.py` - IVF index statistics and analysis
Run all examples:
```bash
make examples
```

Run specific examples:
```bash
make example-basic   # Basic connection and database operations
make example-async   # Async/await operations
make example-vector  # Vector search and similarity
```

Run individual example files:
```bash
# Basic examples
python examples/example_01_basic_connection.py
python examples/example_03_async_operations.py

# Vector examples
python examples/example_12_vector_basics.py
python examples/example_13_vector_indexes.py
python examples/example_14_vector_search.py

# Load Data examples
python examples/example_23_load_data_operations.py

# Stage management examples
python examples/example_26_stage_operations.py

# Metadata and advanced examples
python examples/example_25_metadata_operations.py
python examples/example_22_unified_sql_builder.py
```

You can override the default connection parameters using environment variables:
```bash
# Set custom connection parameters
export MATRIXONE_HOST=localhost
export MATRIXONE_PORT=6001
export MATRIXONE_USER=root
export MATRIXONE_PASSWORD=111
export MATRIXONE_DATABASE=test

# Run examples with custom parameters
make examples
```

Or set them inline:
```bash
MATRIXONE_HOST=localhost MATRIXONE_USER=admin make example-basic
```

Available environment variables:
- `MATRIXONE_HOST`: Database host (default: 127.0.0.1)
- `MATRIXONE_PORT`: Database port (default: 6001)
- `MATRIXONE_USER`: Database username (default: root)
- `MATRIXONE_PASSWORD`: Database password (default: 111)
- `MATRIXONE_DATABASE`: Database name (default: test)
- `MATRIXONE_CHARSET`: Character set (default: utf8mb4)
- `MATRIXONE_CONNECT_TIMEOUT`: Connection timeout (default: 30)
- `MATRIXONE_AUTOCOMMIT`: Auto-commit mode (default: true)
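The examples and Makefile read these variables; a sketch of applying the same convention in your own scripts:

```python
import os

from matrixone import Client

# Fall back to the documented defaults when a variable is unset
client = Client()
client.connect(
    host=os.environ.get('MATRIXONE_HOST', '127.0.0.1'),
    port=int(os.environ.get('MATRIXONE_PORT', '6001')),
    user=os.environ.get('MATRIXONE_USER', 'root'),
    password=os.environ.get('MATRIXONE_PASSWORD', '111'),
    database=os.environ.get('MATRIXONE_DATABASE', 'test'),
)
```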
The MatrixOne Python SDK includes comprehensive testing with support for multiple Python and SQLAlchemy versions.
📖 For detailed testing documentation, see TESTING.md
```bash
# Check environment and dependencies
make check-env

# Install test dependencies
make install-sqlalchemy14   # For SQLAlchemy 1.4.x
make install-sqlalchemy20   # For SQLAlchemy 2.0.x

# Run tests
make test-offline       # Fast mock-based tests
make test-online        # Database integration tests (requires MatrixOne)
make test-sqlalchemy14  # Full test suite with SQLAlchemy 1.4.x
make test-sqlalchemy20  # Full test suite with SQLAlchemy 2.0.x
make test-matrix        # Test both SQLAlchemy versions
```

- Offline Tests: Mock-based unit tests (450+ tests, ~3-5 seconds)
- Online Tests: Database integration tests (100+ tests, ~20-30 seconds)
- Matrix Tests: Cross-version compatibility testing (Python 3.10/3.11 + SQLAlchemy 1.4/2.0)
```bash
# Check environment requirements
make check-env

# Setup development environment (includes all dev dependencies)
make dev-setup

# Install specific dependency sets
make install-deps-dev      # Install development dependencies
make install-deps-test     # Install test dependencies
make install-sqlalchemy14  # Install SQLAlchemy 1.4.x dependencies
make install-sqlalchemy20  # Install SQLAlchemy 2.0.x dependencies

# Run all tests
make test

# Run examples
make examples
```

Dependency Management:
- `make install-deps-dev`: Installs all development dependencies including `pyarrow`, `Faker`, `pytest`, etc.
- `make install-deps-test`: Installs testing dependencies including `pyarrow` (required for Parquet)
- `make install-sqlalchemy14/20`: Installs version-specific SQLAlchemy dependencies
Important: `pyarrow>=10.0.0` is required for the LOAD DATA Parquet functionality. Install it via `make dev-setup`, `make install-deps-dev`, or `make install-deps-test`.

For detailed testing instructions, environment setup, and troubleshooting, see TESTING.md.
Full Documentation on ReadTheDocs ⭐
Complete documentation with examples, API reference, and production best practices.
Getting Started:
- Quick Start: Get started quickly with common use cases
- Installation Guide: Installation and setup instructions
- Configuration Guide: Connection and client configuration
Core Features:
- Vector Guide: Vector search, indexing, and ⭐ IVF health monitoring
- Fulltext Guide: Fulltext search and indexing
- ORM Guide: SQLAlchemy ORM integration and patterns
- Metadata Guide: Table metadata analysis and statistics
Advanced Features:
- Snapshot & Restore: Snapshot and PITR operations
- Clone Guide: Database and table cloning
- Account Guide: User and role management
- Pub/Sub Guide: Publication and subscription operations
- MoCTL Guide: MoCTL integration
Production Guide:
- Best Practices: Production-ready patterns using SDK APIs
- Connection Hooks: Pre/post connection hooks
Reference:
- API Reference: Complete API reference
- Examples: Comprehensive usage examples
- Testing Guide: Testing instructions and environment setup
Generate HTML documentation:
```bash
make docs       # Generate HTML documentation
make docs-serv  # Start documentation server at http://localhost:8000
```

We welcome contributions!
- Fork the repository
- Set up development environment
```bash
git clone <your-fork-url>
cd matrixone/clients/python

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On macOS/Linux

# Install development dependencies
make dev-setup
```
- Create a feature branch
```bash
git checkout -b feature/your-feature-name
```
- Make your changes
- Add tests for new functionality
- Run pre-commit checks
```bash
make pre-commit  # format + lint + test
```
- Submit a pull request
```bash
# Always work in virtual environment
source venv/bin/activate

# Before making changes
make test           # Ensure all tests pass

# After making changes
make format         # Format code
make lint           # Check code quality
make test           # Run tests
make examples       # Test examples

# Before submitting PR
make release-check  # Complete pre-release checks
```

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- 📧 Email: contact@matrixorigin.cn
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📖 Documentation: Full documentation on ReadTheDocs (linked above)
See CHANGELOG.md for version history and changes.
MatrixOne Python SDK - Making MatrixOne database operations simple and powerful in Python.