MCP (Model Context Protocol) agent that generates production-ready Apollo GraphQL servers from BigQuery SQL queries with Dataplex lineage tracking.
- 🚀 Auto-generate Apollo GraphQL Servers from BigQuery queries
- 📊 BigQuery Integration with type inference from SQL schemas
- 📝 Dataplex Lineage Tracking for end-to-end data governance
- 🐳 Docker Support for containerized deployments
- 🧪 Test Client Generation for API validation
- 🔌 MCP Protocol for seamless integration with Cursor and other AI assistants
```
1. Input          → 2. Schema Inference → 3. Code Generation → 4. Validation → 5. Output
   BigQuery SQL        Dry-run Analysis      Jinja2 Templates     Multi-level      GCS/Local
   Queries             Type Mapping          Apollo Server v4     Checks           Files
```
Detailed Steps:
- Input: You provide BigQuery SQL queries via MCP tool
- Schema Inference: Agent runs BigQuery dry-run to infer result types
- Code Generation: Generates complete Apollo Server project with templates
- Validation (optional): Validates generated code at selected level
- Output: Writes validated code to GCS or local filesystem
- Deployment: You run the generated Node.js application
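The code-generation step above can be sketched in miniature. The real agent infers result fields from a BigQuery dry run and renders Jinja2 templates; the helper and field list below are hypothetical stand-ins using plain string formatting, just to show the shape of the SQL-schema-to-SDL step:

```python
# Illustrative sketch of steps 2-3: turning an inferred result schema into
# GraphQL SDL. The field list and helper are hypothetical; the agent derives
# fields from a BigQuery dry run and renders Jinja2 templates.

BQ_TO_GRAPHQL = {"STRING": "String", "INT64": "Int", "FLOAT64": "Float", "BOOL": "Boolean"}

def render_type(type_name: str, fields: list[tuple[str, str]]) -> str:
    """Render a GraphQL object type from (field_name, bigquery_type) pairs."""
    lines = [f"type {type_name} {{"]
    for name, bq_type in fields:
        lines.append(f"  {name}: {BQ_TO_GRAPHQL.get(bq_type, 'String')}")
    lines.append("}")
    return "\n".join(lines)

# Fields as a dry run over "SELECT item, SUM(sales) AS total ..." might infer them
sdl = render_type("TrendingItemsRow", [("item", "STRING"), ("total", "INT64")])
print(sdl)
```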
Choose validation thoroughness based on your needs:
| Level | Time | Coverage | Checks | Use Case |
|---|---|---|---|---|
| Quick | ~1s | 80% | GraphQL syntax, SQL dry-run, file structure | Rapid iteration, development |
| Standard | ~10s | 95% | Quick + TypeScript compilation, imports | Default, balanced approach |
| Full | ~60s | 99% | Standard + Docker build, server startup, health check | Pre-production, CI/CD |
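The levels in the table are cumulative: each one runs everything the previous level does plus its own extra checks. A minimal sketch of that relationship (check names mirror the table; the dispatch function is illustrative, not the agent's actual implementation):

```python
# Hedged sketch: each validation level includes all checks from the levels
# below it. Check names follow the table above; this is not the agent's code.

CHECKS = {
    "quick": ["graphql_syntax", "sql_dry_run", "file_structure"],
    "standard": ["typescript_compile", "import_resolution"],
    "full": ["docker_build", "server_startup", "health_check"],
}
ORDER = ["quick", "standard", "full"]

def checks_for(level: str) -> list[str]:
    """Return every check a given validation level performs."""
    if level not in ORDER:
        raise ValueError(f"unknown level: {level}")
    idx = ORDER.index(level)
    return [c for lvl in ORDER[: idx + 1] for c in CHECKS[lvl]]

print(checks_for("standard"))
```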
The agent generates a complete TypeScript/Node.js project with:
- Apollo Server v4 - GraphQL API server with plugins and context
- Type-safe resolvers - Auto-generated from BigQuery schemas
- Dataplex integration - Runtime lineage event tracking
- Error handling - Production-safe error formatting
- Docker configuration - Multi-stage builds for production
- Test suite - Integration tests and test client
- Python 3.10-3.12
- Poetry (Python dependency management)
- Google Cloud account with BigQuery access
```bash
# Clone the repository
git clone https://github.com/opendedup/data-graphql-agent.git
cd data-graphql-agent

# Install dependencies
poetry install

# Configure environment variables
cp .env.example .env
# Edit .env with your GCP credentials
```
Create a `.env` file or set environment variables:
```bash
# GCP Configuration
GCP_PROJECT_ID=your-project-id
GCP_LOCATION=us-central1

# Output Configuration
GRAPHQL_OUTPUT_DIR=gs://your-bucket/graphql-server
# Or local path: GRAPHQL_OUTPUT_DIR=/path/to/output

# MCP Server Configuration
MCP_TRANSPORT=stdio  # or http
MCP_HOST=0.0.0.0
MCP_PORT=8080
```
Configure in Cursor's `mcp.json`:
```json
{
  "mcpServers": {
    "data-graphql-agent": {
      "command": "poetry",
      "args": ["run", "python", "-m", "data_graphql_agent.mcp"],
      "cwd": "/path/to/data-graphql-agent",
      "env": {
        "GCP_PROJECT_ID": "your-project",
        "GRAPHQL_OUTPUT_DIR": "gs://your-bucket/graphql-server"
      }
    }
  }
}
```
To use the agent as a Python library:
```python
from data_graphql_agent.generation import ProjectGenerator
from data_graphql_agent.clients import StorageClient
from data_graphql_agent.models import QueryInput

# Define queries
queries = [
    QueryInput(
        query_name="trendingItems",
        sql="SELECT item, SUM(sales) as total FROM `project.dataset.sales` GROUP BY item",
        source_tables=["project.dataset.sales"],
    )
]

# Generate project
generator = ProjectGenerator(project_id="your-project")
files = generator.generate_project("my-project", queries)

# Write to storage
storage = StorageClient(project_id="your-project")
manifests = storage.write_files("gs://bucket/output", files)
```
To run the MCP server over HTTP:
```bash
# Set transport to HTTP
export MCP_TRANSPORT=http
export MCP_PORT=8080

# Start server
poetry run python -m data_graphql_agent.mcp
```
Then call tools via HTTP:
```bash
curl -X POST http://localhost:8080/mcp/call-tool \
  -H "Content-Type: application/json" \
  -d '{
    "name": "generate_graphql_api",
    "arguments": {
      "queries": [...],
      "project_name": "my-project"
    }
  }'
```
Generates a complete Apollo GraphQL Server project with validation.
Input:
- `queries`: Array of query objects with `queryName`, `sql`, and `source_tables`
- `project_name`: Project name for lineage tracking
- `output_path`: Optional output location (defaults to `GRAPHQL_OUTPUT_DIR`)
- `validation_level`: Optional validation thoroughness: `"quick"`, `"standard"` (default), or `"full"`
- `auto_fix`: Optional boolean to attempt automatic error fixes (default: `false`)
Output:
- Complete TypeScript/Node.js project
- Docker configuration
- Test client
- Integration tests
- Validation results with checks passed and warnings
Example with Validation:
```python
result = await handle_generate_graphql_api({
    "queries": [
        {
            "queryName": "salesByRegion",
            "sql": "SELECT region, SUM(amount) as total FROM `project.dataset.sales` GROUP BY region",
            "source_tables": ["project.dataset.sales"]
        }
    ],
    "project_name": "analytics-api",
    "output_path": "./output",
    "validation_level": "standard",  # Balanced speed and coverage
    "auto_fix": False
})
```
Success Response:
```json
{
  "success": true,
  "output_path": "./output",
  "files_generated": [...],
  "message": "Successfully generated and validated Apollo GraphQL Server with 1 queries. Generated 15 files at ./output. Validation: 5 checks passed in 8.2s"
}
```
Validation Failure Response:
```json
{
  "success": false,
  "output_path": "./output",
  "files_generated": [],
  "message": "Code validation failed at standard level",
  "error": "Validation errors: Invalid SQL in query 'salesByRegion': Table not found; TypeScript compilation failed"
}
```
Validates a GraphQL schema file.
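A caller can branch on the `success` field before touching the output. A minimal sketch, using the response shape shown above (the payload string and handling logic are illustrative, not produced by the agent):

```python
import json

# Hedged sketch: inspect the tool's JSON response before acting on it.
# The shape follows the success/failure examples above; the payload here
# is a made-up illustration.

raw = '{"success": false, "files_generated": [], "message": "Code validation failed at standard level", "error": "Validation errors: ..."}'
response = json.loads(raw)

if response["success"]:
    print("generated files:", response.get("files_generated", []))
else:
    # On failure no files are written; surface the validation errors instead.
    print("generation failed:", response.get("error", response["message"]))
```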
Input:
schema_path: Path to schema file
Output:
- Validation results with errors and warnings
```
graphql-server/
├── src/
│   ├── server.ts       # Main Apollo Server
│   ├── typeDefs.ts     # GraphQL schema
│   ├── resolvers.ts    # Query resolvers
│   └── lineage.ts      # Dataplex integration
├── test-client/        # Test client
├── tests/              # Integration tests
├── package.json
├── tsconfig.json
├── Dockerfile
└── docker-compose.yml
```

```bash
cd output/graphql-server

# Install dependencies
npm install

# Development mode
npm run dev

# Production build
npm run build
npm start

# Docker
docker-compose up --build
```

```bash
# Run all tests
poetry run pytest

# Run unit tests only
poetry run pytest tests/unit

# Run with coverage
poetry run pytest --cov=data_graphql_agent
```

```bash
# Format with Black
poetry run black src tests

# Lint with Ruff
poetry run ruff check src tests
```

The agent automatically maps BigQuery types to GraphQL types:
| BigQuery Type | GraphQL Type |
|---|---|
| STRING | String |
| INT64 | Int |
| FLOAT64 | Float |
| BOOL | Boolean |
| TIMESTAMP/DATE | String (ISO 8601) |
| STRUCT | Custom Object Type |
| ARRAY | [Type] |
Nested structures (STRUCTs and ARRAYs) are fully supported with automatic type generation.
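The table's mapping, including the recursive ARRAY and STRUCT cases, can be sketched as follows. The type-descriptor shape and function are hypothetical; the agent derives the real structure from BigQuery dry-run metadata:

```python
# Hedged sketch of the type mapping above, including nested STRUCTs and
# ARRAYs. The descriptor format is invented for illustration: either a
# scalar name, ("ARRAY", inner), or ("STRUCT", {field: descriptor, ...}).

SCALARS = {
    "STRING": "String", "INT64": "Int", "FLOAT64": "Float",
    "BOOL": "Boolean", "TIMESTAMP": "String", "DATE": "String",
}

def to_graphql(bq_type, type_name="Nested"):
    """Map a BigQuery type descriptor to a GraphQL type string."""
    if isinstance(bq_type, str):
        return SCALARS[bq_type]
    kind, inner = bq_type
    if kind == "ARRAY":
        return f"[{to_graphql(inner, type_name)}]"
    if kind == "STRUCT":
        # The real generator would also emit a named object type definition;
        # here we just return the generated type's name.
        return type_name
    raise ValueError(f"unsupported kind: {kind}")

print(to_graphql(("ARRAY", "INT64")))  # lists wrap the inner GraphQL type
print(to_graphql("TIMESTAMP"))         # serialized as an ISO 8601 string
```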
- Catch errors early - Invalid SQL, type mismatches, and syntax errors detected before deployment
- Faster iteration - No manual debugging of generated code
- Confidence - Know your code will work before running
- Cost savings - Avoid wasted GCS writes and Docker builds for broken code
- CI/CD friendly - Use `full` validation in pipelines for guaranteed deployments
Quick Validation (~1s)
- ✅ Rapid prototyping and experimentation
- ✅ Iterating on SQL queries
- ✅ Testing query-to-schema mappings
- ❌ Not for production deployments
Standard Validation (~10s) - Recommended Default
- ✅ Normal development workflow
- ✅ Before committing to version control
- ✅ Balanced speed and thoroughness
- ✅ Most common use case
Full Validation (~60s)
- ✅ Pre-production deployments
- ✅ CI/CD pipelines
- ✅ Critical production updates
- ✅ When Docker compatibility is essential
- ❌ Too slow for rapid iteration
The generated GraphQL server automatically tracks data lineage in Google Cloud Dataplex:
- Process: Each resolver is registered as a process
- Run: Each query execution creates a run (with unique request ID)
- Lineage Events: Link BigQuery sources to BI report targets
- Cleanup: Graceful shutdown removes lineage processes
Lineage operations are asynchronous (fire-and-forget) and don't block API responses.
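The fire-and-forget pattern can be sketched as follows (in Python for illustration; the generated server is TypeScript). The event names and timings are hypothetical; the point is that the lineage call is scheduled but never awaited on the request path:

```python
import asyncio

# Hedged sketch of fire-and-forget lineage emission: the event task is
# scheduled in the background so the resolver returns immediately.
# Names and delays below are illustrative, not the generated server's code.

events: list[str] = []
background: set[asyncio.Task] = set()  # keep strong refs so tasks aren't GC'd

async def emit_lineage_event(run_id: str) -> None:
    await asyncio.sleep(0.01)  # stand-in for the Dataplex API call
    events.append(run_id)

async def resolve_query(run_id: str) -> str:
    # Schedule lineage emission without awaiting it.
    task = asyncio.create_task(emit_lineage_event(run_id))
    background.add(task)
    task.add_done_callback(background.discard)
    return "query-result"  # returned immediately, not blocked on lineage

async def main() -> str:
    result = await resolve_query("run-123")
    await asyncio.sleep(0.05)  # give the background task time to finish
    return result

print(asyncio.run(main()))
```

Keeping a strong reference to the scheduled task (the `background` set) matters in Python, since the event loop holds only weak references to tasks.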
Apache 2.0 - See LICENSE for details
Contributions are welcome! Please submit pull requests or open issues for bugs and feature requests.