Skip to content

ravishan16/converSQL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

37 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

New Documentation

We have introduced new documentation to help you understand the modular architecture and migration path.

Please refer to the following documents:

converSQL logo

converSQL

CI Code Coverage License: MIT Built with Streamlit

Transform natural language questions into production-ready SQL with ontological context and warm, human-centered design.

Why converSQL

The challenge

  • Business teams wait on backlogs of custom SQL while analysts juggle endless report tweaks.
  • Complex domains like mortgage analytics demand institutional knowledge that traditional BI tools cannot encode.
  • Open data is abundant, but combining it with AI safely and accurately remains tedious.

Our approach

  • Ontology-first modeling captures relationships, risk logic, and business vocabulary once and reuses it everywhere.
  • Adapter-based AI orchestration lets you swap Claude, Bedrock, Gemini, or local engines without touching the UI.
  • Streamlit experience design bridges analysts and executives with curated prompts, cached schemas, and explainable results.

Flagship implementation: Single Family Loan Analytics

The reference app ships with 9M+ rows of Fannie Mae loan performance data. Ask the AI for โ€œhigh-risk California loans under 620 credit scoreโ€ and get DuckDB-ready SQL plus rich metrics at a glance.

Spotlight features

  • ๐Ÿง  110+ fields grouped into 15 ontology domains with risk heuristics baked into prompts.
  • โšก CSV โžœ Parquet pipeline with enforced types, 10ร— compression, and predicate pushdown via DuckDB.
  • ๐Ÿ” Google OAuth guardrails with optional Cloudflare D1 logging.
  • ๐Ÿค– Multi-provider AI adapters (Bedrock, Claude, Gemini) with graceful fallbacks and prompt caching.
SELECT LOAN_ID, STATE, CSCORE_B, OLTV, DTI, DLQ_STATUS, CURRENT_UPB FROM data WHERE STATE = 'CA' AND CSCORE_B < 620 AND CSCORE_B IS NOT NULL ORDER BY CSCORE_B ASC, OLTV DESC LIMIT 20;

Architecture at a glance

Streamlit UI (app.py) โ””โ”€ Core orchestration (src/core.py) โ”œโ”€ DuckDB execution โ”œโ”€ Cached schema + ontology context โ””โ”€ Data sync checks (scripts/sync_data.py) โ””โ”€ AI service (src/ai_service.py) โ”œโ”€ Adapter registry (src/ai_engines/*) โ”œโ”€ Prompt construction with risk framework โ””โ”€ Clean SQL post-processing 

๐Ÿ—๏ธ Architecture

converSQL follows a clean, layered architecture designed for extensibility:

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ Application Layer โ”‚ โ”‚ (Streamlit UI โ€ข Query Builder โ€ข Ontology Explorer) โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ AI Engine Layer โ”‚ โ”‚ (Adapter Pattern: Bedrock โ€ข Claude โ€ข Gemini โ€ข Ollama) โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ Intelligence Layer โ”‚ โ”‚ (Ontology โ€ข Schema Context โ€ข Business Rules) โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ Data Layer โ”‚ โ”‚ (Parquet Files โ€ข DuckDB โ€ข R2 Storage โ€ข Query Execution) โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ 

The Data Engineering Pipeline

Our showcase implementation demonstrates a complete data engineering workflow:

  1. Ingestion: Fannie Mae's pipe-separated loan performance files
  2. Transformation: Schema enforcement with explicit data types (VARCHAR, Float, Int16, etc.)
  3. Storage: Parquet format with SNAPPY compression (10x size reduction)
  4. Performance: DuckDB for blazing-fast analytical queries
  5. Ontology: Structured metadata linking business concepts to database schema

๐Ÿ“„ Learn more about the data pipeline โ†’

Brand palette

Token Hex Description
--color-background #FAF6F0 Ivory linen canvas across the app
--color-background-alt #FDFDFD Porcelain surfaces for cards and modals
--color-text-primary #3A3A3A Charcoal Plum headings
--color-text-secondary #7C6F64 Warm Taupe body copy
--color-accent-primary #DDBEA9 Soft Clay primary accent
--color-accent-primary-darker #B45F4D Terracotta hover and emphasis
--color-border-light #E4C590 Gold Sand borders, dividers, and tags

Quick start

  1. Install prerequisites
    git clone https://github.com/ravishan16/converSQL.git cd converSQL pip install -r requirements.txt
  2. Configure environment
    cp .env.example .env # Enable one AI block (CLAUDE_API_KEY, AWS_* for Bedrock, or GEMINI_API_KEY) # Provide Google OAuth or set ENABLE_AUTH=false for local dev
  3. Launch the app
    streamlit run app.py

Key documentation

  • Architecture โ€“ layered design and component interactions.
  • Data pipeline โ€“ ingest, transformation, and Parquet strategy.
  • AI engines โ€“ adapter contracts and extension guides.
  • Environment setup โ€“ required variables for auth, data, and providers.

Developer workflow

  • make setup โ€“ clean install + cache purge.
  • make test-unit / make test โ€“ pytest with coverage that mirrors CI.
  • make format and make lint โ€“ Black (120 cols), isort, flake8, mypy.
  • Cached helpers such as scan_parquet_files() trigger scripts/sync_data.py when Parquet is missingโ€”keep data/processed/ warm during tests.

Contributing

  1. Fork and branch: git checkout -b feature/my-update.
  2. Run formatting + tests before committing.
  3. Open a PR describing the change, provider credentials (if applicable), and test strategy.

See CONTRIBUTING.md for templates, AI adapter expectations, and review checklists.

Broader use cases

  • Financial services โ€“ credit risk, portfolio concentrations, regulatory stress tests.
  • Healthcare โ€“ patient outcomes, clinical trial cohorts, claims analytics.
  • E-commerce โ€“ customer segments, inventory velocity, supply chain exceptions.
  • Any ontology-driven domain โ€“ define your schema metadata and let converSQL converse.

Roadmap snapshot

  • โœ… Multi-AI adapter support with prompt caching and fallbacks.
  • โœ… Mortgage analytics reference implementation.
  • ๐Ÿ”„ Ollama adapter and enhanced SQL validation.
  • ๐Ÿ”ฎ Upcoming: multi-table joins, query explanations, historical learning, self-serve ontology editor.

License

Released under the MIT License.

Acknowledgments

  • Fannie Mae for the Single Family Loan Performance dataset.
  • The DuckDB, Streamlit, and Anthropic/AWS/Google teams for exceptional tooling.
  • The converSQL community for ideas, issues, and adapters.

Stay connected

  • โญ Star the repo to follow releases.
  • ๐Ÿ’ฌ Join discussions or open issues at github.com/ravishan16/converSQL/issues.
  • ๐Ÿ“จ Share what you buildโ€”data should feel conversational.

About

Transform natural language questions into production-ready SQL with ontological context and warm, human-centered design.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors