DataHelm is a data engineering framework focused on:
- source ingestion orchestration
- dbt transformation workflows
- notebook-based dashboard execution
- reusable provider connectors (SharePoint, GCS, S3, BigQuery)
- optional local-LLM analytics query scaffolding
- Config-driven ingestion using YAML in `config/api/`
- Dagster orchestration for jobs, schedules, and sensors
- dbt project execution through `analytics/dbt_runner.py` and dbt configs
- Dashboard generation with Dagstermill notebooks
- Reusable handlers/connectors for multiple external providers
- Optional NL-to-SQL module (`analytics/nl_query/`) for local Ollama-based analytics workflows
The repository follows layered responsibilities:
- `handlers/`: provider-specific source connectors and API handlers
- `ingestion/`: ingestion factory + native ingestion implementations
- `analytics/`: dbt, dashboard, and optional NL-query modules
- `dagster_op/`: orchestration objects (jobs, schedules, repository)
- `config/`: all runtime configuration (api, dbt, dashboard, analytics metadata)
- `tests/`: unit tests for handlers, ingestion, analytics, and scripts
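To make the `ingestion/` factory layer concrete, here is a minimal registry-based factory sketch. The names (`register_ingestor`, `run_ingestion`, the `"clashofclans"` source key) are illustrative assumptions, not the actual DataHelm API:

```python
# Illustrative ingestion-factory sketch -- names are assumptions,
# not the actual DataHelm API.
from typing import Callable, Dict

# Registry mapping a source name to a callable that performs ingestion.
_INGESTORS: Dict[str, Callable[[dict], str]] = {}

def register_ingestor(source: str):
    """Decorator that registers an ingestion implementation for a source."""
    def wrap(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        _INGESTORS[source] = fn
        return fn
    return wrap

@register_ingestor("clashofclans")
def ingest_clashofclans(config: dict) -> str:
    # A real implementation would call the provider handler and publish rows.
    return f"ingested {config.get('source_table', 'unknown')}"

def run_ingestion(source: str, config: dict) -> str:
    """Factory entry point: dispatch to the registered implementation."""
    try:
        ingest = _INGESTORS[source]
    except KeyError:
        raise ValueError(f"no ingestor registered for source: {source}")
    return ingest(config)
```

The registry keeps native ingestion implementations decoupled from orchestration code: Dagster jobs only need a source name and its config.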
```
config/
  api/
  dbt/
  dashboard/
  analytics/
analytics/
  dbt_projects/
  notebooks/
  nl_query/
dagster_op/
handlers/
  api/
  sharepoint/
  gcs/
  s3/
  bigquery/
ingestion/
tests/
scripts/
docs/
```

Requirements:

- Python 3.12+
- PostgreSQL (reachable from local environment)
- Optional: Docker, local Ollama, dbt CLI
```
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .
```

Create a `.env` file in the repository root with the required values, for example:

```
DB_HOST=${DB_HOST}
DB_PORT=${DB_PORT}
DB_USER=${DB_USER}
DB_PASSWORD=${DB_PASSWORD}
DB_NAME=${DB_NAME}
CLASHOFCLANS_API_TOKEN=${CLASHOFCLANS_API_TOKEN}
```

Start local Dagster development:

```
python scripts/run_dagster_dev.py
```

Useful option:
```
python scripts/run_dagster_dev.py --print-only
```

Configuration is split across `config/` subdirectories:

- `config/api/`: source-level extraction, publish targets, schedules, and column mapping. Example currently included: `CLASHOFCLANS_PLAYER_STATS`.
- `config/dbt/`: dbt units, selection/exclusion rules, vars, and schedules.
- `config/dashboard/`: notebook path, source table mapping, chart columns, and cadence.
- `config/analytics/`: dataset metadata for the isolated NL-to-SQL module.
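To make the shape of the YAML-driven ingestion configs described above concrete, a hypothetical `config/api/` source entry might look like the following. Every key name here is an assumption for illustration, not the actual schema:

```yaml
# Hypothetical source config sketch -- key names are illustrative only.
CLASHOFCLANS_PLAYER_STATS:
  extraction:
    endpoint: players
  publish:
    target_table: player_stats
  schedule:
    cron: "0 6 * * *"
  column_mapping:
    trophies: trophy_count
```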
The repository includes reusable connector classes under `handlers/`:

- `handlers/sharepoint/sharepoint.py` - Microsoft Graph auth + site/file access helpers
- `handlers/gcs/gcs.py` - upload/download/list/delete/signed-URL helpers
- `handlers/s3/s3.py` - upload/download/list/delete/presigned-URL helpers
- `handlers/bigquery/bigquery.py` - query, row fetch, dataframe load, schema helpers
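The GCS and S3 connectors share a similar helper surface (upload/download/list/delete). As a rough sketch of that shared shape, here is a local-filesystem stand-in; the class and method names are assumptions illustrating the pattern, not the real handler API:

```python
from pathlib import Path
from typing import List

class LocalStorageHandler:
    """Toy local-filesystem stand-in mirroring the upload/download/
    list/delete helper surface the storage connectors share
    (illustrative only, not the actual handler API)."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def upload(self, key: str, data: bytes) -> None:
        # A real GCS/S3 handler would write to a bucket object instead.
        (self.root / key).write_bytes(data)

    def download(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

    def list(self) -> List[str]:
        return sorted(p.name for p in self.root.iterdir() if p.is_file())

    def delete(self, key: str) -> None:
        (self.root / key).unlink()
```

Keeping the provider handlers behind a common surface like this is what lets ingestion code swap storage backends without changes.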
`analytics/nl_query/` is an isolated module for natural-language-to-SQL generation using local Ollama:
- semantic catalog loader
- SQL read-only safety guard
- Ollama client wrapper
- orchestration service
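The read-only safety guard component can be illustrated with a minimal sketch. The function name and the exact rejection rules are assumptions about what such a guard typically checks, not the module's actual implementation:

```python
import re

# Statement keywords that mutate state; a read-only guard rejects them.
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke)\b",
    re.IGNORECASE,
)

def is_read_only(sql: str) -> bool:
    """Return True only for single SELECT statements (illustrative sketch).

    Rejects multi-statement payloads and any statement containing a
    mutating keyword, so LLM-generated SQL can only read data.
    """
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement payloads
        return False
    if not stripped.lower().startswith("select"):
        return False
    return not _FORBIDDEN.search(stripped)
```

A guard like this is the last line of defense when SQL is produced by a model rather than a human, which is why the suite tests it separately.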
Run all tests:
```
.venv/bin/python -m pytest -q
```

The current suite covers:
- ingestion and handler behavior
- analytics factory and runner logic
- connector modules (SharePoint, GCS, S3, BigQuery)
- script behavior
- NL-query safety and service paths
- `dev`: integration branch
- `master`: release/production branch
Workflows:
- CI: tests on development and PR flows
- Docker Release: image build/publish on `master`
- Deploy Release: `workflow_run`/manual deployment orchestration
Container image is defined via Dockerfile.
Default runtime command starts the Dagster gRPC server:

```
python -m dagster api grpc -m dagster_op.repository
```

Deployment flow is workflow-based:
- production auto-path after successful Docker release
- manual staging/production dispatch path
- Contribution guide: `CONTRIBUTING.md`
- Code of conduct: `CODE_OF_CONDUCT.md`
- Security reporting: `SECURITY.md`
For complete, long-form project documentation (operations, architecture, and runbook-style details), see:
`docs/document.md`
