WarFox

Stars

⭐ Data

Data processing and engineering
34 repositories

re_data - fix data issues before your users & CEO would discover them 😊

HTML 1,569 124 Updated Apr 30, 2024

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Python 20,925 5,098 Updated Mar 22, 2026

The Metadata Platform for your Data and AI Stack

Java 11,694 3,402 Updated Mar 22, 2026

AWS Glue code samples

Python 1,535 834 Updated Nov 5, 2025

Immutable database and Datalog query engine for Clojure, ClojureScript and JS

Clojure 5,727 316 Updated Oct 11, 2025

The easy-to-use open source Business Intelligence and Embedded Analytics tool that lets everyone work with data 📊

Clojure 46,505 6,317 Updated Mar 22, 2026

FADI - Ingest, store and analyse big data flows

JavaScript 46 15 Updated Feb 12, 2024

Data Lake as Code, featuring ChEMBL and OpenTargets

TypeScript 173 46 Updated Nov 20, 2023

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

Java 1,439 172 Updated Mar 21, 2026

This repository has moved into https://github.com/dbt-labs/dbt-adapters

Python 444 240 Updated Jul 16, 2025

Data Contracts engine for the modern data stack. https://www.soda.io

Python 2,311 260 Updated Mar 20, 2026

🔥 🔥 🔥 A Free & Self-hostable Airtable Alternative

TypeScript 62,537 4,676 Updated Mar 21, 2026

AI and Machine Learning with Kubeflow, Amazon EKS, and SageMaker

Jupyter Notebook 3,432 1,089 Updated Jul 31, 2024

Apache Superset is a Data Visualization and Data Exploration Platform

TypeScript 71,050 16,831 Updated Mar 22, 2026

A Clojure dataframe library that runs on Spark

Clojure 293 27 Updated Nov 28, 2023

Dremio - the missing link in modern data

Java 1,475 460 Updated Sep 26, 2025

The live data layer for apps and AI agents. Create up-to-the-second views into your business, just using SQL

Rust 6,253 498 Updated Mar 22, 2026

An implementation of differential dataflow using timely dataflow on Rust.

Rust 2,924 204 Updated Mar 21, 2026

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java 12,655 3,541 Updated Mar 21, 2026

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data…

Python 4,986 2,089 Updated Mar 20, 2026

A modular implementation of timely dataflow in Rust

Rust 3,589 293 Updated Mar 21, 2026

Self-serve BI to 10x your data team ⚡️

TypeScript 5,653 687 Updated Mar 21, 2026

This is a collection of Amazon CDK projects to show how to directly ingest streaming data from Amazon Managed Service for Apache Kafka (MSK) and MSK Serverless into Apache Iceberg table in S3 with …

Python 16 Updated Sep 10, 2024

Event Driven Orchestration & Scheduling Platform for Mission Critical Applications

Java 26,574 2,534 Updated Mar 20, 2026

🦀 event stream processing for developers to collect and transform data in motion to power responsive data intensive applications.

Rust 5,183 530 Updated Mar 20, 2026

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.

Go 39,939 6,158 Updated Mar 22, 2026

do more with dbt. dbt-fal helps you run Python alongside dbt, so you can send Slack alerts, detect anomalies and build machine learning models.

Python 857 76 Updated Apr 5, 2024