ecommerce GCP Streaming pipeline ― Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipeline ― Cloud Storage, Dataproc, PySpark, Cloud Spanner and Tableau
Updated Mar 9, 2022 (Python)
An end-to-end machine learning project predicting DoorDash delivery durations, utilizing MLOps principles and best practices.
Sanitized ML-based risk scoring pipeline for a Tier-1 UK Retail Bank (GCP + BigQuery ML). Includes Batch/ETL ingestion, feature engineering, BQML training, scoring workflows, governance, lineage, and runbooks. No client code/data.
Docs-only case study of a compliance & anomaly detection platform on Azure + Databricks (Streaming ETL + Batch ELT + ML).
Event-driven data pipeline prototype combining batch and streaming processing with Kafka, Redis, and PostgreSQL — built for learning and realistic data-engineering demos using NYC Taxi data.
Docs-only case study – Compliance Reporting data platform on Azure for a Big-4 Audit & Consulting Firm (BFSI, healthcare-style datasets) using Streaming Pipeline (ETL) + Batch Pipeline (ELT) with Snowflake, Synapse, ADF, Power BI, ML risk scoring, DQ, governance, and lineage.
Data Engineer Training Using Google Cloud Platform