ETL pipeline using PySpark (Spark - Python)
- Updated Apr 4, 2020 - CSS
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Automated Real-Time Indian Railway Twitter Complaint Administration System. It uses Apache Kafka, Spark, MySQL, and PHP. The full project was deployed on AWS EC2 and RDS.
Master's thesis on Big Data
The easiest way to figure out how to connect Scala Play and Apache Spark
A dynamic bus ticket booking system using PHP, Apache, and MySQL. Users can search routes, choose seats, make payments, and download tickets. Admins can manage buses, schedules, and special trips.
Data Engineering: Speech-to-text data collection with Kafka, Airflow, and Spark
Repository for my software development and data science portfolio.
Repository for the contents of Technical Blog
Personal blog about Data Engineering
Portfolio Website of Siby Abin Thomas - Senior Data Engineer
Distributed Systems group project
Apache Spark MLlib example for the seminar 'AI with Scala'
Created by Matei Zaharia
Released May 26, 2014