In this project I stream data and classify crimes using Spark. The dataset contains incidents derived from the SFPD Crime Incident Reporting system and covers 1/1/2003 to 5/13/2015. I also analyze crime incidents across different areas and with respect to other parameters.

spark-streaming spark-mllib spark-ml

Updated Dec 21, 2021
Python

desaiankitb / spark-mllib

Apache Spark is one of the most widely used and supported open-source tools for machine learning and big data. This repo shows how to work with this platform for machine learning, focusing on MLlib, the Spark machine learning library, which provides tools for data scientists and analysts who would rather find solutions to b…

apache-spark python3 spark-mllib spark-ml

Updated May 3, 2018
Python

JinbinYu / MLwithSpark

A wrapper around Spark ML that exposes its functionality through API calls.

mysql python flask spark-mllib

Updated Mar 24, 2017
Python

NupurShukla / Movie-Recommendation-System

data-mining map-reduce spark-mllib movie-recommendation-system inf553 local-sensitivity-hashing

Updated Aug 16, 2018
Python

Paranoid-kid / Movie-Recommender-System

A movie recommender system using user-based collaborative filtering algorithm.

python flask machine-learning spark telegram-bot recommender-system spark-mllib

Updated Apr 25, 2019
Python
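User-based collaborative filtering predicts a user's rating for an unseen item from the ratings of similar users. The following is a minimal plain-Python sketch of that general idea, assuming cosine similarity over co-rated items; it does not use Spark and is not taken from the repository, and the toy `ratings` data and the `cosine_similarity`/`predict` helpers are hypothetical:

```python
from math import sqrt

# Hypothetical toy data: user -> {movie_id: rating}
ratings = {
    "alice": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
    "bob": {"m1": 4.0, "m2": 3.0, "m3": 5.0, "m4": 4.0},
    "carol": {"m1": 1.0, "m2": 5.0, "m4": 2.0},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users, computed over co-rated items only."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item
    (or when all of their similarities are 0).
    """
    num = 0.0
    den = 0.0
    for other, their_ratings in ratings.items():
        # Skip the target user and users who never rated this item
        if other == user or item not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        num += sim * their_ratings[item]
        den += abs(sim)
    return num / den if den else None


# Estimate how alice would rate m4, which only bob and carol have rated
print(round(predict("alice", "m4", ratings), 2))
```

Cosine similarity on raw ratings is one common choice; mean-centered (Pearson) similarity is a frequent alternative when users rate on different scales. Spark MLlib's built-in recommender, `pyspark.ml.recommendation.ALS`, uses matrix factorization rather than user-to-user similarity.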

abouslimi / spark-ml-product-recommendation

Real-time product recommendation system built using Apache Spark, Kafka, and Python.

python docker kafka big-data spark docker-compose bigdata python3 spark-streaming kafka-consumer kafka-producer spark-sql spark-mllib product-recommendation product-recommender-system

Updated Dec 21, 2024
Python

OmarAlhaz / E-Commerce-Sales-Forecasting-with-PySpark

Forecasting e-commerce product demand using PySpark MLlib. Includes data preprocessing, feature engineering, Random Forest modeling, and evaluation via Mean Absolute Error.

machine-learning ecommerce big-data time-series random-forest pyspark data-engineering feature-engineering demand-forecasting spark-mllib

Updated Oct 16, 2025
Python
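The evaluation step named above can be illustrated with a minimal plain-Python sketch of Mean Absolute Error, the average absolute difference between actual and forecast values. The `mean_absolute_error` helper and the demand numbers are hypothetical and not taken from the repository:

```python
def mean_absolute_error(y_true, y_pred):
    """MAE = (1/n) * sum(|y_i - yhat_i|) over n paired observations."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual demand and model forecasts for four periods
actual = [120, 95, 130, 80]
forecast = [110, 100, 128, 90]
print(mean_absolute_error(actual, forecast))
```

Because MAE is expressed in the same units as the target (units sold, for example), it is easy to interpret. In PySpark, `RegressionEvaluator(metricName="mae")` from `pyspark.ml.evaluation` computes the same metric on a DataFrame of predictions.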

Mohitsai / epidemic-engine

Streaming ETL data pipeline for health event monitoring and predictive analytics using Kafka, Airflow, Docker, Hadoop and Spark ML.

spark apache-spark apache-kafka health-data spark-mllib etl-pipeline healthcare-analysis healthcare-data

Updated Mar 13, 2025
Python

corneliouzbett / Master-Apache-Spark

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph p…

python spark python3 pyspark spark-streaming spark-sql spark-mllib spark-ml

Updated Mar 17, 2019
Python

SayamAlt / Amazon-Products-API-ETL-and-ML-pipeline

In this project, I've created an end-to-end ETL pipeline and subsequently developed a machine learning model to predict the price of Amazon products based on several product-related features.

machine-learning apache-spark linear-regression feature-engineering regression-models data-ingestion spark-sql extract-transform-load spark-mllib azure-data-factory etl-pipeline azure-databricks delta-lake model-training-and-evaluation azure-data-lake-storage-gen2

Updated Nov 26, 2024
Python

pathak-ashutosh / sentiment-analysis-yelp-reviews

Perform sentiment analysis on the Yelp dataset with Apache Spark

natural-language-processing big-data apache-spark hadoop sentiment-analysis data-visualization pyspark data-engineering hdfs data-pipeline spark-sql spark-mllib spark-nlp

Updated Aug 7, 2024
Python

bassrehab / zerofish-imaging

Using the Thunder Library for Image Processing with Spark MLlib

spark pyspark thunder spark-mllib-library spark-mllib

Updated Mar 5, 2017
Python

SayamAlt / Formula-1-Data-Ingestion-Transformation---ETL-Pipeline

This project demonstrates a complete ETL pipeline for Formula 1 racing data using Azure Databricks, Delta Lake, and Azure Data Factory. It covers data ingestion, transformation with PySpark and Spark SQL, data governance with Unity Catalog, and visualization through Power BI. Designed to showcase real-world data engineering workflows in Azure.

data-transformation data-engineering spark-streaming data-ingestion spark-sql spark-mllib microsoft-azure databricks-notebooks azure-databricks delta-lake workflow-orchestration etl-pipelines azure-data-lake-storage-gen2

Updated Nov 14, 2024
Python

berksudan / PySpark-Auto-Clustering

Implemented an auto-clustering tool that finds a suitable seed and number of clusters. Cluster-count selection methods: Silhouette, Elbow. Clustering algorithms: k-Means, Bisecting k-Means, Gaussian Mixture. The module includes micro-macro pivoting and dashboards displaying the radius, centroids, and inertia of clusters. Used: Python, PySpark, Matplotlib, Spark MLlib.

spark clustering pyspark kmeans-clustering spark-mllib elbow-method gaussian-mixture clustering-analysis bisecting-kmeans silhouette-score

Updated Feb 4, 2025
Python
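The silhouette criterion named above can be sketched in plain Python. This is a hypothetical illustration (the `silhouette_score` function and sample points are not from the repository, and no Spark is used), assuming Euclidean distance. Running clustering for several candidate k values and keeping the k with the highest mean silhouette is one way to pick the number of clusters:

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient using Euclidean distance.

    For each point: a = mean distance to the other points in its own cluster,
    b = lowest mean distance to the points of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters get s = 0.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        # Sum and count distances from point i to every cluster
        sums, counts = {}, {}
        for j, q in enumerate(points):
            if i == j:
                continue
            c = labels[j]
            sums[c] = sums.get(c, 0.0) + dist(p, q)
            counts[c] = counts.get(c, 0) + 1
        own = labels[i]
        if counts.get(own, 0) == 0:
            continue  # singleton cluster contributes s = 0
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in sums if c != own)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette_score(points, [0, 0, 1, 1]))  # well-separated labeling
print(silhouette_score(points, [0, 1, 0, 1]))  # mixed-up labeling
```

Scores near 1 indicate compact, well-separated clusters and negative scores indicate points closer to another cluster than to their own. In PySpark, `pyspark.ml.evaluation.ClusteringEvaluator` computes a silhouette score on a predictions DataFrame; its default distance measure is squared Euclidean, so its values will generally differ from this Euclidean sketch.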

lkptl / Yelp_Business_Success_Rate_Prediction_Based_On_Reviews

This repo contains code for a restaurant recommendation system that recommends businesses to users based on their rating values.

python json mongodb regression matrix-factorization recommendation-engine spark-mllib spark-ml

Updated Jan 13, 2020
Python

spark-mllib

Here are 36 public repositories matching this topic...

shre1000 / Sentiment-Analysis-of-Twitter-Data-using-pySpark-and-Live-Graphs

giuseppegambino / Italian-Sentiment-Analysis-with-Spark

DavideNardone / TwitterSentimentAnalysis

TrainingByPackt / Big-Data-Processing-with-Apache-Spark-eLearning

cbozan / graduation-project

MHassaanButt / Crime-Spark-ML
