A containerized basketball data analytics platform that leverages multiple services to collect, process, store, and visualize basketball player and match data.
This project is designed to automate and manage basketball data workflows using Docker containers. The platform includes ETL pipelines, data storage, web scraping, and data visualization, all orchestrated and monitored using modern containerization and orchestration tools.
The project consists of the following Docker containers, each with a specific role:
- Airflow Container
  - Port Mapping: 8080:8080
  - Role: Manages and schedules ETL (Extract, Transform, Load) pipelines for data processing.
- Frontend Container
  - Port Mapping: 2425:5000
  - Role: Provides a web interface for users to analyze and visualize basketball player performance and match statistics.
- MinIO Container
  - Port Mapping:
    - Server: 9000:9000
    - UI: 9090:9090
  - Role: Object storage for player images and other static assets.
- MongoDB Container
  - Port Mapping:
    - Server: 27017:27017
    - Mongo Express: 8081:8081
  - Role: Stores shooting statistics for each match. Each document represents a different match.
- Postgres Container
  - Port Mapping:
    - Server: 5432:5432
    - pgAdmin 4: 5050:80
  - Role: Relational database for storing structured data about players, matches, and teams.
- Selenium Container
  - Port Mapping:
    - Selenium hub: 4444:4444
    - Publish port: 4442:4442
    - Subscribe port: 4443:4443
  - Role: Used for web scraping to extract basketball data from external sources.
- Monitoring Container
  - Tools: Prometheus and Grafana
  - Role: Monitors the health and performance of all containers and services.
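The port mappings above would translate into a `docker-compose.yml` along these lines. This is an illustrative sketch only: the service names, images, and build paths are assumptions, not copied from the project's actual compose file.

```yaml
# Illustrative fragment -- service names, images, and build paths are
# assumptions; consult the repository's docker-compose.yml for the real ones.
services:
  postgres_container:
    image: postgres:16
    ports:
      - "5432:5432"
    networks:
      - principal_network

  frontend:
    build: ./frontend
    ports:
      - "2425:5000"          # host port 2425 -> app listening on 5000
    networks:
      - principal_network

networks:
  principal_network:
    external: true           # created beforehand with `docker network create`
```

Marking the network `external: true` matches the setup flow below, where `principal_network` is created manually before the containers are started.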
- ETL Pipelines: Automated data processing workflows using Airflow.
- Data Storage: Combination of relational (Postgres) and NoSQL (MongoDB) databases for structured and semi-structured data.
- Object Storage: MinIO for storing player images and other files.
- Web Scraping: Selenium for extracting data from external websites.
- Data Visualization: Frontend interface using Taipy framework for users to analyze player and match data.
- Monitoring: Comprehensive monitoring using Prometheus and Grafana.
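To illustrate the ETL flow that Airflow orchestrates, here is a minimal stdlib-only sketch: it extracts raw shot rows, transforms them into the one-document-per-match shape MongoDB stores, and the load step would hand the documents to a MongoDB client. All field names here are illustrative assumptions; the real pipelines live in the repository's Airflow DAGs.

```python
# Minimal ETL sketch (stdlib only). Field names and the document shape are
# illustrative assumptions -- the real workflows are defined as Airflow DAGs.

def extract(raw_rows):
    """Extract step: in the real pipeline the rows would come from Selenium
    scraping; here we just filter already-fetched rows."""
    return [row for row in raw_rows if row.get("match_id") is not None]

def transform(rows):
    """Transform step: group shot rows into one document per match,
    mirroring the 'each document represents a match' storage model."""
    matches = {}
    for row in rows:
        doc = matches.setdefault(row["match_id"],
                                 {"match_id": row["match_id"], "shots": []})
        doc["shots"].append({"player": row["player"],
                             "made": row["made"],
                             "points": row["points"]})
    return list(matches.values())

def load(documents, collection=None):
    """Load step: with pymongo this would be collection.insert_many(documents);
    without a live MongoDB we simply return what would be written."""
    if collection is not None:
        collection.insert_many(documents)
    return documents

rows = [
    {"match_id": "M1", "player": "A", "made": True, "points": 3},
    {"match_id": "M1", "player": "B", "made": False, "points": 0},
    {"match_id": "M2", "player": "A", "made": True, "points": 2},
]
docs = load(transform(extract(rows)))
```

In the real platform each step would be a task in an Airflow DAG, scheduled and retried by the Airflow container.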
Prerequisites
- Docker installed on your system.
- Docker Compose installed for orchestrating multiple containers.
Clone the Repository
git clone https://github.com/ayllon99/basketball-project.git
cd basketball-project
Create a Docker Network
docker network create principal_network
Build and Run Containers
It is recommended to bring the containers up one at a time, starting with postgres_container.
docker-compose up -d
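The recommended start order can also be encoded in `docker-compose.yml` with `depends_on`, so a plain `docker-compose up` brings Postgres up before the services that use it. A sketch, assuming the service names shown here:

```yaml
# Illustrative fragment -- service names are assumptions.
services:
  airflow:
    depends_on:
      - postgres_container   # start Postgres before Airflow
  pgadmin:
    depends_on:
      - postgres_container
```

Note that `depends_on` only orders container startup; it does not wait for Postgres to actually accept connections unless a healthcheck with `condition: service_healthy` is added.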
Access the Services
- Airflow: http://localhost:8080
- Frontend: http://localhost:2425
- MinIO: http://localhost:9090
- Mongo Express: http://localhost:8081
- pgAdmin: http://localhost:5050
- Selenium: http://localhost:4444
- Prometheus: TBD
- Grafana: TBD
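After `docker-compose up`, the web endpoints above may take a few seconds to come up. A small stdlib helper can poll them; the URLs are the ones listed above, and the helper itself is just a convenience sketch, not part of the project:

```python
# Stdlib-only readiness probe for the HTTP services listed above.
import urllib.request
import urllib.error

def is_up(url, timeout=2.0):
    """Return True if the URL answers any HTTP response, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # got an HTTP response (e.g. 401/404): service is up
    except (urllib.error.URLError, OSError):
        return False  # connection refused / timeout: not ready yet

for url in ["http://localhost:8080", "http://localhost:2425"]:
    print(url, "up" if is_up(url) else "not reachable")
```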
The project includes Prometheus and Grafana for monitoring the health and performance of all containers. You can access the Grafana dashboard at TBD and Prometheus at TBD.
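Prometheus discovers what to scrape through its `prometheus.yml`. The fragment below is a hedged sketch: the actual job names and exporter targets used in this project are not documented here, and container-internal addresses are used because the host ports are still TBD.

```yaml
# Illustrative prometheus.yml fragment -- job and target names are assumptions.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["prometheus:9090"]   # container-internal address on the
                                       # shared Docker network
```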
- Port Conflicts: Ensure that the ports listed in the docker-compose.yml file are not in use by other services.
- Service Startup Issues: Check the logs for each container using docker logs <container-name>.
- Network Issues: Verify that all containers are connected to the principal_network network.
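For the port-conflict case, a quick stdlib check of the host ports this stack binds can save a failed `docker-compose up`. The port list mirrors the mappings above; the helper is a convenience sketch, not part of the project:

```python
# Check which of the stack's host ports are already in use (stdlib only).
import socket

HOST_PORTS = [8080, 2425, 9000, 9090, 27017, 8081, 5432, 5050, 4444, 4442, 4443]

def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

conflicts = [p for p in HOST_PORTS if port_in_use(p)]
print("ports already in use:", conflicts or "none")
```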
- Internationalization: Add new pipelines to collect data from more countries and leagues (data is currently extracted only from Spain).
- Scalability: Implement horizontal scaling for the MongoDB and Postgres containers.
- Backup and Recovery: Add automated backup and recovery mechanisms for the databases.
- Security: Implement proper authentication and authorization for all services.
- CI/CD: Set up a CI/CD pipeline for automated testing and deployment.
Contributions are welcome! If you'd like to contribute to this project, please fork the repository and submit a pull request. For major changes, please open an issue first to discuss the changes.