Questions tagged [etl]
Extract, Transform, Load - process in a database
54 questions
-1 votes
2 answers
194 views
Should uniqueness validation be on the database level or backend codebase level in data import
There is a kind of ETL task: importing data from CSV into the database in a project with a legacy codebase and a legacy database. Data should be validated before persisting to the database. Validation includes ...
-1 votes
1 answer
228 views
Modeling a CSV file: What is the standard? Python or SQL?
I have a wide CSV file of about 350 MB, and want to load it into a SQL database and properly model the data to make it easier to use for analysis. I could split the data into tables with Python and ...
1 vote
1 answer
234 views
Data pipeline design - robust and resilient to future variations
I need to build a data pipeline to populate a database from various files. This is a common scenario. However, I want to have expert opinions for implementing a pipeline that is robust, modular and ...
1 vote
1 answer
528 views
Is a microservice approach always the best fit for ETL processes?
In our project we are using Django and Django Rest Framework as the main application to get/query the data from the database and send it to the frontend. Those endpoints are very fast, as they should be. ...
0 votes
1 answer
88 views
Running ad hoc queries on JSON log files
I have a situation where let's say I have a folder called logs which has N folders. Each folder contains events for a specific event type and each folder has N .log files where each file has multiple ...
-2 votes
2 answers
491 views
What happens after the ETL process?
I have thousands of .csv files with the same structure and, in most cases, the same column values recurring. Each file represents a report on some structures, with numeric ...
-1 votes
1 answer
37 views
Duplicating API implementations for declaring intention
I'm developing an ETL process in Python and Pandas to pull data from a REST API, and then dump it into a relational database. A few of the fields that come back contain sensitive data that I do not want to ...
0 votes
1 answer
53 views
Are there any general guidelines for allocating table space quota to different layers in ETL?
I am looking for any general guidelines for allocating table space quota to the different layers/schemas in the ETL flow of a data warehouse (% of total space in each layer). As per my research, ETL flow can ...