
Intro

I've searched all around about this problem, but I couldn't really find a good source of knowledge about it, so I'm sorry if it seems basic to you. For me it's rather intriguing, because I'm having a hard time guessing what keywords to use on Google in order to retrieve the proper info.

Problem Description:

As a matter of fact, I have two issues that I don't know how to deal with in a MySQL instance installed on a laptop in a Windows environment:

  1. I have a DB in MySQL with 50 tables, of which 15 or 20 hold original data. The other tables I generated from the original data tables, in order to create tables that would allow me to analyze the data in PowerBI. The original data tables are fed by dumps from an ERP database. My issue is the following:

How would one automate the process of receiving cumulative txt/csv files (via pen drive or any other transfer mechanism), storing those files in a folder, and then updating the existing tables with the new information? Is there any reference on best practices for such a scenario? How can I keep my database in good shape through the successive data integrations, I mean, how can I make my database scalable and responsive? Can you point me to some sources that would help me with this?
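The "receive files into a folder, then load the new ones" part can be automated with a small script. A minimal sketch in Python, assuming a drop folder and a plain-text log of already-processed files (all names here are hypothetical; the actual load step would issue `LOAD DATA INFILE` against MySQL):

```python
# Sketch of an incremental ingestion loop: find csv/txt files in a drop
# folder that have not been loaded yet, and record each one after loading.
# Folder layout and log-file name are assumptions for illustration.
import os

def find_new_files(drop_dir, processed_log):
    """Return csv/txt files in drop_dir not yet listed in processed_log."""
    try:
        with open(processed_log) as f:
            done = set(line.strip() for line in f)
    except FileNotFoundError:
        done = set()  # first run: nothing processed yet
    candidates = sorted(
        name for name in os.listdir(drop_dir)
        if name.lower().endswith((".csv", ".txt"))
    )
    return [name for name in candidates if name not in done]

def mark_processed(processed_log, filename):
    """Append a successfully loaded file to the log."""
    with open(processed_log, "a") as f:
        f.write(filename + "\n")
```

The driver would call `find_new_files()`, run the MySQL load for each returned file, and only then call `mark_processed()`, so a failed load is retried on the next run.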

At the moment I imported the data into the tables in two steps:

1st - I created the table structure with the help of the Workbench import wizard (I had to do it this way because the tables have a lot of fields - literally dozens of them - and those fields need to be in the database). I also added primary keys and indexes to those tables;

2nd - I managed to load the data from the files into those tables using the LOAD DATA INFILE command.
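For the repeated-load scenario, it helps to generate the `LOAD DATA INFILE` statement per file rather than typing it by hand. A small sketch, with the field/line terminators as assumptions that must be matched to the actual format of the ERP dumps:

```python
# Build a LOAD DATA INFILE statement for one table/file pair.
# Terminators and the IGNORE 1 LINES header skip are assumptions --
# adjust them to the real export format of the ERP dumps.
def build_load_stmt(table, path):
    return (
        f"LOAD DATA LOCAL INFILE '{path}' "
        f"INTO TABLE `{table}` "
        "FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\r\\n' "
        "IGNORE 1 LINES"
    )
```

Note that `LOCAL` requires `local_infile` to be enabled on both server and client; without it, the file must live in a directory the server may read (see `secure_file_priv`).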

  2. Some of the fields of the tables created with the import wizard were given data type TEXT, which is not necessary in this scenario. I would like to convert those fields to data type NVARCHAR(255) or similar. However, at this point there are a lot of fields to alter, across multiple tables, and I was wondering if I can write a query that generates all the ALTER TABLE statements I need.

So my issue here is: is it safe to alter the data type of multiple fields in multiple tables (in this case changing fields from TEXT to NVARCHAR(255))? What is the best way to do this? Can you point me to some sources or best practices for this, please?
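Generating the ALTER TABLE statements mechanically is indeed possible: query `information_schema.columns` for the affected columns, then emit one statement per table. A sketch of the generation step (the target type and the grouping into one ALTER per table are choices, not requirements):

```python
# Turn (table, column) pairs -- e.g. the result of
#   SELECT table_name, column_name FROM information_schema.columns
#   WHERE table_schema = 'mydb' AND data_type = 'text';
# into one ALTER TABLE statement per table, batching all column
# changes together so each table is rebuilt only once.
from collections import defaultdict

def build_alter_stmts(columns, target_type="NVARCHAR(255)"):
    by_table = defaultdict(list)
    for table, column in columns:
        by_table[table].append(column)
    stmts = []
    for table, cols in by_table.items():
        changes = ", ".join(f"MODIFY `{c}` {target_type}" for c in cols)
        stmts.append(f"ALTER TABLE `{table}` {changes};")
    return stmts
```

On safety: shrinking TEXT to a 255-character type truncates any longer values, so it is worth checking `MAX(CHAR_LENGTH(col))` per column before running the generated statements.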

Thank you, in advance, for your help. Cheers

  • You could build queries by interrogating information_schema.columns and submit them using prepared statements. dev.mysql.com/doc/refman/8.0/en/… Commented Oct 18, 2018 at 15:07
  • @P.Salmon, I'm not sure I followed along, but are you suggesting I use something like this: stackoverflow.com/questions/35622020/… in order to answer my issue number 2? Any ideas about my issue number 1? Thanks. :) Commented Oct 19, 2018 at 7:22
  • Is the load a complete set of data, or some kind of incremental set of rows? Commented Oct 19, 2018 at 23:54
  • What I need is incremental loading. At the moment I have the tables I need in the database, even if they need some optimization in terms of the data types used, as described in my issue 2. If I were using SQL Server, I know I would need to build an SSIS package to integrate the data incrementally. However, I don't know how this is done in MySQL, and that's where my doubts are. Thanks for taking the time to answer. :) Commented Oct 20, 2018 at 11:15
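The usual MySQL equivalent of an SSIS incremental load is a staging table plus an upsert: `LOAD DATA` each new file into an empty staging table, then merge it into the target with `INSERT ... ON DUPLICATE KEY UPDATE`. A sketch that generates the merge statement (table and column names are placeholders; this syntax assumes the target's primary/unique key covers `key_cols`):

```python
# Build the merge step of an incremental load: copy rows from a staging
# table into the target, updating rows whose key already exists.
# Uses the classic VALUES() form of ON DUPLICATE KEY UPDATE, which is
# valid in MySQL 5.7/8.0 (deprecated in favor of row aliases in 8.0.20+).
def build_upsert(target, staging, key_cols, data_cols):
    all_cols = list(key_cols) + list(data_cols)
    col_list = ", ".join(f"`{c}`" for c in all_cols)
    updates = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in data_cols)
    return (
        f"INSERT INTO `{target}` ({col_list}) "
        f"SELECT {col_list} FROM `{staging}` "
        f"ON DUPLICATE KEY UPDATE {updates};"
    )
```

New rows are inserted, changed rows are overwritten, and rows absent from the new file are left untouched, which matches a cumulative-dump workflow.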

1 Answer


You need a scripting language, not a UI. See the mysql command-line tool, the shell of your OS, etc.

  1. DROP DATABASE and reCREATE it
  2. LOAD DATA
  3. Massage the data to get the columns cleaner than what the load data provided
  4. Sic the BI tool on the data.

If you want to discuss Step 3, we need details about what transformations are needed between step 2 and step 4. That includes providing the format or schema for steps 2 and 4.


4 Comments

So you're telling me the ETL process is done via a scripting language, like Python for instance? I find this funny in some way, because I started using MySQL to store and access data, with a decent volume, via R and/or Python, and now that I need to build a data pipeline I need to go back to Python. Nice, who said doing data science was easy :P Anyway, can you please provide me some proper sources of info/best practices on how to build data pipelines in Python to integrate data incrementally into MySQL? And again, thank you for taking the time to answer. :)
@zStrike - Well, you can use mysql as a scripting language by saying mysql < my_etl_process.sql, but you don't have access to if statements, etc.
Thanks mate, but in this regard I prefer to use Python. Even though I'm not a seasoned Python programmer, I'm sure I'll make it through somehow, given the amount of Python tutorials out there. I just need some good pointers on where to start looking, in order to reach my goal as rapidly as possible :)
The decision between SQL and Python for ETL comes down to (1) what you are comfortable with, and (2) whether the language is powerful enough for the job. SQL is good for manipulating an entire table's worth of data; Python (etc.) is better for complex manipulations of the data. I usually use a mixture of SQL and Perl.
