
I have a huge 23 GB CSV file I am trying to load, and I've found that instead of converting it to INSERT statements, which takes a while, it's possible to load it directly into the DB. So I tried the following syntax:

LOAD DATA LOCAL INFILE client_report.csv into table client_report fields terminated by ',' optionally enclosed by '"' lines terminated by '\r\n' ignore 1 lines; 

mysql> LOAD DATA LOCAL INFILE client_report.csv into table client_report fields terminated by ',' optionally enclosed by '"' lines terminated by '\r\n' ignore 1 lines;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'client_report.csv into table client_report fields terminated by ',' opti' at line 1

I am at a loss; I seem to be following the documentation to the letter, and I checked SHOW GLOBAL VARIABLES LIKE 'local_infile'; it's ON.

  • Wrap your CSV file name in single quotes, i.e. LOAD DATA LOCAL INFILE 'client_report.csv' into... Commented Oct 29, 2022 at 2:46
  • This was actually the answer! Commented Oct 29, 2022 at 22:49
  • However, the big file failed to insert; there's an error. I will perhaps try it in chunks of 10M rows at a time. Commented Oct 29, 2022 at 22:50
  • Hmm, apparently this command does not support chunking natively. Commented Oct 29, 2022 at 23:00

2 Answers


The final solution that worked best is two steps:

  1. Split the file into chunks of 10M rows each using split -l 10000000 file.csv, which will generate files named xaa, xab, xac, ...
  2. Use the query below (note the filename is in quotes):

LOAD DATA LOCAL INFILE 'xaa' into table table_name fields terminated by ',' optionally enclosed by '"' lines terminated by '\r\n' ignore 1 lines (field1,field2,field3,field4,fieldN) SET ID = NULL;

Note that if you don't explicitly list the fields and set ID = NULL, it will try to insert the first column into the auto-generated ID column, shifting all content by one column and leaving the last one empty.

Also note that starting from file xab you need to remove the ignore 1 lines part, otherwise you'll skip the first row of each file, since only xaa contains the header.
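If you'd rather not paste the query once per chunk by hand, the loop can be scripted. Below is a minimal sketch, assuming mysql-connector-python is installed, the split files (xaa, xab, ...) sit in the current directory, and the connection details, table name, and column list are placeholders to adapt to your own schema:

# Sketch: load every split chunk via LOAD DATA LOCAL INFILE.
# Connection details, table name and column names are placeholders.
import glob

import mysql.connector

conn = mysql.connector.connect(
    host="localhost",
    user="user",
    password="password",
    database="mydb",
    allow_local_infile=True,   # client-side switch needed for LOAD DATA LOCAL
)
cursor = conn.cursor()

for i, chunk in enumerate(sorted(glob.glob("x??"))):
    # Only the first chunk (xaa) still contains the CSV header row.
    ignore = "ignore 1 lines" if i == 0 else ""
    cursor.execute(
        f"LOAD DATA LOCAL INFILE '{chunk}' into table table_name "
        f"fields terminated by ',' optionally enclosed by '\"' "
        f"lines terminated by '\\r\\n' {ignore} "
        f"(field1,field2,field3,field4,fieldN) SET ID = NULL"
    )
    conn.commit()
    print("loaded", chunk)

cursor.close()
conn.close()

The same effect could be had from a shell loop over the x?? files calling the mysql client; the key point is simply that only the first chunk keeps the ignore 1 lines clause.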




The only way I know how to do this is using Python. This example uses SQLite, but the concept is the same.

It will murder your RAM, but with a large enough pagefile you should be fine. It will also take some time (up to a few hours).

import pandas
import sqlite3
import time

start_time = time.process_time()

# SQLite auto-creates the database file and the table (with an index column).
db = sqlite3.connect("-database.db")

def import_csv():
    df = pandas.read_csv('csv_file.csv')
    df.to_sql('test', con=db, if_exists='append', chunksize=100000)

# When the condition is true the CSV is imported; flip it to run only the commented-out SELECT instead.
if 1 == 1:
    import_csv()
    #print(pandas.read_sql_query('select * from test limit 100000000', db))

end_time = time.process_time()
print(end_time - start_time)
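Since the question targets MySQL and a 23 GB CSV will not fit in memory as a single DataFrame, the same pandas idea can be adapted by reading the file in chunks and writing through SQLAlchemy. This is only a sketch; the connection URL, table name, and chunk size are assumptions, and a MySQL driver such as pymysql must be installed:

import pandas
from sqlalchemy import create_engine

# Placeholder connection URL; adjust user, password, host and database.
engine = create_engine("mysql+pymysql://user:password@localhost/mydb")

# Stream the CSV 100,000 rows at a time so the whole 23 GB file is never in RAM at once.
for chunk in pandas.read_csv("csv_file.csv", chunksize=100000):
    chunk.to_sql("client_report", con=engine, if_exists="append", index=False)

Note that LOAD DATA LOCAL INFILE is generally much faster than inserting rows through pandas, so the accepted approach above is still preferable for a file this size.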

Comments

The OP hasn't tagged a language other than MySQL. This won't be very helpful if he's using, say, PHP.
The question has nothing to do with Python, so an answer that uses it is not appropriate. This isn't a proper answer to the question asked.
