I want to copy all data from a MyISAM table with 16 millions rows to a InnoDB table, that will have a different (optimized) schema. See tables below to see the changes.
However, every approach I try leads to a slow result.
#1 INSERT
Using a simple INSERT with VALUES from a SELECT:
INSERT INTO new_table (field1,field2,...) SELECT * FROM old_table; Really slow and it locks the original table, which I can't have locked for much longer.
#2 IMPORT FROM SQL FILE
Creating the SQL file takes minutes, but ok. Importing the file is really slow.
#3 LOAD DATA
I dump all the data to a comma separate file, which takes about 30 seconds (quite good), and then I load it with LOAD DATA INFILE. But still very slow.
Why not drop unique keys before loading the data?
Yes, it loads faster. However, when I try to add the unique keys I get this:
Lock wait timeout exceeded; try restarting transaction
Let's split the into several files?
I tried that. Each file contained 100,000 rows.
- File 1 = 2.76 sec
- File 2 = 1 min 40 sec
- File 3 = 4 min 16 sec
I gave up on file 3. I'd have to do it 157 more times and on each iteration the processing time would increase. Too much time...
What is 'slow'
When I say it is slow, I mean, with unique keys off, it takes about 10 seconds to load 500,000 rows. No bad. But then I can't add unique keys because of the timeout.
With unique keys previously set, it takes almost 1 minute to insert 100,000 rows. And the throughput decreases the more rows is inserted.
The tables
MyISAM - Original table
+----------------------+-----------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------------------+-----------------------+------+-----+---------+-------+ | player_id | int(11) | NO | PRI | NULL | | | date | date | NO | PRI | NULL | | | time | int(2) | NO | PRI | NULL | | | minutes_online | decimal(5,0) | NO | | 0 | | | minutes_playing | decimal(5,0) | NO | | 0 | | | minutes_chatting | decimal(5,0) | NO | | 0 | | | minutes_away | decimal(5,0) | NO | | 0 | | +----------------------+-----------------------+------+-----+---------+-------+ InnoDB - New table
+----------------------+-----------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------------------+-----------------------+------+-----+---------+-------+ | player_id | mediumint(8) unsigned | NO | PRI | NULL | | | date | date | NO | PRI | NULL | | | time | tinyint(2) | NO | PRI | NULL | | | minutes_online | tinyint(2) | NO | | 0 | | | minutes_playing | tinyint(2) | NO | | 0 | | | minutes_chatting | tinyint(2) | NO | | 0 | | | minutes_away | tinyint(2) | NO | | 0 | | +----------------------+-----------------------+------+-----+---------+-------+ Am I stuck on this? Meaning, will I have to accept the fact that it is slow and be patient? Or is there any way or tweak to load the data with an acceptable performance?
Solution I split into small CSV files, applied the tweaks that Shlomi Noach suggested and then loaded each file.
Not the perfect. But it loaded a bit faster. Dumping the original table to a CSV file took less than 1 minute, which avoided the table from being locked during the whole process. A script did the job to load each file, which would be very boring to do manually.
In the end I just had to get from the original table the rows inserted/updated in the last hours and update on the new table.