Skip to main content
3 of 4
added 574 characters in body
rlcabral
  • 357
  • 1
  • 3
  • 10

Fastest way to copy data from MyISAM to InnoDB

I want to copy all data from a MyISAM table with 16 millions rows to a InnoDB table, that will have a different (optimized) schema. See tables below to see the changes.

However, every approach I try leads to a slow result.

#1 INSERT

Using a simple INSERT with VALUES from a SELECT:

INSERT INTO new_table (field1,field2,...) SELECT * FROM old_table; 

Really slow and it locks the original table, which I can't have locked for much longer.

#2 IMPORT FROM SQL FILE

Creating the SQL file takes minutes, but ok. Importing the file is really slow.

#3 LOAD DATA

I dump all the data to a comma separate file, which takes about 30 seconds (quite good), and then I load it with LOAD DATA INFILE. But still very slow.

Why not drop unique keys before loading the data?

Yes, it loads faster. However, when I try to add the unique keys I get this:

Lock wait timeout exceeded; try restarting transaction

Let's split the into several files?

I tried that. Each file contained 100,000 rows.

  • File 1 = 2.76 sec
  • File 2 = 1 min 40 sec
  • File 3 = 4 min 16 sec

I gave up on file 3. I'd have to do it 157 more times and on each iteration the processing time would increase. Too much time...

What is 'slow'

When I say it is slow, I mean, with unique keys off, it takes about 10 seconds to load 500,000 rows. No bad. But then I can't add unique keys because of the timeout.

With unique keys previously set, it takes almost 1 minute to insert 100,000 rows. And the throughput decreases the more rows is inserted.

The tables

MyISAM - Original table

+----------------------+-----------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------------------+-----------------------+------+-----+---------+-------+ | player_id | int(11) | NO | PRI | NULL | | | date | date | NO | PRI | NULL | | | time | int(2) | NO | PRI | NULL | | | minutes_online | decimal(5,0) | NO | | 0 | | | minutes_playing | decimal(5,0) | NO | | 0 | | | minutes_chatting | decimal(5,0) | NO | | 0 | | | minutes_away | decimal(5,0) | NO | | 0 | | +----------------------+-----------------------+------+-----+---------+-------+ 

InnoDB - New table

+----------------------+-----------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------------------+-----------------------+------+-----+---------+-------+ | player_id | mediumint(8) unsigned | NO | PRI | NULL | | | date | date | NO | PRI | NULL | | | time | tinyint(2) | NO | PRI | NULL | | | minutes_online | tinyint(2) | NO | | 0 | | | minutes_playing | tinyint(2) | NO | | 0 | | | minutes_chatting | tinyint(2) | NO | | 0 | | | minutes_away | tinyint(2) | NO | | 0 | | +----------------------+-----------------------+------+-----+---------+-------+ 

Am I stuck on this? Meaning, will I have to accept the fact that it is slow and be patient? Or is there any way or tweak to load the data with an acceptable performance?

Solution I split into small CSV files, applied the tweaks that Shlomi Noach suggested and then loaded each file.

Not the perfect. But it loaded a bit faster. Dumping the original table to a CSV file took less than 1 minute, which avoided the table from being locked during the whole process. A script did the job to load each file, which would be very boring to do manually.

In the end I just had to get from the original table the rows inserted/updated in the last hours and update on the new table.

rlcabral
  • 357
  • 1
  • 3
  • 10