I am trying to insert a lot of users into a MySQL database with two tables:
The first table contains the user data. An example INSERT looks like this (id is the primary key, mail is a unique key):
INSERT INTO users (id, mail, name) VALUES (NULL, "[email protected]", "John Smith") ON DUPLICATE KEY UPDATE name = VALUE(name) The second table contains the group the user belongs to. It only stores two foreign keys users_id and groups_id. An example query looks like this:
INSERT INTO users_groups (users_id, groups_id) VALUES (LAST_INSERT_ID(), 1) This setup works perfectly fine for small data sets. When I import large amounts of data (>1M rows) the INSERTs get slow. Obviously, it would be much better to do a batch insert:
INSERT INTO users (id, mail, name) VALUES (NULL, "[email protected]", "John Smith"), (NULL, "[email protected]", "Anna Smith") ON DUPLICATE KEY UPDATE name = VALUE(name) and:
INSERT INTO users_groups (users_id, groups_id) VALUES (LAST_INSERT_ID(), 1), (LAST_INSERT_ID(), 4) The problem of course is, that LAST_INSERT_ID() only returns one (the first) id of a batch INSERT.
So, what I would need is a "nested" batch INSERT, which IMO does not exist in MySQL.
What can I do to make my INSERTs faster?
insert selectstatement to fill up the group table. In the select statement, you would then use a CASE WHEN expression, assigning the group depending on the email. But if harvey's solution is confirmed then it's better.