77

I'm trying to find the rows that are in one table but not in another. The two tables are in different databases, and the column I'm matching on has a different name in each table.

I've got a query, code below, and I think it probably works but it's way too slow:

SELECT `pm`.`id` FROM `R2R`.`partmaster` `pm` WHERE NOT EXISTS ( SELECT * FROM `wpsapi4`.`product_details` `pd` WHERE `pm`.`id` = `pd`.`part_num` ) 

So the query is trying to do as follows:

Select all the ids from the R2R.partmaster table that are not in the wpsapi4.product_details table. The columns I'm matching are partmaster.id and product_details.part_num.

1
  • For me, EXISTS / NOT EXISTS is the best way, since it clearly expresses what you want to get. But it seems to be the slowest way (on MySQL). Check this: explainextended.com/2009/09/18/… Commented Sep 29, 2011 at 12:51

5 Answers

138

Expanding on Sjoerd's anti-join, you can also use the easy-to-understand SELECT ... WHERE x NOT IN (SELECT ...) pattern.

SELECT pm.id FROM r2r.partmaster pm WHERE pm.id NOT IN (SELECT pd.part_num FROM wpsapi4.product_details pd) 

Note that you only need to use ` backticks on reserved words, names with spaces and such, not with normal column names.

On MySQL 5+ this kind of query runs pretty fast.
On MySQL 3/4 it's slow.

Make sure you have indexes on the fields in question: you need one index on pm.id and one on pd.part_num.


4 Comments

Thanks for your answer. It didn't run quickly enough, but the query is good, just not on my DB. I have worked out another solution, but Stack Overflow won't let me post it for another 3 hours.
Just a heads up that a query similar to the one above took about 10 minutes on MySQL 5.1.73a. When I upgraded to MySQL 5.6.22 it took about 456ms.
See @colmaclean's answer for null values
Only thing I would add is " WHERE pd.part_num IS NOT NULL" to the inner select query.
77

You can LEFT JOIN the two tables. If there is no corresponding row in the second table, the values will be NULL.

SELECT id FROM partmaster LEFT JOIN product_details ON (...) WHERE product_details.part_num IS NULL 
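As a rough, runnable sketch of the anti-join, the snippet below uses Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows, the index name, and the ON condition (pd.part_num = pm.id, taken from the matching columns named in the question) are assumptions for illustration; on MySQL the tables would be qualified as r2r.partmaster and wpsapi4.product_details. It also runs the NOT EXISTS form from the question to show that both return the same ids.

```python
import sqlite3

# In-memory SQLite stand-in for the two MySQL tables (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE partmaster (id INTEGER PRIMARY KEY);
    CREATE TABLE product_details (part_num INTEGER);
    CREATE INDEX idx_pd_part_num ON product_details (part_num);
    INSERT INTO partmaster (id) VALUES (1), (2), (3), (4);
    INSERT INTO product_details (part_num) VALUES (2), (4);
""")

# Anti-join: unmatched partmaster rows come back with NULL on the right side.
# The ON condition is an assumption based on the columns named in the question.
left_join_sql = """
    SELECT pm.id
    FROM partmaster pm
    LEFT JOIN product_details pd ON pd.part_num = pm.id
    WHERE pd.part_num IS NULL
    ORDER BY pm.id
"""

# The NOT EXISTS form from the question, for comparison.
not_exists_sql = """
    SELECT pm.id
    FROM partmaster pm
    WHERE NOT EXISTS (
        SELECT 1 FROM product_details pd WHERE pd.part_num = pm.id
    )
    ORDER BY pm.id
"""

missing_via_join = [row[0] for row in conn.execute(left_join_sql)]
missing_via_not_exists = [row[0] for row in conn.execute(not_exists_sql)]

print(missing_via_join)        # [1, 3]
print(missing_via_not_exists)  # [1, 3]
```

Which form is faster depends on the MySQL version and the available indexes, so it is worth timing both on the real data.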


8

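To expand on Johan's answer: if the part_num column in the sub-select can contain NULL values, the query will break. A single NULL in the list makes every NOT IN comparison evaluate to NULL rather than true, so no rows are returned.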

To correct this, add a null check...

SELECT pm.id FROM r2r.partmaster pm WHERE pm.id NOT IN (SELECT pd.part_num FROM wpsapi4.product_details pd WHERE pd.part_num IS NOT NULL) 
  • Sorry but I couldn't add a comment as I don't have the rep!
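To make the NULL problem concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL (the NOT IN semantics with NULL are standard SQL). The table contents are made-up sample data, and the database prefixes are dropped because everything lives in one in-memory database.

```python
import sqlite3

# In-memory SQLite stand-in; product_details deliberately contains a NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE partmaster (id INTEGER PRIMARY KEY);
    CREATE TABLE product_details (part_num INTEGER);
    INSERT INTO partmaster (id) VALUES (1), (2), (3);
    INSERT INTO product_details (part_num) VALUES (2), (NULL);
""")

# Without the null check: the NULL in the list makes "id NOT IN (...)"
# evaluate to NULL (not true) for every non-matching id.
unguarded = conn.execute("""
    SELECT pm.id FROM partmaster pm
    WHERE pm.id NOT IN (SELECT pd.part_num FROM product_details pd)
    ORDER BY pm.id
""").fetchall()

# With the null check: NULLs are filtered out before the comparison.
guarded = conn.execute("""
    SELECT pm.id FROM partmaster pm
    WHERE pm.id NOT IN (SELECT pd.part_num FROM product_details pd
                        WHERE pd.part_num IS NOT NULL)
    ORDER BY pm.id
""").fetchall()

unguarded_ids = [row[0] for row in unguarded]
guarded_ids = [row[0] for row in guarded]

print(unguarded_ids)  # []
print(guarded_ids)    # [1, 3]
```

The NOT EXISTS and LEFT JOIN ... IS NULL forms do not have this problem, because a NULL part_num simply never matches any id.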

1 Comment

NOT IN sure can be nasty.
4

There are loads of posts on the web that show how to do this. I've found 3 ways, the same ones pointed out by Johan and Sjoerd. The queries themselves are fine, but on my database they all ran slowly.

So I worked out another way that someone else may find useful:

The basic gist of it is to create a temporary table, fill it with all the ids, then remove all the rows that ARE in the other table.

I ran these 3 queries, and they finished quickly (in a few moments).

CREATE TEMPORARY TABLE `database1`.`newRows` SELECT `t1`.`id` AS `columnID` FROM `database2`.`table` AS `t1` 

.

CREATE INDEX `columnID` ON `database1`.`newRows`(`columnID`) 

.

DELETE FROM `database1`.`newRows` WHERE EXISTS( SELECT 1 FROM `database1`.`product_details` AS `pd` WHERE `pd`.`columnID`=`database1`.`newRows`.`columnID` ) 
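The three steps above can be sketched end to end as follows. This is only an illustration that uses Python's built-in sqlite3 module in place of MySQL; the table names, column names (new_rows, part_id), index name, and sample rows are hypothetical. Note that the column inside the subquery is qualified with its table alias: an unqualified name that does not exist in the inner table would silently resolve to the outer table and make the condition true for every row.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL databases (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE partmaster (id INTEGER PRIMARY KEY);
    CREATE TABLE product_details (part_num INTEGER);
    INSERT INTO partmaster (id) VALUES (1), (2), (3), (4);
    INSERT INTO product_details (part_num) VALUES (2), (4);

    -- Step 1: copy every candidate id into a temporary table.
    CREATE TEMPORARY TABLE new_rows AS
        SELECT pm.id AS part_id FROM partmaster pm;

    -- Step 2: index the copy so the lookup in step 3 is cheap.
    CREATE INDEX idx_new_rows_part_id ON new_rows (part_id);

    -- Step 3: delete the ids that DO exist in the other table.
    -- Both sides of the comparison are qualified to avoid ambiguity.
    DELETE FROM new_rows
    WHERE EXISTS (
        SELECT 1 FROM product_details pd
        WHERE pd.part_num = new_rows.part_id
    );
""")

# Whatever is left in the temporary table is missing from product_details.
remaining = [row[0] for row in
             conn.execute("SELECT part_id FROM new_rows ORDER BY part_id")]
print(remaining)  # [1, 3]
```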

1 Comment

Does it run fast because it is only comparing the first product, or am I not seeing this right?
0

The simple workaround that worked for me is as below:

SELECT first_table.* FROM first_table LEFT JOIN second_table ON second_table.common_column = first_table.common_column WHERE second_table.common_column IS NULL; 

