0
SELECT COUNT(organization.ID)
FROM organization
WHERE organization.NAME IN (
    SELECT organization.NAME
    FROM organization
    WHERE organization.NAME <> ''
      AND organization.APPROVED = 0
      AND organization.CREATED_AT > '2012-07-31 04:31:08'
    GROUP BY organization.NAME
    HAVING COUNT(organization.ID) > 1
)

This query finds duplicates. The problem is that the page takes 6 seconds to load because of the inner statement. Is there a way to make it run faster? The database is MySQL 5.1.

4
  • Isn't the outer statement useless? SELECT COUNT(organization.ID) FROM organization WHERE organization.NAME <> '' AND organization.APPROVED = 0 AND organization.CREATED_AT > '2012-07-31 04:31:08' GROUP BY organization.NAME HAVING COUNT(organization.ID) > 1 Commented Aug 31, 2012 at 20:47
  • 2
    No. It will return a different result. Commented Aug 31, 2012 at 20:51
  • No, mine for instance returns 67 duplicates; your query breaks that down into 55, 10 and 2, which add up to 67. Commented Aug 31, 2012 at 20:53
  • @SativaNL: the OP query is getting a count of all organizations that have a duplicate name, but ONLY for those organization names that have two (or more) rows matching the specified predicates on APPROVED and CREATED_AT. The OP query will include additional rows in the total count. Commented Aug 31, 2012 at 21:38

4 Answers

1

Yes. This is slow because MySQL 5.1 handles IN (subquery) poorly: the optimizer tends to treat the subquery as dependent and re-evaluate it for each row of the outer query. You can try a correlated EXISTS instead:

SELECT COUNT(o.ID)
FROM organization o
WHERE EXISTS (
    SELECT o2.NAME
    FROM organization o2
    WHERE o2.NAME <> ''
      AND o2.APPROVED = 0
      AND o2.CREATED_AT > '2012-07-31 04:31:08'
      AND o2.NAME = o.NAME
    GROUP BY o2.NAME
    HAVING COUNT(o2.ID) > 1
)

0

Try to avoid IN. Join against the subquery as a derived table instead:

SELECT COUNT(organization.ID)
FROM organization
INNER JOIN (
    SELECT organization.NAME
    FROM organization
    WHERE organization.NAME <> ''
      AND organization.APPROVED = 0
      AND organization.CREATED_AT > '2012-07-31 04:31:08'
    GROUP BY organization.NAME
    HAVING COUNT(organization.ID) > 1
) AS t ON organization.NAME = t.NAME
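
Before swapping the query in, it may be worth confirming that the JOIN form returns the same count as the IN form. The sketch below does that on a handful of made-up rows, using Python's built-in sqlite3 module as a stand-in for MySQL 5.1. The sample data and the helper name `count_duplicates` are assumptions for illustration only, and the check says nothing about speed, only about the two statements agreeing.

```python
import sqlite3

# Hypothetical sample rows: (ID, NAME, APPROVED, CREATED_AT)
ROWS = [
    (1, "Acme", 0, "2012-08-01 10:00:00"),
    (2, "Acme", 0, "2012-08-02 10:00:00"),
    (3, "Acme", 1, "2012-01-01 10:00:00"),  # counted too: shares a duplicated name
    (4, "Beta", 0, "2012-08-03 10:00:00"),  # only one qualifying row for "Beta"
    (5, "Beta", 1, "2012-08-04 10:00:00"),
    (6, "", 0, "2012-08-05 10:00:00"),      # empty names are excluded
    (7, "", 0, "2012-08-06 10:00:00"),
]

# The inner statement shared by both forms of the query.
DUPLICATE_NAMES = """
    SELECT NAME FROM organization
    WHERE NAME <> '' AND APPROVED = 0 AND CREATED_AT > '2012-07-31 04:31:08'
    GROUP BY NAME HAVING COUNT(ID) > 1
"""

IN_QUERY = f"SELECT COUNT(ID) FROM organization WHERE NAME IN ({DUPLICATE_NAMES})"

JOIN_QUERY = f"""
    SELECT COUNT(organization.ID) FROM organization
    INNER JOIN ({DUPLICATE_NAMES}) AS t ON organization.NAME = t.NAME
"""


def count_duplicates(query):
    """Load the sample rows into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE organization ("
        "ID INTEGER PRIMARY KEY, NAME TEXT, APPROVED INTEGER, CREATED_AT TEXT)"
    )
    conn.executemany("INSERT INTO organization VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(query).fetchone()[0]
    conn.close()
    return result


in_count = count_duplicates(IN_QUERY)
join_count = count_duplicates(JOIN_QUERY)
print(in_count, join_count)
```

Both forms count all three "Acme" rows here, including the approved one, because the outer query counts every row whose name is in the duplicate list. The same comparison can be run against a copy of the real table before relying on the rewrite.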

1 Comment

This one is pretty fast. I'll test it again later on, thanks :)
0

I also find that creating indexes on the columns a query references vastly improves speed in complex queries.

1 Comment

I think he already has indexes. The problem is the IN: it will execute the inner statement for each row.
0

If what you want to return is a total count of all duplicates, but only for those organization NAME values that have two or more rows matching the specified predicates on APPROVED and CREATED_AT, then you could use an alternate statement that returns an equivalent result:

SELECT SUM(c.cnt)
FROM (
    SELECT COUNT(o.ID) AS cnt
    FROM organization o
    WHERE o.NAME <> ''
    GROUP BY o.NAME
    HAVING SUM(o.APPROVED = 0 AND o.CREATED_AT > '2012-07-31 04:31:08') > 1
) c

MySQL can make use of a suitable covering index to satisfy this query; otherwise, this is likely a full scan of the organization table. Either way, it avoids referencing the organization table twice and avoids a JOIN operation.
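
As a rough check of the equivalence claimed above, the sketch below runs the original IN query and the single-pass statement against the same made-up rows. It uses Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also evaluates a boolean expression to 0 or 1, so the `SUM(...)` trick in the HAVING clause carries over). The sample rows and the helper name `run` are assumptions for illustration only.

```python
import sqlite3

# Hypothetical sample rows: (ID, NAME, APPROVED, CREATED_AT)
ROWS = [
    (1, "Acme", 0, "2012-08-01 10:00:00"),
    (2, "Acme", 0, "2012-08-02 10:00:00"),
    (3, "Acme", 1, "2012-01-01 10:00:00"),  # does not match the predicates, still counted
    (4, "Beta", 0, "2012-08-03 10:00:00"),  # only one matching row, so not a duplicate
    (5, "Beta", 1, "2012-08-04 10:00:00"),
    (6, "", 0, "2012-08-05 10:00:00"),      # empty names are excluded
    (7, "", 0, "2012-08-06 10:00:00"),
]

ORIGINAL = """
    SELECT COUNT(ID) FROM organization WHERE NAME IN (
        SELECT NAME FROM organization
        WHERE NAME <> '' AND APPROVED = 0 AND CREATED_AT > '2012-07-31 04:31:08'
        GROUP BY NAME HAVING COUNT(ID) > 1
    )
"""

SINGLE_PASS = """
    SELECT SUM(c.cnt) FROM (
        SELECT COUNT(o.ID) AS cnt
        FROM organization o
        WHERE o.NAME <> ''
        GROUP BY o.NAME
        HAVING SUM(o.APPROVED = 0 AND o.CREATED_AT > '2012-07-31 04:31:08') > 1
    ) c
"""


def run(query, rows):
    """Load the given rows into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE organization ("
        "ID INTEGER PRIMARY KEY, NAME TEXT, APPROVED INTEGER, CREATED_AT TEXT)"
    )
    conn.executemany("INSERT INTO organization VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(query).fetchone()[0]
    conn.close()
    return result


original_count = run(ORIGINAL, ROWS)
single_pass_count = run(SINGLE_PASS, ROWS)
print(original_count, single_pass_count)

# Edge case: a table with no duplicates at all.
print(run(ORIGINAL, []), run(SINGLE_PASS, []))
```

One difference to keep in mind: when no name qualifies, the original statement returns 0, while `SUM` over an empty derived table returns NULL (`None` in Python). Wrapping the outer aggregate as `COALESCE(SUM(c.cnt), 0)` is one way to get 0 back in that case.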

One suitable covering index for this query would be:

CREATE INDEX organization_ix ON organization (NAME, CREATED_AT, APPROVED, ID)

Note that if the ID column is guaranteed to be non-NULL (either through a NOT NULL constraint or because it's the PRIMARY KEY of the table), you can avoid referencing that column and leave it out of the index definition:

SELECT SUM(c.cnt)
FROM (
    SELECT SUM(1) AS cnt
    FROM organization o
    WHERE o.NAME <> ''
    GROUP BY o.NAME
    HAVING SUM(o.APPROVED = 0 AND o.CREATED_AT > '2012-07-31 04:31:08') > 1
) c

The EXPLAIN output shows this query using the index to satisfy the query without referencing any data blocks from the table:

id  select_type  table       type   possible_keys    key              key_len  ref     rows  Extra
--  -----------  ----------  -----  ---------------  ---------------  -------  ------  ----  ------------------------
1   PRIMARY      <derived2>  ALL    (NULL)           (NULL)           (NULL)   (NULL)  2
2   DERIVED      o           index  organization_ix  organization_ix  44       (NULL)  29    Using where; Using index
