340

While reading some SQL Tuning-related documentation, I found this:

SELECT COUNT(*) :

  • Counts the number of rows.
  • Often is improperly used to verify the existence of a record.

Is SELECT COUNT(*) really that bad?

What's the proper way to verify the existence of a record?

9 Answers 9

399

It's better to use either of the following:

-- Method 1. SELECT 1 FROM table_name WHERE unique_key = value; -- Method 2. SELECT COUNT(1) FROM table_name WHERE unique_key = value; 

The first alternative should give you no result or one result, the second count should be zero or one.

How old is the documentation you're using? Although you've read good advice, most query optimizers in recent RDBMS's optimize SELECT COUNT(*) anyway, so while there is a difference in theory (and older databases), you shouldn't notice any difference in practice.

Sign up to request clarification or add additional context in comments.

5 Comments

I will clarify that I intended "unique key" with the "key = value" clause but other than that I'm still behind my answer.
OK. With that premise indeed the query would return just one or zero record. BUT: The question does not limit to a unique column. Also: The 2nd query count(1) is equivalent to count(*) from a practical POV.
The question says "what's the proper way to verify the existence of A record". I interpreted that as singular, as in: 1 record. The difference between count(*) and count(1) is already covered by my answer. I prefer count(1) because it does not rely on a specific RDBMS implementation.
If unique_key is indexed, is it guaranteed that the statement will run strictly under linear time?
@McSinyx in theory that is correct, although it totally depends on the specific RDBMS implementation that you are running this on.
261

I would prefer not use Count function at all:

IF [NOT] EXISTS ( SELECT 1 FROM MyTable WHERE ... ) <do smth> 

For example if you want to check if user exists before inserting it into the database the query can look like this:

IF NOT EXISTS ( SELECT 1 FROM Users WHERE FirstName = 'John' AND LastName = 'Smith' ) BEGIN INSERT INTO Users (FirstName, LastName) VALUES ('John', 'Smith') END 

3 Comments

Generaly we use it (the verify) when want do something, then your answer is more complete.
Good to mention that by using T-SQL
I would expect the query planner to do a better job optimizing EXISTS than the other methods; it should know it doesn't have to scan the whole table.
39

You can use:

SELECT 1 FROM MyTable WHERE <MyCondition> 

If there is no record matching the condition, the resulted recordset is empty.

4 Comments

Did you mean TOP 1? -> (SELECT TOP 1 FROM MyTable WHERE <MyCondition>)
No, I meant exactly "1"
to enable the query optimizer to even knwo that you won't read/need the remaining datasets, you should state SELECT TOP 1 1 FROM... WHERE... (or use the appropriate query hints for your RDBS)
The Exists operator itself tries to retrieve just the absolute minimum of information, so the addition of TOP 1 does nothing except add 5 characters to the query size. - sqlservercentral.com/blogs/sqlinthewild/2011/04/05/…
27

You can use:

SELECT 1 FROM MyTable WHERE... LIMIT 1 

Use select 1 to prevent the checking of unnecessary fields.

Use LIMIT 1 to prevent the checking of unnecessary rows.

2 Comments

Good point but Limit works on MySQL and PostgreSQL, top works on SQL Server, you should note it on your answer
unnecessary with respected to ?
24
SELECT COUNT(1) FROM MyTable WHERE ... 

will loop thru all the records. This is the reason it is bad to use for record existence.

I would use

SELECT TOP 1 * FROM MyTable WHERE ... 

After finding 1 record, it will terminate the loop.

4 Comments

In case of SELECT TOP 1 will it actually terminate after finding one or does it continue to find all to be able to say which one is TOP?
PS: To be sure I always IF EXISTS (SELECT TOP 1 1 FROM ... WHERE ..)
the Star operator will force the DBMS to access the clustered index instead of just the index(es) that will be needed for your join condition. so it's better to use a constant valua as result, i.e. select top 1 1 .... That will return 1 or DB-Null, depending on the condition is a match or not.
it's nice. I like the first one.
17

The other answers are quite good, but it would also be useful to add LIMIT 1 (or the equivalent, to prevent the checking of unnecessary rows.

4 Comments

If any "check for existence" query returns more than one row, I think it is more useful to double check your WHERE clause instead of LIMIT-ing the number of results.
I think Limit is used in Oracle and not in SQL Server
I'm considering the case where they can legitimately be multiple rows -- where the question is: "Is there (one or more) rows that satisfy this condition?" In that case, you don't want to look at all of them, just one.
@Shantanu -- I know, that's why I linked to the (very through) en.wikipedia article explaining the other forms.
14

You can use:

SELECT COUNT(1) FROM MyTable WHERE ... 

or

WHERE [NOT] EXISTS ( SELECT 1 FROM MyTable WHERE ... ) 

This will be more efficient than SELECT * since you're simply selecting the value 1 for each row, rather than all the fields.

There's also a subtle difference between COUNT(*) and COUNT(column name):

  • COUNT(*) will count all rows, including nulls
  • COUNT(column name) will only count non null occurrences of column name

6 Comments

You're making the mistaken assumption that a DBMS will somehow check all those columns. The performance difference between count(1) and count(*) will be different only in the most brain-dead DBMS.
No, I'm saying that you are actually relying on implementation details when you state it'll be more efficient. If you really want to ensure you get the best performance, you should profile it for the specific implementation using representative data, or just forget about it totally. Anything else is potentially misleading, and could change drastically when moving (for example) from DB2 to MySQL.
I want to make it clear I'm not dissing your answer. It is useful. The only bit I take issue with is the efficiency claim since we've done evaluations in DB2/z and found there's no real difference between count(*) and count(1). Whether that's the case for other DBMS', I can't say.
"Anything else is potentially misleading, and could change drastically when moving (for example) from DB2 to MySQL" You're much more likely to get bitten by performance degradation of SELECT COUNT(*) when moving DBMS than an implementation difference in SELECT 1 or COUNT(1). I'm a firm believer in writing the code which most clearly expresses exactly what you want to achieve, rather than relying on optimizers or compilers to default to your desired behaviour.
Misleading statement "COUNT(*)" means 'count the rows' full stop. It does not require access to any particular column. And in most cases will not even require access to the row itself as a count any unique index is sufficient.
|
3

Other option:

SELECT CASE WHEN EXISTS ( SELECT 1 FROM [MyTable] AS [MyRecord]) THEN CAST(1 AS BIT) ELSE CAST(0 AS BIT) END 

2 Comments

what is the purpose of CAST(1 AS BIT) ? Why can't I just write THEN 1 ELSE 0 ?
You defiantly can return 1 or 0. It's all about what type of result you want in the end. I wanted to return boolean instead of numeric value.
3

I'm using this way:

IF (EXISTS (SELECT TOP 1 FROM Users WHERE FirstName = 'John'), 1, 0) AS DoesJohnExist 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.