Why does the comparison of value to null return false, except when using a NOT IN, where it returns true?
Given a query to find all stackoverflow users who have a post:
    SELECT * FROM Users WHERE UserID IN (SELECT UserID FROM Posts)

This works as expected; i get a list of all users who have a post.
Now query for the inverse; find all stackoverflow users who don't have a post:
    SELECT * FROM Users WHERE UserID NOT IN (SELECT UserID FROM Posts)

This returns no records, which is incorrect.
Given hypothetical data¹

    Users                 Posts
    ================      ===============================
    UserID  Username      PostID   UserID  Subject
    ------  --------      -------  ------  ----------------
    1       atkins        1        1       Welcome to stack ov...
    2       joels         2        2       Welcome all!
    ...     ...           ...      ...
    399573  gt6989b       ...      ...
    ...     ...           ...      ...
                          10592    null    (deleted by nsl&fbi...
                          ...      ...

And assume the rules of NULLs:
- NULL = NULL evaluates to unknown
- NULL <> NULL evaluates to unknown
- value = NULL evaluates to unknown
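As a quick check of these rules, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (the question is about SQL Server); the assumption is that both engines apply the same three-valued logic to these comparisons. SQL NULL (unknown) comes back to Python as None.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")

# Any comparison involving NULL yields NULL (unknown), which sqlite3 maps to None.
# The last two columns are ordinary comparisons, included for contrast.
row = conn.execute(
    "SELECT NULL = NULL, NULL <> NULL, 1 = NULL, 1 = 1, 1 = 2"
).fetchone()

print(row)  # (None, None, None, 1, 0)
```

Note that unknown is a third outcome, distinct from both true (1) and false (0).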
If we look at the 2nd query, we're interested in finding all rows where the Users.UserID is not found in the Posts.UserID column. i would proceed logically as follows:
Check UserID 1
- 1 = 1 returns true. So we conclude that this user has some posts, and do not include them in the output list
Now check UserID 2:
- 2 = 1 returns false, so we keep looking
- 2 = 2 returns true, so we conclude that this user has some posts, and do not include them in the output list
Now check UserID 399573
- 399573 = 1 returns false, so we keep looking
- 399573 = 2 returns false, so we keep looking
- ...
- 399573 = null returns unknown, so we keep looking
- ...
We found no posts by UserID 399573, so we would include him in the output list.
Except SQL Server doesn't do this. If you have a NULL in your IN list, then suddenly it finds a match. It suddenly finds a match. Suddenly 399573 = null evaluates to true.
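To make the observed behavior reproducible, here is a small sketch that loads the hypothetical data above into an in-memory SQLite database and runs both queries. SQLite stands in for SQL Server, on the assumption that both treat a NULL inside an IN / NOT IN subquery the same way; only the rows actually shown in the sample data are inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER, Username TEXT);
    CREATE TABLE Posts (PostID INTEGER, UserID INTEGER, Subject TEXT);

    -- Only the rows visible in the sample data; the elided rows are left out.
    INSERT INTO Users VALUES (1, 'atkins'), (2, 'joels'), (399573, 'gt6989b');
    INSERT INTO Posts VALUES
        (1, 1, 'Welcome to stack ov...'),
        (2, 2, 'Welcome all!'),
        (10592, NULL, '(deleted by nsl&fbi...');
""")

# Users who have a post: behaves as expected.
with_posts = conn.execute(
    "SELECT UserID FROM Users"
    " WHERE UserID IN (SELECT UserID FROM Posts) ORDER BY UserID"
).fetchall()

# Users who do not have a post: empty, even though 399573 has no posts.
without_posts = conn.execute(
    "SELECT UserID FROM Users"
    " WHERE UserID NOT IN (SELECT UserID FROM Posts)"
).fetchall()

print(with_posts)     # [(1,), (2,)]
print(without_posts)  # []
```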
Why does the comparison of value to null return unknown, except when it returns true?
Edit: i know that i can work around this nonsensical behavior by specifically excluding the nulls:

    SELECT * FROM Users WHERE UserID NOT IN (
        SELECT UserID FROM Posts WHERE UserID IS NOT NULL)

But i shouldn't have to; as far as i can tell, the boolean logic should be fine without it. Hence my question.
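For completeness, a sketch of that workaround against the same hypothetical rows, again using in-memory SQLite as a stand-in for SQL Server (an assumption, not something tested on SQL Server itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER, Username TEXT);
    CREATE TABLE Posts (PostID INTEGER, UserID INTEGER, Subject TEXT);
    INSERT INTO Users VALUES (1, 'atkins'), (2, 'joels'), (399573, 'gt6989b');
    INSERT INTO Posts VALUES
        (1, 1, 'Welcome to stack ov...'),
        (2, 2, 'Welcome all!'),
        (10592, NULL, '(deleted by nsl&fbi...');
""")

# Filtering NULLs out of the subquery leaves only known UserIDs in the list,
# so every comparison is either true or false and NOT IN can return rows.
rows = conn.execute("""
    SELECT UserID FROM Users
    WHERE UserID NOT IN (
        SELECT UserID FROM Posts WHERE UserID IS NOT NULL)
""").fetchall()

print(rows)  # [(399573,)]
```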
Footnotes
- 1 hypothetical data; if you don't like it: make up your own.
- celko now has his own tag
IN uses OR and NOT IN uses AND. When you are evaluating against a NULL with an inequality vs. a known value, you will always get a false since there is no way to know if it matches or not.

IN is nothing more than a convenient shorthand for a series of OR clauses. Technically, the correct way to think of the expansion given in the other question is: select 'true' where NOT(3 = 1 or 3 = 2 or 3 = null), which is logically equivalent by DeMorgan's Law. In any case, the fallacy is assuming that a comparison of a value=NULL returns FALSE when in fact the result is UNKNOWN.

SISTER that refers back to your main table. If you don't have a sister you wouldn't be included in the result set. Also, NULL means unknown. If you have 2 people whose names you don't know, can you say one of their names isn't Kirsten?
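The OR / AND expansion described in the comments above can be evaluated directly. This is a minimal sketch using in-memory SQLite as a stand-in, on the assumption that it follows the same three-valued logic as SQL Server; the literal values 1, 2 and 3 come from the quoted expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NOT (a OR b OR c) and its De Morgan form (NOT a AND NOT b AND NOT c)
# both come out as NULL (unknown) once one operand is a comparison with NULL.
# The third column drops the NULL comparison and evaluates to true (1).
not_of_ors, chain_of_ands, without_null = conn.execute("""
    SELECT NOT (3 = 1 OR 3 = 2 OR 3 = NULL),
           (3 <> 1 AND 3 <> 2 AND 3 <> NULL),
           (3 <> 1 AND 3 <> 2)
""").fetchone()
print(not_of_ors, chain_of_ands, without_null)  # None None 1

# WHERE keeps a row only when its condition is true; unknown is not true,
# so the quoted query returns nothing.
rows = conn.execute(
    "SELECT 'true' WHERE NOT (3 = 1 OR 3 = 2 OR 3 = NULL)"
).fetchall()
print(rows)  # []
```

So the comparison with NULL never becomes true. The whole NOT IN condition becomes unknown, and a WHERE clause discards unknown rows just as it discards false ones.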