4

I read a 45-tips-database-performance-tips-for-developers document from a famous commercial vendor for SQL tools today and there was one tip that confuse me:

If possible, avoid NULL values in your database. If not, use the appropriate IS NULL and IS NOT NULL code.

I like having NULL values because to me it is a difference if a value was never set or it 0 or string empty. So databases have this for a porpuse.

So is this tip nonsense or should I take action to prevent having NULL values at all in my database tables? Does it effect performance a lot have a NULL value instead of a filled number or string value?

4
  • 2
    Use NULL whenever appropriate. Never ever consider an 'alternative' to NULLs. My 2c. Commented Feb 14, 2014 at 11:11
  • 2
    One guess would be that the advice started off along the lines of "mark all columns as NOT NULL unless you've really thought hard about it and want to accept NULLs in that column" Commented Feb 14, 2014 at 11:26
  • 1
    it also speaks to normalization.. the lack of a value could be pretty directly tied to the lack of a row. Commented Feb 14, 2014 at 15:35
  • Perhaps this should be merged with stackoverflow.com/questions/4336687/… ? Commented Jun 11, 2015 at 23:34

5 Answers 5

9

Besides the reasons mentioned in other answers, we can look at NULLs from a different angle.

Regarding duplicate rows, Codd said

If something is true, saying it twice doesn’t make it any more true.

Similarly, you can say

If something is not known, saying it is unknown doesn't make it known.

Databases are used to record facts. The facts (truths) serve as axioms from which we can deduce other facts.
From this perspective, unknown things should not be recorded - they are not useful facts.
Anyway, anything that is not recorded is unknown. So why bother recording them?
Let alone their existence makes the deduction complicated.

Sign up to request clarification or add additional context in comments.

Comments

5

The NULL question is not simple... Every professional has a personal opinion about it.

Relational theory Two-Valued Logic (2VL: TRUE and FALSE) rejects NULL, and Chris Date is one of the most enemies of NULLs. But Ted Codd, instead, accepted Three-Valued Logic too (TRUE, FALSE and UNKNOWN).

Just a few things to note for Oracle:

  1. Single column B*Tree Indexes don't contain NULL entries. So the Optimizer can't use an Index if you code "WHERE XXX IS NULL".

  2. Oracle considers a NULL the same as an empty string, so:

    WHERE SOME_FIELD = NULL 

    is the same as:

    WHERE SOME_FIELD = '' 

Moreover, with NULLs you must pay attention in your queries, because every compare with NULL returns NULL. And, sometimes, NULLs are insidious. Think for a moment to a WHERE condition like the following:

WHERE SOME_FIELD NOT IN (SELECT C FROM SOME_TABLE) 

If the subquery returns one or more NULLs, you get the empty recordset!

These are the very first few cases that I want to talk about. But we can speak about NULLs for a lot of time...

2 Comments

Codd did not accept 3VL, as the detractors say he did. By the evidence, (a) he accepted that UNKNOWN may have to be recorded, and (b) he gave a method, which is not 3VL, it is 2VL. Codd's Relational Theory rejects NULL in the data, but NULL can appear in ResultSets, so there is no getting around the fact that code must handle NULL, regardless of whether it is in the data or not.
Codd certainly did propose 3VLs, with UNKNOWN. Standard SQL allows certain use of UNKNOWN & allows NULL in its place.
3

It's usually good practice to avoid or minimise the use of nulls. Nulls cause some queries to return results that are "incorrect" (i.e. the results won't correspond with the intended meaning of the database). Unfortunately SQL and SQL-style databases can make nulls difficult, though not necessarily impossible, to avoid. It's a very real problem and even experts often have trouble spotting flaws in query logic caused by nulls.

Since there is nothing like nulls in the real world, using them means making some compromises in the way your database represents reality. In fact there is no single consistent "meaning" of nulls and little general agreement on what they are for. In practice, nulls get used to represent all sorts of different situations. If you do use them it's a good idea to document exactly what a null means for any given attribute.

Here's an excellent lecture about the "null problem" by Chris Date:

http://www.youtube.com/watch?v=kU-MXf2TsPE

Comments

2

There are various downsides to NULLs that can make using them more difficult than actual values. for example:

  1. In some cases they are not indexed.
  2. They make join syntax more difficult.
  3. They need special treatment for comparisons.

For string columns it might be appropriate to use "N/A", or "N/K" as a special value that helps distinguish between different classes of what could otherwise be NULL, but that's tricky to do for numerics or dates -- special values are generally tricky to use, and it may be better to add an extra column (eg. for date_of_birth you might have a column that specifies "reason_for_no_date_of_birth", which can help the application be more useful.

For many cases where data values are genuinely unknown or not relevant they can be entirely appropriate of course -- date_of_death is a good example, or date_of_account_termination.

Sometimes even these examples can be rendered irrelevant by normalising events out to a different table, so you have a table for "ACCOUNT_DATES" with DATE_TYPES of "Open", "Close", etc.

Comments

1

I think using NULL values in the database is feasible until your application has a proper logic to handle it, but according to this post there may be some problems as discussed here http://databases.aspfaq.com/general/why-should-i-avoid-nulls-in-my-database.html

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.