I seemed to have struck a minor chord with my hatred for database triggers. I had very few comments, but a handful of emails and exchanged even more tweets on the subject. But one thing was made clear to me and that is people wanted to hear about why I disdain NULL values.
Fortunately for me I was reminded by Louis Davidson that it is not the feature I dislike, it is the abuse (or misuse) of that feature. This would not be the first time that Louis has reminded me that I was (or about to) make an ass of myself, so I took note of his words very carefully. Indeed, it is not necessarily database triggers that I hate, but rather the fact that they are abused…misused…actually, I hate the fact that they are used and it is so difficult to troubleshoot.
But NULLs, well, they are a different animal. Why do I say that I hate NULLs? I mean, there is nothing to them, right? They do not perform any actions like triggers, they just sit there, in your tables, doing nothing. Some of them are waiting to be discovered and some hope to be updated to have real value. It makes me think of that old Schoolhouse Rock video about that little scrap of paper. So, why so much dislike for them? What harm do they cause?
Why NULLs Can Be Harmful
The harm they cause is usually the result of a person not knowing they are there. It’s that simple, and that is why I hate them. OK, not hate, but I would rather design a database with as few NULL values as possible.
Oh sure, I understand that there are times when you do not know the value of something, and it is easy to simply define that particular column to allow for NULL values and you will come back later to update that value. Sure, sounds great. I always hear about people that plan on coming back later to fix the things they didn’t do right the first time around. I only wish I could say that I actually see that happening as often as I hear it is going to happen. So, you put a column into a table, allow it to have NULL values, and are asking yourself right now “WTF? What’s your problem?”
Again, it is because the end user is not always aware that NULL values exist. And sometimes, the end user is a developer (*ahem*) who may be brought into the middle of a project and be building some widget only to have the widget crap out once a month because their code never accounted for NULL values that get inserted (or updated) after a monthly batch load. Not that I know anyone that has ever experienced such things, but I hear about these people (*ahem*). What’s that? Oh, so my hatred for NULLs is really a hatred for the fact that I am not someone is not good enough at coding or T-SQL or database design to begin with? Maybe, but consider this…
Query Tools
What tool do you use to write a majority of your T-SQL queries? OK, do you know what the default connection properties are? What I am looking for is: Do you know what your ANSI_NULLS is set to? How about this question: Do you even know what ANSI_NULLS setting you want to use? Did you know you even had that option? Do you know the difference in behavior if ANSI_NULLS is set to ON or OFF? I have a handful of tools at my disposal and surprisingly they have different default settings when making connections to an instance. Some of those differences can be with ANSI_NULLS, which makes no sense to me because the whole idea around ANSI is to have a standard so you would think the rocket surgeons in charge would have thought about mandating a default ANSI_NULLS setting whenever someone makes an application that wants to connect to an database server in order to avoid crap like I have had to deal with. Sure, you want to be able to change the option, but the behavior out of the box should be a standard, in my opinion.
Now, how many DBA’s know everything there is to know about ANSI_NULLS, connection strings, the various applications being used to connect to data these days, and each and every database design in their shop? I have no idea how many of you just raised your hand, but I know mine stayed on my keyboard. Now, let’s go one level away from the DBA, and ask: how many customers understand all of this? I supposed that you could only have your users connect to pristine cubes or data marts, right? Good luck with that, because I know most people want to connect to the raw data so that they can go through it themselves. And guess who will get blamed if it turns out that data is “missing” because the end user missed the fact that a NULL value altered their result set?
So, if database designs minimized the use of NULLs, then we can minimize the amount of issues faced downstream. Ever since I took part in some Six Sigma Black Belt training I have come to understand that what I really hate are inefficient processes. The use of and troubleshooting triggers is inefficient. And the same case can be made for NULL values. Their use can lead to a lot of bad result sets being accepted as a “version of the truth”, which is something we will ultimately be on the hook for at some point. Why not save people some time by looking to minimize their use? That would seem to be more efficient for everyone.
So, maybe I don’t hate NULLs. I just hate how so many groups are all operating under the assumption that everyone knows what they are, why they are there, and how best to use them. Seems to me that with all of those assumptions chances are lots of bad business decisions are being made every day.
Wait a minute…maybe we could blame the economic downturn on the use of NULL values? Hey, maybe I found a reason to love the little buggers after all.
Yeah, I’m thread-jacking. I don’t like the way NULLs typically areused either. Nothing more to be said for that. Now for the jack (and I’m asking in utter seriousness): Did you find the Six Sigma training worth the time invested?
Yes, the Six Sigma training was worth the time invested, in my opinion. I honestly believe that everyone should attend the training. Of course, I also believe that everyone should serve one year in the military, so perhaps I am not the best person to ask.
Not all the material and subject matter is ground-breaking stuff. In fact, in a lot of cases, the material was really some common sense stuff. However, we all know that common sense is not all that common. I think a lot of people get scared by the statistics involved in the training, and some criticize Six Sigma as (1) being focused only on saving money, (2) being the hatchet when jobs need to be cut, or (3) reporting cost savings that are not always accurate.
In the end, I enjoy making things more efficient. Like an economist of sorts.
i think the distinction must be drawn between a developer not knowing the expected values of something at design time, to a value not being known in production.
If something is not known at design time, then it is up to the design team to define it, find out or leave it out. i agree, dont just go with null because we dont know what the ‘6 status flags of nong’ will be.
however, somethings are just not known in production and this is where NULL comes into being, for example someones “Loan Finalisation Date”, no other value than null can be brought into being for this, unless of course you have a table for finalised loans and only bring a row into existence when the loan is finalised, of course assuming that the finalisation date is known when the finalisation process is started, etc,etc,etc
just a thought 🙂
could you use a default date instead of NULL? why not default to ‘1/1/1900’? i realize that no solution is perfect, but i would rather insert something than nothing.
the issue id have with a default date is that many people (well, me hehehe) would use the existence of a date in particular to mean something.
is this right or wrong, not sure but if we start putting something in there that has a second meaning, ie 1/1/1900 = not closed then the field is no longer atomic.
further on this, who knows what calculations are then going to be used on that field, maybe finalisedDate – startDate to get average durations??
im not saying that nulls are always appropriate, i guess with any statement there is always going to be an edge case that goes against it, as with most recommended practices, if you know you are going against it and can justify the position than who is to say its the wrong one.
thanks john, you raise some valid points.
“they just sit there, in your tables, doing nothing” they do something, for every value that is put in they make the value take more space than necessary. Since null is a value that is outside the range of say an int field the actual field will have to be larger than an int to accommodate null OR int. The only time where they don’t take up much space is when most of the values are null. For every value that is entered they have to take more space, at least a “bit” more (pun) as seems logically necessary to me.
You could use 0 for the finalization date for loans that are not finalized….
I was just thinking about that this morning…what is the overhead on storage and performance for tables built to hold NULL values and as those values are updated? My head started hurting so I had some bacon and felt much better.
I personally believe that the biggest problem with NULLs in a database is that application languages like those in .NET don’t handle NULL very well. In T-SQL, NULL is a part of every datatype, yet the application developers where I work say they have to do extra work passing NULLs into the database and getting them back out.
Using magic numbers or defaults is not a good idea either. How does the user know the difference between a value listed as 0 because the actual number is supposed to be 0 vs. a value that is 0 but is really unknown? NULLs have a place and meaning.
Yes, they do have a place andmeaning. You happen to mention one area of frustration I have seen quite frequently, and that is how the VS tools will create tables with default datatype settings, and on top of that they can have difficulties working with NULL values. Why can’t we get someone at ANSI to step up and define some standards when it comes to this stuff?
Below link provide good information about NULL in SQL and how to handle the same in SQL queries
http://www.a2zmenu.com/MySql/Working-with-NULL-Values-in-SQL.aspx