The other day I was sitting in my home office, contemplating what it would be like if I had a zebra and enjoying some music while consuming my lunch. I was also reviewing some T-SQL for a client and was going down a mental checklist of the bad T-SQL habits I was seeing. And that’s when it hit me: could I find a song lyric in my playlist for the most common T-SQL code sins I frequently come across?
The answer was “yes”. The next logical question was “should I blog this”? I decided to answer that with a “yes” as well.
You’re welcome.
“I used to do a little but a little wouldn’t do so the little got more and more…” – Mr. Brownstone, Guns N’ Roses
T-SQL CODE SIN: Adding extra (too many) indexes to your tables in order to boost performance. Unfortunately this is an all-too-common performance tuning practice that is made all-too-easy by tools such as the Database Engine Tuning Advisor where a DBA or developer will simply deploy the default recommendations without understanding the implications.
WHY THIS IS BAD: Each extra index is additional overhead for the DUI (Delete, Update, Insert) statements on that table, as well as additional overhead when you need to perform some maintenance on your indexes. Also, each index requires disk space, so you need to be mindful of that as well. If you want a nice summary of the dangers of adding extra indexes check out Kimberly Tripp’s blog, starting with this post.
WHAT CAN YOU DO: Well, you can start by doing nothing. After all, if you don’t create additional indexes then you don’t have to worry about having too many, right? Of curse some indexes are desired. So, how do you know if you have too many? Simple, run the script found in this blog post and let that be a starting point for you to examine your index structure.
“Meeting you with a view to a kill…” – A View To a Kill, Duran Duran
T-SQL CODE SIN: Building a view on top of a view (and using functions, going across linked servers, etc.) I see this quite a bit as well, simply because it is so easy to code. But ‘easy to code’ doesn’t mean ‘the data is located one rack down’, or that ‘the optimizer knows exactly what to do with that plan’.
WHY THIS IS BAD: Views are simply a layer of abstraction, built on top of a table in order to return only the data you wanted someone to see. But tables themselves are also an abstraction, a logical way of displaying the data that is contained as bits inside of a file on a server. The real difference here is that when you nest views you force the optimizer to do extra work to generate a plan. If you nest enough of them then you will find you are limiting the ability for the optimizer to choose the best plan.
As for the use of linked servers, you need to be aware of how the SQLOLEDB provider returns rows from the remote server. If you have a linked server buried deep inside some nested views you are just asking for performance problems.
WHAT YOU CAN DO: Simple: don’t build nested views, and don’t use linked servers in your view definitions. It would be best to avoid this, but if it cannot be avoided then you need to be aware that performance will simply not be optimal.
“The best of us can find happiness in misery.” – I Don’t Care, Fallout Boy
T-SQL CODE SIN: No testing of anything, anywhere, anytime.
WHY THIS IS BAD: If I have to tell you why this is a bad thing then you are reading the wrong blog. Move along, nothing to see here.
WHAT YOU CAN DO: Test. I don’t care (ha! GET IT?) how big or small your change may be, run through the same standard sets of testing. Just because you may need to slap something into production is no reason to deploy untested code. It was fairly common for me to be handed scripts that would not even parse. Think about that…if I cannot run your script in SSMS, what is the likelihood that you ran it yourself?
“Making all his nowhere plans for nobody.” – Nowhere Man, The Beatles
T-SQL CODE SIN: Having no WHERE clause in your query.
WHY THIS IS BAD: In most cases the reason this is bad is because you probably didn’t need to return each and every row from a table, and that WHERE clause would allow for you to filter and return only the rows you needed, thus speeding up the query. I see this a lot from users that connect to a database using MS Access and decide to pull back all of the rows from a few different tables. After they wait for an hour or so for a few billion rows they call the DBA to complain about performance.
WHAT YOU CAN DO: Know what you are doing! Only return the rows that are necessary, and not more.
“…here it goes again.” – Here It Goes Again, OK GO
T-SQL CODE SIN: Duplicate columns (also known as bad database design, but that can be rather broad, so let’s focus on one thing).
WHY THIS IS BAD: This is all about efficiency and normalization. Duplicate columns lead to redundant data, which lead to bloated databases, larger than necessary indexes, wasted space, larger than necessary backups, extra I/O, etc. While you MAY be able to demonstrate a use case for de-normalization every now and then, you should not be in the habit of just shoving everything into one table in order to “make your queries faster”.
Another thing to consider is that the additional columns increase the size of the row on the page, which means fewer rows would fit onto a page, which means more pages are needed for your result sets. This will increase your query response time.
WHAT YOU CAN DO: Learn about normalization and understand why it is a good thing to have in your life. If you see a table with columns named col1, col2, col3, and col4 then start asking questions about the database design.
“One way, or another, I’m gonna see ya…” – One Way Or Another, Blondie
T-SQL CODE SIN: Misusing the ORDER BY clause.
WHY THIS IS BAD: If you really need to order your result set (and you very well could need this), then you MOST LIKELY only need the very last result set to be ordered. I mean, would it make sense to order each and every result set, when only the last one is the one being presented to your end user? No, of course not, and yet I always seem to find places where someone has issued an ORDER BY on each and every statement. And each one of these takes extra time.
I should mention that if your database has a proper design that includes primary keys and clustered indexes then the additional overhead of an ORDER BY may not be all the significant. What I am talking about here are those databases that are nothing more than glorified Excel spreadsheets that have no clustered indexes, no primary keys, no foreign keys, and each ORDER BY simply consumes resources unnecessarily.
WHAT YOU CAN DO: For databases with poor design only use ORDER BY on the last set of results that are going to be presented to the end user.
“No one knows” – No One Knows, Queens of the Stone Age
T-SQL CODE SIN: Having no documentation.
WHY THIS IS BAD: Because the people that take over your code or project will have no idea what is going on. Believe it or not, people really do want to know what is going on, especially the people that need to pick up the pieces after you have moved on.
WHAT YOU CAN DO: Write something down! Hey, if you don’t want to write something down then here’s an idea…COMMENT YOUR CODE! Even if that means you say “magic happens here” at least we have SOME idea as to what your thought process was when you were writing the code.
“I need to know, I need to know…” – I Need To Know , Tom Petty
T-SQL CODE SIN: Using non-sargable filters.
WHY THIS IS BAD: When you write your queries you want to do so in a way such that the optimizer can quickly and easily find the most optimal plan. (OK, you want to help it find a plan that is “good enough”.) Your use of filters allows for this to happen. That means you want to use filters that allow for the optimizer to take advantage of existing indexes.
For more examples check out this blog post by Rob Farley.
WHAT YOU CAN DO: Know to avoid the use of functions on columns in your WHERE or JOIN clause. Many functions will prohibit the optimizer from taking advantage of indexes, will force table scans, and will slow your queries down considerably.
“Well we scheme and we scheme and we always blow it…” – Dashboard, Modest Mouse
T-SQL CODE SIN: Selecting the wrong datatypes for your data.
WHY THIS IS BAD: Because you are wasting resources, that’s why. For example, a datetime datatype is 8 bytes long, whereas a smalldatetime datatype is only 4 bytes long. If you don’t require the extra precision from using datetime then don’t waste the space. I once calculated a savings of 100GB from a database by simply changing to smalldatetime on three columns on one table. That’s 100GB of storage, overhead for indexes, space for backups, length of time for backups, etc. It all adds up to waste which ends up costing money.
Since I first learned about the extra costs associated with bad datatypes from this article by Greg Larsen, I think I’ll just point you there as well. And don’t forget the longer rows and extra pages.
WHAT YOU CAN DO: This will not be easy, but you need to get in front of database design as much as possible. You’ll be seen as a roadblock, no doubt. So it would be a good idea for you to get some “wins” under your belt before you start pestering people to be let into their secret design meetings. Go find one of the larger databases you have, and go look at the largest tables. Examine the datatypes and see if you can make any improvements. For example, if you find a column defined as integer but only has values that would be satisfied by smallint, see how much space you might save by making such a change (multiply the number of rows by the number of bytes saved). Chances are you could save some space right now and don’t forget to include the space saved for other things like backups.
“…I think about the implications of diving in too deep…it’s just overkill” – Overkill, Men At Work
T-SQL CODE SIN: Trigger abuse.
WHY THIS IS BAD: Everyone knows I hate triggers. And if you didn’t know that, then you know it now. And when I come across database designs that not only include triggers…but NESTED triggers, it makes me want to do some shake ‘n bake. I mean, if THEY don’t care anymore, why should I, right?
Most people go through life thinking that accumulation of objects is a good thing, despite the fact that you “can’t take it with you” after you die. Well, you can take triggers with you. In fact, please do.
WHAT YOU CAN DO: Only use triggers as a last resort, when there are no other options. Also, make certain you document their use in your design. Do everything you can to make certain people are aware of the triggers being used, and only use them sparingly.
“Well you burnt my house down and then got mad at my reaction…” – Effect and Cause, The White Stripes
T-SQL CODE SIN: Building application and code as ‘sa’, or requiring sysadmin rights for your system to run.
WHY THIS IS BAD: It’s beyond bad. It’s horrible. It was an acceptable practice maybe 15 years ago, but these days most folks have an eye for standard security practices. Very few applications need to have such privileges and the ones that do are coming under more and more scrutiny by auditors that understand the principle of least privilege. Quite often the reason many of these application “require” sysadmin rights is because the developers that built the product were doing so as sysadmins and built pieces of code that required more rights than desired (or necessary).
WHAT YOU CAN DO: Don’t allows your developers to have sysadmin rights, especially when they are building code. Give them the rights they need, but no more. My rule was that they could do whatever they wanted inside their database, but once they asked for rights outside of the database then that was something my team should handle for them. It made for a decent line in the sand, so to speak.
I don’t care how much they may complain, don’t let anyone build applications and systems with more privileges than necessary. You’ll thank me later.
“How much longer until you break?” – MK Ultra, Muse
T-SQL CODE SIN: Using dashes and/or spaces in object names such as tables or databases.
WHY THIS IS BAD: It seems innocent enough, right? I mean, why wouldn’t you want to name a database “SharePoint_AdminContent_8ee3e8ad-af84-41e2-8584-6ebcf96cc4d2”? It makes TOTAL SENSE, right? And you wouldn’t think it is a problem, especially if you were building out an application and was never concerned with the need to script out database names. See, if you *did* have that need then you would find the dashes in such a name end up causing problems for your scripts, and you soon learn the need to include [brackets] in your code.
Same thing for including a space in something like a column name. So, ‘FirstName’ is preferred over ‘First Name’.
WHAT YOU CAN DO: Don’t use spaces or dashes. Is it really that difficult? Also, think one step ahead and use those brackets for object names. It’s a good habit to develop and helps you overcome the issues caused by those that use spaces and dashes.
I couldn’t agree more about triggers. My rule of thumb for triggers is that they should only be used if the logic in the trigger absolutely must be executed immediately. If we can handle it in an outside process that runs every few minutes, I’d rather do that. Triggers tend to only deal with a small set of data. Moving it to an outside process allows you handle a larger set of data at once.
Great post. I really enjoyed reading this. Providing the back links to some more in depth examples is also appreciated.
I was getting pretty worried about your music taste with the Fallout Boy call, but you redeemed yourself with Modest Mouse.
Functions in a WHERE clause is a problem I’m trying to get around at the moment. I need to find rows where the trigram similarity between two text fields is above a threshold. Relatively easily done with a CLR function in the WHERE clause, even though it makes me feel oh so dirty.