Extreme Makeover – Database Edition

The original idea for this blog was to write about some of the craziest things you would ever want to do with MS SQL Server. Things like building DTS packages that have a single T-SQL task that executes a stored procedure, or building a database that has but a single table with 800 million rows and no indexes, or putting all of your business logic into functions that get called for each row returned in a result set. The trouble was, I am not only not inventive enough to come up with a myriad of Crazy scenarios, I really do not have the time.

Fortunately (or unfortunately), I am able to see examples of each of the above items each and every day. Indeed, truth is stranger than fiction, and I honestly have no shortage of Crazy items to write about. I just have to take the time to write about them. The other trouble is that I do not want to offend anyone. I write this blog as a way to help me make sense of things. Sure, I make light of a lot of things, but anything I write here is something I would not hesitate to tell someone directly.

Recently we migrated a system from SQL 2000 to SQL 2005. This involved migrating the current development system to the new development system. The same method was used for the test migration. In other words, we went from “old test” to “new test”. As part of the migration we have a checklist of items to run through before we turn the system over to the developers, at which time we assume the developers would spend some time kicking the tires on their new system.

Now, I consider such a migration to be essentially a release of a new version. I treat a migration from SQL 2000 to SQL 2005 with the same level of suspicion as if I were migrating from Sybase to MS SQL. In other words, I would test connections, run queries, etc. But that is me, and I am Crazy.

If anyone is familiar with the show Extreme Makeover – Home Edition, you know that Ty and his crew will rip down and build a new house in seven days. While it may look nice, I am not certain I would want to reside in a house so hastily built. I honestly feel you cannot rush quality. I am sure those houses come out fine, it is just that I would prefer if my house took a little longer, say at least a month to let certain things set properly.

So there we are, Saturday morning, migrating from “old prod” to “new prod”. We finish up during the afternoon. On Sunday evening (yes, more than 24 hours later), we are notified that the system is not performing properly. We double check things on our end and everything seems to be working, well, except that the duration for some queries have gone from three seconds on “old prod” to twelve minutes on “new prod”. Ouch. Our team investigates a bit further and escalates the issue to the (sleeping) onsite team, who pick up their research on Monday morning.

What do we do first on Monday morning? Well, we go to the new test system and run the query. It takes eight minutes. Turns out development is slow as well. Surely someone noticed this during testing, right?

Hello?

Is the caller there?

To be honest I have yet to be given any reason why the system was not tested. I suspect that it was not tested because of the difficulties involved in accurately rebuilding the production environment in test. Such an excuse seems to be used a lot around here, that because it is not production it is not worthwhile to test the system. After all, if it performs poorly how would you know if the reason had to do with the system, the code, or the environment in general?

How Crazy is that? Can you imagine if they built airplanes in the same way? Imagine if they simply decided that wind tunnels were useless because the only way to really simulate atmospheric conditions is to strap that jet to the underbelly of a big, pressurized metal tube with 130 people and their luggage loaded and push it 200 MPH down the tarmac. Of course I understand that lives are at stake there, and the consequences for failure are not the same if a query runs longer than expected in our shop. But when did testing and benchmarking become optional? How can anyone honestly sit around and convince themselves that testing is a waste of time?

So, after the Extreme Makeover – Database Edition is complete, we have a system that is mostly broken for about a week or two. No lives are lost, just increased tensions between our team and the development team, which could have been avoided. The developers feel that their old code should work as fast or faster on the newer version, and no code changes should be necessary. The DBA’s feel that code updates are almost a certainty when it come to new versions. And the stalemate begins, we sit in our cubes and send emails back and forth that essentially say “How much longer before you fix this”?

I wonder if Ty has the same problems with his house each week?

3 thoughts on “Extreme Makeover – Database Edition”

  1. Hi!

    wow – this sounds like a nightmare…

    I am no expert in MS SQL, but it seems a bit odd that old code slows down so bad in a newer version.

    I can imagine this happening if extensive use has been made of hints (not sure of MS SQL has those), or if the newer version changed default indexing algorithm and nothing was explicitly specified in table DDL (which seems pretty unlikely)

    I’d be very interested to hear it if you find out more about the cause.

    Reply
  2. Hi Roland.

    To be honest, we are not certain how the code ran on SQL2000 so well. It is a view of a view, I believe, so we took a shotgun approach and recompiled all views and did an update of stats using fullscan.

    I do believe the optimizer in sql2005 has been rewritten, and this is not the first time i have seen query performance degrade between versions. I am willing to bet that if more developers were benchmarking their queries they would find that performance has suffered.

    The real reason that people seem to be blind by this situation is that when they upgrade to new systems they typically get newer hardware as well as a server that is not as bloated. So, if your old query ran in five seconds, but on the new server it runs in four, chances are you would not think there is a problem. but if you dived into some benchmarks with regards to physical and logical I/O, chances are the performance degradation would be visible.

    But if the end user is simply pressing a button on a web page, they actually believe that performance has gotten better. and who are we to argue with a happy customer?

    Reply
  3. Thanks for your reply crazy dba,

    the point regarding the new hardware is a good one.

    Well, I hope you’ll find a way to solve it. Good luck. MS SQL is not on my rader that much, but I’ll be reading your blog.

    kind regards,

    Roland

    Reply

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.