Last weekend Karen Lopez (blog | @datachick) and I did a session together at SQL Saturday #159 in Kansas City titled Database Design Throwdown: The Trailer. The reason we called it “The Trailer” is because we are giving people a sneak peek as to what topics we will be covering when we present this session together at the PASS Summit in November.
One of the topics we discussed last weekend was on database refactoring. I’m not a big fan of this technique but only because I’ve seen it fail time and again. Often times I see database refactoring and the terms “scrum” or “agile” get mashed together in a way that seems to be nothing more than an excuse for being sloppy.
Anyway, I decided to ask the audience how many people have seen or used refactoring in their shop. About a dozen or so people raised their hands. I then asked how many of them had seen it deployed successfully and no hands were left raised.
One aspect of refactoring that always seems to foul things up is the adding of a column to a table when they need to rename an existing column on that table. Yeah, I know…what a brilliant idea it is to double your data like that. The theory is that your new code will point to the new column, and that in time you will update your legacy code as well, and then you can just remove the old column. Well, except for a couple of things. First, most likely your business user with a SQL login and Excel on their desktop has no idea which column is the one they should be using. Second, all that code out there right now that relies on the use of ‘SELECT *’ is likely to break (don’t pretend that code doesn’t exist, we both know it does).
Anyway, while debating Karen and telling her why she was wrong about database refactoring I thought of this poster. I’m sure someone will leave a comment and tell me how refactoring is the greatest thing, and it never fails but my experience is different.
As far as I have seen, database refactoring is a great idea in theory but not always great in practice.
Just like when George Lucas decided to make The Phantom Menace. Some ideas are best left as just that: an idea.
I’ve done more that a few DB refactors. Nothing massive or where it affected the whole DB. Some have been successful, more have not be. A common theme of the unsuccessful ones was not having people see them through to completion. A refactor can be done in a few steps separated by time but if you don’t complete them all you don’t really get the benefits. Actually a half way completed refactor will generally leave you with something that is worse than if you had not started at all.
What exactly do you mean when you say “database refactoring?” Refactoring, according to wikipedia (http://en.wikipedia.org/wiki/Code_refactoring), is making an improvement without changing the way something works. Based on this definition, adding an index, or performance tuning a stored procedure or a sql query are considered refactoring.
I’ve done numerous database refactoring projects, both large and small, with a very high level of success. The most complicated project radically changed the structure of the database, without changing any of the application code. All while the database was growing by over 2TB/month. Without the refactoring, the database would not have continued to function.
This is not to say that refactoring should be taken lightly. Any changes to a critical production database should be fully understood, and the risks of the project need to be weighed against the benefits.
Rob,
Yeah, your experience is different, and I can tell you why. You made this statement: Any changes to a critical production database should be fully understood, and the risks of the project need to be weighed against the benefits.
Like I said in the post, typically “refactoring” is one of those terms (like “scrum” and “agile”) that I have seen used in a way to help find an excuse for being sloppy.
You don’t strike me as someone that would leave any stone unturned, and that is likely why you have had more success than not. Others don’t have that same level of success with such projects.
Database refactoring is where views and stored procedures really shine. If at the time of initial creation you have the foresight to deny clients direct access to the underlaying tables you can refactor those tables with impunity, then just use an ‘AS’ modifier to return the new column as the old name.
Pass along some deprecation warnings and then eventually cut them off.
Eric,
I can see how that would work well, is it considered a best practice?
Good question, answer is it’s something that as a developer who gained DBA skills that came to me as an epiphany and I started doing. In OO development we always talk about “coding to interfaces, not implementation” which basically means we try to expose a generic contract for interactions and code against that instead of against hard coded instances so that we can make many types of implementation changes without changing the calling code.
I had an epiphany years back that stored procedures and views were technically interfaces to the underlaying data model, and if from the get-go I denied applications access to the underlaying tables and made them go through the interface layer that I would be free to make changes to that model and as long as my views/sprocs returned the same interface the client code was expecting it would be invisible to the caller. As we would say in “Separation of Concerns” lingo, it’s none of the code’s damn business how the physical data is actually stored.
As with any blanket statement there are certainly times when direct access to tables is required, but I have found the 80/20 rule applies in my experience where 80% of the time it isn’t necessary, and with the bonus to refactoring it makes a lot of sense to go this route.
Eric,
I can recall the days when I would debate the use of sprocs for accessing data versus code in ASP.NET pages. I was told that it was easier to change a database than it was to update code and rebuild the web app. Naturally, they weren’t the ones that needed to “just change the database”.
I’m glad to see that you have found a method for refactoring that works well. I’d like to see more information on the subject, especially some best practices in database design and data access.
I’ll have to add it to my pile of blog posts in waiting. The problem you run into with that particular query is this:
Is using contracts between layers of an application a best practice? Undeniably yes! There are tons of source material on this.
Are stored procedures and views considered a proper contract? Depends on who you ask. There’s so much vitriol in the stored procedures vs ORM debate in particular that I doubt you’ll ever find anything resembling an accepted best practice in the same vein as layered architecture.