11 Jul 2012 Clustering On Uniqueidentifiers Is “No Reason For Worry”?
“Trust [everyone], but always cut the deck.” – Gerald Weinberg
Whenever I come across a piece of technical information on the internet I try to remember two things. The first is assume good intentions. The second is trust, but verify. In other words, don’t assume that the person sharing the information is doing so for any other reason than they are trying to be helpful. Also, you should trust their findings but verify them for yourself before you declare them to be invalid.
So imagine my reaction when I found this sentence buried deep within a blog regarding Windows Azure SQL Database and Federations:
“…clustering on uniqueidentifiers is no reason for worry.”
The explanation here is that we don’t need to worry about the fragmentation that will occur as a result of clustering on a uniqueidentifier. The reason for why the fragmentation will happen is simple: because the uniqueidentifier values are random, the insertion of new rows will be random as well. In other words, when I go to insert a new row I get a random insertion point because of the clustered uniqueidentifier. This leads to fragmentation over time and is why the IDENTITY property is the preferred method. With IDENTITY we are able to ensure that values are created in a sequential order, reducing the fragmentation.
So why don’t we need to worry about fragmentation here? Because we are using Windows Azure SQL Database (which I have decided to just call WASDB) as opposed to an on-premise version of SQL Server. My initial reaction was that this is the equivalent of your dentist handing you a bag of candy canes and gummy worms and telling you not to worry about your teeth because you have a dentist.
So, it is OK to deploy something that will cause fragmentation because if the fragmentation starts to cause poor performance you can just use Azure to help your design scale. Another little known fact about Azure is throttling. If you consume too many resources then you may find your connections being throttled. Guess what may cause you to consume too many resources? Improper indexing techniques that cause excessive fragmentation.
This means, of course, that at some point your design crosses a threshold where you need more of the service being provided. I guess that is really no different from the electric company also selling you lamps and light bulbs.
This is the best post I have read regarding choices for clustering keys written by Kimberly L. Tripp: http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx
I understand that there are technical limitations for why a WASDB Federation will not have the identity property, and as a result why they would advocate the use of a uniqueidentifier. Of course I will trust that this will work, but I will want to verify this for myself at some point. The real concern I have here is that someone will stumble across that original blog post and think to themselves that clustering on uniqueidentifiers is OK for all types and versions of databases, which is simply not a true statement.
A quick search of the intertubz turned up some alternatives to generating sequential id in Azure one of which is called SnowMaker. There is another post here that talks about creating a data source for the same purpose. Of course they both require extra code and the point of the original article that prompted my blog post today is that using a uniqueidentifier requires no additional code logic.
At the end of the day you should use what works, of course.
But if others can do it, why can’t Microsoft deploy something that would work to satisfy the need for a true identity column? I certainly hope their stance isn’t “why bother, it helps us sell more of our service”.
Faster memory, cheaper disks, etc., will never solve design problems. Eventually one runs out of money or technology.
Good design will always pay off.