The 2006 PASS conference has begun. This year we are located in sunny Seattle. It feels good to be back in the Pacific Northwest, having attended graduate school at Washington State University ten years ago. Hard to believe it has been that long since I was last here, and even longer since the last time I saw the Emerald City.
I arrived on Sunday to take advantage of a pre-conference seminar on Monday. Normally I arrive on Monday for a Tuesday seminar, but this year I was invited to attend a session on Tuesday that is dedicated to PASS volunteers. I have been volunteering part of my time to PASS for the past two years, so it was natural for me to attend that on Tuesday, meaning if I wanted to learn something extra, I needed to arrive a day early.
The topic I chose was SQL CLR. I decided that since it was an area of which I had only cursory knowledge, and since my shop has some developers using SQL CLR, it would be of great benefit for me to spend the day absorbing as much as possible. The goal was to come away with a better understanding of SQL CLR, and of how best to apply that knowledge to what our company needs at this time.
The lecture was delivered by Niels Berglund. I had never met Niels before today. He delivered the material rather well, interacting with the audience when necessary, and most of his demos were stable. Every now and then I get a presenter who struggles with their presentation: they are not able or willing to dive into some questions, their prepared demos do not work, or the demos do not help make a point. Niels did very well, even working without a whiteboard when at times it would have been nice for him to have one at his disposal. He was very receptive to questions, and was able to explain things in terms that even I could understand.
There is a lot of mystery about the SQL CLR these days. Not many people I know are using it in their shops. The people that were in the talk today seemed to be more on the development side of database administration. They had lots of interesting questions regarding the use of SQL CLR, and lots of reasons as to why they would want to use it.
And there is the rub. Yes, SQL CLR is a nice tool. But it is not necessarily the right tool. At the end of the day I wanted to know one very basic concept: When (or where) should it be used? Just because I can create an object in SQL CLR does not mean that I should. And then there are additional questions regarding stability, reliability, and portability.
By the end of the day I believe I had enough information to formulate my answers. Now all I need to do is explain everything to my peers back at ING IM if and when they ask me questions. Niels broke his talk into sections that covered the creation of SQL CLR objects, how the SQL Server process hosts the CLR, how to debug SQL CLR code, and also gave some examples to compare and contrast T-SQL versus SQL CLR objects.
After going through the creation of SQL CLR objects and how SQL Server hosts the CLR, I think the part comparing and contrasting T-SQL versus SQL CLR was the most impressive part of the day. Niels had put together a very basic example. He built some T-SQL code that computed a factorial, created a SQL CLR object that computed the same factorial, and then compared the amount of processing time each one took to run. The SQL CLR object was faster, much faster, than the T-SQL. The catch here is that T-SQL is interpreted, whereas SQL CLR code is compiled. For computational processing, compiled code will generally beat T-SQL, which is why the factorial was such a well-chosen example: even a small input showed a dramatic difference in execution time.
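I do not have Niels's actual demo code, but a minimal sketch of what a CLR scalar function like his factorial might look like (the class and function names here are my own invention) would be along these lines:

```csharp
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // A minimal CLR scalar function. Once the assembly is loaded with
    // CREATE ASSEMBLY, it is exposed via CREATE FUNCTION ... EXTERNAL NAME.
    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlInt64 ClrFactorial(SqlInt32 n)
    {
        if (n.IsNull || n.Value < 0)
            return SqlInt64.Null;

        long result = 1;
        for (int i = 2; i <= n.Value; i++)
            result *= i;   // note: a long overflows past 20!, fine for a demo
        return new SqlInt64(result);
    }
}
```

The compiled loop is where the speedup comes from; the equivalent T-SQL WHILE loop is interpreted statement by statement.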
Another use of SQL CLR was string validation, say if you wanted to validate an email address. The T-SQL code is quite lengthy, but the SQL CLR code was two or three lines. However, the processing time was nearly identical. How could this be? Well, the SQL CLR code was being executed once for each row in the result set, and that extra overhead slowed things down considerably. Niels then demonstrated a way to have the expensive part of the SQL CLR code execute only once for the result set, after which it ran about four times as fast as the T-SQL.
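I do not know exactly how Niels structured his version, but one common way to pay the setup cost only once is to make the regular expression a static field, so it is constructed a single time per appdomain instead of once per row. A sketch, with a deliberately simple pattern of my own choosing:

```csharp
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // Built once per appdomain, not once per row -- the kind of change
    // that turns a per-row CLR call from break-even into a clear win.
    private static readonly Regex EmailPattern = new Regex(
        @"^[^@\s]+@[^@\s]+\.[^@\s]+$",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    [SqlFunction(IsDeterministic = true)]
    public static SqlBoolean IsValidEmail(SqlString address)
    {
        if (address.IsNull)
            return SqlBoolean.Null;
        return new SqlBoolean(EmailPattern.IsMatch(address.Value));
    }
}
```

The two or three lines of logic stay short; the trick is simply in where the regex object lives.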
The final example involved table-valued functions. He discussed how ‘back in the day’ he was in a class that built an OLE DB provider to bring in data from a specific source, and it took them a day and a half to accomplish that task. Well, in ten minutes he used a SQL CLR object to go out to a directory and bring back the filename, file size, and creation date for each file.
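Again, this is not Niels's code, but a hedged sketch of how such a CLR table-valued function is typically put together. SQL Server calls the fill-row method once per object yielded by the enumerator, and because the function touches the file system the assembly would need the EXTERNAL_ACCESS permission set:

```csharp
using System.Collections;
using System.Data.SqlTypes;
using System.IO;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // A streaming table-valued function over a directory listing.
    [SqlFunction(FillRowMethodName = "FileRow",
                 TableDefinition = "name nvarchar(260), size bigint, created datetime")]
    public static IEnumerable DirectoryList(SqlString path)
    {
        // Each FileInfo in the array becomes one row in the result set.
        return new DirectoryInfo(path.Value).GetFiles();
    }

    public static void FileRow(object obj, out SqlString name,
                               out SqlInt64 size, out SqlDateTime created)
    {
        FileInfo f = (FileInfo)obj;
        name = new SqlString(f.Name);
        size = new SqlInt64(f.Length);
        created = new SqlDateTime(f.CreationTime);
    }
}
```

Compared with writing an OLE DB provider by hand, it is easy to see where the day and a half went.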
Apparently, SQL CLR really is a gift from the Gods at Microsoft. So, why would any DBA be against its use? Let me count the ways…
First, it can be difficult for a DBA to track performance issues when SQL CLR objects are being utilized. So when the phone rings and someone says “the server is slow,” I may try to capture the statements hitting the box at that time and still not see everything I need in order to determine what is going on. It depends on how the SQL CLR object was built. And when you have a team of developers waiting for you to solve their performance issues, the last thing you want to do is go back to them and say “I don’t know”. That reply will not earn you any confidence points the next time they need your help. In our role, we have to have the answers, each and every time, otherwise people start to look at you funny and wonder what it is you do around here. Actually, they probably do that already.
Second, on top of everything else that I need to know, I now need to (re)learn VB and C# in a hurry, so that I can sit with a developer and help them find ways to rewrite code. If I cannot read the languages, or am not familiar with their functionality, it becomes more difficult for me to offer helpful suggestions. Thankfully I am not completely in the dark with VB.net, but C# is not something I have used regularly in the past. And while I doubt anyone is expecting me to write actual VB and C# code, I know that it would be quite useful for me to be able to explain and demonstrate the proper way to load a dataset for processing, or where it might be beneficial to make a process multi-threaded, or why it might not be a good idea to issue a rollback transaction command inside of a nested SQL CLR object.
Third, while unlikely, it is still possible for someone to write code poor enough to bring down the SQL Server instance should their SQL CLR object crash. While I can agree that the chance is quite small, it is still a chance. And as a DBA, I would like that chance to be zero. One way to make it zero is by not enabling SQL CLR for the instance! Okay, so maybe this is not as much of a concern as the first two. But what do they say about building something that is idiot-proof?
At the end of the day, I found myself quite comfortable with the concept of SQL CLR running in my shop. However, I believe that when the use of SQL CLR is being requested, there should be some form of code review. This way we would be able to ensure that we minimize any duplication of effort. If one team builds an email validator, then other teams would be aware that it exists and have it made available for their use in the future. Perhaps after the code review we keep our own repository of SQL CLR objects and document what they are for. Then, whenever someone believes SQL CLR is the best answer for a problem, they could search through the existing library to see if similar work has already been done.
This would be valuable for quite a few reasons. First, my shop is spread over three cities (maybe more). As new people join teams, it would be a great asset if they could go to a common place to find code examples. The benefit to these particular examples is that they would be objects designed with our environment in mind. Second, it would allow our teams to start sharing their solutions with other teams, which should open up communications between groups, something we do not have enough of today. Lastly, as our offices start to consolidate more systems, or as the systems start to consume data across sites, having code and solutions that we know already work is a great asset. And should it become necessary to research an issue, say in production, you would have more people already familiar with the underlying code, thereby increasing the odds that the collective minds would be able to find a solution, as opposed to the one or two minds that built a process by hand in today’s environment.
Just think of it as one big toolbox for everyone to use, with the tools being built by the group as well. What could be bad about that?