This Saturday I will be speaking at SQL Saturday #59 in New York City. I only have one session, at 9AM(!) and I need to be done and out of the room before 10AM because that is the time for the next session to start and I need to make certain the next speaker has time to get set up. That means if you attend my session you can expect it to be fairly fast paced since I like to repeat myself, I’m redundant, I say the same things over and over.
Anyway, I thought I would give everyone an idea about my session by blogging about it here. Usually I don’t blog about my session content because I want people to come and hear it firsthand but at the PASS Summit it was pointed out to me that only two people read this blog. I doubt those two people will be in NYC this Saturday, so I figured I could write this summary.
Performance Tuning Is Hard
Everyone knows this. That’s why we all spend money on 3-5 day training classes. We all want to be a master at performance tuning because we want our phone to stop ringing. It wasn’t that long ago I told a manager that “I would be done faster if he would stop calling me every half-hour asking me how much longer” before I fixed the problem (which turned out to be bad firmware in a NIC card, not exactly something inside the wheelhouse of what most DBAs would think to troubleshoot). Anyway, the point is that this stuff is hard for a reason: because we are incredibly disorganized.
Think about this for a moment and ask yourself: what is my performance tuning process? My guess is that it consists of something like this:
- Wait for the phone to ring
- Try to fix whatever it is the end user is talking about
- Do it quickly so they don’t call back every half-hour
So we pay good money for people to teach us all sorts of tips and tricks about SQL Server in order to have a chance to fix problems quickly. But how much good is it if most of your answers to your end users are “you need to rewrite your query”? Not very good at all, since no one wants to touch code, everyone wants you to find some secret knob that just makes things work faster. And despite those training classes you are not always going to find yourself in a situation where the material is applicable (and when it is applicable you can’t touch code, so it really doesn’t help anyway).
Performance Tuning Made Easy
Borrowing from my Six Sigma training I give you….DMAIC! I’ve written about DMAIC before and I am always looking to put it into action. In the session I have been giving this fall at SQL Saturday’s all over the country I explain what DMAIC is and how you can use it to develop an actual performance tuning process. Let me break it down for you here as well.
Always the hardest part in my opinion. This is where you sit down with your end users and document what is an acceptable level of performance. Since that is almost always impossible (because the answer is either “I don’t know”, “as fast as possible”, or “I’ll let you know when it is slow”) you need to resort to other means to define what is and is not acceptable. There are two ways that I recommend:
- Use thresholds defined by Microsoft
- Use a baseline of your environment
You can download a poster that contains a great condensed version of the thresholds defined by Microsoft here. The advantage to using these thresholds is that if someone comes to you and says “the server is spiking with CPU utilization” you can quickly reference what is an acceptable range. It may not be perfect but in the absence of having something already defined between you and the end user, it is better than nothing.
Having a baseline is another great way to define acceptable performance. That way if someone comes to you and says “the server (or query) is slow” you can refer back to previous executions of the query and determine if it is running slower or not. Often times it is a matter of user perception and they may have never noticed that something takes one minute to execute or they just want to complain about something. Having some baseline information and trend analysis data to fall back upon helps you to define acceptable performance.
Obviously you can’t have a baseline of your systems unless you are measuring them in some way. But what to measure? And how? You have three options:
- Roll your own solution using in house tools
- Purchase a tool from a 3rd party
- A combination of both 1 and 2
I used to roll my own but it proved to be cumbersome. And after a while I found myself spending more time tending to my own application(s) than I did to the administration of my shop. So I decided to purchase a tool in order to save myself and my team a lot of time, money, and aggravation. We still had some in house stuff, just not as much. And if something went wrong it was always nicer to say “let me call the vendor” than to say “I screwed up”.
There are only four resource bottlenecks: memory, disk, CPU, and network. Make certain you are taking measures in each of those areas. It is also very important that you be able to tie everything back to a specific SQL statement. It doesn’t do you any good to know that your server spiked at 90% CPU last night unless you know the query (or queries) that caused the spike.
This is where I like to talk about response time analysis and using waits and queues in order to resolve performance issues. For more information you go read the whitepaper yourself. I plan on putting this information into a new talk for 2011 where we can take a deeper dive.
OK, this is where you really earn your keep as a DBA. To be fair, we know that most of your true value is never really known or understood. For example, getting through the first two stages here (define, measure) is quite valuable but also quite unseen. It is in the analyze phase that you have the most visibility. This is where someone will essentially dump a problem in your lap and run away to a dinner party while you fix everything for them. It is also where you can be proactive and fix items that have the potential to be problems later on down the road.
I am here to give you the secret to proper analysis: refer back to your definitions! Do not get caught in the trap of looking for problems (or trying to solve problems) where no problems exist. All too often I see people get frustrated when trying to do performance tuning because they are chasing some mythical ideal. They think that they need to buy more memory but they have no idea if it will help or not, they are just guessing at a solution. But a quick check of some memory counters can confirm or deny if that is even a bottleneck.
When presented with a performance tuning opportunity make certain you refer back to your definitions. It will save you time.
If you are presented with the chance to offer suggestions on rewriting a query I highly recommend that you become familiar with the concept of SQL Diagramming by Dan Tow. Remember that most people have no real process with performance tuning in general. The same thing holds true for rewriting queries. The idea behind SQL Diagramming is that it gives you some structure around something that has previously been an unstructured hack.
In my talk I outline the 12 steps I take when trying to rewrite a query. It was originally 13 steps but since most other programs I am in have 12 steps I just thought it was best to figure out a way to get this done in 12 as well. I’m not going to dive into all 12 steps here, maybe leave that as a post for another day. But if you show up in NYC (or Washington D.C., or Cleveland) you will get a full walk-through.
Easily my favorite part of the entire process. Why? Because it is already done for you: just run your same measure again to see if your improvements have had the desired effects. Yeah, it really is that easy.
In fact, lets think about this for a minute…
Define phase can be considered done if you default to the Microsoft defined thresholds for performance. The measure phase is also done if you are already collecting information on your servers. The Analyze phase is using your own brain to review your measures against your definitions. The Improve phase involves following a process (SQL Diagramming) in order to rewrite queries. And the Control phase is just the same as the measure phase.
In other words, you already have most of this in place.
You are closer than you may think to not only having a performance tuning process, but to making performance tuning easier.