I’ve been involved in virtualization projects for almost ten years now. In that time I’ve had the opportunity to build my own list of “best practice” items. It’s a list I share freely with clients and customers who seek me out for virtualization advice. I can tell that virtualization (and Cloud) efforts are on the rise simply by the number of requests I get for help, specifically with virtualizing SQL Server.
I like to call this list my “facepalm” special, as any one of these items is enough to trigger a facepalm reaction. Avoiding them has helped my customers and clients in the past, and I am certain it will help you too.
1. Build Your Own
Don’t build a host – especially a PRODUCTION host – out of spare parts left over from servers nearing the end of their life. If you want to go cheap and use spare parts, do it for a development host, and be prepared to spend extra time keeping that bucket of bolts together. If you are going virtual, you will want to buy new hardware for your hosts, and hardware that is more powerful than the servers you already have deployed. There are also licensing considerations here: it may well be cheaper to buy new hardware and have fewer cores to license overall.
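To make the licensing point concrete, here’s a back-of-the-napkin sketch in Python. The per-core price and the server configurations are made up for illustration; SQL Server (2012 and later) is licensed per core, with a minimum of four core licenses per processor, so fewer, faster cores on new hardware can mean a smaller licensing bill.

```python
# Back-of-the-napkin SQL Server licensing math. All prices and server
# configurations here are hypothetical. SQL Server (2012 and later) is
# licensed per core, with a minimum of four core licenses per processor.

PRICE_PER_CORE = 7000  # hypothetical per-core license cost


def license_cost(sockets: int, cores_per_socket: int) -> int:
    """Cost to license every core in a host, honoring the 4-core minimum."""
    billable_cores = sockets * max(cores_per_socket, 4)
    return billable_cores * PRICE_PER_CORE


# Three aging 4-socket, 4-core servers vs. one new 2-socket, 8-core host:
old_estate = 3 * license_cost(sockets=4, cores_per_socket=4)  # 48 cores
new_host = license_cost(sockets=2, cores_per_socket=8)        # 16 cores
print(f"old: ${old_estate:,}  new: ${new_host:,}  diff: ${old_estate - new_host:,}")
```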
2. No Performance Expectations
You cannot go virtual without any idea of what an acceptable level of performance is. VMware has a whitepaper that essentially says a virtualized SQL Server can deliver roughly 98% of the performance of the same physical implementation. Note that doesn’t mean you will get better performance by moving to VMware itself. Oftentimes you get better performance because you have moved to better hardware (see the first item in this list). But if you don’t know what your current performance SLAs are, then you won’t have any idea whether you are still meeting them once you have converted to virtual. Set your expectations now so you can track them going forward.
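Here’s a minimal sketch of what tracking those expectations might look like. The metric names and values are hypothetical; capture whatever counters your SLAs are actually written against, both before and after the migration.

```python
# Minimal sketch of tracking performance expectations across a migration.
# Metric names and values are hypothetical; use whatever counters your
# SLAs are actually written against.

SLA_FLOOR = 0.98  # the "98% of physical" expectation

baseline = {"batch_requests_per_sec": 4200, "avg_query_ms": 35.0}
post_migration = {"batch_requests_per_sec": 4350, "avg_query_ms": 36.1}

for metric, before in baseline.items():
    after = post_migration[metric]
    # For latency-style metrics, lower is better, so invert the ratio.
    ratio = before / after if metric.endswith("_ms") else after / before
    status = "OK" if ratio >= SLA_FLOOR else "REGRESSION"
    print(f"{metric}: {ratio:.1%} of baseline -> {status}")
```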
3. Selecting the Wrong Disk Type
You have two main options here: raw device mappings (RDM) and virtual machine disk format (VMDK). Which one do you want to use, and when? VMware says that in terms of performance the difference between the two is minimal. The real difference is functional, or architectural (I know I just scared away some DBAs by using the word ‘architecture’, but yeah, I went there). VMware has published a list of scenarios where RDMs are the better solution for your shop. You need to know these differences before you deploy a solution that fails to meet some critical business requirement.
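As a rough illustration (and not a substitute for VMware’s published guidance), here is one way to encode the commonly cited RDM scenarios as a checklist. Treat the inputs as examples and verify against the current documentation for your vSphere version.

```python
# A rough checklist only -- confirm against VMware's current documentation.
# These are the commonly cited cases where a raw device mapping is required
# or preferred; most everything else does fine on a VMDK.

def suggest_disk_type(uses_san_snapshot_tools: bool,
                      cluster_spans_physical_hosts: bool,
                      cluster_mixes_physical_and_virtual: bool) -> str:
    """Return 'RDM' when a scenario demands raw LUN access, else 'VMDK'."""
    if uses_san_snapshot_tools:             # array-based snapshot/replication
        return "RDM"
    if cluster_spans_physical_hosts:        # failover cluster across hosts
        return "RDM"
    if cluster_mixes_physical_and_virtual:  # physical-plus-virtual cluster nodes
        return "RDM"
    return "VMDK"

print(suggest_disk_type(False, True, False))  # -> RDM
```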
4. Thin Provisioning
Thin provisioning is one of those bad ideas that sounds good and often produces the same results as do-it-yourself dentistry. It starts out innocently enough: someone wants to save space and only allocate storage as needed. The end result is that no one keeps track of which VMs have been thin provisioned, and eventually, as the files grow, they fill up all the available storage until all activity stops because the disk is full. VMware has a recommendation for you listed here: use Storage vMotion to migrate the guests to a datastore where they will fit. Great advice, but I’m guessing you didn’t have enough room to start with; otherwise you wouldn’t be using thin provisioning.
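If you must thin provision, at least watch for the failure mode described above. Here’s a minimal sketch of a sanity check; the datastore figures are hypothetical, and in practice you would pull capacity, free space, and provisioned space from your vSphere inventory.

```python
# Minimal thin-provisioning sanity check. The datastore figures are
# hypothetical; in practice you would pull capacity, free space, and
# provisioned space from your vSphere inventory.

datastores = [
    {"name": "prod-ds01", "capacity_gb": 2048, "free_gb": 180, "provisioned_gb": 3900},
    {"name": "prod-ds02", "capacity_gb": 2048, "free_gb": 900, "provisioned_gb": 2100},
]

for ds in datastores:
    overcommit = ds["provisioned_gb"] / ds["capacity_gb"]
    pct_free = ds["free_gb"] / ds["capacity_gb"]
    if overcommit > 1.0 and pct_free < 0.10:
        print(f"{ds['name']}: {overcommit:.1f}x overcommitted with only "
              f"{pct_free:.0%} free -- growth will fill this disk")
    else:
        print(f"{ds['name']}: OK ({overcommit:.1f}x provisioned, {pct_free:.0%} free)")
```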
5. Overcommitting Memory/CPU
It’s okay to overallocate your memory and CPU resources. What you don’t want is for them to become overcommitted, as that’s where performance issues manifest themselves. When I review a customer’s configuration, the line in the sand I draw is a 1.5:1 ratio as an upper-bound default for CPU resources (so 16 logical cores means you can allocate 24 vCPU as a baseline, adjusting up or down as your workloads and load balancing allow). You can find a good description of allocating vCPU at this blog post. For memory settings I follow what is outlined in the VMware Performance Best Practices guide, which states “…avoid over-allocating memory”. In other words, I’m much more conservative with memory overallocation than with CPU overallocation.
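That 1.5:1 starting point is simple arithmetic; here it is as a tiny Python helper. The ratio is my default upper bound, not a law.

```python
# The 1.5:1 upper bound as arithmetic: total vCPU allocated across all
# guests on a host, as a starting point to adjust from.

import math

def max_vcpu(logical_cores: int, ratio: float = 1.5) -> int:
    return math.floor(logical_cores * ratio)

print(max_vcpu(16))  # 24 vCPU across all guests on a 16-core host
print(max_vcpu(32))  # 48
```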
6. Trusting O/S Counters
When you go virtual, you add a layer of abstraction (i.e., the hypervisor) between you and your data. I usually just say “there are many layers of delicious cake between you and your data”. The trouble is that you need to know which layer is causing your performance issues. Is it the host? Is it the guest? As such, you need to rely on the hypervisor’s performance counters to get a complete picture of what is happening. You can see the counters explained in more detail on this page. If you are still relying on standard O/S metrics alone for a virtualized server, then you are doing it wrong. (And if you need a tool to show you all those layers of cake, I can help.)
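As an illustration, here’s a small cheat sheet of host-side signals worth checking before trusting the guest’s own counters. The names follow esxtop conventions; the thresholds are common rules of thumb, not official VMware numbers, so tune them for your environment.

```python
# Host-side signals to check before trusting the guest's counters. Names
# follow esxtop; thresholds are common rules of thumb, not VMware gospel.

HOST_SIGNALS = {
    "%RDY":   ("CPU ready: guest wants CPU the host can't schedule", 5.0),
    "%CSTP":  ("Co-stop: multi-vCPU guest waiting to co-schedule",   3.0),
    "MCTLSZ": ("Balloon size (MB): host reclaiming guest memory",    0.0),
    "SWCUR":  ("Swapped (MB): host swapping guest memory to disk",   0.0),
}

def check(samples: dict) -> None:
    for counter, value in samples.items():
        meaning, threshold = HOST_SIGNALS[counter]
        if value > threshold:
            print(f"{counter}={value}: {meaning}")

# Hypothetical sample for one guest:
check({"%RDY": 12.3, "%CSTP": 0.4, "MCTLSZ": 512, "SWCUR": 0})
```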
7. Running It All At Once
Remember how I said that you want to avoid overcommitting all your resources at once? That’s where load balancing and knowing your workloads are key. You cannot carve out a dozen guests to be used as production database servers, run them all during regular business hours, and expect performance to remain at that 98% mark VMware suggests is attainable. You have to balance your workloads, otherwise you are going to find that your overallocation of resources is now an overcommitment of resources. Yet I still see customers stretching their hosts way too thin.
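One way to spot the pile-ups is to lay each guest’s busy window against the host’s capacity. This sketch uses invented peak windows and vCPU demands; pull the real ones from your monitoring history.

```python
# Sketch of spotting the hours where guests pile up on one host. Busy
# windows and peak vCPU demands are invented; pull real ones from your
# monitoring history.

HOST_LOGICAL_CORES = 16
RATIO = 1.5  # the same upper bound from item 5

guests = [
    ("erp-sql01", range(8, 18), 10),  # (name, busy hours, peak vCPU demand)
    ("bi-sql02",  range(8, 18), 10),
    ("rpt-sql03", range(9, 12), 8),
]

for hour in range(24):
    demand = sum(vcpu for _, busy, vcpu in guests if hour in busy)
    if demand > HOST_LOGICAL_CORES * RATIO:
        print(f"{hour:02d}:00 -- {demand} vCPU demanded vs "
              f"{int(HOST_LOGICAL_CORES * RATIO)} allocable: overcommitted")
```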
These are the seven items I see hurting the majority of virtualization efforts. They result in bad performance that leaves users and administrators frustrated. They are also easily avoidable with just a bit of up-front knowledge and requirements gathering.
Nice post! Number 5 is a struggle at times for us. All 7 of these are spot on.
Thanks Chris, glad you enjoyed the post.
Which O/S counters do you see trusted the most for virtualized SQL Servers when they shouldn’t be? And what would the relevant counters be to look at (in VMware in my case :)? Great post btw