Everything I Needed To Know About Waits and Queues I Learned From the TSA

My job at SolarWinds is all about waiting.

Database Performance Analyzer polls your systems tables to track your SQL statements and capture wait information. I often give data demos to customers from my home office, which means I spend some time waiting for the customer to join the meeting. I also travel to visit with customers and sometimes I need to wait in a conference room before the meeting. And with my travel I often find myself at the airport where I am forced to wait in a variety of lines including, of course, the ones for security that are run by the Transportation Security Administration (TSA).

Recently while observing some standard gate-rape procedures at a local airport I found myself in a bit of a deja-vu situation. At first I thought I was recalling what life was like for me back in college, or those three months in prison, but then it hit me: the TSA has developed a working model for how SQL Server scheduling works inside of the SQLOS.

Since most people have no idea about scheduling, how it works, what the SQLOS is, or why they would even care about any of this I decided to do two things. First, was to write this blog post. Second, was to put together a presentation for 2011 that helps to describe how all the pieces work together and why you should care. I will use the TSA example to help you understand in greater detail.

First, let’s imagine you are going on a trip and need to fly somewhere. You get to the airport, get your ticket, check your bag, and stand in line for the security checkpoint. You inch your way forward and eventually a TSA agent reviews your ticket and identification and then allows you to enter into one of the screening lines. Again you inch your way forward and just as you are done putting your stuff onto the x-ray belt you are asked to step into a machine that may or may not deliver a cancerous dose of radiation. So you say the words “I opt out” and a TSA agent then gets to fondle every part of your body whether you like it or not (PRO TIP: don’t offer money at the end, even if you enjoyed the process) After they are done groping your body you are allowed to continue through the security line and collect your things that have been sitting at the end of the belt (well, hopefully they are still there and no one has walked off with your personal belongings).

And now you know how the SQLOS handles scheduling. Class dismissed. What’s that? The example went too fast for you? You never heard the words running, runnable, or waiting? OK, let’s review it again.

After your ID is checked by the TSA agent and before you place your belongings on the belt, you are called a runnable process (well, to be precise for this example, you are what I am calling a runnable process). When your belongings are going through the x-ray machine and you are being processed (stepping through a metal detector or the cancer machine) you are considered a running process. And when you are pulled aside for a little extra loving, touching, and squeezing you are considered waiting.

And that’s how the SQLOS handles the execution of SQL statements. When a CPU is processing a statement it is running. Statements that are ready to be run are lined up in what is called the runnable queue. And statements that need something extra before they can be run are placed into the waiting queue.

It’s that simple. But why should you care? Well, that’s simple too.

If you know what waits your server is experiencing the most, then you can take steps to tune your server to reduce those times which will increase performance. You can get the information from the sys.dm_os_wait_stats view:

SELECT *
FROM sys.dm_os_wait_stats

If you know what waits individual SQL statements are experiencing the most then you can take steps to reduce those waits and boost performance for that query. Borrowing a script from Glenn Berry (blog | @GlennAlanBerry) you can get those details with something as simple(!) as this:

SELECT LTRIM (st.[text]) AS [Command Text],
       [host_name], der.session_id AS [SPID],
       der.[status], db_name(database_id) AS [Database Name],
       ISNULL(der.wait_type, 'None') AS [Wait Type],
       der.logical_reads, der.cpu_time, der.total_elapsed_time
FROM sys.dm_exec_requests AS der
INNER JOIN sys.dm_exec_connections AS dexc
ON der.session_id = dexc.session_id
INNER JOIN sys.dm_exec_sessions AS dexs
ON dexs.session_id = der.session_id
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS st
WHERE der.session_id >= 51
AND der.session_id <> @@spid -- eliminate this connection
ORDER BY der.[status]

And if you find that your statements are stuck in the runnable queue more often than waiting for other resources (or extra loving) then you know you have CPU pressure. How do you get that information? That’s a lot easier, you go back to Glenn Berry (blog | @GlennAlanBerry) and to use the sys.dm_os_wait_stats DMV again and do some simple math:

SELECT
 CAST(100.0 * SUM(signal_wait_time_ms) / SUM (wait_time_ms) AS NUMERIC(20,2))
            AS [%signal (cpu) waits],
 CAST(100.0 * SUM(wait_time_ms – signal_wait_time_ms) / SUM (wait_time_ms) AS NUMERIC(20,2))
            AS [%resource waits]
FROM sys.dm_os_wait_stats

Inside that DMV you have a column named wait_time_ms (the total time, in milliseconds, that queries have been waiting for a resource) and one named signal_wait_time_ms (the amount of time that queries have spent waiting for a signal, which we call runnable). The query above does some quick calculations for you. There are no hard numbers here, but if you have more than 15% of your total waits coming from statements that are in the runnable queue (this is what the first result column is telling you) then you should investigate a possible CPU resource contention issue. I said “possible”, nothing here is concrete.

Before I forget to mention it, the second column returned in that query gives you a percentage of time spent on resource waits.

And you get all of that information by doing something so simple, just measuring your waits.

I am putting together my slides over the holidays and will be presenting the talk for the first time at the North Texas SQL Server User Group meeting in January. If you are in the area stop by and we can talk waits, signals, and how to properly tip your TSA agent.

11 thoughts on “Everything I Needed To Know About Waits and Queues I Learned From the TSA”

  1. instead of der.session_id >= 51

    I would do dexs.is_user_process = 1

    to filter out system/background processes

    I have servers with 15+ mirrored databases and I see SPIDs in the 70s and 80s range with DB MIRROR, BRKR EVENT HNDLR AND BRKR TASK

    Reply
  2. Really enjoyed your presentation at NTSSUG. It covered a topic that, as a developer, I never really looked into. I’ll see what I can do to look at these items when developing procedures and functions.

    Reply

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.