Why Is My Server Running Slow?

It’s Sunday evening. You are sitting down to have dinner with your family. Suddenly your phone rings. You want to ignore it because, let’s face it, your phone rings a lot. But you see who is calling and, knowing they don’t call very often, you decide to answer. On the other end of the line a harried voice tells you the words that every DBA dreads hearing:

“The server is slow.”

Many thoughts run through your head: Well of course it is, otherwise you wouldn’t be calling, right? And why doesn’t anyone ever call me to tell me that the server is running fast? And what, exactly, is meant by “slow”? 

“OK, do you have any other details for me?” you ask.

“Yeah, I sent you an email an hour ago with some charts that PAL returned. Don’t you check your email all day on Sundays?”

You resist the urge to hang up and go to your laptop to examine the images that were sent. You see the following:

Page Life Expectancy
Lazy Writes per Second
Page Reads per Second
Full Scans per Second
Batch Requests per Second
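
For reference, the same counters can be read straight from the server with a quick DMV query. This is only a sketch: the per-second counters here are cumulative since startup, so in practice you would sample twice and take the difference.

```sql
-- Snapshot of the counters behind the PAL charts.
SELECT RTRIM(object_name)  AS object_name,
       RTRIM(counter_name) AS counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Page life expectancy',
                       N'Lazy writes/sec',
                       N'Page reads/sec',
                       N'Full Scans/sec',
                       N'Batch Requests/sec');
```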

So, why is the server running slow?

Next week Jason Strate (blog | @stratesql) and I will be doing a presentation for the Minnesota SQL Server User Group (PASSMN). The title of the talk is “Choose Your Own Adventure: Performance Tuning”. We will walk you through a scenario not unlike the one I have outlined above. We will talk about the various techniques you have at your disposal when it comes to performance tuning and help you understand which techniques are most useful.

And you, the audience, will decide how we troubleshoot and solve the scenario presented. Life is an adventure, and at times so is performance monitoring and troubleshooting.

Think about what you would do first for the scenario above, or what a new DBA might do, or what an MCM might do. Think about how long you have before your manager decides to hire a DBA that has all the answers already. Think about what you can do to fix this issue immediately. What steps would you take? What questions would you ask? Leave your comments below.

34 thoughts on “Why Is My Server Running Slow?”

  1. Did someone kick off an index rebuild or a checkDB around 4:45?

    Something’s just displaced the entire buffer pool, but neither batches/sec nor full scans/sec shows a sharp increase at the same time (just what look like regular patterns).

    Also did the SQL memory usage just get a severe trim (check the PAL logs for total server memory)? If so, see if there’s anything odd running on the server, or maybe a large file copy in progress.

    • No, there was no index rebuild or checkdb kicked off, and that was my first thought as well. But those aren’t done at that time of day on a Sunday, so I ruled it out just as quickly.

      (Also, I don’t believe CHECKDB goes through the buffer pool anyway, so that shouldn’t have flushed everything.)

      The batches/sec and full scans/sec are just the numbers of batches and scans, and have nothing to do with the AMOUNT of data being scanned.

      Nothing odd running on the server, no file copies, no memory trimming.

  2. So it looks like one or a few queries are reading much more than they should – this could be caused by bad/outdated statistics, or a bug in a query that isn’t called very often (e.g., an inadvertent Cartesian join).

    Run sp_who2. Look for queries running for a suspiciously long time, then run DBCC INPUTBUFFER(spid) to see the query text. Debug.

    Also, if you know the database well and are sure the consequences of killing a spid are acceptable, you can use the KILL command to take out long-running queries.
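
    A minimal sketch of that triage (spid 57 is hypothetical; use whatever sp_who2 flags as suspicious):

    ```sql
    -- List all sessions; scan CPUTime, DiskIO, and BlkBy for outliers.
    EXEC sp_who2;

    -- See the last batch the suspicious session submitted.
    DBCC INPUTBUFFER(57);

    -- Last resort: kill the session. Note that its open transaction rolls
    -- back, which can itself take a long time for large modifications.
    KILL 57;
    ```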

    • Amax,

      Statistics are not outdated, and there is no bug in the query. 

      However, you are the closest in guessing that one (or a few) queries are reading a lot of data.

      • If it’s one or a couple of queries, it’s tough to diagnose exactly what is causing the problem just from the graphs presented, because there are many things it could be, at least given my level of understanding. The lazy writes are spiking because, I’m assuming, the database needs to write out dirty pages to free up memory for the reads. I’m curious whether there’s a further way to diagnose the problem just from what you present. I personally would run sp_who2 followed by DBCC INPUTBUFFER at this point. I want to say this might be a blocking/locking problem, but really there are lots of issues it could be given this level of info.

        • So, if my job depended on getting this fixed immediately, I’d identify the query belonging to the long-running spid using sp_who2/DBCC INPUTBUFFER(spid), then kill the spid (this will make the database ‘fast’ again, unless the behavior recurs). Next I’d find the query causing the problem, run it with the estimated plan displayed, and look for weirdness in the plan. I’d look at indexing issues first (sometimes a query works efficiently at a low level of data in a table but fails once the table grows past a certain size, because the query plan is based on outdated stats). Then I’d see if I can replicate the behavior by running the query I’ve identified as slow, being ready to stop or kill it if it shows signs of taking over the server again.

        • Amax,

          Yes, you are on the right track, no question. The graphs don’t give all the details, because we still need to know (1) who and (2) what was running. And that’s where the DMVs or sp_who2 would be helpful. 

          • I would start by investigating whether it is a system-initiated process or a user query. If it is system-initiated, check for backups running, a partition rebuild/split, or even someone deciding to change the storage method and switch compression on or off.

            This looks to me like a process reading a lot of data, and if statistics, indexes, and a bad plan are not the problem, then my best guess would be an extract from the DB to a file where far too many “key lookups” are done (key lookups are BAD, but does that qualify as a bad plan?).

  3. I’d start by looking at wait stats to see if there is anything obvious there. This looks similar to issues we often see where poor execution plans have suddenly been generated, resulting in large table scans.

    Has there been a large data load process that has impacted the statistics?

    • Smellypom,

      There was no large data load, and statistics are not an issue. The wait stats tell you that the disk activity is high (PAGEIOLATCH).
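
      For anyone following along, the wait stats check might look like this (a rough sketch; the list of benign waits to filter out is far from complete):

      ```sql
      -- Top waits accumulated since the last restart (or stats clear).
      -- PAGEIOLATCH_* near the top points to heavy physical reads.
      SELECT TOP (10)
             wait_type,
             wait_time_ms / 1000.0 AS wait_time_s,
             waiting_tasks_count
      FROM sys.dm_os_wait_stats
      WHERE wait_type NOT LIKE N'SLEEP%'
        AND wait_type NOT IN (N'LAZYWRITER_SLEEP', N'SQLTRACE_BUFFER_FLUSH',
                              N'BROKER_TASK_STOP', N'XE_TIMER_EVENT')
      ORDER BY wait_time_ms DESC;
      ```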

      • Don’t suppose we have sp_WhoIsActive installed, do we?

        I’m still thinking bad plan causing high reads, possibly parameter sniffing? 

        Is there a single query causing the issue, or a high number of waiting tasks?

        I’m out of ideas but interested in the outcome.

  4. Page life expectancy really dipped low. Did someone fire a lot of poor ad hoc queries, pushing useful data out of cache?

  5. An “event” took place at ~4:49pm that wiped out the PLE. Full scans and batches/sec don’t seem to have changed. I would check the following:

    – recompiles/sec (did somebody flush the cache or restart the instance?)
    – is SQL using the max memory allocated to it? (not using locked pages, something external took memory away?)
    – is there a full backup underway?
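
    A quick sketch of how the memory and backup checks might look:

    ```sql
    -- Is a backup or restore running right now?
    SELECT session_id, command, percent_complete, start_time
    FROM sys.dm_exec_requests
    WHERE command LIKE 'BACKUP%' OR command LIKE 'RESTORE%';

    -- Compare total vs. target server memory; a large gap suggests
    -- something external trimmed SQL Server's working set.
    SELECT RTRIM(counter_name) AS counter_name,
           cntr_value / 1024 AS value_mb
    FROM sys.dm_os_performance_counters
    WHERE counter_name IN (N'Total Server Memory (KB)',
                           N'Target Server Memory (KB)');
    ```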

    • Djl,

      I posted the answer…and another scenario…here: http://thomaslarock.com/2012/03/what-happened-here/

      • Ha! I knew the right answer was a bad query!
        Maybe the report showed something more than just “42”, but it definitely read a lot of data.
        Sometimes the simplest answer is the right one!

        Thanks for sharing.

