18 Aug HOW TO: Find Currently Running Long SQL Agent Jobs
A while ago I had posted a script to help find jobs that were currently running longer than expected. Since I’ve been sharing some scripts recently, I decided to verify that this script still works for finding long running jobs in SQL Agent for SQL Server 2014.
Why This Script Matters
Because you want the opportunity to know about issues before your end users call to complain, to be proactive. SQL Server doesn’t persist data about queries between restarts of the instance, but job history is persisted between restarts. That means we can do some data analysis on those details to find if there are jobs currently running longer than two standard deviations away from the mean (i.e., you have what is known as an outlier).
Let’s take a minute to review what the code should be doing.
First, you will want to use this script to poll your running jobs at a decent interval, perhaps every ten or fifteen minutes. Whatever interval you choose you should understand the relation between your polling and the @MinAvgSecsDuration parameter. That parameter is the minimum threshold for job duration average. If I set it to 300, then I won’t be concerned with any jobs that have a historical average less than five minutes in duration. I probably don’t want to poll more than 300 seconds apart because there is a chance I could miss an outlier (say if I was polling every fifteen minutes) because this code is filtering for jobs that are currently running.
Second, there is a parameter (@MinHistExecutions) for the minimum number of successful job executions. I like to set this to five or seven, in order to give me a decent sample size to serve as a base. Combined with the @MinAvgSecsDuration parameter this lets me avoid having a lot of false positives for jobs that are newly added.
Third, at the end of the code is a line (AND crj.job_state = 1) that checks to see if the jobs are currently running. That’s the point of the script, after all. However, if you comment this line out then this script would serve as a way for you to run a report, perhaps daily, to look for jobs that had any durations more than two standard deviations away.
The last thing I want to mention is that you can also control the amount of history you use for your average calculation. It is set to grab all history by default, but you could configure it to use a rolling window if desired. For example, you may just want to consider the job history for the previous twelve months.
As always, feel free to modify the code for your own needs.
Here’s the script, which you can also download here. Also, here’s my usual disclaimer:
Script disclaimer, for people who need to be told this sort of thing:
DISCLAIMER: Do not run code you find on the internet in your production environment without testing it first. Do not use this code if your vision becomes blurred. Seek medical attention if this code runs longer than four hours. On rare occasions this code has been known to cause one or more of the following: nausea, headaches, high blood pressure, popcorn cravings, and the impulse to reformat tabs into spaces. If this code causes your servers to smoke, seek shelter. Do not taunt this code.
/*============================================= File: long_running_jobs.sql Author: Thomas LaRock, http://thomaslarock.com/contact-me/ http://thomaslarock.com/2014/08/find-currently-running-long-sql-agent-jobs/ Summary: This script will check to see if any currently running jobs are running long. Variables: @MinHistExecutions - Minimum number of job executions we want to consider @MinAvgSecsDuration - Threshold for minimum job duration we care to monitor @HistoryStartDate - Start date for historical average @HistoryEndDate - End date for historical average These variables allow for us to control a couple of factors. First we can focus on jobs that are running long enough on average for us to be concerned with (say, 30 seconds or more). Second, we can avoid being alerted by jobs that have run so few times that the average and standard deviations are not quite stable yet. This script leaves these variables at 1.0, but I would advise you alter them upwards after testing. Returns: One result set containing a list of jobs that are currently running and are running longer than two standard deviations away from their historical average. The "Min Threshold" column represents the average plus two standard deviations. Date: August 19th, 2014 SQL Server Versions: SQL2005, SQL2008, SQL2008R2, SQL2012, SQL2014 You may alter this code for your own purposes. You may republish altered code as long as you give due credit. THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE. =============================================*/ DECLARE @HistoryStartDate datetime ,@HistoryEndDate datetime ,@MinHistExecutions int ,@MinAvgSecsDuration int SET @HistoryStartDate = '19000101' SET @HistoryEndDate = GETDATE() SET @MinHistExecutions = 1.0 SET @MinAvgSecsDuration = 1.0 DECLARE @currently_running_jobs TABLE ( job_id UNIQUEIDENTIFIER NOT NULL ,last_run_date INT NOT NULL ,last_run_time INT NOT NULL ,next_run_date INT NOT NULL ,next_run_time INT NOT NULL ,next_run_schedule_id INT NOT NULL ,requested_to_run INT NOT NULL ,request_source INT NOT NULL ,request_source_id SYSNAME NULL ,running INT NOT NULL ,current_step INT NOT NULL ,current_retry_attempt INT NOT NULL ,job_state INT NOT NULL ) --capture details on jobs INSERT INTO @currently_running_jobs EXECUTE master.dbo.xp_sqlagent_enum_jobs 1,'' ;WITH JobHistData AS ( SELECT job_id ,date_executed=msdb.dbo.agent_datetime(run_date, run_time) ,secs_duration=run_duration/10000*3600 +run_duration%10000/100*60 +run_duration%100 FROM msdb.dbo.sysjobhistory WHERE step_id = 0 --Job Outcome AND run_status = 1 --Succeeded ) ,JobHistStats AS ( SELECT job_id ,AvgDuration = AVG(secs_duration*1.) ,AvgPlus2StDev = AVG(secs_duration*1.) + 2*stdevp(secs_duration) FROM JobHistData WHERE date_executed >= DATEADD(day, DATEDIFF(day,'19000101',@HistoryStartDate),'19000101') AND date_executed < DATEADD(day, 1 + DATEDIFF(day,'19000101',@HistoryEndDate),'19000101') GROUP BY job_id HAVING COUNT(*) >= @MinHistExecutions AND AVG(secs_duration*1.) >= @MinAvgSecsDuration ) SELECT jd.job_id ,j.name AS [JobName] ,MAX(act.start_execution_date) AS [ExecutionDate] ,AvgDuration AS [Historical Avg Duration (secs)] ,AvgPlus2StDev AS [Min Threshhold (secs)] FROM JobHistData jd JOIN JobHistStats jhs on jd.job_id = jhs.job_id JOIN msdb..sysjobs j on jd.job_id = j.job_id JOIN @currently_running_jobs crj ON crj.job_id = jd.job_id JOIN msdb..sysjobactivity AS act ON act.job_id = jd.job_id AND act.stop_execution_date IS NULL AND act.start_execution_date IS NOT NULL WHERE DATEDIFF(SS, act.start_execution_date, GETDATE()) > AvgPlus2StDev AND crj.job_state = 1 -- comment this line if you want to report on all jobs GROUP BY jd.job_id, j.name, AvgDuration, AvgPlus2StDev