I just put the finishing touches on a new custom monitor in Operations Manager. This monitor rolls up into the ‘Availability’ monitor for the SQL 2000 and SQL 2005 DB targets. The monitor will run a script that reads from the msdb database, joined to the master database. The idea is that we only want to check for databases that are currently on the instance (I wish a certain report from a certain vendor in their backup software would be smart enough to NOT generate false alarms for databases that no longer exist…but I digress).
Anyway, I configured the t-sql to return the length of time, in days, since the last full dump and last diff dump for each database found in master..sysdatabases (or, master.sys.databases). This info is tossed into a PropertyBag, which is then used to evaluate the health of the monitor. If the last full dump was done over seven days ago, we have an unhealthy situation (we do full dumps weekly here). If we have a full dump within the past seven days, but no diff dump in the past day, then we are degraded (yep, we do nightly diffs here). If we have a full dump within the past week, and a diff dump from last night, then we are healthy.
So, this is yet one more way to ensure that your databases are being dumped. The other ways are through monitoring the SQL Agent jobs themselves for failures. But I have seen cases where a backup job may not finish completely, but the error is not caught by Operations Manager (it depends on what gets written to the event logs). So, rather than try to code for every possible failure, I decided to go right to the source, the msdb, and generate a monitor on that information. After all, the msdb only gets written to after a dump succeeds, so it seems logical to look there for your dump information anyway, and to not rely on the monitoring of the job itself, or an email alert.
If anyone is interested in the gory details, I am planning on putting together an article. But first, there is an election coming up that needs my attention…