Uptime? You betcha.

Just thought I would share some info from Operations Manager that shows the amount of uptime on our servers here since 1/1/2008. Why? Well, since I am the official DBA of Uptime, a distinction that many have questioned, it felt right to provide some numbers.

I am sure many others will be able to provide better numbers than mine. These numbers are based on sixty-five (65) production servers. I set up a custom group in Operations Manager for our production servers and ran the canned ‘Availability’ report against that group. The report worked well and took only a minute or two to run. One thing it lacks is an overall uptime percentage, so I had to export the raw data to Excel and do some manual calculations.

I ran the report for the period from 1/1/2008 until about 10:50 AM EST today, a window of roughly 8,568 hours. Now, one caveat here: Operations Manager builds this report from the states defined in the Management Packs. So, if I place a box into ‘Maintenance Mode’, that time is not counted as unplanned downtime. However, if we work on a box during a planned Sunday maintenance window but do not put it into maintenance mode, that time gets marked as unplanned downtime. Make sense?

Well, most of the time we do not bother with maintenance mode, so any planned work gets counted as downtime, and I am comfortable that this report gives a very close representation of our production servers. The one exception is the OpsMgr agent: some of the reported downtime could simply mean the agent wasn’t running, because there is no way to capture data on a box without it. So, if the agent is down but the box is up, the report shows that time as ‘monitoring unavailable’. Now my head hurts.
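To make that bookkeeping a little more concrete, here is a minimal Python sketch of how the treatment of each state changes the percentage. The state names, the `availability` function, and the sample hours are all made up for illustration. This is not how Operations Manager computes its report, just one plausible way to do the math on exported data.

```python
from enum import Enum


class State(Enum):
    """Hypothetical state labels, loosely modeled on the report's categories."""
    UP = "up"
    PLANNED_MAINTENANCE = "planned maintenance"
    UNPLANNED_DOWNTIME = "unplanned downtime"
    MONITORING_UNAVAILABLE = "monitoring unavailable"


def availability(hours_by_state, count_unmonitored_as_down=False):
    """Return uptime as a percentage for one server.

    Planned maintenance is left out of both numerator and denominator.
    Hours with no monitoring data are either ignored (default) or
    treated as downtime, depending on the flag.
    """
    up = hours_by_state.get(State.UP, 0.0)
    down = hours_by_state.get(State.UNPLANNED_DOWNTIME, 0.0)
    unmonitored = hours_by_state.get(State.MONITORING_UNAVAILABLE, 0.0)

    denominator = up + down
    if count_unmonitored_as_down:
        denominator += unmonitored
    if denominator <= 0:
        raise ValueError("no measurable hours for this server")
    return 100.0 * up / denominator


# Made-up hours for a single server, for illustration only.
sample = {
    State.UP: 990.0,
    State.PLANNED_MAINTENANCE: 4.0,
    State.UNPLANNED_DOWNTIME: 2.0,
    State.MONITORING_UNAVAILABLE: 4.0,
}

optimistic = availability(sample)
pessimistic = availability(sample, count_unmonitored_as_down=True)
print(f"ignoring unmonitored hours: {optimistic:.4f}%")
print(f"counting them as downtime:  {pessimistic:.4f}%")
```

The gap between the two results is the price of not knowing what the box was doing while the agent was down.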

Okay, here is what you want to know. We had 99.999397% uptime for the current year. That uptime is a simple calculation of the number of hours the OpsMgr agents were up and running and detected the MSSQL service was running, divided by the total number of hours possible. So, it is possible that the service was up and running but users could not connect, which you would (rightfully) want to consider as a problem. But hey, no report is perfect, right?
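For anyone who wants to redo the Excel step in code, here is a rough Python sketch of the calculation described above: total up-time divided by total possible time across all servers. The `ServerAvailability` record, the server names, and the sample hours are hypothetical, not my actual export. The second function simply backs out how much downtime a given percentage implies. The example call assumes the 8,568 figure is the reporting window for each of the 65 servers, and the result comes back in whatever unit that window is measured in.

```python
from dataclasses import dataclass


@dataclass
class ServerAvailability:
    """One row of a hypothetical export: up time and total time per server."""
    name: str
    up_hours: float
    total_hours: float


def overall_uptime_percent(rows):
    """Sum up-hours over all servers and divide by the sum of possible hours."""
    total = sum(r.total_hours for r in rows)
    if total <= 0:
        raise ValueError("no monitored hours in the export")
    up = sum(r.up_hours for r in rows)
    return 100.0 * up / total


def implied_downtime(uptime_percent, servers, window_per_server):
    """Downtime implied by a percentage, in the same unit as the window."""
    return (1.0 - uptime_percent / 100.0) * servers * window_per_server


# Made-up rows, for illustration only.
rows = [
    ServerAvailability("SQL01", 8568.0, 8568.0),
    ServerAvailability("SQL02", 8566.5, 8568.0),
    ServerAvailability("SQL03", 8568.0, 8568.0),
]
print(f"overall uptime: {overall_uptime_percent(rows):.6f}%")

# Assumption: 65 servers, each with a window of 8,568 units.
combined_downtime = implied_downtime(99.999397, 65, 8568)
print(f"implied downtime across all servers: {combined_downtime:.2f}")
```

Weighting by each server's total hours, rather than averaging the per-server percentages, keeps a box that was only monitored for part of the period from skewing the result.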

I will keep sifting through these OpsMgr reports, because I know that many people are interested in them. Truthfully, I have not been paying much attention to them because they seem somewhat nebulous in their details. But before I dig any further, let me make certain I tell my boss about my uptime, and hope he does not ask too many questions.

3 thoughts on “Uptime? You betcha.”

  1. Hi,
    If you could shed light on how to run meaningful reports, that would be appreciated. I often find that the reports have no labelled axes, and sometimes they are completely blank. Figuring out what to target is, I guess, my biggest headache.
    Worked examples are probably the most useful way of explaining this, judging from the numerous posts I have read echoing the same frustrations.
    Thank you,
    John Bradshaw