There was a conversation a while back on Twitter regarding a survey that claimed to represent “Americans”. I don’t recall the survey itself, but I believed the end result used words such as “62% of Americans think bacon should replace turkey as a Thanksgiving meal.” The survey itself was conducted using 1,000 people, and someone commented that only sampling 1,000 people to represent all of the USA was not enough of a sample size, to which I disagreed almost immediately.
Why Are You Boring Me With This Today?
What’s my point? Good question.
Let’s think about your primary function as a DBA, recovering data. Now, when was the last time you did any verification regarding your database backups. Not that the backups ran, but the files they produced can be recovered and restored? Last week? Last month? Last year?Or do you consider each restore you conduct for clients as your own way of testing your recovery process?
Many moons ago I was a mathematician. Even more recently I participated in some Six Sigma training. So the idea of using statistical sampling is not exactly foreign to me and it is why I knew that a truly random sample of 1,000 Americans is enough to have a certain level of confidence in the results.
But how many databases should you use as a random sample to be, say, 95% confident that all your database backups can be restored and recovered?
Is 95% confidence even good enough when it comes to your database recovery plan? Should you go with 99% instead? Some people may build a process that will verify each and every database, but what if you have thousands to verify? Seems like a lot of extra I/O to generate for verification only. And is random sampling even the correct way to approach this verification? How do you conduct such a test?
I thought about all of this and decided to put it all into an article which was published yesterday over at Simple-Talk. Head over there to read more about how I was inspired by a Twitter conversation and my love of crabs.
Seriously.
I like this article, but it is of real limited use to anyone trying to implement it in their own environment because you don’t mention how your sample size (n = 3000) is used. I’m guessing it is involved in getting the magic 1.96 number, but unless we know how, there is no way the reader can use this method to determine how many backups they need to test in their environment. (Unless they also happen to be responsible for 3000 databases as well.)