Data is the most critical asset that you, or any company, owns. Without data, your company would cease to exist. All that hardware you bought? Yeah, that’s just there to help data get from one place to another faster. It’s all about the data, so you’d better treat it right.
I’ve said this before but it bears repeating: You get paid for performance, but you keep your job with recovery.
Not everyone understands just how important data is until it is gone. When disaster strikes, and you can’t recover, you are likely to be shown the door…if your company still exists at all.
Here are six ways that you can treat your data right.
Establish Objectives
Establish a Recovery Point Objective (RPO) that determines how much data loss is acceptable. Understanding acceptable risk levels gives you a baseline for where to focus your recovery efforts.
Then, work on a Recovery Time Objective (RTO) that shows how long you can afford to be without access to the data as it is being restored. Is a two-day restore period acceptable, or does it have to be 15 minutes?
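To make those objectives measurable, here's a minimal sketch in Python that checks whether the most recent log backup still keeps you inside your RPO. The server name, database name, and 15-minute RPO are my assumptions for illustration:

```python
# Minimal sketch: verify the newest log backup is within the RPO.
# Connection string, database name, and the 15-minute RPO are assumptions.
from datetime import datetime, timedelta

import pyodbc

RPO = timedelta(minutes=15)  # how much data loss is acceptable

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;Trusted_Connection=yes"
)
row = conn.cursor().execute(
    """
    SELECT MAX(backup_finish_date)
    FROM msdb.dbo.backupset
    WHERE database_name = ? AND type = 'L'   -- 'L' = transaction log backup
    """,
    "SalesDB",
).fetchone()

last_log_backup = row[0]
exposure = datetime.now() - last_log_backup if last_log_backup else None

if exposure is None or exposure > RPO:
    print(f"WARNING: data-loss exposure {exposure} exceeds the RPO of {RPO}")
else:
    print(f"OK: last log backup was {exposure} ago, within the RPO of {RPO}")
```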
Once you nail down those RPO and RTO objectives, I suggest you consider defining alt-RPO and alt-RTO objectives. These are your alternatives for when you need to pull the plug on recovery and reboot everything. That's what Delta did last year when they found out recovery was going to take longer than expected. Rather than wait an unknown amount of time, they took the alternative route and knew they would be back up and running in a few hours.
Finally, remember that “high availability” and “disaster recovery” are different. Data isn’t the only thing that gets replicated; so do errors and corruption. Having two (or more) copies of errors and corruption won’t help the business fix those issues. So you’d better have a plan in place to recover when this happens (because it will happen).
Understand that snapshots are not backups
There’s a surprising amount of confusion about the differences between database backups, server tape backups, and snapshots. For instance, many people have a misperception that a storage area network (SAN) snapshot is a backup, when it’s really only a set of data reference markers. Remember that a true backup, either on- or off-site, is one in which data is securely stored in the event it needs to be recovered.
Consider the backup rule of three, which dictates that you should save three copies of everything, in two different formats, and with one off-site backup. Yes, I’m that paranoid when it comes to my data.
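If you want to check yourself against that rule, here's a minimal sketch; the copy inventory below is made up for illustration:

```python
# Minimal sketch of a rule-of-three check; the copy inventory is made up.
copies = [
    {"location": "on-site disk", "format": "disk", "offsite": False},
    {"location": "tape library", "format": "tape", "offsite": False},
    {"location": "cloud blob storage", "format": "cloud", "offsite": True},
]

enough_copies = len(copies) >= 3                         # three copies of everything
enough_formats = len({c["format"] for c in copies}) >= 2  # two different formats
has_offsite = any(c["offsite"] for c in copies)           # one copy off-site

if enough_copies and enough_formats and has_offsite:
    print("Backup rule of three satisfied.")
else:
    print("Rule of three violated:", enough_copies, enough_formats, has_offsite)
```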
Make certain the backups are working
Although many DBAs will undoubtedly insist that their backups are working, the only way to know for sure is to test them by doing a restore. A successful restore proves the backups are not just running, but usable. Oh, and it wouldn’t hurt to confirm the backup files are still where you expect them to be.
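Here's a minimal sketch of that sanity check; the server name, backup path, and database name are placeholders of my own invention. Note that RESTORE VERIFYONLY only confirms the file is present and readable by SQL Server; an actual restore to a scratch instance is still the real test:

```python
# Minimal sketch: confirm the backup file exists and verifies cleanly.
# Server, path, and database name are assumptions for illustration.
import os

import pyodbc

backup_file = r"D:\Backups\SalesDB_FULL.bak"

if not os.path.exists(backup_file):
    raise SystemExit(f"Backup file missing: {backup_file}")

# RESTORE statements cannot run inside a transaction, hence autocommit=True.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;Trusted_Connection=yes",
    autocommit=True,
)
try:
    conn.cursor().execute(f"RESTORE VERIFYONLY FROM DISK = N'{backup_file}'")
    print("Backup file is present and verifies cleanly.")
except pyodbc.Error as err:
    print(f"Backup verification failed: {err}")
```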
Use encryption
Instead of spending time trying to determine if a piece of data should be classified as “sensitive” and therefore needs to be encrypted, you should treat all your data as sensitive. At a minimum, data at rest on the server should always be encrypted. Also, you should default to using backup encryption for the database backup file(s). You can either encrypt the database backup file or encrypt the entire database using Transparent Data Encryption (TDE). That way, if someone walks off with a backup, they won’t be able to access the information without the key.
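As an illustration, here's a minimal sketch of taking an encrypted backup from Python. It assumes a certificate named BackupCert already exists on the server; the server, database, and file names are placeholders:

```python
# Minimal sketch: take an encrypted backup so the .bak file is useless without
# the certificate. Assumes a certificate named BackupCert already exists in
# master; server, database, and path are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;Trusted_Connection=yes",
    autocommit=True,  # BACKUP cannot run inside a transaction
)
cursor = conn.cursor()
cursor.execute(
    """
    BACKUP DATABASE SalesDB
    TO DISK = N'D:\\Backups\\SalesDB_FULL.bak'
    WITH COMPRESSION,
         ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert);
    """
)
while cursor.nextset():  # drain informational messages so the backup completes
    pass
print("Encrypted backup written; restoring it requires the BackupCert certificate.")
```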
You should also note that some storage arrays, like Pure Storage, perform encryption for you already. This means you could rely on their encryption and not deploy a feature such as TDE. The point here is that as a DBA you should take steps to ensure that if a device is lost or stolen, the data stored on the device remains inaccessible to users without proper keys.
Monitor and collect data
Real-time data collection (RTC) and real-time monitoring (RTM) should be used together to protect data. Combined with network performance monitoring and other analysis software, RTM and RTC can improve performance, reduce outages, and maintain network and data availability.
With RTC, administrators can capture events as they happen, building a running record of information they can store for proper data forensics. This makes it easier to track down the cause of an intrusion, which can be detected through monitoring.
RTM, database analysis, and log and event management can help you understand if something is failing. They can identify potential threats through things like unusual queries or other suspicious anomalies, and compare those queries against the RTC historical information to gauge whether the requests represent potential intrusions.
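Here's a minimal sketch of that idea with made-up numbers: keep a history of how often a query pattern runs, then flag the current interval when it strays far from the baseline:

```python
# Minimal sketch of comparing real-time activity against collected history.
# The query pattern and counts are made up for illustration.
from statistics import mean, stdev

# RTC: executions of one query pattern per 5-minute interval over the last day
history = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]

# RTM: what we just observed in the current interval
current = 240  # e.g., a login-probing query suddenly running hundreds of times

baseline = mean(history)
spread = stdev(history)

# Flag anything more than three standard deviations above the baseline.
if current > baseline + 3 * spread:
    print(f"Possible intrusion: {current} executions vs baseline {baseline:.0f} ± {spread:.1f}")
else:
    print("Activity within normal range.")
```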
Test, test, test
This assumes you have already tested backups, but let’s make it a little more interesting. Say a DBA is managing an environment with 3,000 databases. It’s impossible to restore every one of them each night; there’s simply not enough space or time.
In this case, DBAs should take a random sampling of their databases to test. Shoot for a sample size that gives you at least 95 percent confidence across the 3,000 databases in deployment, while allowing a small margin of error (much like a political poll). From this information DBAs can gain confidence that they will be able to recover any database they administer, even if that database is in a large pool. If you’re interested in learning more, check out this post, which gets into further detail on database sampling.
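Here's a minimal sketch of the sampling math; the 95 percent confidence level, 5 percent margin of error, and placeholder database names are my assumptions:

```python
# Minimal sketch: how many of 3,000 databases to restore-test for 95 percent
# confidence with a 5 percent margin of error. The numbers are assumptions.
import math
import random

N = 3000          # databases in deployment
z = 1.96          # z-score for 95 percent confidence
p = 0.5           # worst-case proportion (maximizes the sample size)
e = 0.05          # margin of error

n0 = (z ** 2) * p * (1 - p) / (e ** 2)      # infinite-population sample size
n = math.ceil(n0 / (1 + (n0 - 1) / N))      # finite-population correction (~341)

databases = [f"db_{i:04d}" for i in range(N)]  # placeholder names
tonight = random.sample(databases, n)

print(f"Restore-test {n} randomly chosen databases tonight ({n / N:.0%} of the estate).")
```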
Summary
Data lasts longer than code; treat it right.
Don’t treat it like it’s anything but the most critical asset you or your company owns. Make sure no one is leaving server tapes lying around cubicles, practice the backup rule of three, and, above all, develop a sound data recovery plan and make certain that it works.