Differential Backups, Differential Terms

Different DifferentialsBackup or dump, which do you prefer to take?

I have often referred to a database backup as a database dump. The reason for this goes back to an event that happened many years ago where the database backup files filled up a disk drive. As a result we were not able to recover data as expected, not even if we went to tape, because the backups were never being done due to the disk being full and no one knowing. I remember being in a meeting and whenever I would say “backup”, I was politely corrected by a manager who would say “dump”. At the time I really didn’t understand why he would keep correcting me, but it soon became clear.

Servers are backed up to tape. Databases get dumped to a file.

At least that’s how this manager would think. And yes, I understand that you can also say that servers are dumped to tape. But you can avoid confusion between the terms by using the phrase “database backups” because to a server engineer that term could just mean the database server was backed up to tape. So over the years I would use the word “dump” more frequently than “backup”. I’d like to think it helped reduce confusion as to what topic was being discussed.

But I doubt it has helped at all, since the word “dump” is overloaded with meanings. Many folks consider a data dump to be an export of data, sometimes also referred to as a logical backup (as opposed to a physical backup, which is something that is meant to ensure transactional consistency).

There is no shortage of words and phrases that are overloaded with meanings in the world. Even the word “pants” can mean vastly different things.

Different Differentials

More important than the syntax on dumps versus backups is something that is far more confusing. It is a term that is used easily by both server and database experts. It is a term that can cause no end of confusion for everyone.

I’m talking about, of course, differentials.

For the SQL Server database guy, a database differential dump will create a file that contains all of the changes to the database since the last full dump. Simple, right?

For the server guy, a tape differential will create a file that contains all of the changed files since the last full or incremental tape backup. Simple, right?

Hmmm…what does ‘incremental’ mean, exactly? Well, an incremental tape backup is a backup that contains all the changes since the last full or incremental backup. Which actually sounds more like a database differential dump, right? Kinda, but not exactly, because a differential database dump is only based upon the last full database dump, and not any other intermediate dumps in between.

In some ways, the incremental tape backup is closer to a transaction log dump, as a transaction log dump will contain all changes since the last transaction log dump.

Now, the only reason you need to know any of this is if you are responsible for building a recovery plan. If you are still reading this post, great. Here’s a hidden nugget for you:

You don’t need the last full database dump to exist on disk in order for a differential dump to complete successfully. The details from the database dumps are stored inside the msdb database, so you can archive your full dumps by moving them off of disk (say, if you needed the space and they were archived to tape already) and your differentials will still run just fine.

You do, however, need the last full tape backup to exist in order for you take a differential or incremental tape backup. If the full tape backup does not exist then the differential (or incremental) tape backup will fail. So, you can’t save space by removing the full tape backup, you need to lug that around until you take the next full tape backup.

Just something to consider should you find yourself spending a significant amount of time each day managing files on disk in an effort to save storage space.

To Make It Even More Confusing

Enter the term “snapshot”. Starting with SQL Server 2005 users were able to create a snapshot of their database. Around that same time, as virtualization products such as VMWare took hold in the marketplace, the concept of a VM snapshot became a common phrase. To complicate things even further the concept of Volume Shadow Copy Service (VSS) also used the term snapshots. So, three different pieces of technology, all of them overloading the word “snapshot” in order to describe some part of a data recovery process.

There was a brief period in time when I would hear a server admin, fresh into the virutalization world, suggest that taking a snapshot of a VM running SQL Server was “good enough” for a backup. They would even suggest that there was no need to take a database backup dump because they could just do a snapshot.

Initially this was easy enough to refute. We just needed to run some transactions and show them that the restore of the VM didn’t include the data that was in flight. By itself a snapshot of a VM was not going to provide you with a transactionally consistent database. Once everyone could see that their data was not intact the idea of using VM snapshots as a database backup method went away.

With the introduction of VSS running at the O/S level the topic of VM snapshots came up again. For a while the official stance from Microsoft was that VM snapshots, even with VSS running, were not a supported technology for a data recovery plan. This was the official statement from Microsoft:

Virtualization Snapshots for Hyper-V or for any virtualization vendor are not supported to use with SQL Server in a virtual machine. It is possible that you may not encounter any problems when using snapshots and SQL Server, but Microsoft will not provide technical support to SQL Server customers for a virtual machine that was restored from a snapshot.

However, this is the updated statement found in this KB article:

SQL Server supports virtualization-aware backup solutions that use VSS (volume snapshots). For example, SQL Server supports Hyper-V backup.

Virtual machine snapshots that do not use VSS volume snapshots are not supported by SQL Server. Any snapshot technology that does a behind-the-scenes save of a VM’s point-in-time memory, disk, and device state without interacting with applications on the guest using VSS may leave SQL Server in an inconsistent state.

I’ve worked with products that use snapshots and VSS in order to manage SQL Server backups. To be honest, I’ve never been comfortable with the idea that all my data is really there. With hundreds of GB to search through how am I to know that everything is there anyway? I often wonder if these snapshots end up missing a few bytes here and there and no one notices.

I must be old school, but I’d take a database transaction log dump over a VM snapshot any day.

6 thoughts on “Differential Backups, Differential Terms”

  1. I’ve never been comfortable with the Snapshot approach. Fulls, Diffs, T-Log backups… let SQL Server do its thing! This was quite and interesting article. I’ve had the Differential vs. Incremental talk with our OPS team, it was crucial to everyone understanding expectations.

    Reply
  2. Perhaps the “Dump” mentality came from the UNIX world. Remember the dd command? We used to nickname it data dump, as it would essentially copy files no matter the circumstances, and consultants would equate that to a raw dump of data. FWIW.

    Reply

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.