Data Obfuscation

Do you have data? OK, I know that you do, so here’s a better question for you: is it sensitive data? And if it were, how would you know?

Chances are that, as a DBA, you really don’t know whether the data is sensitive or not. I mean, if I see a piece of data in the format of 123-45-6789 I might think it is a Social Security number, but I would be guessing. And what if it were simply stored in a different format? Again, how do I, as a DBA, know whether the data is sensitive or not?
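To make that guessing a little more systematic, here is a minimal sketch of scanning sampled column values for SSN-like patterns. The function names and the two patterns (dashed and plain nine-digit) are my own assumptions for illustration, and a match is still only a hint: a nine-digit value could just as easily be an order number.

```python
import re

# Assumed patterns for illustration: 123-45-6789 and 123456789.
SSN_DASHED = re.compile(r"\d{3}-\d{2}-\d{4}")
SSN_PLAIN = re.compile(r"\d{9}")


def looks_like_ssn(value):
    """Return True if the value has the shape of a Social Security number."""
    text = str(value).strip()
    return bool(SSN_DASHED.fullmatch(text) or SSN_PLAIN.fullmatch(text))


def ssn_like_ratio(values):
    """Fraction of non-empty values that match an SSN-like pattern."""
    non_empty = [v for v in values if v is not None and str(v).strip()]
    if not non_empty:
        return 0.0
    hits = sum(1 for v in non_empty if looks_like_ssn(v))
    return hits / len(non_empty)
```

You might run something like `ssn_like_ratio` over a sample of each character column and flag the ones with a high ratio for a human to review. It narrows the search, but it does not replace asking the data owner.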

Now, how many times have you taken a copy of a production database and refreshed a development, test, or QA database with that production data? Many times, I am sure. And I am also certain that you have locked down your production environment so that only the people who need to view the sensitive data are able to do so. However, I am not so certain that you have locked down all of your other environments in a similar manner, which means all of that sensitive data can now be viewed…and leaked…by many, many people.

So what can you do? Well, you can look to obfuscate your data in some manner. There are a lot of different tools and techniques available for you to use these days. You need to be aware of the limitations of each and how well they meet your specific requirements. That’s the easy part, really. It would be harder for you to explain the career of Judd Nelson than to identify data as sensitive and deploy some type of encryption. No, here is the part that is the most difficult:

Getting your users to sign off.

See, most of your end users want to see the actual data before they sign off on any changes about to be deployed. So while you think it would be great to mask the Social Security numbers in your non-production environment, your end users are probably going to want to see the real values before they agree to anything.
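For what it’s worth, masking does not have to turn the data into something unrecognizable. Here is a rough sketch, not a vetted masking tool, of replacing the digits of an SSN-like value while keeping its format. The function name `mask_ssn` and the use of a keyed hash (HMAC-SHA256) are my own choices for illustration; the key is an assumption you would need to store outside the non-production environment.

```python
import hashlib
import hmac


def mask_ssn(ssn, secret):
    """Replace each digit with a keyed pseudo-random digit, keeping separators.

    The same input and secret always give the same output, so joins on the
    masked column still line up across tables.
    """
    digest = hmac.new(secret, ssn.encode("utf-8"), hashlib.sha256).digest()
    out = []
    i = 0
    for ch in ssn:
        if ch in "0123456789":
            # Map one digest byte to one digit (slightly biased, fine for a sketch).
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)
```

Because the output still looks like 123-45-6789, reports and screens keep their shape, which can make the sign-off conversation a little easier. Keep in mind that a masked value may happen to collide with somebody’s real number, so treat the result as junk data rather than as safe data.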

And therein lies the real problem. How do you get your end users to understand that they don’t need to match every last detail? I don’t have an answer for you as it will depend on your industry, your particular shop, and how well you can sell the concept. The way I see it, two things will need to happen:

You need to build better code: If your code is so fragile that it breaks because of a few pieces of garbage (or obfuscated) data, then you really should have some type of obfuscation or data generation tool in order to find and resolve those issues. If your systems have a history of being this fragile, then that is likely also the reason why your end users insist on seeing the real data.

Your end users need to get over it: They will need to understand that they are not always going to be able to see the real names, addresses, and other bits of sensitive data. And they may even like being able to see that the system can handle junk data without breaking.

Look, there is always going to be risk in your environment. There is risk when you move production data outside your production environment. There is risk when your users sign off on a report that has obfuscated data. The trick is how well your shop manages that risk and makes trade-offs in the areas where the majority of it exists.

 
