Why Correct Data Entry Is Important

This week's question is about data entry validation. Most ETL processes put a lot of effort into scrubbing data, for a variety of reasons. So how do you scrub your data, and what do you use to do it?

And what about data that technically “fits” into a particular field but is still flat-out wrong? How do you verify that it is correct? Consider this fine example:

[Image: rr_crossing]

4 thoughts on “Why Correct Data Entry Is Important”

  1. Personally, I always try to cover this during my requirements-gathering follow-up meetings, after we’ve identified the data we’re going to be moving.

    This should allow the business users to give you rules that you can implement in the ETL process to either bin or correct bad data. I’m not a big fan of correcting it, though: separating out bad data and notifying a business user that it needs to be corrected is a good way to get your users to take more responsibility for the data they input.

    Another, rather novel approach, if your users are less than knowledgeable about what good and bad data look like, is to create a clustering mining model from the record set you’ll be moving in the ETL process. This could, in theory, identify data that “doesn’t fit” your record set, based on the data you’ve already collected.

    Of course, the best thing to do is design the data collection front end with data validation good enough that none of these measures are necessary, but unfortunately that is not an option most of the time, especially when dealing with vendor software.

    • Thanks, Brett. You raise some very good points to consider, including the experience level of the end user.

  2. My last company simply hired an IT guy to stay up late in case any bad data crashed the overnight processing.
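
The "bin rather than correct" approach from the first comment can be sketched roughly as follows. This is only an illustration: the field names (`age`, `state`), the two rules, and the `partition` helper are all hypothetical, and real rules would come from the business users during requirements gathering.

```python
# Hypothetical business rules: each maps a readable description to a check
# that returns True when the record passes.
RULES = {
    "age must be between 0 and 120": lambda r: 0 <= r["age"] <= 120,
    "state must be a two-letter code": lambda r: len(r["state"]) == 2 and r["state"].isalpha(),
}


def partition(records, rules=RULES):
    """Split records into (clean, rejected) without modifying any of them.

    Each rejected entry keeps the original record plus the descriptions of
    the rules it broke, so a business user can be told what needs fixing.
    """
    clean, rejected = [], []
    for record in records:
        broken = [name for name, check in rules.items() if not check(record)]
        if broken:
            rejected.append({"record": record, "broken_rules": broken})
        else:
            clean.append(record)
    return clean, rejected


rows = [
    {"id": 1, "age": 34, "state": "FL"},
    {"id": 2, "age": 340, "state": "FL"},       # fits an integer column, but implausible
    {"id": 3, "age": 51, "state": "Florida"},   # fits a text column, but not a state code
]
clean, rejected = partition(rows)
```

Only the `clean` rows would continue to the load step; the `rejected` list is what gets sent back to the people who own the data.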

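
The clustering suggestion in the first comment relies on a data mining tool, which is more than a short example can show. As a much simpler stand-in (a robust z-score check, not a clustering model), the sketch below flags values that "don't fit" the rest of a numeric column. The `flag_outliers` name, the sample weights, and the 3.5 threshold are assumptions chosen for illustration.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indexes of values that sit far from the rest of the column.

    Uses the modified z-score, 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. The median is used instead of the mean so that
    one wild value does not hide itself by dragging the average along.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # No spread to measure against, so nothing can be scored.
        return []
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - center) / mad) > threshold
    ]


# Hypothetical shipment weights; the fifth looks like a misplaced decimal point.
weights = [12.1, 11.8, 12.4, 12.0, 120.0, 11.9]
suspects = flag_outliers(weights)
```

A flagged row is only a candidate for review. A value can be unusual and still correct, so the result is better routed to a person than corrected automatically.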
