Still don’t think you have a big data problem?
I’m here to tell you that you’re doing it wrong.
Everyone has a big data problem; they just don’t know it yet.
Two weeks ago at the PASS Business Analytics Conference I heard Steven Levitt talk about the severe shortage of people in the world who can properly analyze data. Considering the success that Levitt has had (despite being an economist who doesn’t understand math or business), his words underscored for me how mistaken it is to think that “big data” has to be big, or that your issues with big data can be solved by throwing hardware at the problem.
First things first: big data doesn’t have to be big. It is just a bad name for a complex set of problems.
Let’s look at the most common definition of big data: the four V’s. The words most often used to describe big data are velocity, volume, variety, and veracity.
That means I can receive one file, once a day, that is less than 1 GB in size and filled with accurate details, and it would still qualify as a big data scenario by satisfying the four V’s. Here’s where the problems start for most companies: they have few people available to analyze such data properly and arrive at sound conclusions.
That’s a big data problem. It is also likely to be your problem, too.
No, it’s not a hardware problem. You don’t need better tools or servers. You need better people.
Somewhere along the line these past two decades we’ve lost the ability to teach people basic analytical skills. The idea of causation versus correlation is as foreign a concept to most people as walking on the Moon. Today we have a plethora of self-service BI tools and a dearth of people who know how to use them properly.
Dr. Levitt was right to point this out to everyone during his keynote. He repeated it several times, including during a private reception held afterwards. It has been ringing in my ears ever since.
If you want to get ahead in your career as a data professional, here’s what you can do starting today.
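To make the causation-versus-correlation point concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the data is synthetic, and the variables (temperature, ice cream sales, sunburns) are hypothetical rather than taken from the keynote. It assumes a simple setup where temperature drives both of the other series, and it shows that they can look strongly related until you account for the thing that actually drives them.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def residuals(ys, xs):
    """What is left of ys after removing a least-squares line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


# Synthetic data (an assumption for this sketch): temperature is the
# hidden driver, and neither of the other two series affects the other.
rng = random.Random(42)
temperature = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
sunburns = [0.5 * t + rng.gauss(0, 2) for t in temperature]

# Naive view: ice cream sales and sunburns move together.
raw = pearson(ice_cream, sunburns)

# Controlling for temperature: correlate what is left of each series
# once the temperature trend has been removed (a partial correlation).
adjusted = pearson(residuals(ice_cream, temperature),
                   residuals(sunburns, temperature))

print(f"raw correlation:      {raw:.2f}")
print(f"after controlling:    {adjusted:.2f}")
```

The raw correlation comes out strongly positive, while the adjusted one sits close to zero. No tool will tell you that temperature was the variable to control for; that part takes a person who understands the problem.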
1. Sign up with Coursera
Coursera offers a lot of content on analytics for free. I’m in a course right now called “Computational Methods for Data Analysis”. It’s been great for reinforcing a lot of the concepts I picked up in graduate school many years ago. It has also made me understand that my passion for numbers and data has never gone away. You should also look for courses online that help you understand more about your business and industry. That’s one of the key ingredients revealed in The Phoenix Project: once everyone understood more about the business as a whole, they became more efficient in their roles.
2. Start using the self-service BI tools such as Data Explorer
There are many wonderful tools available for you to use. Data Explorer is one of the shiny new ones, and it makes it really easy for you to locate and import disparate data sets. Start using these tools now so they become familiar to you over time.
3. Start asking questions
During the private reception with Dr. Levitt I asked him the following question: “How do you know what questions to ask?” His answer was wonderful: “That’s hard to do.” He then went on to explain that you need to have a good conversation to understand what problems people are trying to solve. Often that will help lead you to the right questions. From there you can start collecting your data and then analyze it.
Sure, it sounds easy. In reality it is not an easy task. Very few people can do this effectively.
Too many companies think the answer lies in crunching 8 petabytes of data daily in order to have it spit out an answer you should blindly follow. Too few companies are asking the right questions or collecting the right data to begin with, making all that crunching worthless.
You do have a big data problem. Fortunately, you know about it now, before it is too late and you are left too far behind.
I don’t have a big data problem. I have opportunities to learn from data.
When I hear “Big Data”, I need to start thinking something different. Currently I think of data storage and retrieval tools.
I had dinner one night at PASSBAC with a couple of BAs. When they hear big data, they think of Data Explorer and a new world of data being opened up. They also think of unstructured data. But mostly “small” datasets that I can easily handle with existing tools.
If big data means data outside my current ecosystem now being exposed to my organization, then I agree, there are opportunities.
If it’s magic that’s going to fix all IT problems, then it’s just today’s buzzword that has c-levels distracted like too many other things in the past two decades.
I think the question is what do c-levels think about when they hear Big Data: tools or data outside the ecosystem?
Someone needs to do a survey and put the results in an infographic.
Thanks for pointing out Coursera. Good thing to keep an eye on.
Scott,
Thanks for reading, and thanks for the comments!
I’d be interested in such a survey as well.
Tom
Just signed up for Introduction to Data Science. Good stuff.
Awesome! Glad I could help point you in the right direction.
Tom