In the past year, you may have heard me mention my certificates from the Microsoft Professional Program. One certificate was in Data Science, the other in Big Data. I’m currently working on a third certificate, this one in Artificial Intelligence.
You might be wondering why a database guy would be spending so much time on data science, analytics, and AI. Well, I’ll tell you.
The future isn’t in databases, but in the data.
Let me explain why.
Databases Are Cheap and Plentiful
Take a look at the latest DB-Engines rankings. You will find 343 distinct database systems listed, 138 of which are relational databases. And I’m not sure it is a complete list, either. But it should help make my point: you have no idea which one of 343 database systems is the right one. It could be none of them. It could be all of them.
Sure, you can narrow the list of options by looking at categories. You may know you want a relational, a key-value, or even a graph database. Each category will have multiple options, and it will be up to you to decide which one is the right one.
Decisions are made to go with whatever is easiest. And “easiest” doesn’t always mean “best.” It just means you’ve made a decision allowing the project to move forward.
Here’s the fact I want you to understand: Data doesn’t care where or how it is stored. Neither do the people curating the data. Nobody ever stops and says “wait, I can’t use that, it’s stored in JSON.” If they want (or need) the data, they will take it, no matter what format it is stored in to start.
And the people curating the data don’t care about endless debates on MAXDOP and NUMA and page splits. They just want their processing to work.
And then there is this #hardtruth – It’s often easier to throw hardware at a problem than to talk to the DBA.
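To make the point about storage formats concrete, here is a minimal sketch in Python. The sample orders and the function names are made up for illustration; the only claim is that once the data is loaded, the analysis step neither knows nor cares whether it started life as JSON or CSV.

```python
import csv
import io
import json

# Hypothetical sample data: the same two orders stored in two formats.
JSON_TEXT = '[{"customer": "acme", "amount": 120.5}, {"customer": "globex", "amount": 79.5}]'
CSV_TEXT = "customer,amount\nacme,120.5\nglobex,79.5\n"


def load_json(text):
    """Parse a JSON array of objects into a list of plain dicts."""
    return [
        {"customer": row["customer"], "amount": float(row["amount"])}
        for row in json.loads(text)
    ]


def load_csv(text):
    """Parse CSV text with a header row into a list of plain dicts."""
    return [
        {"customer": row["customer"], "amount": float(row["amount"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def total_amount(records):
    """The analysis only sees records, never the storage format."""
    return sum(record["amount"] for record in records)


json_total = total_amount(load_json(JSON_TEXT))
csv_total = total_amount(load_csv(CSV_TEXT))
print(json_total == csv_total)
```

Swap in a different loader and `total_amount` stays exactly the same, which is why the person curating the data rarely argues about where it lives.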
Technology Trends Over the Past Ten Years
Here’s a handful of technology trends over the past ten years. These trends are the main technology drivers for the rise of data analytics during this timeframe.
Business Intelligence software – The ability to analyze and report on data has become easier with each passing year. The Undisputed King of all business analytics, Excel, is still going strong. Tableau shows no signs of slowing down. Power BI has burst onto the scene in just the past few years. Data analytics is embedded into just about everything. You can even run R and Python through SQL Server.
Real-time analytics – Software such as Hadoop, Spark, and Kafka allows for real-time analytic processing. This has allowed companies to gather quality insights into data at a faster rate than ever before. What used to take weeks or months can now be done in minutes.
Data-driven decisions – Companies can use real-time analytics and enhanced BI reporting to build a data-driven culture. We can move away from “hey, I think I’m right, and I found data to prove me right” to a world of “hey, the data says we should make a change, so let’s make the change and not worry about who was right or wrong.” In other words, we can remove the human factor from decision making, and let the data help guide our decisions instead.
Cloud computing – It’s easy to leverage cloud providers such as Microsoft Azure and Amazon Web Services to allocate hardware resources for our data analytic needs. Data warehousing can be achieved on a global scale, with low latency and massive computing power. What once cost millions of dollars to implement can be done for a few hundred dollars and some PowerShell scripts.
Technology Trends Over the Next Ten Years
Now, let’s look at a handful of current trends. These trends will affect the data industry for the next ten years.
Predictive analytics – Artificial intelligence, machine learning, and deep learning are just starting to become mainstream. AWS is releasing DeepLens this year. Azure Machine Learning makes it easy to deploy predictive web services. Azure Machine Learning Workbench lets you build your own facial recognition program in just a few clicks. It’s never been easier to develop and deploy predictive analytic solutions.
DBA as a service – Every company making database software (Microsoft, AWS, Google, Oracle, etc.) is actively building automation for common DBA tasks. Performance tuning and monitoring, disaster recovery, high availability, low latency, auto-scaling based upon historical workloads, the list goes on. The current DBA role, where lonely people work in a basement rebuilding indexes, is ending, one page at a time.
Serverless functions – Serverless functions are also hip these days. Services such as IFTTT make it easy for a user to configure an automated response to whatever trigger they define. Azure Functions and AWS Lambda are where the hipster programmers hang out, building automated processes to help administrators do more with less.
More chatbots – We are starting to see a rise in the number of chatbots available. It won’t be long before you are having a conversation with a chatbot playing the role of a DBA. The only way you’ll know it is a chatbot and not a DBA is that it will be a pleasant conversation for a change. Chatbots are going to put a conversation on top of the automation of the systems underneath. As new people enter the workforce, interaction with chatbots will be seen as the norm.
Summary
There is a dearth of people able to analyze data today.
Data analytics is the biggest growth opportunity I see for the next ten years. The industry needs people to help collect, curate, and analyze data.
We also need people to build data visualizations. Something more than an unreadable pie chart. But I will save that rant for a different post.
We are always going to need an administrator to help keep the lights on. But as time goes on we will need fewer administrators. This is why I’m advocating a shift for data professionals to start learning more about data analytics.
Well, I’m not just advocating it, I’m doing it.
Good read! I recall from a post I once read on your site that you were working on a master’s in mathematics way back when you went into IT. Well anyway, your heavy math background positions you well for the move from data management to data science. Those of us, like myself, who are weak in math and got into the data management gig from the business side (MIS), are in deep trouble. I’m working to gain skills in data analytics as well but realized that before I can understand a p-value I need to understand a quadratic equation. I’m all the way back now to 8th-grade algebra, which it seems I never really mastered. After that comes trig, and then calculus, and then basic statistics and probability, and then discrete math, and then maybe I might be ready to embark on data science. Dr. Codd imagined a world where the engineers built a relational DBMS with a logical data model that abstracted all of this math for the business analyst. Now it seems the opposite has happened, and to make any use at all of all this data in the morass of 343 different DBMS you need to be a master statistician, computer scientist, software developer, and data analyst all rolled into one. I’m betting he is rolling over in his grave right now. On the positive side, diving back into the math is a worthy endeavor and keeps the brain young 🙂
Ha! Yeah, I do have a math background. However, much of the math needed these days is not that complex. And you don’t have to do a lot of calculations; many programs do that for you. You definitely need to know *why* the math is returning the results, and how to interpret them. And that’s just a skill, same as any other skill, that you learn by doing.