Why I’m Learning Data Science

To be fair, it is more a case of me re-learning data science, as the concepts are already familiar. With an MS in Mathematics, I have dabbled in statistics for more than 30 years now. So when Microsoft announced they were partnering with edX to offer a certificate in Data Science, I decided it was the perfect time to dust off my Z-tables and get back to my roots. Launched just over a year ago, the online program allows anyone to take the courses for free. There is an option to pay a fee of $49 or $99 per course to earn a verified certificate. With ten courses in total, your final cost is about $540 USD. That’s more than a fair price for this content.

The quality and quantity of the online content were a perfect mix for me. Besides the math and statistics you would expect, there was also Power BI, Excel, R, Python, and SQL Server. Here’s a partial list of topics:

  • Query relational data using T-SQL
  • Analyze and visualize data using Power BI (or Excel, if preferred)
  • Understanding statistics
  • Exploring data with code (using R or Python)
  • Understanding core data science concepts
  • Principles of machine learning
  • Using code to manipulate and model data (again, using R or Python)
  • Applied machine learning

 

For a data geek like myself, this was heaven. You can see a full list of the courses here: https://academy.microsoft.com/en-us/tracks/data-science

One of the course highlights for me was finding Wayne Winston as an instructor. Imagine being able to learn statistics from Wayne Winston! It costs thousands of dollars to attend Indiana University, where he is a professor. His books cost money, too. But in this edX course THIS KNOWLEDGE CAN BE YOURS FOR FREE.

I started the courses towards the end of 2016, but in February I made it a priority to get them all done. I finished all but the final project by mid-April. The final project is offered only once every three months, and I missed the April deadline, so I had to wait until July to try again. Last week, as I returned from our family beach vacation, I pushed aside everything on my schedule to work on the final project. When I woke last Friday, this was waiting for me:

[Image: my Data Science certificate]

Honestly, this certificate means more to me than my SQL Server MCM. However, I liked the structure of this course so much that I wish Microsoft would construct something similar for SQL Server and do a reboot of the MCM program. (If anyone from Microsoft Learning is reading this, email me, I’d love to help.)

So, besides this coursework being a way for me to turn back the clock, why would I want to spend the time to learn data science? Let’s break it down.

The traditional role of the DBA is being automated away, right in front of our keyboards. It’s easy to throw hardware at a query, or a database, and make things run faster. Platforms such as Cosmos DB are the beginning of the end for DBAs, as the machines are close to automating the job away.

It won’t be long before fiscal-minded people use cloud platforms as a gauge for DBA salaries. When companies understand that the systems can tune themselves, it’s going to be harder to earn a dollar tuning queries. Sure, there might be a need for a sysadmin to help configure those systems, but the pool of DBAs is shrinking, not growing.

The reason for this trend is easy to see once you understand that computers are only good at providing answers. It is up to us to ask the questions, and humans are better at judging whether the answers make sense. There is a dearth of people in this world who can analyze data well. Data science and analytics is a growth field; data administration is not. Hitch your career path to something on the rise, not to something that can be replaced by a handful of PowerShell scripts.

We know that the world of tech moves quickly, and it only gets faster. In the world of data science, new tools and integrations are introduced weekly. That acceleration is a good thing: as new tools come to market, it becomes easier for everyone to access the insights that data can bring. As an example, after my project was complete, Buck Woody (blog | @buckwoodymsft) told me about XGBoost. I didn’t know it existed, and now I can’t wait to see if it will help make my predictive model even better.

Getting tools to the masses so that everyone can work with data has ancillary benefits as well. If everyone practiced or learned data science, we would be building systems that treated data as a first-class citizen. Right now, data gets overlooked as a critical asset, yet data lasts longer than code. It’s about time we treated data as the most critical asset a company owns, because most companies don’t until it is too late.

The volume and velocity of data available today can make even the simplest data science project difficult. The result is that data cleansing is 93.7% of any data science project. Nobody goes to school to become a data janitor, but if you decide to dive into data science, you must understand how data gets cleaned. To someone who had been a DBA for many years, the idea of replacing missing data with zeros seemed…well…flat out wrong. Then I saw my Root Mean Square Error drop into the .23 range, and those zeros didn’t matter to me at all.
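The course’s own pipeline isn’t shown here (my final project was built in Azure ML), so the sketch below is only a minimal, hypothetical Python illustration of the two ideas in that paragraph: filling missing values with zeros, and scoring predictions with root mean square error, defined as the square root of the mean of the squared differences between actual and predicted values. The function names `fill_missing_with_zero` and `rmse` are made up for this example, and it uses only the standard library rather than any particular data science toolkit.

```python
import math


def fill_missing_with_zero(values):
    """Hypothetical helper: replace missing entries (None or NaN) with 0.0."""
    return [
        0.0 if v is None or (isinstance(v, float) and math.isnan(v)) else float(v)
        for v in values
    ]


def rmse(actual, predicted):
    """Root mean square error: sqrt(mean((actual - predicted) ** 2))."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Missing values become zeros before the data is used for modeling.
    print(fill_missing_with_zero([1.5, None, 2.0, float("nan")]))  # [1.5, 0.0, 2.0, 0.0]
    # Both predictions are off by 2, so the RMSE is 2.0.
    print(rmse([1.0, 2.0], [3.0, 4.0]))  # 2.0
```

Lower RMSE means predictions sit closer to the actual values on average, which is why a drop in that number can make an unorthodox cleansing choice like zero-filling look worthwhile.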

Speaking of RMSE, I loved trying to tune my model and improve my final score. Data science isn’t just sexy, it’s addictive. I spent far too much time trying to work my way to the top of the leaderboard even after my grade was in. Part of that was my general competitive nature, but it was also part of the learning process. Another part of the process is this:

Xmas in July! Here’s the Azure ML yule log.

The final course also required us to submit a report and then peer-review other students’ reports. Reading those reports, I found myself thinking about what I could have done differently to approach a solution. Reading the reviews of my own report, I also got a sense of collaboration around the project. For me, data science evokes an almost philosophical discussion of the data and the problem we are trying to solve. The focus is more on the process, what the data was telling us, and the outcome. And it all appeared to be far less combative than what you would find in any forum on query tuning.

Getting back on the data science train is a smart move for anyone these days. I’m not saying you need to quit your job. What I *am* saying is that you should look to augment your current job with some data science skills. Microsoft has shifted their data platform offerings for a reason.

And as any good Microsoft MVP knows, it’s not a bad thing to keep pace with trends as Microsoft shifts.

For me, that area is data science.

18 thoughts on “Why I’m Learning Data Science”

  1. Tom, I think it would be SUPER cool if you could team up with some other data science folks and help formulate a curriculum for doing this in Microsoft software. (I guess other folks can do it for other vendors.) If nothing else, it would point out the things a person would need to learn to do this kind of thing well! It might also be worth sending a tweet to @BecomingDataSci about it.

    • Thanks for the comment, and for the vote of confidence, much appreciated! I’ve already helped build a data science curriculum for a college; it would be fun to do something with Microsoft and edX.

  2. The price has been bumped to $99 for most courses for a couple of months now. If you were already in the program, you got a discount code so you only needed to pay $49. For new people, however, the entire program will cost around $1000. Personally, I’m sad that it costs this much to educate yourself. However, you can still do the entire program for free if you don’t pursue the certificate.

    I’m probably going to do the final project in the October run.

  3. Hi Tom, congratulations on getting the certificate. Did you have to do all your courses again the second time around, or just the ones you did not complete?

    • Sorry for the trouble with the blog these past few days. The post was picked up by Google RSS feeds and I had 120k pageviews in 24 hours. My current host wasn’t able to handle the traffic. Everything should be better for now and I’m going to take steps to help prevent these issues from happening again. If you are still unhappy just drop me an email and I’ll refund your money.

