Predictive Analysis: I Bet You Didn’t Know I Would Blog This Today

It's all bullshit. Quick question: Where would you go to find some basil in your local grocery store? Did you say “by the other herbs?” Or did you say “by the tomatoes?” If you are like me, then you would go looking for basil by the other herbs because, well, that is a natural, logical grouping of products. However, since many people tend to associate basil and tomatoes together, like for a pasta sauce, some stores may locate basil there instead. Many stores will do both, but some stores that are short on shelf space could decide one over the other.

So, I was “that guy”. Yeah, the one in a million that might actually need basil for a recipe that did not call for tomatoes. So, I went to the section where all the herbs are located and did not find any basil. I then had to report back to Congress that I had failed, that the mission was not accomplished, and we went ahead with dinner plans even though we knew that our meal was not going to be complete.

Why am I telling you this? Good question. Any other questions?

You will hear parts of business intelligence (or business analytics) that are called “Predictive Analysis”. An example of this is the business of product placement in grocery stores, such as basil and tomatoes. When enough people buy things together, stores will start to associate products and group them in displays. Hey, it makes sense, and I doubt the idea of basil and tomatoes being located together is anything new that required a 1.7TB cube to be processed in Excel for some grocery executive. But what is interesting is the idea that you can examine (what they call “data mining”) your data to look for trends so that you can predict the likelihood of future events.
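
For anyone curious what that kind of "data mining" looks like in practice, here is a minimal sketch of the usual co-purchase arithmetic (support, confidence, and lift). The receipts and the `pair_stats` helper are made up for illustration; this is not how any particular grocery chain does it.

```python
# Hypothetical receipts; every basket here is invented for illustration.
baskets = [
    {"basil", "tomatoes", "pasta"},
    {"basil", "tomatoes"},
    {"tomatoes", "onions"},
    {"basil", "oregano"},
    {"tomatoes", "pasta"},
    {"basil", "tomatoes", "garlic"},
    {"milk", "bread"},
    {"oregano", "garlic"},
]


def pair_stats(baskets, a, b):
    """Return (support, confidence of a -> b, lift) for items a and b."""
    n = len(baskets)
    n_a = sum(1 for basket in baskets if a in basket)
    n_b = sum(1 for basket in baskets if b in basket)
    n_ab = sum(1 for basket in baskets if a in basket and b in basket)
    support = n_ab / n                          # share of all baskets holding both
    confidence = n_ab / n_a if n_a else 0.0     # share of a-buyers who also bought b
    lift = confidence / (n_b / n) if n_b else 0.0  # above 1.0 means "bought together more than chance"
    return support, confidence, lift


for other in ("tomatoes", "oregano"):
    support, confidence, lift = pair_stats(baskets, "basil", other)
    print(f"basil -> {other}: support={support:.3f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

In this toy data, three of the four basil buyers also bought tomatoes (lift 1.2), which is the sort of number that moves the basil to the produce aisle. The fourth basil buyer, the one shopping with the other herbs, is still in there.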

Which is where I call bullshit.

See, I have worked off and on in the financial world for over fifteen years now. And there is one thing you learn right at the start, and that is this:

Past performance does not guarantee future results.

In other words, you cannot predict the future. Yet the title “Predictive Analysis” would lead you to believe otherwise. There are thousands of business leaders out there right now, many of them in the financial world, who think that if they have enough data they can predict the future. And that is just not true. Take my case, for example: I just wanted basil. Sure, perhaps you could argue that I fell outside of your 95% confidence interval, and that outliers will always exist, and that your models work a majority of the time. But the end result is the same: you cannot predict events, whether or not you have a crapload of data at your disposal. [I think some folks like to just stop and call that “free will”.]

When I was participating in Six Sigma training we spent considerable time on a section called, quite simply, statistical analysis. As near as I can tell, there is little difference between regular statistical analysis and predictive analysis except for (1) the names and (2) the idea that someone wants you to believe that you can predict the future. In fairness, I can provide you two examples of where I see what could be a reasonable case made for the use of predictive analysis. One is Amazon, and the other is Netflix. Both services do a great job in trying to sell you additional products based upon either your recent selections or the selections of other people that made similar choices. If you want to call this predictive analysis, fine. I call it suggestive selling, based upon statistical analysis, but perhaps that is because I understand you cannot predict the future.

Need another example? Well, check out my library bookshelf. I started placing reference books onto my site, and yes they link back to Amazon. If someone stops by and orders a book I get roughly a dollar for providing the link. Not big money, and no I am not looking to get rich, I just wanted to buy a cup of coffee every now and then. So, in less than a week, no books have been sold, which is to be expected. But what was not expected was the fact that someone decided to browse Amazon a little bit and ended up buying this.

Yeah. And I want to see the analysis that predicted that correlation between that and a book on SQL 2008.

Want to go one step further? How about the fact that most companies compare results to previous quarters and years? How ridiculous is it that you would compare your Q1 results to the previous year, without any other information? Or to compare the results from one region to another? You simply cannot make important decisions on such information without a consideration of all the external factors.

There are three kinds of lies: lies, damned lies, and statistics.
- Benjamin Disraeli

13 thoughts on “Predictive Analysis: I Bet You Didn’t Know I Would Blog This Today”

  1. People who use data mining in retail product placement piss me off. It’s a good idea online, because when you search for “basil”, you’ll still find the damn basil. But when they put it beside the tomatoes, even if you’re looking for tomatoes and basil, you’re liable to walk right by it.

    What’s next – Beano in the chili aisle? Migraine medication with the baby food? Ex-Lax with the bacon?

    • I like the example of the correlation made between diaper sales and beer sales, spiking on Friday nights as new fathers are told to stop and pick up a few things on their way home from work.

  2. Brett,

    The point of my post was to hope that someone like yourself would read it and help me to understand a little more about the use of statistical analysis. Drop me an email, as I would enjoy discussing this with you further.

    When you said “The idea isn’t to predict the future, but to give you an idea of what to expect.”, how is that different than predicting the future? If I said that we could expect there to be a tornado in a trailer park next week, isn’t that also predicting the future to some degree?

    I understand the idea behind CIs, and if there is one point I would like to make, it is that you are never going to have a perfect system or a perfect model, and you should not be trusting people to make business decisions that end up putting all of their eggs in one basket as a result of predictive analysis.

  3. Okay, you made me bite. As a data analyst at a predictive analytics company, I’ve got to call you on this one.

    The first thing to note is that generally speaking predictive analytics isn’t about predicting individual outcomes, it is about predicting distributions of collections of outcomes.

    This might sound like a meaningless distinction, but it isn’t.

    In predictive analytics we use groups of predictions, and the ultimate prediction is that a group of predictions will, in aggregate, match the predicted values for that group.

    Let’s use a simple numeric example on something like stocks. I hate using stocks as an example, but it’s an easy thing for people to understand. (Progressive Insurance, BTW, was/is a leader in predictive analytics in the insurance industry, which is where a lot of predictive analysis got its start.)

    Now, no one will claim that they can predict the value of an individual stock 1 day, 1 week, 1 month, or 1 year into the future. Heck, we can’t even predict the value of a group of stocks into the future, but here is what we can do, we can rank order groups of stocks with a very high level of precision.

    What this means is that we can build a model that takes all of the stocks in an index, for example the NYSE, splits that index of roughly 2,700 stocks into 5 or 10 or 20 groups, and predicts the relative performance of each group over a given period of time.

    Let’s say that there are 2,700 stocks on the NYSE and we split them up into 10 buckets. What we want to do is allocate the stocks into the buckets such that all of the stocks that we predict to be the worst performing go into bucket 1, all of the best performing go into bucket 10, and the rest get distributed accordingly.

    Now, what we’ll do is predict the relative performance of each bucket to the others. In other words, we’ll predict that the average performance (percent price change) of the stocks in bucket 1 will be 10% worse than the stocks in bucket 2, and 2 will be 4% worse than 3 and 3 will be 6% worse than 4, etc. and that 9 will be 5% better than 8 and 10 will be 10% better than 9, or something like that.

    Now, within each of the buckets individual stocks will be all over the place, some in bucket 1 will actually perform very well and some in bucket 10 will do very poorly.

    But what we are “predicting” is NOT the performance of each individual stock, what we are predicting is the RELATIVE AVERAGE performance of each BUCKET.

    Now, in order for “predictive analytics” to provide value, in order for the models to be “correct”, they don’t have to predict the actual values of each individual stock, they just have to be correct in “rank ordering” the buckets.

    In other words, if I can tell you, “If you buy the 250 stocks that fall into bucket #10, that bucket of stocks will perform 20% better than average over some defined period of time, and 40% better than the worst bucket of stocks that we allocated,” that’s valuable! (In fact, this is basically what hedge funds do, it’s also what insurance carriers do, etc.)

    Not only is that valuable, but it’s a prediction that can be made with great accuracy.

    But now you are saying, “Well wait, aren’t these models the things that crashed our economy in 2008!” Why YES, they did play a major role, but they weren’t the only thing. What really crashed the economy was leverage, and the models failed precisely because they weren’t using enough (or even the right) information. The crash was very predictable, the people who built (and used) the models were just idiots (no really…)

    And as for simple statistical analysis vs predictive analytics, there is a difference, because predictive analytics uses modeling based on time-sequenced data with “future targets”.

    Traditional statistical analysis simply finds correlations in old data; predictive analytics finds correlations between past and future data. Of course, in reality all of the data is in the past, it has to be, but what we are talking about is finding correlations between things that were known at one point in time and things that became known at a later point in time.

    So, for example, if you want to build a predictive model for stock performance, first you decide on the time span of the prediction, in other words, do you want to predict performance 1 day, 1 week, 1 month, or 1 year into the future, etc? Then, you sequence the data. Let’s say you decide this is going to be a 1 month prediction. You sequence the data such that you are going to be finding correlations between inputs that are known 1 month prior to the outcomes in your data, so in your data you are going to have information from say 1/1/2010 and stock prices as of 2/1/2010. You are then going to be looking at correlations between the information known as of 1/1/2010 and the outcomes as of 2/1/2010, and this “information” is not something simple, you can start with tens of thousands of variables when starting to build a model, including many derived values based on observed data, etc.

    Now it is true, if the future is significantly different from the past, then the predictions are going to degrade, but this has more to do with “lack of data” than anything else. The problem is really that models are built on relatively narrow sets of data. So, for example, a stock market model may be built on prior stock performance data, plus a bunch of benchmarks for things like the price of gold, commodity prices, unemployment rate, etc., but it may not be built on weather data at all, so unusual weather events could throw things off. Again, this is really due to a lack of data.

    I agree that, in reality, right now, there is always a good chance that the future will not be like the past, but I also believe that, “given enough data”, you could theoretically predict the future, at least the broad strokes of it…

    • RG,

      I find your comment fascinating and would like to subscribe to your newsletter.

      I also think you and I agree more than you may think. You are saying that predictive analysis is about relatives, not absolutes, and that ultimately you can’t predict the future for any one specific item but you *may* be able to guess accurately about a group of items.

      I can buy that. Otherwise they wouldn’t put the basil by the tomatoes.

      Thanks for the feedback. I find myself being more drawn into data analysis these days (and enjoying it).

      Tom

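
The bucket idea from comment 3 is easier to see with numbers. What follows is a rough sketch on purely synthetic data, not anyone's actual model: each row pairs a score known at the start of a period with the return observed over the following period, which is the "time sequenced data with future targets" arrangement the comment describes. The strength of the link (0.02), the noise level (0.10), and all variable names are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

N_STOCKS = 2700   # roughly the figure used in the comment
N_BUCKETS = 10

# Synthetic data: each row is (score known at the START of the period,
# return observed over the FOLLOWING period). The score is only weakly
# related to the outcome; most of the return is noise.
rows = []
for _ in range(N_STOCKS):
    score = random.gauss(0, 1)
    next_return = 0.02 * score + random.gauss(0, 0.10)
    rows.append((score, next_return))

# Rank by score and cut into equal-sized buckets: bucket 1 holds the lowest scores.
rows.sort(key=lambda row: row[0])
size = N_STOCKS // N_BUCKETS
buckets = [rows[i * size:(i + 1) * size] for i in range(N_BUCKETS)]

# The claim under test is about bucket AVERAGES, not individual stocks.
bucket_means = [statistics.mean(row[1] for row in bucket) for bucket in buckets]
for number, mean_return in enumerate(bucket_means, start=1):
    print(f"bucket {number:2d}: average return {mean_return:+.3f}")

# Individual stocks are still all over the place.
best_in_worst_bucket = max(row[1] for row in buckets[0])
print(f"best single stock in bucket 1: {best_in_worst_bucket:+.3f}")
```

With a signal this weak, nobody could call any single stock, yet the top bucket's average still comes out ahead of the bottom bucket's, and the best stock in bucket 1 beats the average of bucket 10. Whether a real-world score holds up like this once the future stops resembling the past is exactly the argument the post and the comment are having.
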
  4. I didn’t even get half-way through this post because it was like hearing my own voice; I didn’t need predictive analytics to know where it was going.

    As soon as someone says the word ‘predict’, I feel humored- as though I’m expecting to be entertained by a psychic or other medium. When I hear someone say the word ‘predict’ in a more serious tone, I suddenly think, ‘let’s go to Vegas and test this out… if you’re serious, that is’.

    Vegas is probably the best proving ground for any kind of predictive analysis. The odds DO work out, just as published by the casinos. Of course, those odds occur over a time frame and do not exist in a vacuum. Casinos understand that they will lose and win money in the short run, but over time the odds will trend toward expected values- all they have to do is keep enough cash on hand to ensure they have the time for the predictions to pay off.

    And there’s the rub in predictive analytics. Sure, you can ballpark the possibility of an idea/correlation/etc., but does your business have the resources and luck for it to pan out as predicted? Beta was a good idea, Dewey made the headlines for beating Truman in the Presidential election, but no one ever put a better phrase to sure plans than Mike Tyson who said, “Everyone has a plan ’till they get punched in the mouth.”

    Big Data looks like snake oil, smells like snake oil and tastes like snake oil. YMMV.

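
The casino point in comment 4 can be sketched the same way. This is a small simulation, assuming an even-money bet on American roulette (38 pockets, 18 of which pay the player); the `house_results` helper and the bet counts are made up for illustration.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Even-money bet on American roulette: the house wins on 20 of 38 pockets.
P_HOUSE_WINS = 20 / 38
# Expected house profit per unit bet: win 1 with p, lose 1 with (1 - p).
EXPECTED_EDGE = P_HOUSE_WINS * 1 + (1 - P_HOUSE_WINS) * -1


def house_results(n_bets):
    """Return (average house profit per bet, lowest running total seen)."""
    total = 0
    lowest = 0
    for _ in range(n_bets):
        total += 1 if random.random() < P_HOUSE_WINS else -1
        lowest = min(lowest, total)  # how far underwater the house got
    return total / n_bets, lowest


short_avg, short_low = house_results(20)
long_avg, long_low = house_results(200_000)

print(f"expected edge per bet:   {EXPECTED_EDGE:+.4f}")
print(f"average over 20 bets:    {short_avg:+.4f} (lowest point {short_low})")
print(f"average over 200k bets:  {long_avg:+.4f} (lowest point {long_low})")
```

Over twenty bets the average can land almost anywhere; over a couple hundred thousand it settles close to the published edge. The "lowest point" figure is the cash the house needed on hand to survive long enough for that to happen, which is the resource question the comment raises.
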
  5. I have to agree with your article. Once you combine stats + money and you cannot guarantee me positive returns, then it is crap. The whole predictive analytics business is a big lie. Go with the facts…
    -T

