MSSQL

Predictive Analysis

Quick question: Where would you go to find some basil in your local grocery store? Did you say “by the other herbs?” Or did you say “by the tomatoes?” If you are like me, then you would go looking for basil by the other herbs because, well, that is a natural, logical grouping of products. However, since many people associate basil with tomatoes, as in a pasta sauce, some stores put their basil there instead. Many stores will do both, but stores that are short on shelf space may choose one over the other.

So, I was “that guy”. Yeah, the one in a million that might actually need basil for a recipe that did not call for tomatoes. So, I went to the section where all the herbs are located and did not find any basil. I then had to report back to Congress that I had failed, that the mission was not accomplished, and we went ahead with dinner plans even though we knew that our meal was not going to be complete.

Why am I telling you this? Good question. Any other questions?

See, the BI seminar I took part in last week spent some time on what was called “Predictive Analysis”. Part of this discussion involved the business of product placement in grocery stores, such as basil and tomatoes. When enough people buy things together, stores will start to associate products and group them in displays. Hey, it makes sense, and I doubt the idea of basil and tomatoes being located together is anything new that required a 1.7TB cube to be processed in Excel for some grocery executive. But what is interesting is the idea that you can examine your data, which they call “data mining”, to look for trends so that you can predict the likelihood of future events.

Which is where I call bullshit.

See, I have worked off and on in the financial world for over fifteen years now. And there is one thing you learn right at the start, and that is this:

Past performance does not guarantee future results.

In other words, you cannot predict the future. Yet the title of this part of the session would lead you to believe otherwise. There are thousands of business leaders out there right now, many of them in the financial world, who think that if they have enough data they can predict the future. And that is just not true. Take my case, for example: I just wanted basil. Sure, perhaps you could argue that I fell outside of your 95% confidence interval, and that outliers will always exist, and that your models work a majority of the time. But the end result is the same: you cannot predict events, whether or not you have a crapload of data at your disposal.

When I was participating in Six Sigma training last year we spent considerable time on a section called, quite simply, statistical analysis. As near as I can tell, there is little difference between regular statistical analysis and predictive analysis except for (1) the names and (2) the idea that someone wants you to believe that you can predict the future. In fairness, I can give you two examples where I see a reasonable case for the use of predictive analysis. One is Amazon, and the other is Netflix. Both services do a great job of trying to sell you additional products based upon either your recent selections or the selections of other people who made similar choices. If you want to call this predictive analysis, fine. I call it suggestive selling, based upon statistical analysis, but perhaps that is because I understand you cannot predict the future.
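For the curious, the arithmetic behind that kind of suggestive selling can be sketched in a few lines of Python. This is only a toy under loud assumptions: the baskets are invented, `rule_stats` is a name made up for this illustration, and nothing here claims to be how Amazon, Netflix, or any grocery chain actually does it. It computes the three usual "bought together" measures (support, confidence, and lift) for basil and tomatoes.

```python
# Hypothetical shopping baskets, invented purely for illustration.
BASKETS = [
    {"basil", "tomatoes", "pasta"},
    {"basil", "tomatoes"},
    {"basil", "chicken"},      # "that guy": basil, no tomatoes
    {"tomatoes", "pasta"},
    {"diapers", "beer"},
    {"lettuce", "bread"},
]


def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    support    = share of all baskets containing both items
    confidence = share of antecedent baskets that also contain the consequent
    lift       = confidence divided by the consequent's overall share
    """
    n = len(baskets)
    has_a = sum(1 for b in baskets if antecedent in b)
    has_c = sum(1 for b in baskets if consequent in b)
    has_both = sum(1 for b in baskets if antecedent in b and consequent in b)

    support = has_both / n if n else 0.0
    confidence = has_both / has_a if has_a else 0.0
    lift = confidence / (has_c / n) if has_c else 0.0
    return support, confidence, lift


support, confidence, lift = rule_stats(BASKETS, "basil", "tomatoes")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

In this made-up data, two of the three basil baskets also contain tomatoes, so the rule looks decent on paper. The remaining third is the shopper who walks out without his basil. A lift above 1 says the pairing happens more often than chance would suggest, not that any particular shopper will behave that way.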

Need another example? Well, check out my library bookshelf. I started placing reference books onto my site, and yes they link back to Amazon. If someone stops by and orders a book I get roughly a dollar for providing the link. Not big money, and no I am not looking to get rich, I just wanted to buy a cup of coffee every now and then. So, in less than a week, no books have been sold, which is to be expected. But what was not expected was the fact that someone decided to browse Amazon a little bit and ended up buying this.

Yeah. And I want to see the analysis that predicted that correlation between that and a book on SQL 2008.

Want to go one step further? How about the fact that most companies compare results to previous quarters and years? How ridiculous is it that you would compare your Q1 results to the previous year, without any other information? Or to compare the results from one region to another? You simply cannot make important decisions on such information without a consideration of all the external factors.

There are three kinds of lies: lies, damned lies, and statistics. -Benjamin Disraeli

Discussion

8 comments for “Predictive Analysis”

  1. People who use data mining in retail product placement piss me off. It’s a good idea online, because when you search for “basil”, you’ll still find the damn basil. But when they put it beside the tomatoes, even if you’re looking for tomatoes and basil, you’re liable to walk right by it.

    What’s next – Beano in the chili aisle? Migraine medication with the baby food? Ex-Lax with the bacon?

    Posted by Aaron Alton | May 5, 2009, 10:20 am
  2. I like the example of the correlation made between diaper sales and beer sales, spiking on Friday nights as new fathers are told to stop and pick up a few things on their way home from work.

    Posted by SQLBatman | May 5, 2009, 10:31 am
  3. Actually, for the record, I said “Produce” for the basil question. It’s “sooooo” the librarian in me to categorize, I realize that. That said, I’ve found that so many people use software and devices in ways they were not originally intended to be used. Predictive Analysis? Seems next to impossible to me. There are always variables that one cannot possibly think of. Now, time for lunch. I smell some bacon cooking :-)

    Posted by wnylibrarian | May 5, 2009, 11:33 am
  4. perhaps a bacon and basil sandwich?

    Posted by SQLBatman | May 5, 2009, 11:37 am
  5. I wonder why they don’t put the Tylenol next to the condoms, because every time I want to – wait – never mind.

    Posted by Brent Ozar | May 5, 2009, 12:47 pm
  6. It seems that you might have missed the point of predictive analytics. The idea isn’t to predict the future, but to give you an idea of what to expect. It’s not meant to be taken with absolute authority. That’s why we have things like confidence intervals and that’s why you test and re-test your mining models to see if they are producing results in the range that you deem accurate.

    Predictive analysis is based on taking the data that you have, and trying to extrapolate what that might mean for you in the future using that data. Statistical analysis can be used in predictive analysis and should be. You have to understand what has already happened to understand what might happen in the future. Amazon is taking into account your past searches, wish list items, past purchases, etc., and trying to predict what you might want to buy next and putting it in front of you. They do this by analyzing all the data they’ve gathered on you in the past and trying to predict what you might want now (as a future sale for them).

    I’m not sure I 100% understand the point of your post, so I could be entirely off base; you might be thinking about it too much.

    Posted by Brett Flippin | May 6, 2009, 8:22 am
  7. Brett,

    The point of my post was to hope that someone like yourself would read it and help me to understand a little more about the use of statistical analysis. Drop me an email, as I would enjoy discussing this with you further.

    When you said “The idea isn’t to predict the future, but to give you an idea of what to expect.”, how is that different than predicting the future? If I said that we could expect there to be a tornado in a trailer park next week, isn’t that also predicting the future to some degree?

    I understand the idea behind CIs, and if there is one point I would like to make it is that you are not going to ever have a perfect system, a perfect model, and you should not be trusting people to make business decisions that end up putting all of their eggs in one basket as a result of predictive analysis.

    Posted by SQLBatman | May 6, 2009, 10:21 am
  8. Your comment cleared quite a bit up in my understanding. For me, predicting the future is saying that with 100% certainty something is going to occur. Predictive analysis is the act of finding your “best guess” or what you expect to happen, based on the analysis of data you’ve already gathered. You aren’t predicting with 100% certainty unless your data is so good and complete that you can, which is virtually impossible.

    I guess that’s where I differ, because I wouldn’t call predictive analysis predicting the future; it’s more “this is what I think is going to happen in the future, with a reasonable margin of error.”

    I agree absolutely that people that wholly trust their mining models are missing the point of predictive analysis entirely. Analysis can give you insight, but only an intelligent person with a good mind for data can make correct decisions given all the data and all the analysis in the world.

    Posted by Brett Flippin | May 6, 2009, 2:15 pm
