Quick question: Where would you go to find basil in your local grocery store? Did you say “by the other herbs?” Or did you say “by the tomatoes?” If you are like me, you would go looking for basil by the other herbs because, well, that is a natural, logical grouping of products. However, since many people associate basil with tomatoes, as in a pasta sauce, some stores might put their basil there instead. Many stores will do both, but stores that are short on shelf space may have to choose one over the other.
So, I was “that guy”. Yeah, the one in a million who actually needed basil for a recipe that did not call for tomatoes. I went to the section where all the herbs are located and did not find any basil. I then had to report back to Congress that I had failed, that the mission was not accomplished, and we went ahead with dinner plans even though we knew our meal was not going to be complete.
Why am I telling you this? Good question. Any other questions?
See, part of the BI seminar I attended last week focused on what was called “Predictive Analysis”. Part of this discussion involved product placement in grocery stores, such as basil and tomatoes. When enough people buy things together, stores will start to associate those products and group them in displays. Hey, it makes sense, and I doubt the idea of putting basil and tomatoes together is anything new that required a 1.7TB cube to be processed in Excel for some grocery executive. But what is interesting is the idea that you can examine your data (what they call “data mining”) to look for trends so that you can predict the likelihood of future events.
Which is where I call bullshit.
See, I have worked off and on in the financial world for over fifteen years now. And there is one thing you learn right at the start, and that is this:
Past performance does not guarantee future results.
In other words, you cannot predict the future. Yet the title of this part of the session would lead you to believe otherwise. There are thousands of business leaders out there right now, many of them in the financial world, who think that if they have enough data they can predict the future. And that is just not true. Take my case, for example: I just wanted basil. Sure, perhaps you could argue that I fell outside of your 95% confidence interval, that outliers will always exist, and that your models work a majority of the time. But the end result is the same: you cannot predict events, whether or not you have a crapload of data at your disposal.
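To put some numbers on the basil-and-tomatoes idea: this kind of grocery analysis is usually called market basket analysis, or association rule mining, and it typically reports two figures for a rule like “basil → tomatoes”: support (how often the items show up together) and confidence (how often tomatoes show up given basil). Below is a minimal Python sketch under my own assumptions; the baskets are invented for illustration, and the `support` and `confidence` helpers are hypothetical names of mine, not any BI product's API.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"basil", "tomatoes", "pasta"},
    {"basil", "tomatoes"},
    {"tomatoes", "mozzarella"},
    {"basil", "tomatoes", "garlic"},
    {"basil", "chicken"},  # "that guy": basil, no tomatoes
    {"oregano", "thyme"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset (hypothetical helper)."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent in basket | antecedent in basket) (hypothetical helper)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

# "Mining": count every pair of items bought together and find the most common pair.
pair_counts = Counter(
    pair for b in baskets for pair in combinations(sorted(b), 2)
)
top_pair, top_count = pair_counts.most_common(1)[0]

rule_support = support({"basil", "tomatoes"}, baskets)
rule_conf = confidence({"basil"}, {"tomatoes"}, baskets)

print(f"most common pair: {top_pair} ({top_count} baskets)")
print(f"support(basil, tomatoes) = {rule_support:.2f}")
print(f"confidence(basil -> tomatoes) = {rule_conf:.2f}")
```

With these made-up baskets, basil and tomatoes come out as the most common pair, and the rule has a confidence of 0.75. That is a model that “works a majority of the time”, and it still says nothing useful to the one shopper in four who bought basil without tomatoes.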
When I was participating in Six Sigma training last year, we spent considerable time on a section called, quite simply, statistical analysis. As near as I can tell, there is little difference between regular statistical analysis and predictive analysis except for (1) the names and (2) the idea that someone wants you to believe that you can predict the future. In fairness, I can give you two examples where I think a reasonable case could be made for predictive analysis. One is Amazon, and the other is Netflix. Both services do a great job of trying to sell you additional products based upon either your recent selections or the selections of other people who made similar choices. If you want to call this predictive analysis, fine. I call it suggestive selling, based upon statistical analysis, but perhaps that is because I understand you cannot predict the future.
Need another example? Well, check out my library bookshelf. I started placing reference books onto my site, and yes, they link back to Amazon. If someone stops by and orders a book, I get roughly a dollar for providing the link. Not big money, and no, I am not looking to get rich; I just want to buy a cup of coffee every now and then. So far, in less than a week, no books have been sold, which is to be expected. But what was not expected was that someone decided to browse Amazon a little bit and ended up buying this.
Yeah. And I want to see the analysis that predicted a correlation between that and a book on SQL 2008.
Want to go one step further? How about the fact that most companies compare results to previous quarters and years? How ridiculous is it to compare your Q1 results to the previous year's without any other information? Or to compare the results from one region with another's? You simply cannot make important decisions on such information without considering all the external factors.
“There are three kinds of lies: lies, damned lies, and statistics.” -Mark Twain, attributing it to Benjamin Disraeli