Yesterday I showed you some real world SQL joins and their associated diagrams. I also left as an exercise for the reader this question regarding a standard three table join:
“explain why I could need 127 different diagrams”
I had a few people guess the answer on Twitter and G+ as well as some emails. So I decided to post the answer here today in case you wanted to understand it a bit more.
Let’s consider the seven regions that we have in the following diagram:
Now, consider the fact that no matter what type of join we want to execute (INNER, CROSS, LEFT, RIGHT, FULL OUTER) the end result is that we are either going to return data from one of those seven regions or not return data. In my diagram the green shaded regions represent areas for which we will return data and white is for regions where no data is returned.
In a nutshell: they are bits. On or off.
Most people still stumble at what needs to be done next, so I offer them a more familiar example. Consider you have a coin to flip, and also consider that after each flip we will either have a heads or a tails (yes, we will leave out the other possibilities such as landing on an edge, or disappearing into a wormhole, or having it swallowed by a marmot). If I flip that coin three times then this becomes the entire list of possible outcomes (H = result of flip is heads, T = results of flip is tails):
HHH
HHT
HTH
HTT
THH
THT
TTH
TTT
Eight possible outcomes, right? Most people see this example and say “yeah, I know that, but how does that help me with the above diagram?”
It is because each region in the diagram is just like a coin flip. Either we are getting data or not. Having seven regions is like having seven coin flips.
When I flip a coin three times I end up with 2^3 = 8 possible outcomes.
When I flip a coin seven times I end up with 2^7 = 128 possible outcomes. But one of those outcomes would be for when all regions are not selected, which is a fairly trivial example that we wouldn’t put on a diagram, so we are left with 127 total diagrams that we would need to draw.
That being said, many of the diagrams are repeats of each other. The join syntax for returning one table and the intersection of two other tables is the same; we just swap A with B with C and we can consolidate the need for a lot of the 127 diagrams.
But it was still far too many diagrams for me to think about wanting to draw and show the syntax for. Also: it isn’t practical, because the examples I gave yesterday are what we see in reality. Who wants to see textbook examples of joins anyway?
Just old math geeks like myself, I suppose.
How can I get only the area number 6 from your diagram using only sql joins?
You would filter to include the items which are in both 5 and 7, but exclude items that are in 1, 5, and 7.