Linked servers are driving me crazy.
Well, not them exactly. Rather, how they are used, or misused, by people. I am not certain why, but people tend to think that a query pulling from a linked server should perform about the same as if the data were local to the box. Of course that is not true, but how am I to get inside people’s heads and make them understand that the data actually sits hundreds of miles away?
As if that were not enough, Microsoft has made our lives even more interesting by changing the behavior of the optimizer depending on how the queries are constructed. On the one hand, you can construct a query that performs the bulk of the work remotely and returns only the desired result set. On the other hand, the same query can be written in a way that brings back one row at a time across the link before processing, making your execution time much slower.
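To make that concrete, here is a rough sketch of the two shapes I mean. The server, database, and table names are made up for illustration, but the pattern is real: a four-part name leaves the remoting decision up to the optimizer, while OPENQUERY pushes the whole statement to the remote server.

```sql
-- Both queries return the same rows, but the optimizer can treat them
-- very differently. RemoteSrv, SalesDB, and Orders are hypothetical.

-- Four-part naming: the optimizer decides how much work to remote.
-- Depending on the provider and available statistics, it may pull the
-- whole remote table across the link and filter it locally.
SELECT o.OrderID, o.OrderDate
FROM RemoteSrv.SalesDB.dbo.Orders AS o
WHERE o.OrderDate >= '2010-01-01';

-- OPENQUERY: the pass-through text executes entirely on the remote
-- server, so only the filtered result set crosses the link.
SELECT *
FROM OPENQUERY(RemoteSrv,
    'SELECT OrderID, OrderDate
     FROM SalesDB.dbo.Orders
     WHERE OrderDate >= ''2010-01-01''');
```

Same result set, wildly different amounts of data on the wire, and most people writing the first query have no idea the second one exists.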
People are always amazed at this little nuance. But we live in a distributed world, and our data is not always going to be local. So what options do we have? Sure we could copy the data locally, perhaps have a database in two locations, but there are other costs associated with such a design.
My feeling is that our world, the distributed world, is growing at a faster pace than the technology we are currently using. And the first person or company that is able to break through this barrier and figure out a way to move data efficiently no matter how crazy the code is behind the scenes, well, they are going to make a ton of money.
I have no idea if anyone sees this as an issue right now or not. For me, it is an annoyance, but I do not see any evidence that it is high on anyone’s to-do list right now, not at Microsoft at least. They are certainly focusing on better analysis tools, but from what I can tell the data they are targeting is always local. Maybe if they ran their demos across some linked servers and saw a dip in performance they would look to design things better.
Then again, for all I know, this is as good as it gets.