All the blabbing that I did about Trees would make zero sense if I didn't mention WHY is it all like that and WHAT do we get out of it. And this is what I want to talk about today :)
Obviously, we all know that Indexes speed up search; or at least most of the time they do. I hope that by now we're all clear on HOW they are organized, how data is stored, etc. But one interesting topic is all the ways we can search the B+Trees.
First one is rather obvious - searching for a SPECIFIC record (e.g. SELECT ... WHERE id = 12345) is a typical. It'd take the height of tree (which usually doesn't exceed 2-3 levels btw!) to get to the appropriate record.
However, there's also a cool one called RANGE scan, which also happens to be blazing fast (e.g. SELECT ... WHERE id BETWEEN x AND y). Namely, given that B-Tree is a form of Binary Tree, and it follows some of the constraints BSTs have (i.e. all left children have to be LOWER than parent, and all RIGHT ones have to be HIGHER than parent), combined with a fact that data is ALWAYS found on leaf level AND that leaves are linked - you get a framework for doing blazing fast range searches.
Say you were looking for a DATE range - a date between X and Y. Well, all you need to do is to drive down to the first leaf that has X value (i.e. lowest one), and then simply traverse leaves (remember: they are linked!) until you reach a node that contains Y. It's basically O(logn + |total # of nodes that satisfy the condition|), which is amazing!
(one might argue that range scan in BST is fast as well - just do in-order scan, but one would be at wrong as well because nodes might be on different Pages, so you'd have to do A LOT of loading)
Finally, and this will become more obvious in next article where we discuss Clustered Tables, if you iterate over a key that is indexed - you don't have to sort! Your data is already sorted for you simply due to the nature of B-Trees (left & right children are lower/higher bla bla)!
Point being - carefully choosing WHICH columns to index might provide TREMENDOUS efficiency, all due to the nature of B+Trees. And I think that's just amazing :)
Next week we're going to discuss Heaps & Clustered Tables and Hash Indexes. But until then - stay tuned!
Mihailo, after reading a sequence of your articles on B Trees, HDD and Indexes, I found these topics to be very interesting, even though I do not cross paths with them at work. Surely its the way how you present it, very amusing and full of enthusiasm :)