Kicking of the Deep-dive-into-indexes series with a question that everyone is familiar with but probably a handful could reason fully about (**cough** myself included **cough**).
Why the HECK do we need indexes, a young padawan asks?
”Well”, that Senior on your team answers - “because they make stuff FASTER. That slow SELECT statement you have - indexes make it sonic speed!”.
But WHY, right? Why is DB not smart enough to be fast enough out of the box? Why do I need to EXPLICITLY make it fast? Where's all that AI that everyone flaunts about? How come in 50 years we still have to tell DB how to be fast?
Turns out there's a LOT to this story! And most of it originates from the era where Hard Drives actually had to do mechanical work in order to read data (i.e. spin the magnetic head, position the mechanical arm and extract a block of data; rinse&repeat until all is read).
If you recall past articles on the inner-design of Storage Engine, you’d know that EVERYTHING, and I mean EVERYTHING in your DB is stored on Pages:
Just like with book analogy - suppose that you wanted to search for a specific chapter. If you had no Table of Contents, your best bet would either be randomly sifting through pages, or sequentally turning until you find a proper one. And the bigger the book is, the more time it takes.
Well, Indexes are that missing Table of Contents. But you probably already knew that. And likely what you care about is how do they work, right? Well, that’s what the series of these articles will go in-depth of.
We’ll start with Binary Trees, then expand on B-Trees, then discuss WHY and HOW it’s all stored on Disk Drive and why are B-Trees superior, and then proceed to discuss Clustered Indexes, Heap Tables, Secondary Indexes, SARGable preicates, etc.
So, stay tuned!