How HDDs work (so that we can understand B-Trees better!)

Chronicles of DB Indexes - Part 4

Sep 26, 2022

Kicking off the new bitesized week with how HDDs work. We need this knowledge because B-Trees were designed with HDDs in mind! So, understanding B-Trees requires understanding WHY they were made the way they were :)

HDDs (and frankly not even SSDs) can't access single 1s and 0s. Instead, they read a single chunk of data at a time (usually 4KB in size), and this chunk is called "a block". Because that's what it really is: a block of 1s and 0s packed together.
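To make the "whole block or nothing" idea concrete, here is a tiny sketch that works out which blocks the disk has to read to serve a request for some bytes. The function name `blocks_touched` and the 4096-byte block size are assumptions made for illustration, not something a real disk exposes.

```python
BLOCK_SIZE = 4096  # bytes; assumes the 4KB block size mentioned above


def blocks_touched(offset: int, length: int, block_size: int = BLOCK_SIZE) -> range:
    """Return the block numbers that must be read to get `length` bytes
    starting at byte `offset`. (Hypothetical helper, for illustration only.)"""
    if length <= 0:
        return range(0)
    first = offset // block_size                 # block holding the first byte
    last = (offset + length - 1) // block_size   # block holding the last byte
    return range(first, last + 1)


# Asking for a single byte still costs a whole block read.
print(list(blocks_touched(offset=0, length=1)))
# Two bytes that straddle a block boundary cost two block reads.
print(list(blocks_touched(offset=4095, length=2)))
```

So even if you only want one byte, the disk hands you the entire 4KB block it lives in.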

The way this all works is by having a magnetic platter that spins at very high speed (thousands of rotations per minute), and your 1s and 0s are stored on that platter. It's really SIMILAR (although quite different lol) to a gramophone. You have a spinning disk (a magnetic one instead of a plastic one) and an arm that hovers over part of the disk. That arm reads a whole block from a single position: with 4KB blocks that's 4096 bytes, or 32,768 bits (ones and zeros), in one go.

As you can imagine, reading 10 consecutive blocks is blazing fast - you just keep spinning in the same direction and read as you go.

BUT! And this is the tricky part - if your data ends up fragmented (some blocks on the NORTHERN side, some on the SOUTHERN side) - then the arm has to keep moving and the disk has to keep rotating into position to collect all the fragments. And that sucks!
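Here is a rough back-of-the-envelope model of why fragmentation hurts. The two constants are made-up, illustrative numbers (not measurements of any real drive): every time the next block is not right after the previous one we pay a "repositioning" cost, and every block also has a small read cost. The name `read_cost_ms` is hypothetical.

```python
SEEK_MS = 10.0      # ASSUMED cost of repositioning (arm move + waiting for rotation)
TRANSFER_MS = 0.05  # ASSUMED cost of reading one block once in position


def read_cost_ms(block_numbers):
    """Estimate the time to read blocks in the given order under the toy model."""
    cost = 0.0
    previous = None
    for block in block_numbers:
        # A jump happens on the first read and whenever blocks are not adjacent.
        if previous is None or block != previous + 1:
            cost += SEEK_MS
        cost += TRANSFER_MS
        previous = block
    return cost


sequential = list(range(100, 110))     # 10 consecutive blocks
fragmented = list(range(0, 1000, 100)) # 10 blocks scattered across the disk

print(read_cost_ms(sequential))  # one jump, then ten cheap reads
print(read_cost_ms(fragmented))  # ten jumps, ten cheap reads
```

Under these assumed numbers the scattered read comes out roughly ten times slower, even though both cases read exactly 10 blocks.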

So to better understand all of this, I want you to think of it as a CRANE. The crane is your mechanical arm, and disk blocks are containers full of data. That crane can't pick up a single object (due to the sheer size of its hook); instead it has to pick up a whole container at once.

So if you aim to design anything efficient - you want to pack as much related data as possible into as few containers as possible, and keep those containers next to each other. And surprise surprise, as you will see in later articles - that's exactly what B-Trees do :)
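A quick bit of arithmetic shows why packing matters. This is just a sketch: the 16-byte record size is an assumption picked for the example, and `blocks_needed` is a hypothetical helper.

```python
import math

BLOCK_SIZE = 4096  # bytes; assumes the 4KB block size used in this post


def blocks_needed(record_count: int, record_size: int,
                  block_size: int = BLOCK_SIZE) -> int:
    """How many blocks are needed if we pack as many whole records as fit into each block."""
    records_per_block = block_size // record_size
    if records_per_block == 0:
        raise ValueError("a record does not fit into a single block")
    return math.ceil(record_count / records_per_block)


# 1000 small (16-byte) records packed tightly: 256 fit in each block.
print(blocks_needed(1000, 16))
# The same 1000 records stored one per block (as if each filled a whole block).
print(blocks_needed(1000, BLOCK_SIZE))
```

Packed tightly, the crane lifts 4 containers instead of 1000 to get the same data.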

Stay tuned, as there's more to come in the coming days!

Bitesized Engineering
