Do you know what Buffer Pool is?

Chronicles of SQL Server's Storage Internals - Part 3

Sep 06, 2022

Previous post discussed what Pages are and why they're the most fundamental thing when it comes to Databases.

However, one thing that was left unclear was - HOW are the pages Read & Written to HDD?

Here's the thing - Databases are one of those things that experience REALLY HUGE I/O! As in - massive one! As in - "holy crap is that even possible" kind of load.

Being as busy as they are, it goes without saying that they had to be architected for massive I/O throughput.

However, as every Engineering student learns (in one way or another) every operation requiring disk access is to be avoided (unless you're ok with latency). And that's because HDDs (and even SSDs) are simply - SLOW. Slow as in - magnitudes slower than RAM memory.

So how do you make I/O access fast, when you know that you are bound to the HDD speed?

It's actually really simple -- you load data into RAM and serve all Reads/Writes from there! :)

And that's what Buffer Pool really is! It's a RAM cache of the Pages that are being READ / WRITTEN. It serves as a proxy, of a sort!

Whenever you request a piece of data, SQL Server will first check it's Buffer Pool and if the data is not there - it first ADDS it there and THEN and only THEN does it send back to client.*

Same goes for writing (e.g. INSERT, UPDATE, DELETE, etc.). All of those Page changes first go to Buffer Pool (well, strictly speaking, they first go to write-ahead log) and then after some time they are synced to disk.

At the end of the day, it's really a simple technique and yet quite a powerful one! It removes the bottleneck of HDD by leveraging the speed of RAM, at the cost of switching from synchronous to asynchronous model :)

Now, if you're one of those nerds, you might be wondering - so, if all INSERTS/UPDATES are written to RAM first, and we all know RAM is volatile, what happens if the server crashes BEFORE it syncs data to disk? Does it mean that data will be lost?

Good one, nerd! NO! I mean, YES, your statement is RIGHT, but the conclusion is WRONG.

Do you recall the first article I published? How .MDF and .LDF files are created? If not - now's the time to revisit that one :)

.LDF file is write-ahead log. That means that ALL writes will first be appended there (appending data is always blazing fast) and once you have it there (i.e. it's secure and we can recover in case of crash), then, and only then, is the Buffer Pool updated.

And yes, Write-Ahead log is the next article that I will publish next week :)

Until then!

* SQL Server actually tries being smart arse by preloading couple of pages as well, as it assumes you might need them as well.

Bitesized Engineering

Discussion about this post