A word (or two) about Write-ahead logging

Chronicles of SQL Server's Storage Internals - Part 4

Sep 06, 2022

How does DB ensure durability (i.e. that your data isn't lost)? Especially given that it uses Buffer Pool and in-memory cache?

It's actually EXTREMELY simple - it uses a technique called Write-ahead log!

And yes, those are what the .LDF files are for :)

I want to emphasize that this isn't something that DBs invented. This technique has been employed for as long as WRITING has existed!

For thousands of years have people written down the raw words spoken by emperors and wise people. And they used them so that others can, wait for it, recreate and re-engage the wisdom of those who have spoken!

The only difference now is that YOU are the wise one whose words (or, more specifically - SQL commands) are jotted down.

It all boils down to following - whenever you send a command that will alter the state of Database (e.g. a new INSERT or UPDATE or table ALTER or CREATE INDEX), the FIRST thing you do is WRITE IT DOWN IN ITS RAW FORM. IMMEDIATELY! Caps lock is intentional because it's SO IMPORTANT to do it IMMEDIATELY :)

Before you mumble and argue that that disk i/o is expensive bla bla - STOP. Random disk reads are expensive, yes. But, if you do append-only writes - that's quite fast actually!

Write-ahead log is append-only file which NEVER gets modified. You just keep appending raw commands to it so it's really blazing fast.

Now imagine that an abrupt crash happens (e.g. somebody trips over the power cable of your always-on cloud server) and your RAM gets removed. How do you recover from that?

Simple! You just take everything that was said and re-apply it to DB :)

Replaying the commands in the same order that they came in is guaranteed to get you back to the same state that you were in in the first place. Again - really simple!

You can do the same to REPLICATE as well - you just send those wise thoughts to another server and have it re-execute. As long as your commands are deterministic (and DBs will always bother you if they aren't!) - you will ALWAYS end up at exactly the same state.

Having this raw log is also useful for auditing! As in - checking WHO did what (and then asking them for WHY).

Same goes for Security auditing - if an incident happens, you can always check WHICH commands were executed, WHEN and by WHOM.

All in all - extremely powerful and yet really damn simple technique!

Bitesized Engineering

Discussion about this post