How SQL Server stores Large(r) rows?
Chronicles of SQL Server's Storage Internals - Part 8
(There are three illustrations in this Newsletter instead of one. That’s because I think one makes not much sense without the other!)
Ever wondered why 8000 characters (or, you know, bytes) is the limit for most columns? It’s because your whole row has to fit into a single page (which, if you recall, is also 8KB in size!). This is especially true for fixed-size columns. Try creating two CHAR(8000) columns and see what happens (hint: you’ll get an error).
People complained over it, as you could imagine. And this limit got “relaxed”. But only for variable-sized columns (e.g. you can have multiple VARCHAR(8000) but you still can’t have fixed-size columns over 8KBs).
The implementation is rather simple and calls for introduction of three types of Data Pages:
IN_ROW data pages are exactly what the name implies. Your initial row data that has to fit into 8KB limit. There’s no way around this.
OVERFLOW data pages that allow you to store more than 8KBs worth of TOTAL data (i.e. your single column STILL has to fit into a single page, hence the 8000 limit on VARCHAR).
LOB data pages which allow you to store up to 2 gigs of data.
The inner workings are actually quite simple!
Suppose you define a table with two columns - CHAR(6000) and VARCHAR(3000). Their total size is 9000 bytes which exceeds the single page size.
I’ll give you three scenarios. Do note that CHAR column will ALWAYS allocate 6000 bytes, no matter what. Here’s what happens:
In first case, we add 1000 characters to VARCHAR column. Total row size is 6000 + 1000 = 7000 bytes. This fits a single page so SQL Server happily stores this in IN_ROW page.
In second case, we add 3000 characters to VARCHAR column. Total size now is 6000 + 3000 = 9000 bytes; and we’re over 8000 bytes limit; bummer!
What happens is, as usual - quite simple. SQL Server stores what it can inside the IN_ROW data page, but it also creates a NEW PAGE (called OVERFLOW_DATA page) where it stores the data that overflows (i.e. 3000 bytes from the second column).
First page gets to keep only a pointer to the new page.
In third case, we add 5000 characters to VARCHAR column. SQL Server reports error because that’s over the definition of the column (VARCHAR(3000) :)
That’s really how OVERFLOW works. Whenever a variable-lengthed column is about to go over the Page size, it gets moved to a separate Page, while the original page keeps just a pointer to it.
Keep in mind you STILL have to conform to 8KBs per overflow column, which is why you can’t specify anything more than 8000 for VARCHAR :)
Finally, you likely know that you can use VARCHAR(MAX) to store up to 2 gigs. So, what’s the deal with that?
The deal is that when you use MAX as length identifier, you are actually creating LOB (Large OBject) type of page. This one is also quite easy! It literally takes your big data, splits it into 8KB chunks, and stores it on as many pages as it needs, chaining them along the way. In a way it’s similar to OVERFLOW pages :)
And there you have it! Hope that sheds a bit more light to why 8000 and how it works under the hood!
Next time we are going to talk more about IAM pages and how they fit into the picture now.
Thanks for reading Bitesized Engineering! Subscribe for free to receive new posts and support my work.