Howdy!
I planned to publish this one on Saturday, but I got an unexpected “night off” yesterday so I spent it polishing this article and decided to kick it off today :)
The question that really bothered me was “why is the Stack faster than the Heap”? Truthfully, I wasn’t even aware of this before I jumped into these deep-dives. Well, I’ll do my best to answer it :) As usual, additional text can be found BELOW the image.
⚡ 𝐒𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐢𝐭𝐬𝐞𝐥𝐟 𝐢𝐬 𝐰𝐡𝐚𝐭 𝐦𝐚𝐤𝐞𝐬 𝐢𝐭 𝐟𝐚𝐬𝐭𝐞𝐫 - there's nothing inherently "better" about the Stack. Just like the Heap, it lives in RAM, and access times to RAM are the same either way. What distinguishes the Stack from the Heap is THE WAY IT'S USED.
▶ 𝐒𝐭𝐚𝐜𝐤 𝐢𝐬 𝐚𝐥𝐥𝐨𝐜𝐚𝐭𝐞𝐝 𝐰𝐡𝐞𝐧 𝐭𝐡𝐫𝐞𝐚𝐝 𝐬𝐭𝐚𝐫𝐭𝐬 - this is really the first and foremost thing. You DO NOT have to ask your OS for memory; instead, you get a preallocated chunk (e.g. 1MB per thread on Windows, by default). More specifically, what you get is a pointer set to the memory address where your chunk begins. Let's say that address is "0001d" (the "d" suffix means I'm using decimal notation - base 10 - to make it more readable). This is where your "int main()" kicks off. Furthermore, for the sake of simplicity, let's pretend that 1MB equals 500 of these made-up address units. That means your stack (i.e. your scratch space) lives between 0001d and 0500d.
Let's further assume that you have 3 variables inside main(), each taking up 10 units (I'm making the sizes up to go easy on your brain machinery). That means they get allocated at 0001d-0010d, 0011d-0020d, and 0021d-0030d. So far so good. But now we have another function call - foo() - and the previous variables shouldn't be visible there, right? So what do we do? We just move our stack pointer to 0031d, and foo() will think its Stack space runs from 0031d to 0500d. Easy peasy! We keep putting stuff on top, and so on. And once we are done with foo(), what do we do? We just move the stack pointer back to 0030d!
💡 (𝐃𝐞)𝐀𝐥𝐥𝐨𝐜𝐚𝐭𝐢𝐨𝐧 𝐢𝐬 𝐚 𝐬𝐢𝐧𝐠𝐥𝐞 𝐂𝐏𝐔 𝐢𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧 - given that we are really just moving up & down inside our own 1MB of space, all we need to do to allocate or wipe out data is move the pointer that tells us where our "roof" is :) And moving that pointer turns out to be a single CPU instruction (meaning - quite fast!). Compare that with a Heap (de)allocation, which takes many instructions to execute!
🔢 𝐀𝐝𝐝𝐫𝐞𝐬𝐬𝐞𝐬 𝐜𝐚𝐧 𝐛𝐞 𝐩𝐫𝐞𝐜𝐚𝐥𝐜𝐮𝐥𝐚𝐭𝐞𝐝 𝐚𝐭 𝐜𝐨𝐦𝐩𝐢𝐥𝐞 𝐭𝐢𝐦𝐞 - think about it this way: when you ask for memory from the Heap, you could get memory at any address. What's more, every time you ask for additional memory, it could come from a totally different segment. You just have no clue where your bits will end up (it's Random Access Memory, after all!). Contrast that with the Stack, where you know EXACTLY which memory belongs to you. And you know EXACTLY which variables will be put there (because each new one gets added on top of the old ones). What does that mean? It means the compiler can precalculate offsets and just apply them at runtime!
Here’s an example:
int main() {
  int a = 5;  // We know that int takes exactly 32 bits
  int b = 10; // Same as above
  // We also know that the pointer "c" itself takes 64 bits - but we
  // have no clue where the 32 bits allocated on the Heap will end up
  int* c = new int(15);
  delete c;   // Heap memory also has to be freed by hand
}
Now bear with me. We know for a fact that “a” will end up at Stack location zero (i.e. the beginning of the Stack). We also know that the next thing (“b” in this case) will end up 32 bits after “a”. And we also happen to know that the next thing (“int* c” - a pointer to an integer) will end up another 32 bits after “b”. So even without knowing the EXACT addresses, once we learn where our Stack begins, we can calculate where everything else is simply by applying offsets! But what we CANNOT know is where the value that “c” points to will end up! That’s allocated at runtime, and there’s no guarantee that anything allocated on the Heap will end up at consecutive addresses.
So the whole point is - for Stacks, the compiler can precalculate variable offsets and use them at runtime to simply point at the memory locations of its variables. And hell, I think that’s just beautiful!
📦 𝐒𝐭𝐚𝐜𝐤’𝐬 𝐛𝐢𝐭𝐬 𝐚𝐫𝐞 𝐮𝐬𝐮𝐚𝐥𝐥𝐲 𝐜𝐚𝐜𝐡𝐞𝐝 𝐢𝐧 𝐂𝐏𝐔 - this goes beyond my memory management articles, but you are probably aware that the CPU tries to cache both the bits you’re currently using and those at adjacent addresses (a gross oversimplification, I know!). Well, due to the nature of the Stack itself, most of that adjacent stuff will actually be needed soon and, as such, will end up close to the CPU, with very little latency to fetch. Amazing :)
And that’d probably be about it. The next article will likely dive a bit deeper to explain some concepts I feel aren’t clear yet, and after that I’m moving on to Heaps. And if you haven’t subscribed already - now’s a good time to do so :)
P.S. If you missed previous articles, here are the latest three: