Values and References - a deeper look into their behaviors

Chronicles of .NET runtime - Part 11

Apr 25, 2023

Hi there! 👋

Last time I discussed the topic of who lives where. And I mentioned that Values live on the Stack, whereas References live on the Managed Heap. All good. But since then, and since digging way deeper into the topic, I realized that this shouldn’t be the MAIN thing you care about. Should you know about it? Sure! But should it be the main thing you care about - no! The main point of consideration should be about the behaviors that they exhibit. And the behavior is determined based on how they are represented. So, let’s talk more about that.

If you’re a newcomer, this article is a continuation of my Deep-dive into CLR (i.e. .NET’s runtime) series. If you haven’t seen any of the past articles - I’d recommend you start from the beginning.

As usual, we start with the infographic first:

Let me immediately quote Eric Lippert’s rant on References vs Value types:

Almost every article I see that describes the difference between value types and reference types explains in (frequently incorrect) detail about what “the stack” is and how the major difference between value types and reference types is that value types go on the stack. I’m sure you can find dozens of examples by searching the web.
I find this characterization of a value type based on its implementation details rather than its observable characteristics to be both confusing and unfortunate. Surely the most relevant fact about value types is not the implementation detail of how they are allocated, but rather the by-design semantic meaning of “value type”, namely that they are always copied “by value”. If the relevant thing was their allocation details then we’d have called them “heap types” and “stack types”. But that’s not relevant most of the time. Most of the time the relevant thing is their copying and identity semantics.
Source: Eric Lippert’s Blog

Given that, indeed, it gets hard to find valuable information, doubly so if you are looking for an image representation, I decided to create one. And here is the enlarged version:

In order to explain the behavior, you need to understand WHY that behavior happens in the first place. And having an image should help.

Think of both Value and Reference types as Boxes. That’s it. Boxes. And there’s content inside of those boxes. What the exact content is, I’ll explain using the example of storing the integer “42” inside the box.

Value Types store the actual value inside of them. As in - the actual bits that make up the information that you are trying to store.

In the case of “42”, if we stored it inside a variable called “valueVar”, that variable would literally contain the bits that make up 42 (i.e. 000…00101010). I guess that’s simple enough to digest, right?

When it comes to Reference Types, the story is a bit different.

Reference Types store the address where a value can be found. And that address points to the Managed Heap.

In the case of “42”, if you were to store it in a variable called “referenceVar”, you’d effectively be storing the “42” (i.e. 000…00101010) somewhere on the Managed Heap, and the address where the value can be found (64 bits wide on a 64-bit process) would be stored inside “referenceVar”.

valueVar = 42; // valueVar now contains 101010 (i.e. 42 in binary)

referenceVar = 42; // referenceVar now contains 64-bit address where 42 can be found (e.g. 010110101010011...01010)

Why does that matter? Well, it matters because of the behaviors they exhibit. These behaviors are fundamentally different, and in order to understand them, we need to quickly remind ourselves of how CPUs work.

The CPU by itself has no knowledge of Value / Reference types. Hell, not only does it NOT have the knowledge, but it doesn’t care either. When it comes to the CPU, all it does is move bits back & forth between RAM & registers. That’s it.

So what really happens with Value and Reference types is that whatever is inside of them gets copied back & forth. If it’s an actual value - the value itself gets copied. If it’s a reference - well, the memory address gets copied. But ultimately, the CPU doesn’t care what it is! It’s just bits!
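To make the "whatever is inside gets copied" idea concrete, here is a minimal C# sketch of plain assignment. The types PointValue and PointRef are made up for this illustration (they are not from the runtime); the only difference between them is struct vs. class.

```csharp
using System;

// Hypothetical types for illustration: same field, different kind of type.
struct PointValue { public int X; }   // value type: the variable holds the bits
class PointRef { public int X; }      // reference type: the variable holds an address

static class CopyDemo
{
    // Copies a struct, mutates the copy, returns the original's X.
    public static int MutateStructCopy()
    {
        var a = new PointValue { X = 42 };
        var b = a;      // the bits of the value are copied
        b.X = 88;       // only the copy changes
        return a.X;
    }

    // Copies a reference, mutates through the copy, returns the original's X.
    public static int MutateThroughReferenceCopy()
    {
        var a = new PointRef { X = 42 };
        var b = a;      // only the address is copied
        b.X = 88;       // both variables point at the same object
        return a.X;
    }

    static void Main()
    {
        Console.WriteLine(MutateStructCopy());            // prints 42
        Console.WriteLine(MutateThroughReferenceCopy());  // prints 88
    }
}
```

In both cases the assignment does the same thing (copy what is in the variable); the different outcomes come purely from what the variable held.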

Let’s take a very simple example of passing both value & reference type to some function and then changing it:

valueVar = 42;

referenceVar = (42); // I added brackets to denote that it's a "reference" to 42 being stored elsewhere

changeVar(variable):
  variable = 88;

Can you blind-guess what happens with each of these? From the CPU’s point of view, it just copies whatever is inside the variable, and it doesn’t care about WHAT it represents.

So, in the case of valueVar, the bits that represent 42 get copied to another location on the Stack. And then those bits get updated to the bits that represent 88. But did it affect the original “42”? Of course not - you changed the bits of the copy of the value!

In the case of referenceVar, though, what gets passed to the function is the ADDRESS. The address where “42” is stored. And once you change it to 88, what you are effectively doing is storing “88” at the address where 42 was initially stored. As you can imagine, this effectively “overwrites” the value. But I think “overwrite” is a really shitty term. You are passing an ADDRESS, a WAYPOINT of a sort, and then saying to store something else at that ADDRESS. Whatever was there will be overwritten by the new value. So you are not really “overwriting the variable”, but you are effectively changing the content of whatever is stored at the address that the variable holds. Simple! :) (One caveat if you try this in real C#: a plain “variable = 88” on a reference-type parameter only repoints the function’s local copy of the address. To change what lives AT the address, you write through it, e.g. by setting a field on the object.)
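Here is how the changeVar idea could look in actual C#. Treat it as a sketch under assumptions: ValueBox and ReferenceBox are hypothetical types invented for this example, and the "change" is a write to a field, since that is how you store something at the address a reference holds.

```csharp
using System;

// Hypothetical box types, invented for this sketch.
struct ValueBox { public int Content; }      // value type: holds the bits themselves
class ReferenceBox { public int Content; }   // reference type: variables hold an address

static class PassingDemo
{
    // Receives a copy of the struct's bits; the write lands on the copy.
    static void ChangeValue(ValueBox box) { box.Content = 88; }

    // Receives a copy of the address; the write lands on the shared object.
    static void ChangeReference(ReferenceBox box) { box.Content = 88; }

    public static int ValueAfterCall()
    {
        var valueVar = new ValueBox { Content = 42 };
        ChangeValue(valueVar);
        return valueVar.Content;
    }

    public static int ReferenceAfterCall()
    {
        var referenceVar = new ReferenceBox { Content = 42 };
        ChangeReference(referenceVar);
        return referenceVar.Content;
    }

    static void Main()
    {
        Console.WriteLine(ValueAfterCall());      // prints 42
        Console.WriteLine(ReferenceAfterCall());  // prints 88
    }
}
```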

But let’s say, for the sake of fun, that you want to update the content of referenceVar itself. So you don’t want to update the content at the address that referenceVar holds, but rather the variable itself. So, how do you do it? Well, it turns out that’s exactly what the “ref” keyword in C# does - it passes the address of referenceVar (i.e. a reference TO THE VARIABLE), so any assignment to it effectively overwrites the content of the variable itself.
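A small sketch of the difference "ref" makes, again using a hypothetical ReferenceBox class that exists only for this example. Both methods assign a brand new object to their parameter; only the "ref" version changes what the caller's variable points to.

```csharp
using System;

class ReferenceBox { public int Content; }   // hypothetical reference type

static class RefDemo
{
    // box is a copy of the caller's address; reassigning it is invisible to the caller.
    static void ReplaceWithoutRef(ReferenceBox box)
    {
        box = new ReferenceBox { Content = 88 };
    }

    // box is an alias for the caller's variable; reassigning it changes that variable.
    static void ReplaceWithRef(ref ReferenceBox box)
    {
        box = new ReferenceBox { Content = 88 };
    }

    public static int ContentAfterReplaceWithoutRef()
    {
        var referenceVar = new ReferenceBox { Content = 42 };
        ReplaceWithoutRef(referenceVar);
        return referenceVar.Content;
    }

    public static int ContentAfterReplaceWithRef()
    {
        var referenceVar = new ReferenceBox { Content = 42 };
        ReplaceWithRef(ref referenceVar);
        return referenceVar.Content;
    }

    static void Main()
    {
        Console.WriteLine(ContentAfterReplaceWithoutRef());  // prints 42
        Console.WriteLine(ContentAfterReplaceWithRef());     // prints 88
    }
}
```

The same keyword works for value types too: a "ref int" parameter lets the function write straight into the caller's variable instead of a copy.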

Finally, let’s talk about Boxing and Unboxing. They are just ways to convert between Value and Reference types. And that’s actually pretty much it. Boxing a value type means - take the value from the variable, copy it somewhere on the heap, and hand back the address of that copy, which you then store in a reference-typed variable (e.g. an “object”). The original variable is untouched and keeps its own bits. Unboxing is the reverse - it takes the value from the heap and copies its bits into a value-type variable.

Here’s a very simple piece of C# code that shows that:

int valueVar = 42; // Value Type

object referenceVar = valueVar; // Boxes the "valueVar" by copying 42 to somewhere on the Heap, and then stores the address in referenceVar

int anotherValueVar = (int) referenceVar; // Does the exact opposite by "unboxing"

For the sheer fun of it, here’s the IL code for the code above:

IL_0000: nop
IL_0001: ldc.i4.s 42
IL_0003: stloc.0
IL_0004: ldloc.0
IL_0005: box [System.Runtime]System.Int32
IL_000a: stloc.1
IL_000b: ldloc.1
IL_000c: unbox.any [System.Runtime]System.Int32
IL_0011: stloc.2
IL_0012: ret
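One consequence worth seeing in code: since boxing COPIES the value to the heap, the box lives its own life afterwards. The sketch below (the class and method names are just mine, for illustration) changes the original variable after boxing and also boxes the same variable twice.

```csharp
using System;

static class BoxingDemo
{
    // Boxes 42, then changes the original variable, then unboxes.
    public static int UnboxAfterOriginalChanged()
    {
        int valueVar = 42;
        object referenceVar = valueVar; // box: copies 42 into a new heap object
        valueVar = 88;                  // changes only the original variable
        return (int)referenceVar;       // unbox: copies the boxed 42 back out
    }

    // Boxes the same variable twice and checks whether both boxes are the same object.
    public static bool TwoBoxesAreSameObject()
    {
        int valueVar = 42;
        object first = valueVar;        // first heap copy
        object second = valueVar;       // second, separate heap copy
        return object.ReferenceEquals(first, second);
    }

    static void Main()
    {
        Console.WriteLine(UnboxAfterOriginalChanged());  // prints 42
        Console.WriteLine(TwoBoxesAreSameObject());      // prints False
    }
}
```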

Here’s an amusing piece of code:

int fooBar = 42;

Console.WriteLine("Value is: {0}", fooBar);

Can you guess why it has to box fooBar before printing it? :)

And lastly, again for the sheer fun of it, I asked ChatGPT to summarize Values and References as if I were five years old, and here’s what came up as a result:

Okay, let's imagine you have a toy car, and you want to give it to your friend to play with.
Now, there are two ways you can do this:
You can give your friend the exact same toy car that you have. In this case, your friend has a copy of the car, but it's still a separate car from yours. This is like a "value type."
You can show your friend your toy car and say "Here, play with this car!" In this case, your friend is playing with the same car you have, because you're referring to the car that you already have. This is like a "reference type."
In programming, value types are like a copy of the toy car, while reference types are like the original toy car that you're referring to. When you use a value type, you're creating a new thing that's separate from the original thing. When you use a reference type, you're talking about the same thing that already exists.
Source: ChatGPT

And that’d be all for today. I hope you enjoyed it and if you did - do consider subscribing if you haven’t already :)

Thanks for reading!
