Database Isolation Levels - explained as if you were building a Snowman
Chronicles of making Databases Fun
Hi there! đ
Itâs Saturday and what a better way to spend 3 mins of your weekend than by refreshing your knowledge on DBâs Transaction Isolation Levels, right? RIGHT? đ€
I know, I know, who cares about that? Itâs a boring and hard-to-grasp topic. And you have your fancy ORM that does it for you anyway ⊠Which all makes sense until you learn that there are some weird scenarios like âLost Updatesâ and âDirty Readsâ. Ouch!
âWell, no problemâ, you say, âIâll just tighten my Transaction Isolation Level. I donât even need to understand how that works, because the higher the betterâ. And thatâs true - higher isolation level is better; until you see the performance hit :)
Hereâs what I want to do - I want to give you a frame in which to think about DB Transaction Isolation levels, but through lens of something that you can easily remember. Through the lens of Building a Snowman :)
As usual, we go with infographic first and then with more detailed text.
I hope that was amusing enough :) Because, Isolation levels are actually really cool concepts to grasp. They do have their own set of intricacies, but, for starters, you should at least be aware on what they are.
Before I give you more context, I just want to let you know of the results of my 15-sec google search on Default Isolation levels in each DB:
MySQL - REPEATABLE READ
PostgreSQL - READ COMMITTED
SQL Server - READ COMMITTED
Oracle - READ COMMITTED
I want to emphasize that I literally did a 5-sec search for each and this is what came as a result. As usual, your mileage on your production database may vary, so do make sure to check it out.
So what is this all about? Well, itâs really simple - Transaction Isolation Levels are all about what happens when multiple users are trying to read & write data from the same row. Thatâs concurrency. It tries to answer the question of âwhat will happen and how do I resolve the conflictsâ. You can also think of it as two users trying to edit the same Excel row at the same time. Thatâd be concurrency as well.
And hereâs what happens under each level:
Under READ UNCOMMITTED, everything is fully transparent and everybody sees everything. If one user starts a transaction and starts editing the data, all other users will see all their changes even though they werenât committed. This is great in cases where you know that concurrency is highly unlikely to happen (e.g. you have single writer and multiple readers or whatever).
The problem would arise if user reads something that another user decides to roll back. We call this scenario a âdirty readâ and depending on the context, consequences could range from âmehâ to âwell, weâre in deep shit nowâ.Under READ COMMITTED, database ensures that you see only the data that has been committed to DB. This is actually pretty good for most scenarios, except, and this is amusing - except if you wanted to ensure that you donât see different data within a same transaction.
Itâs a hard to grasp scenario so Iâll give you an example - long-running analytics query; or a machine learning thingy, whatever. Under READ COMMITTED, you would see some data that has been written to DB. All good. But imagine if your transaction was a long-running one (e.g. it lasts for several minutes). What could happen is that somebody modifies the data AGAIN, and if you re-read this youâd see a different value. This problem of possibly seeing DIFFERENT data under two SELECT statements in a single transaction is something that is not ideal for all scenarios. And thatâs what gets solved with REPEATABLE READ.Under REPEATABLE READ your transaction is guaranteed to see the EXACT SAME data all the time, for as long as transaction takes. Even if data changes in the meantime, you are still guaranteed that, in a single long-running transaction, you keep seeing the exact same values.
Frankly speaking, this is probably OK for 99% of the scenarios. Those 1% of cases would be if you want to ensure that no new data is added or removed while your transaction is running (e.g. think if your query depends on the exact number of rows). For such cases, you want to go with Seralizable Level.Under SERIALIZABLE level, DB ensures that concurrent transactions are executed serially (i.e. one after another). Think of it as physically locking the database while you are doing stuff in it. Itâs super-slow because thereâs no concurrency, but itâs effective if you need to ensure data is exactly the same.
And that would be it, really :) As you can see - itâs not that hard, right? READ COMMITTED would serve most of the use-cases, but if you need to level-up the game - you can always upgrade to REPEATABLE read.
Finally, based on your use-case, you could also consider DOWNGRADING the isolation level. For example, if you are 100% sure that two transactions would never touch the same row, or that theyâd operate on different columns - you can always consider easing the isolation and effectively improving the performance :)
Hope you enjoyed this article! If you did, Iâd appreciate if you share it with others as well :)
You can also subscribe if you havenât already :) I publish at least twice a week, but usually way more often:
Thanks for reading!