Explain like I’m 5 (ELI5) comes from the popular concept that if you really know something well, you can explain it simply, so a five year old can understand. I’ll try to do an ELI5 about Slowly Changing Dimension (SCD) to prove I know it well. Also, this post has the added benefit of preparing study material for five year olds who want to become data engineers.
SCD is a concept in data warehousing that aims to solve how to store information about a dimension as it changes over time. The rate of change is considered slow, although I don’t know if there’s a percise function for calculating its speed.
My nephew just turned five. He loves dinosaurs, so I’ll pretend I’m explaining SCDs to him. His name is Rhys In-tech.
So Rhys, a dimension is just a fancy word for a person, place, or thing. It can be a thing that you can touch or a thing you can’t touch. You can touch your stuffed animals, they’re things, but a party is also a thing, but you can’t touch it. If you aren’t sure if something is a thing, try to put an “a” in front of it: a stuffed animal, a party, a car, a house. That’s a dimension!
So a stuffed dinosaur is a dimension because it’s a thing. But we have lots of stuffed dinosaurs, so let’s say we wanted to list them. We may give them numbers (surrogate keys) like 0,1,2,3 and 4, but these don’t mean anything about the animal itself. It’s just so we can quickly identify them. Also, if we ever want to give the two dinosaurs the same name, their numbers would distinguish them1.
So we list our animals! We record their number and some stuff about them (attributes). Let’s record their name and their description
But sometimes we may want to change our list. We may get new animals as presents! Or we may rename one. We would want our list to be up to date right? Adding a new animal is easy, we can just add it to the bottom:
Easy! But what if we want to rename one of our dinosaurs? That happens, doesn’t it? Well we have a few options of how to do this:
Let’s call this way Type 1. In this way, we make a whole new list with the updated info. So if we want to rename Brachie to Bronchy, we would pull out a new piece of paper and write:
Awesome! Only now if someone asked us about Brachie, we might not remember which one that used to be or when we changed his name. We don’t have any of his old data. If we lost a dinosaur and made a new list, we may forget that dinosaur ever existed.
Maybe we want to cross out the Brachie line lightly so we can still read his old name, and also write down the date that we crossed it out. That way, we can see his old name but we know it’s not real anymore. Instead of crossing it out, let’s put an end date to show where we would have crossed it out, then we can add his new information under it (with a new fake number). Let’s also record when we got them as a start date! Like this:
|Number||Name||Description||Date we got it||Date changed|
I know it looks like you have 5 dinosaurs when you really don’t, but that’s okay. You just have to look at all the rows where there is no “date changed” date.
We have all the information we want recorded, but I guess it looks like you have five dinosaurs when you don’t, and if people think you have more than you really do, they might stop buying you gifts. I get that. What if we recorded both names? The old and the new? Like this:
|Number||Original Name||Description||New Name|
Now we have both names and we know we only have four stuffed animals! The only problem is– what if the description changes too? Do we have to make a new column for everything that changes? Also, what if we change the name again? Do we make another column or lose the first name?
I personally favor Type 2 best. Do you? I get that, but I really think your Auntie Katie will buy you stuffed animals no matter how many are on your list.
1 When I was younger, I had two similar stuffed dinosaurs that I named Cindy and Cindy because of my long held belief that twins were given the same name. My sisters said I even named my Barbies the same thing until they explained to me that twins, although they may look similar, are usually given their own names. Obviously the need for surrogate keys was present even in my childhood.
One of many sources potentially on the web for SCD: http://www.oracle.com/webfolder/technetwork/tutorials/obe/db/10g/r2/owb/owb10gr2_gs/owb/lesson3/slowlychangingdimensions.htm