“Oh I just needed something random but also human-readable,” you said, as you casually called .Substring(0, 8) on a UUID.
You probably also “casually” mutilate animals like you did to that poor UUID. Great job on that name, too, Shakespeare. Item_019b1999 is going to be the next buzzword all the youths are yelling. Very human-readable.
As if that wasn’t bad enough, you kept the first eight characters and not the last. Do you even know what a UUID is? Let me show you.
Here’s the popular UUIDv7 everyone uses:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           unix_ts_ms                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          unix_ts_ms           |  ver  |       rand_a          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                        rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
It’s a terrible diagram, so let me simplify that for you:
All of the randomness is at the end!
Okay, okay, the first 12 characters are actually an encoded Unix timestamp with millisecond precision, so if you kept all 12, only IDs generated within the same millisecond would collide. Let’s see what happens as you truncate the UUID further.
| Length (chars kept) | Interval where all IDs get the same value | Rough human equivalent |
| --- | --- | --- |
| 12 | 1 ms | Camera flash |
| 11 | 16 ms | Monitor refresh |
| 10 | 256 ms | Slow mosquito flaps its wings |
| 9 | ≈ 4 s | Sound of a firework travels one mile |
| 8 | ≈ 1 min | Toweling off body after shower |
| 7 | ≈ 17.5 min | Cartoon show episode |
| 6 | ≈ 4.7 hr | Cook 20lb turkey |
| 5 | ≈ 3 day | Roof replacement |
| 4 | ≈ 50 day | Oldest fruit fly |
| 3 | ≈ 2 yr | Parmesan cheese aging |
| 2 | ≈ 35 yr | Chinese zodiac cycles through all animals almost three times |
| 1 | ≈ 557 yr | The Ottoman empire |
By truncating your UUID to 8 characters, you’ve ensured that all items generated while I was microwaving my rice have the same value. Congratulations on creating a nightmare.
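If you would rather check my math than trust a table, here is a small Python sketch. It assumes the UUIDv7 layout in the diagram (a 48-bit millisecond timestamp filling the first 12 hex characters), and the function name is made up for this post, not taken from any library.

```python
def shared_prefix_window_ms(kept_chars: int) -> int:
    """Milliseconds during which every UUIDv7 shares the same first `kept_chars` hex characters.

    Assumes the cut lands inside the 48-bit timestamp (1 to 12 characters kept).
    Each hex character you drop throws away 4 timestamp bits, so the window
    grows by a factor of 16 per character.
    """
    if not 1 <= kept_chars <= 12:
        raise ValueError("kept_chars must be between 1 and 12")
    return 16 ** (12 - kept_chars)


# Print the window for every prefix length, longest first.
for kept in range(12, 0, -1):
    window_ms = shared_prefix_window_ms(kept)
    print(f"{kept:>2} chars -> {window_ms:,} ms (~{window_ms / 1000:,.1f} s)")
```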
Oh, but Andy, we use UUIDv4 where it’s all random
(Some libraries just call this a UUID but they really mean v4)
- Not true; about five percent of it (the 6 version and variant bits out of 128) is not random, and
- You’re missing the point
The reason to use a UUID is for uniqueness. If you don’t want it, generate some random bits yourself.
As you truncate your UUIDv4, here’s how many IDs you can generate until you have a greater than 50% chance of a collision.
| UUID Length (chars) | Number generated | Rough human intuition |
| --- | --- | --- |
| 32 | 2.7 Quintillion | A third of all insects on earth (How???) |
| 31 | 680 Quadrillion | Atoms of gold worth $0.33 ($4,300/oz) |
| 30 | 170 Qa | Kg of mass to power the Sun for 1.3 yrs |
| 29 | 42 Qa | 260k years worth of parcels shipped (161B/yr) |
| 28 | 10.5 Qa | Nanoseconds in 350 years |
| 27 | 2.5 Qa | Volume of Lake Superior (gal) |
| 26 | 660 T | 2x the global real estate market ($) |
| 25 | 165 T | Data center energy usage in 2023 (kWh) |
| 24 | 41.5 T | Cells in a human body |
| 23 | 10 T | Trees on 3.5 earths |
| 22 | 2.5 T | Meters to Uranus |
| 21 | 650 B | 136 years of orders at Amazon (9k/sec) |
| 20 | 160 B | Stars in the Milky Way |
| 19 | 40 B | Neurons in a Gorilla |
| 18 | 10 B | People on earth |
| 17 | 2.5 B | Ping pong balls to fill 16 Olympic pools |
| 16 | 1.3 B | All dogs |
| 15 | 316 MM | People in USA |
| 14 | 79 MM | Travelers passing through LAX each year |
| 13 | 20 MM | 3 days of orders at Amazon (9k/sec) |
| 12 | 20 MM (yes) | 3 days of orders at Amazon (9k/sec) |
| 11 | 5 MM | LEGOs produced every 3 days |
| 10 | 1.2 MM | 30kg of white rice grains |
| 9 | 300 k | Monster energy drinks sold every 4 hours |
| 8 | 80 k | 9 minutes of orders at Amazon (9k/sec) |
| 7 | 20 k | Pickleball courts built in 2024 |
| 6 | 5 k | Pack of staples |
| 5 | 1.2 k | Stack of paper the height of a soda can |
| 4 | 300 | Bag of Dum Dums lollipops |
| 3 | 80 | People on a city bus |
| 2 | 20 | Seconds for a human to urinate |
| 1 | 5 | Your IQ if you truncate a UUID |
So yeah, if you truncated your UUIDv4 to 8 characters at Amazon you’d probably get your PIP in 6 months instead of the 2 year average.
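For anyone who wants to reproduce those thresholds, here is a rough Python sketch. It assumes you truncate from the end, that hex character 13 is the version nibble and character 17 carries the two variant bits, and it uses the standard birthday approximation rather than an exact calculation. Both function names are hypothetical.

```python
import math


def random_bits_in_prefix(kept_chars: int) -> int:
    """Random bits in the first `kept_chars` hex characters of a UUIDv4."""
    bits = 0
    for position in range(1, kept_chars + 1):
        if position == 13:
            continue  # version nibble: fixed, contributes no randomness
        # Character 17 holds the 2 fixed variant bits plus 2 random bits.
        bits += 2 if position == 17 else 4
    return bits


def ids_until_collision(random_bits: int, probability: float = 0.5) -> float:
    """Approximate number of IDs before a collision becomes `probability` likely.

    Birthday approximation: n ~ sqrt(2 * N * ln(1 / (1 - p))), with N = 2**random_bits.
    """
    space = 2.0 ** random_bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - probability)))


for kept in (32, 16, 8):
    bits = random_bits_in_prefix(kept)
    print(f"{kept:>2} chars, {bits:>3} random bits -> ~{ids_until_collision(bits):,.0f} IDs")
```

The three printed values should land close to the 32, 16, and 8 rows of the table. It also explains the "(yes)" row: dropping character 13 costs you nothing because it was never random.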
And this is just when the probability of a collision crosses the 50% threshold, which you would never even want to get close to. If you’re going to produce 100 billion UUIDs over the lifetime of your app (very realistic in modern enterprise), you want the probability of a collision to be vanishingly small, approaching 0.
For the sake of argument, let’s say that while you YOLO your way through life failing up the entire time, you decide that a 1% probability of collision is “good enough”. After truncating your UUID to 8 characters, you will hit a 1% chance of collision after just 9,300 IDs. That’s the number of steps a Spaniard takes in a day (three times as many as you).
Hell, even if you deigned to allow the UUID to retain half its original length (16 characters), you’d still have a 1% chance of collision after 150 MM IDs, or the number of Snickers bars produced in 10 days.
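You can also run the question in the other direction: given a number of IDs, how likely is a collision? A minimal sketch, again leaning on the birthday approximation and assuming 122, 60, and 32 random bits for a full, 16-character, and 8-character UUIDv4 respectively. The function name is mine.

```python
import math


def collision_probability(num_ids: float, random_bits: int) -> float:
    """Approximate chance of at least one collision among `num_ids` random IDs.

    Birthday approximation: p ~ 1 - exp(-n^2 / (2 * N)), with N = 2**random_bits.
    math.expm1 keeps the answer accurate when p is extremely small.
    """
    space = 2.0 ** random_bits
    return -math.expm1(-(num_ids ** 2) / (2.0 * space))


# 100 billion IDs with a full UUIDv4 (122 random bits).
print(f"full UUIDv4: {collision_probability(100e9, 122):.3e}")
# The same 100 billion IDs after truncating to 16 characters (60 random bits).
print(f"16 chars:    {collision_probability(100e9, 60):.3e}")
# 9,300 IDs after truncating to 8 characters (32 random bits).
print(f"8 chars:     {collision_probability(9_300, 32):.3e}")
```

The first number is the kind of "approaching 0" you actually want. The second is a near-certain collision, and the third sits right around 1%.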
And you’re still missing the point.
It’s not human readable anyway
- End users don’t give a shit what your IDs look like, they’re going to copy and paste if they ever need to (which is roughly never)
- Assumed user preference does not DICTATE HOW FUCKED YOUR DATABASE SHOULD BE
Use a different encoding
Did you know you can get shorter IDs without sacrificing uniqueness? UUIDs are hex-encoded (base 16) and only 4 bits of information can fit into each character. If you change your encoding to base 32, wow you can have an ID of length 26 instead of 32. If you used raw base 64, you could get down to 22 characters.
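Here is a quick sketch of that re-encoding using only Python's standard library. It uses the RFC 4648 base32 and URL-safe base64 alphabets with the padding stripped, which is one reasonable choice among several, not the only way to do it.

```python
import base64
import uuid

some_id = uuid.uuid4()

# Same 128 bits, three different spellings.
hex_form = some_id.hex  # 4 bits per character
base32_form = base64.b32encode(some_id.bytes).decode("ascii").rstrip("=")  # 5 bits per character
base64_form = base64.urlsafe_b64encode(some_id.bytes).decode("ascii").rstrip("=")  # 6 bits per character

print(len(hex_form), hex_form)
print(len(base32_form), base32_form)
print(len(base64_form), base64_form)

# Nothing was lost: restore the padding and decode back to the original UUID.
padded = base64_form + "=" * (-len(base64_form) % 4)
restored = uuid.UUID(bytes=base64.urlsafe_b64decode(padded))
print(restored == some_id)
```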
This is the idea behind the TypeID specification: a type-specific prefix followed by a base32 encoded UUID. Now you can generate IDs like Item_01kccskbjfff08mh2ttwpvjf9c which are just as human readable as before (meaning kind of but not really) without sacrificing the entire reason for their existence.
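To make the prefix-plus-base32 idea concrete, here is a sketch in Python. It is an illustration of the technique, not a validated implementation of the TypeID spec: the lowercase Crockford-style alphabet and the most-significant-bits-first packing are assumptions (they do happen to decode the example ID above to a UUID starting with 019b1999), and all function names are hypothetical.

```python
import uuid

# Lowercase Crockford-style base32 alphabet (no i, l, o, u). Assumed, see above.
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def encode_suffix(value: uuid.UUID) -> str:
    """Spell the 128-bit UUID as 26 base32 characters, most significant bits first."""
    number = value.int
    return "".join(ALPHABET[(number >> (5 * shift)) & 0x1F] for shift in range(25, -1, -1))


def decode_suffix(suffix: str) -> uuid.UUID:
    """Inverse of encode_suffix; rejects wrong lengths, unknown characters, and overflow."""
    if len(suffix) != 26:
        raise ValueError("suffix must be exactly 26 characters")
    number = 0
    for char in suffix:
        number = (number << 5) | ALPHABET.index(char)  # index() raises ValueError on bad characters
    if number >> 128:
        raise ValueError("suffix encodes more than 128 bits")
    return uuid.UUID(int=number)


def make_typed_id(prefix: str, value: uuid.UUID) -> str:
    """Join a type prefix and the encoded UUID with an underscore."""
    return f"{prefix}_{encode_suffix(value)}"


original = uuid.uuid4()
typed = make_typed_id("Item", original)
print(typed)
print(decode_suffix(typed.rpartition("_")[2]) == original)
print(decode_suffix("01kccskbjfff08mh2ttwpvjf9c").hex[:8])
```

Every one of the 128 bits survives the round trip, which is the whole point.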
On human-readable IDs
Give up. Or at least give up on your end users easily remembering (or caring). Even random phrase generators that puke out something like “Parchment-Pellet-Closeable-Whoopee” only sound human readable at first. Without looking back, can you remember the alleged human-readable passphrase I just mentioned? Didn’t think so.
For you and all the coworkers that loathe you for truncating UUIDs, okay yeah maybe TypeIDs are helpful. These allow you to parse into stronger types like class FistId so that someone can’t accidentally use a FistId when they should have been using a FaceId. And it makes reading logs way easier.
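The stronger-types idea might look something like this in Python. This is a deliberately tiny sketch: the TypedId base class, its parse method, and the lowercase prefixes are all invented for illustration, and a real version would also validate the suffix.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class TypedId:
    """An ID that remembers what kind of thing it identifies."""

    PREFIX: ClassVar[str] = ""
    suffix: str

    @classmethod
    def parse(cls, text: str) -> "TypedId":
        # Split on the last underscore so the prefix itself may contain underscores.
        prefix, separator, suffix = text.rpartition("_")
        if not separator or prefix != cls.PREFIX or not suffix:
            raise ValueError(f"expected a {cls.PREFIX}_... ID, got {text!r}")
        return cls(suffix)

    def __str__(self) -> str:
        return f"{self.PREFIX}_{self.suffix}"


class FistId(TypedId):
    PREFIX = "fist"


class FaceId(TypedId):
    PREFIX = "face"


fist = FistId.parse("fist_01kccskbjfff08mh2ttwpvjf9c")
print(fist)

try:
    FistId.parse("face_01kccskbjfff08mh2ttwpvjf9c")
except ValueError as error:
    print(f"rejected: {error}")
```

A type checker can now complain when a FaceId is passed where a FistId is expected, and the prefix in the log line tells you which one you are looking at.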
Conclusion
Recently I reviewed some code that tried to cram an entire directory structure worth of IDs into one: {grandparent}_{parent}_{child}_{UUID}, except the entire string was only allowed to be 80 characters long. So it was truncated at the end. Fortunately (unfortunately?) two of the IDs were simple integers up to 8 digits, meaning our string could be 83 characters. So we were truncating the last 3 characters. Of the only part of the ID that made it unique in the first place. Turns out a simple UUID worked just fine and the other information could be gleaned from elsewhere.
For every random character of a UUID you lop off, you shrink the ID space 16-fold, which means you hit the same odds of a collision after only a quarter as many IDs (and you raise the odds that I find you and lop off the end of your ring finger). It’s even worse if you’re using a UUID where the first few characters are an encoded timestamp.
Before you truncate a UUID ask yourself a few questions:
- Are the extra few characters really making anything less readable?
- Can I use a different encoding or a different kind of ID instead?
If you still decide you must truncate, then at least keep the last characters because they’re likely to give you better odds.
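If you want to see the first-versus-last difference for yourself, here is a sketch that hand-rolls UUIDv7 values following the bit layout in the diagram near the top. The make_uuid7 helper is hypothetical and exists only for this demo; use a real library in production. The timestamp is an arbitrary fixed value so that every ID lands in the same millisecond.

```python
import secrets
import uuid


def make_uuid7(unix_ts_ms: int) -> uuid.UUID:
    """Build a UUIDv7-shaped value: 48-bit timestamp, version, rand_a, variant, rand_b."""
    value = (unix_ts_ms & ((1 << 48) - 1)) << 80  # unix_ts_ms: top 48 bits
    value |= 0x7 << 76                            # ver: 4 bits
    value |= secrets.randbits(12) << 64           # rand_a: 12 bits
    value |= 0b10 << 62                           # var: 2 bits
    value |= secrets.randbits(62)                 # rand_b: 62 bits
    return uuid.UUID(int=value)


same_millisecond = 1_766_000_000_000  # arbitrary timestamp, assumed for the demo
ids = [make_uuid7(same_millisecond) for _ in range(1000)]

first_eight = {item.hex[:8] for item in ids}
last_eight = {item.hex[-8:] for item in ids}

print(f"distinct values keeping the first 8 characters: {len(first_eight)}")
print(f"distinct values keeping the last 8 characters:  {len(last_eight)}")
```

Keeping the first eight characters collapses all 1,000 IDs into a single value, while the last eight stay distinct almost every time you run it. "Almost" is still not a word you want near your primary keys.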
Until next time.