UUIDs are so funny today because almost all of their deployments are "we take a string of mostly-random bits but give it structure for no apparent reason"

it's true that you could populate some of those bits with predictable sequences to avoid clashing (a la snowflake) but guess what: you can do this to a hex string or a number (as does snowflake). i think the only reason UUIDs are as popular as they are today is Microsoft?

@whitequark Most applications that use UUIDs these days tend to use version 7 UUIDs, which use milliseconds since the Unix epoch as the most significant bits. This embeds the creation timestamp in the ID and allows for sorting, while also adding in sufficient randomness so they're not incremental and multiple systems can generate them with low probability of collision.
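As a rough illustration of that layout, here is a minimal sketch in Python that packs a 48-bit millisecond timestamp, the version nibble, the variant bits, and 74 random bits into a 128-bit value. The function name `uuid7_sketch` and its `unix_ms` parameter are made up for this example, and it skips the optional monotonic-counter refinements a production generator might add.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a version 7 style UUID (hypothetical helper, not a library API)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000

    # Draw 80 random bits; only 74 of them end up in the UUID.
    rand = int.from_bytes(os.urandom(10), "big")
    rand_a = (rand >> 62) & 0xFFF        # 12 random bits after the version
    rand_b = rand & ((1 << 62) - 1)      # 62 random bits after the variant

    value = (unix_ms & ((1 << 48) - 1)) << 80  # timestamp in the top 48 bits
    value |= 0x7 << 76                         # version nibble = 7
    value |= rand_a << 64
    value |= 0b10 << 62                        # variant bits "10"
    value |= rand_b
    return uuid.UUID(int=value)


print(uuid7_sketch())
```

Because the timestamp sits in the most significant bits, IDs generated in different milliseconds compare in creation order; IDs within the same millisecond are ordered only by their random bits.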

@whitequark @ramsey I've only ever seen version 4 (unstructured random) in production that i can recall

@azonenberg @whitequark @ramsey

V7 is fairly new, standardised in RFC 9562 in 2024. It has seen a bit more adoption in databases over the last year.

@intrbiz @azonenberg @whitequark @ramsey Yeah, postgres 18 added support for generating them IIRC, though UUIDs are inherently "mostly backwards compatible" unless you're trying to parse them for some godforsaken reason, so older versions support it just fine if the client generates them.

They make for much happier indexes and sharding vs the ones with leading-random, because most workloads don't have truly random access patterns...

@becomethewaifu

Indeed, I gave a talk at POSETTE last year about encoding information into UUIDs and some of the index issues.

IMHO you can have more fun than just encoding generation time into them.

@intrbiz @becomethewaifu Version 8 is probably more suited to those use-cases, though.

@ramsey

Indeed, my code was generating UUIDs marked as V8. The version is just a nibble that's been standardised. And it's handy to have a standardised version number for custom generation schemes.
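To make the "just a nibble" point concrete, here is a small Python sketch that takes any 16 bytes produced by a custom scheme and stamps the version 8 nibble and the standard variant bits onto them. The helper name `stamp_v8` is hypothetical, and what goes into the other 122 bits is left entirely up to the application.

```python
import uuid


def stamp_v8(raw16: bytes) -> uuid.UUID:
    """Mark 16 custom bytes as a version 8 UUID (hypothetical helper)."""
    if len(raw16) != 16:
        raise ValueError("expected exactly 16 bytes")

    value = int.from_bytes(raw16, "big")
    value &= ~(0xF << 76)    # clear the 4-bit version field
    value |= 0x8 << 76       # version nibble = 8
    value &= ~(0b11 << 62)   # clear the 2-bit variant field
    value |= 0b10 << 62      # variant bits "10"
    return uuid.UUID(int=value)


print(stamp_v8(bytes(16)))  # 00000000-0000-8000-8000-000000000000
```

Six bits of the input are overwritten, so a custom scheme has to keep its own data out of those positions.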

@intrbiz @ramsey time-based (type 1 back in the day, now type 7) is really useful for debugging; the creation date and time can be a really strong hint.
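For the debugging case, a minimal sketch of reading the creation time back out of a version 7 UUID in Python. The function name `uuid7_created_at` is hypothetical; the sample value is the version 7 test vector from RFC 9562.

```python
import uuid
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def uuid7_created_at(u: uuid.UUID) -> datetime:
    """Return the embedded creation time of a version 7 UUID (hypothetical helper)."""
    if u.version != 7:
        raise ValueError("not a version 7 UUID")
    unix_ms = u.int >> 80  # the top 48 bits hold milliseconds since the Unix epoch
    return UNIX_EPOCH + timedelta(milliseconds=unix_ms)


sample = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(uuid7_created_at(sample))  # 2022-02-22 19:22:22+00:00
```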

@kw217 @intrbiz Version 1 still exists, but it’s based on the weird value of 100-nanosecond intervals since the Gregorian epoch in 1582. Version 7 is based on milliseconds since the Unix epoch.
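A short Python sketch of what that means in practice: converting a version 1 timestamp (a count of 100-nanosecond intervals since 15 October 1582) into an ordinary datetime. The helper name `uuid1_datetime` is hypothetical; `UUID.time` is the standard library attribute that exposes the 60-bit timestamp of a version 1 UUID, and the sample value is the version 1 test vector from RFC 9562.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the Gregorian calendar, which version 1 counts from.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_datetime(u: uuid.UUID) -> datetime:
    """Return the embedded timestamp of a version 1 UUID (hypothetical helper)."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    # u.time counts 100 ns ticks; datetime resolves microseconds, so divide by 10.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


sample = uuid.UUID("c232ab00-9414-11ec-b3c8-9f6bdeced846")
print(uuid1_datetime(sample))  # 2022-02-22 19:22:22+00:00
```

The gap between the two epochs is 141,427 days, which is where the offset of 122,192,928,000,000,000 ticks between a version 1 timestamp and Unix time comes from.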

@kw217 @intrbiz One reason not to use version 1 is that it leaks details about the system (i.e., the MAC address). Another reason is that the values aren’t sortable. Version 6 was introduced to solve this. It’s also based on 100-nanosecond intervals since the Gregorian epoch, but it’s sortable and uses random bytes following the timestamp, rather than the MAC address.

But, for most purposes, version 7 is the right solution, unless you need to create UUIDs for dates earlier than 1970.

@ramsey @intrbiz good ol' Microsoft "hectonanoseconds". Weird unit but presumably shoehorned into a particular bitwidth and range that made sense back in the day (Windows NT?)

@ramsey @intrbiz that article has the Apollo timestamp (v0) as 48 bits, which isn't wide enough for hectonanoseconds. Wikipedia agrees they came in with Windows in the nineties.

@kw217 @intrbiz You’re right. The article doesn’t say when they started using the Gregorian timestamp. I was making an assumption. I guess Microsoft was the first to do that?

@ramsey @intrbiz I'm making an assumption too; I was hoping Raymond Chen would have something definitive but he doesn't. NT uses 100ns units but a different origin (1601 AD).

@kw217 @intrbiz I wonder why 1601. I know that Great Britain didn’t adopt the Gregorian calendar until 1752, so that year would make sense to me.
