@whitequark Most applications that use UUIDs these days tend to use version 7 UUIDs, which use milliseconds since the Unix epoch as the most significant bits. This embeds the creation timestamp in the ID and allows for sorting, while also adding in sufficient randomness so they're not incremental and multiple systems can generate them with low probability of collision.
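A minimal Python sketch of that layout, assuming the RFC 9562 field order (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits). The name `uuid7_sketch` is hypothetical and not a library function.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a version 7 style UUID (illustrative helper, not a library API)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000

    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80     # timestamp in the most significant bits
    value |= 0x7 << 76                            # version nibble
    value |= rand_a << 64
    value |= 0b10 << 62                           # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier, later, earlier < later)
```

Because the timestamp occupies the top bits, comparing two such IDs generated in different milliseconds compares their creation times.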
@ramsey oh, I didn't know this!
@whitequark @ramsey I've only ever seen version 4 (unstructured random) in production that i can recall
@azonenberg @whitequark @ramsey
V7 is fairly new, standardised around 2024. They've got a bit more adoption in databases over the last year.
Indeed, I gave a talk at POSETTE last year about encoding information into UUIDs and some of the index issues.
IMHO you can have more fun than just encoding generation time into them.
@intrbiz @becomethewaifu Version 8 is probably more suited to those use-cases, though.
Indeed, my code was generating UUIDs marked as V8. The version is just a nibble that's been standardised. And it's handy to have a standardised version number for custom generation schemes.
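As a rough illustration of "the version is just a nibble", here is a Python sketch that packs custom fields into a UUID marked as version 8 and reads the nibble back. The helper names and the 48/12/62-bit split of the custom fields are assumptions for the example, not the scheme used in the talk.

```python
import uuid


def uuid8_sketch(custom_a, custom_b, custom_c):
    """Pack three custom fields (48, 12 and 62 bits) into a version 8 UUID."""
    value = (custom_a & ((1 << 48) - 1)) << 80
    value |= 0x8 << 76                       # version nibble set to 8
    value |= (custom_b & 0xFFF) << 64
    value |= 0b10 << 62                      # RFC variant bits
    value |= custom_c & ((1 << 62) - 1)
    return uuid.UUID(int=value)


def version_nibble(u):
    """Read the 4-bit version field directly from the 128-bit value."""
    return (u.int >> 76) & 0xF


custom = uuid8_sketch(0xABCDEF, 0x123, 42)
print(custom, version_nibble(custom))
print(version_nibble(uuid.uuid4()))
```

In the text form, the nibble is the first hex digit of the third group.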
@kw217 @intrbiz One reason not to use version 1 is that it leaks details about the system (i.e., the MAC address). Another reason is that the values aren’t sortable, because the low-order bits of the timestamp come first. Version 6 was introduced to solve this. It’s also based on 100-nanosecond intervals since the Gregorian epoch, but it’s sortable and uses random bytes following the timestamp, rather than the MAC address.
But, for most purposes, version 7 is the right solution, unless you need to create UUIDs for dates earlier than 1970.
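To make the version 1 leak concrete, here is a small Python sketch that pulls the creation time and node (typically the MAC address) back out of a version 1 UUID. The function name is hypothetical; the constant is the number of 100-nanosecond intervals between the Gregorian epoch (1582-10-15) and the Unix epoch.

```python
import uuid
from datetime import datetime, timezone

# 100 ns intervals between 1582-10-15 and 1970-01-01.
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000


def uuid1_leaks(u):
    """Return (creation time, node as a MAC-style string) for a version 1 UUID."""
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    unix_100ns = u.time - GREGORIAN_TO_UNIX_100NS
    created = datetime.fromtimestamp(unix_100ns / 10_000_000, tz=timezone.utc)
    mac = ":".join(f"{(u.node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
    return created, mac


print(uuid1_leaks(uuid.uuid1()))
```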
@ramsey @kw217 @intrbiz as a privacy person we should say that individuals and activist groups probably want to avoid leaking timestamps, especially fine-grained timestamps, pretty much everywhere and always, although the attack models are highly indirect and people don't usually know what they're protecting until they've lost it
it's fine for corporate use as long as it can't be tied to an individual
@ireneista @ramsey @intrbiz entirely fair - this was in a corporate context; in activism circles (hmm, in most places now) one should definitely think carefully about privacy properties.
Valid concern in some domains for sure. I tend to keep the time buckets pretty big. There are block-based approaches which remove the time-related issues but still reduce issues with things like indexes.
@ireneista @ramsey @kw217 @intrbiz huh? what does that reveal, other than a time that your computer was on and (maybe, if your UUID generation code is deeply misconfigured) your timezone?
@AVincentInSpace @ramsey @kw217 @intrbiz well, exactly that. whether that's a problem depends on what you use it for.
@ireneista @AVincentInSpace @kw217 @intrbiz Right. Let’s say that someone is trying to find out what you were doing at a certain time. If they find an ID that was generated at a specific time and owned by your account, then they can deduce a general idea of what you might have been doing at that time (e.g., you were probably using a certain app during that time).
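A short Python sketch of how little effort that takes with a version 7 ID, assuming the RFC 9562 layout where the top 48 bits are Unix milliseconds. The function name and the sample ID are made up for illustration.

```python
import uuid
from datetime import datetime, timezone


def uuid7_created_at(u):
    """Recover the embedded creation time from a version 7 UUID."""
    if u.version != 7:
        raise ValueError("expected a version 7 UUID")
    unix_ms = u.int >> 80                    # top 48 bits are the timestamp
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)


# Hypothetical ID: a timestamp of 1_700_000_000_000 ms with the random bits zeroed.
sample = uuid.UUID(int=(1_700_000_000_000 << 80) | (0x7 << 76) | (0b10 << 62))
print(sample, uuid7_created_at(sample))
```

Anyone who can see the ID can run the same arithmetic; no key or access to the generating system is needed.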
@ramsey @ireneista @AVincentInSpace @intrbiz having very fine grained timing info lets you very precisely correlate messages across systems. Of the thousands of messages that went across the network in this second, despite any crypto, you can say *this* one was the one your subject sent, and *this* is where it went in the network, with high degree of confidence. (Other attacks too, like fingerprinting the sender's clock, but they're a bit more involved.)
Wikipedia maybe has the answer: https://en.wikipedia.org/wiki/1601
Seems it goes back to ANSI and COBOL.
@ramsey @intrbiz https://devblogs.microsoft.com/oldnewthing/20090306-00/?p=18913 it simplified leap year reckoning
@intrbiz @azonenberg @whitequark It was standardized in 2024, but the concept has been around since at least 2002, under the name “COMB.”
https://web.archive.org/web/20240118030355/https://www.informit.com/articles/printerfriendly/25862
@intrbiz @azonenberg @whitequark @ramsey Yeah, postgres 18 added support for generating them IIRC, though UUIDs are inherently "mostly backwards compatible" unless you're trying to parse them for some godforsaken reason, so older versions support it just fine if the client generates them.
They make for much happier indexes and sharding vs the ones with leading-random, because most workloads don't have truly random access patterns...
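A toy Python sketch of the index point: it counts how often a new key lands at the very end of a sorted list, as a crude stand-in for "append to the last index page". The sorted list is only a simplified model of a B-tree index, and `v7_like` is a hypothetical helper following the RFC 9562 version 7 layout.

```python
import bisect
import os
import uuid


def v7_like(unix_ms):
    """Timestamp-leading UUID in the version 7 layout (illustrative helper)."""
    rand = int.from_bytes(os.urandom(10), "big")
    value = (unix_ms << 80) | (0x7 << 76) | ((rand >> 68) << 64)
    value |= (0b10 << 62) | (rand & ((1 << 62) - 1))
    return uuid.UUID(int=value)


def tail_insert_fraction(keys):
    """Fraction of inserts that land at the end of the sorted key list."""
    index, tail_hits = [], 0
    for key in keys:
        pos = bisect.bisect(index, key)
        if pos == len(index):
            tail_hits += 1
        index.insert(pos, key)
    return tail_hits / len(keys)


base_ms = 1_700_000_000_000
time_ordered = [v7_like(base_ms + i) for i in range(1000)]
leading_random = [uuid.uuid4() for _ in range(1000)]

print("timestamp-leading:", tail_insert_fraction(time_ordered))
print("leading-random:   ", tail_insert_fraction(leading_random))
```

With one key per millisecond, every timestamp-leading insert goes to the tail, while leading-random keys scatter across the whole list.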