Generating IDs

1) Why pay attention to identifiers

An identifier (ID) is the fundamental key of an entity: a database row, a message, a file, an order. Its properties affect:

Uniqueness and scale (collisions, horizontal growth).
Order and sorting (time correlation, replication, dedup).
Storage performance (indexes, hot pages, key size).
Security (unpredictability, leaks, guessing).
Usability/integration (short, URL-safe, case-insensitive).

Choosing an ID format is a compromise between entropy, orderability, length, generation rate, and operational cost.

2) Key requirements and terms

Uniqueness: the probability of collision must be lower than the acceptable risk.
Entropy: how many bits of randomness the ID contains.
Time-sortable (k-sortable): lexicographic order approximately matches creation-time order.
Monotonicity: a non-decreasing sequence within a node/stream.
Insert locality: how strongly new inserts concentrate at the "tail" of the index (risk of hot pages).
Predictability: whether neighboring IDs can be guessed (important for security/APIs).
Representation: binary/string, Base16/32/36/58/64, hyphens, case.

3) Major identifier families

3.1 UUID

v4 (random): 122 bits of entropy. Unordered, good for security and simplicity. Downside: inserts scatter randomly across the index, which hurts cache locality, although it also spreads load evenly and avoids "hot pages."

v1 (time + MAC): time-based, but embeds the MAC address and timestamp (privacy); often avoided.
v7 (time-ordered): millisecond timestamp + random part. Designed for lexicographic sorting by time and good index locality in the database. Trade-off: the index gets a "hot tail"; mitigated by sharding, key prefixes, and similar techniques (see section 5).

Tips

For external APIs with loose ordering requirements: v4.
For event/log databases and "sorted" keys: v7.
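
The v7 layout described above can be sketched by hand. This is a minimal illustration, assuming the standard field order (48-bit millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits); the function name `uuid7_sketch` is hypothetical, and in production a maintained library is the safer choice.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUIDv7-shaped value: timestamp first, then version, variant and random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 of them are used
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits
    value = (unix_ms & ((1 << 48) - 1)) << 80      # timestamp occupies the top 48 bits
    value |= 0x7 << 76                             # version field = 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


if __name__ == "__main__":
    print(uuid7_sketch())
```

Because the timestamp sits in the most significant bits, IDs created in different milliseconds sort by creation time both as integers and as hex strings; within the same millisecond the order is random in this sketch.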

3.2 ULID (Crockford Base32)

128 bits: 48 bits of time (ms) + 80 bits of randomness. Lexicographically sortable by time, human-friendly (the alphabet omits 'I', 'L', 'O', 'U'), URL-safe. There is a monotonic variant: within the same millisecond, the random part is incremented.
Pros: readability, orderability, portability.

Cons: with a very high insert rate at one point in time, a "hot tail" in the index.
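
To make the ULID layout concrete, here is a minimal sketch of the non-monotonic variant. The name `ulid_sketch` is hypothetical, and the code only illustrates the bit layout and the Crockford alphabet; it is not a replacement for a ULID library.

```python
import os
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # 32 symbols, no I, L, O, U


def ulid_sketch(unix_ms=None):
    """Encode 48 bits of time plus 80 random bits as 26 Crockford Base32 characters."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):                 # 26 chars * 5 bits = 130 bits, enough for 128
        chars.append(CROCKFORD[value & 31])
        value >>= 5
    return "".join(reversed(chars))


if __name__ == "__main__":
    print(ulid_sketch())
```

The first 10 characters carry the timestamp and the remaining 16 carry the randomness, which is why plain string comparison orders ULIDs by creation time.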

3.3 KSUID

160 bits: 32 bits of time (seconds) relative to a custom epoch + 128 bits of randomness. The 32-bit counter covers roughly 136 years from that epoch, and sorting is stable at one-second resolution. The string form is 27 Base62 characters, one longer than ULID's 26. Good for distributed logs and objects.
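
A sketch of the KSUID layout follows. The epoch value (1400000000 seconds) and the Base62 alphabet order are stated from memory of the reference implementation and should be treated as assumptions to verify; `ksuid_sketch` is a hypothetical name.

```python
import os
import time

KSUID_EPOCH = 1_400_000_000  # seconds; assumed custom epoch, verify against your library
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def ksuid_sketch(unix_s=None):
    """Encode a 32-bit timestamp plus 128 random bits as 27 Base62 characters."""
    if unix_s is None:
        unix_s = int(time.time())
    ts = (unix_s - KSUID_EPOCH) & 0xFFFFFFFF
    value = (ts << 128) | int.from_bytes(os.urandom(16), "big")
    chars = []
    while value:
        value, rem = divmod(value, 62)
        chars.append(BASE62[rem])
    # Left-pad with the smallest symbol so every ID has the same width and sorts correctly.
    return "".join(reversed(chars)).rjust(27, "0")


if __name__ == "__main__":
    print(ksuid_sketch())
```

Fixed width matters here: without the padding, shorter strings would break lexicographic ordering.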

3.4 Snowflake-like (k-sortable flake IDs)

Classic layout (customizable):

[ timestamp bits ][ region/datacenter bits ][ worker bits ][ sequence bits ]

Properties: monotonic growth on a node, global uniqueness as long as region/worker IDs are unique, short (64-bit) binary representation.
Risks: clock dependence (time drift/regression), exhaustion of the sequence within one tick, coordination of region/worker bits.
Mitigations: protection against the clock moving backwards, spare sequence capacity, clock-drift detection, PTP/NTP discipline.

3.5 DB sequences (SEQUENCE/IDENTITY)

The simplest monotonic generation within one DBMS/shard.
Pros: short, fast, convenient for local tables.
Cons: hard to make global in a distributed cluster; predictable (unsafe as a public key); creates a hot tail in the index.

3.6 Content-addressed IDs (content hash)

SHA-256/BLAKE3 of the content → stable ID, deduplication, integrity checking, caching.
Pros: determinism, protection against substitution; collisions are practically nonexistent.
Cons: generation costs CPU, no time ordering, long IDs.
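
A content-addressed ID can be sketched in a few lines. The `sha256:` prefix is an illustrative convention chosen here, not something the text prescribes.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive a stable ID from the content itself: same bytes always give the same ID."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    first = content_id(b"build-artifact-bytes")
    second = content_id(b"build-artifact-bytes")
    print(first)
    print(first == second)  # identical content maps to the same ID, which enables dedup
```

Recomputing the hash on read doubles as an integrity check: if the stored bytes no longer hash to their ID, they were corrupted or replaced.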

4) Collisions and the "birthday paradox" (intuitive)

The collision probability for a random ID of 'b' bits after 'n' generations is approximately:

p ≈ 1 - exp(-n(n-1) / (2 * 2^b)) ≈ n^2 / 2^(b+1)   (for small p)

Examples:

UUIDv4 (122 bits) at n = 10^12 (a trillion) → p ≈ 9.4e-14, about 1e-13 (negligible).
64-bit random → at n = 10^9, already p ≈ 0.027 (a notable risk).
Conclusion: 64 random bits are often not enough for very large systems; use 96/128 bits.
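
The approximation above is easy to evaluate for your own 'b' and 'n'. The helper names below are hypothetical; `math.expm1` is used so that very small probabilities are not lost to floating-point rounding.

```python
import math


def collision_probability(bits: int, n: int) -> float:
    """Birthday-bound estimate: p = 1 - exp(-n(n-1) / 2^(bits+1))."""
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


def max_ids_for_risk(bits: int, p: float) -> int:
    """Small-p inverse: roughly how many IDs keep the collision risk below p."""
    return int(math.sqrt(2 ** (bits + 1) * p))


if __name__ == "__main__":
    print(collision_probability(122, 10**12))  # UUIDv4 at a trillion IDs
    print(collision_probability(64, 10**9))    # 64-bit random at a billion IDs
    print(max_ids_for_risk(64, 1e-6))          # ID budget for a one-in-a-million risk
```

The inverse form is often the more useful one in design reviews: pick the acceptable risk first, then check whether the expected ID volume fits under it.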

5) Indexes, hot pages and storage

Random keys (v4) spread inserts evenly across the index tree → there is no "tail," but cache locality is worse.
Time-sorted keys (v7/ULID/Snowflake) are inserted at the tail → better locality and compression, but a risk of hot pages under highly concurrent writes.

Hot tail mitigation:

prefixes/sharding by tenant/region (add 1-2 bytes before the timestamp);
interleaving: put part of the randomness in the high-order bits;
batch inserts, B-tree fillfactor tuning, BRIN indexes or clustering for large logs.

Size matters:

'UUID' (16 B) vs 'BIGINT'/'INT8' (8 B): the smaller key saves memory and cache. Base32/58/64 strings add roughly 35-60% over the raw 16 bytes (22 to 26 characters). In the database, store binary; serialize to a string at the edge.

6) Security and privacy

Do not use SEQUENCE/INT values as public IDs in URLs/APIs: they are guessable → resource enumeration.
Use random, unpredictable IDs (v4/v7/ULID/KSUID) for external references; keep in mind that time-ordered formats reveal the creation time.
Do not encode PII into an ID. If you need to embed an attribute, encrypt/sign it (for example, JWE/JWS) or use opaque tokens.
URL-safe encodings: Crockford Base32, Base58 (without '0OIl'), Base64url.
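
One way to keep an internal sequence private is to hand out an opaque random token instead. The sketch below assumes an in-memory mapping purely for illustration (a real system would persist it, for example in a table with a unique index); all names are hypothetical.

```python
import secrets

public_by_internal = {}   # internal sequence value -> public token
internal_by_public = {}   # public token -> internal sequence value


def public_id_for(internal_id: int) -> str:
    """Return a stable, unguessable public token for an internal sequential ID."""
    token = public_by_internal.get(internal_id)
    if token is None:
        token = secrets.token_urlsafe(16)   # 128 random bits as URL-safe text
        public_by_internal[internal_id] = token
        internal_by_public[token] = internal_id
    return token


def resolve(token: str):
    """Map a public token back to the internal ID, or None if unknown."""
    return internal_by_public.get(token)


if __name__ == "__main__":
    token = public_id_for(42)
    print(token, resolve(token))
```

Knowing the token for order 42 tells an attacker nothing about the token for order 43, which is the property sequential IDs lack.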

7) Multi-tenancy, prefixes and routing

Format: '[TENANT_PREFIX]-[ID]' or binary: 'tenant_id || id'.
Pros: fast per-tenant filters and partitions, protection against N+1 scans.
Cons: can reduce entropy density in the high-order bits → check the distribution (for example, hash the prefix).
A hash suffix (2-3 bytes) reduces collisions and helps shard routing: 'shard = hash(id) % N'.
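
The routing rule and the binary prefix format above might look like this in practice. This sketch assumes SHA-256 as the routing hash and a 2-byte tenant prefix; Python's built-in `hash()` is avoided because for bytes and strings it is randomized per process and therefore not stable across nodes. Function names are hypothetical.

```python
import hashlib


def shard_for(id_bytes: bytes, shard_count: int) -> int:
    """Stable shard routing: the same ID maps to the same shard on every node."""
    digest = hashlib.sha256(id_bytes).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def tenant_key(tenant_id: int, id_bytes: bytes) -> bytes:
    """Binary composite key: 2-byte big-endian tenant prefix followed by the ID."""
    return tenant_id.to_bytes(2, "big") + id_bytes


if __name__ == "__main__":
    raw_id = bytes(range(16))
    print(shard_for(raw_id, 8))
    print(tenant_key(7, raw_id).hex())
```

Note that changing the shard count remaps most IDs under plain modulo routing, so the value of N is worth fixing early or replacing with a consistent-hashing scheme.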

8) Practical recommendations for selection

APIs, public links, distributed services without strict ordering: UUIDv4, ULID/KSUID.
Logs/events/orders that are often sorted by time: UUIDv7 or ULID (monotonic).
Very high throughput with local monotonicity and a short key: Snowflake-like 64-bit (clock discipline required).
Artifact/build/blob stores: content-addressable (SHA-256), with a short human-friendly alias on top (Hashids/link).
Local tables in one database: SEQUENCE/IDENTITY + an external "wrapper" for public links (masking).

9) Implementations and examples

9.1 PostgreSQL

Store UUIDs in binary form (the 'uuid' type); index with 'btree' or 'hash' as needed.

sql
-- uuid-ossp is only needed for uuid_generate_v4();
-- gen_random_uuid() is built in since PostgreSQL 13.
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE orders (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(), -- or uuid_generate_v4()
    created_at timestamptz NOT NULL DEFAULT now(),
    tenant smallint NOT NULL
);

-- For time-sortable IDs (UUIDv7), store them as binary (uuid) and generate them in the application.
-- If you query or cluster by time:
CREATE INDEX ON orders (created_at DESC);

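Mitigating a hot sequential tail: for time-sorted IDs, add "salt" to the upper bits or partition by tenant:

sql
-- Assumes orders is declared with PARTITION BY LIST (tenant); PostgreSQL then
-- requires the primary key to include tenant, e.g. PRIMARY KEY (tenant, id).
CREATE TABLE orders_t1 PARTITION OF orders FOR VALUES IN (1);
CREATE TABLE orders_t2 PARTITION OF orders FOR VALUES IN (2);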

9.2 Redis (atomic counters/monotonicity)

bash
INCR seq:orders   # local sequence
# combine: (epoch_ms << 20) | (worker_id << 10) | (seq & 1023)

9.3 Snowflake-like generator (pseudocode)

pseudo
const EPOCH = 1704067200000    # custom epoch (ms), 2024-01-01T00:00:00Z
state: last_ms=0, seq=0, worker=7, region=3

next():
    now = epoch_ms()
    if now < last_ms:
        now = wait_until(last_ms)            # clock-back protection
    if now == last_ms:
        seq = (seq + 1) & ((1<<12)-1)        # 12 bits
        if seq == 0: now = wait_next_ms()    # sequence exhausted in this ms
    else:
        seq = 0
    last_ms = now
    return (now-EPOCH)<<22 | region<<17 | worker<<12 | seq
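
A runnable Python version of the same idea is sketched below, assuming the layout implied by the shifts (timestamp above bit 22, 5 region bits, 5 worker bits, 12 sequence bits). The class name, the injectable `clock` parameter, and the busy-wait strategy are illustrative choices, not part of the original pseudocode.

```python
import threading
import time

EPOCH_MS = 1704067200000  # custom epoch (ms), same value as in the pseudocode


class SnowflakeSketch:
    def __init__(self, region, worker, clock=lambda: time.time_ns() // 1_000_000):
        if not (0 <= region < 32 and 0 <= worker < 32):
            raise ValueError("region and worker must each fit in 5 bits")
        self.region, self.worker, self.clock = region, worker, clock
        self.last_ms, self.seq = 0, 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            while now < self.last_ms:          # clock moved back: wait until it catches up
                time.sleep(0.001)
                now = self.clock()
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:              # all 4096 values used in this millisecond
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.region << 17) | (self.worker << 12) | self.seq


if __name__ == "__main__":
    gen = SnowflakeSketch(region=3, worker=7)
    print(gen.next_id(), gen.next_id())
```

Passing the clock in as a parameter makes the clock-regression path testable: a fake clock that steps backwards should make the generator wait rather than emit a smaller ID.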

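9.4 ULID/UUID in applications

go
// ULID (for example, github.com/oklog/ulid).
// math/rand is not cryptographically secure; use crypto/rand.Reader
// as the entropy source if IDs must be unpredictable.
t := time.Now().UTC()
entropy := ulid.Monotonic(rand.New(rand.NewSource(t.UnixNano())), 0)
id := ulid.MustNew(ulid.Timestamp(t), entropy)

// UUIDv7 (if your UUID library supports it)
id7 := uuid.Must(uuid.NewV7())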

Node.js

js
import { ulid } from 'ulid';
import { v4 as uuidv4 } from 'uuid';

const id1 = ulid();
const id2 = uuidv4(); // v4

Python

python
import uuid

id_v4 = uuid.uuid4()
# For v7, use a library (for example, the third-party uuid6 or uuid7 packages)

10) Encodings and representations

Binary in the database ('BYTEA', 'UUID') → compact and fast. At the edge, convert to:

Crockford Base32 (ULID): case-insensitive, no visually similar characters.
Base58: shorter than Base32, suited to human-readable tokens, URL-safe.
Base64url: short, but contains '-' and '_'.

Normalize case and format (with or without hyphens) to avoid duplicates when comparing strings.
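
The binary-inside, text-at-the-edge split and the normalization advice can be sketched with the standard library alone. The helper names are hypothetical; Base64url is used because it needs no third-party package.

```python
import base64
import uuid


def to_base64url(u: uuid.UUID) -> str:
    """16 raw bytes -> 22 URL-safe characters (padding stripped)."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def from_base64url(text: str) -> uuid.UUID:
    """Restore the padding and decode back to the 16-byte UUID."""
    padded = text + "=" * (-len(text) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


def normalize_uuid_text(text: str) -> str:
    """Canonical form (lowercase, hyphenated) so string comparison cannot hide duplicates."""
    return str(uuid.UUID(text))


if __name__ == "__main__":
    u = uuid.uuid4()
    short = to_base64url(u)
    print(short, from_base64url(short) == u)
    print(normalize_uuid_text(u.hex.upper()))
```

Normalizing on input, before the value reaches storage or a cache key, is what prevents the same ID from appearing as two different strings.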

11) Test playbooks and observability

Collisions: metric 'id_collision_total' (must be 0), alert at > 0.
Prefix distribution: histogram of the high-order bytes; look for skew.
Generation rate: 'ids_per_sec', p99 generator latency.
Clock skew (for Snowflake): per-node clock offset, "clock went back" events.
Index tails: p95/p99 'INSERT' latency; share of lock waits/hot pages.

Game day:

Inject clock drift/regression → make sure the generator waits or switches over.
'sequence' overflow within one millisecond → verify that the generator waits for the next millisecond.
Massive parallelism → check for lock storms in the index.

12) Anti-patterns

AUTO_INCREMENT/SEQUENCE as a public ID: guessable, leaks volume. Expose an opaque public ID on top of the internal one.
UUIDv1 (MAC/time) exposed externally: privacy leak.
64-bit random IDs for a trillion records: collisions become practically certain.
A global "central generator" without HA: SPOF and bottleneck.
Time-sorted IDs without clock-back protection: duplicates/order regressions.
Mixing different ID formats without an explicit version/prefix → chaos in debugging/migrations.
Storing IDs as strings with inconsistent case/format → hidden duplicates.

13) Implementation checklist

Format (v4/v7/ULID/KSUID/Snowflake/SEQ/hash) selected to match domain requirements.
Ordering requirements defined (whether sortability is needed).
Collision probability (b bits, n generations) estimated and a risk threshold set.
Encoding designed (binary in the DB + human-readable external form).
For time-sorted IDs: clock-back protection, sequence limits, and NTP/PTP discipline.
For public IDs: unpredictability (random/ULID/KSUID), no PII.
Shard routing ('hash(id) % N') and multi-tenant prefixes thought through.
Observability: collision, distribution, latency, and clock-skew metrics.
Test cases for sequence overflow, contention, and window-length overflow.
Documentation of format, version, epoch, bit layout, and migration plan.

14) FAQ

Q: What should be the "default" for microservices?
A: UUIDv7 or ULID: time ordering, plenty of entropy, simple generation at the edge. For external APIs, ULID/UUIDv4 also fit.

Q: I need a short, human-readable ID.
A: ULID/KSUID, or a Base58 encoding of a 128-bit random or time-based ID. Keep length and collision risk in mind.

Q: Is it possible to have "short numeric" IDs that are still safe?
A: Yes: keep the internal SEQ, and expose an opaque token (96-128 random bits) or Hashids with salt + signature.

Q: How do I migrate from SEQ to UUIDv7?
A: Add a new column 'id_new' (UUID), dual-write, publish references to the new ID, then switch the primary/foreign keys and drop the old column.

Q: Why did my ULID inserts get "hot"?
A: Because strictly increasing keys all land in the tail of one index. Partition by tenant, mix randomness into the high-order bits, use batch inserts.

15) Totals

A good ID has the right set of properties for the problem: enough entropy, predictable sorting (if needed), safe public exposure, and healthy index behavior. Choose UUIDv4/ULID/UUIDv7/KSUID for simplicity and distribution, Snowflake for dense monotonicity and short keys (given clock discipline), sequences for local tables, content hashes for artifacts. Build in observability and tests, and identifiers will stop being a source of surprises.
