Distributed locks
1) Why (and when) distributed locks are needed
Distributed locking is a mechanism that guarantees mutual exclusion for a critical section across several cluster nodes. Typical tasks:
- Leader election for a background task/scheduler.
- Ensuring a single executor for a shared resource (file moves, schema migrations, an exclusive payment step).
- Sequential processing of an aggregate (wallet/order) when idempotency/ordering cannot be achieved otherwise.
A distributed lock is usually unnecessary (see the sketch after this list):
- If you can use an idempotent upsert, CAS (compare-and-set), or per-key ordering.
- If the resource allows commutative operations (CRDT, counters).
- If the problem is solved by a transaction in a single store.
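Where such an alternative fits, a per-key optimistic CAS often replaces a distributed lock entirely. A minimal sketch, assuming a hypothetical `orders(id, status, version)` table and a Postgres-style driver (names are illustrative, not taken from the text above):

```go
package example

import (
	"context"
	"database/sql"
	"errors"
)

var ErrConflict = errors.New("concurrent update, re-read and retry")

// markPaid updates the order only if nobody changed it since we read expectedVersion.
func markPaid(ctx context.Context, db *sql.DB, orderID, expectedVersion int64) error {
	res, err := db.ExecContext(ctx,
		`UPDATE orders SET status = 'paid', version = version + 1
		  WHERE id = $1 AND version = $2`, orderID, expectedVersion)
	if err != nil {
		return err
	}
	if n, _ := res.RowsAffected(); n == 0 {
		return ErrConflict // lost the race: another writer bumped the version
	}
	return nil
}
```

If zero rows are affected, the caller re-reads the row and retries instead of waiting on a lock.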
2) Threat model and properties
Failures and complications:
- Network: delays, partitions, packet loss.
- Processes: GC pauses, stop-the-world, crashes right after acquiring the lock.
- Time: clock drift and skew break TTL-based approaches.
- Re-ownership: a "zombie" process returning after a network partition may believe it still owns the lock.
Required properties:
- Safety: no more than one valid owner at a time.
- Liveness: the lock is eventually released when the owner fails.
- Fairness: no starvation.
- Clock independence: correctness does not depend on wall-clock time (or is compensated by fencing tokens).
3) Main models
3.1 Lease (lease-based lock)
The lock is issued with a TTL. The owner must renew it before expiration (heartbeat/keepalive).
Pros: the lock releases itself when the owner crashes.
Risks: if the owner gets "stuck" (e.g. paused) yet keeps working after losing the lease, double ownership can arise.
3.2 Fencing token
Each successful acquisition issues a monotonically increasing number. Resource consumers (database, queue, file storage) check the token and reject operations carrying an older number.
This is critical for TTL/lease locks and network partitions - it protects against the "old" owner.
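To make the idea concrete, here is a minimal in-memory sketch (assumed names, not a specific library) of a sink that remembers the highest fencing token it has seen and rejects anything older; real sinks (DB/queue/storage) apply the same rule in their own write path:

```go
package example

import (
	"errors"
	"sync"
)

var ErrStaleToken = errors.New("stale fencing token: writer is no longer the owner")

// FencedStore rejects writes whose fencing token is lower than the highest seen so far.
type FencedStore struct {
	mu        sync.Mutex
	lastToken int64
	data      map[string]string
}

func NewFencedStore() *FencedStore {
	return &FencedStore{data: make(map[string]string)}
}

func (s *FencedStore) Write(token int64, key, value string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if token < s.lastToken {
		return ErrStaleToken // an older owner is trying to write after losing the lock
	}
	s.lastToken = token
	s.data[key] = value
	return nil
}
```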
3.3 Quorum locks (CP systems)
Distributed consensus is used (Raft/Paxos; etcd/ZooKeeper/Consul); the lock record is tied to the consensus log, so there is no split brain as long as a majority of nodes is reachable.
Pros: strong safety guarantees.
Cons: sensitivity to quorum loss (when quorum is gone, liveness suffers).
3.4 AP locks (in-memory/cache + replication)
For example, a Redis cluster. High availability and speed, but no strong safety guarantees under network partitions. Requires fencing on the side of the resource being modified.
4) Platforms and patterns
4.1 etcd/ZooKeeper/Consul (recommended for strong locks)
Ephemeral nodes (ZK) or sessions/leases (etcd): the key exists while the session is alive.
The session is kept alive via keepalive; on quorum loss the session expires and the lock is released.
Sequence nodes (ZK `EPHEMERAL_SEQUENTIAL`) provide a wait queue → fairness.
```go
cli, _ := clientv3.New(...)
lease, _ := cli.Grant(ctx, 10) // 10s lease
sess, _ := concurrency.NewSession(cli, concurrency.WithLease(lease.ID))
m := concurrency.NewMutex(sess, "/locks/orders/42")
if err := m.Lock(ctx); err != nil { /* handle */ }
defer m.Unlock(ctx)
```
4.2 Redis (use with care)
The classic pattern is `SET key value NX PX ttl`.
Problems:
- Replication/failover may allow simultaneous owners.
- Redlock across multiple instances reduces the risk but does not eliminate it; it is controversial in environments with an unreliable network.
It is safer to use Redis as a fast coordination layer, but always complement it with a fencing token in the target resource.
Example (Lua unlock script):
```lua
-- release only if the stored value matches the caller's token
if redis.call("GET", KEYS[1]) == ARGV[1] then
  return redis.call("DEL", KEYS[1])
else
  return 0
end
```
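A hedged sketch of the acquire/release pair around this script, assuming the go-redis v9 client (any client exposing `SET NX PX` and `EVAL` would do); the key name and TTL are illustrative:

```go
package example

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"errors"
	"time"

	"github.com/redis/go-redis/v9"
)

var ErrBusy = errors.New("lock is held by someone else")

const unlockScript = `
if redis.call("GET", KEYS[1]) == ARGV[1] then
  return redis.call("DEL", KEYS[1])
else
  return 0
end`

// Acquire sets the key only if it does not exist, storing a random owner value.
func Acquire(ctx context.Context, rdb *redis.Client, key string, ttl time.Duration) (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	owner := hex.EncodeToString(buf) // random token identifying this owner
	ok, err := rdb.SetNX(ctx, key, owner, ttl).Result()
	if err != nil {
		return "", err
	}
	if !ok {
		return "", ErrBusy
	}
	return owner, nil
}

// Release deletes the key only if we still own it (compare-and-delete via Lua).
func Release(ctx context.Context, rdb *redis.Client, key, owner string) error {
	return rdb.Eval(ctx, unlockScript, []string{key}, owner).Err()
}
```

The random owner value is what the Lua script compares against, so a client whose lock has already expired cannot delete a newer owner's key.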
4.3 DB locks
PostgreSQL advisory locks: a lock held inside the Postgres cluster (session- or transaction-scoped).
Good when all critical sections already live in the same database.
```sql
SELECT pg_try_advisory_lock(42);  -- acquire (non-blocking)
SELECT pg_advisory_unlock(42);    -- release
```
4.4 File/cloud locks
S3/GCS: conditional writes on object metadata with `If-Match` (ETag) preconditions → essentially CAS.
Suitable for backups/migrations.
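As an illustration of the CAS idea, here is a generic HTTP conditional write with `If-Match`; the exact precondition mechanism differs per provider (ETag vs. object generation headers), so treat the URL and header usage as assumptions:

```go
package example

import (
	"bytes"
	"context"
	"fmt"
	"net/http"
)

// putIfUnchanged overwrites the object only if its ETag still matches etag.
func putIfUnchanged(ctx context.Context, url, etag string, body []byte) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodPut, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("If-Match", etag) // precondition: fail if someone changed the object
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusPreconditionFailed {
		return fmt.Errorf("lost the race: object changed since ETag %s", etag)
	}
	if resp.StatusCode >= 300 {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	return nil
}
```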
5) Designing a safe lock
5.1 Owner identity
Store an `owner_id` (e.g. node#process#pid#start_time) plus a random token for unlock verification.
A repeated unlock must never remove someone else's lock.
5.2 TTL and renewal
TTL < T_fail_detect (failure detection time) and TTL ≥ p99 critical-section duration × safety margin.
Renew periodically (for example, every `TTL/3`), with a deadline on each renewal.
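A sketch of such a renewal loop; `renew` stands in for whatever call extends the lease in your lock service (an assumption, not a specific API):

```go
package example

import (
	"context"
	"time"
)

// keepAlive renews the lease every TTL/3 and cancels the critical section
// as soon as a renewal fails or cannot be confirmed in time.
func keepAlive(ctx context.Context, cancel context.CancelFunc, ttl time.Duration,
	renew func(context.Context) error) {
	ticker := time.NewTicker(ttl / 3) // renew well before expiry
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Each renewal gets its own deadline shorter than the TTL.
			rctx, rcancel := context.WithTimeout(ctx, ttl/3)
			err := renew(rctx)
			rcancel()
			if err != nil {
				cancel() // lost (or could not confirm) ownership: abort the work
				return
			}
		}
	}
}
```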
5.3 Fencing token at the sink (resource side)
Any code that modifies the external resource must pass the `fencing_token`.
The sink (DB/cache/storage) stores `last_token` and rejects smaller ones:
```sql
UPDATE wallet
SET balance = balance + :delta, last_token = :token
WHERE id = :id AND :token > last_token;
```
5.4 Wait queue and fairness
In ZK: `EPHEMERAL_SEQUENTIAL` nodes plus watches - each client waits for its nearest predecessor to release.
In etcd: keys with revisions/versioning; order by `mod_revision`.
5.5 Split-brain behavior
CP approach: without a quorum you cannot take the lock - better to stall than to break safety.
AP approach: progress is allowed on both sides of the partition → fencing is mandatory.
6) Leader election
In etcd/ZK the "leader" is an exclusive ephemeral key; the other nodes watch it for changes.
The leader writes heartbeats; if they stop, re-election happens.
Accompany all leader operations with a fencing token (epoch/revision number).
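A sketch of this pattern on etcd's `concurrency` package; the prefix, TTL, and node name are illustrative:

```go
package example

import (
	"context"
	"log"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

// runAsLeader blocks until this node wins the election, then runs work
// under a context that is cancelled as soon as leadership is lost.
func runAsLeader(ctx context.Context, cli *clientv3.Client, work func(context.Context) error) error {
	sess, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
	if err != nil {
		return err
	}
	defer sess.Close()

	e := concurrency.NewElection(sess, "/leaders/cron")
	if err := e.Campaign(ctx, "node-1"); err != nil { // blocks until we are the leader
		return err
	}
	defer e.Resign(context.Background())

	// Step down as soon as the session (and therefore the ephemeral key) is lost.
	lctx, cancel := context.WithCancel(ctx)
	go func() {
		<-sess.Done()
		log.Println("lost etcd session, stepping down")
		cancel()
	}()
	return work(lctx) // leader writes should also carry an epoch/fencing token
}
```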
7) Failures and how to handle them
The client took the lock but crashed before doing the work → normal case, nothing is harmed; the TTL/session will release it.
The lock expired in the middle of the work:
- A watchdog is mandatory: if renewal fails, abort the critical section and roll back/compensate.
- No "finish it later": without the lock, the critical section must not continue.
A long pause (GC/stop-the-world) → renewal did not happen and someone else took the lock. The worker must detect the loss of ownership (keepalive channel) and abort.
8) Deadlocks, priorities, and inversion
Deadlocks are rare in a distributed setting (there is usually a single lock), but if several locks are involved, stick to a single acquisition order (lock ordering), as sketched below.
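A sketch of canonical lock ordering; `acquire`/`release` are placeholders for your lock client, not a specific API:

```go
package example

import (
	"context"
	"sort"
)

// withLocks takes all requested locks in one global order (sorted by name),
// runs fn, and releases them in reverse order.
func withLocks(ctx context.Context, names []string,
	acquire func(context.Context, string) error,
	release func(string),
	fn func(context.Context) error) error {

	sorted := append([]string(nil), names...)
	sort.Strings(sorted) // single global acquisition order prevents cyclic waits
	for i, name := range sorted {
		if err := acquire(ctx, name); err != nil {
			for j := i - 1; j >= 0; j-- { // roll back the locks already taken
				release(sorted[j])
			}
			return err
		}
	}
	defer func() {
		for i := len(sorted) - 1; i >= 0; i-- { // release in reverse order
			release(sorted[i])
		}
	}()
	return fn(ctx)
}
```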
Priority inversion: a low-priority owner holds the resource while high-priority workers wait. Solutions: TTL limits, preemption (if the business allows it), sharding of the resource.
Starvation: use ordered wait queues (ZK sequential nodes) for fairness.
9) Observability
Metrics:
- `lock_acquire_total{status=ok|timeout|error}`
- `lock_hold_seconds{p50,p95,p99}`
- `fencing_token_value` (monotonicity)
- `lease_renew_fail_total`
- `split_brain_prevented_total` (attempts rejected due to lack of quorum)
- `preemptions_total`, `wait_queue_len`
Log fields/labels:
- `lock_name`, `owner_id`, `token`, `ttl`, `attempt`, `wait_time_ms`, `path` (for ZK), `mod_revision` (etcd).
Tracing:
- "acquire → critical section → release" spans with the outcome.
Alerts:
- Growth of `lease_renew_fail_total`.
- `lock_hold_seconds{p99}` > SLO.
- "Orphan" locks (no heartbeat).
- Bloated wait queues.
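A minimal sketch of how the first few metrics above might be declared with Prometheus client_golang (an assumed choice of metrics library):

```go
package example

import (
	"github.com/prometheus/client_golang/prometheus"
)

var (
	lockAcquireTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "lock_acquire_total",
		Help: "Lock acquisition attempts by outcome.",
	}, []string{"status"}) // ok | timeout | error

	lockHoldSeconds = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "lock_hold_seconds",
		Help:    "How long locks are held.",
		Buckets: prometheus.DefBuckets,
	})

	leaseRenewFailTotal = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "lease_renew_fail_total",
		Help: "Failed lease renewals.",
	})
)

func init() {
	prometheus.MustRegister(lockAcquireTotal, lockHoldSeconds, leaseRenewFailTotal)
}
```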
10) Case studies
10.1 Safe Redis lock with fencing (pseudocode)
1. Keep the token counter in a reliable store (for example, Postgres/etcd).
2. If `SET NX PX` succeeds, read/increment the token and perform all changes to the resource with the token checked by the database/service.
```python
# acquire
token = db.next_token("locks/orders/42")  # monotonic counter
ok = redis.set("locks:orders:42", owner, nx=True, px=ttl_ms)
if not ok:
    raise Busy()

# critical op guarded by the token
db.exec("UPDATE orders SET ... WHERE id = :id AND :token > last_token", ...)

# release (compare owner before deleting)
```
10.2 etcd Mutex + watchdog (Go)
```go
ctx, cancel := context.WithCancel(context.Background())
sess, _ := concurrency.NewSession(cli, concurrency.WithTTL(10))
m := concurrency.NewMutex(sess, "/locks/job/cleanup")
if err := m.Lock(ctx); err != nil { /* ... */ }

// Watchdog: abort the work as soon as the session is lost
go func() {
	<-sess.Done() // session/quorum lost
	cancel()      // stop the work
}()

doCritical(ctx) // must respond to ctx.Done()
_ = m.Unlock(context.Background())
_ = sess.Close()
```
10.3 Leader election in ZK (Java, Curator)
```java
LeaderSelector selector = new LeaderSelector(client, "/leaders/cron", listener);
selector.autoRequeue();
selector.start(); // listener.takeLeadership() with try-finally and heartbeat
```
10.4 Postgres advisory lock with a deadline (SQL + app)
```sql
SELECT pg_try_advisory_lock(128765); -- non-blocking attempt
-- if false → retry with backoff + jitter
```
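A sketch of the application side, assuming `database/sql` with a registered Postgres driver; advisory locks are per session, hence the pinned connection:

```go
package example

import (
	"context"
	"database/sql"
	"math/rand"
	"time"
)

// withAdvisoryLock retries pg_try_advisory_lock with backoff + jitter until
// the context's deadline, runs fn, then unlocks on the same connection.
func withAdvisoryLock(ctx context.Context, db *sql.DB, key int64, fn func(context.Context) error) error {
	conn, err := db.Conn(ctx) // pin one connection: advisory locks are session-scoped
	if err != nil {
		return err
	}
	defer conn.Close()

	backoff := 50 * time.Millisecond
	for {
		var got bool
		if err := conn.QueryRowContext(ctx, "SELECT pg_try_advisory_lock($1)", key).Scan(&got); err != nil {
			return err
		}
		if got {
			break
		}
		select {
		case <-ctx.Done():
			return ctx.Err() // deadline reached while waiting for the lock
		case <-time.After(backoff + time.Duration(rand.Int63n(int64(backoff)))): // backoff + jitter
		}
		if backoff < 2*time.Second {
			backoff *= 2
		}
	}
	defer conn.ExecContext(context.Background(), "SELECT pg_advisory_unlock($1)", key)
	return fn(ctx)
}
```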
11) Test playbooks (Game Days)
Quorum loss: shut down 1-2 etcd nodes → an attempt to take the lock must fail.
GC pause/stop-the-world: artificially pause the owner's thread → check that the watchdog aborts the work.
Split-brain: emulate a network partition between the owner and the lock service → the new owner gets a higher fencing token, the old one is rejected by the sink.
Clock skew/drift: skew the owner's clock (relevant for Redis/leases) → make sure correctness is still ensured by tokens/checks.
Crash before release: kill the process → the lock is released via TTL/session expiry.
12) Anti-patterns
A plain TTL lock without fencing when accessing an external resource.
Relying on local time for correctness (no HLC/fencing).
Handing out locks through a single Redis master in a failover environment without replica acknowledgement.
An unbounded critical section (a TTL "for the ages").
Removing someone else's lock without checking `owner_id`/token.
No backoff + jitter → a storm of retry attempts.
A single global lock "for everything" - a contention hotspot; shard by key instead.
13) Implementation checklist
- The resource type is defined; checked whether CAS/queues/idempotency can replace the lock.
- Mechanism selected: etcd/ZK/Consul for CP; Redis/cache only with fencing.
- Implemented: `owner_id`, TTL + renewal, watchdog, correct unlock.
- The external resource checks the fencing token (monotonicity).
- There is a leader-election strategy and failover.
- Metrics, alerts, and logging of tokens and revisions are configured.
- Backoff + jitter and acquire timeouts are in place.
- Game days held: quorum loss, split-brain, GC pauses, clock skew.
- The procedure for taking multiple locks is documented (if required).
- A degradation (brownout) plan exists: what to do when the lock service is unavailable.
14) FAQ
Q: Is a Redis `SET NX PX` lock enough?
A: Only if the resource checks a fencing token. Otherwise, under a network partition, two owners are possible.
Q: What should I choose by default?
A: For strict guarantees - etcd/ZooKeeper/Consul (CP). For simple tasks inside one database - Postgres advisory locks. Redis - only with fencing.
Q: What TTL should I set?
A: `TTL ≥ p99 critical-section duration × 2`, yet short enough to clear "zombies" quickly. Renew every `TTL/3`.
Q: How do I avoid starvation?
A: An ordered wait queue (ZK sequential nodes) or a fairness algorithm; a retry limit and fair scheduling.
Q: Do I need time synchronization?
A: For correctness - no (use fencing). For operational predictability - yes (NTP/PTP), but do not rely on wall-clock time in the lock logic.
15) Summary
Reliable distributed locks are built on quorum systems (etcd/ZK/Consul) with lease + keepalive and must be complemented by a fencing token at the level of the resource being modified. Any TTL/Redis approach without fencing is a split-brain risk. Think first about causality and idempotence, use locks only where you truly cannot do without them, measure, and test failure modes - then your "critical sections" will stay critical only in meaning, not in the number of incidents.