
Leader Election

1) Why you need a leader, and when one is justified at all

A leader is a node with the exclusive right to perform critical actions: running cron/ETL jobs, coordinating shards, distributing keys, changing the configuration. It simplifies invariants ("exactly one executor"), but adds risks (SPOF, re-elections, lag).

Use leadership if:
  • execution must be unique (for example, a billing aggregator that runs once a minute);
  • changes must be serialized (a configuration registry, distributed locks);
  • the cluster protocol assumes leader-based replication (Raft).
Avoid it if:
  • the problem is solved by idempotency and per-key ordering;
  • the work can be parallelized via work-stealing/queues;
  • the "leader" would become the single bottleneck (wide fan-in).

2) Base model: lease + quorum + epoch

Terms

Lease: the leader holds its role for T seconds and must keep renewing it.
Heartbeat: a periodic renewal/liveness signal.
Epoch/term: a monotonically increasing leadership number. Helps detect "stale" leaders.
Fencing token: that same monotonic number; the resource consumer (database/storage) checks it and rejects operations from a stale leader.

Invariants

At any moment there is at most one current leader (safety).
On failure, progress is still possible: a new leader is elected within a reasonable time (liveness).
Leader operations carry the epoch; sinks accept only the newest epoch.
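
To make the last invariant concrete, here is a minimal Go sketch (all names are illustrative) of a sink that remembers the highest epoch it has seen and rejects writes from older leaders:

```go
package fencing

import (
	"errors"
	"sync"
)

var ErrStaleEpoch = errors.New("write rejected: stale leader epoch")

// FencedStore is an illustrative sink enforcing "only the newest epoch wins".
type FencedStore struct {
	mu        sync.Mutex
	lastEpoch int64
	data      map[string]string
}

func NewFencedStore() *FencedStore {
	return &FencedStore{data: make(map[string]string)}
}

// Put accepts a write only if the caller's epoch is at least as new as the
// newest epoch the store has already observed.
func (s *FencedStore) Put(epoch int64, key, value string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if epoch < s.lastEpoch {
		return ErrStaleEpoch // an old leader woke up after a pause or partition
	}
	s.lastEpoch = epoch
	s.data[key] = value
	return nil
}
```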

3) Overview of algorithms and protocols

3.1 Raft (leader-based replication)

States: Follower → Candidate → Leader.
Timers: a randomized election timeout (jitter) triggers RequestVote; the leader sends AppendEntries as a heartbeat.
Guarantees: quorum-based, no split-brain under the standard assumptions, a replicated log with logical monotonicity (term/index).
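
For illustration, the randomized election timeout can be as simple as the Go sketch below (the 150-300 ms range comes from the Raft paper; tune it to your network):

```go
package raftsketch

import (
	"math/rand"
	"time"
)

// electionTimeout returns a randomized timeout; the jitter prevents all
// followers from becoming candidates simultaneously and splitting the vote.
func electionTimeout() time.Duration {
	const base = 150 * time.Millisecond
	jitter := time.Duration(rand.Int63n(int64(150 * time.Millisecond)))
	return base + jitter // uniformly in [150 ms, 300 ms)
}
```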

3.2 Paxos / Single-Decree / Multi-Paxos

The theoretical foundation of consensus; in practice, variations (e.g., Multi-Paxos) with a "distinguished coordinator" (an analogue of a leader) are used.
Harder to implement directly; ready-made implementations/libraries are used far more often.

3.3 ZAB (ZooKeeper Atomic Broadcast)

ZooKeeper's mechanism: leader-based log replication with recovery phases; epochs (zxid) and sequential ephemeral znodes provide primitives such as leader election.

3.4 Bully / Chang-Roberts (highest-ID and ring algorithms)

"Training" algorithms for static topologies without quorum. Do not take into account partial network failures/partitions - do not apply in sales.

4) Practical platforms

4.1 ZooKeeper

The EPHEMERAL_SEQUENTIAL pattern: each process creates `/leader/lock-XXXX`; the process with the smallest sequence number is the leader.
Session loss ⇒ the znode disappears ⇒ re-election is near-instantaneous.

Fairness comes from each contender watching only its immediate predecessor, as in the sketch below.
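
A minimal Go sketch of this pattern using the community go-zookeeper client (github.com/go-zookeeper/zk); the `/leader` parent znode is assumed to exist, and reconnect/error recovery is omitted:

```go
package zkleader

import (
	"sort"
	"strings"
	"time"

	"github.com/go-zookeeper/zk"
)

// becomeLeader creates a sequential ephemeral znode; the lowest sequence
// number wins. Non-leaders watch only their immediate predecessor, which
// gives fairness without a thundering herd. The connection stays open to
// keep the session (and therefore the znode) alive.
func becomeLeader(servers []string) (isLeader bool, predecessorGone <-chan zk.Event, err error) {
	conn, _, err := zk.Connect(servers, 5*time.Second)
	if err != nil {
		return false, nil, err
	}
	me, err := conn.Create("/leader/lock-", nil,
		zk.FlagEphemeral|zk.FlagSequence, zk.WorldACL(zk.PermAll))
	if err != nil {
		return false, nil, err
	}
	children, _, err := conn.Children("/leader")
	if err != nil {
		return false, nil, err
	}
	sort.Strings(children)
	myName := strings.TrimPrefix(me, "/leader/")
	if children[0] == myName {
		return true, nil, nil // smallest sequence number: we are the leader
	}
	// Wait for the node just before ours to vanish (session loss or resign).
	var predecessor string
	for i, c := range children {
		if c == myName {
			predecessor = children[i-1]
			break
		}
	}
	_, _, watch, err := conn.ExistsW("/leader/" + predecessor)
	return false, watch, err
}
```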

4.2 etcd (Raft)

Native leadership at the cluster level itself; for applications, the etcd concurrency package provides `Session` + `Mutex`/`Election`.
A lease ID with a TTL and keepalive; an epoch can be stored in the key's value.

4.3 Consul

`session` + KV `acquire`: whoever holds the key is the leader. The TTL/heartbeat lives in the session. A sketch follows below.
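
A minimal Go sketch with the official `github.com/hashicorp/consul/api` client; the key name, session TTL, and behavior are illustrative:

```go
package consulleader

import "github.com/hashicorp/consul/api"

// acquireLeadership creates a session with a TTL, then tries to acquire the
// leader key with it. Whoever holds the key is the leader; when the session
// expires, the key is released and a re-election happens.
func acquireLeadership(nodeID string) (isLeader bool, sessionID string, err error) {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		return false, "", err
	}
	sessionID, _, err = client.Session().Create(&api.SessionEntry{
		Name:     "rollup-leader",
		TTL:      "15s",                     // session dies if not renewed
		Behavior: api.SessionBehaviorDelete, // drop the key when the session expires
	}, nil)
	if err != nil {
		return false, "", err
	}
	isLeader, _, err = client.KV().Acquire(&api.KVPair{
		Key:     "service/rollup/leader",
		Value:   []byte(nodeID),
		Session: sessionID,
	}, nil)
	// The holder must keep renewing, e.g. via client.Session().RenewPeriodic.
	return isLeader, sessionID, err
}
```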

4.4 Kubernetes

The Lease coordination API (`coordination.k8s.io/v1`): a `Lease` resource with `holderIdentity`, `leaseDurationSeconds`, and `renewTime`.
The `leaderelection` package (client-go) implements acquisition/renewal; ideal for leader pods.

5) How to build a "safe" leader

5.1 Carry the epoch and use fencing

Each new leadership increments the epoch (e.g., the etcd revision, the ZooKeeper zxid, or a separate counter).

All of the leader's side effects (database writes, task execution) must carry the `epoch` and compare it:

```sql
UPDATE cron_state
SET last_run = now(), last_epoch = :epoch
WHERE name = 'daily-rollup' AND :epoch > last_epoch;
```

A stale leader (for example, after a split-brain) will be rejected; the caller can detect this via the number of affected rows, as in the sketch below.
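
A hedged Go sketch of the same idea with `database/sql` (Postgres-style placeholders assumed): the epoch travels with the write, and zero affected rows means the caller has been fenced out and must stop:

```go
package fencedcron

import (
	"context"
	"database/sql"
	"errors"
)

var ErrFencedOut = errors.New("stale epoch: another leader has taken over")

// markRun passes the leader's epoch into the side effect and detects fencing:
// if the UPDATE matched no rows, a newer epoch already won.
func markRun(ctx context.Context, db *sql.DB, epoch int64) error {
	res, err := db.ExecContext(ctx,
		`UPDATE cron_state
		    SET last_run = now(), last_epoch = $1
		  WHERE name = 'daily-rollup' AND $1 > last_epoch`, epoch)
	if err != nil {
		return err
	}
	n, err := res.RowsAffected()
	if err != nil {
		return err
	}
	if n == 0 {
		return ErrFencedOut // stop leader work immediately
	}
	return nil
}
```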

5.2 Timings

`leaseDuration` ≥ 2-3 × `heartbeatInterval` + network latency + p99 GC pause.
Randomize the election timeout (jitter) so that candidates do not collide.
If renewal fails, stop critical operations immediately. A sanity-check sketch follows below.
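
A small sketch of that rule of thumb as a startup sanity check (the 3× multiplier and the inputs are illustrative):

```go
package timings

import (
	"fmt"
	"time"
)

// validateTimings checks that the lease survives a few missed heartbeats plus
// network latency and a worst-case GC pause.
func validateTimings(lease, heartbeat, netP99, gcPauseP99 time.Duration) error {
	margin := 3*heartbeat + netP99 + gcPauseP99
	if lease < margin {
		return fmt.Errorf("leaseDuration %v < %v (3×heartbeat + net p99 + GC p99)", lease, margin)
	}
	return nil
}
```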

5.3 Identity

`holderId = node#pid#startTime#rand`. On renewal/removal, verify that the holder is still the same. A construction sketch follows below.
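
A possible construction in Go (the format and fields follow the pattern above; purely illustrative):

```go
package identity

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"os"
	"time"
)

// newHolderID builds a collision-resistant holder identity: hostname, PID,
// start time and a random suffix, so a restarted process never looks like
// the previous holder of the lease.
func newHolderID() string {
	host, _ := os.Hostname()
	buf := make([]byte, 4)
	_, _ = rand.Read(buf)
	return fmt.Sprintf("%s#%d#%d#%s", host, os.Getpid(), time.Now().Unix(), hex.EncodeToString(buf))
}
```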

5.4 Watchers

All followers subscribe to `Lease`/`Election` changes and start/stop their work according to the current status.
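
With etcd, for example, followers can watch the election key via `concurrency.Election.Observe`; a minimal sketch (the election prefix and callback are illustrative):

```go
package watcher

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

// watchLeader delivers the current holder's identity to onChange on every
// leadership change, so a follower can start or stop its own work.
func watchLeader(ctx context.Context, cli *clientv3.Client, onChange func(leaderID string)) error {
	sess, err := concurrency.NewSession(cli)
	if err != nil {
		return err
	}
	defer sess.Close()
	e := concurrency.NewElection(sess, "/election/rollup")
	for resp := range e.Observe(ctx) { // one message per leadership change
		if len(resp.Kvs) > 0 {
			onChange(string(resp.Kvs[0].Value))
		}
	}
	return ctx.Err()
}
```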

6) Implementation fragments

6.1 Kubernetes (Go)

```go
import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/leaderelection"
	rl "k8s.io/client-go/tools/leaderelection/resourcelock"
)

// coordClient is a coordination.k8s.io/v1 client; podName identifies this pod.
lec := leaderelection.LeaderElectionConfig{
	Lock: &rl.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "jobs-leader", Namespace: "prod"},
		Client:     coordClient,
		LockConfig: rl.ResourceLockConfig{Identity: podName},
	},
	LeaseDuration: 15 * time.Second,
	RenewDeadline: 10 * time.Second,
	RetryPeriod:   2 * time.Second,
	Callbacks: leaderelection.LeaderCallbacks{
		OnStartedLeading: func(ctx context.Context) { runLeader(ctx) }, // became leader
		OnStoppedLeading: func() { stopLeader() },                      // lost the lease
	},
}
leaderelection.RunOrDie(context.Background(), lec)
```

6.2 etcd (Go)

```go
cli, _ := clientv3.New(...)
sess, _ := concurrency.NewSession(cli, concurrency.WithTTL(10))
e := concurrency.NewElection(sess, "/election/rollup")
_ = e.Campaign(ctx, podID) // blocking call; returns once we are the leader
epoch := sess.Lease()      // lease ID: use it as part of the fencing token
defer e.Resign(ctx)
```

6.3 ZooKeeper (Java, Curator)

```java
LeaderSelector selector = new LeaderSelector(client, "/leaders/rollup", listener);
selector.autoRequeue();
selector.start(); // listener.takeLeadership() performs the leader's work inside try/finally
```

7) Re-elections and service degradation

Rapid leader flapping shows up as a sawtooth pattern in the charts. Treat it by increasing leaseDuration/renewDeadline and eliminating GC/CPU spikes.
During re-election, enable a brownout: reduce the intensity of background tasks or freeze them entirely until leadership is confirmed.
For long jobs, use checkpoints plus an idempotent catch-up after a leader change, as in the sketch below.
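
A minimal sketch of such a checkpointed job (the `Store` interface and batch size are illustrative; `SaveCursor` is assumed to use the fenced write from section 5.1):

```go
package checkpoint

// Store persists job progress; SaveCursor is fenced by epoch (section 5.1),
// so a stale leader's save is rejected and surfaces as an error.
type Store interface {
	LoadCursor(job string) (int64, error)
	SaveCursor(job string, cursor, epoch int64) error
}

// runRollup processes [cursor, end) in batches; process must be idempotent so
// that a new leader can safely replay the last unacknowledged batch.
func runRollup(st Store, epoch, end int64, process func(from, to int64) error) error {
	const batch = 10_000
	cursor, err := st.LoadCursor("daily-rollup")
	if err != nil {
		return err
	}
	for cursor < end {
		to := cursor + batch
		if to > end {
			to = end
		}
		if err := process(cursor, to); err != nil {
			return err
		}
		if err := st.SaveCursor("daily-rollup", to, epoch); err != nil {
			return err // includes "fenced out": stop immediately
		}
		cursor = to
	}
	return nil
}
```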

8) Split-brain: how to avoid it

Use CP stores (etcd/ZK/Consul) with a quorum; leadership must never be acquired without a quorum.
Never build leadership on an AP cache without a quorum arbiter.
Even with a CP model, keep fencing at the resource level; it is insurance against rare abnormal scenarios (pauses, stuck drivers).

9) Observability and operation

Metrics

`leadership_is_leader{app}` (gauge 0/1).
`election_total{result=won|lost|resign}`.
`lease_renew_latency_ms{p50,p95,p99}`, `lease_renew_fail_total`.
`epoch_value` (must be monotonic across the cluster).
`flaps_total`: the number of leader changes per time window.
For ZK/etcd: replication lag, quorum health.
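
A sketch of registering these metrics with `client_golang` (the metric names mirror the list above; buckets and help strings are illustrative):

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	isLeader = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "leadership_is_leader", Help: "1 while this instance holds the lease, else 0.",
	})
	elections = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "election_total", Help: "Election outcomes.",
	}, []string{"result"}) // won | lost | resign
	renewLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name: "lease_renew_latency_ms", Help: "Lease renewal latency in milliseconds.",
		Buckets: prometheus.ExponentialBuckets(1, 2, 12),
	})
	renewFailures = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "lease_renew_fail_total", Help: "Failed lease renewals.",
	})
	epochValue = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "epoch_value", Help: "Current leadership epoch (monotonic cluster-wide).",
	})
	flaps = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "flaps_total", Help: "Leadership changes observed.",
	})
)

func init() {
	prometheus.MustRegister(isLeader, elections, renewLatency, renewFailures, epochValue, flaps)
}
```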

Alerts

Frequent leader changes (> N per hour).
Lease renewal failures / high renewal p99.
Epoch divergence (different nodes observing different epochs).
No leader for longer than X seconds (if the business cannot tolerate that).

Logs/Traces

Correlate events by `epoch`, `holderId`, `reason` (lost lease, session expired), `duration_ms`.

10) Test playbooks (Game Days)

Partition: break the network between two zones; leadership must be possible only in the partition with the quorum.
GC stop: artificially pause the leader for 5-10 s; it must lose the lease and stop working.
Clock skew/drift: make sure correctness does not depend on wall-clock time (fencing/epoch still holds).
kill -9: a sudden leader crash → a new leader within ≤ leaseDuration.
Slow storage: slow down the disks/Raft log; measure election time and tune the timings.

11) Anti-patterns

"Leader" via Redis' SET NX PX'with no fencing and no quorum.
'leaseDuration'is less than p99 of the critical operation duration.
Stopping/continuing work after losing leadership ("I'll finish a minute").
Lack of jitter in election timers → election storm.
A single long job with no checkpoints - each flap results in a replay from scratch.
Close link of leadership and traffic routing (sticky) without fallback - the bottoms with the flap get 5xx.

12) Implementation checklist

  • A quorum arbiter is selected: etcd/ZK/Consul/K8s Lease.
  • The epoch/fencing token is stored and passed into every leader side effect.
  • Timings are configured: `leaseDuration`, `renewDeadline`, `retryPeriod`, with margin for network/GC.
  • Watchers are in place, with a correct shutdown when leadership is lost.
  • Leader tasks are idempotent and checkpointed.
  • Metrics/alerts are enabled and `epoch`/`holderId` are logged.
  • Game days have been held: partition, GC stop, kill, clock skew.
  • Policies are documented: what the leader does, who can replace it, how epoch conflicts are resolved.
  • A degradation plan exists: what the system does without a leader.
  • Performance tested: flaps under load do not break the SLO.

13) FAQ

Q: Can leadership be built without a quorum?
A: In prod, no. You need a CP component (quorum) or a cloud service with equivalent guarantees.

Q: Why epoch if there is lease?
A: A lease provides liveness, but does not protect against a stale leader after partitions/pauses. Epoch/fencing invalidates the stale leader's effects.

Q: What are typical timing defaults in Kubernetes?
A: Commonly `LeaseDuration ≈ 15s`, `RenewDeadline ≈ 10s`, `RetryPeriod ≈ 2s`. Adjust to your p99 load and GC.

Q: How do you test leadership locally?
A: Run 3-5 instances, emulate network (tc/netem), pause (SIGSTOP), kill leader (SIGKILL), check metrics/logs/epochs.

Q: What should happen to long tasks when the leader changes?
A: Checkpoints plus idempotent catch-up; on loss of leadership, stop immediately and release resources.

14) Summary

Reliable leader election is a quorum arbiter plus epoch discipline. Hold leadership as a lease with a heartbeat, fence every side effect with a token, configure timings with margin, make the leader's tasks idempotent and observable, and regularly rehearse failures. Then "one and only one executor" is not a slogan but a guarantee that withstands pauses, network misbehavior, and human error.
