Strong Consistency: When It Is Needed
Strong consistency is a model in which all operations appear to execute instantaneously, in a single global order consistent with real time. A user always reads the last acknowledged value, and two concurrent clients cannot logically overtake each other.
Strong consistency gives a simple mental model and protects hard invariants, but it requires coordination (quorums/a leader), which increases latency and sensitivity to network partitions.
1) When strong consistency is mandatory
Finance and Settlements
Balances and debits: double spending is unacceptable.
Transfers and settlements: the same amount must not be posted twice.
Inventory and Limits
Remaining stock/hotel rooms/tickets: the count cannot go negative.
Rate limits per unit of time (credit limits, API credits).
Uniqueness and integrity
Unique logins/IDs; deduplication rules.
Domain-level invariants: "≥ 1 doctor must be on duty in the department," "there cannot be > N active tasks in the queue."
Auditing and immutable logs
Events that serve as a legal source of truth: order and completeness are critical.
If violating an invariant carries unacceptable business risk (loss of money, sanctions, loss of trust), choose strong consistency.
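A minimal sketch of how such an invariant is typically enforced: an atomic conditional update ("debit only if the balance stays non-negative"). Here it is simulated in-memory with a lock; in a real system the same check runs inside a serializable transaction or a conditional write. The `Account` class is illustrative, not a real API.

```python
import threading

class Account:
    """In-memory account with an atomic conditional debit (illustrative sketch)."""
    def __init__(self, balance: int):
        self.balance = balance
        self._lock = threading.Lock()

    def debit(self, amount: int) -> bool:
        # The invariant check and the write happen atomically:
        # no interleaving can drive the balance below zero.
        with self._lock:
            if self.balance - amount < 0:
                return False  # reject instead of violating the invariant
            self.balance -= amount
            return True

acc = Account(100)
assert acc.debit(70) is True
assert acc.debit(50) is False   # would go negative -> rejected
assert acc.balance == 30
```

The key design point: the decision ("is there enough?") and the mutation happen under the same coordination scope, so "double spending" is impossible by construction.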
2) What exactly "strong" means
Linearizability (operation level): a read sees the most recent successful write; real-time order is respected.
Serializable (transaction level): the result is equivalent to executing the transactions in some sequential order (strong, but sometimes implemented without a hard real-time ordering).
An important distinction: Serializable protects against transaction-level anomalies (phantoms/write skew), while Linearizable guarantees the instantaneity and ordering of individual operations. Often you need both properties (for example, money in the database plus an event log).
3) The cost of strictness: PACELC and CAP
PACELC: during a network partition (P), you must choose between C (consistency) and A (availability). Strong → CP: it is better to refuse or block than to violate an invariant. Else (E), when there is no partition, you pay with L (latency): p95/p99 grows because of coordination/quorums.
In practice: strong consistency for the "core of invariants"; around it, fast eventually consistent projections/caches so UX does not suffer.
4) How Strong Consistency is achieved
Leaders and quorums
A single leader accepts writes; reads go to the leader or to a quorum of replicas.
Quorums `W` for writes and `R` for reads with `R + W > N` guarantee that every read quorum intersects the latest write quorum, so a read contacts at least one replica holding the last acknowledged write.
Consensus algorithms
Raft/Paxos: a replicated log, majority acknowledgements, terms/indexes.
Synchronous replication: a write is acknowledged only after it is persisted on a quorum.
Clocks and ordering
TrueTime/Hybrid Logical Clocks (HLC): bound clock skew for safe global serialization.
Fencing tokens/versioning: protection against stale leaders and split-brain.
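Fencing tokens are usually implemented as a monotonically increasing epoch/term: the storage layer remembers the highest token it has seen and rejects writes carrying an older one, which neutralizes a stale leader after re-election. An illustrative sketch (class and method names are assumptions):

```python
class FencedStore:
    """Storage that rejects writes from stale leaders via fencing tokens."""
    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            return False  # stale leader: its term is older than one already seen
        self.highest_token = token
        self.data[key] = value
        return True

store = FencedStore()
assert store.write(1, "x", "from leader, term 1") is True
assert store.write(2, "x", "from leader, term 2") is True
# An old leader wakes up (e.g., after a GC pause) and retries with term 1:
assert store.write(1, "x", "stale write") is False
assert store.data["x"] == "from leader, term 2"
```

The token must be issued by the coordination layer (the elected term in Raft, a lock service epoch), not by the leader's own clock.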
Transaction isolation
Serializable (SI + predicate-conflict detection/locks): protection against phantoms/write skew.
Strict serializable: serializability plus linearizability with respect to real time.
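Serializable engines resolve conflicts by aborting one of two competing transactions, so the client side must retry. A sketch of the standard retry loop with jittered backoff; `SerializationConflict` here is a hypothetical stand-in for a real driver's serialization-failure error (e.g., SQLSTATE 40001 in PostgreSQL-compatible databases):

```python
import random
import time

class SerializationConflict(Exception):
    """Stand-in for a driver's serialization-failure error (e.g. SQLSTATE 40001)."""

def run_serializable(txn, max_retries: int = 5):
    """Run `txn` under (assumed) SERIALIZABLE isolation, retrying on conflict."""
    for attempt in range(max_retries):
        try:
            return txn()
        except SerializationConflict:
            # Jittered exponential backoff before retrying the whole transaction.
            time.sleep(random.uniform(0, 0.01 * 2 ** attempt))
    raise RuntimeError("transaction kept conflicting; surface the error to the caller")

# Demo: a transaction that conflicts twice, then commits.
attempts = {"n": 0}
def txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationConflict()
    return "committed"

assert run_serializable(txn) == "committed"
assert attempts["n"] == 3
```

The important property: the whole transaction body is re-executed, not just the final write, because the conflict invalidated everything it read.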
5) Multi-region: options and trade-offs
Global Leader (CP)
Writes go through one leading region; reads via local caches/projections or through the leader.
Pros: simple model. Cons: p95 bounded by RTT to the leader; during a partition (P), writes block.
Regional leaders + synchronous quorum
A geographically distributed quorum across several regions; each write waits for acknowledgements from > 50% of replicas.
Pros: no single bottleneck, high resilience. Cons: intercontinental latency.
Geo-partitioning
A home region for each piece of data (tenant/jurisdiction); global operations go through sagas/aggregates.
Pros: low latency for local writes. Cons: data boundaries must be planned.
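Geo-partitioning usually starts with a deterministic "home region" mapping for each tenant: single-tenant writes stay strong inside one region, while operations spanning several homes are escalated to a saga/coordinator. A hypothetical routing sketch (tenant IDs and region names are made up for illustration):

```python
# Hypothetical tenant -> home-region mapping.
HOME_REGION = {"tenant-a": "eu-west", "tenant-b": "us-east"}

def route_write(tenant: str, involved_tenants: set) -> str:
    """Route a write to its home region, or flag it for a cross-region saga."""
    homes = {HOME_REGION[t] for t in involved_tenants | {tenant}}
    if len(homes) == 1:
        return "local:" + homes.pop()   # fast path: strong write inside one region
    return "saga-coordinator"           # slow path: cross-region invariant

assert route_write("tenant-a", set()) == "local:eu-west"
assert route_write("tenant-a", {"tenant-b"}) == "saga-coordinator"
```

The design choice: the fast path never crosses a region boundary, so its latency does not depend on intercontinental RTT.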
6) Tuning R/W and reads
Writes: `W = majority` is the standard for strong consistency.
Reads:
- "Freshest": `R = majority` or reading at the leader.
- To reduce latency: "stale-ok" reads from replicas for secondary screens (explicitly marked in the UX).
- Read repair / lease reads: optimizations that preserve strictness under short leader leases.
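A lease read lets the leader serve linearizable reads locally: while its quorum-granted lease has not expired, no other leader can commit, so a local read is safe without a round trip. Illustrative sketch with an explicit clock-skew margin (class and field names are assumptions):

```python
class Leader:
    """Leader that serves local reads only while its lease is provably valid."""
    def __init__(self, lease_duration_s: float, clock_skew_margin_s: float = 0.25):
        self.lease_expires_at = 0.0
        self.lease_duration_s = lease_duration_s
        self.margin = clock_skew_margin_s

    def renew_lease(self, granted_at: float) -> None:
        # Lease granted by a quorum; shrink it by the skew margin to stay safe
        # even if this node's clock runs slightly fast.
        self.lease_expires_at = granted_at + self.lease_duration_s - self.margin

    def can_serve_local_read(self, now: float) -> bool:
        return now < self.lease_expires_at

leader = Leader(lease_duration_s=5.0)
leader.renew_lease(granted_at=100.0)
assert leader.can_serve_local_read(now=104.0) is True    # within lease (minus margin)
assert leader.can_serve_local_read(now=104.9) is False   # margin hit: fall back to a quorum read
```

The skew margin is the price of relying on local clocks; TrueTime/HLC bounds make that margin explicit and small.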
7) Performance and UX
Latency: dominated by RTT between the client and the leader/quorum (hundreds of ms between regions).
The "write-strong, read-fast" pattern: strong writes plus cached/projected reads, with read-your-writes (RYW) for the author.
Batching: group writes, but watch tail latency.
Degradation paths: during an incident, go read-only, show honest statuses, forbid dangerous mutations.
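The "write-strong, read-fast" pattern with RYW can be sketched as follows: a strong write returns a version (commit index), the client carries it as a freshness token, and a replica serves the read only if it has applied at least that version; otherwise the read falls back to the leader. All names here are illustrative:

```python
class Replica:
    """A node holding versioned data; also used to model the leader."""
    def __init__(self):
        self.applied_version = 0
        self.data = {}

    def apply(self, version: int, key: str, value: str) -> None:
        self.applied_version = version
        self.data[key] = value

def read_your_writes(replica, leader, key, min_version):
    """Serve from the fast replica only if it is fresh enough for this client."""
    source = replica if replica.applied_version >= min_version else leader
    return source.data.get(key), ("replica" if source is replica else "leader")

leader, replica = Replica(), Replica()
leader.apply(7, "profile", "v7")    # strong write committed at version 7
replica.apply(6, "profile", "v6")   # replica lags one version behind
# The author holds token 7 -> the replica is stale, so the read goes to the leader:
assert read_your_writes(replica, leader, "profile", min_version=7) == ("v7", "leader")
# A reader with no freshness requirement can use the cheap replica:
assert read_your_writes(replica, leader, "profile", min_version=0) == ("v6", "replica")
```

Only the author pays the leader round trip, and only until the replica catches up; everyone else stays on the fast path.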
8) Observability of the strict path
Metrics
p50/p95/p99 latency: quorum writes, quorum reads, leader reads.
Quorum success rate, retries/rollbacks, leader changes.
Replication lag (expected to be small, but monitoring is mandatory).
Share of "stale" reads (if enabled).
Tracing
Spans: "leader accept," "replication," "quorum commit."
Tags: `term`, `leader_id`, `quorum_size`, `region`.
Alerts
Growth of p95/p99, frequent leader re-elections, quorum timeouts, split-brain indicators.
9) Tests and chaos
Jepsen-style: network partitions, delays, packet drops, clock skew.
Safety invariants: no double spending, no negative balances, no double booking.
Leadership: leader failure, re-election under load, fencing tokens.
Read consistency: a read immediately after a write must see the "new" value (RYW/linearizable read).
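A minimal safety-invariant test in the spirit of the list above: hammer an account with concurrent withdrawals and assert the balance never goes negative. A Jepsen-style harness adds partitions and clock skew on top of the same idea; this sketch tests only the coordination primitive itself:

```python
import threading

balance = 100
lock = threading.Lock()
rejected = 0

def withdraw(amount: int) -> None:
    global balance, rejected
    with lock:                  # the coordination under test
        if balance >= amount:
            balance -= amount
        else:
            rejected += 1       # refused rather than going negative

threads = [threading.Thread(target=withdraw, args=(30,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Safety invariant: never negative; exactly 3 withdrawals of 30 fit into 100.
assert balance == 10
assert rejected == 7
```

Note that the test asserts an exact outcome only because the amounts are fixed; the real invariant under randomized load is simply `balance >= 0` across every interleaving.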
10) Incident playbooks
Quorum loss: switch to read-only, notify clients, route writes to the "home" region if geo-partitioning is in place.
Interregional latency growth: temporarily reduce the volume of strict writes (migrate some flows to queues/projections), localize traffic.
Leader flapping: increase election timeouts; check the network, clock drift, and GC pauses.
Split-brain: enforce fencing tokens/lease checks, stop old leaders at the operator level.
11) Typical mistakes
Demanding strong consistency "everywhere": an explosion of latency and cost instead of focusing on invariants.
Trying to be CA under real partitions: at the moment of P, the system still makes a choice, often implicitly.
Dual writes to different regions without sagas/a coordinator: phantoms and lost invariants.
No RYW: the user does not see their just-written entity, and trust drops.
Ignoring clocks: without HLC/TrueTime bounds, it is easy to get "jumping" time and races.
No degradation plan: during a partition, chaotic partial failures begin.
12) Quick recipes
Payments/balances: leader + majority quorum; strict-serializable transactions, short timeouts, hard failure during a partition.
Booking (seats/slots): strong writes through the leader; reads via cache with RYW; TTL reservations + TCC (Try-Confirm/Cancel).
Global SaaS: geo-partitioning by `tenant/region`; strict operations in the home region, reports/search via projections.
Audit/log: append-only CP log; reads may be cached but verified against checkpoints.
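The TTL-reserve recipe for booking can be sketched as follows: a reservation holds a seat for a limited time, confirmation must arrive before expiry, and an expired reservation returns the seat to the pool. Illustrative in-memory version with the clock injected for testability (all names are assumptions):

```python
class SeatInventory:
    """Seats with TTL reservations: reserve, then confirm before expiry."""
    def __init__(self, seats: int, ttl_s: float):
        self.free = seats
        self.ttl_s = ttl_s
        self.reservations = {}  # reservation id -> expiry timestamp

    def reserve(self, res_id: str, now: float) -> bool:
        self._expire(now)
        if self.free == 0:
            return False
        self.free -= 1
        self.reservations[res_id] = now + self.ttl_s
        return True

    def confirm(self, res_id: str, now: float) -> bool:
        self._expire(now)
        # A live reservation converts to a sale; the seat stays taken.
        return self.reservations.pop(res_id, None) is not None

    def _expire(self, now: float) -> None:
        for rid, deadline in list(self.reservations.items()):
            if deadline <= now:
                del self.reservations[rid]
                self.free += 1  # expired reservation returns the seat

inv = SeatInventory(seats=1, ttl_s=60.0)
assert inv.reserve("r1", now=0.0) is True
assert inv.reserve("r2", now=10.0) is False   # sold out while r1 holds the seat
assert inv.confirm("r1", now=70.0) is False   # too late: TTL expired at t=60
assert inv.reserve("r2", now=70.0) is True    # the seat was released back
```

The invariant "seats never oversold" holds at every step, while the TTL bounds how long an abandoned checkout can block inventory.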
13) Pre-production checklist
- Invariants requiring strong consistency are written down; everything else goes to AP/projections.
- A topology is chosen: single leader / interregional quorum / geo-partitioning.
- `W = majority`, `R = leader` or `R = majority` configured for critical paths.
- RYW/monotonic reads provided for UX; "stale-ok" reads explicitly marked.
- Metrics enabled for quorums, lags, latencies; alerts on p95/p99 and re-elections.
- A degradation plan exists: read-only mode, disabling dangerous mutations, queues for "after the storm."
- Chaos tests: partitions, clock skew, leader failure; safety invariants verified.
- Contracts documented: what is strict, what "may lag," communication for product/support.
Conclusion
Strong consistency is a tool for protecting the truth where errors are unacceptable. Apply it pointwise, around hard invariants, consciously paying for coordination with latency and with availability during partitions. Combine: a CP core for the critical path, AP reads and projections for speed. With the right telemetry, degradation plans, and tests, you keep both correctness and user experience.