GH GambleHub

Eventual Consistency in Practice

Eventual consistency (EC) is a model in which replicas of data may temporarily diverge but eventually converge without global coordination. It is the key to high availability (AP in CAP terms) and low latency (the latency branch of PACELC), provided that invariants, merge rules, and client guarantees are defined correctly.

1) When to choose EC (and when not)

Good fit:
  • Feeds, profiles, likes/counters, catalogs/search, cached views.
  • Global systems with local writes and soft invariants.
  • Projections (CQRS), where the source of truth is a strictly consistent core and reads are asynchronous.
Not suitable:
  • Hard invariants: money, uniqueness, limits, inventory that must never go negative. Use CP/strong consistency there, with sagas/TCC.

2) EC data design: conflicts and their resolution

Principle: Each record carries version metadata and a deterministic merge function.

Timestamps/versioning: 'version', 'ts', 'actor'.

Vector clock: captures causality and lets you detect concurrent (conflicting) updates.
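As a minimal sketch of how a vector clock separates causally ordered updates from concurrent ones, the comparison can be written as below. The names `compare` and `merge_clocks` and the dict-based clock representation are illustrative assumptions, not part of any specific library.

```python
from typing import Dict

# Assumed representation: actor id -> number of events seen from that actor.
VectorClock = Dict[str, int]


def compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'equal', 'before', 'after', or 'concurrent' for clocks a and b."""
    actors = set(a) | set(b)
    a_le_b = all(a.get(x, 0) <= b.get(x, 0) for x in actors)
    b_le_a = all(b.get(x, 0) <= a.get(x, 0) for x in actors)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b: b can safely replace a
    if b_le_a:
        return "after"       # b happened before a: keep a
    return "concurrent"      # neither dominates: run the merge function


def merge_clocks(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum: the clock of a state that has seen both histories."""
    return {x: max(a.get(x, 0), b.get(x, 0)) for x in set(a) | set(b)}
```

Only the "concurrent" outcome needs a merge rule; in the other cases one version strictly supersedes the other.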

Merge rules:
  • LWW (Last-Write-Wins): simple and fast, but can silently discard concurrent changes.
  • CRDT: commutative, associative, and idempotent structures that guarantee convergence.
  • Domain merge: a business function (for example, merge lists without duplicates, sum counters, "newest email + tag merge").
CRDT selection:
  • Counters → G-Counter/PN-Counter.
  • Sets → OR-Set (removals without removed elements reappearing).
  • Registers → LWW-Register (beware of lost updates).
  • Maps/documents → Map of CRDTs.
  • Co-authoring → text CRDT/OT.
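To make the counter case concrete, here is a small state-based PN-Counter sketch. The class and method names are assumptions chosen for illustration; production CRDT libraries differ in API and usually add delta encoding.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PNCounter:
    """State-based PN-Counter: one grow-only tally of increments and one of
    decrements per node; merge takes the per-node maximum of each tally."""
    node_id: str
    incs: Dict[str, int] = field(default_factory=dict)
    decs: Dict[str, int] = field(default_factory=dict)

    def increment(self, n: int = 1) -> None:
        self.incs[self.node_id] = self.incs.get(self.node_id, 0) + n

    def decrement(self, n: int = 1) -> None:
        self.decs[self.node_id] = self.decs.get(self.node_id, 0) + n

    @property
    def value(self) -> int:
        return sum(self.incs.values()) - sum(self.decs.values())

    def merge(self, other: "PNCounter") -> None:
        # Per-node max is commutative, associative, and idempotent, so the
        # order and number of merges do not affect the final state.
        for mine, theirs in ((self.incs, other.incs), (self.decs, other.decs)):
            for node, count in theirs.items():
                mine[node] = max(mine.get(node, 0), count)


a, b = PNCounter("a"), PNCounter("b")
a.increment(3)
b.increment(2)
b.decrement(1)
a.merge(b)
b.merge(a)
a.merge(b)  # a repeated merge changes nothing
```

After the exchange both replicas report the same value, regardless of which side merged first.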

3) Replication and anti-entropy

Gossip/anti-entropy: periodic exchange of states/hashes between nodes.
Hinted handoff: temporarily store a write on behalf of an unreachable node and deliver it once the node is back.
Read repair: when a read finds replicas that disagree, bring the stale ones up to the latest version.
Delta batches: ship deltas rather than full snapshots.
R/W quorums: tune 'R', 'W', 'N' to trade speed against freshness (for example, with 'R + W > N' every read overlaps the latest acknowledged write, which is closer to strong consistency).
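The interaction of quorums and read repair can be sketched with an in-memory simulation. Everything here is a simplifying assumption: a single integer version per key stands in for real version metadata, and the functions contact a fixed slice of replicas instead of doing network I/O. Hinted handoff and gossip are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Versioned:
    value: str
    version: int  # assumption: one monotonically increasing version per key


class Replica:
    def __init__(self) -> None:
        self.data: Dict[str, Versioned] = {}

    def put(self, key: str, item: Versioned) -> None:
        current = self.data.get(key)
        if current is None or item.version > current.version:
            self.data[key] = item

    def get(self, key: str) -> Optional[Versioned]:
        return self.data.get(key)


def quorum_write(replicas: List[Replica], key: str, item: Versioned, w: int) -> None:
    # Simulation: only the first w replicas acknowledge; the rest stay stale.
    for replica in replicas[:w]:
        replica.put(key, item)


def quorum_read(replicas: List[Replica], key: str, r: int) -> Optional[Versioned]:
    # Simulation: contact the last r replicas, the worst case for overlap.
    contacted = replicas[-r:]
    found = [v for v in (rep.get(key) for rep in contacted) if v is not None]
    if not found:
        return None
    newest = max(found, key=lambda v: v.version)
    for rep in contacted:
        rep.put(key, newest)  # read repair: heal stale replicas we touched
    return newest


nodes = [Replica() for _ in range(3)]                # N = 3
quorum_write(nodes, "k", Versioned("v1", 1), w=3)
quorum_write(nodes, "k", Versioned("v2", 2), w=2)    # third replica misses v2
result = quorum_read(nodes, "k", r=2)                # R + W > N, so v2 is seen
```

With W = 2 and R = 2 out of N = 3, the read set always shares at least one replica with the write set, and the read leaves the stale replica repaired as a side effect.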

4) Client guarantees on top of EC

Read-Your-Writes (RYW): authors see their own writes immediately after making them (sticky sessions or version tokens).
Monotonic Reads: never "roll back" a client to an older value (keep a watermark of the latest version the client has seen).
Causal Consistency: preserve cause-and-effect order within a session or action flow (vector clocks in headers/tokens).
Bounded Staleness: a guarantee of "no older than Δt or N versions" for UX-critical screens.
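One way to get Read-Your-Writes and Monotonic Reads is a per-session version watermark that the client sends with each read. The sketch below assumes a single global version counter and hypothetical `Session`/`Replica` classes; real systems typically carry a vector or hybrid timestamp in the token instead.

```python
from dataclasses import dataclass
from typing import Any, List


class NoFreshReplica(Exception):
    """Raised when no contacted replica has caught up to the session watermark."""


class Replica:
    def __init__(self) -> None:
        self.version = 0          # highest version applied on this replica
        self.value: Any = None

    def apply(self, version: int, value: Any) -> None:
        if version > self.version:
            self.version, self.value = version, value


@dataclass
class Session:
    watermark: int = 0  # highest version this client has written or read

    def note_write(self, version: int) -> None:
        # Read-Your-Writes: later reads must reflect at least this write.
        self.watermark = max(self.watermark, version)

    def read(self, replicas: List[Replica]) -> Any:
        for replica in replicas:
            if replica.version >= self.watermark:
                # Monotonic Reads: remember what we saw so we never go back.
                self.watermark = replica.version
                return replica.value
        # The caller can wait, retry, or route the read to a fresher replica.
        raise NoFreshReplica(f"need version >= {self.watermark}")
```

A stale replica is simply skipped for this session, while other sessions with a lower watermark can still read from it.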

5) UX patterns for EC

Optimistic updates: reflect the action instantly and mark it as "syncing."

Freshness marking: an "updated X sec ago" badge and a "Refresh" button.

Conflict UI: for rare collisions, show both versions and let the user pick or combine them.

Skeleton/placeholder + soft refresh: do not block the UI waiting for global quorums.

6) Architectural templates

6.1 CQRS + projections

Write core (CP): strict invariants.
Read plane (EC): asynchronous projections, indexes, caches; lag is acceptable.
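A toy illustration of this split, under the assumption of an in-memory event log and hypothetical `WriteCore`/`Projection` classes: the write side enforces the invariant synchronously, while the read model replays the log at its own pace and may lag.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str, int]  # (kind, account_id, amount)


class WriteCore:
    """Strict side: checks the invariant before appending an event."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self.log: List[Event] = []

    def deposit(self, account: str, amount: int) -> None:
        self.balances[account] = self.balances.get(account, 0) + amount
        self.log.append(("deposited", account, amount))

    def withdraw(self, account: str, amount: int) -> None:
        if self.balances.get(account, 0) < amount:
            raise ValueError("insufficient funds")  # hard invariant
        self.balances[account] -= amount
        self.log.append(("withdrawn", account, amount))


class Projection:
    """Eventually consistent read model built from the event log."""

    def __init__(self) -> None:
        self.offset = 0                  # next log position to apply
        self.view: Dict[str, int] = {}

    def catch_up(self, log: List[Event], max_events: int) -> None:
        for kind, account, amount in log[self.offset:self.offset + max_events]:
            delta = amount if kind == "deposited" else -amount
            self.view[account] = self.view.get(account, 0) + delta
            self.offset += 1

    def lag(self, log: List[Event]) -> int:
        return len(log) - self.offset    # events not yet visible to readers
```

The `lag` value is the number that staleness dashboards would track for this projection.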

6.2 AP Multi-Region

Write locally and fast, replicate asynchronously.
Geo-partitioning: data "lives" close to its user; only aggregates cross regions.
CRDT/merge functions take the pain out of conflicts.

6.3 Quorum tuning

```yaml
consistency:
  replicas: 3          # N
  write_quorum: 2      # W
  read_quorum: 2       # R => R + W > N, closer to freshness on "last record"
  read_repair: true
  hinted_handoff: true
```
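Why 'R + W > N' matters can be checked by brute force: enumerate every possible write set and read set and test whether they always share a replica. The function name below is an illustrative assumption.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every set of w replicas intersects every set of r replicas."""
    nodes = range(n)
    return all(
        bool(set(write_set) & set(read_set))
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )
```

For N = 3, the settings W = 2, R = 2 always overlap, while W = 1, R = 2 can miss the only replica that holds the latest write.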

7) Versioning and merge policies (example)

```yaml
entity: "profile"
versioning:
  clock: "vector"      # or "hybrid_time"
fields:
  name:   { merge: "lww" }
  emails: { merge: "set_union" }   # OR-Set
  tags:   { merge: "or_set" }
  likes:  { merge: "pn_counter" }
conflict_ui:
  enabled: true
  show_diff_for: ["name"]
  auto_merge_for: ["emails", "tags", "likes"]
```
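A policy like this could drive a field-by-field merge. The sketch below covers only two of the rules ('lww' and 'set_union') and assumes a particular encoding: an LWW field is a `(timestamp, actor, value)` tuple, and a union field is a `frozenset` without removal support. The names `MERGERS`, `POLICY`, and `merge_profile` are hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

LwwField = Tuple[int, str, str]  # (timestamp, actor, value)


def merge_lww(a: LwwField, b: LwwField) -> LwwField:
    # Highest timestamp wins; the actor id breaks ties deterministically.
    return max(a, b, key=lambda item: (item[0], item[1]))


def merge_set_union(a: frozenset, b: frozenset) -> frozenset:
    # Grow-only union: elements can be added but never removed.
    return a | b


MERGERS: Dict[str, Callable[[Any, Any], Any]] = {
    "lww": merge_lww,
    "set_union": merge_set_union,
}
POLICY = {"name": "lww", "emails": "set_union"}


def merge_profile(a: Dict[str, Any], b: Dict[str, Any],
                  policy: Dict[str, str] = POLICY) -> Dict[str, Any]:
    """Merge two replicas of a profile using the per-field rule from the policy."""
    return {name: MERGERS[rule](a[name], b[name]) for name, rule in policy.items()}


left = {"name": (10, "n1", "Ann"), "emails": frozenset({"ann@example.com"})}
right = {"name": (12, "n2", "Anna"), "emails": frozenset({"anna@example.com"})}
merged = merge_profile(left, right)
```

Because each rule is deterministic and symmetric, both replicas arrive at the same profile whichever side initiates the merge.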

8) EC observability: what to measure

Staleness Age (p50/p95/p99): 'now − data_version_ts' or the number of versions a replica is behind.

Replication Lag: delivery delay between regions/sites.
Conflict Rate: share of concurrent updates, broken down by type.
Read-Repair Rate/Latency: how often and how quickly stale data is healed on reads.
Convergence Time: time to converge after a burst of writes or a node failure.

Semantic SLOs: "95% of profiles are not older than 2s," "99% of the feed converges <10s."
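A staleness SLO such as "95% of profiles are not older than 2s" reduces to a percentile check over sampled ages. The helper names and the nearest-rank percentile method are assumptions for illustration; a metrics backend would normally compute this from histograms.

```python
import math
from typing import List, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def staleness_ages(now: float, version_timestamps: Sequence[float]) -> List[float]:
    """Age of each observed record: now - data_version_ts, floored at zero."""
    return [max(0.0, now - ts) for ts in version_timestamps]


def meets_slo(ages: Sequence[float], p: float, threshold_s: float) -> bool:
    """True if the p-th percentile of staleness is within the threshold."""
    return percentile(ages, p) <= threshold_s
```

For example, `meets_slo(ages, 95, 2.0)` expresses the first SLO above, given ages sampled from profile reads.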

9) Runbooks and incidents

Scenarios:

1. Cross-region lag growth: reduce write fan-out, enable aggressive read repair, throttle heavy writers.

2. Surge in conflicts: temporarily switch to a stricter mode (for example, causal/RYW), limit concurrent updates on hot keys.

3. Projection lag: prioritize replication queues, temporarily reduce the frequency of non-critical updates.

4. Data "stuck" on some nodes: force anti-entropy, rebalance partitions, audit hinted handoff.

5. Manual resolution: export the conflicting keys, review them in a "merge-preview" tool, apply the fix in production.

10) EC testing

Jepsen-like tests: network partitions, clock skew, rewrites.
Property-based: invariants of merge functions (commutativity, idempotency, associativity).
Fuzz conflicts: concurrent updates to one key with a varying delivery order.
Sawtooth load: alternating bursts and lulls to assess convergence time.
UX simulations: RYW/monotonic visibility in typical scenarios.
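The property-based item can be approximated with the standard library alone. This sketch checks the three merge laws on randomly generated G-Counter states; the helper names are assumptions, and a dedicated property-testing library would give better shrinking and coverage.

```python
import random
from typing import Callable, Dict

State = Dict[str, int]  # G-Counter state: node id -> count


def merge_max(a: State, b: State) -> State:
    """Correct G-Counter merge: per-node maximum."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def merge_sum(a: State, b: State) -> State:
    """Deliberately wrong merge: adding states double-counts on redelivery."""
    return {k: a.get(k, 0) + b.get(k, 0) for k in a.keys() | b.keys()}


def random_state(rng: random.Random) -> State:
    nodes = rng.sample(["n1", "n2", "n3"], rng.randint(1, 3))
    return {node: rng.randint(0, 5) for node in nodes}


def check_merge_laws(merge: Callable[[State, State], State],
                     trials: int = 200, seed: int = 0) -> str:
    """Return 'ok' or the name of the first law violated on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (random_state(rng) for _ in range(3))
        if merge(a, b) != merge(b, a):
            return "not commutative"
        if merge(merge(a, b), c) != merge(a, merge(b, c)):
            return "not associative"
        if merge(a, a) != a:
            return "not idempotent"
    return "ok"
```

Running the same check against `merge_sum` shows why summing whole states is unsafe under at-least-once delivery: it fails idempotency even though it is commutative and associative.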

11) Multi-tenant and plans

Tags: 'tenant_id/plan/region' in events/records.
Fairness: per-tenant replication/repair limits, so that a "noisy" client does not increase overall staleness.
Residency: data and its replicas stay within the jurisdiction; only aggregates are visible across regions.

12) Common mistakes

LWW "for everything." It silently drops concurrent changes; use CRDT/domain merge.
No client guarantees. Users do not see their own writes → loss of trust.
No staleness observability. Without staleness/lag metrics, degradation stays hidden.
Dual-write to different systems without merge. Phantoms and divergence never end.
Global order at all costs. Extra quorums kill p95 latency, while local order is usually enough for the business.

13) Quick recipes

Feed: EC + causal/RYW for the author, CRDT for reactions, staleness p95 ≤ 2-5 s.
Profiles/settings: bounded staleness (≤ 1-2 s), RYW, domain merge (set union).
Global catalog: geo-partitioning, asynchronous replication, read repair on demand, conflicts via OR-Set.
Metrics/counters: PN-Counter, consolidation in the background; display "approximate" values with a label.

14) Mini reference architecture (in words)

Write edge: local write with a version ('vector/hybrid'), event log.
Replication: queues + gossip/anti-entropy, hinted handoff.
Storage: partitioning by key, CRDT/merge functions at the write level.
Read plane: caches with read repair, RYW/monotonic tokens, bounded staleness for critical screens.
Observability: lag/staleness/conflicts, alerts on staleness SLO breaches.

15) Pre-production checklist

  • Invariants, and the places where EC is allowed, are clearly described.
  • Vector/hybrid clocks and deterministic merge/CRDT functions are selected.
  • Client guarantees (RYW/monotonic/causal) are implemented for critical UX flows.
  • Replication, read repair, and hinted handoff are configured; R/W quorums are documented.
  • Staleness/lag/convergence metrics and p95/p99 threshold alerts are in place.
  • Runbooks cover growth in conflicts/lag; safe manual-merge tools exist.
  • Tests cover network partitions, concurrent updates, and convergence properties.
  • Multi-tenant limits and residency policies are accounted for.
  • UX freshness indicators and fallback behavior are agreed with the product team.

Conclusion

Eventual consistency is not a compromise for its own sake but a tool for scalability and availability. If you formalize invariants, choose the right merge functions (preferably CRDTs where appropriate), provide client guarantees, and measure staleness and convergence time, the system will be fast, stable, and honest, both for users and for the business.
