Replication and eventual consistency
1) Why eventual consistency
When the system is distributed across zones/regions, writing synchronously everywhere means high latency and low availability during network failures. Eventual consistency (EC) allows temporary divergence between replicas in exchange for:
- low write latency (the write is accepted locally),
- better availability during network partitions,
- horizontal scaling.
The key task is controlled relaxation of consistency: the user sees "fairly fresh" data, domain invariants are preserved, and conflicts are detected and resolved predictably.
2) Consistency models - what we promise the client
Strong: a read immediately sees the latest write.
Bounded staleness / read-not-older-than (RNOT): reads are no older than a given mark (LSN/version/time).
Causal: "cause and effect" ordering (A before B) is preserved.
Read-Your-Writes: a client sees its own recent writes.
Monotonic Reads: successive reads never "roll back" to older data.
Session: a bundle of such guarantees within one session.
Eventual: if no new writes arrive, all replicas converge.
Practice: combine Session + RNOT on critical paths and Eventual on read models/caches.
3) Replication: mechanics and anti-entropy
Synchronous (quorum/Raft): a write is considered successful after confirmation by N nodes; minimal RPO, but higher p99 latency.
Asynchronous: the leader commits locally and ships the log later; low latency, RPO > 0.
Physical (WAL/binlog): fast, but requires a homogeneous setup (same engine/version).
Logical/CDC: a stream of row/event-level changes; flexible routing and filters.
Anti-entropy: periodic reconciliation and repair (Merkle trees, hash comparison, background re-sync).
4) Version identifiers and causal ordering
Monotonic versions: increment/LSN/epoch; simple, but do not capture concurrency.
Lamport timestamps: a logical clock that orders events consistently with causality.
Vector clocks: capture concurrent branches and let you detect conflicting (concurrent) updates.
Hybrid clocks/TrueTime/Clock-SI: "not before T" logic for a global order.
Recommendation: for CRDTs/conflicting updates use vector clocks (a sketch follows below); for "not older than" guarantees use LSN/GTID.
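A minimal vector-clock sketch in Python (illustrative; the structures and names are my own, not from a specific library) showing how two writes from different regions are detected as concurrent rather than ordered:

# Minimal vector clock: one counter slot per node/region id.
def vc_increment(vc: dict, node: str) -> dict:
    out = dict(vc)
    out[node] = out.get(node, 0) + 1
    return out

def vc_merge(a: dict, b: dict) -> dict:
    # element-wise maximum: the merged clock dominates both inputs
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}

def vc_compare(a: dict, b: dict) -> str:
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"       # a happened-before b: b can simply overwrite a
    if b_le_a:
        return "after"        # b happened-before a
    return "concurrent"       # neither dominates: a real conflict, needs a merge strategy

# Two regions update the same object independently -> the conflict is detected
v_eu = vc_increment({}, "eu")
v_us = vc_increment({}, "us")
assert vc_compare(v_eu, v_us) == "concurrent"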
5) Conflicts: detection and resolution
Typical situation: writes from two regions to the same object.
Strategies:
1. Last-Write-Wins (LWW) by timestamp/logical stamp - simple, but can silently lose updates.
2. Merge functions driven by domain logic:
- counter fields are added up (G-Counter/PN-Counter),
- sets are merged with "add-wins/remove-wins" semantics,
- amounts/balances go only through transaction journals, never through a simple LWW.
3. CRDTs (convergent types): G-Counter, OR-Set, LWW-Register, RGA for lists (an add-wins set sketch follows this list).
4. Operational transformation (rare for databases, more common in collaborative editors).
5. Manual resolution: the conflict lands in an "inbox" and the user picks the correct version.
Rule: domain invariants dictate the strategy. For money/balances, avoid LWW; use compensating transactions/events.
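To make the "add-wins" semantics concrete, here is a small OR-Set sketch (illustrative, not a production implementation): every add gets a unique tag, and a remove cancels only the tags it has observed, so a concurrent add survives the merge.

import uuid

class ORSet:
    """Add-wins observed-remove set: a remove only cancels the adds it has seen."""
    def __init__(self):
        self.adds = {}      # element -> set of unique add tags
        self.removes = {}   # element -> set of tags known to be removed

    def add(self, elem):
        self.adds.setdefault(elem, set()).add(uuid.uuid4().hex)

    def remove(self, elem):
        # cancels only locally observed adds; a concurrent add elsewhere survives
        self.removes.setdefault(elem, set()).update(self.adds.get(elem, set()))

    def contains(self, elem) -> bool:
        return bool(self.adds.get(elem, set()) - self.removes.get(elem, set()))

    def merge(self, other: "ORSet") -> None:
        for elem, tags in other.adds.items():
            self.adds.setdefault(elem, set()).update(tags)
        for elem, tags in other.removes.items():
            self.removes.setdefault(elem, set()).update(tags)

a, b = ORSet(), ORSet()
a.add("sku-1"); b.add("sku-1")
a.remove("sku-1")            # a observed only its own add
a.merge(b)
assert a.contains("sku-1")   # b's concurrent add wins the merge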
6) Write guarantees and idempotency
Idempotency keys on commands (payment, withdrawal, creation) → retries are safe.
Inbox and outbox deduplicate by idempotency key/sequence number.
Exactly-once is unattainable without strong assumptions; in practice, use at-least-once plus idempotency.
Outbox/Inbox pattern: writing to the database and publishing the event are made atomic via a local transaction; the recipient processes messages by idempotency key (see the sketch below).
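A minimal transactional-outbox sketch, assuming sqlite3 stands in for the real database and publish() is a hypothetical broker call: the business row and the outgoing event are committed in one local transaction, and a separate relay publishes later.

import json, sqlite3, uuid

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, amount INTEGER)")
db.execute("CREATE TABLE outbox (event_id TEXT PRIMARY KEY, payload TEXT, published INTEGER DEFAULT 0)")

def create_order(order_id: str, amount: int) -> None:
    event_id = uuid.uuid4().hex   # doubles as the idempotency key for consumers
    payload = json.dumps({"type": "OrderCreated", "order_id": order_id,
                          "amount": amount, "event_id": event_id})
    with db:  # one local transaction: either both rows are committed, or neither
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
        db.execute("INSERT INTO outbox (event_id, payload) VALUES (?, ?)", (event_id, payload))

def relay_once(publish) -> None:
    # publish() is the hypothetical broker call; consumers deduplicate by event_id
    rows = db.execute("SELECT event_id, payload FROM outbox WHERE published = 0").fetchall()
    for event_id, payload in rows:
        publish(json.loads(payload))
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE event_id = ?", (event_id,))

create_order("o-1", 500)
relay_once(print)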
7) Reads no older than X (RNOT)
Techniques:
- LSN/GTID gate: the client passes the minimum version (taken from the write response); the router/proxy sends the read to a replica that has caught up to LSN ≥ X, otherwise to the leader (a routing sketch follows this list).
- Time bound: "not older than 2 seconds" - a simple SLA without versions.
- Session pinning: for N seconds after a write, read only from the leader (Read-Your-Writes).
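A toy routing function for the LSN gate (illustrative; in a real system the replayed LSNs would come from monitoring or heartbeats, and LSNs may be opaque strings rather than integers):

def choose_endpoint(min_lsn: int, leader: str, replica_lsns: dict) -> str:
    """Return a replica that has replayed at least min_lsn, otherwise the leader."""
    for name, replayed in replica_lsns.items():
        if replayed >= min_lsn:
            return name
    return leader   # nobody has caught up yet: serve the read from the leader

# the client got lsn=1042 back from its write and passes it as x-min-lsn
print(choose_endpoint(1042, "leader", {"replica-a": 998, "replica-b": 1050}))   # replica-b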
8) Change streams and cache consistency
CDC → event bus (Kafka/Pulsar) → consumers (caches, indexes, read models).
Cache invalidation: topics 'invalidate:{ns}:{id}'; idempotent processing (see the consumer sketch below).
Rebuild/Backfill: if projections drift out of sync, rebuild them from the event log.
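A sketch of an idempotent invalidation consumer, assuming each message carries an event_id that can be used for deduplication under at-least-once delivery:

processed: set = set()     # processed event ids (persistent or TTL-bounded in reality)
cache: dict = {}

def on_invalidate(event: dict) -> None:
    if event["event_id"] in processed:                 # duplicate delivery: ignore safely
        return
    cache.pop(f"{event['ns']}:{event['id']}", None)    # deleting an absent key is a no-op
    processed.add(event["event_id"])

on_invalidate({"event_id": "e-1", "ns": "profile", "id": "42"})
on_invalidate({"event_id": "e-1", "ns": "profile", "id": "42"})   # redelivered, no effect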
9) Sagas and compensations (inter-service transactions)
In the EC world, long-lived operations are split into steps with compensating actions:
- Orchestration: a coordinator calls the steps and their compensations (see the sketch below).
- Choreography: steps react to events and publish the next events themselves.
Invariants (example): "balance ≥ 0" - checked at step boundaries, with compensation on violation.
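A minimal orchestration sketch: each step is paired with a compensation, and on failure the compensations of already-completed steps run in reverse order (the payment steps below are hypothetical):

def run_saga(steps) -> None:
    """steps: list of (action, compensation) pairs; compensations must be idempotent."""
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):   # undo completed steps in reverse order
            compensate()
        raise

run_saga([
    (lambda: print("reserve funds"), lambda: print("release funds")),
    (lambda: print("reserve stock"), lambda: print("release stock")),
    (lambda: print("confirm order"), lambda: print("cancel order")),
])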
10) Multi-region and network partitions
Local-write, async-replicate: write in the local region and deliver to the other regions asynchronously (EC).
Geo-fencing: data is pinned to its home region (low latency, fewer conflicts).
Quorum databases (Raft) for CP data; caches/read models are AP/EC.
Split-brain plan: if connectivity is lost, regions keep operating within domain limits (write fencing, quotas) and reconcile afterwards.
11) Observability and SLO
Metrics:
- Replica lag: time/LSN distance/offset (p50/p95/p99).
- Staleness: the percentage of responses staler than the threshold (for example, > 2 s or behind the target LSN).
- Conflict rate: rate of conflicts and of successful merges.
- Convergence time: how long replicas take to converge after a spike.
- Reconcile backlog: volume/age of batches awaiting reconciliation.
- RPO/RTO by data category (CP/AP).
Alerts: lag above target, growth in conflicts, prolonged staleness windows.
12) Designing data schemas for EC
Explicit version/vector in every record (columns 'version', 'vc').
Append-only logs for critical invariants (balances, accruals) - see the ledger sketch after this list.
Event identifiers (Snowflake/ULID) for ordering and deduplication.
Commutative fields (counters, sets) → CRDT candidates.
API design: PUT with If-Match/ETag, PATCH with preconditions.
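To illustrate the append-only-log point for balances, a toy in-memory ledger sketch (illustrative only): the balance is derived from signed entries carrying idempotency keys instead of being overwritten in place.

ledger: list = []        # append-only journal of signed entries
seen_keys: set = set()   # idempotency keys already applied

def append_entry(account: str, delta: int, idem_key: str) -> None:
    if idem_key in seen_keys:      # retried command: no second entry, no double charge
        return
    ledger.append({"account": account, "delta": delta, "idem_key": idem_key})
    seen_keys.add(idem_key)

def balance(account: str) -> int:
    return sum(e["delta"] for e in ledger if e["account"] == account)

append_entry("acc-1", +100, "k1")
append_entry("acc-1", -30, "k2")
append_entry("acc-1", -30, "k2")   # duplicate delivery, deduplicated
print(balance("acc-1"))            # 70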
13) Storage and read patterns
Read model/CQRS: writes go to the source of truth, reads go to projections (which may lag → show an "updated ..." hint).
Stale-OK routes (catalog/feed) vs strict routes (wallet/limits).
Sticky/bounded-stale flags in the request (header 'x-read-consistency') - a routing sketch follows.
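A sketch of routing by the 'x-read-consistency' flag (the header name comes from the text; the mode names and the lag source are assumptions):

def pick_target(headers: dict, leader: str, replica_lag_s: dict, max_lag_s: float = 2.0) -> str:
    mode = headers.get("x-read-consistency", "stale-ok")
    if mode == "strict":
        return leader                                   # wallet/limits: always the leader
    if mode == "bounded-stale":
        fresh = [n for n, lag in replica_lag_s.items() if lag <= max_lag_s]
        return fresh[0] if fresh else leader            # fall back if everyone lags
    return next(iter(replica_lag_s), leader)            # stale-ok: any replica will do

print(pick_target({"x-read-consistency": "bounded-stale"}, "leader",
                  {"replica-a": 5.0, "replica-b": 0.4}))   # replica-b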
14) Implementation checklist (0-45 days)
0-10 days
Categorize data: CP-critical (money, orders) vs EC/stale-OK (catalogs, search indexes).
Define staleness SLOs (e.g. "not older than 2 s") and target lags.
Enable object versioning and idempotency keys in the API.
11-25 days
Implement CDC and outbox/inbox, plus cache invalidation flows.
Add RNOT (LSN gate) and session pinning to write-critical paths.
Implement at least one merge strategy (LWW/CRDT/domain) and conflict log.
26-45 days
Automate anti-entropy (reconciliation/repair) and staleness reporting.
Run a game day: network partition, conflict surge, recovery.
Visualize on dashboards: lag, staleness, conflict rate, convergence.
15) Anti-patterns
Blind LWW for critical invariants (loss of money/points).
Missing idempotency → duplicate operations on retries.
Strong consistency everywhere → excessive p99 tails and fragility under failures.
No RNOT/Session guarantees → the UX "flickers," users do not see their own changes.
Hidden divergence between cache and source of truth (no CDC/invalidation).
No reconcile/anti-entropy tooling - data diverges "for ages."
16) Maturity metrics
Replica lag p95 ≤ target (for example, ≤ 500 ms within a region, ≤ 2 s between regions).
Staleness SLO is met for ≥ 99% of requests on "strict" routes.
Conflict resolution success ≥ 99.9%, average resolution time ≤ 1 min.
Convergence time after peaks - minutes, not hours.
100% of "money" transactions are covered by idempotency keys and outbox/inbox.
17) Recipes (snippets)
If-Match/ETag (HTTP)
PUT /profile/42
If-Match: "v17"
Body: { "email": "new@example.com" }
If the version has changed, the server returns '412 Precondition Failed' → the client resolves the conflict.
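A framework-agnostic sketch of the server side of this check (illustrative; the "vN" ETag format is an assumption):

def handle_put_profile(profile: dict, headers: dict, new_email: str):
    expected = headers.get("If-Match", "").strip('"')
    current = f"v{profile['version']}"
    if expected != current:
        return 412, {"error": "precondition failed", "current_etag": current}
    profile["email"] = new_email
    profile["version"] += 1
    return 200, {"etag": f"v{profile['version']}"}

profile = {"email": "old@example.com", "version": 17}
print(handle_put_profile(profile, {"If-Match": '"v17"'}, "new@example.com"))  # 200, etag v18
print(handle_put_profile(profile, {"If-Match": '"v17"'}, "x@example.com"))    # 412: stale version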
Query "not older than LSN" (pseudo)
x-min-lsn: 16/B373F8D8
The router chooses a replica with 'replay_lsn ≥ x-min-lsn', otherwise it routes to the leader.
CRDT G-Counter (idea)
Each region keeps its own component of the counter; the total value is the sum of all components; the merge operation is commutative, so replication order does not matter.
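The same idea in a few lines of Python (illustrative):

def g_increment(counter: dict, region: str, n: int = 1) -> dict:
    out = dict(counter)
    out[region] = out.get(region, 0) + n    # a region only ever touches its own slot
    return out

def g_merge(a: dict, b: dict) -> dict:
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}   # commutative merge

def g_value(counter: dict) -> int:
    return sum(counter.values())

eu = g_increment({}, "eu", 3)
us = g_increment({}, "us", 5)
assert g_value(g_merge(eu, us)) == g_value(g_merge(us, eu)) == 8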
18) Conclusion
Eventual consistency is not a compromise on quality but a conscious contract: in some places we trade freshness for speed and availability, while protecting critical invariants with domain strategies and tooling. Introduce versions, idempotency, RNOT/Session guarantees, CDC and anti-entropy, measure lag/staleness/conflicts - and your distributed system will be fast, stable and predictably convergent even under failures and peak loads.