GH GambleHub

Data synchronization via API

1) Why synchronization is needed and what the goals are

Domain consistency: profile, wallet, directories, limits, KYC.
Low latency: near real-time for critical processes (payments, bonuses).
Resilience: surviving network/provider outages without losing events.
Economics: minimize egress/CPU through deltas and batching.

Success metrics: lag (s) between source and consumer, freshness, share of duplicates, share of conflicts, synchronization cost per GB/hour.

2) Synchronization models

2.1 Pull (polling)

The client requests changes at intervals.

Pros: simplicity, load control.
Cons: lag, "empty" polls, risk of skipping at a high rate of change.
Improvements: If-Modified-Since, Etag/If-None-Match, change_token.

2.2 Push (webhooks/events)

The source pushes events to the recipient.

Pros: near real-time, no polling overhead.
Cons: requires delivery with retries, deduplication, security (signature, mTLS).
Requirements: idempotent consumers, exponential backoff, replay.

2.3 CDC/Streaming (Change Data Capture)

A stream of changes from the transaction log/event log (Kafka, Debezium).

Pros: completeness, ordering, scale.
Cons: complexity; you need to handle all operation types (insert/update/delete/tombstone).

2.4 Hybrid

Webhooks as a "trigger," polling as a fallback and for reconciliation.

3) Incremental deltas

3.1 Watermark (timestamp)

The client stores `last_seen_ts` and requests `updated_at > watermark`.

Risks: clock drift - use UTC and NTP; take an overlap window of 1-2 min and dedup by ID + version.
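
A sketch of watermark polling with the overlap window and (id, version) dedup described above; field names are illustrative:

```python
from datetime import datetime, timedelta, timezone

OVERLAP = timedelta(minutes=2)  # re-read a safety window to cover clock drift

def compute_since(watermark: datetime) -> datetime:
    """The 'updated_at >' bound for the next poll: watermark minus the overlap."""
    return watermark - OVERLAP

def apply_batch(applied: dict, items: list[dict]) -> list[dict]:
    """Dedup by (id, version): skip anything at or below the already-applied version."""
    fresh = []
    for it in items:
        if applied.get(it["id"], -1) < it["version"]:
            applied[it["id"]] = it["version"]
            fresh.append(it)
    return fresh
```

The overlap window re-delivers some rows on every poll; the version check makes that harmless.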

3.2 Change Token / Cursor

Stable sequence token: `?cursor=eyJvZmZzZXQiOjEwMDB9`.

Pros: resilient to reordering, scales well.
Requirements: cursors that do not silently expire, a defined TTL, and safe replay.

3.3 Numeric offsets (auto-increment)

`id > last_id`. Simple, but breaks down with sharding and "holes" in the sequence.

4) Pagination of large result sets

Keyset/cursor (preferred): `?after=cursor&limit=1000` - stable under concurrent changes.
Offset/limit - simple, but expensive and subject to shifts.
Always specify a stable sort key (for example, `(updated_at, id)`).

Example of a cursor response:

```json
{
  "items": [ { "id": "u_1", "updated_at": "2025-11-03T16:59:10Z" } ],
  "next_cursor": "eyJ1cGRhdGVkX2F0IjoiMjAyNS0xMS0wM1QxNjo1OToxMFoifQ==",
  "has_more": true
}
```

5) Change semantics: upsert, merge, delete

5.1 Upsert/merge

`PUT /resource/{id}` - complete replacement.
`PATCH /resource/{id}` - partial update (merge patch with validation).
Idempotency via `Idempotency-Key` for all writes.

5.2 Deletions

Soft delete (fields `deleted = true`, `deleted_at`) - preserves history; the sync emits a tombstone.
Hard delete - emit a `deleted` event before the record disappears.

Example of a tombstone:

```json
{ "id": "u_1", "event": "deleted", "deleted_at": "2025-11-03T17:00:00Z" }
```

6) Versioning and concurrency

6.1 ETag/If-Match (optimistic locking)

A read returns `ETag: "v123"`.

An update with `If-Match: "v123"` protects against "lost updates."

On conflict - 409 Conflict with `error_code: "CONFLICT_VERSION"`.
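
A client-side retry loop around If-Match can be sketched as below; the `Store` class is a toy stand-in for the server, with an integer version playing the role of the ETag:

```python
class VersionConflict(Exception):
    """Stand-in for an HTTP 409 with error_code CONFLICT_VERSION."""

class Store:
    """Toy server: one resource whose version acts as its ETag."""
    def __init__(self):
        self.doc, self.version = {"balance": 0}, 1
    def read(self):
        return dict(self.doc), self.version
    def update(self, doc, if_match):
        if if_match != self.version:          # stale ETag -> 409
            raise VersionConflict
        self.doc, self.version = doc, self.version + 1
        return self.version

def update_with_retry(store: Store, mutate, attempts: int = 3):
    """Read, mutate, write with If-Match; on conflict re-read and retry."""
    for _ in range(attempts):
        doc, etag = store.read()
        try:
            return store.update(mutate(doc), if_match=etag)
        except VersionConflict:
            continue
    raise VersionConflict
```

Re-reading before each retry is what turns a lost update into a clean re-application of the mutation.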

6.2 Record versioning

Fields `version`/`updated_at` - used in delta computation and deduplication.

6.3 Conflicts

Policies: last-write-wins, server-wins, merge-strategy by fields (for example, sums → additive, flags → source priority).

7) Ordering and deduplication

7.1 Delivery guarantees

Guarantees: at-least-once plus idempotency → the de facto standard.
For critical money flows - exactly-once effects through an idempotency store.

7.2 Idempotency keys

Compose from domain fields: `source_id` + `event_type` + `sequence`.
Store with a TTL of 24-72 hours (or longer per SLA).

7.3 Deduplication

Store the last version/seq applied at the receiver; drop anything older.

8) Retries, timeouts, backoff

Retriable: 5xx/429/408/timeouts; non-retriable: 400/401/403/404/409/422/410/412.
Exponential backoff + jitter: 1s, 2s, 4s... up to 30-60s.
Respect `Retry-After` for 429/503.
Client timeouts: connect 3-5s, overall request 10-30s; total attempt limit 3-6.

9) Lags and SLA control

9.1 SLI/SLO

Lag SLI: median/p95 lag between `occurred_at` and "applied at the consumer."

SLO: for example, "p95 lag ≤ 60s (28d)", "share of lost events = 0", "share of duplicates ≤ 0.01%".
Error budget: spend it on releases/experiments.

9.2 Metrics

`sync_lag_seconds`, `events_received_total`, `events_applied_total`, `duplicates_total`, `conflicts_total`, `retries_total`, `backlog_size`, `cursor_advance_rate`.

10) Reconciliation and backfill

Daily/hourly reconciliations: totals/window hashes.
Reconciliation API: `GET /reconciliation?from=...&to=...` returns checksums and discrepancies.
Backfill: safe reloading of historical data in batches with a cursor, without DDoSing the source; observe rate limits.
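
Window checksums for reconciliation can be computed order-independently, for example by XOR-ing per-record hashes; a sketch (the canonical-JSON encoding is an assumption both sides would have to agree on):

```python
import hashlib, json

def window_hash(records: list[dict]) -> str:
    """Order-independent checksum of an hour/day window: XOR of per-record hashes."""
    acc = 0
    for r in records:
        # canonical encoding so both sides hash identical bytes
        h = hashlib.sha256(json.dumps(r, sort_keys=True).encode()).digest()
        acc ^= int.from_bytes(h[:16], "big")
    return format(acc, "032x")
```

Both sides compute the hash for the same window; a mismatch pinpoints the window to backfill without comparing every record.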

11) Schemas and examples

11.1 Webhook events (signed)

```json
{
  "event": "user.updated",
  "id": "evt_01HX",
  "occurred_at": "2025-11-03T18:00:05Z",
  "sequence": 123456,
  "data": { "id": "u_1", "email": "a@b.com", "updated_at": "2025-11-03T18:00:02Z" }
}
```
Headers:
  • `X-Signature: sha256=`
  • `X-Event-Id: evt_01HX`
  • `X-Retry: 0..N`

11.2 Incremental fetch (polling)

`GET /v1/users?updated_after=2025-11-03T17:58:00Z&cursor=...&limit=1000`

11.3 Idempotent upsert

```
POST /v1/users
Idempotency-Key: upsert-u_1-20251103T1800Z
{ "id": "u_1", "email": "a@b.com", "version": 124 }
→ 201/200 (stable)
```

12) Security and compliance

Auth: OAuth2 scopes/JWT; for partner channels - mTLS on demand.
Signatures: HMAC signature headers for webhooks, rotating secrets.
PII minimization, masking in logs; GDPR/DSAR export/delete.
RBAC/ABAC: tenant/organization-scoped access, strict quotas.

13) Observability and logs

Labels: `env`, `service`, `tenant`, `source`, `cursor`, `seq`, `event_type`.
Correlation: propagate `trace_id` from the inbound request to logs and traces.
Dashboards: lag, backlog, cursor speed, type errors, 429/5xx, cost (egress/min).

14) FinOps: synchronization cost

Batching (batch size 100-1000) + compression (gzip/br).
Caching and ETag for unchanged pages.
Thin payloads: only changed fields, a link to a full resource on demand.
Concurrency limits and "night windows" for backfill.
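
Batching plus compression can be sketched as follows; gzip via the standard library, with a batch size in the 100-1000 range suggested above:

```python
import gzip, json

def pack_batches(events: list[dict], batch_size: int = 500) -> list[bytes]:
    """Split events into batches and gzip each JSON payload to cut egress."""
    out = []
    for i in range(0, len(events), batch_size):
        payload = json.dumps(events[i:i + batch_size]).encode()
        out.append(gzip.compress(payload))
    return out
```

Repetitive JSON keys compress very well, so batching before compression usually saves far more than compressing events one by one.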

15) Testing and quality

15.1 Contracts and negative cases

Validate JSON Schemas, required fields, stable `error_code` values.
Tests: out-of-order, duplicates, skipped events, version conflicts, 429/5xx.

15.2 Chaos/game days

Injections: network delays, dropping 10-30% of events, reordering.
Criteria: is order/integrity maintained? no losses? lag within SLO?

16) Implementation checklist

  • Selected model (push/pull/hybrid) and source of truth.
  • Incremental deltas: watermark or cursor/token.
  • Pagination: cursor/keyset with a stable sort.
  • Idempotency store, keys and TTL; dedup by `(id, version/seq)`.
  • ETag/If-Match and conflict policy (LWW/server-wins/merge).
  • Retry/backoff/jitter; respect `Retry-After`.
  • Metrics: lag/backlog/duplicates/conflicts; dashboards and alerts.
  • Reconciliation API + daily reconciliations.
  • Security: OAuth2/JWT, webhook signatures, mTLS, PII policies.
  • FinOps: batch + compression, concurrency limits, egress quotas.
  • Test suite: reorder, duplicates, outages, backfill.

17) Implementation plan (3 iterations)

1. MVP (1-2 weeks):

Cursor pagination, watermark deltas, idempotent upsert, basic lag/backlog metrics, retry + backoff.

2. Scale (2-3 weeks):

Webhooks as trigger + polling fallback, HMAC signatures, reconciliation, ETag/If-Match, dashboards and error-budget burn alerts on lag.

3. Pro (3-4 weeks):

CDC/streaming (Kafka/Debezium) for hot domains, auto-backfill, DR scenarios, FinOps optimization (batching/brotli), SLA for lag and reporting.

18) Mini-FAQ

What to choose: watermark or cursor?
Cursor/keyset is more resistant to reordering and scales better; watermark is easier to start with, but add an overlap window and dedup.

Do you need exactly-once?
Usually expensive. In practice: at-least-once + idempotency; exactly-once for monetary effects only.

How to minimize conflicts?
Use ETag/If-Match, design merge by fields, avoid "hidden" side effects.

Summary

Reliable synchronization is incremental deltas + correct pagination + idempotency and version control, reinforced by observability, retries and economical transport. Choose the right model (push/pull/CDC), pin an SLO on lag, implement conflict policies and chaos-scenario tests - and your data exchange becomes predictable, resilient and cost-effective.
