Synchronization of analytical data

1) Why the ecosystem needs analytics synchronization

The network brings together operators, studios/RGS, affiliates, PSP/APM, KYC/AML providers and media. To see a single picture (funnels CR→FTD→ARPU/LTV, RG/compliance, transport SLO, finance/RevShare), the ecosystem needs canonical, timely and provable data synchronization between chains and data marts: no "two truths," an explicit change history, and cost control.

2) Ontology and data contracts

Entities: `eventId`, `traceId`, `participantId`, `role` (operator/studio/affiliate/psp/kyc/stream), `jurisdiction`, `brandId`, `campaignId`, `apmRouteId`, `gameId`, `tableId`, `currency`, `schemaVersion`, `formulaVersion`.

Canonical events (minimum):

`click`, `session_start`, `registration`, `kyc_status`, `deposit`, `ftd`, `bet/spin`, `reward_granted`, `withdrawal`, `postback_sent/received`, `rg_guardrail_hit`, `stream_sli`.
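
As an illustration, the entities and canonical events listed above could travel in one validated envelope. The sketch below is a minimal example under stated assumptions, not the ecosystem's actual contract: the class name `Event`, the snake_case field names, the default `schema_version`, and the split of `bet/spin` and `postback_sent/received` into separate event types are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any
import uuid

ROLES = {"operator", "studio", "affiliate", "psp", "kyc", "stream"}
CANONICAL_EVENTS = {
    "click", "session_start", "registration", "kyc_status", "deposit", "ftd",
    "bet", "spin", "reward_granted", "withdrawal", "postback_sent",
    "postback_received", "rg_guardrail_hit", "stream_sli",
}


@dataclass(frozen=True)
class Event:
    """Hypothetical canonical envelope; field names are illustrative."""
    event_type: str
    participant_id: str
    role: str
    jurisdiction: str
    event_time: datetime  # when the event happened at the source
    payload: dict[str, Any] = field(default_factory=dict)
    schema_version: str = "1.0.0"  # assumed default, semver
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def __post_init__(self) -> None:
        # Reject anything outside the agreed vocabulary early.
        if self.event_type not in CANONICAL_EVENTS:
            raise ValueError(f"unknown event type: {self.event_type}")
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role}")
        if self.event_time.tzinfo is None:
            raise ValueError("event_time must be timezone-aware")
```

Rejecting unknown event types and naive timestamps at the edge keeps later aggregation and deduplication steps free of per-consumer special cases.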

Data Contracts:

schemas in the Schema Registry (semver, field compatibility);
owners, aggregation windows, freshness and completeness SLAs;
error policy (nullable/stubs), reference data (currencies, locales, RTP profiles).

Metric Store: formula versions (GGR/NetRev/CR/ARPU/LTV, K-factors), their owners and effective dates; every report states the formula version it was computed with.

3) Temporal semantics and windows

Event time vs processing time: aggregations should be based on event time, not processing time.
Watermarks: track "late" events and define an acceptance policy (for example, T + 24h).
Windows: sliding/calendar, recomputed when late data is loaded.
Delay as a metric: `ingest_lag` and `publish_lag` are published for each data mart.
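
To make the event-time rules above concrete, here is a minimal sketch of fixed-size event-time windows with a watermark and a T + 24h acceptance policy. It assumes the simplest possible watermark (the largest event time seen so far); the class name `EventTimeWindows` and its parameters are hypothetical, and a real stream processor would have its own watermark mechanics.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class EventTimeWindows:
    """Counts events per event-time window; drops events that are too late."""

    def __init__(self, window=timedelta(minutes=1),
                 allowed_lateness=timedelta(hours=24)):
        self.window = window
        self.allowed_lateness = allowed_lateness
        self.counts = defaultdict(int)  # window start -> event count
        self.watermark = None           # assumed: max event time seen so far
        self.dropped = 0

    def _window_start(self, event_time):
        # Align to the window grid using event time, never processing time.
        n = (event_time - EPOCH) // self.window
        return EPOCH + n * self.window

    def add(self, event_time):
        if self.watermark is None or event_time > self.watermark:
            self.watermark = event_time
        if event_time < self.watermark - self.allowed_lateness:
            self.dropped += 1           # outside the acceptance policy
            return False
        # A late but accepted event updates (recomputes) its original window.
        self.counts[self._window_start(event_time)] += 1
        return True
```

An event that arrives one hour late still lands in the window it belongs to, while one that is 25 hours behind the watermark is counted in `dropped` and can feed a late-ratio metric.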

4) Transport and synchronization modes

1. CDC/streaming (real-time):

event bus (EDA), partitioning by `traceId`/`participantId`;

effectively exactly-once processing through consumer idempotency and body hashes;

curated topics: raw events, normalized, aggregates/oracles.

2. Batch/microbatch:

incremental exports with cursor pagination (time-based/log cursors);

formats: Parquet/Avro with schema; batch manifests.

3. API/Webhooks:

`/vN/events` with cursors and `Idempotency-Key`;

signed webhooks (JWS/HMAC), a replay registry, retries with backoff + jitter.

4. Asset sync:

reference data/locales/game catalogs as versioned bundles (hashes, TTL).
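
The webhook requirements in mode 3 (signature, replay registry, backoff + jitter) can be sketched with the standard library alone. This is an illustrative example using HMAC-SHA256; the signed message layout (`timestamp.body`), the 300-second skew limit, and the names `sign`, `WebhookVerifier` and `backoff_delay` are assumptions, not a format defined by this document.

```python
import hashlib
import hmac
import random
import time


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    # Assumed layout: the timestamp is signed together with the body.
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


class WebhookVerifier:
    def __init__(self, secret: bytes, max_skew_s: int = 300):
        self.secret = secret
        self.max_skew_s = max_skew_s
        self.seen = set()  # replay registry; a TTL store in a real system

    def verify(self, timestamp: int, body: bytes, signature: str, now=None) -> bool:
        now = time.time() if now is None else now
        if abs(now - timestamp) > self.max_skew_s:
            return False                      # stale or future-dated delivery
        expected = sign(self.secret, timestamp, body)
        if not hmac.compare_digest(expected, signature):
            return False                      # tampered body or wrong key
        if signature in self.seen:
            return False                      # replayed delivery
        self.seen.add(signature)
        return True


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

The sender would call `backoff_delay(attempt)` between retries, and the receiver would accept each signed delivery at most once.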

5) Idempotence, dedup and late events

Idempotency-Key and body hash on critical paths (payments/postbacks).
Deduplication: window ± 5 minutes/watermark; storage of "seen" hashes.
Late events: upsert/retroactive recalculation policy; changelog for data marts.
Exactly-once in the business sense: we do not rely on "broker magic"; we require consumer idempotency and deterministic schemas.
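
A minimal sketch of consumer-side deduplication along these lines: the key is the `Idempotency-Key` plus a hash of the canonicalized body, and a repeat within ± 5 minutes counts as a duplicate. The names `body_hash` and `Deduplicator` are hypothetical, and the in-memory dictionary stands in for whatever store holds the "seen" hashes.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone


def body_hash(payload: dict) -> str:
    # Canonical JSON so that key order does not change the hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


class Deduplicator:
    def __init__(self, window=timedelta(minutes=5)):
        self.window = window
        self.seen = {}  # (idempotency key, body hash) -> event time

    def is_duplicate(self, idempotency_key, payload, event_time) -> bool:
        key = (idempotency_key, body_hash(payload))
        first = self.seen.get(key)
        if first is not None and abs(event_time - first) <= self.window:
            return True
        self.seen[key] = event_time
        return False

    def evict(self, watermark) -> None:
        # Drop entries that can no longer match anything inside the window.
        cutoff = watermark - self.window
        self.seen = {k: t for k, t in self.seen.items() if t >= cutoff}
```

Because the decision depends only on the key, the body and the event time, replaying the same input always gives the same result, which is the determinism the last point asks for.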

6) Reconciliation of attributions and formulas

Attribution: last eligible touch rule with windows per channel/jurisdiction; cross-device only through tokens (no raw personal data).
Metric formulas: each record references a `formulaVersion`; MAJOR changes are published as `data_formula_change` events.
Rule-based backfill: when a formula changes, dual publication (old/new) is allowed during the transition period (frozen-period).
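
One way to read the dual-publication rule is sketched below: after a MAJOR formula change, reports carry both the previous and the new version until the transition period ends, and every row names its `formulaVersion`. The `Formula` class, the function names, the 30-day period and the two CR formulas in the usage note are purely hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal
from typing import Callable


@dataclass(frozen=True)
class Formula:
    version: str                      # semver, e.g. "2.0.0"
    effective_from: date
    compute: Callable[[dict], Decimal]


def major(version: str) -> int:
    return int(version.split(".")[0])


def versions_to_publish(formulas, report_date, frozen_period):
    """Return the formulas a report dated `report_date` should be computed with."""
    ordered = sorted(formulas, key=lambda f: f.effective_from)
    active = [f for f in ordered if f.effective_from <= report_date]
    if not active:
        return []
    current = active[-1]
    result = [current]
    if len(active) > 1:
        previous = active[-2]
        in_transition = report_date < current.effective_from + frozen_period
        # Dual publication only for MAJOR changes, and only during the transition.
        if major(current.version) > major(previous.version) and in_transition:
            result.insert(0, previous)
    return result


def publish(metric, formulas, inputs, report_date, frozen_period):
    return [
        {"metric": metric, "formulaVersion": f.version, "value": f.compute(inputs)}
        for f in versions_to_publish(formulas, report_date, frozen_period)
    ]
```

For example, with a hypothetical CR formula `1.2.0` (FTD per registration) replaced on 1 June by `2.0.0` (FTD per click), a report dated 10 June would contain two rows, one per version, and a report dated 1 August only the `2.0.0` row.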

7) Data Quality: SLI/SLO and Conformance Tests

Data quality SLI:

Freshness (publish_lag p95),
Completeness (proportion of events vs reference),
Uniqueness (proportion of duplicates),
Consistency (currency/locale/ID),
Accuracy (checksums/oracles),
Time linearity (late events within the accepted window).

SLO (reference targets):

publish_lag p95 ≤ 1-5 s (operational dashboards), ≤ 15 min (financial aggregates);
completeness ≥ 99.5% at T + 15 min, ≥ 99.9% at T + 24h;
duplicates ≤ 0.1‰; oracle discrepancy ≤ 0.1–0.3%.

Conformance tests: schemas, mandatory fields, reference data, webhook signatures, gap-free cursor exports.
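
As a rough illustration of how three of the SLIs above could be computed and checked against targets, the sketch below derives publish_lag p95 (nearest-rank), completeness and the duplicate ratio from one batch. The function names and default thresholds are assumptions that mirror the reference targets, and the inputs are assumed to be non-empty.

```python
def p95(values):
    """Nearest-rank 95th percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def dq_report(publish_lags_s, received_ids, reference_count,
              lag_slo_s=5.0, completeness_slo=0.995, duplicate_slo=0.0001):
    unique = set(received_ids)
    sli = {
        "publish_lag_p95_s": p95(publish_lags_s),
        # Share of reference events that actually arrived.
        "completeness": len(unique) / reference_count,
        # Share of received records that repeat an already seen id.
        "duplicate_ratio": (len(received_ids) - len(unique)) / len(received_ids),
    }
    breaches = []
    if sli["publish_lag_p95_s"] > lag_slo_s:
        breaches.append("freshness")
    if sli["completeness"] < completeness_slo:
        breaches.append("completeness")
    if sli["duplicate_ratio"] > duplicate_slo:
        breaches.append("uniqueness")
    return sli, breaches
```

The returned `breaches` list can drive alerts directly, while the `sli` dictionary is what a weekly DQ report would chart.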

8) Lineage, auditing and oracles

Lineage: from data mart/dashboard back to source datasets (schemas/versions/owners).
WORM audit: immutable logs of schemas/formulas/keys/exceptions.
Oracles (signed summaries): GGR/NetRev/SLO/RG with `formulaVersion`, `hash(inputs)`, `kid`, `traceId`; the source of truth for invoices and appeals.
Forensic "trace packages": SLA 60-90 s for P1/P2 incidents.
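
A signed summary of this kind might look like the sketch below. It uses HMAC-SHA256 with a shared key ring only because that is available in the standard library; a production oracle would more plausibly use asymmetric signatures (for example JWS with keys published via JWKS). The JSON field names other than `formulaVersion`, `kid` and `traceId`, and both function names, are hypothetical.

```python
import hashlib
import hmac
import json


def canonical(obj) -> bytes:
    # Stable serialization so signer and verifier hash identical bytes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def make_oracle(metric, value, inputs, formula_version, trace_id, kid, keys):
    summary = {
        "metric": metric,
        "value": value,                       # decimal as string, not float
        "formulaVersion": formula_version,
        "inputsHash": hashlib.sha256(canonical(inputs)).hexdigest(),
        "kid": kid,                           # which key signed the summary
        "traceId": trace_id,
    }
    signature = hmac.new(keys[kid], canonical(summary), hashlib.sha256).hexdigest()
    return {**summary, "signature": signature}


def verify_oracle(oracle, keys, inputs=None) -> bool:
    key = keys.get(oracle.get("kid"))
    if key is None:
        return False
    summary = {k: v for k, v in oracle.items() if k != "signature"}
    expected = hmac.new(key, canonical(summary), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, oracle.get("signature", "")):
        return False
    # Optionally prove that the summary was computed from these exact inputs.
    if inputs is not None:
        return hashlib.sha256(canonical(inputs)).hexdigest() == oracle["inputsHash"]
    return True
```

During a dispute, either side can recompute the input hash from its own export and check it against the signed summary without exchanging raw data first.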

9) Privacy, localization and security

PII minimization: tokenization of `playerId`, no personal data in logs/data marts, detokenization only in safe zones.
Localization: jurisdiction maps (where each data class is stored/processed).
Zero Trust: mTLS, short-lived tokens, egress allow-list, key rotation/JWKS.

ABAC/ReBAC/SoD: access limited to "your own and what has been agreed"; "measure ≠ influence ≠ change."
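
The tokenization rule can be illustrated with a small vault that hands out random tokens and allows reverse lookup only from designated zones. This is a sketch under assumptions: the `TokenVault` class, the `tok_` prefix and the zone names are hypothetical, and a real vault would persist and encrypt its mapping.

```python
import secrets


class TokenVault:
    def __init__(self, safe_zones):
        self._safe_zones = set(safe_zones)
        self._by_player = {}  # playerId -> token
        self._by_token = {}   # token -> playerId

    def tokenize(self, player_id: str) -> str:
        # The same player always gets the same token, so joins still work.
        token = self._by_player.get(player_id)
        if token is None:
            token = "tok_" + secrets.token_hex(16)
            self._by_player[player_id] = token
            self._by_token[token] = player_id
        return token

    def detokenize(self, token: str, zone: str) -> str:
        # Reverse lookup is refused outside the safe zones.
        if zone not in self._safe_zones:
            raise PermissionError(f"detokenization not allowed in zone {zone!r}")
        return self._by_token[token]
```

Random tokens carry no information about the original identifier, so analytics events and logs can be shared across participants while the mapping stays in one controlled place.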

10) Financial reconciliation and settlement

Canonical Net Revenue (simplified):

\[
\text{NetRev} = \text{GGR} - \text{BonusCost} - \text{Jackpot/PoolShare} - \text{PaymentFees} - \text{Chargebacks} - \text{Tax/Levy} - \text{FraudLosses}
\]
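
The same formula as a short worked example in code, using `Decimal` so that monetary amounts are not subject to float rounding. The function name, the snake_case parameter names and the figures in the usage note are made up for illustration.

```python
from decimal import Decimal

# One entry per deduction term of the simplified formula.
DEDUCTIONS = (
    "bonus_cost", "jackpot_pool_share", "payment_fees",
    "chargebacks", "tax_levy", "fraud_losses",
)


def net_revenue(ggr: Decimal, **deductions: Decimal) -> Decimal:
    unknown = set(deductions) - set(DEDUCTIONS)
    if unknown:
        raise ValueError(f"unknown deduction(s): {sorted(unknown)}")
    # Missing deductions are treated as zero.
    total = sum((deductions.get(name, Decimal("0")) for name in DEDUCTIONS),
                Decimal("0"))
    return ggr - total
```

With a GGR of 1000.00 and deductions of 120.00, 30.00, 25.00, 5.00, 150.00 and 10.00 (340.00 in total), `net_revenue` returns 660.00.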

Reconciliation:

cursor-based exports, oracles (signed aggregates), checksums;
invoice statuses, discrepancy reports, and resolution SLAs;
FX rules, NET7/14/30 terms, holds and clawbacks.

11) Synchronization cost management

Cardinality policies: no `userId`/raw URL in labels; `routeId`/`campaignId` are allowed.
Downsampling/roll-ups: 1s→1m→5m; raw data is retained briefly, aggregates longer.
Adaptive trace sampling: base percentage + priority for errors/slow paths/new versions.
SLO-first: collect only what supports decisions (SLO/Finance/RG).
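
The roll-up chain can be sketched as one function applied repeatedly. The example assumes each point is a `(epoch_seconds, count, total)` tuple; carrying both count and sum (rather than an average) is a design choice that keeps averages correct after several roll-up levels. The function name `roll_up` is hypothetical.

```python
from collections import defaultdict


def roll_up(points, bucket_s):
    """Aggregate (epoch_seconds, count, total) points into bucket_s-second buckets."""
    buckets = defaultdict(lambda: [0, 0.0])
    for ts, count, total in points:
        bucket = buckets[ts - ts % bucket_s]  # align to the bucket start
        bucket[0] += count
        bucket[1] += total
    return [(ts, c, t) for ts, (c, t) in sorted(buckets.items())]
```

Feeding per-second points through `roll_up(points, 60)` and then `roll_up(one_minute, 300)` gives the 1s→1m→5m chain; the raw points can be dropped once the first level has been published.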

12) Synchronization dashboards

Data Sync Overview: publish_lag, completeness, duplicates, late ratio, schema drift, conformance errors.
Attribution Health: timeliness of postbacks, dedup windows, disputed cases.
Finance/Oracle: discrepancy between aggregates and oracles, invoice statuses.
Jurisdiction Map: data location/personal data flows, DPA/DPIA compliance.

13) Operations, Incidents, RCA

Alerts: burn rate on freshness/completeness, schema drift, duplicate spikes.

War room: ready-made playbooks for buses/webhooks/CDC/data marts; stop buttons for aggregations/formulas.

Blameless RCA: fact → hypothesis → experiment → conclusion → action; SLO post-mortem.
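
For the burn-rate alerts mentioned at the start of this section, a minimal sketch: burn rate is the observed share of bad events divided by the error budget (1 − SLO), and a page fires only when both a short and a long window exceed a threshold. The two-window rule and the default threshold of 14.4 follow common SRE practice and are assumptions here, as are the function names.

```python
def burn_rate(bad: int, total: int, slo: float) -> float:
    """How many times faster than allowed the error budget is being spent."""
    if total == 0:
        return 0.0
    return (bad / total) / (1.0 - slo)


def should_page(short_window, long_window, slo, threshold=14.4) -> bool:
    # Each window is a (bad, total) pair; both must burn fast to avoid flapping.
    return (burn_rate(*short_window, slo) >= threshold
            and burn_rate(*long_window, slo) >= threshold)
```

With a completeness SLO of 99.5%, losing 2% of events in a window gives a burn rate of about 4: the budget is being consumed four times faster than planned.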

14) Anti-patterns

"Two truths" by metrics/formulas and accession dates.
Offset pagination of history under load (use cursors only).
Raw personal data in logs/data marts; no tokenization.
A postback zoo without signatures and idempotency → duplicates/gaps.
Mixing Event/Processing Time in aggregations.
No watermarks and no late events policy.
Manual reconciliation (Excel/manual uploads) instead of oracles.
Single large tables with unlimited cardinality of labels.
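
As a counterpart to the offset-pagination anti-pattern, here is a minimal sketch of cursor (keyset) pagination over events ordered by `(event_time, event_id)`. The opaque base64 cursor format and the function names are assumptions; against a database the filter would become a keyset predicate on the same ordered pair rather than a scan over a list.

```python
import base64
import json


def encode_cursor(event_time: int, event_id: str) -> str:
    raw = json.dumps([event_time, event_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str):
    event_time, event_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return event_time, event_id


def fetch_page(events, cursor=None, limit=100):
    """events: list of dicts sorted by (event_time, event_id)."""
    after = decode_cursor(cursor) if cursor else None
    # Resume strictly after the last delivered key: no gaps, no repeats,
    # even if new rows were inserted since the previous page.
    page = [e for e in events
            if after is None or (e["event_time"], e["event_id"]) > after][:limit]
    full = len(page) == limit
    next_cursor = encode_cursor(page[-1]["event_time"], page[-1]["event_id"]) if full else None
    return page, next_cursor
```

A client loops until `next_cursor` is `None`; because the cursor encodes a position in the ordering rather than a row count, concurrent writes do not shift pages the way offsets do.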

15) Checklists

Design

Ontology, Schema Registry, owners, reference data.
Metric Store with `formulaVersion` and a frozen-period for MAJOR changes.
Time semantics (event time, watermarks), late event policy.
Transport: EDA/CDC, API/signed webhooks, cursors, idempotency.
Data Quality SLI/SLO, conformance tests, alerts.
Privacy/Localization (DPIA/DPA), Zero Trust, ABAC/ReBAC/SoD.
Oracles and reconciliation rules.

Launch

Sandbox and load/chaos runs for the bus/data marts.
Canary synchronization 1%→5%→25%→50%→100% with guardrails.
Dashboards for publish_lag/completeness/duplicates/drift.
Documentation of formulas and effective dates; release-notes `data_formula_change`.

Operation

Weekly DQ report; SLO/guardrails revision.
Monthly changelogs of schemas/formulas/access rights.
Regular DR/chaos drills for broker/ingestors/data marts.

16) Maturity Roadmap

v1 (Foundation): unified schemas, basic CDC/batch, cursors, DQ-SLI, manual reconciliation.
v2 (Integration): watermarks and late event policy, oracles, synchronization dashboards, automatic retries with jitter.
v3 (Automation): predictive freshness/completeness monitoring, smart-reconciliation, auto-re-indexing, adaptive sampling.
v4 (Networked Governance): inter-chain exchange of oracles/quality signals, DAO rules of formulas and transparent treasuries.

17) Success metrics

Data quality: publish_lag p95, completeness%, duplicate ‰, late%, schema drift rate.
Uniformity: the proportion of reports with a fixed `formulaVersion`, the number of MAJOR changes without incidents.
Finance: discrepancy with oracles, share of auto-reconciliation, dispute rate < X%.
Operations: MTTD/MTTR for synchronization incidents, share of auto-stops/rollbacks.
Compliance: zero personal data leaks, successful DPIA/DPA checks, 100% availability of WORM logs.
Observability economics: Cost-to-Sync per rps/event, cardinality compliance.

Brief Summary

Synchronization of analytical data is not table copying but a protocol of trust and time: canonical schemas and formulas, event time with watermarks, cursors and idempotency, dedup and late-event handling, DQ-SLO and oracles, privacy and localization. By following this framework, the ecosystem gets unified, fresh and provable analytics: the basis for fast decisions, fair settlements and scalable network growth.
