Synchronization of analytical data
1) Why does the ecosystem need analytics synchronization?
The network brings together operators, studios/RGS, affiliates, PSP/APM, KYC/AML providers and media. To see a single picture (funnels CR→FTD→ARPU/LTV, RG/compliance, transport SLO, finance/RevShare), the ecosystem needs canonical, timely and provable data synchronization between chains and data marts: no "two truths", an explicit change history, and cost control.
2) Ontology and data contracts
Entities: `eventId`, `traceId`, `participantId`, `role` (operator/studio/affiliate/psp/kyc/stream), `jurisdiction`, `brandId`, `campaignId`, `apmRouteId`, `gameId`, `tableId`, `currency`, `schemaVersion`, `formulaVersion`.
Canonical events (minimum):
- `click`, `session_start`, `registration`, `kyc_status`, `deposit`, `ftd`, `bet/spin`, `reward_granted`, `withdrawal`, `postback_sent/received`, `rg_guardrail_hit`, `stream_sli`.
Data contracts:
- schemas in the Schema Registry (semver, field compatibility);
- owners, aggregation windows, freshness and completeness SLAs;
- error policy (nullable/stubs), reference data (currencies, locales, RTP profiles).
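As an illustration of what a data contract check might look like, here is a minimal sketch of an event envelope with basic validation. The Python field names, the `EventEnvelope` class and the `validate` helper are hypothetical; only the entity names, roles and canonical event types come from the lists above (with `bet/spin` and `postback_sent/received` split into separate values as an assumption).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

ROLES = {"operator", "studio", "affiliate", "psp", "kyc", "stream"}
CANONICAL_EVENTS = {
    "click", "session_start", "registration", "kyc_status", "deposit", "ftd",
    "bet", "spin", "reward_granted", "withdrawal", "postback_sent",
    "postback_received", "rg_guardrail_hit", "stream_sli",
}


@dataclass(frozen=True)
class EventEnvelope:
    event_id: str
    trace_id: str
    participant_id: str
    role: str
    event_type: str
    event_time: datetime          # when the event happened (event time)
    schema_version: str           # semver string, e.g. "1.2.0"
    jurisdiction: Optional[str] = None
    currency: Optional[str] = None


def validate(ev: EventEnvelope) -> List[str]:
    """Return a list of contract violations; an empty list means the event passes."""
    errors = []
    if ev.role not in ROLES:
        errors.append(f"unknown role: {ev.role}")
    if ev.event_type not in CANONICAL_EVENTS:
        errors.append(f"non-canonical event type: {ev.event_type}")
    if ev.event_time.tzinfo is None:
        errors.append("event_time must be timezone-aware")
    parts = ev.schema_version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        errors.append(f"schema_version is not MAJOR.MINOR.PATCH: {ev.schema_version}")
    return errors
```

In practice such checks would normally live in the Schema Registry tooling rather than in hand-written code; the sketch only shows the kind of rules a contract encodes.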
Metric Store: formula versions (GGR/NetRev/CR/ARPU/LTV, K-factors), their owners and effective dates; the formula version is always attached to the report.
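One way to make "the formula version is attached to every report" concrete is to resolve the version by effective date at report time. The sketch below is an assumption about how a Metric Store lookup could work; `FormulaVersion`, `MetricStore`, `register` and `resolve` are hypothetical names.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class FormulaVersion:
    metric: str
    version: str          # semver of the formula
    owner: str
    effective_from: date  # first day on which this version applies


class MetricStore:
    def __init__(self) -> None:
        self._versions: Dict[str, List[FormulaVersion]] = {}

    def register(self, fv: FormulaVersion) -> None:
        versions = self._versions.setdefault(fv.metric, [])
        versions.append(fv)
        versions.sort(key=lambda v: v.effective_from)

    def resolve(self, metric: str, on: date) -> FormulaVersion:
        """Return the version in force on the given date."""
        versions = self._versions.get(metric, [])
        idx = bisect_right([v.effective_from for v in versions], on) - 1
        if idx < 0:
            raise LookupError(f"no formula for {metric} effective on {on}")
        return versions[idx]
```

A report for a given period would then store the resolved `version` next to each figure, so two reports can only disagree if they reference different versions.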
3) Temporal semantics and windows
Event Time vs Processing Time: Aggregations should be based on event time, not processing time.
Watermarks: used to track late events, with an explicit acceptance policy (for example, T + 24h).
Windows: sliding/calendar, with recalculation during overloads.
Delay as a metric: `ingest_lag` and `publish_lag` are published for each data mart.
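The following sketch shows one possible reading of these rules: events are counted in one-minute windows by event time, the watermark is simply the largest event time seen so far, and anything older than the watermark minus 24 hours is rejected. The window size, the watermark heuristic and all names (`EventTimeAggregator`, `window_start`, `ingest_lag`) are assumptions for illustration; stream processors usually provide richer watermark strategies.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=1)            # assumed tumbling window size
ALLOWED_LATENESS = timedelta(hours=24)   # the "T + 24h" acceptance policy
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(event_time: datetime) -> datetime:
    """Align an event time to the start of its tumbling window."""
    return EPOCH + ((event_time - EPOCH) // WINDOW) * WINDOW


def ingest_lag(event_time: datetime, ingested_at: datetime) -> timedelta:
    """Processing time minus event time, published as a metric."""
    return ingested_at - event_time


class EventTimeAggregator:
    def __init__(self) -> None:
        self.counts = defaultdict(int)   # window start -> number of events
        self.watermark = None            # largest event time seen so far
        self.late_accepted = 0           # late events that forced a recalculation
        self.rejected = 0                # events beyond the allowed lateness

    def add(self, event_time: datetime) -> bool:
        if self.watermark is not None:
            if event_time < self.watermark - ALLOWED_LATENESS:
                self.rejected += 1
                return False
            if window_start(event_time) < window_start(self.watermark):
                self.late_accepted += 1  # an earlier window must be recalculated
        self.counts[window_start(event_time)] += 1
        if self.watermark is None or event_time > self.watermark:
            self.watermark = event_time
        return True
```

Note that the aggregation key is derived only from `event_time`; the arrival time is used for the lag metric and never for bucketing.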
4) Transport and synchronization modes
1. CDC/streaming (real-time):
   - event bus (EDA), partitioning by `traceId`/`participantId`;
   - "exactly-once in effect" through consumer idempotency and body hashes;
   - curated topics: raw events, normalized, aggregates/oracles.
2. Batch/microbatch:
   - incremental exports with cursor pagination (time-based/log cursors);
   - formats: Parquet/Avro with schema; batch manifests.
3. API/Webhooks:
   - `/vN/events` with cursors and `Idempotency-Key`;
   - signed webhooks (JWS/HMAC), replay registry, backoff + jitter.
4. Asset sync:
   - reference data/locales/game catalogs as versioned bundles (hashes, TTL).
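For the webhook mode, a receiver typically checks the signature first and the replay registry second, while the sender retries with backoff and jitter. The sketch below uses HMAC-SHA256 from the Python standard library; the in-memory replay set, the `WebhookReceiver` class and the backoff parameters are assumptions, and a JWS-based setup would look different.

```python
import hashlib
import hmac
import random


def sign(secret: bytes, body: bytes) -> str:
    """HMAC-SHA256 signature of the raw request body, hex-encoded."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


class WebhookReceiver:
    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._seen = set()  # replay registry; a real one would be a TTL store

    def accept(self, idempotency_key: str, body: bytes, signature: str) -> str:
        # Constant-time comparison avoids leaking the signature via timing.
        if not hmac.compare_digest(sign(self._secret, body), signature):
            return "rejected"
        if idempotency_key in self._seen:
            return "duplicate"
        self._seen.add(idempotency_key)
        return "accepted"


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

A sender would sleep for `backoff_delay(attempt)` seconds between retries and reuse the same `Idempotency-Key`, so a retried delivery is reported as a duplicate instead of being processed twice.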
5) Idempotence, dedup and late events
Idempotency-Key and body hash on critical paths (payments/postbacks).
Deduplication: window ± 5 minutes/watermark; storage of "seen" hashes.
Late events: upsert/recalculation policy; changelog for data marts.
Exactly-once in the business sense: we do not rely on "broker magic"; we require consumer idempotency and deterministic schemas.
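A minimal sketch of consumer-side idempotency, assuming the consumer keeps a map from `Idempotency-Key` to a hash of the canonicalized body. The `IdempotentLedger` class, the `amount_minor` field and the three return values are hypothetical; the point is that a redelivered message changes nothing, and a reused key with a different body is flagged rather than applied.

```python
import hashlib
import json


def body_hash(payload: dict) -> str:
    """Hash of a canonical JSON form, so key order does not change the hash."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotentLedger:
    def __init__(self) -> None:
        self._seen = {}    # idempotency key -> body hash of the applied message
        self.balance = 0   # total in minor currency units

    def apply_deposit(self, idempotency_key: str, payload: dict) -> str:
        digest = body_hash(payload)
        previous = self._seen.get(idempotency_key)
        if previous == digest:
            return "duplicate"   # same message redelivered: no effect
        if previous is not None:
            return "conflict"    # same key, different body: needs investigation
        self._seen[idempotency_key] = digest
        self.balance += payload["amount_minor"]
        return "applied"
```

With at-least-once delivery from the broker, this gives the "exactly-once in the business sense" outcome: the balance reflects each deposit once regardless of how many times it is delivered.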
6) Reconciliation of attributions and formulas
Attribution: last eligible touch rule with windows by channel/jurisdiction; cross-device only through tokens (without raw personal data).
Metric formulas: each record references a `formulaVersion`; MAJOR changes are published as `data_formula_change` events.
Backfill by the rules: when a formula changes, dual publication (old/new) is allowed during the transition (frozen) period.
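The attribution rule above can be sketched as picking the most recent touch that still falls inside its channel's window. The channel names and window lengths below are hypothetical examples, as are the `Touch` type and the function name; jurisdiction-specific windows would add one more lookup key.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Touch:
    campaign_id: str
    channel: str
    at: datetime


# Hypothetical attribution windows per channel.
WINDOWS = {
    "affiliate": timedelta(days=30),
    "paid_social": timedelta(days=7),
}


def last_eligible_touch(touches: Iterable[Touch], conversion_at: datetime) -> Optional[Touch]:
    """Most recent touch that precedes the conversion and is inside its channel window."""
    eligible = [
        t for t in touches
        if t.channel in WINDOWS
        and timedelta(0) <= conversion_at - t.at <= WINDOWS[t.channel]
    ]
    return max(eligible, key=lambda t: t.at, default=None)
```

Because the function is deterministic for a given set of touches and windows, two parties that share the same inputs and window table should arrive at the same attributed campaign.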
7) Data Quality: SLI/SLO and Conformance Tests
Data quality SLI:
- Freshness (publish_lag p95),
- Completeness (proportion of events vs reference),
- Uniqueness (proportion of duplicates),
- Consistency (currency/locale/ID),
- Accuracy (checksums/oracles),
- Time linearity (late events within the allowed corridor).
SLO targets (examples):
- publish_lag p95 ≤ 1-5 s (operational dashboards), ≤ 15 min (financial aggregates);
- completeness ≥ 99.5% at T + 15 min, ≥ 99.9% at T + 24h;
- duplicates ≤ 0.1‰; oracle discrepancy ≤ 0.1–0.3%.
Conformance tests: schemas, mandatory fields, reference data, webhook signatures, cursor exports without gaps.
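To show how three of these SLIs could be computed and compared with targets, here is a small sketch. It uses a nearest-rank p95 and hard-codes thresholds taken from the example figures in this section (5 s, 99.5%, 0.1‰); the function names and the choice of percentile method are assumptions.

```python
from typing import Dict, Sequence


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile (integer arithmetic avoids float rounding)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n)
    return ordered[rank - 1]


def completeness(received: int, reference: int) -> float:
    """Share of reference events that actually arrived."""
    return received / reference


def duplicates_per_mille(event_ids: Sequence[str]) -> float:
    """Duplicate events per thousand received."""
    total = len(event_ids)
    return 1000 * (total - len(set(event_ids))) / total


def check_slo(publish_lags_s: Sequence[float], received: int, reference: int,
              event_ids: Sequence[str]) -> Dict[str, bool]:
    """Compare SLIs with example targets: lag <= 5 s, completeness >= 99.5%, dup <= 0.1 per mille."""
    return {
        "freshness": p95(publish_lags_s) <= 5.0,
        "completeness": completeness(received, reference) >= 0.995,
        "uniqueness": duplicates_per_mille(event_ids) <= 0.1,
    }
```

The reference count for completeness has to come from an independent source (for example, the producer's own counters), otherwise the ratio only measures the pipeline against itself.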
8) Lineage, auditing and oracles
Lineage: from data mart/dashboard down to primary datasets (schemas/versions/owners).
WORM audit: immutable schema/formula/key/exception logs.
Oracles (signed summaries): GGR/NetRev/SLO/RG with `formulaVersion`, `hash(inputs)`, `kid`, `traceId`; a source of truth for invoices and appeals.
Trial "trace packages": SLA 60-90 s for P1/P2 incidents.
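A signed oracle summary can be sketched as a small record that carries the value, the formula version, a hash of the inputs and a key identifier, plus a signature over all of it. The example below uses a shared-secret HMAC purely to stay within the standard library; that choice, the `KEYS` table and the field names `inputsHash` and `signature` are assumptions, and a production setup would more likely use asymmetric keys published via JWKS.

```python
import hashlib
import hmac
import json

# Hypothetical key table: kid -> secret.
KEYS = {"key-2024-01": b"demo-secret"}


def canonical(obj) -> bytes:
    """Deterministic JSON encoding used for both hashing and signing."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_oracle(metric: str, value_minor: int, inputs: list,
                formula_version: str, kid: str, trace_id: str) -> dict:
    summary = {
        "metric": metric,
        "value": value_minor,
        "formulaVersion": formula_version,
        "inputsHash": hashlib.sha256(canonical(inputs)).hexdigest(),
        "kid": kid,
        "traceId": trace_id,
    }
    # The signature covers every field above, computed before it is attached.
    summary["signature"] = hmac.new(KEYS[kid], canonical(summary), hashlib.sha256).hexdigest()
    return summary


def verify_oracle(summary: dict) -> bool:
    unsigned = {k: v for k, v in summary.items() if k != "signature"}
    expected = hmac.new(KEYS[summary["kid"]], canonical(unsigned), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, summary.get("signature", ""))
```

During a dispute, a counterparty can recompute `inputsHash` from its own copy of the inputs and check the signature, which narrows the disagreement to either the inputs or the formula version.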
9) Privacy, localization and security
PII minimization: tokenization of `playerId`, no personal data in logs/data marts, detokenization only in safe zones.
Localization: maps of jurisdictions (where we store/process data classes).
Zero Trust: mTLS, short-lived tokens, egress-allow-list, key rotation/JWKS.
ABAC/ReBAC/SoD: access on a "see your own and what has been agreed" basis; "measure ≠ influence ≠ change."
10) Financial reconciliation and settlement
Canonical Net Revenue (simplified):
\[
\text{NetRev} = \text{GGR} - \text{BonusCost} - \text{Jackpot/PoolShare} - \text{PaymentFees} - \text{Chargebacks} - \text{Tax/Levy} - \text{FraudLosses}
\]
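As a worked example of the simplified formula, the sketch below subtracts the six deduction terms from GGR using `Decimal` to avoid floating-point drift in money amounts. The snake_case parameter names and the sample figures are invented for illustration.

```python
from decimal import Decimal

# The six deduction terms of the simplified formula, as keyword names.
DEDUCTIONS = (
    "bonus_cost", "jackpot_pool_share", "payment_fees",
    "chargebacks", "tax_levy", "fraud_losses",
)


def net_revenue(ggr: Decimal, **deductions: Decimal) -> Decimal:
    """NetRev = GGR minus all six deductions; every term must be supplied."""
    missing = set(DEDUCTIONS) - set(deductions)
    if missing:
        raise ValueError(f"missing deduction terms: {sorted(missing)}")
    return ggr - sum((deductions[name] for name in DEDUCTIONS), Decimal("0"))


example = net_revenue(
    Decimal("100000.00"),
    bonus_cost=Decimal("12000.00"),
    jackpot_pool_share=Decimal("3000.00"),
    payment_fees=Decimal("2500.00"),
    chargebacks=Decimal("500.00"),
    tax_levy=Decimal("15000.00"),
    fraud_losses=Decimal("1000.00"),
)
# Deductions total 34000.00, so example == Decimal("66000.00")
```

Requiring every term explicitly, even when it is zero, makes it harder for two parties to silently compute different variants of the same metric.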
Reconciliation:
- cursor exports, oracles (signed aggregates), checksums;
- invoice statuses, discrepancy reports, and dispute-handling SLAs;
- FX rules, NET7/14/30, holds and clawbacks.
11) Synchronization cost management
Cardinality policies: `userId` and raw URLs are prohibited in labels; `routeId`/`campaignId` are allowed.
Downsampling/roll-ups: 1s→1m→5m; raw data is retained briefly, aggregates are kept longer.
Adaptive sampling of traces: base percentage + priority for errors/slow paths/new versions.
SLO-first: collect only what supports decisions (SLO/Finance/RG).
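The roll-up chain mentioned above (1s→1m→5m) works cleanly when each level stores additive quantities, such as a count and a sum, rather than averages. The sketch below assumes points of the form `(epoch_seconds, count, total)`; the tuple layout and the `roll_up` name are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Point = Tuple[int, int, int]  # (bucket start in epoch seconds, count, total)


def roll_up(points: Iterable[Point], bucket_seconds: int) -> List[Point]:
    """Merge points into coarser buckets by summing counts and totals."""
    buckets = defaultdict(lambda: [0, 0])
    for ts, count, total in points:
        start = ts - ts % bucket_seconds   # align to the coarser bucket
        buckets[start][0] += count
        buckets[start][1] += total
    return [(start, c, t) for start, (c, t) in sorted(buckets.items())]
```

Because counts and sums are additive, rolling 1-second points to 1 minute and then to 5 minutes gives the same result as rolling the raw points straight to 5 minutes, so the raw level can be dropped once the first aggregate is stored. Percentiles do not have this property and need a mergeable sketch or histogram buckets instead.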
12) Synchronization dashboards
Data Sync Overview: publish_lag, completeness, duplicates, late ratio, schema drift, conformance errors.
Attribution Health: timeliness of postbacks, dedup windows, disputed cases.
Finance/Oracle: discrepancy between aggregates and oracles, invoice statuses.
Jurisdiction Map: data location/personal data flows, DPA/DPIA compliance.
13) Operations, Incidents, RCA
Alerts: burn rate for freshness/completeness, schema drift, duplicate surges.
War-room: ready-made playbooks for buses/webhooks/CDC/data marts; stop buttons for aggregations/formulas.
Blameless RCA: fact → hypothesis → experiment → conclusion → action; SLO post-mortem.
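The burn-rate alerts mentioned at the start of this section can be sketched as follows: the burn rate is the observed error ratio divided by the error budget (1 minus the SLO target), and a page fires only when both a short and a long window exceed a threshold. The threshold of 14.4 is a commonly cited example value, not something this document prescribes, and the function names are hypothetical.

```python
from typing import Tuple

Window = Tuple[int, int]  # (bad events, total events) observed in a time window


def burn_rate(bad: int, total: int, slo_target: float) -> float:
    """How many times faster than allowed the error budget is being consumed."""
    if total == 0:
        return 0.0
    return (bad / total) / (1.0 - slo_target)


def should_page(short: Window, long: Window, slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only if both windows burn fast: the short one confirms it is still happening."""
    return (burn_rate(*short, slo_target) >= threshold
            and burn_rate(*long, slo_target) >= threshold)
```

For a completeness SLO of 99.5%, "bad" would be events missing against the reference; for freshness, it would be publications that exceeded the lag target.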
14) Anti-patterns
"Two truths" in metrics/formulas and effective dates.
Offset pagination of history under load (cursors only).
Raw personal data in logs/data marts; no tokenization.
Postback zoo without signatures and idempotency → duplicates/gaps.
Mixing Event/Processing Time in aggregations.
No watermarks and no late events policy.
Manual reconciliation (Excel/manual uploads) instead of oracles.
Single large tables with unlimited cardinality of labels.
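To illustrate the offset-pagination anti-pattern from the list above, the sketch below pages through an ordered event log by cursor, that is, "everything with a sequence number greater than the last one I saw". The in-memory list, the `(sequence, event_id)` tuples and the function names are assumptions standing in for a real store or API.

```python
from typing import List, Optional, Tuple

Event = Tuple[int, str]  # (monotonic sequence number, event id)


def fetch_page(log: List[Event], cursor: Optional[int], limit: int):
    """Return up to `limit` events with sequence > cursor, plus the next cursor."""
    page = [e for e in log if cursor is None or e[0] > cursor][:limit]
    next_cursor = page[-1][0] if page else cursor
    return page, next_cursor


def read_all(log: List[Event], limit: int = 2) -> List[Event]:
    """Drain the log page by page until an empty page is returned."""
    cursor, out = None, []
    while True:
        page, cursor = fetch_page(log, cursor, limit)
        if not page:
            return out
        out.extend(page)
```

If old rows are trimmed or new rows are inserted between two requests, an offset such as "skip 2, take 2" points at a different place in the log and can skip or repeat events, while the cursor still resumes right after the last event that was delivered.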
15) Checklists
Design
- Ontology, Schema Registry, owners, reference data.
- Metric Store with `formulaVersion` and a frozen period for MAJOR changes.
- Time semantics (event time, watermarks), late event policy.
- Transport: EDA/CDC, API/signed webhooks, cursors, idempotency.
- Data Quality SLI/SLO, conformance tests, alerts.
- Privacy/Localization (DPIA/DPA), Zero Trust, ABAC/ReBAC/SoD.
- Oracles and reconciliation rules.
Start
- Sandbox and load/chaos runs for the bus and data marts.
- Canary synchronization 1%→5%→25%→50%→100% with guardrails.
- Dashboards publish_lag/completeness/duplicates/drift.
- Documentation of formulas and effective dates; release-notes `data_formula_change`.
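The canary step in the checklist above can be expressed as a small decision function: advance to the next traffic share only if every guardrail metric is within its limit, otherwise roll back. The stage percentages come from the checklist; the guardrail names and limits in the usage note, and the convention that 0 means rollback, are assumptions.

```python
from typing import Dict

STAGES = (1, 5, 25, 50, 100)  # traffic share in percent


def next_stage(current: int, metrics: Dict[str, float],
               guardrails: Dict[str, float]) -> int:
    """Return the next traffic percentage, or 0 to signal a rollback."""
    for name, limit in guardrails.items():
        # A missing metric is treated as a breach: no data, no promotion.
        if metrics.get(name, float("inf")) > limit:
            return 0
    idx = STAGES.index(current)
    return STAGES[min(idx + 1, len(STAGES) - 1)]
```

For example, with guardrails `{"publish_lag_p95_s": 5.0, "duplicates_per_mille": 0.1}` a canary at 5% moves to 25% only while both metrics stay under their limits. Lower-bound metrics such as completeness would need to be inverted (for instance, tracked as a missing share) to fit this upper-limit form.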
Operation
- Weekly DQ report; SLO/guardrails revision.
- Monthly changelogs of schemas/formulas/access rights.
- Regular DR/chaos drills for the broker/ingestors/data marts.
16) Maturity Roadmap
v1 (Foundation): unified schemas, basic CDC/batch, cursors, DQ-SLI, manual reconciliation.
v2 (Integration): watermarks and late event policy, oracles, synchronization dashboards, auto-retries with jitter.
v3 (Automation): predictive freshness/completeness monitoring, smart-reconciliation, auto-re-indexing, adaptive sampling.
v4 (Networked Governance): inter-chain exchange of oracles/quality signals, DAO rules for formulas and transparent treasuries.
17) Success metrics
Data quality: publish_lag p95, completeness%, duplicate ‰, late%, schema drift rate.
Uniformity: the proportion of reports with a fixed `formulaVersion`, the number of MAJOR changes without incidents.
Finance: discrepancy with oracles, share of auto-reconciliation, dispute rate < X%.
Operations: MTTD/MTTR for synchronization incidents, share of auto-stops/rollbacks.
Compliance: 0 personal data leaks, successful DPIA/DPA checks, 100% availability of WORM logs.
Observability economics: Cost-to-Sync per rps/event, cardinality compliance.
Brief summary
Synchronizing analytical data is not about copying tables; it is a protocol of trust and time: canonical schemas and formulas, event time with watermarks, cursors and idempotency, dedup and late-event handling, DQ SLOs and oracles, privacy and localization. By following this framework, the ecosystem gets unified, fresh and provable analytics, the basis for fast decisions, fair settlements and scalable network growth.