GH GambleHub

Data schemas and their evolution

1) Why an iGaming platform needs this

Reliability: changes to data schemas do not break reports, APIs, or models.
Feature speed: fields (KYC/RG/PSP) are added safely without stopping streams.
Regulatory: traceability and reproducibility (audit/lineage, DSAR, Legal Hold).
Cost: fewer data reloads and less backfill downtime.

2) Types of schemas and where they live

Events (streams): `payments.deposit_accepted`, `game.round_finished`.
OLTP/DDL: normalized tables (KYC, accounts, limits).
DWH/data marts (Gold): denormalized aggregates for BI/ML.
Feature Store: online/offline feature sets with consistency guarantees.
External partner contracts: PSP, game providers, marketing sources.

Notations: Avro/Protobuf (streams), JSON Schema (integrations), SQL DDL (DWH), Parquet schema (lake).

3) Compatibility (core of evolution)

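Backward-compatible (in the Schema Registry sense): consumers on the new schema can read data written with the previous one (added fields need a default, for example nullable with a `null` default).
Forward-compatible: consumers on the previous schema can read data written with the new one (old readers ignore fields they do not know).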
Full-compatible: both (desirable target for events).
Breaking changes: renaming/deleting a field, changing the type/semantics, changing the key or partitioning key.

Rule 1: events evolve through addition, not through change.
Rule 2: deletion happens only in a MAJOR version of the schema, after the deprecation period.
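
The two rules can be checked mechanically when a new schema version is proposed. The sketch below is only an illustration under simplifying assumptions: schemas are plain dicts shaped like Avro records, the function names (`field_map`, `additive_violations`) are hypothetical, and any type change counts as a violation, so enum extensions would need separate handling.

```python
from typing import Any


def field_map(schema: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Index an Avro-style record schema by field name."""
    return {f["name"]: f for f in schema["fields"]}


def additive_violations(old: dict[str, Any], new: dict[str, Any]) -> list[str]:
    """List the reasons why `new` is not a purely additive evolution of `old`."""
    old_fields, new_fields = field_map(old), field_map(new)
    problems = []
    for name, old_field in old_fields.items():
        if name not in new_fields:
            problems.append(f"removed field: {name}")
        elif new_fields[name]["type"] != old_field["type"]:
            problems.append(f"type changed: {name}")
    for name, new_field in new_fields.items():
        # A new field is only safe when readers can fall back to a default.
        if name not in old_fields and "default" not in new_field:
            problems.append(f"new field without default: {name}")
    return problems


v1 = {"fields": [{"name": "event_id", "type": "string"}]}
v2 = {"fields": [
    {"name": "event_id", "type": "string"},
    {"name": "risk_score", "type": ["null", "int"], "default": None},
]}
v3 = {"fields": [{"name": "id", "type": "string"}]}  # a rename, hence breaking

print(additive_violations(v1, v2))  # []
print(additive_violations(v1, v3))
```

An empty list means the change is additive; a rename shows up as one removed field plus one new field without a default.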

4) Semantic versions and policies

`MAJOR.MINOR.PATCH` for each schema, data mart, and feature set.

MAJOR - incompatible (new topic/table/feature set, dual-run).
MINOR - compatible (new nullable/default fields, new enum values).
PATCH - edit descriptions/limits/comments.

Field life cycle: `experimental → active → deprecated → removed` (with dates and an owner).
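
As a small illustration of the versioning policy, the helper below bumps a `MAJOR.MINOR.PATCH` string according to the change level. The function name `bump` is an assumption made for this sketch, not part of any registry API.

```python
def bump(version: str, level: str) -> str:
    """Return the next semantic version for a change of the given level."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "MAJOR":   # incompatible: new topic/table/feature set
        return f"{major + 1}.0.0"
    if level == "MINOR":   # compatible: new nullable/default fields, enum values
        return f"{major}.{minor + 1}.0"
    if level == "PATCH":   # descriptions, comments
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change level: {level}")


print(bump("1.7.0", "MINOR"))  # 1.8.0
print(bump("1.7.0", "MAJOR"))  # 2.0.0
```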

5) Schema registry and data contracts

Schema Registry: stores versions, compatibility settings, evolution history, and owners.
Data Contract: pins the schema, the quality SLOs, and the privacy rules (see the section "Data validation").

Example (Avro, `payments.deposit_accepted` v1.7.0; the nullable `risk_score` was added in a MINOR release):
json
{
  "type": "record", "name": "deposit_accepted", "namespace": "payments",
  "fields": [
    {"name": "event_id", "type": "string"},
    {"name": "occurred_at", "type": {"type": "long", "logicalType": "timestamp-micros"}},
    {"name": "user_id", "type": "string"},
    {"name": "brand", "type": "string"},
    {"name": "country", "type": "string"},
    {"name": "psp", "type": "string"},
    {"name": "method", "type": "string"},
    {"name": "amount", "type": {"type": "bytes", "logicalType": "decimal", "precision": 18, "scale": 2}},
    {"name": "currency", "type": {"type": "enum", "name": "Currency", "symbols": ["EUR", "USD", "TRY", "BRL"]}},
    {"name": "risk_score", "type": ["null", "int"], "default": null},
    {"name": "kyc_level", "type": ["null", {"type": "enum", "name": "Kyc", "symbols": ["L0", "L1", "L2", "L3"]}], "default": null}
  ],
  "compatibility": "FULL", "owner": "team-payments"
}
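
The `amount` field uses Avro's `decimal` logical type over `bytes`, which stores the unscaled integer as big-endian two's-complement bytes. The sketch below shows that encoding by hand for scale 2; the helper names are hypothetical, and in practice an Avro library would normally do this conversion.

```python
from decimal import Decimal


def encode_decimal(value: Decimal, scale: int = 2) -> bytes:
    """Encode a Decimal as big-endian two's-complement bytes of the unscaled value."""
    scaled = value.scaleb(scale)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} has more than {scale} decimal places")
    unscaled = int(scaled)
    length = (unscaled.bit_length() + 8) // 8  # leaves room for the sign bit
    return unscaled.to_bytes(length, byteorder="big", signed=True)


def decode_decimal(raw: bytes, scale: int = 2) -> Decimal:
    """Decode bytes produced by encode_decimal back into a Decimal."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


raw = encode_decimal(Decimal("12.34"))
print(raw)                  # b'\x04\xd2'
print(decode_decimal(raw))  # 12.34
```

Because the scale lives in the schema and not in the payload, changing it silently changes the meaning of every stored amount, which is why scale or unit changes are treated as MAJOR.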

6) Migration patterns

6.1 Events (streams)

Additive-only: add fields with default/nullable; old consumers don't break.
Enum extensions: new symbols count as MINOR; consumers are required to have an `else/unknown` branch.
MAJOR migration: a new topic `payments.deposit_accepted.v2`, dual-write, shadow-reads, then switching consumers.

6.2 DWH/data marts

Blue-Green tables: `gold.revenue_v2` next to `v1`; materialize, verify, switch BI.
Backfill: replay from snapshots + idempotent merge (by keys/versions).
SCD: type 2 for slowly changing attributes (limits, KYC, VIP statuses).
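
To make "idempotent merge (by keys/versions)" concrete, here is a minimal in-memory sketch. The row shape (`key`, `version`) and the function name are assumptions for illustration; a warehouse would typically express the same rule as a SQL `MERGE`.

```python
def idempotent_merge(target: dict, batch: list[dict]) -> dict:
    """Upsert rows by key, keeping only the row with the highest version."""
    for row in batch:
        current = target.get(row["key"])
        if current is None or row["version"] > current["version"]:
            target[row["key"]] = row
    return target


batch = [
    {"key": "d1", "version": 1, "amount": 10},
    {"key": "d1", "version": 2, "amount": 12},
]
once = idempotent_merge({}, batch)
twice = idempotent_merge(dict(once), batch)  # replaying the same snapshot
print(once == twice)  # True
```

Replaying a snapshot leaves the table unchanged, so a failed backfill can simply be rerun.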

6.3 Feature Store

Dual-serve: the old feature set is served in parallel with the new one; the model is served through a router.
Point-in-time consistency: evolution must not break point-in-time (PIT) joins (timestamp and granularity stay unchanged in a MINOR release).
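
A point-in-time lookup returns the latest feature value known at or before the label timestamp, never a later one. The sketch below assumes a feature history sorted by integer timestamp; the function name and data layout are hypothetical.

```python
from bisect import bisect_right
from typing import Optional


def point_in_time_value(history: list[tuple[int, float]], as_of: int) -> Optional[float]:
    """Return the latest value with timestamp <= as_of, or None if there is none.

    `history` must be sorted by timestamp in ascending order.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx else None


history = [(100, 1.0), (200, 2.0)]
print(point_in_time_value(history, 150))  # 1.0
print(point_in_time_value(history, 50))   # None
```

If a MINOR release changed the timestamp semantics or granularity, the same `as_of` would select a different row, which is the breakage the rule guards against.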

7) Taxonomy of changes (checklist)

Safe (MINOR):
  • adding a nullable field or a field with a default;
  • enum extension (with an `unknown` branch at the consumer);
  • adding a non-key index/comment/description.
Conditionally safe:
  • scale/unit change (for example, amount in cents → base currency): MAJOR only;
  • moving a reference table or reference: through a view layer.
Breaking (MAJOR):
  • renaming/deleting a field;
  • changing the type/format/key/partitioning;
  • change of semantics (for example, `bonus_amount` from "accrued" → "written off").

8) Schema linters and compatibility tests

Schema-lint: name style (`snake_case`), required labels (`owner`, `doc`, `pii`), date/currency format.
Compat-tests: checking the new version against the registry (backward/forward/full).
Consumer-contract-tests: each service provides a "sample payload" and its expectations; they run in CI whenever the schema changes.
Golden-datasets: a set of real and "evil" examples (new enum values, empty/late fields, boundary amounts).
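
A schema linter for the first rule can be very small. The sketch below checks only the required labels and `snake_case` field names; the schema layout (labels as top-level keys) and the function name are assumptions made for illustration.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
REQUIRED_LABELS = ("owner", "doc", "pii")


def lint_schema(schema: dict) -> list[str]:
    """Return lint errors: missing required labels and non-snake_case field names."""
    errors = [f"missing label: {label}" for label in REQUIRED_LABELS if label not in schema]
    for field in schema.get("fields", []):
        if not SNAKE_CASE.match(field["name"]):
            errors.append(f"not snake_case: {field['name']}")
    return errors


good = {"owner": "team-payments", "doc": "Accepted deposit", "pii": False,
        "fields": [{"name": "event_id"}, {"name": "risk_score"}]}
bad = {"owner": "team-payments", "fields": [{"name": "userId"}]}

print(lint_schema(good))  # []
print(lint_schema(bad))
```

Run in CI, a non-empty result would fail the build before the schema reaches the registry.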

9) Reference data, enums, and localization

Reference data (countries/currencies/PSP/providers): versioned separately with its own update SLA; do not hard-code it into event code.
Locale/time zones: store UTC in events + an explicit locale for presentation.
Jurisdiction rules: age flags, promo restrictions, kept as reference tables with effective dates.

10) Multi-Brand/Multi-Jurisdictional and PII

Tenant isolation: `brand`, `country`, `license` are mandatory enum fields; routing is based on them.
PII policy at the schema level: mark fields with `pii = true`, apply masking/tokenization; events carry only tokens.
DSAR: `source_id`/`trace_id` must be present for deletion/retrieval; Legal Hold applies during MAJOR migrations.
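
One way to keep raw PII out of events is to replace flagged fields with keyed tokens before publishing. The sketch below uses HMAC-SHA256 as a deterministic pseudonym; this is an assumption chosen for brevity, not the tokenization scheme the platform necessarily uses (a token vault is a common alternative), and the function names are hypothetical.

```python
import hashlib
import hmac


def tokenize(value: str, secret: bytes) -> str:
    """Derive a deterministic token from a value with a keyed hash."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


def to_event(record: dict, pii_fields: set[str], secret: bytes) -> dict:
    """Copy a record, replacing every field flagged as PII with its token."""
    return {
        key: tokenize(str(value), secret) if key in pii_fields else value
        for key, value in record.items()
    }


record = {"email": "player@example.com", "brand": "brand_a"}
event = to_event(record, pii_fields={"email"}, secret=b"demo-key-only")
print(event["brand"])       # brand_a
print(len(event["email"]))  # 64
```

The same input always yields the same token, so joins across events still work while the raw value never leaves the source system.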

11) DDL and Lake versioning

DDL migrations: declarative migrations (Liquibase/Flyway/dbt), storage in VCS, review by the domain owner.
Formats in the lake: Avro/Parquet; record the evolution of fields; on MAJOR, use a new table/path `.../v2/`.
Partitioning: changing the partitioning (for example, `date` → `date,brand`) is allowed only through MAJOR with dual-write.

12) Examples of iGaming

12.1 A PSP added payment methods

Added `method = "MEFETE"` to the enum.
MINOR release `deposit_accepted` v1.8.0; consumers that do not know MEFETE route it to the `unknown_method` branch.
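
On the consumer side, the unknown branch can be a simple fallback bucket. In the sketch below the set of known methods and the helper names are hypothetical; only `MEFETE` and `unknown_method` come from the example above.

```python
from collections import Counter

# Hypothetical set of methods this (older) consumer was built to understand.
KNOWN_METHODS = {"card", "bank_transfer"}


def route_method(method: str) -> str:
    """Map a payment method to itself if known, else to the unknown branch."""
    return method if method in KNOWN_METHODS else "unknown_method"


def count_by_method(events: list[dict]) -> Counter:
    """Aggregate deposits per method without failing on new enum values."""
    return Counter(route_method(e["method"]) for e in events)


events = [{"method": "card"}, {"method": "MEFETE"}, {"method": "MEFETE"}]
print(count_by_method(events))
```

The old consumer keeps running and the size of the `unknown_method` bucket signals that it is time to upgrade.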

12.2 A game provider added a jackpot field

`jackpot_id` (nullable) was added to `game.round_finished`.
The data mart `gold.game_rounds_v3` gets a MINOR release; old reports keep working, new ones count jackpots.

12.3 RG attributes

Moving from the Boolean `self_excluded` to the status `rg_state ∈ {none, limit, cooldown, self_excluded}` is MAJOR: new topic + dual-write + migration of data marts and models.
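
During the dual-write phase each v1 event has to be translated into the v2 shape. The sketch below assumes that `False` maps to `none`, since a Boolean cannot distinguish `limit` or `cooldown`; that mapping, the in-memory "topics", and the function names are illustrative assumptions.

```python
RG_STATES = ("none", "limit", "cooldown", "self_excluded")


def to_v2(event_v1: dict) -> dict:
    """Translate a v1 event (Boolean flag) into the v2 shape (rg_state enum)."""
    event_v2 = {k: v for k, v in event_v1.items() if k != "self_excluded"}
    # Assumption: a v1 event can only express "self_excluded" or "none".
    event_v2["rg_state"] = "self_excluded" if event_v1["self_excluded"] else "none"
    return event_v2


def dual_write(event_v1: dict, topics: dict[str, list]) -> None:
    """Publish the original event to the old topic and its translation to the new one."""
    topics["v1"].append(event_v1)
    topics["v2"].append(to_v2(event_v1))


topics = {"v1": [], "v2": []}  # stand-ins for the old and new topics
dual_write({"user_id": "u1", "self_excluded": True}, topics)
print(topics["v2"][0])  # {'user_id': 'u1', 'rg_state': 'self_excluded'}
```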

13) Evolution process (from idea to switch)

1. Proposal (ADR): why change, type of compatibility, risk assessment and affected consumers.
2. Design and contract: schema in the registry, semver, compatibility policy.
3. Tests: linters, compat, consumer-contracts, replay on golden-sets.
4. Deployment: dual-write/blue-green/shadow-reads; alerts.
5. Reconciliation: Business balances/invariants (see Data Validation).
6. Switch: switch consumers/BI/features.
7. Deprecate: freeze old schema, grace-period, delete and archive.

14) Metrics and SLOs of evolution

Success-rate of migrations, dual-run time, share of new format events, backfill volume, lag/freshness.
Compatibility incidents (P1/P2), data mart quality after switching.
Cost: $/TB of reloaded data, $/hour of dual-write, cluster load.
Compliance: 0 PII leaks, SLA DSAR/Legal Hold met.

15) Tools and artifacts

15.1 Compatibility policy (registry)

yaml
schema: payments.deposit_accepted
compatibility: FULL
default_nulls: true
enums:
  currency: {allow_new_symbols: true, require_consumer_unknown_branch: true}
pii: false
owners: ["team-payments"]
reviewers: ["data-governance","security-dpo"]

15.2 Migration passport (template)

yaml
change_id: MIG-2025-041
scope: game.round_finished -> v3
type: MAJOR
plan:
  dual_write: true
  shadow_reads:
    consumers: ["gold-rounds","rg-models"]
  backfill: {from: "2025-01-01", mode: "idempotent-merge"}
validation:
  invariants: ["sum_bets = sum_wins + margin + bonuses"]
  freshness_delta_p95_max: "PT5M"
switch_criteria:
  error_rate_max: 0.1%
  kpi_diff_pp_max: 0.5
deprecate_after: "2025-12-31"
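
The validation and switch criteria from the passport can be evaluated automatically before consumers are switched. The sketch below hard-codes the thresholds from the template as defaults; the function names and the way metrics are passed in are assumptions.

```python
from decimal import Decimal


def invariant_holds(sum_bets: Decimal, sum_wins: Decimal,
                    margin: Decimal, bonuses: Decimal) -> bool:
    """Check the business invariant sum_bets = sum_wins + margin + bonuses."""
    return sum_bets == sum_wins + margin + bonuses


def ready_to_switch(error_rate_pct: float, kpi_diff_pp: float,
                    error_rate_max_pct: float = 0.1,
                    kpi_diff_pp_max: float = 0.5) -> bool:
    """True when both switch criteria are within their limits."""
    return error_rate_pct <= error_rate_max_pct and abs(kpi_diff_pp) <= kpi_diff_pp_max


print(invariant_holds(Decimal("100.00"), Decimal("90.00"),
                      Decimal("8.00"), Decimal("2.00")))  # True
print(ready_to_switch(error_rate_pct=0.05, kpi_diff_pp=-0.3))  # True
print(ready_to_switch(error_rate_pct=0.2, kpi_diff_pp=0.1))    # False
```

Amounts use `Decimal` so the equality check is exact; with floats the invariant would need a tolerance.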

15.3 Linter of names and types (rules)

`snake_case`, UTC timestamps, DECIMAL(18,2) for sums, `country` as ISO 3166-1 alpha-2, `currency` as ISO 4217.
No `free_text` for enum fields; reference data lives externally.

16) Implementation Roadmap

0-30 days (MVP)

1. Enable Schema Registry + compatibility policy for key events (payments, game_rounds, user).
2. Linters/compat tests in CI; owner directory and review SLAs.
3. ADR templates and migration passport; MAJOR checklist.

30-90 days

1. Blue-Green for Gold data marts; dual-write for critical topics.
2. Consumer-contract-tests for basic services; golden-datasets.
3. Automatic diff-reconciliations and alerts when switching; cost reports.

3-6 months

1. Single deprecate/remove process with grace-period; archiving and Legal Hold.
2. Geo/tenant-specific encryption schemes and keys; DP variants for sensitive markets.
3. Data dictionary and live lineage charts.

17) RACI

Data Governance (A/R): standards, registry, migration review, deprecation.
Domain Owners (R): meaning of fields, reference data, business invariants.
Data Platform (R): registry tools, compat tests, dual-run/backfills.
Security/DPO (A/R): PII policies, geo/tenant, DSAR/Legal Hold.
SRE/Observability (C): alerts, evolution SLO, capacity.
Product/Finance (C): validation of KPIs, switching windows.

18) Anti-patterns

"Edit the field on the fly" without versions and dual-run.
Renaming instead of adding a new field → massive breakdowns.
Rigid enum without an `unknown` branch → failures on new values.
A single hard-coded reference table for all jurisdictions.
Backfill without idempotent merge and balance checks.
Logs with PII and without trace_id for search/DSAR.

19) Related Sections

Data Validation, Data Origin and Path, DataOps Practices, Analytics and Metrics API, Auditing and Versioning, Data Security and Encryption, Access Control, MLOps: Model Exploitation.

Summary

Schema evolution is a process, not a one-off migration: a registry, versions, and compatibility; dual-run and blue-green instead of a midnight switchover; compatibility tests and business invariants instead of luck. That way the data stays stable, the models stay predictable, the reports stay correct, and the regulators stay calm.
