Data schemas and their evolution
1) Why an iGaming platform needs this
Reliability: changes to data do not break reports, APIs, or models.
Feature speed: safely add fields (KYC/RG/PSP) without stopping streams.
Regulatory: traceability and reproducibility (audit/lineage, DSAR, Legal Hold).
Cost: minimize data reloads and backfill downtime.
2) Types of schemas and where they live
Events (streams): `payments.deposit_accepted`, `game.round_finished`.
OLTP/DDL: normalized tables (KYC, accounts, limits).
DWH/data marts (Gold): denormalized aggregates for BI/ML.
Feature Store: online/offline feature sets with consistency guarantees.
External partner contracts: PSP, game providers, marketing sources.
Notations: Avro/Protobuf (streams), JSON Schema (integrations), SQL DDL (DWH), Parquet schema (lake).
3) Compatibility (core of evolution)
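Backward-compatible: consumers on the new schema read data written with the old one (old producers → new consumers); a field added with a default/nullable is filled from the default.
Forward-compatible: consumers on the old schema read data written with the new one (new producers → old consumers); the old reader ignores fields it does not know.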
Full-compatible: both (desirable target for events).
Breaking changes: renaming/deleting a field, changing the type/semantics, changing the key or partitioning.
Rule 1: events evolve through addition, not through change.
Rule 2: delete only in a MAJOR version of the schema, after a deprecation period.
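The compatibility modes above can be sketched as a reader/writer check. This is a minimal illustration, not a registry implementation: the field model (a dict of name to type and optional default) and the function names are assumptions, and the mode names follow the Schema Registry convention, where BACKWARD means a consumer on the new schema can read data written with the old one.

```python
from typing import Any

# Simplified field model (an assumption): name -> {"type": ..., optional "default": ...}.
Schema = dict[str, dict[str, Any]]


def can_read(reader: Schema, writer: Schema) -> bool:
    """True if a consumer on `reader` can decode data written with `writer`."""
    for name, spec in reader.items():
        if name in writer:
            if writer[name]["type"] != spec["type"]:
                return False  # a type change is breaking
        elif "default" not in spec:
            return False  # reader needs a field the writer never sent
    return True  # fields only the writer has are ignored by the reader


def compatibility(old: Schema, new: Schema) -> str:
    backward = can_read(reader=new, writer=old)  # new consumers, old data
    forward = can_read(reader=old, writer=new)   # old consumers, new data
    if backward and forward:
        return "FULL"
    if backward:
        return "BACKWARD"
    if forward:
        return "FORWARD"
    return "NONE"


v1 = {"event_id": {"type": "string"}, "amount": {"type": "decimal"}}
v2_additive = {**v1, "risk_score": {"type": ["null", "int"], "default": None}}
v2_required = {**v1, "risk_score": {"type": "int"}}
v2_renamed = {"event_id": {"type": "string"}, "amount_minor": {"type": "decimal"}}

print(compatibility(v1, v2_additive))  # FULL
print(compatibility(v1, v2_required))  # FORWARD
print(compatibility(v1, v2_renamed))   # NONE
```

The additive change with a default stays FULL, which is why Rule 1 prefers addition; the rename breaks both directions.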
4) Semantic versions and policies
`MAJOR.MINOR.PATCH` for each schema/data mart/feature set.
MAJOR - incompatible (new topic/table/feature set, dual-run).
MINOR - compatible (new nullable/default fields, new enum values).
PATCH - edit descriptions/limits/comments.
Field life cycle: `experimental → active → deprecated → removed` (with dates and owner).
5) Schema registry and data contracts
Schema Registry: stores versions, compatibility, evolution and owners.
Data Contract: pins the schema plus quality SLOs and privacy rules (see the "Data Validation" section).
```json
{
  "type":"record","name":"deposit_accepted","namespace":"payments",
  "fields":[
    {"name":"event_id","type":"string"},
    {"name":"occurred_at","type":{"type":"long","logicalType":"timestamp-micros"}},
    {"name":"user_id","type":"string"},
    {"name":"brand","type":"string"},
    {"name":"country","type":"string"},
    {"name":"psp","type":"string"},
    {"name":"method","type":"string"},
    {"name":"amount","type":{"type":"bytes","logicalType":"decimal","precision":18,"scale":2}},
    {"name":"currency","type":{"type":"enum","name":"Currency","symbols":["EUR","USD","TRY","BRL"]}},
    {"name":"risk_score","type":["null","int"],"default":null,"doc":"added in a MINOR release"},
    {"name":"kyc_level","type":["null",{"type":"enum","name":"Kyc","symbols":["L0","L1","L2","L3"]}],"default":null}
  ],
  "compatibility":"FULL","owner":"team-payments"
}
```
6) Migration patterns
6.1 Events (streams)
Additive-only: add fields with default/nullable; old consumers don't break.
Enum extensions: new symbols count as MINOR; consumers are required to have an `else/unknown` branch.
MAJOR migration: a new topic `payments.deposit_accepted.v2`, dual-write, shadow-reads, then switching consumers.
6.2 DWH/data marts
Blue-Green tables: `gold.revenue_v2` next to `v1`; materialize, verify, switch BI.
Backfill: replay from snapshots + idempotent merge (by keys/versions).
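The `else/unknown` branch required for enum extensions could look like the sketch below. The `UNKNOWN` member, the `handle_deposit` function and the "quarantine"/"process" outcomes are assumptions for illustration; the mechanism is Python's standard `Enum._missing_` hook.

```python
from enum import Enum


class Currency(Enum):
    EUR = "EUR"
    USD = "USD"
    TRY = "TRY"
    BRL = "BRL"
    UNKNOWN = "UNKNOWN"  # assumed sentinel member, not part of the wire schema

    @classmethod
    def _missing_(cls, value):
        # Called by Enum when `value` matches no member: fall back instead of raising.
        return cls.UNKNOWN


def handle_deposit(event: dict) -> str:
    """Route an event; symbols added by a newer producer do not crash the consumer."""
    currency = Currency(event["currency"])
    if currency is Currency.UNKNOWN:
        return "quarantine"  # explicit unknown branch: park the event for later handling
    return "process"


print(handle_deposit({"currency": "EUR"}))  # process
print(handle_deposit({"currency": "GBP"}))  # quarantine
```

With this in place a producer can ship a new symbol as a MINOR release, and consumers that have not been updated degrade to the unknown branch instead of failing.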
SCD: type 2 for slowly changing attributes (limits, KYC, VIP statuses).
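An idempotent merge for backfills might be sketched as follows. The row layout (`round_id` as the key, an integer `version`) is an assumption; the point is that replaying the same batch, or an older one, leaves the table unchanged.

```python
def merge(target: dict, batch: list[dict]) -> dict:
    """Upsert rows by key, keeping the highest version; safe to replay."""
    for row in batch:
        key = row["round_id"]  # hypothetical business key
        current = target.get(key)
        if current is None or row["version"] > current["version"]:
            target[key] = row
    return target


table: dict = {}
batch = [
    {"round_id": "r1", "version": 1, "bet": "10.00"},
    {"round_id": "r2", "version": 1, "bet": "5.00"},
    {"round_id": "r1", "version": 2, "bet": "12.00"},  # later correction of r1
]

merge(table, batch)
snapshot = dict(table)
merge(table, batch)  # replaying the same batch changes nothing
print(table == snapshot)  # True
```

In a warehouse the same rule would typically be expressed as a MERGE statement keyed on the business key with a version comparison.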
6.3 Feature Store
Dual-serve: the old feature set is served in parallel with the new one; models reach the right version through a router.
Point-in-time consistency: evolution must not break point-in-time (PIT) joins (timestamp/granularity stay unchanged in a MINOR release).
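To make the point-in-time requirement concrete, here is a minimal sketch of an as-of lookup: for a label at time `ts`, only the latest feature value with a timestamp not later than `ts` may be used. The history layout (a list of `(timestamp, value)` pairs sorted by timestamp) and the function name are assumptions.

```python
from bisect import bisect_right


def feature_as_of(history: list[tuple[int, float]], ts: int):
    """Return the latest value whose timestamp is <= ts, or None if there is none."""
    timestamps = [t for t, _ in history]
    i = bisect_right(timestamps, ts)  # index of the first entry strictly after ts
    return history[i - 1][1] if i else None


# Hypothetical deposit-count feature, timestamps in epoch seconds.
history = [(100, 1.0), (200, 2.0), (300, 3.0)]

print(feature_as_of(history, 250))  # 2.0
print(feature_as_of(history, 200))  # 2.0
print(feature_as_of(history, 50))   # None
```

If a MINOR release changed the timestamp field or its granularity, lookups like this would silently pick different rows, which is why such changes belong in a MAJOR release.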
7) Taxonomy of changes (checklist)
Safe (MINOR):
- adding a nullable/default field;
- enum extension (with an `unknown` branch at the consumer);
- adding a non-key index/comment/description.
Conditional:
- scale/unit change (for example, amount in cents → base currency): MAJOR only;
- moving reference data/lookups: through a view layer.
Breaking (MAJOR):
- renaming/deleting a field;
- changing the type/format/key/partitioning;
- change of semantics (for example, `bonus_amount` from "accrued" → "written off").
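The structural part of this checklist can be automated. The sketch below classifies a diff between two versions into a semver level; the field model is an assumption, and treating a new field without a default as MAJOR is my reading of the checklist rather than something the text states. Changes of semantics or units are invisible to such a diff and still need human review.

```python
def classify_change(old: dict, new: dict) -> str:
    """Return "MAJOR", "MINOR" or "PATCH" for a change from `old` to `new`."""
    if old.keys() - new.keys():
        return "MAJOR"  # a field was deleted or renamed
    level = "PATCH"  # nothing structural changed so far
    for name, spec in new.items():
        before = old.get(name)
        if before is None:
            if "default" not in spec:
                return "MAJOR"  # assumption: new required field is breaking
            level = "MINOR"  # additive nullable/default field
        elif before["type"] != spec["type"]:
            return "MAJOR"  # type change
        elif spec["type"] == "enum":
            old_syms, new_syms = set(before["symbols"]), set(spec["symbols"])
            if not old_syms <= new_syms:
                return "MAJOR"  # a symbol was removed
            if old_syms < new_syms:
                level = "MINOR"  # enum extension
    return level


v1 = {
    "method": {"type": "enum", "symbols": ["CARD", "BANK"]},
    "amount": {"type": "decimal"},
}
with_field = {**v1, "risk_score": {"type": ["null", "int"], "default": None}}
with_symbol = {**v1, "method": {"type": "enum", "symbols": ["CARD", "BANK", "MEFETE"]}}
renamed = {"method": v1["method"], "amount_minor": {"type": "decimal"}}

print(classify_change(v1, with_field))   # MINOR
print(classify_change(v1, with_symbol))  # MINOR
print(classify_change(v1, renamed))      # MAJOR
print(classify_change(v1, v1))           # PATCH
```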
8) Schema linters and compatibility tests
Schema-lint: name style (`snake_case`), required labels (`owner`, `doc`, `pii`), date/currency format.
Compat-tests: checking the new version against the registry (backward/forward/full).
Consumer-contract-tests: each service provides a "sample payload" and its expectations; run in CI whenever the schema changes.
Golden-datasets: a set of real and adversarial examples (new enum values, empty/late fields, boundary amounts).
9) Reference data, enums and localization
Reference data (countries/currencies/PSP/providers): versioned separately, with an update SLA; do not hard-code it into event code.
Locale/time zones: store UTC in events + explicit locale for presentation.
Jurisdiction rules: age flags, promo restrictions, kept as reference tables with effective dates.
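A reference table with effective dates can be queried "as of" a date, as in this minimal sketch. The table layout, the placeholder country code `XX` and the age values are purely illustrative assumptions, not real regulatory rules.

```python
from datetime import date

# Hypothetical reference table: (country, valid_from, min_age).
# "XX" is a placeholder code and the ages are illustrative only.
MIN_AGE_RULES = [
    ("XX", date(2020, 1, 1), 18),
    ("XX", date(2024, 7, 1), 21),
]


def min_age(country: str, on: date):
    """Return the rule in force for `country` on the given date, or None."""
    in_force = [
        (valid_from, age)
        for code, valid_from, age in MIN_AGE_RULES
        if code == country and valid_from <= on
    ]
    # The rule with the latest valid_from that has already started wins.
    return max(in_force)[1] if in_force else None


print(min_age("XX", date(2023, 5, 1)))  # 18
print(min_age("XX", date(2024, 7, 1)))  # 21
print(min_age("XX", date(2019, 1, 1)))  # None
```

Keeping the dates in data rather than in code means a historical report can be recomputed with the rule that applied at the time.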
10) Multi-Brand/Multi-Jurisdictional and PII
Tenant isolation: `brand`, `country`, `license` are mandatory enum fields; routing is based on them.
PII policy at the schema level: mark fields with `pii = true`, apply masks/tokenization; events carry only tokens.
DSAR: `source_id`/`trace_id` must be present for deletion/retrieval; Legal Hold applies on MAJOR migrations.
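One way to keep raw PII out of events is to replace marked fields with keyed tokens before publishing. The sketch below uses an HMAC as a deterministic token, which is only one possible technique (a token vault is another); the field list, key handling and function names are assumptions.

```python
import hashlib
import hmac

# Hypothetical set of fields marked pii=true in the schema.
PII_FIELDS = {"email", "phone"}


def tokenize(value: str, key: bytes) -> str:
    """Deterministic keyed token: the same input and key always give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def to_event(record: dict, key: bytes) -> dict:
    """Copy a record, replacing PII fields with tokens and leaving the rest as is."""
    return {
        name: tokenize(value, key) if name in PII_FIELDS else value
        for name, value in record.items()
    }


key = b"demo-key-not-for-production"  # in practice this would come from a secret store
record = {"user_id": "u-1", "email": "player@example.com", "trace_id": "t-42"}
event = to_event(record, key)

print(event["user_id"], event["trace_id"])  # u-1 t-42
print(len(event["email"]))                  # 64
```

Because the token is deterministic per key, a DSAR lookup can recompute it from the subject's data and search events by token together with `trace_id`.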
11) DDL and Lake versioning
DDL migrations: declarative migrations (Liquibase/Flyway/dbt), storage in VCS, review by the domain owner.
Formats in the lake: Avro/Parquet, with field evolution recorded; on MAJOR, a new table/path `.../v2/`.
Partitioning: changing partitions (for example, `date` → `date,brand`) only through MAJOR with dual-write.
12) iGaming examples
12.1 A PSP adds payment methods
`method = "MEFETE"` is added to the enum.
MINOR release `deposit_accepted` v1.8.0; consumers that do not know MEFETE route it to the `unknown_method` branch.
12.2 A game provider adds jackpots
`jackpot_id` (nullable) is added to `game.round_finished`.
The data mart `gold.game_rounds_v3` gets a MINOR release; old reports keep working, new ones count jackpots.
12.3 RG attributes
Transition from the Boolean `self_excluded` to the status `rg_state ∈ {none, limit, cooldown, self_excluded}` is MAJOR: new topic + dual-write + migration of data marts and models.
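During the dual-write phase of such a MAJOR migration, the producer emits both shapes. The sketch below shows one possible mapping; the topic names and the `publish` callback are hypothetical, and mapping `False` to `none` is an assumption, since the old Boolean cannot express `limit` or `cooldown`.

```python
RG_STATES = {"none", "limit", "cooldown", "self_excluded"}


def to_v2(event_v1: dict) -> dict:
    """Build the v2 payload: drop the Boolean, add the rg_state status."""
    event_v2 = {k: v for k, v in event_v1.items() if k != "self_excluded"}
    # Assumption: v1 only knows two states, so False can only map to "none".
    event_v2["rg_state"] = "self_excluded" if event_v1["self_excluded"] else "none"
    return event_v2


def dual_write(event_v1: dict, publish) -> None:
    """Publish the old and the new shape side by side until consumers switch."""
    publish("rg.status", event_v1)            # hypothetical v1 topic
    publish("rg.status.v2", to_v2(event_v1))  # hypothetical v2 topic


sent = []
dual_write({"user_id": "u-1", "self_excluded": True},
           lambda topic, payload: sent.append((topic, payload)))
print(sent[1])  # ('rg.status.v2', {'user_id': 'u-1', 'rg_state': 'self_excluded'})
```

Shadow-reading consumers can then compare the two topics before the switch.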
13) Evolution process (from idea to switch)
1. Proposal (ADR): why change, type of compatibility, risk assessment and affected consumers.
2. Design and contract: schema in the registry, semver, compatibility policy.
3. Tests: linters, compat, consumer-contracts, replay on golden-sets.
4. Deployment: dual-write/blue-green/shadow-reads; alerts.
5. Reconciliation: business balances/invariants (see Data Validation).
6. Switch: switch consumers/BI/features.
7. Deprecate: freeze old schema, grace-period, delete and archive.
14) Metrics and SLOs of evolution
Success-rate of migrations, dual-run time, share of new format events, backfill volume, lag/freshness.
Compatibility incidents (P1/P2), data quality in the window after switching.
Cost: $/TB reloaded, $/hour of dual-write, cluster load.
Compliance: 0 PII leaks, SLA DSAR/Legal Hold met.
15) Tools and artifacts
15.1 Compatibility policy (registry)
```yaml
schema: payments.deposit_accepted
compatibility: FULL
default_nulls: true
enums:
  currency: {allow_new_symbols: true, require_consumer_unknown_branch: true}
pii: false
owners: ["team-payments"]
reviewers: ["data-governance","security-dpo"]
```
15.2 Migration passport (template)
```yaml
change_id: MIG-2025-041
scope: game.round_finished -> v3
type: MAJOR
plan:
  dual_write: true
  shadow_reads:
    consumers: ["gold-rounds","rg-models"]
  backfill: {from: "2025-01-01", mode: "idempotent-merge"}
validation:
  invariants: ["sum_bets = sum_wins + margin + bonuses"]
  freshness_delta_p95_max: "PT5M"
switch_criteria:
  error_rate_max: 0.1%
  kpi_diff_pp_max: 0.5
deprecate_after: "2025-12-31"
```
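The validation and switch criteria of the passport could be evaluated by a small gate like the one below. The thresholds are taken from the template above; the dict layout, key names and units (percent, percentage points, seconds) are assumptions.

```python
from decimal import Decimal

# Thresholds from the passport template; key names and units are assumptions.
CRITERIA = {
    "error_rate_max_pct": Decimal("0.1"),
    "kpi_diff_pp_max": Decimal("0.5"),
    "freshness_delta_p95_max_s": 300,  # PT5M expressed in seconds
}


def invariant_holds(totals: dict) -> bool:
    """Business invariant from the passport: sum_bets = sum_wins + margin + bonuses."""
    return totals["sum_bets"] == totals["sum_wins"] + totals["margin"] + totals["bonuses"]


def ready_to_switch(metrics: dict, totals: dict, criteria: dict = CRITERIA) -> bool:
    """True only if the invariant holds and every metric is within its threshold."""
    return (
        invariant_holds(totals)
        and metrics["error_rate_pct"] <= criteria["error_rate_max_pct"]
        and abs(metrics["kpi_diff_pp"]) <= criteria["kpi_diff_pp_max"]
        and metrics["freshness_delta_p95_s"] <= criteria["freshness_delta_p95_max_s"]
    )


totals = {
    "sum_bets": Decimal("100.00"),
    "sum_wins": Decimal("92.50"),
    "margin": Decimal("5.00"),
    "bonuses": Decimal("2.50"),
}
metrics = {
    "error_rate_pct": Decimal("0.05"),
    "kpi_diff_pp": Decimal("-0.2"),
    "freshness_delta_p95_s": 120,
}

print(ready_to_switch(metrics, totals))  # True
```

`Decimal` is used for the money totals so the equality check is not disturbed by binary floating-point rounding.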
15.3 Linter of names and types (rules)
`snake_case`, UTC timestamps, DECIMAL(18,2) for amounts, `country` as ISO 3166-1 alpha-2, `currency` as ISO 4217.
No `free_text` for enum fields; reference data is external.
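A few of these rules can be sketched as a lint function over an Avro-style schema dict. This covers only the name style, the required labels and the decimal rule for `amount`; the function name, message wording and the choice to check only a field literally called `amount` are assumptions.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
REQUIRED_LABELS = ("owner", "doc", "pii")


def lint(schema: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the schema passes."""
    problems = [f"missing label: {label}" for label in REQUIRED_LABELS if label not in schema]
    for field in schema.get("fields", []):
        name = field["name"]
        if not SNAKE_CASE.match(name):
            problems.append(f"{name}: not snake_case")
        if name == "amount":
            t = field.get("type")
            is_decimal_18_2 = (
                isinstance(t, dict)
                and t.get("logicalType") == "decimal"
                and (t.get("precision"), t.get("scale")) == (18, 2)
            )
            if not is_decimal_18_2:
                problems.append("amount: expected decimal(18,2)")
    return problems


good = {
    "owner": "team-payments", "doc": "Accepted deposit", "pii": False,
    "fields": [
        {"name": "event_id", "type": "string"},
        {"name": "amount",
         "type": {"type": "bytes", "logicalType": "decimal", "precision": 18, "scale": 2}},
    ],
}
bad = {
    "owner": "team-payments", "doc": "Accepted deposit",
    "fields": [{"name": "userId", "type": "string"}, {"name": "amount", "type": "double"}],
}

print(lint(good))  # []
print(lint(bad))
```

In CI the job would fail whenever the returned list is non-empty.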
16) Implementation Roadmap
0-30 days (MVP)
1. Enable Schema Registry + compatibility policy for key events (payments, game_rounds, user).
2. Linters/compat tests in CI; owner directory and review SLAs.
3. ADR templates and migration passport; MAJOR checklist.
30-90 days
1. Blue-Green for Gold data marts; dual-write for critical topics.
2. Consumer-contract-tests for basic services; golden-datasets.
3. Automatic diff-reconciliations and alerts when switching; cost reports.
3-6 months
1. Single deprecate/remove process with grace-period; archiving and Legal Hold.
2. Geo/tenant-specific encryption schemes and keys; DP variants for sensitive markets.
3. Data dictionary and live lineage charts.
17) RACI
Data Governance (A/R): standards, registry, migration review, deprecation.
Domain Owners (R): meaning of fields, reference books, business invariants.
Data Platform (R): registry tools, compat tests, dual-run/backfills.
Security/DPO (A/R): PII policies, geo/tenant, DSAR/Legal Hold.
SRE/Observability (C): alerts, evolution SLO, capacity.
Product/Finance (C): validation of KPIs, switching windows.
18) Anti-patterns
"Edit the field on the fly" without versions and dual-run.
Renaming instead of adding a new field → mass breakage.
Rigid enum without an `unknown` branch → failures on new values.
A single hard-coded reference table for all jurisdictions.
Backfill without idempotent-merge and balance checks.
Logs with PII and without trace_id for search/DSAR.
19) Related Sections
Data Validation, Data Origin and Path, DataOps Practices, Analytics and Metrics API, Auditing and Versioning, Data Security and Encryption, Access Control, MLOps: Model Exploitation.
Summary
Schema evolution is a process, not a one-off migration: a registry, versions and compatibility; dual-run and blue-green instead of "switches at midnight"; compatibility tests and business invariants instead of luck. That keeps the data stable, the models predictable, the reports correct, and the regulators calm.