GH GambleHub

Data lifecycle

1) Purpose and principles

The goal is to enable predictable, compliant, and cost-effective movement of data from inception to final disposition, supporting analytical, operational, and regulatory scenarios.

Basic principles:
  • Data as a Product: each dataset has an owner, a contract, SLOs, and documentation.
  • Schema-first: schemas are mandatory; changes go through versioning.
  • Privacy-by-Design: PII minimization, pseudonymization, regional storage.
  • Observability-by-Default: metrics, access logging, lineage.
  • Cost-aware: storage levels, TTL, sampling, compression.

2) Life cycle phases

2.1 Create/Collect

Sources: products (web/mobile), backends, payments, KYC/AML providers, games/studios, marketing, operating logs.
Identifiers: `event_id`, `user.pseudo_id`, `session_id`, `trace_id`.
Contracts: JSON/Avro schemas, AsyncAPI/OpenAPI.
Input quality: schema validation, mandatory fields, size limits, duplicate protection.
Privacy: tokenization of sensitive fields, geo-routed ingestion (EEA/UK/BR).
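The mandatory-field and size-limit checks above can be sketched as a small gate in front of the raw layer. This is a minimal illustration, not a prescribed contract: the required-field tuple, the 64 KB limit, and the names `validate_event` and `_lookup` are hypothetical. Full schema validation would normally use a JSON Schema or Avro library against the registered contract.

```python
import json

# Hypothetical contract: required fields (dotted paths for nested keys) and a size limit.
REQUIRED_FIELDS = ("event_id", "user.pseudo_id", "session_id", "trace_id")
MAX_PAYLOAD_BYTES = 64 * 1024  # assumed limit, for illustration only


def _lookup(event: dict, path: str):
    """Resolve a dotted path such as "user.pseudo_id"; return None if any part is absent."""
    current = event
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def validate_event(event: dict) -> list[str]:
    """Return validation errors; an empty list means the event may enter the raw layer."""
    errors = []
    missing = [f for f in REQUIRED_FIELDS if _lookup(event, f) is None]
    if missing:
        errors.append(f"missing fields: {missing}")
    size = len(json.dumps(event).encode("utf-8"))
    if size > MAX_PAYLOAD_BYTES:
        errors.append(f"payload too large: {size} bytes")
    return errors
```

Events that return a non-empty error list would be routed to the DLQ rather than dropped, so they stay available for forensics.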

2.2 Ingest & Raw

Transport: HTTP/gRPC → Edge → bus (Kafka/Redpanda).
Raw layer (Bronze): append-only, immutable payloads (for forensics), partitioning by time/market/tenant.
Policies: dedup by `(event_id, source)`, a DLQ for "broken" events, Legal Hold tags.
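A minimal sketch of the dedup-and-DLQ policy, assuming events are dicts with `event_id` and `source` keys. The in-memory `seen` set is a simplification; a streaming job would keep this key state in a state store with a TTL. The function name `route_events` is hypothetical.

```python
from typing import Iterable


def route_events(events: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Split events into accepted (first occurrence of each (event_id, source)) and DLQ."""
    seen: set[tuple[str, str]] = set()
    accepted, dlq = [], []
    for ev in events:
        event_id, source = ev.get("event_id"), ev.get("source")
        if not event_id or not source:
            dlq.append(ev)  # "broken" event: it cannot be keyed for dedup
            continue
        key = (event_id, source)
        if key in seen:
            continue  # duplicate of an already accepted event: drop it
        seen.add(key)
        accepted.append(ev)
    return accepted, dlq
```

Keying on the pair rather than on `event_id` alone keeps two sources that happen to reuse the same identifier from silently overwriting each other.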

2.3 Processing and cleaning (Refine)

Normalization (Silver): typing, deduplication, reference data, FX/time zones, enrichment.
Quality (DQ): completeness, uniqueness, ranges, referential integrity.
Reprocessing: idempotent pipelines, time travel, controlled backfills.
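Idempotency is what makes reprocessing safe: applying the same batch twice must leave the table unchanged. A minimal sketch, assuming rows carry a key and an `updated_at` version column (both assumptions); a real Silver table would express this as a MERGE on an ACID table format.

```python
def merge_batch(target: dict[str, dict], batch: list[dict], key: str = "event_id") -> dict[str, dict]:
    """Idempotent upsert: re-applying the same batch leaves the result unchanged."""
    result = dict(target)
    for row in batch:
        current = result.get(row[key])
        # Keep the newest version by updated_at; on a tie the existing row wins.
        if current is None or row["updated_at"] > current["updated_at"]:
            result[row[key]] = row
    return result
```

Because the outcome depends only on the set of row versions, a backfill that replays an already loaded window produces no double counting.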

2.4 Serve/Use

Gold marts: BI/reporting (GGR, RG, AML), product and risk models, real-time marts.
Access: SQL/Trino, semantic metrics layer, API/GraphQL, Feature Store.
Freshness SLA: for example, Gold-daily marts are ready by 06:00 local time.
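A sketch of the freshness check for a Gold-daily mart. It assumes the mart for business day D is due at 06:00 on D+1 in the market's time zone; that convention and the name `gold_daily_on_time` are illustrative. In practice `market_tz` would be a `zoneinfo.ZoneInfo` for the market, so DST is handled; the example uses a fixed offset.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo


def gold_daily_on_time(ready_at: datetime, business_date: date, market_tz: tzinfo,
                       deadline: time = time(6, 0)) -> bool:
    """True if the mart for business_date was ready by `deadline` local time on the next day.

    Assumption: the mart for day D is due on D+1. `ready_at` must be timezone-aware.
    """
    due = datetime.combine(business_date + timedelta(days=1), deadline, tzinfo=market_tz)
    return ready_at <= due


if __name__ == "__main__":
    utc_plus_2 = timezone(timedelta(hours=2))  # stand-in for a real market time zone
    ready = datetime(2024, 3, 11, 3, 30, tzinfo=timezone.utc)  # 05:30 local
    print(gold_daily_on_time(ready, date(2024, 3, 10), utc_plus_2))
```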

2.5 Share and Publish

Internal consumers: Analytics, Product, Risk, Compliance, Marketing, Finance.
External exports: regulators, partners/providers; immutable packages (PDF/CSV/JSON + hash).
Controlled channels: signed artifacts, audited downloads/exports.
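The "package + hash" idea can be illustrated with a SHA-256 manifest that travels with the export, letting the recipient verify that nothing changed in transit. A minimal sketch over in-memory artifacts; the manifest layout and the names `build_manifest` and `verify` are hypothetical, and signing the manifest itself is out of scope here.

```python
import hashlib
import json


def build_manifest(artifacts: dict[str, bytes]) -> str:
    """Return a JSON manifest mapping each artifact name to its SHA-256 digest."""
    files = {name: hashlib.sha256(data).hexdigest() for name, data in sorted(artifacts.items())}
    return json.dumps({"algorithm": "sha256", "files": files}, indent=2, sort_keys=True)


def verify(artifacts: dict[str, bytes], manifest_json: str) -> bool:
    """Recompute digests and compare them with the manifest."""
    expected = json.loads(manifest_json)["files"]
    actual = {name: hashlib.sha256(data).hexdigest() for name, data in artifacts.items()}
    return actual == expected
```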

2.6 Archive/Retain

Retention policies: by data type and jurisdiction (e.g. regulatory data: 5-7 years).
Storage layers: hot/warm/cold, WORM/Object Lock for immutability.
Archive indexing: catalogs, version/market labels, fast metadata search.

2.7 Delete and Dispose

Routine deletion: by TTL/retention; secure erasure, index updates.
Legal requests: DSAR/RTBF (right to be forgotten), exceptions for statutory retention obligations, Legal Hold (deletion freeze).
Verification: deletion reports, audit log, cross-replica control.
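The precedence between these rules can be made explicit in code: Legal Hold blocks every deletion, an RTBF request deletes early unless a statutory obligation requires keeping the data, and otherwise TTL decides. A minimal sketch; the `RecordMeta` fields and the `should_delete` name are hypothetical, and the real precedence must be confirmed by Legal/DPO.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RecordMeta:
    created: date
    retention_days: int
    legal_hold: bool = False
    rtbf_requested: bool = False
    statutory_retention: bool = False  # legal obligation to keep (e.g. AML records)


def should_delete(meta: RecordMeta, today: date) -> bool:
    if meta.legal_hold:
        return False  # Legal Hold freezes every deletion path
    if meta.rtbf_requested and not meta.statutory_retention:
        return True  # RTBF applies unless a storage obligation overrides it
    return today >= meta.created + timedelta(days=meta.retention_days)  # routine TTL
```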

3) Classification and catalogue

Sensitivity categories: public/internal/confidential/restricted.
Domains: Payments, Gameplay, Compliance/AML, RG, Marketing, Ops, Finance.
Data catalog: description, owner, freshness SLA, schemas, lineage, access levels.
Tags: `jurisdiction`, `tenant`, `pii_class`, `retention_class`, `legal_hold`.
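A catalog entry can be checked mechanically so that no dataset is published without an owner, a known sensitivity category, and the mandatory tags. A minimal sketch; the `CatalogEntry` structure is hypothetical, while the category and tag names come from the lists above.

```python
from dataclasses import dataclass, field

SENSITIVITY = {"public", "internal", "confidential", "restricted"}
REQUIRED_TAGS = {"jurisdiction", "tenant", "pii_class", "retention_class", "legal_hold"}


@dataclass
class CatalogEntry:
    name: str
    domain: str
    owner: str
    sensitivity: str
    freshness_sla: str
    tags: dict[str, str] = field(default_factory=dict)

    def problems(self) -> list[str]:
        """List what blocks this entry from being registered."""
        issues = []
        if self.sensitivity not in SENSITIVITY:
            issues.append(f"unknown sensitivity: {self.sensitivity}")
        if not self.owner:
            issues.append("missing owner")
        missing = REQUIRED_TAGS - self.tags.keys()
        if missing:
            issues.append(f"missing tags: {sorted(missing)}")
        return issues
```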

4) Lakehouse model and schemas

Bronze/Silver/Gold: clear rules for transformation and responsibility.
Formats: Parquet + table format with ACID (Delta/Iceberg/Hudi).
Schema evolution: semantic versioning, backward/forward compatibility, dual-write migrations for breaking changes.
Registry: Schema Registry, CI-validation of contracts, consumer-driven tests.
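A CI gate for backward compatibility can be sketched as: every field added in the new version needs a default, and shared fields keep their type. This is a deliberately simplified rule set over a hypothetical dict representation of a schema; it is stricter than Avro's actual resolution rules (which, for example, allow `long` to be promoted to `double`), and a schema registry's built-in checks would be the authoritative gate.

```python
def backward_compatible(old: dict[str, dict], new: dict[str, dict]) -> list[str]:
    """Simplified check: can a reader on `new` read records written with `old`?"""
    issues = []
    for name, spec in new.items():
        if name not in old:
            if "default" not in spec:
                issues.append(f"added field '{name}' has no default")
        elif old[name]["type"] != spec["type"]:
            # Treat any type change as breaking (stricter than Avro promotion rules).
            issues.append(f"field '{name}' changed type {old[name]['type']} -> {spec['type']}")
    return issues
```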

5) Data quality (DQ)

Quality metrics:
  • Completeness: the percentage of expected events/rows actually received.
  • Validity: the proportion of records that passed schema validation.
  • Uniqueness: duplicate control.
  • Consistency: agreement with reference data and links.
  • Freshness: arrival/materialization delay.
Practices:
  • DQ rules as code (YAML/SQL tests), dashboards, SLO alerts.
  • Auto-fallback on degradation (serve the last known-good snapshot).
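The metrics above can be computed as simple ratios and compared against SLO targets. A minimal sketch under stated assumptions: "validity" is approximated as "all required fields are non-null" (a stand-in for full schema validation), completeness counts distinct keys against an expected count, and the thresholds are the sample Completeness/Validity targets from section 12. All function names are hypothetical.

```python
def dq_metrics(rows: list[dict], expected_rows: int, key: str,
               required: set[str], reference: dict[str, set]) -> dict[str, float]:
    n = len(rows)
    if n == 0:
        return dict.fromkeys(("completeness", "validity", "uniqueness", "consistency"), 0.0)
    distinct_keys = len({r.get(key) for r in rows})
    valid = sum(all(r.get(f) is not None for f in required) for r in rows)
    consistent = sum(all(r.get(col) in allowed for col, allowed in reference.items()) for r in rows)
    return {
        "completeness": min(1.0, distinct_keys / expected_rows),  # distinct events vs expected
        "validity": valid / n,
        "uniqueness": distinct_keys / n,  # 1.0 means no duplicates
        "consistency": consistent / n,  # values found in reference data
    }


SLO_TARGETS = {"completeness": 0.995, "validity": 0.999}  # sample targets


def slo_breaches(metrics: dict[str, float]) -> list[str]:
    return [name for name, target in SLO_TARGETS.items() if metrics[name] < target]
```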

6) Privacy and compliance

PII minimization: store pseudo-IDs; keep ID mappings in an isolated perimeter.
Masking and RLS/CLS: row- and column-level security; dynamic policies.
Regionalization: data residency by market; separate catalogs/encryption keys.
DSAR/RTBF: controlled exports, selective erasure/redaction, audited fulfillment.
Legal Hold: freeze flags, immutable archives, access logging.

7) Access and security

Authentication/authorization: SSO, RBAC/ABAC, attributes of jurisdictions and roles.
Encryption: TLS in-transit; at-rest via KMS/CMK; key rotation.
Access logs: who/what/when/where; alerts for mass exports/scans.
Separation of duties: different roles for prod/analytics/admins/reviewers.
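Combining RBAC with jurisdiction attributes (ABAC) can be sketched as a two-step check: the role gate for the sensitivity level, then the jurisdiction match. The role names and the sensitivity-to-role mapping are hypothetical illustrations, not a recommended policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    roles: frozenset[str]
    jurisdictions: frozenset[str]


@dataclass(frozen=True)
class Resource:
    sensitivity: str
    jurisdiction: str


# Hypothetical policy: role required for each sensitivity level (None = no role needed).
ROLE_FOR_SENSITIVITY = {
    "public": None,
    "internal": "employee",
    "confidential": "analyst",
    "restricted": "compliance",
}


def can_read(p: Principal, r: Resource) -> bool:
    needed = ROLE_FOR_SENSITIVITY[r.sensitivity]
    if needed is not None and needed not in p.roles:
        return False  # RBAC: role does not cover this sensitivity level
    # ABAC: non-public data is visible only within the principal's jurisdictions.
    return r.sensitivity == "public" or r.jurisdiction in p.jurisdictions
```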

8) Lineage and observability

Technical lineage: source → transformations → marts → reports.
Operational lineage: links with releases, feature flags, models, AML/RG rules.
Platform metrics: throughput, lag, failure-rate, cost/query, cost/GB.
Tracing: propagating `trace_id` from applications to marts/alerts.
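Technical lineage becomes actionable for impact analysis: before changing a table, list everything downstream of it. A minimal sketch using breadth-first traversal over a lineage graph stored as an adjacency dict; the node names are hypothetical.

```python
from collections import deque


def downstream(lineage: dict[str, list[str]], start: str) -> list[str]:
    """Return every node reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

For example, with `{"bronze.bets": ["silver.bets"], "silver.bets": ["gold.ggr_daily", "gold.rg_signals"], "gold.ggr_daily": ["report.regulator_ggr"]}`, a change to `bronze.bets` reaches the regulatory GGR report.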

9) Time models and retroprocesses

Event time vs processing time: event time takes priority; late data is handled with watermarks and allowed lateness.
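A simplified single-process sketch of event-time windowing with a watermark and allowed lateness. Assumptions: tumbling windows in integer seconds, and a watermark defined as the maximum event time seen minus a fixed out-of-orderness bound. It only decides whether an event is still accepted into its window; emitting and closing windows is left out. The class name is hypothetical, and stream processors such as Flink provide this natively.

```python
from collections import defaultdict


class EventTimeCounter:
    """Counts events per tumbling event-time window, dropping events that arrive too late."""

    def __init__(self, window: int, out_of_orderness: int, allowed_lateness: int) -> None:
        self.window = window
        self.out_of_orderness = out_of_orderness
        self.allowed_lateness = allowed_lateness
        self.max_seen: int | None = None
        self.counts: dict[int, int] = defaultdict(int)  # window start -> event count
        self.dropped = 0

    def watermark(self) -> float:
        if self.max_seen is None:
            return float("-inf")
        return self.max_seen - self.out_of_orderness

    def add(self, event_time: int) -> None:
        start = event_time - event_time % self.window
        # Late beyond allowed lateness: the window's grace period has already passed.
        if start + self.window + self.allowed_lateness <= self.watermark():
            self.dropped += 1
            return
        self.counts[start] += 1
        self.max_seen = event_time if self.max_seen is None else max(self.max_seen, event_time)
```

With a 60 s window, 10 s out-of-orderness, and 30 s allowed lateness, an event at t=150 arriving after t=200 is still counted, while one at t=70 is dropped.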

Backfill and reprocessing: idempotent pipelines, time travel, safeguards against double counting.

State retention: TTL, snapshots, disaster recovery.

10) Economics and cost control

Partitioning (date/market/tenant), clustering/Z-ordering.
Sampling for high-frequency analytics (not for transactions/compliance).
Multi-layer storage (hot/warm/cold), automatic TTL.
Budgets/chargeback by team; limits on heavy queries and backfills.

11) Processes and RACI

R (Responsible): Data Platform (ingest/storage/orchestration), Data Engineering (transformation), Domain owners (Contracts/DQ/SLO).
A (Accountable): Head of Data/Chief Data Officer.
C (Consulted): Compliance/Legal/DPO, Architecture, SRE, Security.
I (Informed): BI/Product/Marketing/Finance/Operations.

12) SLO/SLI (sample targets)

Indicator | Target
Freshness, Silver (p95) | ≤ 15 minutes
Gold-daily marts | ready by 06:00 local time
Completeness over period T | ≥ 99.5%
Validity (schemas) | ≥ 99.9%
Serving availability | ≥ 99.9%
DSAR response time | ≤ 30 days (stricter where local law requires)
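The Silver freshness target is a percentile, so it should be evaluated over a sample of lag measurements rather than a single reading. A minimal sketch using the nearest-rank percentile definition (one of several common definitions; monitoring systems may interpolate instead). Function names are hypothetical.

```python
def percentile_nearest_rank(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct * n / 100) using integer arithmetic
    return ordered[max(rank, 1) - 1]


def silver_freshness_ok(lags_minutes: list[float], target_minutes: float = 15.0) -> bool:
    """Check the sample target: p95 Silver freshness lag ≤ 15 minutes."""
    return percentile_nearest_rank(lags_minutes, 95) <= target_minutes
```

A single slow load among 20 measurements does not breach the target, but one in 10 does, because with 10 samples the p95 rank falls on the maximum.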

13) Dashboards

Freshness heat map by domain/market.
Completeness/Validity by stream.
Cost of storage and queries (by layer and team).
Lineage map for critical reports (regulatory, GGR, RG/AML).
DSAR/RTBF queues, Legal Hold statuses.

14) Retention policy templates (example)

Data class | Hot | Warm | Archive (WORM) | Total TTL
Payment transactions | 7 d | 60 d | 7 years | 7 years
Game events (analytics) | 3 d | 30 d | 1-2 years | 1-2 years
Compliance/AML artifacts | 14 d | 90 d | 5-7 years | 5-7 years
Operating logs | 3 d | 30 d | 1 year | 1 year

Actual retention periods are set by Legal/DPO in line with local law.
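The template can be turned into an automatic tiering rule. This sketch makes two assumptions that Legal/DPO would need to confirm: the Hot and Warm columns are read as age limits (hot for the first N days, warm until day M), and ranges such as "1-2 years" use their upper bound. Years are approximated as 365 days. Legal Hold must still override the "delete" outcome.

```python
from datetime import timedelta

YEAR = timedelta(days=365)  # approximation; calendar-exact periods are a Legal/DPO decision

# (hot until, warm until, delete after) by data class, from the sample template.
RETENTION = {
    "payment_transactions": (timedelta(days=7), timedelta(days=60), 7 * YEAR),
    "game_events": (timedelta(days=3), timedelta(days=30), 2 * YEAR),
    "compliance_aml": (timedelta(days=14), timedelta(days=90), 7 * YEAR),
    "operating_logs": (timedelta(days=3), timedelta(days=30), 1 * YEAR),
}


def storage_tier(data_class: str, age: timedelta) -> str:
    """Map a record's age to hot / warm / archive / delete for its data class."""
    hot, warm, total = RETENTION[data_class]
    if age < hot:
        return "hot"
    if age < warm:
        return "warm"
    if age < total:
        return "archive"
    return "delete"
```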

15) Documentation and standards

Data Product page: owner, purpose, SLA, schemas, DQ rules, contacts.
Change log: schema/logic versions, impact analysis, migrations.
Runbooks: reprocessing, backfill, emergency scenarios, freeze button.

16) Implementation Roadmap

MVP (4-6 weeks):

1. Data catalog and classification (top domains), basic schemas and a schema registry.

2. Lakehouse Bronze/Silver, ingestion with validation and deduplication.

3. 1-2 Gold marts (e.g. GGR and conversion).

4. Minimum DQ rules and Freshness/Completeness dashboard.

5. Retention policies and RBAC for access.

Phase 2 (6-12 weeks):
  • Lineage, semantic metrics layer, DSAR/RTBF procedures.
  • Regionalization (EEA/UK), WORM for regulatory artifacts, Legal Hold.
  • Cost optimization, SLO alerts, budget reporting.
Phase 3 (12+ weeks):
  • Data Mesh (domain products), consumer-driven contracts and tests.
  • Automated impact simulation for schema/logic changes; replays.
  • Single compliance panel (regulatory, access, DQ, lineage).

17) Pre-production checklist

  • Schemas approved, contracts in the registry, compatibility tests passing.
  • DQ rules are active, alerts are configured, SLOs are set.
  • RBAC/ABAC roles checked, access logs enabled.
  • Retention/deletion/archive policies have been validated by Legal/DPO.
  • DSAR/RTBF/Legal Hold procedures are documented and tested.
  • Lineage/metrics/cost are displayed in dashboards.
  • Runbooks for backfill/reprocessing/DR are ready.

18) Frequent mistakes and how to avoid them

No unified classification or catalog: introduce mandatory Data Product cards.
Raw data without schemas: schema-first plus CI validation.
No deletion path: design TTLs and RTBF processes from the start.
PII mixed with analytics: store mappings separately, apply masking.
Gold without an owner or SLO: assign an owner and freshness targets.
Unmanaged cost: partitioning, compression, tiered storage, quotas.

19) Glossary (brief)

DSAR/RTBF - data subject access request / right to be forgotten (right to erasure).
Legal Hold - deletion freeze for legal reasons.
Lineage - traceability of origin and transformations.
Data Product - a managed product unit of data with SLAs.
DQ - data quality rules and metrics.
Lakehouse - combining data lake and ACID tables.

20) The bottom line

The data lifecycle is a managed system of agreements, not just file storage. Clear contracts and schemas, classification and a catalog, measurable quality, privacy and security, a cost-effective storage architecture, and transparent lineage make data a reliable asset that supports product, compliance, and analytics without surprises or hidden risks.
