GH GambleHub

Data warehouses

1) Purpose and role of DWH in iGaming

DWH is the central data consolidation and serving layer for reporting, analytics, compliance and ML. It provides:
  • Common metric definitions (GGR/NGR, ARPPU, Retention, Churn).
  • Reproducible reports for regulators and internal stakeholders.
  • Fast storefronts for BI/operating panels and sources for models.
  • Platform-level quality control, lineage, and safety.

2) Architectural options

2. 1 Classic DWH

ETL → DWH → BI.
Pros: Manageable models, strong consistency.
Cons: expensive downloads, complex backfill, limited flexibility.

2. 2 Lakehouse DWH

Bronze/Silver/Gold on ACID tables (Delta/Iceberg/Hudi) + SQL/MPP engine.
Pros: unified storage, time-travel, simple reprocessing.
Cons: requires discipline of layers and DQ, mature orchestration.

2. 3 Hybrid

Lakehouse as a "source of truth" (Bronze/Silver), DWH-March in MPP (ClickHouse/Pinot/Druid/Cloud DWH) for high-speed reading.
Pros: balance of cost and performance, flexible storefronts.
Cons: dual support for circuits and skating, synchronization is needed.

Recommendation: for iGaming - Lakehouse + DWH-March (hybrid). Bronze/Silver - standardize, Gold/Real-time marts - serve reading loads.

3) Data modeling

3. 1 Star and Snowflake

Fact tables: narrow, event-driven: 'fact _ bets', 'fact _ payouts', 'fact _ payments'.
Dimensions: 'dim _ users' (SCD), 'dim _ games', 'dim _ providers', 'dim _ markets'.
Snowflake is appropriate in Silver (normalization), Star - in Gold (reading).

3. 2 Data Vault 2. 0 (integration core)

Hubs (business keys), Links (relationships), Satellites (context/history).
Apply in Silver for long-lived provider/PSP integrations.

3. 3 SCD I/II/III

SCD II for RG/KYC/channels and game attributes (RTP/volatility).
Strict intervals' valid _ from/valid _ to ', correct join in time.

4) Load: ETL/ELT, CDC and increments

ELT approach: loading in Silver → transformation in DWH.
CDC: Debezium/log replication from OLTP; merzhi are idempotent.
Increments: by time water ('updated _ at> max_loaded_ts') and/or hash delta.
Backfill/Reprocessing: time-travel, ranges, quotas, dry-run comparisons.

MERGE (example):
sql
MERGE INTO silver. payments s
USING stage. payments_delta d
ON s. transaction_id = d. transaction_id
WHEN MATCHED THEN UPDATE SET
WHEN NOT MATCHED THEN INSERT;

5) Semantic layer and metrics

Metrics Store/Semantic Layer: uniform formulas GGR/NGR/Conversion/LTV.
Versioning metrics and "as-of" computation for reproducibility.
Conventions are metric names, units, currency (base EUR), and 'fx _ source'.

6) Storefronts and Serving

Gold showcases: denormalized, SLA ready (for example, until 06:00 lock.) .
Operational Marts: ClickHouse/Pinot/Druid for 1-5 minute panels.
Export: CSV/JSON/PDF + hash; immutable packets (WORM) for regulators.

GGR Daily example:
sql
CREATE OR REPLACE VIEW gold. ggr_daily AS
SELECT
DATE(b. event_time) AS event_date,
b. market,
g. provider_id,
SUM(b. stake_base) AS stakes_eur,
SUM(p. amount_base) AS payouts_eur,
SUM(b. stake_base) - SUM(p. amount_base) AS ggr_eur
FROM silver. fact_bets b
LEFT JOIN silver. fact_payouts p
ON p. user_pseudo_id = b. user_pseudo_id
AND p. game_id = b. game_id
AND DATE(p. event_time) = DATE(b. event_time)
JOIN dim. games g ON g. game_id = b. game_id
GROUP BY 1,2,3;

7) Data quality (DQ) and contracts

Schema-first: JSON/Avro registry + compatibility tests (consumer-driven).
DQ-как-код: completeness/validity/uniqueness/FK/range/temporal.
Reaction policies: critical → fail + DLQ; major/minor → tag and report.
DQ observability: Freshness/Completeness/Validity dashboards, lost records funnel.

8) Security, privacy and residency

PII minimization: users via pseudo-ID; mappings separately.
RLS/CLS: Line-by-line/post-table access by role and jurisdiction.
Encryption: TLS in-transit; at-rest - KMS/CMK with rotation.
Data Residency: separate directories and keys for EEA/UK/BR; banning cross-regional joins without reason.
DSAR/RTBF: computable projections and selective edits; Legal Hold on reporting artifacts.

9) Performance and Cost (Cost Engineering)

Partitioning: by date/market/tenant; clustering/Z-order by 'market', 'provider _ id', 'game _ id', 'user _ pseudo _ id'.
Formats: Parquet + statistics and compression; OPTIMIZE/VACUUM on schedule.
Materialization: stable aggregates and summary tables; avoid "fat" joins on the fly.
Quotas/Chargeback: budgets for heavy requests/replays; reports cost/query, cost/GB.
Tiered storage: hot/warm/cold; clear recovery SLAs.

10) Observability and management

Pipeline metrics: duration, volumes, retrays, lags, fault tolerance.
DWH metrics: response time/competitiveness/cache hits/value.
Lineage: graph from sources to reports; impact analysis on changes.
SLO: Freshness Silver p95 ≤ 15 мин; Gold daily - ready until 06:00; Validity ≥ 99. 9%; Completeness ≥ 99. 5%; availability ≥ 99. 9%.

11) Multi-tenancy and domain isolation

Division by schema/database/catalog into tenant/market.

Quotas and resource groups; limiting "noisy neighbors."

Export/import policies between tenants, standardized contracts.

12) Data register and documentation

Data Catalog: owner, SLA, schema, examples, DQ rules, lineage.
Metrics/dashboards: cards with formulas and responsible.
Change Log: versions of logic, migrations, impact.

13) Processes and RACI

R (Responsible): Data Engineering (models Silver/Gold, DAG'i), Data Platform (infra, registry, DQ).
A (Accountable): Head of Data/CDO.
C (Consulted): Compliance/Legal/DPO, Finance (FX/GGR), Risk (RG/AML), SRE (SLO/стоимость).
I (Informed): BI, Product, Marketing, Operations.

14) Implementation Roadmap

MVP (4-6 weeks):

1. Lakehouse Bronze/Silver (ACID tables), CDC/increments for Payments/Gameplay.

2. First Gold showcases (GGR Daily, conversion), SLAs until 06:00.

3. DQ-like-code (10-15 rules) + Freshness/Completeness dashboards.

4. Data catalog and base semantic layer of metrics.

Phase 2 (6-12 weeks):
  • SCD II для users/games/providers; domain expansion.
  • Online March (ClickHouse/Pinot) for real-time/near-real-time panels.
  • Lineage/impact analysis, DSAR/RTBF procedures, regionalization (EEA/UK).
Phase 3 (12 + weeks):
  • Auto-simulation of changes (dry-run), replay and comparison of metrics.
  • Chargeback/quotas, cost-dashboards; DR exercises and time-travel recovery.
  • Auto-generation of showcase documentation and metrics cards.

15) Examples of SQL templates

Actual rates (Silver, 3NF):
sql
CREATE TABLE silver. fact_bets (
bet_id STRING PRIMARY KEY,
user_pseudo_id STRING NOT NULL,
game_id STRING NOT NULL,
stake_ccy DECIMAL(18,2) NOT NULL,
currency CHAR(3) NOT NULL,
stake_base DECIMAL(18,2) NOT NULL,
market CHAR(2) NOT NULL,
event_time TIMESTAMP NOT NULL
);
Connection to SCD II (get RG status at the time of bet):
sql
SELECT b. bet_id, u. rg_status
FROM silver. fact_bets b
JOIN dim. users_scd u
ON u. user_pseudo_id = b. user_pseudo_id
AND b. event_time >= u. valid_from
AND (u. valid_to IS NULL OR b. event_time < u. valid_to);
Completeness control by market:
sql
SELECT market, DATE(event_time) d, COUNT() n
FROM silver. fact_bets
GROUP BY market, DATE(event_time)
HAVING n = 0;

16) Pre-sale checklist

  • Schemes and contracts in the registry, compatibility tests are green.
  • CDC/increments and MERGE procedures are idempotent.
  • Gold showcases have SLAs, metric formulas are fixed.
  • DQ rules are active (critical → fail + DLQ), Freshness/Completeness dashboards.
  • RBAC/ABAC, encryption, residency by region, access logs.
  • Lineage/impact enabled; time-travel/backup/DR checked.
  • Cost under control: parties, clustering, materialization, quotas.

17) Anti-patterns and risks

"One fat DWH without layers": a mixture of raw and reported data → chaos and expensive fixes.
Full reload daily needlessly: use increments/CDC.
Gold without owner and formulas: lack of a single version of the truth → disputes and regressions.
PII in analytical layers: keep mappings separate, CLS/RLS.
No DQ/lineage: no evidence for regulators/audit.
Unmanageable cost: no batches/optimizations/quotas.

18) Glossary (brief)

DWH is a data warehouse for consolidation and analytics.
Lakehouse - data lake + ACID tables and SQL engine.
CDC - Capture changes from OLTP.
SCD - slowly changing measurements (I/II/III).
Gold showcase - ready-to-consume report sheet/presentation.
Semantic Layer - uniform definitions of metrics and attributes.

19) The bottom line

Modern DWH for iGaming is not a "big table," but a manageable platform: Bronze/Silver/Gold layers, strict contracts and DQ, uniform metrics and lineage, privacy and residency, performance and efficiency. By building a Lakehouse + DWH-March hybrid, you will have fast and verifiable decision making ready for audit, scale and new markets.

Contact

Get in Touch

Reach out with any questions or support needs.We are always ready to help!

Telegram
@Gamble_GC
Start Integration

Email is required. Telegram or WhatsApp — optional.

Your Name optional
Email optional
Subject optional
Message optional
Telegram optional
@
If you include Telegram — we will reply there as well, in addition to Email.
WhatsApp optional
Format: +country code and number (e.g., +380XXXXXXXXX).

By clicking this button, you agree to data processing.