GH GambleHub

Streaming and streaming analytics

1) Purpose and value

The streaming circuit provides on-the-fly decision making:
  • Anti-fraud/AML: detecting deposit structuring, velocity attacks, and provider anomalies.
  • Responsible Gaming (RG): limit breaches, risk patterns, self-exclusion.
  • Operations/SRE: SLA degradation, error bursts, early incident signals.
  • Product/marketing: personalization events, missions/quests, real-time segmentation.
  • Near-real-time reporting: GGR/NGR marts, operational dashboards.

Target characteristics: p95 end-to-end latency of 0.5-5 s, completeness ≥ 99.5%, managed cost.

2) Reference architecture

1. Ingest/Edge

`/events/batch` (HTTP/2/3), gRPC, OTel Collector.
Schema validation, deduplication, geo-routing.

2. Event bus

Kafka/Redpanda (partitioned by `user_id`/`tenant`/`market`).
Retention 3-7 days, compression, DLQ/quarantine for malformed messages.

3. Streaming

Flink / Spark Structured Streaming / Beam.
Stateful operators, CEP, watermarks, allowed lateness, deduplication.
Enrichment (Redis/Scylla/ClickHouse lookups), asynchronous I/O with timeouts.

4. Serving / operational marts

ClickHouse/Pinot/Druid for minute/second aggregation and dashboards.
Feature Store (online) for scoring models.
Alert topics → SOAR/ticketing/webhooks.

5. Long-term storage (Lakehouse)

Bronze (raw), Silver (clean), Gold (serve) — Parquet + Delta/Iceberg/Hudi.
Replay/backtests, time-travel.

6. Observability

Pipeline metrics, tracing (OTel), logs, lineage.

3) Schemas and contracts

Schema-first: JSON/Avro/Protobuf + Schema Registry, `schema_version` in every event.
Evolution: backward-compatible changes add new nullable fields; breaking changes ship as `/v2` with dual publishing.
Required fields: `event_time` (UTC), `event_id`, `trace_id`, `user.pseudo_id`, `market`, `source`.
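As an illustration, the required-field contract above can be checked with a small validator. The `get_path` and `validate_contract` helpers are hypothetical, not part of any platform API:

```python
REQUIRED_FIELDS = ("event_time", "event_id", "trace_id",
                   "user.pseudo_id", "market", "source")

def get_path(event: dict, dotted: str):
    """Resolve a dotted path like 'user.pseudo_id' inside a nested dict."""
    node = event
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def validate_contract(event: dict) -> list:
    """Return the list of missing required fields (empty means valid)."""
    return [f for f in REQUIRED_FIELDS if get_path(event, f) is None]

evt = {
    "event_time": "2024-01-01T12:00:00Z",
    "event_id": "e-1", "trace_id": "t-1",
    "user": {"pseudo_id": "u-42"},
    "market": "EEA", "source": "payments",
    "schema_version": 3,
}
assert validate_contract(evt) == []
```

Events failing this check would be routed to the DLQ/quarantine topic rather than silently dropped.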

4) Windows, watermarks and late data

Windows:
  • Tumbling, Hopping, Session.
  • Watermark: the event-time completeness threshold, e.g. 2-5 minutes.
  • Late data: corrections to already-emitted results, a `late = true` tag, DLQ for events with extreme lag.
Flink SQL example (10-min deposit velocity):
sql
SELECT user_id,
       TUMBLE_START(event_time, INTERVAL '10' MINUTE) AS win_start,
       COUNT(*) AS deposits_10m,
       SUM(amount_base) AS sum_10m
FROM stream.payments
GROUP BY user_id, TUMBLE(event_time, INTERVAL '10' MINUTE);

5) Stateful aggregations and CEP

Keys: `user_id`, `device_id`, `payment.account_id`.
State: sliding sums/counters, sessions, Bloom filters for deduplication.
CEP patterns: structuring (amounts below threshold, ≥ N times within window T), device switching, RG fatigue.

CEP pseudocode:
python
if (deposits.count(last=10 * MIN) >= 3
        and deposits.sum(last=10 * MIN) > THRESH
        and all(d.amount < REPORTING_THRESHOLD for d in deposits.last(10 * MIN))):
    emit_alert("AML_STRUCTURING", user_id, window_snapshot())

6) Exactly-Once, order and idempotence

Bus: at-least-once delivery; partition keys give per-key ordering.
Idempotence: `event_id` + dedup state (TTL 24-72 h).
Sink: transactional (two-phase) commits or idempotent upsert/merge.
Outbox/Inbox: guaranteed publication of domain events from OLTP.
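A minimal sketch of `event_id` deduplication with a TTL, assuming an in-memory map stands in for the real dedup state (production would keep this in RocksDB/Redis, as noted above):

```python
import time

class DedupState:
    """In-memory event_id deduplication with TTL (illustrative sketch)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.seen = {}  # event_id -> timestamp of first sighting

    def first_seen(self, event_id: str, now: float = None) -> bool:
        """True only for the first delivery of event_id within the TTL."""
        now = time.monotonic() if now is None else now
        # Evict expired entries lazily on each call.
        for k in [k for k, t in self.seen.items() if now - t > self.ttl]:
            del self.seen[k]
        if event_id in self.seen:
            return False
        self.seen[event_id] = now
        return True

dedup = DedupState(ttl_seconds=72 * 3600)  # 72 h, matching the TTL above
assert dedup.first_seen("evt-1", now=0.0) is True    # first delivery passes
assert dedup.first_seen("evt-1", now=10.0) is False  # redelivery is dropped
```

The TTL bounds state size: an `event_id` redelivered after the window expires would pass again, which is why the TTL should exceed the bus's realistic redelivery horizon.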

7) Real-time enrichment

Lookups: Redis/Scylla (RG limits, KYC status, BIN→MCC, IP→Geo/ASN).
Asynchronous calls: sanctions/PEP APIs with timeouts and an "unknown" fallback.
FX/timezone: normalization of amounts and local market time (`fx_source`, `tz`).
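The timeout-plus-fallback pattern can be sketched as follows; `sanctions_lookup` is a stand-in for a real external API, and the timeout values are illustrative:

```python
import asyncio

async def sanctions_lookup(pseudo_id: str) -> str:
    """Stand-in for a remote sanctions/KYC API call (hypothetical)."""
    await asyncio.sleep(0.2)  # simulated network latency
    return "clear"

async def enrich(event: dict, timeout: float = 0.1) -> dict:
    """Attach a sanctions status; on timeout fall back to 'unknown'
    so the hot path never blocks on a slow dependency."""
    try:
        status = await asyncio.wait_for(
            sanctions_lookup(event["user_pseudo_id"]), timeout)
    except asyncio.TimeoutError:
        status = "unknown"
    return {**event, "sanctions_status": status}

enriched = asyncio.run(enrich({"user_pseudo_id": "u-42"}, timeout=0.05))
assert enriched["sanctions_status"] == "unknown"  # 0.2 s call vs 0.05 s budget
```

Downstream rules then treat "unknown" explicitly (e.g. defer rather than auto-approve), so a slow dependency degrades gracefully instead of stalling the stream.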

8) Serving and real-time marts

ClickHouse/Pinot/Druid: minute/second-level aggregations, materialized views.
Gold stream: operational GGR/RG/AML tables, SLA of ≤ 1-5 min delay.
API/GraphQL: low-latency access for dashboards and external integrations.

ClickHouse example (GGR minute by minute):
sql
CREATE MATERIALIZED VIEW mv_ggr_1m
ENGINE = AggregatingMergeTree()
PARTITION BY toDate(ts_min)
ORDER BY (ts_min, market, provider_id) AS
SELECT toStartOfMinute(event_time) AS ts_min,
       market,
       provider_id,
       sumState(stake_base) AS s_stake,
       sumState(payout_base) AS s_payout
FROM stream.game_events
GROUP BY ts_min, market, provider_id;

9) Observability and SLO

SLI/SLO (targets):
  • p95 ingest→alert ≤ 2 s (critical), ≤ 5 s (standard).
  • Window completeness ≥ 99.5% for window T.
  • Schema errors ≤ 0.1%; share of events carrying `trace_id` ≥ 98%.
  • Streaming service availability ≥ 99.9%.
Dashboards:
  • Partition/topic lag, operator busy time, state size.
  • Event→rule→case funnel, hot-key map, late ratio.
  • Cost: cost/GB, cost/query, cost of checkpoints/replays.

10) Privacy and compliance

PII minimization: ID pseudonymization, field masking, PAN/IBAN tokenization.
Data residency: regional pipelines (EEA/UK/BR), separate encryption keys.
Legal operations: DSAR/RTBF across downstream marts, Legal Hold for cases/reports.
Audit: access logs, immutable archives of decisions.
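Pseudonymization and masking can be sketched as follows; the helper names and key handling are illustrative (a real deployment would pull the key from KMS, never hard-code it):

```python
import hashlib
import hmac

def pseudonymize(user_id: str, key: bytes) -> str:
    """Deterministic keyed pseudonym: same input gives the same pseudo_id,
    irreversible without the key."""
    return hmac.new(key, user_id.encode(), hashlib.sha256).hexdigest()[:16]

def mask_pan(pan: str) -> str:
    """Keep the BIN (first 6) and last 4 digits, mask the middle."""
    return pan[:6] + "*" * (len(pan) - 10) + pan[-4:]

key = b"demo-key"  # placeholder only; fetch from KMS in production
assert pseudonymize("user-123", key) == pseudonymize("user-123", key)
assert mask_pan("4111111111111111") == "411111******1111"
```

Because the pseudonym is deterministic, stream joins and aggregations on `user.pseudo_id` still work, while the raw ID never enters the analytics pipelines.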

11) Economics and performance

Keys and sharding: avoid hot keys (salting/composite keys).
State: sensible TTLs, snapshots, RocksDB/state-backend tuning.
Pre-aggregation: upstream reduce for noisy streams.
Sampling: acceptable for non-critical metrics (never for transactions/compliance).
Chargeback: budgets per topic/job, quotas, allocation to teams.
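Key salting can be sketched in two stages: spread a hot key across N sub-keys, then re-merge the partial aggregates; the helper names below are illustrative:

```python
import hashlib

def salted_key(user_id: str, event_id: str, buckets: int = 8) -> str:
    """Append a per-event salt so one hot user_id spreads across N
    partitions instead of overloading one."""
    salt = int(hashlib.sha1(event_id.encode()).hexdigest(), 16) % buckets
    return f"{user_id}#{salt}"

def merge_partials(partials: dict) -> dict:
    """Second-stage reduce: strip the salt and sum per-salt partials."""
    totals = {}
    for key, value in partials.items():
        user = key.rsplit("#", 1)[0]
        totals[user] = totals.get(user, 0) + value
    return totals

# First stage produced per-salt partial counts; second stage merges them.
partials = {"hot-user#0": 5, "hot-user#3": 7, "cold-user#1": 2}
assert merge_partials(partials) == {"hot-user": 12, "cold-user": 2}
```

The trade-off: salting adds a second aggregation stage (and a little latency) in exchange for even partition load, so it is worth applying only to keys that actually run hot.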

12) Streaming DQ (Quality)

Ingest validation (schema, enums, size), dedup on `(event_id, source)`.
On-stream: completeness/dup rate/late ratio, window control (no double counting).
Reaction policy: critical → DLQ + alert; major/minor → tag and clean up downstream.

Minimum rules (YAML, example):
yaml
stream: payments
rules:
  - name: schema_valid
    type: schema
    severity: critical
  - name: currency_whitelist
    type: in_set
    column: currency
    set: [EUR, USD, GBP, TRY, BRL]
  - name: dedup_window
    type: unique
    keys: [event_id]
    window_minutes: 1440
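A sketch of how such rules might be evaluated on the stream; the tiny rule engine below is illustrative and mirrors only the `in_set` and `unique` rule shapes from the YAML:

```python
def check_in_set(event: dict, column: str, allowed: set) -> bool:
    """in_set rule: column value must be in the whitelist."""
    return event.get(column) in allowed

def check_unique(event: dict, keys: list, seen: set) -> bool:
    """unique rule: the key tuple must not have been seen in the window."""
    key = tuple(event.get(k) for k in keys)
    if key in seen:
        return False
    seen.add(key)
    return True

seen = set()  # stand-in for windowed dedup state
rules = [
    ("currency_whitelist",
     lambda e: check_in_set(e, "currency", {"EUR", "USD", "GBP", "TRY", "BRL"})),
    ("dedup_window",
     lambda e: check_unique(e, ["event_id"], seen)),
]

def evaluate(event: dict) -> list:
    """Return names of violated rules; critical violations go to the DLQ."""
    return [name for name, check in rules if not check(event)]

assert evaluate({"event_id": "e1", "currency": "EUR"}) == []
assert evaluate({"event_id": "e1", "currency": "XXX"}) == [
    "currency_whitelist", "dedup_window"]
```

A production engine would additionally honor `severity` and `window_minutes` (the `seen` set here never expires), and load rules from the registry rather than code.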

13) Access security and release control

RBAC/ABAC: separate roles for reading streams and for changing rules/models.
Dual control: rule and model rollouts require "two keys."
Canary/A/B: dark runs of rules and models with precision/recall monitoring.
Secrets: KMS/CMK, regular rotation, no secrets in logs.

14) Processes and RACI

R (Responsible): Streaming Platform (infra/releases), Domain Analytics (rules/features), MLOps (scoring).
A (Accountable): Head of Data/Risk/Compliance by domain.
C (Consulted): DPO/Legal (PII/retention), SRE (SLO/Incidents), Architecture.
I (Informed): Product, Support, Marketing, Finance.

15) Implementation Roadmap

MVP (2-4 weeks):

1. Kafka/Redpanda + two critical topics (`payments`, `auth`).

2. Flink job with watermark, deduplication and one CEP rule (AML or RG).

3. ClickHouse/Pinot mart with 1-5 min freshness, lag/completeness dashboards.

4. Incident channel (webhooks/Jira), basic SLOs and alerts.

Phase 2 (4-8 weeks):
  • Online enrichment (Redis/Scylla), Feature Store, asynchronous lookups.
  • Rule management as code, canary releases, A/B.
  • Streaming DQ, regionalization of pipelines, DSAR/RTBF procedures.
Phase 3 (8-12 weeks):
  • Multi-region active-active, what-if replay simulator, auto-calibration of thresholds.
  • Full Gold-stream marts (GGR/RG/AML), near-real-time reporting.
  • Value dashboards, chargeback, DR exercises.

16) Examples (fragments)

Flink CEP — device switch:
sql
MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY event_time
  MEASURES
    FIRST(A.device_id) AS d1,
    LAST(B.device_id) AS d2,
    COUNT(*) AS cnt
  PATTERN (A B+)
  DEFINE
    B AS B.device_id <> PREV(B.device_id) AND B.ip_asn <> PREV(B.ip_asn)
) MR
Kafka Streams - idempotent filter:
java
if (seenStore.putIfAbsent(eventId, now()) == null) {
    context.forward(event);
}

17) Pre-production checklist

  • Schemas and contracts in the Registry; backward-compatibility tests green.
  • Watermarks/allowed lateness, dedup, and DLQ enabled.
  • SLOs and alerts configured (lag/late/dup/state size).
  • Enrichment with caches and timeouts, "unknown" fallback.
  • RBAC/dual control for rules/models; all changes logged.
  • Documentation for rules, marts, and runbooks, including replay/rollback.

18) Frequent mistakes and how to avoid them

Ignoring event time: without watermarks, metrics drift.
No deduplication: false alerts and double counting.
Hot keys: partition skew; mitigate with salting/resharding.
Synchronous external APIs in the hot path: async + cache only.
Unmanaged cost: pre-aggregation, state TTLs, quotas, cost dashboards.
No replay simulator: rollouts without replay lead to regressions.

19) Glossary (brief)

CEP - Complex Event Processing.
Watermark - event-time threshold after which a window is considered complete.
Allowed lateness - tolerance for late events after the watermark.
Stateful operator - an operator with persisted state.
Feature store - consistent feature serving (online/offline).

20) The bottom line

Streaming and streaming analytics form a managed system: contracts, windows and watermarks, stateful logic and CEP, enrichment and real-time marts, SLOs and observability, privacy and cost under control. By following these practices, the platform gains reliable risk detectors, operational dashboards, and personalization with predictable latency and cost.
