Analysis of anomalies and correlations
1) Why iGaming needs this
iGaming lives in real time: deposits get delayed, a specific game provider "sinks," fraud surfaces, the traffic mix shifts. We need a discipline that:
- Detects deviations early (before KPIs and revenue drop in reports).
- Distinguishes failures from seasonality/promotions/tournaments.
- Finds root causes (RCA) instead of "treating symptoms."
- Respects privacy and ethics (RG/AML) without giving away PII.
2) Anomaly typology
Point: a single spike/dip (e.g. a spike in PSP errors).
Collective: a sequence of atypical values (prolonged degradation).
Contextual: normal at night, abnormal during the day (depends on context: hour/country/channel).
Mode/trend change (change-point): level, variance, or seasonality shifts sharply.
Structural: spike in missing values/duplicates, schema drift.
Causal: a change in a neighboring node (PSP/provider) "flipped" our series.
3) Data preparation and context
Calendar and seasonality: weekends/holidays/tournaments/promotions → individual baselines.
Aggregation layers: 1-min/5-min/hour, by country/brand/provider/device.
Normalization: per-capita (per player/session), by time of day, by FX.
Time features: rolling mean/std, EWMA, lags, day of the week, "minutes to cut-off."
Quality: filter late events/duplicates, eliminate timezone errors.
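The time-feature step above (rolling mean/std, lags) can be sketched in pure Python; the window size and field names are illustrative:

```python
from statistics import mean, stdev

def rolling_features(series, window):
    """Rolling mean/std over the last `window` points plus the lag-1 value.
    Mean/std are None until the window is full (warm-up)."""
    out = []
    for i, v in enumerate(series):
        lag1 = series[i - 1] if i > 0 else None
        if i + 1 < window:
            out.append({"mean": None, "std": None, "lag1": lag1})
        else:
            win = series[i + 1 - window : i + 1]
            out.append({"mean": mean(win), "std": stdev(win), "lag1": lag1})
    return out

# Five windows of a metric; the last value is an obvious outlier.
feats = rolling_features([10, 12, 11, 13, 40], window=3)
```

In a real pipeline the same features would come from the aggregation layers (1-min/5-min/hour) rather than a Python loop, but the shape of the output is the same.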
4) Detection methods (simple to hybrid)
Statistics and time series
Robust z-score (median/IQR), EWMA, STL decomposition (trend/seasonal/remainder).
CUSUM/ADWIN: sensitive to shifts in mean/variance.
Change-points (e.g. PELT/BOCPD): pinpoint where the regime changes.
Prophet/ETS: forecast + confidence corridor → outliers fall outside the interval.
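A minimal robust z-score, using median and IQR instead of mean/std so a single extreme point does not inflate its own baseline (1.349 rescales IQR to approximately one sigma for normal data; the quantile indexing is deliberately crude):

```python
from statistics import median

def robust_z(series):
    """Robust z-score: (x - median) / (IQR / 1.349).
    Quantile indexing here is crude; a real pipeline would interpolate."""
    med = median(series)
    s = sorted(series)
    n = len(s)
    iqr = s[(3 * n) // 4] - s[n // 4] or 1e-9  # guard against zero IQR
    return [(x - med) / (iqr / 1.349) for x in series]

# Ten normal minutes of a metric, then a crash in the last window.
zs = robust_z([100, 101, 99, 100, 102, 98, 100, 101, 99, 100, 40])
```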
Multidimensional/density
Isolation Forest, LOF, One-Class SVM: when there are many features (PSP, geo, channel, device).
Autoencoder (reconstruction error) for complex patterns.
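A sketch of the multidimensional case, assuming scikit-learn is available; the two features (deposit success rate, rounds per minute) and the planted outlier are synthetic:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic per-minute feature vectors: (deposit success rate, rounds per min).
rng = np.random.default_rng(7)
normal = np.column_stack([rng.normal(0.95, 0.01, 500), rng.normal(50.0, 5.0, 500)])
X = np.vstack([normal, [[0.60, 12.0]]])  # one planted multidimensional outlier

clf = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
labels = clf.fit_predict(X)  # -1 = anomaly, 1 = normal
```

The `contamination` parameter encodes the expected anomaly rate; in practice it is tuned on backtests with labeled incidents rather than guessed.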
Online streams
Sliding windows, quantile sketches, EWMA + hysteresis; accounting for watermarks and late data.
Dual thresholds to suppress flapping.
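The dual-threshold (hysteresis) idea fits in a few lines; the enter/exit levels are illustrative:

```python
def dual_threshold(values, enter, exit_):
    """Alarm turns ON when a value reaches `enter`, and turns OFF only once it
    drops below `exit_` (exit_ < enter); the gap suppresses flapping."""
    state, flags = False, []
    for v in values:
        if not state and v >= enter:
            state = True
        elif state and v < exit_:
            state = False
        flags.append(state)
    return flags

# A series oscillating around a single threshold of 3 would flap on/off;
# with the 3/2 band it produces one continuous incident.
flags = dual_threshold([1, 4, 2.5, 4, 2.5, 1], enter=3, exit_=2)
```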
Hybrid
Domain rules (SLO-aware) + statistics/ML → higher precision and explainability.
5) Detection quality: how to measure
Precision/Recall/F1 for marked incidents.
ATTD (Average Time To Detect) and MTTR (time to recovery).
Duration bias: penalize "flapping" (frequent entries into and exits from the anomalous state).
Ex-post business metrics: "how many rounds/deposits saved," "how many P1s prevented."
Stability: the proportion of suppressed false alarms; p95 "quiet nights."
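ATTD from the list above is easy to compute once incidents are labeled; a minimal sketch, with timestamps in seconds:

```python
def attd_seconds(incidents, detections):
    """Average Time To Detect: mean delay from incident start to the first
    alert falling inside the incident window. Undetected incidents are skipped
    (they show up in recall instead)."""
    delays = []
    for start, end in incidents:
        hits = [t for t in detections if start <= t <= end]
        if hits:
            delays.append(min(hits) - start)
    return sum(delays) / len(delays) if delays else None

# Two labeled incidents (start, end) and three alert timestamps.
attd = attd_seconds([(100, 400), (1000, 1300)], [160, 170, 1090])
```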
6) Correlation, causality and traps
Correlation ≠ causality: a common driver (a promotion, an external outage) can "move" both metrics.
Partial correlation (conditional), mutual information (MI): for non-linear relationships.
Granger causality: one series helps predict the other.
DAG/causal discovery: hypotheses about the direction of influence.
Simpson's paradox: aggregates "lie" without stratification (country/channel/device).
Leakage: features containing future information yield false causes.
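The common-driver trap can be demonstrated with a plain-Python partial correlation: residualize both series on the driver, then correlate the residuals. The data below are synthetic; z stands in for something like a promotion's traffic:

```python
def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

def residualize(y, z):
    """Residuals of a simple OLS fit y ~ z (removes the linear effect of z)."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    beta = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) \
        / sum((zi - mz) ** 2 for zi in z)
    return [yi - (my + beta * (zi - mz)) for zi, yi in zip(z, y)]

def partial_corr(x, y, z):
    """Correlation of x and y after controlling for the common driver z."""
    return pearson(residualize(x, z), residualize(y, z))

# z drives both metrics; their own fluctuations are independent.
z = list(range(40))
x = [2 * zi + (-1) ** i for i, zi in enumerate(z)]
y = [3 * zi + (-1) ** (i // 2) for i, zi in enumerate(z)]
```

Raw correlation of x and y is near 1 (both follow z), while the partial correlation collapses toward 0: the "link" between them was the driver.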
7) Root-Cause Analysis (RCA)
Dependence graph: game providers → lobbies → bets → payments/PSP → KPI.
Dimension scan: which segment "broke"? (country, brand, provider, payment method, OS).
Contrast groups: segments with vs. without the anomaly → relative risk/odds ratio.
Shapley/Feature attribution for multivariate anomaly models.
What-if scenarios: disable the suspect segment and check whether the KPI recovers.
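The contrast-group step reduces to a 2x2 table; a minimal sketch with illustrative numbers (80/100 failed deposits via a suspect PSP vs. 5/100 via the rest):

```python
def contrast(exposed_bad, exposed_n, control_bad, control_n):
    """Relative risk and odds ratio for a 2x2 contrast: the segment with the
    anomaly (exposed) vs. a comparable segment without it (control)."""
    rr = (exposed_bad / exposed_n) / (control_bad / control_n)
    odds = (exposed_bad * (control_n - control_bad)) \
        / ((exposed_n - exposed_bad) * control_bad)
    return rr, odds

# Hypothetical: 80 of 100 deposits failed via PSP_A, 5 of 100 via other PSPs.
rr, odds = contrast(80, 100, 5, 100)
```

A relative risk of 16 against the control group is a strong pointer that the PSP segment, not the global flow, is the root cause.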
8) Noise reduction and prioritization
Hysteresis: require "3 of 5 windows breached" before confirming.
Dynamic thresholds: baseline ± k·σ, 5th/95th quantiles, seasonal profiles.
Grouping: one incident for "provider A" instead of 300 alerts per game.
SLO-awareness: alert only if an SLO/business threshold is affected.
Suppression: at most N alerts per T minutes per label set.
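The suppression rule ("at most N alerts per T minutes per label set") can be sketched as a small stateful filter; the caps are illustrative:

```python
from collections import defaultdict

def suppress(alerts, max_per_window, window_sec):
    """Pass at most `max_per_window` alerts per `window_sec` for each distinct
    label set; later alerts with the same labels inside the window are dropped.
    Alerts are assumed to arrive in timestamp order."""
    recent_by_key, passed = defaultdict(list), []
    for ts, labels in alerts:
        key = tuple(sorted(labels.items()))
        recent = [t for t in recent_by_key[key] if ts - t < window_sec]
        if len(recent) < max_per_window:
            passed.append((ts, labels))
            recent.append(ts)
        recent_by_key[key] = recent
    return passed

# Six alerts for the same provider within one minute, capped at 2 per 5 minutes.
passed = suppress([(i * 10, {"provider": "A"}) for i in range(6)],
                  max_per_window=2, window_sec=300)
```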
9) Conveyor: online and offline
Online: Flink/Spark Streaming/CEP - minute windows, watermarks, deduplication, idempotency.
Offline: backtests over a year of history, injection of synthetic incidents, comparison of candidate detectors.
ModelOps: rule/model versioning (MAJOR/MINOR/PATCH), shadow/canary, and rollback for rules.
10) Privacy, ethics, compliance
Zero-PII in features and alerts; tokens instead of identifiers.
RG/AML: separate channels and access controls; text redaction.
Bias: check for disparities across sensitive dimensions (country/method/device); don't let an anomaly flag turn into discrimination.
Legal hold/DSAR: store the history of detections/decisions in a WORM log.
11) iGaming cases (ready-made templates)
Payments/PSP
Detection: `success_rate_deposits_5m` falls below baseline_28d by 3σ, confirmed in 3/5 windows → P1.
RCA: slice by `psp, country, method`; check queues/retries.
Gaming providers
Detection: provider A's `rounds_per_min` < 60% of rolling_quantile(0.1) over 28d → P1.
Action: hide provider A's game tiles, notify the provider, switch the lobby.
RG
Detection: `high_risk_share` ↑ by >3 pp in 10 min for brand B → P2.
RCA: campaigns/bonuses, surge in new devices, geo-shift.
Antifraud
Detection: `chargeback_rate_60m` > μ + 3σ AND `new_device_share` ↑ → P1.
Action: tighten scoring/withdrawal limits.
12) Artifacts and patterns
12.1 YAML rule (online)
```yaml
rule_id: psp_success_drop
severity: P1
source: stream:payments.metrics_1m
baseline: {type: seasonal_quantile, period: P28D, quantile: 0.1, by: [hour, dow, country, psp]}
detect:
  type: ratio_below
  value: 0.6
  confirm: {breaches_required: 3, within: PT5M}
labels: {psp: "$psp", country: "$country"}
actions:
  - route: pagerduty:payments
  - soar: [{name: switch_psp, params: {backup: "PSP_B"}}]
privacy: {pii_in_payload: false}
version: 1.4.0
```
12.2 Offline backtest config
```yaml
dataset: payments_gold
period: {from: "2025-07-01", to: "2025-10-31"}
inject_scenarios:
  - type: level_shift
    target: success_rate
    where: {psp: "PSP_A", country: "EE"}
    from: "2025-09-15T12:00Z"
    delta: -0.02
metrics: [precision, recall, f1, attd_sec]
```
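The level-shift injection behind such a backtest scenario is simple; a sketch with a toy threshold detector standing in for the real one (names and numbers are illustrative):

```python
def inject_level_shift(series, start_idx, delta):
    """Backtest scenario: add `delta` to every point from `start_idx` onward."""
    return series[:start_idx] + [v + delta for v in series[start_idx:]]

def detect_delay(series, threshold, start_idx):
    """Windows elapsed until the first post-injection point breaches `threshold`
    (a toy stand-in for a real detector, used to estimate ATTD in windows)."""
    for i in range(start_idx, len(series)):
        if series[i] < threshold:
            return i - start_idx
    return None  # missed incident -> counts against recall

base = [0.98] * 12  # success_rate per window
injected = inject_level_shift(base, start_idx=6, delta=-0.05)
delay = detect_delay(injected, threshold=0.95, start_idx=6)
```

Running many such injections across history yields the precision/recall/F1/ATTD numbers the config asks for, without waiting for real incidents.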
12.3 RCA incident passport
Incident: rounds drop @ provider A
Period: 2025-11-01 18:10-18:35 (Europe/Kyiv)
Root node: `games.engine.provider_A` (change-point @18:12)
Affected: `lobby_clicks ↓`, `rounds_per_min ↓ 45%`, `GGR/min ↓ 28%`
Counter-evidence: payments OK, PSP OK, FX/stats normal
Actions: hide tiles, contact the provider, status banner
Result: recovery @18:34; losses prevented: X
13) Process Success Metrics
Precision/Recall/F1 on P1/P2 incidents (markup by domain owners).
ATTD/MTTR in minutes (median/p90).
Noise↓: − X% of "false night" alarms, ≤ Y alerts/shift.
RCA-time: median time to root cause.
Business saved: assessment of retained deposits/rounds.
Coverage: ≥ 95% of critical pathways under observation.
14) Processes and RACI
Domain Owners (R) - rules/baselines/incident marking.
Data Platform/Observability (R) - detection engine, storage, SLO.
ML Lead (R) - anomaly models, calibration, fairness.
SRE/SecOps (R) - SOAR/PagerDuty integrations, incidents.
CDO/DPO (A) - privacy/ethics policy, Zero-PII.
Product/Finance (C) - SLO thresholds and business priorities.
15) Implementation Roadmap
0-30 days (MVP)
1. Critical paths: payments, game_rounds, ingest freshness.
2. Baselines by hour/day and key dimensions (country/brand/psp/provider).
3. Simple detectors: EWMA/robust z-score + hysteresis.
4. Alert channels and 3 runbooks (payments/games/DQ).
5. Backtests for 3-6 months of history; marking of incidents.
30-90 days
1. Change-points, seasonal quantiles, multimodal series.
2. Isolation Forest/LOF for multidimensional cases; shadow mode.
3. RCA dependency graph and semi-automatic attribution.
4. SLO-conscious thresholds; suppression/grouping; autocomplete tickets.
3-6 months
1. Champion-Challenger rules/models; auto-tuning thresholds.
2. External integrations (providers/PSPs) with signed webhooks.
3. Reports "alert contribution to MTTR/revenue"; quarterly hygiene-sessions.
4. Causal experiments for controversial correlations (A/B, Granger, instrumental variables).
16) Anti-patterns
A threshold set "by eye," shared across all countries/hours/channels.
Ignoring seasonality/promotions → a storm of false alerts.
No backtests or labeled incidents: nothing to optimize against.
Chasing correlations without stratification/partial correlation → false causes.
Logs/alerts with PII, screenshots in shared channels.
"Eternal" rules with no review cycle and no owner.
17) Related Sections
Data Flow Alerts, DataOps Practices, Analytics and Metrics APIs, Auditing and Versioning, MLOps: Model Exploitation, Access Control, Security and Encryption, Data Retention Policies, Reducing Bias.
Summary
Anomaly and correlation analysis is not "ML magic" but an engineering system: proper handling of context and seasonality, a hybrid of rules and models, strict quality metrics, and managed RCA. In iGaming, such a system reduces MTTR, protects revenue, and keeps the trust of players and regulators, all without violating privacy.