GH GambleHub

Data auditing and versioning

1) Why do you need it

Auditing and versioning create reproducibility: you can explain any figure, repeat the calculation and safely develop models/showcases. In iGaming, this is critical for finance (GGR/NET), payments, KYC/AML, Responsible Gaming and regulatory reporting.

Objectives:
  • Tracing: who changed the data/schema/logic and why.
  • Reproducibility: Which version of the data/code/model generated the report.
  • Release security: rollback and predictability of changes.
  • Compliance: provable logs for regulators and internal audits.

2) Concepts and version levels

1. Schema Version - Field/Type/Semantic Evolution (SEMVER).
2. Dataset Version-Snapshot/slice at a time "true" for report/training.
3. Data Product Version: formulas, filters, aggregations.
4. ML feature/model version: date/code/hyperparameters/feature/data (end-to-end).
5. Pipeline version: transformation code, configs, dependencies.
6. Data contract version: producer/consumer requirements (scheme, SLA, quality).


3) Audit: what to log

Who: subject (user/service), role/attributes (RBAC/ABAC).
What: Table/Showcase/Model/Scheme/Contract.
When: exact time, tz, correlation id.
Why: link to task/ticket/release note, reason.
Than: code/model version, commit hash, container image.
How has it changed: before/after (diff), row volume (rows affected), integrity control (hash/signature).
Context: environment (prod/stage), domain, data sensitivity (class).

Audit logs are append-only/WORM, signed, and available in SIEM.


4) Versioning policy (recommendations)

SEMVER: `MAJOR. MINOR. PATCH`

MAJOR - incompatible schema/semantics changes.
MINOR - reversibly compatible additions (new fields/columns with nullable, new vNext showcases).
PATCH - fixes without changing the contract (quality-fix, backfill).
Deviation-procedure: obsolescence window, warnings in the/CI directory, date of disconnection.
Release Notes: one page per release: what, why, risks, rollback plan.


5) Techniques in storage and streams

Time-travel/Snapshots: storing table versions; ability to execute the query "as it was on T-0."

SCD (Slowly Changing Dimensions): types 1/2/3 for dimensions (games, providers, players).
CDC/CDF (Change Data/Capture & Feed): incremental changes for facts (rates, payments, KYC).
Audit Fact-A separate fact table with edit/add/delete events.
Integrity control: batch/file hashes, package signatures, aggregate reconciliations.


6) Evolution of circuits and Data Contracts

Contract as code: schema, types, mandatory fields, allowed values, SLA freshness, DQ rules.
Compatibility: added → MINOR field; changed the type/semantics → MAJOR with migration and dual-write.
CI gate: PR changing scheme is blocked if compatibility is broken or there is no Release Notes.
Directory/Registry: stores active/obsolete versions and owners.


7) Versioning in BI and metrics

Certified "gold" showcases: fixed KPI semantics (GGR, ARPPU, retention).
Dual-run: a new version of the showcase is built in parallel (v2), comparison of metrics (tolerance bands).
Commit Reports - Each export/dashboard references a'dataset _ version' and a'definition _ version'.
Calendar sections: "dey-kat," "month-to-date" - are fixed on the data version.


8) Versioning in ML/MLOps

Model Registry: model, date, quality metrics, training data (dataset_version), feature versions (feature_set_version).
Feature Store: versioned feature groups; prohibition of "hot" fields without an explicit version.
Repro set: training code (commit), environment (Docker/conda lock), sid.
Champion-Challenger: parallel versions in sales, reports on quality, fairness and privacy.
Rollback: quick rollback to the previous stable model and feature set.


9) Rollback, backfill and fixes

Rollback plan: for each MAJOR/MINOR version - clear return steps.

Backfill playbook: source of truth, date range, order of recalculation, checksums, labels "recomputed = true."

Edit visibility: v2 replaces v1 only after comparison; all "historical" reports continue to reference their versions.


10) Safety and compliance in the audit

Event/package signing: producer signs, consumer verifies.
PII sanitation: the audit stores tokens that are not raw PII.
Legal Hold: No deletion of version/logs for the duration of the investigation.
DSAR: versions find and upload subject records by token; historical snapshots are taken into account.


11) Metrics and SLO

Repro Rate is the percentage of reports played from the data version/code ≥ the target threshold.
Coverage:% of tables with time-travel/audit log enabled.
Schema Compatibility Pass: rate of successful compatibility checks in CI.
Dual-run Delta: variance v1/v2 within tolerance.
Rollback MTTR: average version rollback time.
Audit Integrity - percentage of events signed and verified.
Backfill Success - percentage of recalculations completed correctly.


12) iGaming Patterns (Cases)

GGR correction retroactively: the supplier has recalculated RTP - we make backfill of facts for the period, fix 'recomputed _ at', publish Release Notes, compare v1/v2; we do not rewrite the reports for the past months, but mark "the corrected version is available."

Anti-fraud rules: we change the semantics of features - MAJOR, dual-run models and showcases, rollback to champion when regressing.
KYC/AML: added new provider statuses - MINOR with nullable; include compatibility tests in contracts.
RG signals: clarified the logic of the "series of losses" - MINOR + Release Notes and impact monitoring.


13) Tools and artifacts (categories)

Catalog/Lineage/Registry: set/schematic/storefront versions, owners, connections, contracts.
Orchestrator & CI/CD: compatibility gates, dual-run, release notes publishing.
Storage with time-travel: storage of snapshots/logs.
Signing & Checksums: batch signature, batch checksums.
Model/Feature Registry: feature/model versions, champion-challenger reports.


14) Templates (ready to use)

14. 1 Release Notes

Version: 'payments _ gold v2. 1. 0`

Type: MINOR (new fields' psp _ country ',' method _ group ')

Reason: PSP/Country Reporting Unification

Risks: Impact on display case'risk _ signals'

Validation: dual-run 14 days, delta ≤ 0. 2% GGR

Rollback: switch to 'v2. 0. 3 'via orchestrator flag

Deploy date/owner/ticket

14. 2 Kit version passport

Dataset: `game_rounds_silver`

Version: '2025-11-01T00: 00: 00Z' (snapshot id)

Schema: 'schema @ 1. 7. 0 '(contract reference)

Source: Provider Feeds A/B (commit...)

Integrity checksum signed manifest

DQ: Completeness 99. 9%, freshness ≤ 15 min

Uses: 'games _ perf _ gold v3. x`, `rg_signals v1. x`

14. 3 Change audit report

Event: update schema 'kyc _ status' → 'kyc _ status, v2'

user/service, 'Data-Engineer' role

When: '2025-11-01 09:32:10 + 02'

Why: Ticket # 3421 (new provider statuses)

Diff: + 'status _ reason' (nullable), enum extended

Checks: CI semver pass, MINOR contract

Caption: 'sig =...', hash diff: 'sha256 =...'

14. 4 Versioning policy (fragment)

MAJOR: breaks compatibility; dual-write ≥ 30 days; mandatory rollback plan.
MINOR: reversibly compatible; Warnings in the directory A/B storefronts 7-14 days.
PATCH: quality fixes/recalculations; Release Notes required.
Archiving: we store snapshots for regulation ≥ N months; WORM for audit.


15) Processes (end-to-end)

1. Initiative: change ticket + linedge impact score.
2. Engineering Contract/Schema Update + Release Notes.
3. Validation: CI compatibility checks, DQ tests, dual-run.
4. Deploy: by flag, canary; Publish the version to the catalog.
5. Monitoring: delta v1/v2, KPI, complaints.
6. Backfill: By regression playbook.
7. Post-mortem: if incident, update policy/tests.


16) RACI (example)

Policies and Standards: CDO (A), Data Governance Council (R/A), DPO/Sec (C).
Contracts/schemes: Domain Owners (A), Data Stewards (R), Platform/Eng (C).
Orchestration/Storage: Platform/Eng (R), SRE (C).
BI/metrics: Analytics Lead (R), Product/Finance (C).
ML versions: ML Lead (A), DS (R), Platform (C).
Audit/Logs: SecOps (R), Internal Audit (C).


17) Implementation Roadmap

0-30 days (MVP)

Enable time-travel/snapshots for critical tables (payments, game_rounds, kyc).
Run immutable audit logs and signature of ingestion packages.
Accept SEMVER policy and Release Notes template.
Catalog: add 'owner', 'schema _ version', 'dataset _ version' to top showcases.

30-90 days

Enter dual-run for all MINOR/MAJOR; automatic v1/v2 comparison.
Associate contracts with compatibility and DQ CI gates.
Backfill/rollback regulation; train teams.
Model/Feature Registry with a full set of dannyye→fichi→model→inferens links.

3-6 months

Full audit log coverage, WORM storage, reports for regulators.
Automated Release Notes from diff + lineage.
Repro Rate/Schema Compatibility/Rollback MTTR reports in dashboards.
Quarterly reviews of KPI versions and "freezing" of definitions.


18) Anti-patterns

Changing KPI semantics without a new version/release note.
Recalculations "quietly" without a backfill plan and'recomputed 'marks.
Storage of raw PII in audit logs.
Lack of dual-run and instant window replacement.
"Eternal" models/showcases without specifying the version and sources.


19) Related Sections

Data Management, Data Origin and Path, Access Control, Tokenization, Security and Encryption, Model Monitoring, Ethics and DSAR, Federated Learning, Confidential ML.


Result

Auditing and versioning turn data and models into a reliable product: each change is transparent, reproducible and reversible. For iGaming, this is the foundation of trust in KPIs, sustainability of compliance and speed of secure releases.

Contact

Get in Touch

Reach out with any questions or support needs.We are always ready to help!

Start Integration

Email is required. Telegram or WhatsApp — optional.

Your Name optional
Email optional
Subject optional
Message optional
Telegram optional
@
If you include Telegram — we will reply there as well, in addition to Email.
WhatsApp optional
Format: +country code and number (e.g., +380XXXXXXXXX).

By clicking this button, you agree to data processing.