Deploying Configurations
1) Why
Configuration changes more often than code and directly affects revenue (PSP routing, limits, coefficients, front-end features) and compliance (KYC/AML, RG). We need a repeatable, verifiable and reversible process for delivering configs to production, with strict access controls and observability.
2) Principles
1. Configuration as Data: configs are versioned data (YAML/JSON), not "manual clicks."
2. Schema-first: every change is validated against a schema (JSON Schema/Protobuf).
3. Policies as code: gates, access rules and SoD live in the policy repository.
4. GitOps: Git is the single source of truth; a reconciler brings clusters in line with it.
5. Progressive delivery: canary rollouts by segment (GEO/tenant/bank/provider).
6. Zero-downtime: atomic switches, double buffering, format compatibility.
7. Observability by design: audit, apply metrics, drift detector.
8. Security: least privilege, secrets kept separately, SoD/4-eyes for risky changes.
3) Configuration model
Static: change rarely and require a release (ports, core timeouts).
Dynamic: applied without restarts (PSP routing, features, limits).
Secrets: keys/tokens; handled in a separate pipeline (KMS/Secret Manager).
Artifacts: rule files/mappings (BIN→bank, GEO→PSP, bonus limits).
Addressing keys: 'tenant', 'region', 'environment', 'service', 'version', 'segment' (psp/bank_group/device).
4) Formats and schemas
Example schema (JSON Schema, payments-routing):
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "pspRouting",
  "type": "object",
  "properties": {
    "version": {"type": "string"},
    "rules": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["geo", "binGroup", "primary", "fallback"],
        "properties": {
          "geo": {"type": "string"},
          "binGroup": {"type": "string"},
          "primary": {"type": "string"},
          "fallback": {"type": "array", "items": {"type": "string"}},
          "limits": {"type": "object", "properties": {"perMinute": {"type": "integer"}}}
        }
      }
    }
  },
  "required": ["version", "rules"]
}
```
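In CI this check would normally be done by a full JSON Schema implementation (for example the Python `jsonschema` package). As a dependency-free illustration, the sketch below hand-codes the same constraints as the pspRouting schema, so it is clear what "every change is validated against a schema" rejects. The function name and the error-message format are hypothetical.

```python
from typing import Any


def validate_psp_routing(doc: Any) -> list[str]:
    """Check a document against the pspRouting schema; return violations (empty = valid)."""
    if not isinstance(doc, dict):
        return ["root: expected object"]
    errors: list[str] = []
    for key in ("version", "rules"):
        if key not in doc:
            errors.append(f"root: missing required '{key}'")
    if "version" in doc and not isinstance(doc["version"], str):
        errors.append("version: expected string")
    rules = doc.get("rules", [])
    if not isinstance(rules, list):
        return errors + ["rules: expected array"]
    for i, rule in enumerate(rules):
        path = f"rules[{i}]"
        if not isinstance(rule, dict):
            errors.append(f"{path}: expected object")
            continue
        for key in ("geo", "binGroup", "primary", "fallback"):
            if key not in rule:
                errors.append(f"{path}: missing required '{key}'")
        for key in ("geo", "binGroup", "primary"):
            if key in rule and not isinstance(rule[key], str):
                errors.append(f"{path}.{key}: expected string")
        fallback = rule.get("fallback", [])
        if not isinstance(fallback, list) or not all(isinstance(p, str) for p in fallback):
            errors.append(f"{path}.fallback: expected array of strings")
        limits = rule.get("limits", {})
        if not isinstance(limits, dict):
            errors.append(f"{path}.limits: expected object")
        elif "perMinute" in limits:
            pm = limits["perMinute"]
            # JSON Schema "integer" excludes booleans (ints in Python) but accepts 600.0.
            is_int = not isinstance(pm, bool) and (
                isinstance(pm, int) or (isinstance(pm, float) and pm.is_integer())
            )
            if not is_int:
                errors.append(f"{path}.limits.perMinute: expected integer")
    return errors
```

A CI job would run this on every changed file and fail the PR if the returned list is non-empty.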
5) Lifecycle (GitOps)
1. Authoring: a PR to the config repository containing the data plus a ticket/change link.
2. Lint/Validate: CI checks schemas, references and semantics (conflicting rules), then runs a dry-run on a staging stand.
3. Policy Gates: SoD/4-eyes, risk class, freeze windows, current SLO status.
4. Staging Apply: run integration tests/synthetics and check SLIs.
5. Progressive Delivery: production canaries (1-5%) → 25% → region/tenant → 100%.
6. Post-monitoring: watch metrics/alerts for 30-60 min; record the outcome.
7. Promotion/Rollback: release tags; fast rollback via Git revert or "promote previous version."
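The policy-gate step can be sketched as a pure function that returns the reasons a change is blocked. This is a minimal sketch under stated assumptions: the `Change`/`Policy` shapes, the two risk classes ("low"/"high") and the "two independent approvals for high risk" rule are hypothetical; a real setup would express them in the policy repository.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Change:
    author: str
    approvers: set[str]
    risk_class: str      # hypothetical classes: "low" or "high"
    slo_green: bool      # current SLO status as reported by monitoring
    apply_at: datetime   # planned apply time, same timezone as the freeze windows


@dataclass
class Policy:
    freeze_windows: list[tuple[datetime, datetime]] = field(default_factory=list)
    high_risk_approvals: int = 2   # 4-eyes: two approvers besides the author


def evaluate_gates(change: Change, policy: Policy) -> list[str]:
    """Return the reasons a change is blocked; an empty list means every gate passes."""
    reasons: list[str] = []
    # SoD: the author can never approve their own change.
    independent = change.approvers - {change.author}
    required = policy.high_risk_approvals if change.risk_class == "high" else 1
    if len(independent) < required:
        reasons.append(f"sod: {len(independent)}/{required} independent approvals")
    # Freeze windows are half-open intervals [start, end).
    if any(start <= change.apply_at < end for start, end in policy.freeze_windows):
        reasons.append("freeze: apply time falls inside a freeze window")
    # SLO gate: no promotion while SLO status is red.
    if not change.slo_green:
        reasons.append("slo: SLO status is not green")
    return reasons
```

Returning all reasons instead of stopping at the first one lets the PR bot show the author everything that must be fixed in a single pass.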
6) Rollout strategies
Canary by segment: 'tenant=A, geo=TR, bank=BIN_XXXX'.
By region: EU→LATAM→APAC, accounting for peak hours.
By functionality: enabling a feature flag with guardrails (TTL, coverage limits).
Blue/Green for the config source: switching readers to a new snapshot.
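A canary by segment needs two decisions per request: does the request match the segment, and does it fall inside the current coverage percentage. The sketch below assumes a stable `user_id` in the request context and uses salted SHA-256 bucketing, a common technique rather than anything mandated by this document; the function names are hypothetical.

```python
import hashlib


def in_canary(entity_id: str, release_id: str, coverage_pct: float) -> bool:
    """Deterministically decide whether entity_id falls inside the canary cohort."""
    # Salting with release_id gives each release its own cohort; for a given
    # release the same entity always lands in the same bucket.
    digest = hashlib.sha256(f"{release_id}:{entity_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999, 0.01% granularity
    # Raising coverage only adds buckets, so the 5% cohort stays inside the 25% one.
    return bucket < coverage_pct * 100


def matches_segment(ctx: dict[str, str], segment: dict[str, str]) -> bool:
    """Every segment key (tenant, geo, bank, ...) must equal the request context value."""
    return all(ctx.get(key) == value for key, value in segment.items())


def pick_config(ctx: dict[str, str], release_id: str, segment: dict[str, str],
                coverage_pct: float, new_cfg: object, old_cfg: object) -> object:
    """Serve the new config only to in-segment requests inside the canary cohort."""
    if matches_segment(ctx, segment) and in_canary(ctx["user_id"], release_id, coverage_pct):
        return new_cfg
    return old_cfg
```

Because bucketing is deterministic, a player does not flip between old and new routing on every request, and ramping from 5% to 25% keeps the original canary users in the cohort.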
7) Dynamic loading and compatibility
Hot reload (watchers, Consul, OTel Collector pipeline reload).
Dual format (v1 + v2): readers accept both formats, writers emit only the new one.
Consistency: expose the config version in API responses/metrics to see which instance runs which configuration.
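Hot reload with atomic switching and double buffering can be sketched as a holder that parses and validates a new config off to the side, then swaps a single reference. This is an in-process illustration, assuming JSON payloads and a caller-supplied validator; `ConfigHolder` and `Snapshot` are hypothetical names, and in practice a watcher (file, Consul, etc.) would call `reload`.

```python
import json
import threading
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Callable, Mapping, Optional


@dataclass(frozen=True)
class Snapshot:
    version: str              # exposed in API responses/metrics ("who is on what")
    data: Mapping[str, Any]   # shallow read-only view of the parsed config


class ConfigHolder:
    """Double-buffered config: the active snapshot plus the previous one for rollback."""

    def __init__(self, initial: Snapshot, validate: Callable[[Any], list]):
        self._current = initial
        self._previous: Optional[Snapshot] = None
        self._validate = validate        # returns a list of errors; empty means valid
        self._lock = threading.Lock()    # serialises writers; readers never take it

    def get(self) -> Snapshot:
        # One reference read: a caller sees the old or the new snapshot, never a mix.
        return self._current

    def reload(self, version: str, raw: str) -> bool:
        """Parse and validate first, then swap. Returning False keeps the old config."""
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            return False
        if not isinstance(data, dict) or self._validate(data):
            return False
        fresh = Snapshot(version, MappingProxyType(data))
        with self._lock:
            self._previous, self._current = self._current, fresh
        return True

    def rollback(self) -> bool:
        """Switch readers back to the previous snapshot, if one exists."""
        with self._lock:
            if self._previous is None:
                return False
            self._current, self._previous = self._previous, self._current
        return True
```

A broken or invalid payload never reaches readers: `reload` simply reports failure, which the deployment metrics can count as a parsing/validation error.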
8) Security, Secrets, SoD
Secrets separately: storage in KMS/Secret Manager, encryption at the field level, access via ABAC.
SoD/4-eyes: changes to payment routing, bonus limits or PII export go through double approval only.
JIT rights: temporary tokens for operations, full audit.
Security checks: the linter prohibits PII and test keys in production configs.
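A minimal sketch of such a linter walks the config tree and flags suspicious leaves. The patterns are illustrative assumptions, not a complete PII/secret detector: secret-like key names must hold a reference (the `kms://`/`vault://`/`secretmanager://` scheme is hypothetical) rather than a literal, `sk_test_`/`pk_test_` prefixes stand in for test keys, and email addresses stand in for PII.

```python
import re
from typing import Any, Iterator

SECRET_KEY = re.compile(r"(password|secret|token|api[_-]?key|private[_-]?key)", re.I)
SECRET_REF = re.compile(r"^(kms|vault|secretmanager)://")   # hypothetical reference scheme
TEST_KEY = re.compile(r"\b(sk|pk)_test_\w+")                  # example test-key prefixes
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")               # crude PII signal


def walk(node: Any, path: str = "$", key: str = "") -> Iterator[tuple[str, str, Any]]:
    """Yield (json_path, nearest_key, value) for every scalar leaf."""
    if isinstance(node, dict):
        for k, v in node.items():
            yield from walk(v, f"{path}.{k}", str(k))
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from walk(v, f"{path}[{i}]", key)
    else:
        yield path, key, node


def lint_prod_config(cfg: Any) -> list[str]:
    """Return findings that should block a production config change."""
    findings = []
    for path, key, value in walk(cfg):
        text = str(value)
        if SECRET_KEY.search(key) and not SECRET_REF.match(text):
            findings.append(f"{path}: inline secret; use a KMS/Secret Manager reference")
        if TEST_KEY.search(text):
            findings.append(f"{path}: test key in production config")
        if EMAIL.search(text):
            findings.append(f"{path}: possible PII (email address)")
    return findings
```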
9) Pre-deployment validations
Schemas (JSON Schema/Protobuf), linters, cardinality checks.
Domain semantics: no loops, duplicates or "black holes"; compatibility with current dependencies.
Shadow traffic/simulations: run new routing/rules against the real stream in read-only mode (no writes).
SLO gate: red SLI → ban on promotion.
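For the routing config, the domain-semantics checks and the SLO gate can be sketched as follows. The interpretations are assumptions: a "duplicate" is two rules for the same (geo, binGroup); a "loop" is a PSP repeated in one primary/fallback chain; a "black hole" is a rule that routes to a PSP absent from a known-provider list. SLI statuses are assumed to arrive as "green"/"red" strings.

```python
from collections import Counter


def semantic_check(rules: list[dict], known_psps: set[str]) -> list[str]:
    """Domain checks that a schema alone cannot express."""
    problems = []
    counts = Counter((r["geo"], r["binGroup"]) for r in rules)
    for (geo, bin_group), n in counts.items():
        if n > 1:
            problems.append(f"duplicate: {n} rules for geo={geo} binGroup={bin_group}")
    for i, r in enumerate(rules):
        chain = [r["primary"], *r["fallback"]]
        if len(set(chain)) != len(chain):
            problems.append(f"rules[{i}]: loop, a PSP repeats in its primary/fallback chain")
        unknown = sorted({p for p in chain if p not in known_psps})
        if unknown:
            problems.append(f"rules[{i}]: black hole, unknown PSP(s) {unknown}")
    return problems


def may_promote(problems: list[str], sli_status: dict[str, str]) -> bool:
    """SLO gate: promote only with no semantic problems and every SLI green."""
    return not problems and all(status == "green" for status in sli_status.values())
```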
10) Observability and audit
Deployment metrics: apply time, success rate, coverage, parsing errors, rollbacks.
Events: who/what/when/why, plus the diff (with secrets masked).
Drift detector: compares "what's in Git" with "what's in runtime"; alerts on any discrepancy.
Exemplars: link the 'trace_id' of config reads to the metrics.
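A drift detector boils down to comparing canonical fingerprints of the desired (Git) and runtime state per config key. This sketch assumes both sides can be loaded as dicts keyed by a config path; the key names and drift labels are hypothetical.

```python
import hashlib
import json


def fingerprint(cfg: dict) -> str:
    """Stable hash: key order and whitespace do not affect the result."""
    canonical = json.dumps(cfg, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_drift(desired: dict[str, dict], runtime: dict[str, dict]) -> dict[str, str]:
    """Compare Git (desired) and runtime per config key; return key -> kind of drift."""
    drift = {}
    for key in desired.keys() | runtime.keys():
        if key not in runtime:
            drift[key] = "missing_in_runtime"
        elif key not in desired:
            drift[key] = "unmanaged_in_runtime"    # e.g. a manual edit in an admin panel
        elif fingerprint(desired[key]) != fingerprint(runtime[key]):
            drift[key] = "content_mismatch"
    return drift
```

Canonicalising before hashing avoids false alerts when a runtime store re-serialises the same data with different key order.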
11) Catalog of typical configs (iGaming)
Payments routing: PSP by GEO/BIN/method; retry limits; 3DS flags.
KYC/AML: providers, timeouts, TTL, fallback/manual validation rules.
Risk & RG: velocity limits, day/month caps, geo-exceptions.
Games/Core: cache coefficients, pool sizes, feature flags (replay history, new modes).
Ops/Observability: alert thresholds, sampling rules, retention classes, synthetics.
Status/Comms: message templates, localizations, update schedule.
12) Sample Configuration Package (Manifest)
```yaml
apiVersion: cfg.platform/v1
kind: ConfigRelease
metadata:
  id: payments-routing-2025-11-01
  change: "RTE-421: reroute TR BIN_4571 → PSP2"
spec:
  scope:
    tenants: [brandA, brandB]
    regions: [EU]
    segments:
      geo: [TR]
  strategy:
    steps:
      - name: canary
        coverage: "5%"
        duration: "20m"
      - name: ramp
        coverage: "25%"
        duration: "30m"
      - name: region-full
        coverage: "100%"
  gates:
    require:
      - policy: "slo-green"
      - approval: ["Payments Lead", "Compliance"]
      - freeze: "not-in-effect"
  rollback:
    to: "payments-routing-2025-10-29"
    autoIf:
      - metric: "auth_success_rate"
        condition: "drop>10% for 10m"
```
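The `autoIf` condition in the manifest can be evaluated by a small watcher. This sketch assumes the meaning "the metric has stayed more than 10% below its pre-rollout baseline for at least 10 minutes", a condition grammar of exactly `drop>{percent}% for {minutes}m`, and (timestamp, value) samples; all of these are assumptions, not a documented format.

```python
import re
from datetime import datetime, timedelta

# Hypothetical grammar for autoIf conditions: "drop>{percent}% for {minutes}m"
CONDITION = re.compile(r"drop>(\d+(?:\.\d+)?)% for (\d+)m")


def should_auto_rollback(condition: str, baseline: float,
                         samples: list[tuple[datetime, float]], now: datetime) -> bool:
    """True once the metric has stayed more than X% below baseline for at least N minutes."""
    m = CONDITION.fullmatch(condition.strip())
    if m is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    threshold = baseline * (1 - float(m.group(1)) / 100)
    window = timedelta(minutes=int(m.group(2)))
    breach_start = None   # timestamp of the first sample in the current breach streak
    for ts, value in sorted(s for s in samples if s[0] <= now):
        if value < threshold:
            if breach_start is None:
                breach_start = ts
        else:
            breach_start = None   # any healthy sample resets the streak
    return breach_start is not None and now - breach_start >= window
```

Requiring an unbroken streak, rather than a windowed average, keeps a single noisy dip from triggering a rollback while still reacting within the declared 10 minutes of sustained degradation.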
13) Rollbacks and change safety
Rollback via Git: 'revert' / 'promote previous'.
Atomic switch: readers switch to the old snapshot.
Auto-rollback criteria: SLI/KRI degradation, increase in parsing/validator errors.
Communications: incident-bot publishes status during auto-rollback.
14) Multi-tenant and geo-residency
File/folder and key-level namespaces ('tenant/region/env').
Reading policies: services see only their scope.
Geo-copies of configs (EU/LATAM/APAC) with replication-latency SLAs.
Different roll-out windows for different jurisdictions (compliance/holidays).
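Namespaced keys and scoped reads can be sketched as a prefix check on 'tenant/region/env/<name>' keys. This assumes the caller's scope comes from its service identity (for example via ABAC attributes); `Scope`, `read_config` and `visible_keys` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    tenant: str
    region: str
    env: str

    def prefix(self) -> str:
        # The trailing slash stops "brandA/EU/prod" from matching "brandA/EU/prod2/...".
        return f"{self.tenant}/{self.region}/{self.env}/"


def read_config(store: dict[str, dict], caller: Scope, key: str) -> dict:
    """Return the config at key only if it lies inside the caller's scope."""
    if not key.startswith(caller.prefix()):
        raise PermissionError(f"{key!r} is outside scope {caller.prefix()!r}")
    return store[key]


def visible_keys(store: dict[str, dict], caller: Scope) -> list[str]:
    """List only the keys the caller is allowed to see."""
    return sorted(k for k in store if k.startswith(caller.prefix()))
```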
15) Performance and cost (FinOps)
Snapshot cache: local/distributed; TTL/ETag/If-None-Match.
Size of configs: limits on the volume and depth of structures; modularization.
Access map: top consumers of reads; optimisation of polling frequency.
Cost of errors: a count of "expensive" rollbacks and extra canary rounds.
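A snapshot cache that combines a TTL with ETag/If-None-Match revalidation can be sketched in-process. The "server" here is a toy function standing in for an HTTP endpoint; the status codes follow HTTP conditional-request semantics (304 Not Modified), but the function names, the truncated SHA-256 ETag and the injectable clock are assumptions made for illustration and testing.

```python
import hashlib
import json
import time
from typing import Callable, Optional, Tuple

Response = Tuple[int, Optional[bytes], str]   # (HTTP status, body, ETag)


def make_config_endpoint(get_snapshot: Callable[[], dict]) -> Callable[[Optional[str]], Response]:
    """Toy server: ETag derived from the snapshot bytes, 304 when the client is current."""
    def handle(if_none_match: Optional[str]) -> Response:
        body = json.dumps(get_snapshot(), sort_keys=True).encode("utf-8")
        etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
        if if_none_match == etag:
            return 304, None, etag
        return 200, body, etag
    return handle


class SnapshotCache:
    """Client: serve from memory within the TTL, then revalidate with If-None-Match."""

    def __init__(self, fetch: Callable[[Optional[str]], Response], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch, self._ttl, self._clock = fetch, ttl_s, clock
        self._etag: Optional[str] = None
        self._data: Optional[dict] = None
        self._checked_at = float("-inf")
        self.full_downloads = 0   # bodies actually transferred (a FinOps signal)

    def get(self) -> dict:
        now = self._clock()
        if self._data is not None and now - self._checked_at < self._ttl:
            return self._data                 # fresh: no request at all
        status, body, etag = self._fetch(self._etag)
        if status == 200:
            self._data, self._etag = json.loads(body), etag
            self.full_downloads += 1
        elif status != 304:
            raise RuntimeError(f"unexpected status {status}")
        self._checked_at = now
        return self._data
```

The TTL bounds how often clients ask at all, and the ETag makes most of the remaining requests cheap 304s; `full_downloads` per consumer is the kind of number an access map would rank.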
16) Integrations
Alerting/SLO: promotion gates, auto-rollbacks.
Release gates: block code releases while a config rollout is still in progress.
Incident bot: commands '/config promote', '/config rollback'; links to diffs and dashboards.
Workflow Engine: human-task for high-risk changes; escalation timers.
17) KPIs/KRIs
Config lead time: PR→prod.
Change Failure Rate (CFR): percentage of changes with rollback.
MTTR for config incidents.
Drift rate - Git↔runtime discrepancy rate.
SLO-gates pass rate: the proportion of changes that passed gates without manual exceptions.
Cost per change: CPU/IO, canaries, incidents.
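Several of these indicators are simple arithmetic over change records. The sketch assumes a hypothetical `ChangeRecord` shape with PR/prod timestamps, a rollback flag, a manual-gate-override flag and optional incident start/resolve times; it uses the median for lead time and the mean for MTTR, which are common but not the only choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class ChangeRecord:
    pr_opened: datetime
    in_prod: datetime
    rolled_back: bool
    gate_exception: bool                        # promoted with a manual gate override
    incident_start: Optional[datetime] = None
    incident_resolved: Optional[datetime] = None


def config_kpis(changes: list[ChangeRecord]) -> dict:
    """Lead time (median), CFR, MTTR (mean) and SLO-gate pass rate over a non-empty list."""
    lead_time = median(c.in_prod - c.pr_opened for c in changes)
    cfr = sum(c.rolled_back for c in changes) / len(changes)
    repairs = [c.incident_resolved - c.incident_start for c in changes
               if c.incident_start and c.incident_resolved]
    mttr = sum(repairs, timedelta()) / len(repairs) if repairs else None
    gate_pass = sum(not c.gate_exception for c in changes) / len(changes)
    return {"lead_time_median": lead_time, "cfr": cfr,
            "mttr": mttr, "slo_gate_pass_rate": gate_pass}
```

For example, four changes with lead times of 4, 2, 10 and 6 hours give a median lead time of 5 hours; two rollbacks out of four give a CFR of 0.5.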
18) Implementation Roadmap (6-10 weeks)
Weeks 1-2: config catalog, schemas, linters; Git repo; baseline CI (validation/diff).
Weeks 3-4: GitOps reconciler, dry-run/staging, status dashboards; feature flags with TTL.
Weeks 5-6: policy as code (SoD/windows/freeze/SLO gates), canary rollouts, auto-rollback.
Weeks 7-8: drift detector, secrets via KMS, multi-tenant and geo-copies, incident bot integration.
Weeks 9-10: load/chaos tests of rollouts, FinOps report, team training and templates.
19) Artifact patterns
PR Template: goal, risk class, scope (tenant/region), rollout plan, rollback plan, dry-run results.
Policy Pack: SLO gates, SoD/4-eyes, freeze calendar, size/cardinality limits.
Runbook: "how to read the current version/diff/state of the canaries," "how to manually stop the promotion."
Config Catalog: owner, schema, readers, update frequency, compliance notes.
20) Antipatterns
Manual edits "in the admin panel" without Git/audit.
Configs baked into the release artifact together with code, so they cannot be hot-swapped.
No schemas/validation → failures at parse time.
Global one-shot rollout without canaries.
Secrets mixed into regular configs; secrets in Git.
Feature flags without rollback, TTL or guardrails.
No drift detector.
Bypassing SLO gates "over a phone call" without a record.
Summary
Deployment of configurations is a managed pipeline: data with schemas → policies and gates → GitOps and progressive delivery → hot loading and reversibility → observability and audit → security and cost. This framework allows you to quickly and safely change the behavior of the iGaming platform, while maintaining SLO, revenue and compliance.