Feature Flags and Release Management
1) Why flags if there are releases?
Feature flags decouple deploying code from enabling a feature: the code reaches production early and safely, while business activation is controlled from a config or console, with targeting by segment, traffic percentage, market, VIP/regulatory group, device, and so on.
Pros:
- Release speed and safety: small increments plus instant rollback.
- Blast-radius control: progressive rollout, rings, SLO stoppers.
- Experiments and A/B: multivariate flags, effect statistics.
- Operational scenarios: kill-switch for risky payment/gaming paths.
Key principle: "ship dark, enable bright" - deploy ahead of time, enable deliberately.
2) Flag types
Boolean: on/off features, emergency stop flags (kill-switch).
Multivariate: behaviors (A/B/C algorithm, limits, coefficients).
Config/Remote Config: parameters (timeouts, bet limits, bonus amount).
Permission/Entitlement: access to functions/limits by roles/tiers.
Operational: traffic routing (shadow requests, switching on a new service).
3) Architecture and data flows
Control Plane: console/flag server, storing rules/segments, auditing.
Data Plane (SDK/Proxy/Edge): fetches and caches flags, evaluates rules locally (minimum latency), and falls back to defaults when the control plane is unavailable.
Delivery options:
- Pull: the SDK periodically synchronizes the config (ETag/stream).
- Push/Streaming: Server-Sent Events/WebSocket.
- Edge Cache/Proxy: closer to the user, lowers p99.
Requirements:
- Local rule evaluation (no network hop in the hot path).
- Timeouts and fallbacks (flag reads never block).
- Signing and versioning of config snapshots.
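The data-plane requirements above (local evaluation, non-blocking reads, versioned snapshots) can be sketched as a small in-memory store. This is a minimal illustration, not any particular SDK's API: the `Snapshot` shape, the `LocalFlagStore` class, and its method names are assumptions made for the example.

```typescript
// Hypothetical snapshot shape; real platforms define their own format.
interface Snapshot {
  version: number;
  flags: Record<string, boolean>;
}

class LocalFlagStore {
  private snapshot: Snapshot = { version: 0, flags: {} };

  // Called by the background sync (pull or push), never on the request path.
  apply(next: Snapshot): boolean {
    if (next.version <= this.snapshot.version) return false; // ignore stale or replayed snapshots
    this.snapshot = next;
    return true;
  }

  // Hot-path read: in-memory lookup only; unknown flags get the caller's safe default.
  bool(key: string, fallback: boolean): boolean {
    const value = this.snapshot.flags[key];
    return value === undefined ? fallback : value;
  }
}

const store = new LocalFlagStore();
const beforeSync = store.bool("new_withdrawal_flow", false); // no snapshot yet, so the default is used
store.apply({ version: 2, flags: { new_withdrawal_flow: true } });
const afterSync = store.bool("new_withdrawal_flow", false);
const staleAccepted = store.apply({ version: 1, flags: { new_withdrawal_flow: false } });
```

Because `bool` never touches the network, an unavailable control plane only means the last applied snapshot (or the default) keeps being served.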
4) Targeting and segments
Attributes: country/region, language, platform, KYC level, VIP level, risk score, account age, payment method, responsible-gaming limits.
Segments: saved rules with versions; "soft" (marketing) and "hard" (compliance).
Priorities/conflicts: explicit rule order; "last match wins" is not allowed without tests.
Geo/regulatory: product availability flags by jurisdiction; single-variable predicates (for example, barring rebates in a specific country).
json
{
  "flag": "new_withdrawal_flow",
  "default": false,
  "rules": [
    {"when": {"country": "CA", "kyc_level": "FULL"}, "rollout": 25},
    {"when": {"segment": "vip_tier_3_plus"}, "rollout": 100},
    {"when": {"country": "DE"}, "force": false}
  ],
  "expiresAt": "2026-03-31T00:00:00Z"
}
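To make the rule format above concrete, here is a minimal evaluation sketch. Several things are assumptions made for the example rather than part of the source: the first matching rule wins, segment membership arrives as a plain `segment` attribute in the context, and bucketing uses a simple polynomial string hash (production SDKs generally use stronger hashes).

```typescript
// Hypothetical types mirroring the JSON rule above; not a real SDK's schema.
type Context = Record<string, string>;

interface Rule {
  when: Context;     // every listed attribute must match exactly
  rollout?: number;  // percentage of matching users, 0-100
  force?: boolean;   // hard override that ignores rollout
}

interface FlagConfig {
  flag: string;
  default: boolean;
  rules: Rule[];
}

// Simple polynomial string hash (illustrative only). It is deterministic,
// so the same flag/user pair always lands in the same bucket (stickiness).
function bucket(flag: string, userKey: string): number {
  const input = flag + ":" + userKey;
  let hash = 0;
  for (let i = 0; i < input.length; i++) {
    hash = (hash * 31 + input.charCodeAt(i)) >>> 0; // keep within 32 bits
  }
  return hash % 100; // 0..99
}

// Assumption: the first matching rule wins.
function evaluate(config: FlagConfig, userKey: string, ctx: Context): boolean {
  for (const rule of config.rules) {
    const matches = Object.keys(rule.when).every((k) => ctx[k] === rule.when[k]);
    if (!matches) continue;
    if (rule.force !== undefined) return rule.force;
    return bucket(config.flag, userKey) < (rule.rollout ?? 0);
  }
  return config.default;
}

const config: FlagConfig = {
  flag: "new_withdrawal_flow",
  default: false,
  rules: [
    { when: { country: "CA", kyc_level: "FULL" }, rollout: 25 },
    { when: { segment: "vip_tier_3_plus" }, rollout: 100 },
    { when: { country: "DE" }, force: false },
  ],
};

const vip = evaluate(config, "hash_abcdef", { country: "FR", segment: "vip_tier_3_plus" });
const german = evaluate(config, "hash_123456", { country: "DE" });
const other = evaluate(config, "hash_777777", { country: "FR" });
```

Under first-match semantics the order of rules matters: a "hard" compliance rule only takes precedence over a broader rule if it is listed before it, which is one reason to test rule order explicitly.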
5) Progressive rollout: Strategies
Canary by percentage: 1% → 5% → 25% → 50% → 100% with SLO auto-stop.
Rings: internal team → beta users → one region → globally.
Sampling by device/client: keep assignment sticky (hash of the ID).
Shadow traffic: duplicating a request to a new path without affecting the user.
Dark launch: enabled but not visible to users (collecting metrics, warming up caches).
Auto-stop conditions (example):
- p95 latency of the 'withdraw' API degrades by more than 15% within 10 minutes.
- 5xx error rate > 0.5%, or payment-provider failures rise by more than 0.3 p.p.
- Fraud/risk-scoring alerts exceed the threshold in the segment.
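The latency and error thresholds listed above can be checked by a small guard function. The sketch below is illustrative: the `GuardWindow` field names are hypothetical, and it assumes the caller has already aggregated the metrics over the 10-minute observation window.

```typescript
// Hypothetical metric snapshot for one observation window.
interface GuardWindow {
  withdrawP95BaselineMs: number;
  withdrawP95CurrentMs: number;
  error5xxRatePct: number;
  providerFailureBaselinePct: number;
  providerFailureCurrentPct: number;
}

// Returns the list of violated guards; an empty list means the rollout may continue.
function stopReasons(w: GuardWindow): string[] {
  const reasons: string[] = [];
  const latencyIncreasePct =
    ((w.withdrawP95CurrentMs - w.withdrawP95BaselineMs) / w.withdrawP95BaselineMs) * 100;
  if (latencyIncreasePct > 15) reasons.push("withdraw p95 latency up by more than 15%");
  if (w.error5xxRatePct > 0.5) reasons.push("5xx rate above 0.5%");
  if (w.providerFailureCurrentPct - w.providerFailureBaselinePct > 0.3) {
    reasons.push("payment-provider failures up by more than 0.3 p.p.");
  }
  return reasons;
}

const healthy: GuardWindow = {
  withdrawP95BaselineMs: 200,
  withdrawP95CurrentMs: 210,
  error5xxRatePct: 0.1,
  providerFailureBaselinePct: 1.0,
  providerFailureCurrentPct: 1.1,
};
const degraded: GuardWindow = {
  withdrawP95BaselineMs: 200,
  withdrawP95CurrentMs: 240,
  error5xxRatePct: 0.8,
  providerFailureBaselinePct: 1.0,
  providerFailureCurrentPct: 1.5,
};
const healthyReasons = stopReasons(healthy);
const degradedReasons = stopReasons(degraded);
```

Returning reasons rather than a bare boolean keeps the audit trail useful: the same strings can be attached to the rollback event.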
6) Kill-switch
A separate flag class visible to SRE/on-call.
Guaranteed local evaluation with a TTL cache (milliseconds).
Irreversible shutdowns: require a reason and a postmortem ticket.
Automated integration actions: disabling a bonus, switching payments to manual mode, blocking deposits for provider X.
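A kill-switch read with a TTL cache might look like the sketch below. The `KillSwitch` class is hypothetical, and the choice to treat an expired or missing value as the fail-safe state is an assumption for this example; whether "engaged" or "disengaged" is the safe state depends on the flow being protected.

```typescript
class KillSwitch {
  private readonly ttlMs: number;
  private readonly failSafeEngaged: boolean;
  private cached: { engaged: boolean; fetchedAtMs: number } | null = null;

  constructor(ttlMs: number, failSafeEngaged: boolean) {
    this.ttlMs = ttlMs;
    this.failSafeEngaged = failSafeEngaged;
  }

  // Called by the background sync whenever the control plane is reachable.
  refresh(engaged: boolean, nowMs: number): void {
    this.cached = { engaged, fetchedAtMs: nowMs };
  }

  // Hot-path read: no network call; an expired or missing value yields the fail-safe state.
  isEngaged(nowMs: number): boolean {
    if (this.cached === null || nowMs - this.cached.fetchedAtMs > this.ttlMs) {
      return this.failSafeEngaged;
    }
    return this.cached.engaged;
  }
}

// Timestamps are passed in explicitly so the behavior is easy to test.
const withdrawalsKill = new KillSwitch(500, true);
const beforeRefresh = withdrawalsKill.isEngaged(0);
withdrawalsKill.refresh(false, 1000);
const fresh = withdrawalsKill.isEngaged(1200);
const expired = withdrawalsKill.isEngaged(2000);
```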
7) Integration with CI/CD and GitOps
CI: validation of flag schemas, rule linting, dry-run of targeting against anonymized samples.
CD: promotion of flag configs as artifacts (semver), "approval gates" for sensitive flags (payments/compliance).
GitOps: flags live in a separate config repository; a merge request is the change event, so auditing comes out of the box.
8) Safety and compliance
RBAC/ABAC: who can create or enable a flag, or raise its percentage; segregation of duties (developer ≠ releaser ≠ product owner).
Audit: who/when/what/why; justification (ticket/JIRA), correlation with incidents.
PII minimization: targeting attributes pass through anonymization/hashing.
Snapshot signatures: integrity check on the SDK/proxy.
SLA for config delivery: on breach, degrade to a "safe default."
9) Observability and metrics
Operational:
- Flag propagation time (p50/p95), local cache hit rate, update frequency.
- Number of active/obsolete/stale flags (not removed by their expiry date).
- SLO guards: latency, errors, conversion, provider stability.
- DORA: deployment frequency, time to enable, failure rate after enabling, MTTR.
- A/B indicators: CR, ARPPU, LTV signals, impact on fraud scoring.
10) Flag life cycle
1. Design: goal/metric/owner/expiration date ('expiresAt'), rollback scenarios.
2. Implementation: SDK calls, fallbacks, "exposure"/"decision" telemetry.
3. Rollout: progressive delivery + SLO gate.
4. Stabilize: record the effect, update the documentation/routing.
5. Cleanup: remove code branches, close the flag, audit any leftovers.
11) Implementation Examples
11.1 Web/Node.js
ts
// SDK initialization (pseudocode; `sdk` stands for a generic flag client)
const flags = await sdk.init({
  sdkKey: process.env.FLAGS_KEY,
  user: { id: userIdHash, country, vipTier },
});
// Do not block rendering: read the flag with a safe default
const showNewCashout = flags.bool("new_withdrawal_flow", false);
if (showNewCashout) {
  renderNewFlow();
} else {
  renderClassic();
}
11.2 Kotlin / JVM
kotlin
val client = FlagsClient(sdkKey = System.getenv("FLAGS_KEY"))
val context = UserContext(id = userHash, country = country, kycLevel = kyc)
val enabled = client.getBoolean("risk_guard_withdrawals", default = true, context = context)
if (!enabled) {
    // Emergency mode: send all withdrawals to manual review
    routeToManual()
}
11.3 NGINX (external toggle via map)
nginx
# Pick the upstream from the X-Feature request header
# (assumes upstream blocks named new_flow and classic_flow are defined)
map $http_x_feature $cashout_upstream {
    default    classic_flow;
    "~enabled" new_flow;
}
location /withdraw {
    proxy_pass http://$cashout_upstream;
}
12) Risk management and progressive steps
Rollout steps: 1% of employees → 5% "beta" → 10% RU → 25% EU → 100% except DE (regulator).
Limiters: at most 1 step per 30 min; metrics must stay stable over a 15-minute window.
Auto-stop: platform-level policy (see OPA below).
rego
package flags.guard

import rego.v1

deny contains msg if {
    input.flag == "new_withdrawal_flow"
    input.metrics["withdraw_5xx_rate"] > 0.5
    msg := "Stop rollout: withdraw 5xx too high"
}
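The step limiters from this section (at most one step per 30 minutes, metrics stable over a 15-minute window) can be expressed as a small gate. This is a sketch under assumptions: the `RolloutState` shape is hypothetical, and `metricsStableSinceMs` is assumed to be maintained by the monitoring side.

```typescript
const MIN_STEP_INTERVAL_MS = 30 * 60 * 1000;
const STABILITY_WINDOW_MS = 15 * 60 * 1000;

// Hypothetical rollout state; timestamps are epoch milliseconds.
interface RolloutState {
  steps: number[];                     // percentage ladder
  stepIndex: number;                   // index of the current step
  lastStepAtMs: number;                // when the current step was applied
  metricsStableSinceMs: number | null; // null while metrics are unstable
}

// Returns the next percentage if advancing is allowed right now, otherwise null.
function nextStep(state: RolloutState, nowMs: number): number | null {
  if (state.stepIndex >= state.steps.length - 1) return null;             // already at the last step
  if (nowMs - state.lastStepAtMs < MIN_STEP_INTERVAL_MS) return null;     // too soon after the last step
  if (state.metricsStableSinceMs === null) return null;                   // metrics currently unstable
  if (nowMs - state.metricsStableSinceMs < STABILITY_WINDOW_MS) return null;
  const next = state.steps[state.stepIndex + 1];
  return next === undefined ? null : next;
}

const MINUTE_MS = 60 * 1000;
const rollout: RolloutState = {
  steps: [1, 5, 25, 50, 100],
  stepIndex: 1,
  lastStepAtMs: 0,
  metricsStableSinceMs: 10 * MINUTE_MS,
};
const tooEarly = nextStep(rollout, 20 * MINUTE_MS);
const allowed = nextStep(rollout, 30 * MINUTE_MS);
```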
13) Access control and approvals
Change types: standard (safe) vs sensitive (payments/payouts/limits).
Approvals: product owner + technical owner + compliance (for jurisdictions).
Time windows (freeze): no enabling or rollout expansion during high-risk periods (prime time, major tournaments).
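A pre-change check combining approvals and freeze windows could be sketched as follows. The type names are hypothetical, and the rule that a sensitive change needs all three approvals while a standard change needs none is an assumption for the example.

```typescript
type Approver = "product_owner" | "tech_lead" | "compliance";

// Hypothetical change request; atMs is the planned change time in epoch milliseconds.
interface ChangeRequest {
  flag: string;
  sensitive: boolean;
  approvals: Approver[];
  atMs: number;
}

interface FreezeWindow {
  label: string;
  startMs: number; // inclusive
  endMs: number;   // exclusive
}

// Assumption: sensitive changes need every approver listed here.
const REQUIRED_FOR_SENSITIVE: Approver[] = ["product_owner", "tech_lead", "compliance"];

// Returns the list of blockers; an empty list means the change may proceed.
function reviewChange(req: ChangeRequest, freezes: FreezeWindow[]): string[] {
  const blockers: string[] = [];
  for (const w of freezes) {
    if (req.atMs >= w.startMs && req.atMs < w.endMs) {
      blockers.push("freeze window: " + w.label);
    }
  }
  if (req.sensitive) {
    for (const role of REQUIRED_FOR_SENSITIVE) {
      if (req.approvals.indexOf(role) === -1) blockers.push("missing approval: " + role);
    }
  }
  return blockers;
}

const freezes: FreezeWindow[] = [{ label: "major tournament", startMs: 1000, endMs: 2000 }];
const blocked = reviewChange(
  { flag: "new_withdrawal_flow", sensitive: true, approvals: ["product_owner"], atMs: 1500 },
  freezes
);
const approved = reviewChange(
  { flag: "new_withdrawal_flow", sensitive: true, approvals: ["product_owner", "tech_lead", "compliance"], atMs: 2500 },
  freezes
);
```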
14) Experiments and statistics
Exposure events: log each flag decision together with its attributes.
Analytics: current rollout value, segments, effect on conversions/errors.
Statistical checks: verify that the split is correct; control for covariates (device/geo).
Ethics and regulatory: avoid segmentation restricted by local law.
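One common way to check that a split is correct is a sample ratio mismatch (SRM) test: a chi-square goodness-of-fit test of the observed group sizes against the configured split. The sketch below is an illustration, not something the source prescribes; it uses the standard critical value 3.841 (one degree of freedom, significance level 0.05), and the function name is hypothetical.

```typescript
// Critical value of the chi-square distribution for 1 degree of freedom at alpha = 0.05.
const CHI_SQUARE_CRITICAL_1DF = 3.841;

interface SrmResult {
  chiSquare: number;
  mismatch: boolean; // true when the observed split deviates significantly
}

// Compares observed group sizes with the expected treatment share (default 50/50).
function sampleRatioMismatch(
  controlCount: number,
  treatmentCount: number,
  expectedTreatmentShare: number = 0.5
): SrmResult {
  const total = controlCount + treatmentCount;
  const expectedTreatment = total * expectedTreatmentShare;
  const expectedControl = total - expectedTreatment;
  const chiSquare =
    Math.pow(treatmentCount - expectedTreatment, 2) / expectedTreatment +
    Math.pow(controlCount - expectedControl, 2) / expectedControl;
  return { chiSquare, mismatch: chiSquare > CHI_SQUARE_CRITICAL_1DF };
}

const balanced = sampleRatioMismatch(5050, 4950);
const skewed = sampleRatioMismatch(4800, 5200);
```

A detected mismatch usually points at a bucketing or logging problem, so the experiment's effect estimates should not be trusted until it is explained.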
15) Anti-patterns
Long-lived flags without 'expiresAt'; a "branch graveyard" in the code.
Blocking SDK network call in hot-path.
Excessive targeting by PII, lack of anonymization of attributes.
Enabling without SLO guards/auto-stop.
No kill-switch for high-risk flows (deposits/withdrawals/bonuses).
"Secret" manual flag edits without audit and justification.
16) Implementation checklist (30-60-90)
0-30 days
Select a flag platform or prepare a self-hosted setup (SDK, proxy, cache).
Introduce a flag schema ('flag', 'owner', 'purpose', 'expiresAt', 'risk_level').
Connect SLO metrics to the platform (latency/errors of key APIs).
31-60 days
Add approvals for sensitive flags, OPA guards.
Configure progressive strategies (percent/rings), kill-switch panel.
Embed the flag schema linter in CI; start cleaning up the first stale flags.
61-90 days
Full integration with GitOps (flag edits via MR, audit).
Visual dashboards: SDK coverage, propagation time, cache hit rate.
Regular "Flag Debt Day": deleting code and closing flags.
17) Maturity metrics
Technical: p95 config propagation < 5 s; SDK cache hit rate > 95%; share of flags with 'expiresAt' > 90%.
Processes: 100% of sensitive flags with approvals; average "time to rollback" < 3 min.
Code hygiene: share of flags closed within 30 days of global enablement > 80%.
Business effect: improved DORA (release frequency ↑, MTTR ↓), fewer incidents during releases.
18) Appendix: Templates and Policies
Flag schema (YAML)
yaml
flag: new_withdrawal_flow
owner: payments-team
risk_level: high
purpose: "New withdrawal flow"
expiresAt: "2026-03-31T00:00:00Z"
sla:
  propagation_p95_ms: 3000
slo_guards:
  withdraw_p95_ms_increase_pct: 15
  withdraw_5xx_rate_pct: 0.5
approvals:
  required: ["product_owner","tech_lead","compliance"]
"No eternal flags" policy (conditions for the linter)
yaml
rules:
  - check: expiresAt
    max_days_from_now: 180
    action: error
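A CI linter enforcing the policy above might boil down to a check like this one. It is a minimal sketch: the `FlagMeta` type and `lintExpiry` function are hypothetical, and the current time is passed in so that the check stays deterministic in tests.

```typescript
// Hypothetical subset of the flag schema that the check needs.
interface FlagMeta {
  flag: string;
  expiresAt?: string; // ISO 8601 timestamp
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns lint errors; an empty list means the flag passes the expiry policy.
function lintExpiry(meta: FlagMeta, now: Date, maxDaysFromNow: number = 180): string[] {
  if (meta.expiresAt === undefined) {
    return [meta.flag + ": expiresAt is missing"];
  }
  const expiresMs = Date.parse(meta.expiresAt);
  if (isNaN(expiresMs)) {
    return [meta.flag + ": expiresAt is not a valid date"];
  }
  const daysAhead = (expiresMs - now.getTime()) / DAY_MS;
  if (daysAhead > maxDaysFromNow) {
    return [meta.flag + ": expiresAt is more than " + maxDaysFromNow + " days from now"];
  }
  return [];
}

const lintNow = new Date("2025-11-03T00:00:00Z");
const okFlag = lintExpiry({ flag: "new_withdrawal_flow", expiresAt: "2026-03-31T00:00:00Z" }, lintNow);
const eternalFlag = lintExpiry({ flag: "legacy_toggle" }, lintNow);
const farFlag = lintExpiry({ flag: "slow_migration", expiresAt: "2027-01-01T00:00:00Z" }, lintNow);
```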
SDK event contract (exposure)
json
{
  "event": "flag_exposure",
  "flag": "new_withdrawal_flow",
  "variant": "on",
  "userKey": "hash_abcdef",
  "context": {"country":"CA","vipTier":"3"},
  "traceId": "9f1c...a2",
  "ts": 1730623200000
}
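On the consuming side, the exposure contract above can be captured as a type plus a runtime guard. This is a sketch: the `FlagExposure` interface and the `isFlagExposure` function are hypothetical names, and the field types are inferred from the sample event rather than from a published schema.

```typescript
// Field types inferred from the sample event above (an assumption, not a published schema).
interface FlagExposure {
  event: "flag_exposure";
  flag: string;
  variant: string;
  userKey: string;
  context: Record<string, string>;
  traceId: string;
  ts: number; // epoch milliseconds
}

// Runtime check for events arriving as untyped JSON.
function isFlagExposure(value: unknown): value is FlagExposure {
  if (typeof value !== "object" || value === null) return false;
  const e = value as Record<string, unknown>;
  const ts = e.ts;
  return (
    e.event === "flag_exposure" &&
    typeof e.flag === "string" &&
    typeof e.variant === "string" &&
    typeof e.userKey === "string" &&
    typeof e.context === "object" && e.context !== null &&
    typeof e.traceId === "string" &&
    typeof ts === "number" && isFinite(ts)
  );
}

const sample = JSON.parse(
  '{"event":"flag_exposure","flag":"new_withdrawal_flow","variant":"on",' +
  '"userKey":"hash_abcdef","context":{"country":"CA","vipTier":"3"},' +
  '"traceId":"9f1c...a2","ts":1730623200000}'
);
const sampleIsValid = isFlagExposure(sample);
const brokenIsValid = isFlagExposure({ event: "flag_exposure", flag: "new_withdrawal_flow" });
```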
19) Conclusion
Feature flags are a "volume knob" for change. Combine progressive rollouts, SLO guards, strict auditing, and regular cleanup, and tie flags into CI/CD and GitOps. Releases then become frequent, manageable, and safe, and the risk of incidents becomes predictable and controlled.