Feature Flags and feature release

A feature flag (FF) is a managed condition that enables or disables system behavior without deploying new code. Flags let you roll out features safely, target groups of users, markets or tenants, quickly disable problematic components, run experiments and tune parameters at runtime.

Key objectives:

Reduce the blast radius of releases.
Decouple deployment from activation.
Make change management transparent, with auditing, SLOs and one-click rollback.

1) Types of flags and when to apply them

Release flags - phased enablement of a new feature (dark → canary → ramp-up → 100%).
Ops/kill-switch - instantly disable a dependency (provider, subsystem, heavy computation).
Experiment (A/B, multivariate) - split traffic into variants (weights, sticky bucketing).
Permission/Entitlement - access to features by role, plan or jurisdiction.
Remote Config - behavior parameters (thresholds, timeouts, formulas) served from the flag/config.
Migration flags - switch schemas or data paths (moving to a new index/DB/endpoint).

Anti-pattern: a single flag that controls everything. Split it into a feature flag, a kill switch and parameters.

2) Flag data model (minimum)

```yaml
flag:
  key: "catalog.new_ranker"
  type: "release"        # release | ops | kill | experiment | permission | config | migration
  description: "New catalog ranking"
  owner: "search-team@company"
  created_at: "2025-10-01T10:00:00Z"
  ttl: "2026-01-31"      # deletion deadline after reaching 100%
  rules:
    - when:
        tenant_id: ["brand_eu", "brand_latam"]
        region: ["EE", "BR"]
        user_pct: 10     # progressive percentage
      then: "on"
    - when:
        kyc_tier: ["unverified"]
      then: "off"
  variants:              # for experiments
    - { name: "control", weight: 50 }
    - { name: "v1", weight: 30 }
    - { name: "v2", weight: 20 }
  payload:
    v1:
      boost_freshness: 0.3
      boost_jackpot: 0.2
    v2:
      boost_freshness: 0.2
      boost_jackpot: 0.4
  prerequisites:         # dependent flags / schema versions
    - key: "catalog.index_v2_ready"
      must_be: "on"
  audit:
    require_ticket: true
    change_window: "09:00-19:00 Europe/Kyiv"
  safeguards:
    max_rollout_pct: 50  # stop threshold
    auto_rollback_on:
      p95_ms: ">200"
      error_rate: ">2%"
```
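The minimum model above can be mirrored as TypeScript types and checked in CI before merge. This is a sketch under our own assumptions: `FlagDef` and `validateFlag` are hypothetical names, and the checks (owner and TTL present, variant weights summing to 100, `user_pct` within 0-100) are examples rather than a complete schema.

```typescript
// Hypothetical TypeScript mirror of the minimal flag model (not a published schema).
type FlagType = "release" | "ops" | "kill" | "experiment" | "permission" | "config" | "migration";

interface Rule {
  when: Record<string, string[] | number>; // attribute → allowed values; user_pct → percentage
  then: "on" | "off";
}

interface Variant {
  name: string;
  weight: number;
}

interface FlagDef {
  key: string;
  type: FlagType;
  description: string;
  owner: string;
  ttl?: string; // ISO date: deletion deadline
  rules: Rule[];
  variants?: Variant[];
}

// Collects human-readable problems; an empty array means the flag passes the CI check.
function validateFlag(f: FlagDef): string[] {
  const problems: string[] = [];
  if (!f.owner) problems.push(`${f.key}: owner is required`);
  if (!f.ttl) problems.push(`${f.key}: ttl is required`);
  else if (Number.isNaN(Date.parse(f.ttl))) problems.push(`${f.key}: ttl is not a valid date`);
  if (f.variants && f.variants.length > 0) {
    const total = f.variants.reduce((sum, v) => sum + v.weight, 0);
    if (total !== 100) problems.push(`${f.key}: variant weights sum to ${total}, expected 100`);
  }
  for (const r of f.rules) {
    const pct = r.when["user_pct"];
    if (typeof pct === "number" && (pct < 0 || pct > 100)) {
      problems.push(`${f.key}: user_pct ${pct} is outside 0-100`);
    }
  }
  return problems;
}

const ranker: FlagDef = {
  key: "catalog.new_ranker",
  type: "release",
  description: "New catalog ranking",
  owner: "search-team@company",
  ttl: "2026-01-31",
  rules: [{ when: { tenant_id: ["brand_eu", "brand_latam"], user_pct: 10 }, then: "on" }],
  variants: [
    { name: "control", weight: 50 },
    { name: "v1", weight: 30 },
    { name: "v2", weight: 20 },
  ],
};
console.log(validateFlag(ranker)); // []
```

Returning a list of problems instead of throwing on the first one lets a CI job report every issue in a pull request at once.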

3) Evaluation and targeting

Targeting keys: `tenant_id`, `region`/`licence`, `currency`, `channel`, `locale`, `role`, `plan`, `device`, `user_id`, `cohort`, `kyc_tier`, `experiment_bucket`.
Evaluation order: prerequisites → deny rules → allow rules → default.
Sticky bucketing: for experiments, hash a stable identifier (for example, `hash(user_id, flag_key)`) so that a user always gets the same variant.
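A sketch of sticky bucketing. The text only prescribes hashing a stable identifier together with the flag key; the concrete hash (FNV-1a followed by the MurmurHash3 32-bit finalizer, chosen to stay dependency-free), the `user_id:flag_key` input format and the 100 buckets are our assumptions. Including the flag key keeps buckets independent across flags, so the same users are not always the first to receive every new feature.

```typescript
// FNV-1a (32-bit) over UTF-16 code units, followed by the MurmurHash3 finalizer
// so that the low bits used by `% 100` are well mixed.
function hash32(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h = Math.imul(h ^ input.charCodeAt(i), 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Bucket in [0, 100): the same (user_id, flag_key) pair always lands in the same bucket.
function bucket(userId: string, flagKey: string): number {
  return hash32(`${userId}:${flagKey}`) % 100;
}

interface Variant {
  name: string;
  weight: number; // weights are expected to sum to 100
}

// Walk the cumulative weights and return the variant that owns the user's bucket.
function pickVariant(userId: string, flagKey: string, variants: Variant[]): string {
  const b = bucket(userId, flagKey);
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight;
    if (b < cumulative) return v.name;
  }
  return variants[variants.length - 1].name; // fallback if weights sum to less than 100
}

const variants: Variant[] = [
  { name: "control", weight: 50 },
  { name: "v1", weight: 30 },
  { name: "v2", weight: 20 },
];
const first = pickVariant("u42", "catalog.new_ranker", variants);
console.log(first === pickVariant("u42", "catalog.new_ranker", variants)); // true: sticky
```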

Pseudocode:

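```ts
// evaluate() is a pure function of the flag definition and the request context.
function evaluate(flag, ctx) {
  if (!prereqs_ok(flag)) return OFF                                    // 1. prerequisites
  if (deny_match(flag, ctx)) return OFF                                // 2. deny rules
  if (allow_match(flag, ctx)) return resolve_variant_or_on(flag, ctx)  // 3. allow rules
  return flag.default                                                  // 4. default
}
```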
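A runnable version of this evaluation order, under simplifying assumptions: rules whose `then` is `"off"` act as deny rules and `"on"` rules as allow rules, attribute conditions are exact-match lists, `user_pct` is compared against a sticky bucket, prerequisite states are passed in as a map, and variant resolution is left out (the result is just on/off). All names are hypothetical. The example uses `user_pct: 100` so its outcome does not depend on the hash.

```typescript
type Ctx = Record<string, string>;
type State = "on" | "off";

interface Rule {
  when: Record<string, string[] | number>; // attribute → allowed values; user_pct → percentage
  then: State;                             // "off" rules are deny rules, "on" rules are allow rules
}

interface Flag {
  key: string;
  rules: Rule[];
  prerequisites: { key: string; must_be: State }[];
  default: boolean;
}

// Stable bucket in [0, 100) derived from (user_id, flag_key); any well-distributed hash works.
function bucket(userId: string, flagKey: string): number {
  const s = `${userId}:${flagKey}`;
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) h = Math.imul(h ^ s.charCodeAt(i), 0x01000193);
  return (h >>> 0) % 100;
}

// A rule matches when every condition in `when` holds for the context.
function ruleMatches(rule: Rule, flagKey: string, ctx: Ctx): boolean {
  return Object.entries(rule.when).every(([attr, cond]) =>
    typeof cond === "number"
      ? attr === "user_pct" && bucket(ctx.user_id ?? "", flagKey) < cond
      : cond.includes(ctx[attr] ?? "")
  );
}

// Pure function: prerequisites → deny rules → allow rules → default.
function evaluate(flag: Flag, ctx: Ctx, states: Record<string, State>): boolean {
  if (!flag.prerequisites.every(p => states[p.key] === p.must_be)) return false;
  if (flag.rules.some(r => r.then === "off" && ruleMatches(r, flag.key, ctx))) return false;
  if (flag.rules.some(r => r.then === "on" && ruleMatches(r, flag.key, ctx))) return true;
  return flag.default;
}

const ranker: Flag = {
  key: "catalog.new_ranker",
  rules: [
    { when: { tenant_id: ["brand_eu", "brand_latam"], region: ["EE", "BR"], user_pct: 100 }, then: "on" },
    { when: { kyc_tier: ["unverified"] }, then: "off" },
  ],
  prerequisites: [{ key: "catalog.index_v2_ready", must_be: "on" }],
  default: false,
};
const states: Record<string, State> = { "catalog.index_v2_ready": "on" };
console.log(evaluate(ranker, { user_id: "u42", tenant_id: "brand_eu", region: "EE", kyc_tier: "full" }, states)); // true
```

Because deny rules are checked before allow rules regardless of their position in the list, an unverified user is excluded even though the allow rule appears first in the YAML.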

4) FF distribution and architecture

Options:

Server-side SDK (recommended): the source of truth and the cache live in the backend, so evaluation logic is unified.
Edge/CDN evaluation: fast targeting at the perimeter, only where no PII or secrets are involved.
Client-side SDK: when UI personalization is needed, but only with minimal context and no sensitive rules.
Config-as-Code: storing flags in the repository, CI validation, rollout via CD.

Caching strategy:

Bootstrap at startup + streaming updates (SSE/gRPC) + fallback to the last snapshot.
Flag freshness SLA: p95 ≤ 5 s.
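The bootstrap + streaming + last-snapshot strategy can be sketched as an in-memory cache that accepts only newer versions, keeps serving the last good snapshot when the stream drops, and reports its own staleness against the 5 s target. `Snapshot`, its `version` counter and `FlagCache` are hypothetical; a production SDK would also persist the snapshot so that it survives restarts.

```typescript
interface Snapshot {
  version: number;                // monotonically increasing config version
  flags: Record<string, boolean>; // flag key → on/off (simplified)
  fetchedAt: number;              // epoch ms when this snapshot was received
}

class FlagCache {
  private snapshot: Snapshot | null = null;
  private readonly freshnessSlaMs: number;

  constructor(freshnessSlaMs = 5000) {
    this.freshnessSlaMs = freshnessSlaMs;
  }

  // Used once at startup (bootstrap) and then for every streamed update.
  apply(next: Snapshot): boolean {
    if (this.snapshot && next.version <= this.snapshot.version) return false; // late or duplicate update
    this.snapshot = next;
    return true;
  }

  // If the stream is down, the last good snapshot keeps serving; unknown flags are OFF.
  isOn(key: string): boolean {
    return this.snapshot?.flags[key] ?? false;
  }

  // Staleness in ms, e.g. for a `config_freshness_ms` metric.
  freshnessMs(now: number): number {
    return this.snapshot ? now - this.snapshot.fetchedAt : Number.POSITIVE_INFINITY;
  }

  isFresh(now: number): boolean {
    return this.freshnessMs(now) <= this.freshnessSlaMs;
  }
}

const cache = new FlagCache();
cache.apply({ version: 1, flags: { "catalog.new_ranker": true }, fetchedAt: 0 });     // bootstrap
cache.apply({ version: 3, flags: { "catalog.new_ranker": false }, fetchedAt: 1000 }); // streamed update
cache.apply({ version: 2, flags: { "catalog.new_ranker": true }, fetchedAt: 2000 });  // out of order, ignored
console.log(cache.isOn("catalog.new_ranker")); // false
console.log(cache.isFresh(4000), cache.isFresh(7000)); // true false
```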

5) Release strategies

5.1 Dark Launch

The feature is enabled but invisible to the user; collect metrics and errors.

5.2 Canary

Enable the feature for 1-5% of traffic in one jurisdiction/tenant; monitor p95/p99 latency, errors and conversion.
Stop conditions: metric thresholds trigger an automatic cutoff.

5.3 Progressive Rollout

10% → 25% → 50% → 100% on a schedule, with manual or automated verification at each step.
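The ladder can be expressed as a small decision function: advance one step when the current step's checks pass, drop to 0% when they fail, and never go past a cap such as `max_rollout_pct` from the data model. `nextRolloutPct` is a hypothetical helper, and the `healthy` argument stands in for the manual or automated verification.

```typescript
// Rollout steps from the progressive plan; 0 means the feature is off for everyone.
const LADDER: number[] = [0, 10, 25, 50, 100];

interface StepDecision {
  pct: number;
  action: "advance" | "hold" | "rollback";
}

// Decide the next percentage from the current one and the verdict of this step's checks.
function nextRolloutPct(current: number, healthy: boolean, maxPct = 100): StepDecision {
  if (!healthy) return { pct: 0, action: "rollback" }; // guardrail breached: cut off
  const next = LADDER.find(step => step > current);
  if (next === undefined || next > maxPct) return { pct: current, action: "hold" }; // top reached or capped
  return { pct: next, action: "advance" };
}

// With max_rollout_pct = 50 the ladder stops at 50% until the cap is raised.
let pct = 0;
for (const healthy of [true, true, true, true]) {
  pct = nextRolloutPct(pct, healthy, 50).pct;
}
console.log(pct); // 50
```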

5.4 Shadow / Mirroring

Duplicate requests to the new path (with no user-visible effect) and compare results and latency.
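A sketch of shadow traffic, assuming both paths are free of side effects (reads only): the caller always gets the primary result, the shadow path runs concurrently, and the comparison is reported in the background so shadow latency or errors never reach the user. `shadowCall`, `ShadowReport` and the JSON-based default equality are illustrative choices.

```typescript
interface ShadowReport<T> {
  primary: T;
  shadow?: T;
  match: boolean;
  primaryMs: number;
  shadowMs?: number;
  error?: string;
}

// Measure how long an async call takes.
async function timed<T>(fn: () => Promise<T>): Promise<{ value: T; ms: number }> {
  const start = Date.now();
  const value = await fn();
  return { value, ms: Date.now() - start };
}

// Answer from the primary path; compare with the shadow path in the background.
async function shadowCall<T>(
  primary: () => Promise<T>,
  shadow: () => Promise<T>,
  report: (r: ShadowReport<T>) => void,
  equals: (a: T, b: T) => boolean = (a, b) => JSON.stringify(a) === JSON.stringify(b),
): Promise<T> {
  // Turn shadow failures into values so they can never become unhandled rejections.
  const shadowRun = timed(shadow).then(
    s => ({ ok: true as const, value: s.value, ms: s.ms }),
    (err: unknown) => ({ ok: false as const, error: String(err) }),
  );
  const p = await timed(primary); // the response depends only on this
  void shadowRun.then(s =>
    report(
      s.ok
        ? { primary: p.value, shadow: s.value, match: equals(p.value, s.value), primaryMs: p.ms, shadowMs: s.ms }
        : { primary: p.value, match: false, primaryMs: p.ms, error: s.error },
    ),
  );
  return p.value;
}
```

In practice `report` would feed per-flag metrics (match rate, latency delta), and the shadow call would be sampled or rate-limited to protect the new path.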

5.5 Blue/Green + FF

Deploy both versions; the flag steers traffic and switches dependencies per segment.

6) Dependencies and cross-service consistency

Use prerequisites and readiness "health flags" (e.g. the index is built, the migration is complete).
Coordinate through events: `FlagChanged(flag_key, scope, new_state)`.

For critical scenarios, use two-phase switching:

1) enable the read path → 2) check metrics → 3) enable writes/side effects.

Service contracts: the default must be a fail-safe OFF.
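The fail-safe OFF contract can be enforced at the call site: if the lookup throws, times out or returns nothing, the caller sees `false`. `isOnFailSafe`, the `Lookup` signature and the 50 ms default budget are assumptions for illustration.

```typescript
// A lookup resolves to true/false, or undefined when the flag is unknown.
type Lookup = (key: string) => Promise<boolean | undefined>;

// Resolve a flag, falling back to OFF on errors, unknown flags or slow lookups.
async function isOnFailSafe(lookup: Lookup, key: string, timeoutMs = 50): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<undefined>(resolve => {
    timer = setTimeout(() => resolve(undefined), timeoutMs);
  });
  try {
    const value = await Promise.race([lookup(key), timeout]);
    return value === true; // undefined (unknown flag or timeout) → OFF
  } catch {
    return false; // lookup failed → OFF
  } finally {
    clearTimeout(timer);
  }
}
```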

7) Observability and SLO

Metrics per flag/variant/segment:

`flag_eval_p95_ms`, `errors_rate`, `config_freshness_ms`.
Business metrics: `ctr`, `conversion`, `ARPU`, `retention`, plus guardrails (e.g. RG incidents).
Automatic SLO thresholds for auto-cutoff.
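The `auto_rollback_on` thresholds from the data model (`">200"`, `">2%"`) can be checked against live metrics with a small parser. Supporting only `>`/`<` with an optional `%` suffix (converted to a fraction, so `2%` becomes `0.02`) is our simplifying assumption, as are the helper names.

```typescript
type Metrics = Record<string, number>; // e.g. { p95_ms: 180, error_rate: 0.031 }

// Parse ">200" or ">2%" into a predicate; percentages become fractions (2% → 0.02).
function parseThreshold(expr: string): (value: number) => boolean {
  const m = /^\s*([<>])\s*([\d.]+)\s*(%?)\s*$/.exec(expr);
  if (!m) throw new Error(`Unsupported threshold: ${expr}`);
  const limit = Number(m[2]) / (m[3] === "%" ? 100 : 1);
  return m[1] === ">" ? v => v > limit : v => v < limit;
}

// Names of breached guardrails; a non-empty result should trigger the auto-cutoff.
function breachedGuardrails(rules: Record<string, string>, metrics: Metrics): string[] {
  return Object.entries(rules)
    .filter(([name, expr]) => name in metrics && parseThreshold(expr)(metrics[name]))
    .map(([name]) => name);
}

const rules = { p95_ms: ">200", error_rate: ">2%" };
console.log(breachedGuardrails(rules, { p95_ms: 180, error_rate: 0.031 })); // → ["error_rate"]
```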

Logs/tracing: add `flag_key`, `variant`, `decision_source` (server/edge/client) and `context_hash`.
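One way to emit these fields without logging the raw context is to hash a canonical (key-sorted) form of it. In this sketch `FlagDecisionLog` and `decisionLog` are hypothetical, and FNV-1a is used only to keep the example dependency-free; because identifiers are low-entropy and a fast unkeyed hash can be brute-forced, a keyed hash such as HMAC-SHA-256 is the safer choice in production.

```typescript
interface FlagDecisionLog {
  flag_key: string;
  variant: string | null;
  decision_source: "server" | "edge" | "client";
  context_hash: string;
}

// Canonical form: sorted keys, so {a, b} and {b, a} produce the same hash.
function canonical(ctx: Record<string, string>): string {
  return Object.keys(ctx).sort().map(k => `${k}=${ctx[k]}`).join("&");
}

// 32-bit FNV-1a rendered as 8 hex chars (illustration only; prefer a keyed hash in production).
function fnv1aHex(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) h = Math.imul(h ^ s.charCodeAt(i), 0x01000193);
  return (h >>> 0).toString(16).padStart(8, "0");
}

function decisionLog(
  flagKey: string,
  variant: string | null,
  source: FlagDecisionLog["decision_source"],
  ctx: Record<string, string>,
): FlagDecisionLog {
  return { flag_key: flagKey, variant, decision_source: source, context_hash: fnv1aHex(canonical(ctx)) };
}

console.log(decisionLog("catalog.new_ranker", "v1", "server", { user_id: "u42", region: "EE" }));
```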

Dashboards: a rollout "ladder" with thresholds, and an error heatmap by segment.

8) Safety and compliance

PII minimization in the evaluation context.
RLS/ACL: who can change which flags (by domain/market).
Change windows (allowed hours for changes) and "double confirmation" for sensitive flags.
Immutable audit: who/when/what/why (link to a ticket/incident).
Jurisdictions: flags must not circumvent regulatory bans (for example, by enabling gaming in a country where it is banned).
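Change windows and the ticket requirement from the `audit` block can be checked before a flag change is accepted. A sketch assuming the `"HH:MM-HH:MM Zone"` format used in the data model; it relies on the built-in `Intl.DateTimeFormat` time-zone support (present in standard Node.js builds with full ICU). `canChangeFlag` and the ticket id are hypothetical, and windows that cross midnight are not handled.

```typescript
interface AuditPolicy {
  require_ticket: boolean;
  change_window: string; // e.g. "09:00-19:00 Europe/Kyiv"
}

// Minutes since local midnight in the given IANA time zone.
function localMinutes(at: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-GB", {
    timeZone, hour: "2-digit", minute: "2-digit", hour12: false,
  }).formatToParts(at);
  const get = (type: string) => Number(parts.find(p => p.type === type)?.value);
  return (get("hour") % 24) * 60 + get("minute"); // some engines print midnight as "24"
}

function canChangeFlag(policy: AuditPolicy, at: Date, ticket?: string): { ok: boolean; reason?: string } {
  if (policy.require_ticket && !ticket) return { ok: false, reason: "ticket required" };
  const m = /^(\d{2}):(\d{2})-(\d{2}):(\d{2}) (\S+)$/.exec(policy.change_window);
  if (!m) return { ok: false, reason: "unparseable change_window" };
  const start = Number(m[1]) * 60 + Number(m[2]);
  const end = Number(m[3]) * 60 + Number(m[4]);
  const now = localMinutes(at, m[5]);
  return now >= start && now < end ? { ok: true } : { ok: false, reason: "outside change window" };
}

const policy: AuditPolicy = { require_ticket: true, change_window: "09:00-19:00 Europe/Kyiv" };
console.log(canChangeFlag(policy, new Date("2025-10-31T08:00:00Z"), "TICKET-1").ok); // true (10:00 in Kyiv)
```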

9) Managing "long-lived" flags

Each flag has a TTL/deletion date.
After reaching 100%, create a task to delete the dead code branches; otherwise "flag debt" keeps growing.
Mark flags as `migration`/`one-time` and keep them separate from long-lived `permission`/`config` flags.
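A scheduled job or CI check can surface flag debt by listing flags past their TTL. In this sketch `expiredFlags` is hypothetical, `permission` and `config` are treated as long-lived types exempt from TTL, a missing TTL on any other type counts as a problem, and `payments.limits` is a made-up key. Comparing `YYYY-MM-DD` strings directly works because that format sorts lexicographically.

```typescript
interface FlagMeta {
  key: string;
  type: string;
  ttl?: string; // "YYYY-MM-DD"
}

// Long-lived by design, so they are not expected to carry a deletion date.
const PERMANENT_TYPES = new Set(["permission", "config"]);

// Flags past their TTL (or missing one) should become cleanup tasks for their owners.
function expiredFlags(flags: FlagMeta[], today: string): FlagMeta[] {
  return flags.filter(f => !PERMANENT_TYPES.has(f.type) && (!f.ttl || f.ttl < today));
}

const flags: FlagMeta[] = [
  { key: "catalog.new_ranker", type: "release", ttl: "2026-01-31" },
  { key: "search.use_index_v2", type: "migration", ttl: "2026-02-01" },
  { key: "payments.limits", type: "config" }, // hypothetical permanent flag
];
console.log(expiredFlags(flags, "2026-02-01").map(f => f.key)); // → ["catalog.new_ranker"]
```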

10) Sample API/SDK contract

Evaluation API (server-side)

```http
POST /v1/flags/evaluate
Headers: X-Tenant: brand_eu
Body: { "keys": ["catalog.new_ranker", "rgs.killswitch"], "context": { "user_id": "u42", "region": "EE" } }
→ 200
{
  "catalog.new_ranker": { "on": true, "variant": "v1", "as_of": "2025-10-31T12:10:02Z" },
  "rgs.killswitch": { "on": false, "variant": null, "as_of": "2025-10-31T12:10:02Z" }
}
```
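A thin server-side client for this contract might look like the sketch below. Only the path, the `X-Tenant` header and the request/response shapes come from the contract; the `FetchLike` type (a minimal subset of the standard `fetch`, injected for testability), the function name and the fail-safe handling of errors are our assumptions.

```typescript
interface FlagDecision {
  on: boolean;
  variant: string | null;
  as_of: string;
}

// Minimal subset of the standard fetch() signature, injected so the client is easy to test.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function evaluateFlags(
  fetchFn: FetchLike,
  baseUrl: string,
  tenant: string,
  keys: string[],
  context: Record<string, string>,
): Promise<Record<string, FlagDecision>> {
  try {
    const res = await fetchFn(`${baseUrl}/v1/flags/evaluate`, {
      method: "POST",
      headers: { "Content-Type": "application/json", "X-Tenant": tenant },
      body: JSON.stringify({ keys, context }),
    });
    if (res.ok) return (await res.json()) as Record<string, FlagDecision>;
  } catch {
    // Network failure: fall through to the fail-safe answer below.
  }
  // Fail safe: every requested flag is reported OFF when the service cannot answer.
  const off: Record<string, FlagDecision> = {};
  for (const key of keys) off[key] = { on: false, variant: null, as_of: "" };
  return off;
}
```

The global `fetch` in Node.js 18+ should be compatible with `FetchLike`, so production code can pass it directly while tests pass a stub.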

Client SDK (cache, fallback)

```ts
const ff = await sdk.getSnapshot()                       // bootstrap
const on = ff.isOn("catalog.new_ranker", ctx)
const payload = ff.payload("catalog.new_ranker", "v1")
```

11) Interaction with other control mechanisms

Rate limits/quotas: flags can lower RPS limits or enable throttling for the duration of an incident.
Circuit breaker/degradation: kill switches disable heavy paths and enable degraded modes.
Catalog/personalization: flags change weights and ranking rules (via Remote Config).
Database migrations: flags gradually move reads/writes to a new schema (read-replica → dual-write → write-primary).
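The phase progression can be sketched as a repository whose read and write routing depends on a phase value taken from the migration flag. The mapping of phases to routes is our interpretation: in `read-replica` reads move to the new store while writes still go only to the old one (replication is assumed to keep the new store in sync), `dual-write` writes to both, and `write-primary` writes only to the new store. `MigratingRepo`, `Store` and `MemStore` are hypothetical, and backfill and consistency checks are omitted.

```typescript
// Phases driven by the migration flag; "old-only" is the state before the flag is on.
type Phase = "old-only" | "read-replica" | "dual-write" | "write-primary";

interface Store {
  read(id: string): string | undefined;
  write(id: string, value: string): void;
}

class MemStore implements Store {
  private data = new Map<string, string>();
  read(id: string): string | undefined { return this.data.get(id); }
  write(id: string, value: string): void { this.data.set(id, value); }
}

class MigratingRepo {
  phase: Phase = "old-only";
  private readonly oldStore: Store;
  private readonly newStore: Store;

  constructor(oldStore: Store, newStore: Store) {
    this.oldStore = oldStore;
    this.newStore = newStore;
  }

  // Reads switch to the new store first (it is assumed to be replicated during "read-replica").
  read(id: string): string | undefined {
    return this.phase === "old-only" ? this.oldStore.read(id) : this.newStore.read(id);
  }

  // The old store keeps receiving writes until the new one becomes the write primary.
  write(id: string, value: string): void {
    if (this.phase !== "write-primary") this.oldStore.write(id, value);
    if (this.phase === "dual-write" || this.phase === "write-primary") this.newStore.write(id, value);
  }
}
```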

12) Playbooks (runbooks)

1. Incident after rollout to 25%

Auto-cutoff triggers → flag OFF for everyone or for the affected segment, ticket to on-call, stats collection, RCA.
Temporarily enable degradation or the old branch through the migration flag.

2. Catalog p95 growth

The `p95_ms > 200` threshold triggers auto-cutoff; capture a log snapshot filtered by `flag_key = catalog.new_ranker`.
Enable the payload config.

3. Jurisdiction violation

A permission flag mistakenly opened a game in `NL`: switch it OFF, run a post-incident audit, and add a "region deny" guard rule.

4. High variance in an A/B test

Stop the experiment, run a CUPED/stratified analysis, and relaunch with updated weights.
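CUPED reduces metric variance by adjusting each user's value Y with a pre-experiment covariate X: Y' = Y − θ(X − mean(X)), where θ = cov(X, Y) / var(X). A minimal sketch; using the same metric from the pre-experiment period as X is the usual choice, and `cuped` is a hypothetical helper. θ should be computed on the pooled data of all arms, after which the arm means of Y' are compared as usual.

```typescript
// CUPED adjustment: y' = y - theta * (x - mean(x)), with theta = cov(x, y) / var(x).
// x is the pre-experiment covariate for the same users, in the same order as y.
function cuped(y: number[], x: number[]): number[] {
  if (y.length !== x.length || y.length < 2) throw new Error("need at least two paired samples");
  const mean = (a: number[]) => a.reduce((s, v) => s + v, 0) / a.length;
  const mx = mean(x);
  const my = mean(y);
  let cov = 0;
  let varX = 0;
  for (let i = 0; i < x.length; i++) {
    cov += (x[i] - mx) * (y[i] - my);
    varX += (x[i] - mx) ** 2;
  }
  const theta = varX === 0 ? 0 : cov / varX; // the 1/(n-1) factors cancel in the ratio
  return y.map((v, i) => v - theta * (x[i] - mx));
}

// When y is fully explained by x, the adjusted metric has zero variance but keeps the mean.
console.log(cuped([2, 4, 6, 8], [1, 2, 3, 4])); // [ 5, 5, 5, 5 ]
```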

13) Testing

Unit: deterministic evaluation of rules/priorities/prerequisites.
Contract: flag schema (JSON/YAML), validators, CI check before merge.
Property-based: "deny > allow", "most specific wins", stable bucketing.
Replay: run real contexts against the new configuration.
E2E: canary scenarios (step-up/step-down), auto-cutoff checks and audit events.
Chaos: streaming outage, stale snapshot, mass flag update.

14) Typical errors

Secret logic in client-side flags (leaks/spoofing).
No TTL → a "graveyard" of flags in the code.
"Universal" flags without segmentation → the problem cannot be localized.
No guardrails/auto-cutoff → incidents handled manually.
Incompatible dependencies between flags → loops/desynchronization.
Evaluating flags on every request without a cache → latency spikes.
No audit/change window → compliance risks.

15) Pre-release checklist

Flag created with type, owner, description, TTL and ticket requirement.
Targeting rules defined; `deny` rules for unwanted regions/roles.
Sticky bucketing is deterministic; the ID is stable.
Prerequisites and health flags ready; the default is safe.
Dashboards and alerts on p95/p99, error_rate, business guardrails.
Auto-cutoff configured: rollout stop threshold and rollback conditions.
Canary plan: percentages, milestones, change window, owners.
Configs are validated in CI; snapshot distributed across clusters/regions.
Support/product documentation; incident playbooks.
Plan to remove code branches and the flag itself after 100%.

16) Example of a "migration" flag (DB/index)

```yaml
flag:
  key: "search.use_index_v2"
  type: "migration"
  description: "Switching reads to index v2"
  prerequisites:
    - key: "search.index_v2_built"
      must_be: "on"
  rules:
    - when: { tenant_id: ["brand_eu"], user_pct: 5 }
      then: "on"
    - when: { tenant_id: ["brand_eu"], user_pct: 25 }
      then: "on"
  safeguards:
    auto_rollback_on:
      search_p95_ms: ">180"
      error_rate: ">1%"
  ttl: "2026-02-01"
```

Conclusion

Feature flags are not just "on/off"; they are a discipline for managing change risk. Clear flag types, deterministic targeting, progressive rollouts with guardrails, auto-cutoff, auditing and a deletion plan make releases predictable and keep incidents short and under control. Treat flags as first-class citizens of the architecture, and you can deliver value more often, more safely and more deliberately.
