GH GambleHub

Ecosystem updates without downtime

(Section: Ecosystem and Network)

1) Zero-downtime purpose and principles

Zero-downtime updates ensure continuous operation of the network and products during changes in code, configurations, data schemes and protocols. Basic principles:
  • Backward/forward compatibility at contract boundaries.
  • Progressive delivery instead of "big switch."
  • Observability and reversibility: metrics, traces, fast rollback.
  • Idempotency and secure retreats for network and payment flows.
  • Fault isolation: cell architecture, circuit-breakers, fan-out limits.

2) Downtime-free release strategies

Blue-Green - two identical stacks (Blue = prod, Green = new). Traffic is switched atomically at the balancer level with the possibility of instant rollback.
Canary - phased share of traffic (1%→5%→20%→50%→100%) with SLO gates.
Rolling - node-by-node pool update with readiness check and connection drainage.
Shadow/Traffic Mirroring - mirror requests for a new version without affecting responses.
Feature Flags - business switching of features over an unchanged API (gradual rollout).
Dark Launch - Enable hidden logic branches for telemetry and profiling.

Recommendation: for critical services - a combination of canary + rolling + feature flags; for gateways and APIs - blue-green with short switching.

3) Contractual compatibility (API/events/protocol)

API: versioning by URI/headers; adding fields - valid, deleting/renaming - only through the "deprecate window."

Events (event-bus): "only adding" fields; keys are immutable; new types - as new themes/versions.
Schemas (Avro/JSON-Schema/Protobuf): registry schema, BACKWARD'FULL 'compatibility.
Network protocol/P2P: version handshake and capability negotiation (nodes declare supported versions/features).
Gateways: adapters between vN and vN + 1 (transcoding/field mapping) for the migration period.

Deprecate policy (example): announcement → ≥90 days of warning → "deprecated" checkbox → deletion of field/endpoint.

4) Expand → Migrate → Contract

1. Expand - add new structures/indexes/columns (nullable/default), dual-write (dual-write) to the old and new formats.
2. Migrate - background migrations, backfill, consistency validators; read through an adapter that supports both schemes.

3. Contract - disable reading/writing to the old scheme, remove technical debt after completing the "deprecate window."

SQL (Simplified):
sql
-- Expand
ALTER TABLE payouts ADD COLUMN payout_ref TEXT NULL;
CREATE INDEX CONCURRENTLY ix_payouts_ref ON payouts(payout_ref);

-- Migrate (batch + idempotent)
UPDATE payouts SET payout_ref = concat('ref_', id) WHERE payout_ref IS NULL;

-- Contract (after compatibility window)
ALTER TABLE payouts ALTER COLUMN payout_ref SET NOT NULL;

Event transactionality: Use Outbox (transaction with event record) + CDC for guaranteed delivery.

5) Long-lived connections and drainage

Graceful shutdown: SIGTERM → stop accepting new requests → set 'readiness = fail' → wait for WebSocket/HTTP2/QUIC streams to drain → close.
Connection draining on the balancer: 'deregister _ delay' 30-120 s, sticky sessions - through tokens, not IP.
Back-pressure: Limit new upstream p99_latency.

6) SDK and Client Versioning

SemVer for SDK; LTS branch with extended support window (e.g. 12 months).

Policy: "at least two active minor versions"; telemetry per client by version; Automatic upgrade alerts

Critical changes (security): forced flag of disabling old versions through the gateway after the deadline.

7) Updates of protocols and network nodes

Soft-fork: extending rules without violating old nodes (capabilities).
Hard-fork: pre-announced window, double validation, "canary validators," protection against "reorg/rollback" conflicts, time-lock for activation.
Cross-chain updates: governance bridges transmit activation signals; in case of misalignment - local circuit-breaker.

8) Configurations and secrets as data

Centralized config-service with versioning, digital signatures and rollback.
Secrets rotation without downtime: double keys (old/new), alternate inclusion; zero downtime for KMS/PKI.
Feature-flags in a separate row, audit of on/off.

9) Pipeline release and automatic "gates"

Стадии: build → unit → security scan → e2e/stage → shadow → canary → 100%.

Gates-stops:
  • Error-budget burn-rate, p95/p99 latency, error-rate, decrease in success-rate events/payments, growth of dead-letter queues.
  • Auto-rollback in case of SLO violation in any of the stages.
Example (pseudo-YAML):
yaml release:
strategy: canary steps:
- name: shadow traffic_mirror: 5%
gates: [no_data_loss, no_pii_leak]
- name: canary_1 traffic: 1%
gates: [error_rate<0. 2%, p99<400ms]
- name: canary_2 traffic: 10%
gates: [slo_ok_1h, zero_deadletters]
- name: rollout traffic: 100%
gates: [stability_6h]
- name: bake duration: 24h action: finalize_or_rollback

10) Observability and SLO for releases

Key SLIs:
  • p95/p99 latency by endpoints; error-rate (5xx + fatal 4xx); success-rate events; proportion of retrays; queue lag; "relay" share in P2P; share of customers by version.
SLO (example):
  • p99 API ≤ 400 ms; error-rate ≤ 0. 2%; success-rate events ≥ 99. 5%; queue lag ≤ 2 s; Rollback MTTR ≤ 15 min.
  • Release dashboards: before/after comparison, canary graphs, dependency map (service map), burn-rate alerts 1h/6h.

11) Rollbacks and kill-switch

Auto-rollback: keep the latest "good" artifacts and configs; "1-button" rollback on the balancer (Blue←Green).
Partial rollback: The phicheflag turns off the new logic when saving the binary.
Data rollback: only for "read-paths"; for write-paths - protected migrations (never delete old columns until the end of the window).
Kill-switch: centralized flag to disable the unstable subsystem.

12) Testing without downtime

Contract tests against customer stabilization (consumer-driven).
Schema-compat tests.

Chaos tests in staging: failure of% of nodes/regions, degradation of DHT/TURN/KMS/DNS, "retray storm."

Load/remarket tests: canary regions and hot routes.

13) Communications and Compliance Procedures

Release notes: what changes, influence, windows/deadlines of deprecate, actions for partners.
SLA of incident responses: MTTA ≤ 5 min, first status update ≤ 15 min, post-mortem ≤ 72 h.
Trace audit: linking all config changes and releases to applications/sites, signatures of artifacts.

14) Special cases

Payment/financial flows: strict idempotency, dedup according to idempotency-key, outbox + CDC, "non-destructive" migrations only.
WebSocket/streams: protocol version in handshake, reconnect with summary (resume tokens).
Cache/edge: 'stale-while-revalidate', dual cache versions, TTL hygiene during release period.
Mobile clients: phased rollout in sectors, forced update on security releases.

15) Zero-downtime checklist

1. Contract compatibility and the registry scheme are configured.
2. Expand→Migrate→Contract is described and automated.
3. Balance/Ingress supports blue-green and connection drainage.
4. Canary-pipeline with SLO-gates and auto-rollback.
5. Feature-flags and kill-switch are available 24/7.
6. Outbox + CDC and idempotency are enabled for all write paths.
7. Release-health dashboards and burn-rate alerts are active.
8. Communications and deprecate policy announced to partners in advance.
9. Weekly rehearsal rollback; quarterly chaos-day.

16) Glossary

Progressive delivery - phased release of features with risk control.
Schema registry - a repository of schema versions with compatibility policies.
Outbox/CDC - a template for guaranteed publication of events from transactions.
Blue-Green - parallel stacks with atomic traffic switching.
Canary - gradually increasing the share of traffic on the new version.
Graceful shutdown/drainage - correct termination of active connections.

Bottom line: zero downtime is not one trick, but a system: contracts, scheme compatibility, phased release strategies, observability, safe migrations and guaranteed rollbacks. Following this framework, the ecosystem is updated quickly, predictably and without pain for users and partners.

Contact

Get in Touch

Reach out with any questions or support needs.We are always ready to help!

Telegram
@Gamble_GC
Start Integration

Email is required. Telegram or WhatsApp — optional.

Your Name optional
Email optional
Subject optional
Message optional
Telegram optional
@
If you include Telegram — we will reply there as well, in addition to Email.
WhatsApp optional
Format: +country code and number (e.g., +380XXXXXXXXX).

By clicking this button, you agree to data processing.