Central control dashboard

1) Purpose and principles

The central control dashboard (hereinafter CDU) is a single window for operational decision-making. It aggregates signals from telemetry, ITSM, CI/CD, the service catalog, the work calendar and providers, and turns them into actionable widgets.

Principles:
  • SLO-first: the top of the dashboard shows target SLOs and burn rate for Tier-0/1 services.
  • One click to action: every widget leads straight to a playbook/runbook or ticket.
  • Unified vocabulary: the same SEV levels, statuses, colors and thresholds everywhere.
  • Event annotations: releases, config changes and maintenance windows on all graphs.
  • Roles and permissions: personal views (on-call, IC, management).
  • Low noise: source quorum, deduplication and windowing.

2) Roles and key scenarios

On-call (P1/P2): quickly understand "what is on fire" and open the playbook (≤1 click).
IC (incident commander): declare a SEV, start war-room mode, control the cadence of comms updates.
Release Manager: see gates, canary progress, rollback readiness.
Service Owner/Product: business SLIs (payment/registration success), feature impact.
SRE/Platform: capacity, autoscale, anomalies, DR-readiness.
FinOps: $/unit, overspending, budget alerts.
Security/Legal: posture, key certificates, rotation windows, WORM audit links.

3) CDU information architecture

Top shelf (hero panel):
  • SLO for Tier-0/1 (availability/latency/success) with two-window burn rate.
  • SEV status: active incidents and their timeline.
  • Release status: canary/blue-green, active gates.
  • Provider traffic lights (PSP/KYC/CDN).
Middle shelf (operating):
  • Maintenance windows (now/24h), suppression map.
  • Capacity: CPU/RAM/IO/queue-depth/p95 latency with forecast.
  • FinOps: $/1k txn, daily spend vs budget, log volume anomalies.
  • DataOps: data mart freshness, pipeline SLAs, DQ errors.
  • Security: certificate expiry, secret rotation, critical vulnerabilities (age/SLA).
Lower shelf (diagnostics/drill-down):
  • Correlations "release ↔ SLO," "provider ↔ failure/latency."
  • Quick links: logs, traces, tickets, playbooks, SOPs, escalation matrix.

4) Widgets (reference set)

1. SLO & Burn-rate

Shows the current SLI, target, and error budget consumption (1h/6h).
Action: open the service degradation playbook.

2. Incidents (SEV panel)

Active/Recent, Declare/Comms Timers, IC/Comms Roles.
Action: open war-room, update template, IC checklist.

3. Releases/Configs

Canary 1→5→25%, flags, rollback (button/SOP link).
Annotations: version, commits, author.

4. Maintenance windows

Current/upcoming, impacted-services/regions; suppression mask.
Action: Coordinate notifications, enable SLO guards.

5. Capacity/Autoscale

Consumption forecast (naive/AR), hotspot map, warm pool.
Action: request quotas/scaling rules (PR to the policy repo).

6. FinOps

$/unit, top "expensive" queries/logs, daily burn vs budget.
Action: open the report and recommendations (log sampling, archiving).

7. Providers

SLA/PSP/KYC/CDN status, route weights, fallback readiness.
Action: switch route weights, send a communication template to partners.

8. Security

Certificates (≤30d), delays in rotations, vulnerabilities (age), suspicious events.
Action: open IR playbook/ticket.

9. DataOps

Data mart freshness, percentage of gaps, pipeline failures, DLQ.
Action: Backfill/quarantine/rollback transformation.
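
The suppression mask of the maintenance windows widget (item 4 above) can be illustrated with a short check. This is a minimal sketch, assuming a window is described by a time range plus the impacted services and regions; `MaintenanceWindow` and `is_suppressed` are hypothetical names, not part of any existing tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class MaintenanceWindow:
    # Hypothetical shape of a calendar entry.
    start: datetime
    end: datetime
    services: FrozenSet[str]
    regions: FrozenSet[str]


def is_suppressed(service: str, region: str, now: datetime,
                  windows: Iterable[MaintenanceWindow]) -> bool:
    """True if an active window covers this service in this region."""
    return any(
        w.start <= now < w.end and service in w.services and region in w.regions
        for w in windows
    )
```

A widget whose service and region are suppressed would be shown in grey rather than paging anyone.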

5) States/colors/thresholds (reference)

Green: SLI within target, burn rate < 1×.
Amber: SLI degrading, burn rate 1-2×, p95 rising, but a workaround exists.
Red: breach, or error budget predicted to burn out in < 1h; open SEV-1/0.
Grey: suppression or no telemetry (source error).
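
The state rules above can be sketched as a small classifier. This is an illustration only: the function name and parameters are hypothetical, and treating a burn rate of 2× or more as red is an assumption the text does not state explicitly.

```python
from typing import Optional


def widget_state(burn_rate: Optional[float],
                 suppressed: bool = False,
                 breach: bool = False,
                 hours_to_exhaustion: Optional[float] = None) -> str:
    """Map widget inputs to green/amber/red/grey."""
    # Grey: suppression or no telemetry (burn_rate is None).
    if suppressed or burn_rate is None:
        return "grey"
    # Red: breach or predicted budget exhaustion in under one hour.
    if breach or (hours_to_exhaustion is not None and hours_to_exhaustion < 1):
        return "red"
    # Assumption: anything at or above 2x is also red.
    if burn_rate >= 2.0:
        return "red"
    if burn_rate >= 1.0:
        return "amber"
    return "green"
```

Checking grey first keeps a suppressed or silent source from being shown as healthy.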

6) Annotations and correlations

Release/config/window/provider statuses are displayed on SLO graphs.
Clicking a marker → diff, author, gates, Rollback/Fallback/SOP button.
For an incident, the timeline is built from annotations and ChatOps actions.

7) Data sources and verification

Telemetry: metrics/traces/logs with trace_id.
ITSM: Incidents/Issues/Changes (Statuses/SLAs).
CI/CD: releases, signatures, artifacts, tests.
Service catalog/CMDB: owners, SLO, dependencies.
Calendar: maintenance windows.
Providers: status API + manual confirmations (stored in a separate data mart).
FinOps: billing/resource tags, log volumes, egress.

Quality control: quorum, duplicate probes, freshness SLAs, alerts on "silent" sources.
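
Quorum and freshness checks can be combined in one pass over the source reports. The sketch below is an assumption about how this might look: a status is accepted only if at least `quorum` fresh sources agree, and sources older than `max_age` are listed as silent. The function name, tuple layout and defaults are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

Report = Tuple[str, str, datetime]  # (source, status, observed_at)


def quorum_status(reports: List[Report], now: datetime,
                  max_age: timedelta = timedelta(minutes=5),
                  quorum: int = 2) -> Tuple[str, List[str]]:
    """Return (agreed status or "unknown", sorted list of silent sources)."""
    silent = sorted(src for src, _, ts in reports if now - ts > max_age)
    votes: Dict[str, int] = {}
    for _, status, ts in reports:
        if now - ts <= max_age:  # only fresh reports vote
            votes[status] = votes.get(status, 0) + 1
    confirmed = [status for status, n in votes.items() if n >= quorum]
    # Exactly one status must reach quorum; ties or no quorum give "unknown".
    agreed = confirmed[0] if len(confirmed) == 1 else "unknown"
    return agreed, silent
```

A non-empty list of silent sources is itself a signal and could feed the zero-blind-spots metric.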

8) Display modes

War-room: fixed layout SLO/Incidents/Releases/Comms-timer.
Executive (28 days): trends in MTTR/MTTD/SEV mix, $/unit, SLO adherence.
On-call: compact "night" panel (dark mode, large numbers).
Multi-tenant/region: service/region/tenant filters; presets.

9) Navigation and actions (one-click)

Buttons: '/declare sev1', '/freeze', '/rollback', '/status update', 'open playbook'.
Drill-down: SLO → graph → logs/traces with prefilled filters (trace_id, release_id).
Sharing: snapshot of panels in a ticket/status page.
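
A drill-down link with prefilled filters can be as simple as a query string built from the widget context. This is a sketch under assumptions: the base URL is a placeholder, `drilldown_url` is a hypothetical helper, and the log/trace backend is assumed to accept filters as query parameters.

```python
from typing import Optional
from urllib.parse import urlencode


def drilldown_url(base_url: str, **filters: Optional[str]) -> str:
    """Build a link with prefilled filters, skipping empty values."""
    # Sort keys so the same context always yields the same link.
    params = {k: v for k, v in sorted(filters.items()) if v is not None}
    return f"{base_url}?{urlencode(params)}" if params else base_url
```

For example, `drilldown_url("https://logs.example.internal/search", trace_id="abc123", release_id="v1.2.3")` yields a link that opens the search with both filters already set.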

10) Security, access, audit

SSO/OIDC + RBAC/ABAC: roles and scopes (view/action).
JIT/JEA: "dangerous" actions are available only with a temporary privilege elevation.
Immutable audit: who clicked what, and which requests/commands were sent.
Secrets: not displayed, only links to the secret manager.
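
The view/action scopes and the temporary elevation rule can be expressed as a single permission check. This is a minimal sketch: the `Operator` type, the `DANGEROUS_ACTIONS` set and the scope name `"action"` are illustrative assumptions, not an existing API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, Optional

# Hypothetical list of actions that require temporary elevation.
DANGEROUS_ACTIONS = frozenset({"rollback", "freeze", "switch_weight"})


@dataclass(frozen=True)
class Operator:
    name: str
    scopes: FrozenSet[str]                     # e.g. {"view"} or {"view", "action"}
    elevated_until: Optional[datetime] = None  # set by a JIT grant


def can_run(op: Operator, action: str, now: datetime) -> bool:
    """Allow an action only with the action scope; dangerous ones also need live elevation."""
    if "action" not in op.scopes:
        return False
    if action in DANGEROUS_ACTIONS:
        return op.elevated_until is not None and now < op.elevated_until
    return True
```

Every call to such a check, allowed or denied, is a natural place to write an audit record.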

11) CDU Maturity Metrics

Actionability ≥ 90%: Clicks lead to actions, not just graphs.
Time-to-First-Action ≤ 2 min from the CDU during SEV-1/0.
The proportion of incidents where the CDU was a "source of truth" ≥ 95%.

Freshness of widgets: % of widgets with data no older than 5 minutes.

Coverage: % of critical services with SLO cards and release annotations.
Zero-blind-spots: silent sources for the week = 0.
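
Two of these metrics reduce to simple ratios. The sketch below assumes widget data age is available in seconds and that clicks are already labeled as leading to an action or not; both function names are hypothetical.

```python
from typing import Iterable


def freshness_pct(widget_ages_s: Iterable[float], max_age_s: float = 300.0) -> float:
    """Share of widgets whose data is no older than max_age_s (5 minutes by default)."""
    ages = list(widget_ages_s)
    if not ages:
        return 0.0
    fresh = sum(1 for age in ages if age <= max_age_s)
    return 100.0 * fresh / len(ages)


def actionability_pct(clicks_total: int, clicks_with_action: int) -> float:
    """Share of clicks that ended in an action (playbook, ticket, command)."""
    if clicks_total == 0:
        return 0.0
    return 100.0 * clicks_with_action / clicks_total
```

Comparing `actionability_pct` against the 90% target gives a pass/fail signal for the weekly review.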

12) Checklists

Design

  • Roles and scenarios are described (P1/P2/IC/Exec/FinOps/Security/DataOps).
  • The color/SEV/threshold vocabulary is consistent.
  • Data sources have quorum and freshness SLAs.
  • War-room/On-call/Executive layouts.
  • ChatOps/ITSM/CI/CD/CMDB Integration Plan.

Operation

  • Widgets pass linter (required fields, owner, thresholds).
  • Once a week: escalation/alert review with CDU improvements.
  • Incident snapshots are attached to the AAR/RCA.
  • Dark Mode/Mobile Duty Preset.
  • Tests for "silent" sources and correctness of annotations.
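
The widget linter from the operation checklist could start as a handful of structural checks. This is a sketch, not a specification: the set of required fields and the rule that amber must be below red are assumptions, and threshold checks are applied only to the `slo_burnrate` widget type.

```python
from typing import Any, Dict, List

# Hypothetical minimum set of fields every widget must define.
REQUIRED_FIELDS = ("id", "title", "owner", "type")


def lint_widget(widget: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means the widget passes."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if not widget.get(f)]
    if widget.get("type") == "slo_burnrate":
        thresholds = widget.get("thresholds") or {}
        for level in ("amber", "red"):
            if level not in thresholds:
                errors.append(f"missing threshold: {level}")
        amber = (thresholds.get("amber") or {}).get("burn_rate")
        red = (thresholds.get("red") or {}).get("burn_rate")
        if amber is not None and red is not None and amber >= red:
            errors.append("amber burn_rate must be below red burn_rate")
    return errors
```

Running such a check in CI for the dashboard repository keeps ownerless or threshold-less widgets from reaching production.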

13) Templates (ideas)

13.1 Widget Definition (YAML)

yaml
id: slo-payments
title: "SLO: Success of payments (EU)"
owner: team-payments
type: slo_burnrate
sli:
  metric: "biz.payment_success_ratio"
  target_pct: 99.5
burn_rate:
  short_window: "1h"
  long_window: "6h"
thresholds:
  amber: { burn_rate: 1.2 }
  red:   { burn_rate: 2.0 }
actions:
  - label: "Open playbook"
    link: "rb://payments/slo-degrade"
  - label: "Release rollback"
    link: "sop://REL-ROLLBACK-01"
annotations:
  release: true
  change: true
filters:
  region: "eu"
  tier: "0"
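
A widget like the one above could evaluate its thresholds roughly as follows. This is a sketch under stated assumptions: the SLI is a success ratio between 0 and 1, burn rate is the observed error rate divided by the error rate the target allows, and a level applies only when both the short and the long window exceed it. The function names are hypothetical.

```python
def burn_rate(success_ratio: float, target_pct: float) -> float:
    """Observed error rate divided by the error rate the SLO allows."""
    allowed_error = 1.0 - target_pct / 100.0
    if allowed_error <= 0:
        raise ValueError("target_pct must be below 100")
    return (1.0 - success_ratio) / allowed_error


def burn_level(short_ratio: float, long_ratio: float, target_pct: float,
               amber: float = 1.2, red: float = 2.0) -> str:
    """Two-window rule: the slower-burning window decides the level."""
    rate = min(burn_rate(short_ratio, target_pct),
               burn_rate(long_ratio, target_pct))
    if rate >= red:
        return "red"
    if rate >= amber:
        return "amber"
    return "green"
```

With a 99.5% target, a 99% success ratio corresponds to a burn rate of about 2×. Requiring both windows to agree filters out short spikes that the 6h window does not confirm.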

13.2 Incident Card (JSON)

json
{
  "id": "incidents-active",
  "type": "incident_board",
  "sev": ["SEV-0", "SEV-1", "SEV-2"],
  "fields": ["id", "sev", "service", "since", "ic", "next_comms_at"],
  "actions": [{"label": "War-room", "cmd": "/declare sev1"}]
}

13.3 Link to releases

yaml
id: release-canary
type: release_progress
source: cicd://checkout
gates: ["tests", "signatures", "slo_guardrails"]
canary_steps: [1, 5, 25]
rollback: "sop://REL-ROLLBACK-01"
annotations: { on_charts: ["slo-latency", "slo-success"] }
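
The canary steps and gates in this template suggest a simple progression rule. The sketch below is one possible reading: traffic moves to the next step only while every gate passes, a failed gate returns `None` so the caller can follow the rollback SOP, and going to 100% after the last step is an assumption not stated in the template. `next_canary_step` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def next_canary_step(current_pct: int, steps: List[int],
                     gates: Dict[str, bool]) -> Optional[int]:
    """Return the next traffic percentage, or None if any gate failed."""
    if not all(gates.values()):
        return None  # caller should trigger the rollback SOP
    for step in steps:
        if step > current_pct:
            return step
    return 100  # assumption: full rollout follows the last canary step
```

Showing the returned value next to the gate statuses gives the release manager the "what happens next" answer without opening the pipeline.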

13.4 FinOps widget

yaml
id: finops-burn
type: cost_unit
metrics:
  - id: "cost_per_1k_txn"
  - id: "logs_daily_gib"
alerts:
  - when: "cost_per_1k_txn > target * 1.2"
    action: "open://finops/reco-logs-sampling"
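
The unit-cost metric and its alert can be checked with a few lines of arithmetic. This sketch reads the alert condition in the template as "cost exceeds the target by a factor of 1.2", which is an interpretation; the function names and the sample numbers in the note are illustrative.

```python
def cost_per_1k_txn(total_cost: float, txn_count: int) -> float:
    """Cost per 1,000 transactions for the period."""
    if txn_count <= 0:
        raise ValueError("txn_count must be positive")
    return total_cost / txn_count * 1000


def over_target(cost: float, target: float, factor: float = 1.2) -> bool:
    """True when the unit cost exceeds target * factor."""
    return cost > target * factor
```

For instance, a spend of 50 over 200,000 transactions gives roughly 0.25 per 1k txn; against a target of 0.20 this exceeds the 1.2× limit of 0.24, so the recommendation link would be offered.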

14) Anti-patterns

"Wall of graphs" without actions and playbooks.
Different colors/thresholds across teams → confusion about SEV.
No release/window annotations: cause correlation becomes difficult.
Duplicate sources without quorum: false pages and noise.
Secrets/keys on the panel: risk of leakage.
Slow rendering (queries/aggregations are not cached): panels fail to open during an incident.

15) Implementation Roadmap (4-8 weeks)

1. Week 1: gather requirements by role, define the status/color vocabulary, design layouts for the three modes.
2. Week 2: connect SLO/Incidents/Releases/Windows, annotations, ChatOps actions.
3. Week 3: add FinOps/Capacity/Providers/DataOps/Security, source quorum.
4. Week 4: War-room mode, snapshots in ITSM, pilot on Tier-0.
5. Weeks 5-6: performance optimization, mobile/on-call preset, widget linter.
6. Weeks 7-8: maturity metrics, weekly review, automatic recommendations (log sampling, quotas, fallback).

16) The bottom line

The CDU is not a set of "pretty graphs" but a decision panel: SLOs and burn rate at the top; incidents, releases and windows in one context; instant actions via ChatOps and SOPs; verified sources and annotations. Such a dashboard reduces MTTA/MTTR, simplifies communications, supports FinOps and makes operations transparent and predictable.
