Standard Operating Procedures
1) What an SOP is and why it is needed
An SOP (Standard Operating Procedure) is a formalized, approved sequence of steps for a recurring operation, with clear inputs/outputs, roles, and quality criteria.
SOP goals:
- Reduce execution variability and risk.
- Shorten MTTA/MTTR by having actions ready in advance.
- Compliance and audit: reproducibility and traceability.
- Onboarding: faster training and a quicker "shadow → solo" transition.
SOP ≠ playbook: a playbook is a branching decision tree, while an SOP is a linear procedure for one specific scenario (or one branch of a playbook).
2) Principles of a "good" SOP
Outcome-driven: focus on the result (SLO/business criteria), not just the steps.
Precision: commands, parameters, expected effects, and checkpoints.
Safe by default: gates, limits, and backout/rollback are defined up front.
Minimal context: short explanations plus links to detailed runbooks/diagnostics.
Currency: review date, owner, version, expiry date.
Executability: JIT/JEA access points, checks between steps, artifact templates.
3) Standard SOP structure (skeleton)
ID/Version/Review Date
Name and short purpose (what and why)
Scope (Services/Regions/Tenants, SEV/Risk)
Roles and Responsibilities (RACI: R/A/C/I)
Preconditions (access, maintenance windows, staging, reserve capacity, artifacts)
Materials/tools (dashboards, feature flags, repos, keys)
Quality gates (SLO guardrails, probe quorum, alerts)
Step-by-step instructions (step → command → expected result → verification)
Branches (if X, then do Y) [keep to a minimum]
Backout/Rollback (start conditions, steps, verification)
Communications (who, when, where; message templates)
Evidence (what to save: screenshots, logs, checksums, links)
Completion (success criteria, observation period, who closes the ticket)
Change History (what changed, by whom, and why)
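To make the skeleton concrete, here is a minimal sketch of it as a Python data model. The class and field names (Step, SOP, incomplete_steps, accountable_roles) are hypothetical and chosen for illustration; the two helper methods check the "step → command → expected result → verification" rule and the RACI convention of exactly one accountable role.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Step:
    action: str        # what to do
    command: str       # exact command or link to a runbook script
    expected: str      # expected result
    verification: str  # how the executor confirms the expected result


@dataclass
class SOP:
    # Hypothetical field names mirroring the skeleton sections.
    sop_id: str
    version: str
    review_date: date
    name: str
    purpose: str
    scope: list[str]
    raci: dict[str, str]  # role -> "R", "A", "C" or "I"
    preconditions: list[str]
    gates: list[str]
    steps: list[Step]
    backout: list[str]
    comms: list[str]
    evidence: list[str]
    completion: str
    materials: list[str] = field(default_factory=list)
    branches: dict[str, str] = field(default_factory=dict)  # condition -> action
    history: list[str] = field(default_factory=list)        # what, by whom, why

    def incomplete_steps(self) -> list[int]:
        """1-based numbers of steps lacking a command, expected result or verification."""
        return [i for i, s in enumerate(self.steps, start=1)
                if not (s.command and s.expected and s.verification)]

    def accountable_roles(self) -> list[str]:
        """Roles marked "A"; a well-formed RACI has exactly one."""
        return [role for role, letter in self.raci.items() if letter == "A"]
```

A template generator or linter can then work on this structure instead of free text.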
4) SOP catalog and ownership
Catalog tags, for example: 'domain/ops', 'service/checkout', 'risk/high', 'provider/psp-a'.
Owner card: team, on-call contacts, backup owner.
Currency SLA (for example, review every ≤ 90 days or after an incident/release).
SOP linter/validator in CI: checks structure, links, owners, and expiry.
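A minimal sketch of such a CI linter, assuming SOPs are stored as plain text with "Key: value" section headers. The section list, the function name lint_sop, and the 90-day limit (taken from the currency SLA above) are illustrative assumptions; the link check here is only syntactic.

```python
import re
from datetime import date, timedelta

# Hypothetical required sections for a plain-text SOP in "Key: value" form.
REQUIRED_SECTIONS = ["ID", "Version", "Review Date", "Owner", "Scope", "Roles",
                     "Preconditions", "Gates", "Steps", "Backout", "Comms",
                     "Evidence", "Completion"]
MAX_AGE = timedelta(days=90)  # currency SLA from the catalog rules
URL_RE = re.compile(r"https?://\S+")


def lint_sop(text: str, today: date) -> list[str]:
    """Return a list of problems; an empty list means the SOP passes."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())  # first occurrence wins
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in fields]
    if fields.get("Owner") == "":
        problems.append("owner is empty")
    review = fields.get("Review Date")
    if review:
        try:
            age = today - date.fromisoformat(review)
            if age > MAX_AGE:
                problems.append(f"stale: reviewed {age.days} days ago")
        except ValueError:
            problems.append(f"unparseable review date: {review}")
    # Syntactic check only; a real linter would also verify that links resolve.
    for url in URL_RE.findall(text):
        if url.startswith("http://"):
            problems.append(f"insecure link: {url}")
    return problems
```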
5) SOP lifecycle
1. Initiation (after an incident, exercise, or new process).
2. Draft (author = service/process owner).
3. Review (SRE/Security/Legal/Comms, depending on the domain).
4. Pilot (tabletop/game day): measure the execution time; findings → fixes.
5. Publication (version, date, ID; references in the CMDB/service catalog).
6. Operational use (annotations in tickets/chats, evidence collection).
7. Update (after an RCA/CAPA, on the review deadline, after architecture changes).
8. Archiving/deprecation (replaced by a new SOP/playbook).
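One way to enforce the lifecycle in tooling is a small state machine. This is a hypothetical encoding: the stage names follow the list above, but which back-transitions are allowed (review or pilot sending a draft back, an update re-entering review) is an assumption.

```python
from enum import Enum


class Stage(Enum):
    INITIATION = 1
    DRAFT = 2
    REVIEW = 3
    PILOT = 4
    PUBLISHED = 5
    IN_USE = 6
    UPDATE = 7
    ARCHIVED = 8


# Allowed transitions (assumed; adapt to your own process).
TRANSITIONS: dict[Stage, set[Stage]] = {
    Stage.INITIATION: {Stage.DRAFT},
    Stage.DRAFT: {Stage.REVIEW},
    Stage.REVIEW: {Stage.DRAFT, Stage.PILOT},       # reviewers may send it back
    Stage.PILOT: {Stage.DRAFT, Stage.PUBLISHED},    # pilot findings -> fixes
    Stage.PUBLISHED: {Stage.IN_USE, Stage.ARCHIVED},
    Stage.IN_USE: {Stage.UPDATE, Stage.ARCHIVED},
    Stage.UPDATE: {Stage.REVIEW},                   # changes are reviewed again
    Stage.ARCHIVED: set(),                          # terminal stage
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move an SOP to the target stage, rejecting transitions the process does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```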
6) Relationships with adjacent artifacts
Playbooks: an SOP is a "linear branch" inside a playbook and is linked from the playbook's steps.
Runbooks: technical details/scripts live in the runbook; the SOP links to them.
Policies (Policy-as-Code): access, retention, RBAC; links to them are mandatory.
SLO/SLI: success criteria and guardrails.
Escalation matrix: roles and timings when an SOP fails.
Maintenance windows: slot and comms requirements for high-risk SOPs.
7) SOP effectiveness metrics
Time-to-Execute (median/p95): how long the procedure takes.
Success Rate: share of successful runs with no escalation or rollback.
Evidence Completeness: how complete the collected artifacts are.
SLO Impact: does the service degrade (burn minutes) during or after a step?
Defect Density: findings per 10 SOPs during live runs and drills.
Freshness: share of SOPs reviewed within the last 90 days.
Adoption: how many alerts/maintenance windows are linked to an SOP.
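Three of these metrics can be computed from a simple run log. A minimal sketch, assuming each run is recorded with its duration and whether it was escalated or rolled back (the Run record is hypothetical); p95 uses the standard library's inclusive quantile method, which is one of several reasonable percentile definitions.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median, quantiles


@dataclass
class Run:
    minutes: float     # wall-clock duration of one SOP execution
    escalated: bool
    rolled_back: bool


def time_to_execute(runs: list[Run]) -> tuple[float, float]:
    """Median and p95 of execution time; needs at least two runs."""
    durations = [r.minutes for r in runs]
    p95 = quantiles(durations, n=100, method="inclusive")[94]  # 95th cut point
    return median(durations), p95


def success_rate(runs: list[Run]) -> float:
    """Share of runs finished without escalation or rollback."""
    ok = sum(1 for r in runs if not (r.escalated or r.rolled_back))
    return ok / len(runs)


def freshness(review_dates: list[date], today: date, max_age_days: int = 90) -> float:
    """Share of SOPs reviewed within the last max_age_days."""
    fresh = sum(1 for d in review_dates if (today - d).days <= max_age_days)
    return fresh / len(review_dates)
```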
8) SOP author checklist
- Purpose and scope of application are defined.
- Roles, access, and maintenance windows are described.
- Quality and SLO criteria are measurable; signal sources exist.
- Steps are executable: commands/scripts, expected results, verification.
- Backout/rollback and the criteria for triggering it are clear.
- Comms templates are attached.
- The evidence list is compiled.
- Version/date/owner/review date are specified.
9) SOP executor checklist
- JIT/JEA preconditions and access are confirmed.
- Ticket/war room is open and annotations are enabled.
- Observability: the required dashboards/alerts are open.
- I execute the steps in order and verify after each one.
- If guardrails are breached: immediate backout and escalation.
- Evidence is filled in; final SLO/business-SLI check done.
- Ticket closed; status page/comms updated.
10) SOP examples (excerpts)
10.1 SOP: Canary release rollback (REL-ROLLBACK-01)
Goal: return to the stable version when the burn rate is exceeded or p99 grows.
Scope: checkout-api service (prod, EU).
Roles: Release (R), IC (A in SEV-1), P1 (R), Comms (I).
Preconditions: feature flags are ready; JEA access granted; release annotations enabled.
Gates: slo.payment_success, http_p99; quorum of synthetic probes (EU/US) + RUM.
Steps:
1) Freeze unrelated deploys.
2) Roll back to tag v2.3.7 (command...) → wait 5 minutes.
Expected: p99↓, error_rate↓, burn rate < threshold.
3) Check business SLIs (payment success, conversion) for 10 min.
4) Remove the suppression of alerts; update release annotation.
Backout: if the rollback does not help, escalate to the IC, enable degraded UX, and consider failover.
Comms: "Rolled back; metrics stabilize; next update in 15 minutes."
Evidence: before/after screenshots, links to dashboards, commands and their output.
Completion: 30 min of green SLOs; close the ticket; assign an RCA (if SEV-1).
Version: 1.6 (2025-10-28)
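The rollback trigger of REL-ROLLBACK-01 can be expressed as a small decision function. This sketch uses the common SRE definition of burn rate (observed error ratio divided by the error budget, 1 - SLO target); the default threshold of 14.4 and the 20% p99 growth factor are placeholder assumptions, not values taken from the SOP.

```python
def burn_rate(errors: int, requests: int, slo_target: float) -> float:
    """Observed error ratio divided by the allowed error ratio (1 - SLO target)."""
    if requests == 0:
        return 0.0
    return (errors / requests) / (1.0 - slo_target)


def should_roll_back(errors: int, requests: int, slo_target: float,
                     p99_ms: float, baseline_p99_ms: float,
                     burn_threshold: float = 14.4, p99_growth: float = 1.2) -> bool:
    """Rollback trigger: burn rate above threshold OR p99 grown past baseline * p99_growth.

    Both default thresholds are placeholders; tune them per service.
    """
    return (burn_rate(errors, requests, slo_target) > burn_threshold
            or p99_ms > baseline_p99_ms * p99_growth)
```

After the rollback, the same function returning False over the 5-minute wait corresponds to the "Expected" line of step 2.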
10.2 SOP: Scheduled DB upgrade (MW-DB-UPGRADE-02)
Purpose: upgrade the PostgreSQL minor version without data loss.
Scope: payments-db (prod EU), 02:00-04:00 Europe/Kyiv.
Roles: DB Lead (R), SRE (C), Service Owner (A), Comms (R for customer communication).
Preconditions: backups OK; replica in sync; test upgrade passed.
Gates: lag ≤ 30s, error_rate < 0.5%, p99 < 400ms, SLO green for 30m.
Steps:
1) Shift traffic to the canary replica 1% → 5% → 25%; monitor SLIs.
2) Upgrade the secondary nodes one at a time → switch over → upgrade the former primary.
3) Restore replication, check consistency.
Backout: promote the stable replica; restore the writer; roll back the packages.
Comms: notices at T-7 and T-2 days and at T-60 and T-15 min; updates every 30m during the window.
Evidence: migration logs, checksums, p95/p99 graphs.
Completion: 60m of observation with no burn; MW report with evidence.
Version: 2.1 (2025-09-12)
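The gates and the canary ramp of MW-DB-UPGRADE-02 can be checked mechanically. A sketch, assuming one metrics snapshot is observed at each traffic share; the Snapshot fields and function names are hypothetical, while the thresholds are the ones listed under Gates.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    replication_lag_s: float
    error_rate: float        # fraction: 0.004 means 0.4 %
    p99_ms: float
    slo_green_minutes: int   # how long the SLO has been continuously green


def gate_violations(s: Snapshot) -> list[str]:
    """Violated gates of MW-DB-UPGRADE-02; an empty list means all gates are green."""
    violations = []
    if s.replication_lag_s > 30:
        violations.append("lag > 30s")
    if s.error_rate >= 0.005:
        violations.append("error_rate >= 0.5%")
    if s.p99_ms >= 400:
        violations.append("p99 >= 400ms")
    if s.slo_green_minutes < 30:
        violations.append("SLO green < 30m")
    return violations


def ramp(snapshots: list[Snapshot],
         shares: tuple[int, ...] = (1, 5, 25)) -> tuple[int, list[str]]:
    """Walk the canary shares in order, checking the snapshot observed at each share.

    Returns (last share that stayed green, violations that stopped the ramp).
    """
    last_green = 0
    for share, snap in zip(shares, snapshots):
        violations = gate_violations(snap)
        if violations:
            return last_green, violations  # stop here and start the backout
        last_green = share
    return last_green, []
```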
10.3 SOP: PSP provider switch (PROV-PSP-SWITCH-01)
Objective: maintain the payment success_ratio if PSP-A degrades.
Trigger: PSP-A status red/partial + success_ratio drop of ≥ 2 percentage points.
Steps:
1) Set weights: PSP-A 30%, PSP-B 70%.
2) Enable degrade_payments_ux; increase retries (within SLA).
3) Monitor fraud_rate/chargeback risk for 30m.
Backout: restore the original weights after 60m of green SLIs.
Comms: status page (first update ≤ 15m, then every 30m).
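A sketch of the trigger and the weighted split from PROV-PSP-SWITCH-01. It assumes success ratios are fractions (0.97 means 97%) and reads the trigger as a drop of at least 2 percentage points; real traffic shifting would happen in the payment router, so the hypothetical pick_psp only illustrates proportional selection.

```python
import random


def switch_triggered(psp_a_status: str, baseline_success: float,
                     current_success: float, drop_pp: float = 2.0) -> bool:
    """PSP-A is red/partial AND success ratio fell by at least drop_pp percentage points."""
    drop = (baseline_success - current_success) * 100
    return psp_a_status in {"red", "partial"} and drop >= drop_pp


def pick_psp(weights: dict[str, int], rng: random.Random) -> str:
    """Choose a provider with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
```

With weights {"PSP-A": 30, "PSP-B": 70}, roughly 70% of calls to pick_psp return "PSP-B".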
10.4 SOP: Backup restore verification (DATA-BACKUP-RESTORE-CHECK-03)
Objective: weekly verification of recoverability.
Steps: restore from backup in isolation → hash verification → consistency queries → report.
Success criteria: time-to-restore ≤ 45 min; 100% integrity.
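The hash-verification and time criteria of DATA-BACKUP-RESTORE-CHECK-03 can be scripted. This sketch assumes a manifest of SHA-256 checksums recorded at backup time (the manifest format and function names are hypothetical) and treats integrity as the share of files whose checksum matches; consistency queries against the restored database are out of scope here.

```python
import hashlib
import tempfile
import time
from pathlib import Path

RESTORE_BUDGET_S = 45 * 60  # success criterion: time-to-restore <= 45 min


def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()


def verify_restore(manifest: dict[str, str], restored_dir: Path,
                   restore_seconds: float) -> dict:
    """Compare restored files with checksums recorded at backup time."""
    mismatched = []
    for name, expected in manifest.items():
        path = restored_dir / name
        if not path.is_file() or sha256_file(path) != expected:
            mismatched.append(name)
    integrity = 1 - len(mismatched) / len(manifest)
    within_budget = restore_seconds <= RESTORE_BUDGET_S
    return {"integrity": integrity, "mismatched": mismatched,
            "within_budget": within_budget,
            "passed": not mismatched and within_budget}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        restored = Path(tmp)
        start = time.monotonic()
        # Stand-in for the real isolated restore step.
        (restored / "orders.csv").write_bytes(b"id,amount\n1,10\n")
        elapsed = time.monotonic() - start
        manifest = {"orders.csv": hashlib.sha256(b"id,amount\n1,10\n").hexdigest()}
        print(verify_restore(manifest, restored, elapsed))
```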
11) Automation around SOPs
SOP templater: generates a skeleton with RACI, gate, and comms blocks.
Executor bot: checkbox steps, timers, cadence reminders, automatic evidence collection.
CMDB/catalog integration: each service shows its list of relevant SOPs.
Telemetry annotations: "SOP-RUN: <ID> step N" → fast analysis.
Admission policy: a deploy/window starts only when the SOP gates are green.
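A sketch of two of these automation pieces: producing and parsing the "SOP-RUN: <ID> step N" telemetry annotation, and the admission rule that a deploy/window starts only on green gates. The regular expression assumes IDs made of uppercase letters, digits, and hyphens, as in the examples in section 10; the helper names are hypothetical.

```python
import re

# Annotation format from the text: "SOP-RUN: <ID> step N".
ANNOTATION_RE = re.compile(r"^SOP-RUN: (?P<sop_id>[A-Z0-9-]+) step (?P<step>\d+)$")


def annotation(sop_id: str, step: int) -> str:
    """Build the annotation string attached to dashboards/tickets for one step."""
    return f"SOP-RUN: {sop_id} step {step}"


def parse_annotation(text: str):
    """Return (sop_id, step) or None if the text is not an SOP-RUN annotation."""
    m = ANNOTATION_RE.match(text.strip())
    return (m["sop_id"], int(m["step"])) if m else None


def may_start(gate_results: dict[str, bool]) -> bool:
    """Admission policy: start only if there is at least one gate and every gate is green."""
    return bool(gate_results) and all(gate_results.values())
```

Treating an empty gate set as "not green" is a deliberate design choice: an SOP with no gates should not silently pass admission.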
12) Anti-patterns
An SOP without an owner or date is a "dead" document.
Bloated instructions with no success or backout criteria.
Unapproved commands/keys: risk of errors and leaks.
Different versions in the wiki and the repository: diverging sources of truth.
No evidence: nothing to prove quality or compliance.
"One SOP for every case": executability is lost.
13) Rollout roadmap (4-6 weeks)
1. Week 1: approve the SOP template, linter, and catalog; pick the top 10 scenarios.
2. Week 2: write SOPs for releases/rollbacks/provider switches/backups; run tabletop pilots.
3. Week 3: connect the ChatOps bot and telemetry annotations; link alerts to SOPs.
4. Week 4: set a quarterly review schedule; introduce the Freshness/Success Rate metrics.
5. Weeks 5-6: cover 90% of critical operations; DR/Security SOPs; automate evidence collection.
14) Summary
SOPs make operations predictable and verifiable: uniform quality gates, detailed steps, clear roles, and reversibility. Combined with playbooks, policies, SLOs, and automation, they turn operations into a reliable production line: fast response, minimal risk, and clear accountability.