Circuit Breaker and retries
1) Why is this needed?
Networks are unreliable: latency fluctuates, nodes go down, limits are hit. Retries rescue you from short-lived failures, while a Circuit Breaker protects the system from cascading failures and a self-inflicted «DDoS». Combined with correct timeouts and limits, this preserves the SLO and stabilizes both tail latency and the cost of the «nines».
2) Basic principles
Timeouts first, then retries, then the Circuit Breaker.
Retry only idempotent operations (GET, safe POST/PUT with an idempotency key).
Allocate a retry budget: ≤ 10-15% of the route's baseline RPS.
Localize failures: bulkhead (separate pools/quotas) + rate limit.
Under degradation: fail fast, then graceful degradation/stubs.
3) Retry semantics
When to retry
Retry: transient errors such as timeouts, 5xx, network failures, and 429 (after waiting out `Retry-After`).
Do not retry: explicit business errors (4xx other than 429) and non-idempotent side effects (a payment without an idempotency key).
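The retry rules above can be sketched as a small decision function. This is a minimal illustration, not a complete policy: the names `should_retry` and `retry_after_seconds` are hypothetical, treating only GET and HEAD as repeatable without an idempotency key is an assumption, and only the delta-seconds form of `Retry-After` is parsed.

```python
from typing import Mapping, Optional

# Assumption: methods that may be repeated without an idempotency key.
SAFE_METHODS = {"GET", "HEAD"}


def should_retry(method: str, status: Optional[int],
                 has_idempotency_key: bool = False) -> bool:
    """Decide whether a failed call may be retried.

    status is None when no response arrived (timeout, connection reset).
    """
    repeatable = method.upper() in SAFE_METHODS or has_idempotency_key
    if not repeatable:
        return False              # non-idempotent side effect: never repeat
    if status is None:
        return True               # network error or timeout
    if status == 429:
        return True               # throttled: retry after Retry-After
    return 500 <= status <= 599   # other 4xx are business errors


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Return the Retry-After delay in seconds, or None if absent or not numeric."""
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("retry-after")
    if value is None:
        return None
    try:
        return max(0.0, float(value))
    except ValueError:
        return None               # HTTP-date form is not handled in this sketch
```

A caller would check `should_retry` first and, for a 429, sleep for at least `retry_after_seconds(...)` before the next attempt.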
Strategies
Exponential backoff + jitter (full or equal): spreads out retry storms.
Max attempts: 1-2 (rarely 3); more usually does harm.
Budget: a global retries-per-second counter per service plus per-request «retry tokens».
Hedging (rarely): a parallel duplicate of the request once a latency quantile (p95) has elapsed; only for strictly idempotent reads.
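```python
import random
import time

def call_with_retries(call, max_attempts, cap_ms, base_ms=100):
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Transient:  # Transient: the caller's own transient-error type
            if attempt == max_attempts:
                raise
            sleep_ms = min(cap_ms, base_ms * 2 ** (attempt - 1))  # capped exponential
            time.sleep(random.uniform(0, sleep_ms) / 1000)  # full jitter
```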
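The retry loop above uses full jitter. For comparison, here is a hedged sketch of both variants named in the strategy list. The function names and the 800 ms cap are assumptions chosen for illustration; the 100 ms base is taken from the loop.

```python
import random


def backoff_ceiling_ms(attempt: int, base_ms: int = 100, cap_ms: int = 800) -> int:
    """Upper bound of the delay before the next try (attempt starts at 1)."""
    return min(cap_ms, base_ms * 2 ** (attempt - 1))


def full_jitter_ms(attempt: int, base_ms: int = 100, cap_ms: int = 800) -> float:
    """Full jitter: anywhere between 0 and the ceiling."""
    return random.uniform(0, backoff_ceiling_ms(attempt, base_ms, cap_ms))


def equal_jitter_ms(attempt: int, base_ms: int = 100, cap_ms: int = 800) -> float:
    """Equal jitter: half of the ceiling is fixed, the other half is random."""
    half = backoff_ceiling_ms(attempt, base_ms, cap_ms) / 2
    return half + random.uniform(0, half)
```

Full jitter spreads clients most widely but can produce near-zero delays; equal jitter guarantees a minimum wait of half the ceiling at the cost of a narrower spread.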
4) Timeouts and «fail fast»
Client timeout < upstream timeout, so that «zombie» requests do not pile up.
Separate them: connect timeout, read timeout, overall deadline.
Tail-aware timeouts: target p95/p99 plus a small margin.
Use a shared deadline field (for example, the gRPC `deadline`) and propagate it down the call chain.
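Deadline propagation can be sketched as follows. This is a minimal illustration under stated assumptions: the `Deadline` class, `call_downstream`, and the 0.6 s per-try cap are hypothetical names and values, and `send` stands in for whatever client the service actually uses.

```python
import time


class Deadline:
    """Absolute deadline on a monotonic clock, shared by every hop of one request."""

    def __init__(self, budget_s: float, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_s

    def remaining(self) -> float:
        """Seconds left before the overall deadline, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def expired(self) -> bool:
        return self.remaining() <= 0.0

    def per_try_timeout(self, cap_s: float) -> float:
        """A single try never gets more time than the whole request has left."""
        return min(cap_s, self.remaining())


def call_downstream(deadline: Deadline, send, per_try_cap_s: float = 0.6):
    """Fail fast if the budget is spent, otherwise pass the shrunken timeout down."""
    if deadline.expired():
        raise TimeoutError("deadline exceeded before the call was sent")
    return send(timeout=deadline.per_try_timeout(per_try_cap_s))
```

The point of the design is that each hop computes its timeout from the remaining budget rather than from a fixed local constant, so work stops once the original caller has given up.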
5) Circuit Breaker: how it works
States:
- Closed: passes traffic and counts errors/latency.
- Open: fails fast (or returns a fallback response).
- Half-Open: lets probe requests through; if they succeed, the breaker closes.
Trip conditions:
- Errors/timeouts exceed a share of X% over a window of N requests/seconds, or p99 rises above a threshold.
- Use a relevant sliding statistic and a minimum volume (for example, ≥ 50 requests).
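The state machine above can be sketched in a few dozen lines. This is a simplified, single-threaded illustration rather than a production breaker: the class and parameter names are hypothetical, the defaults echo the values in the Resilience4j sample later in this document, only the error-rate trigger is implemented (no p99 trigger), and the number of concurrent probes in Half-Open is not limited.

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised instead of calling the upstream while the breaker is open."""


class CircuitBreaker:
    def __init__(self, window=100, min_calls=50, failure_rate=0.5,
                 open_seconds=30.0, half_open_probes=5, clock=time.monotonic):
        self.results = deque(maxlen=window)   # True = failure, count-based window
        self.min_calls = min_calls
        self.failure_rate = failure_rate
        self.open_seconds = open_seconds
        self.half_open_probes = half_open_probes
        self.clock = clock
        self.state = "closed"
        self.opened_at = 0.0
        self.probe_successes = 0

    def call(self, fn):
        if self.state == "open":
            if self.clock() - self.opened_at < self.open_seconds:
                raise CircuitOpenError("failing fast")
            self.state = "half_open"          # wait elapsed: allow probes
            self.probe_successes = 0
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        if self.state == "half_open":
            self._trip()                      # a failed probe reopens at once
            return
        self.results.append(True)
        enough = len(self.results) >= self.min_calls
        if enough and sum(self.results) / len(self.results) >= self.failure_rate:
            self._trip()

    def _on_success(self):
        if self.state == "half_open":
            self.probe_successes += 1
            if self.probe_successes >= self.half_open_probes:
                self.state = "closed"
                self.results.clear()
            return
        self.results.append(False)

    def _trip(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.results.clear()
```

The `min_calls` check is what keeps a single early failure from opening the breaker on a route with little traffic.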
6) Bulkhead, quotas, and «divide and conquer»
Separate connection pools per upstream and per feature.
Quotas on in-flight requests; anything over the quota is rejected fast.
Under shortage: degrade low-priority features (feature flags).
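An in-flight quota with fast rejection can be sketched with a non-blocking semaphore. The `Bulkhead` class and `BulkheadFull` exception are hypothetical names for illustration; real limits would come from load tests, not from this example.

```python
import threading


class BulkheadFull(Exception):
    """Raised when the in-flight quota for a pool is exhausted."""


class Bulkhead:
    def __init__(self, name: str, max_in_flight: int):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def run(self, fn):
        # Never queue behind a saturated upstream: reject immediately instead.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull(f"{self.name}: in-flight quota exhausted")
        try:
            return fn()
        finally:
            self._slots.release()
```

Creating one `Bulkhead` per upstream (or per feature) means that a saturated payments pool, for instance, cannot consume the slots reserved for other calls.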
7) Edge integration (Envoy/Istio/Nginx)
Envoy (retry + outlier + CB, sketch):
```yaml
routes:
- match: { prefix: "/api" }
  route:
    cluster: upstream_api
    timeout: 2s
    retry_policy:
      retry_on: "connect-failure,reset,retriable-4xx,5xx"
      num_retries: 2
      per_try_timeout: 600ms
      retry_back_off: { base_interval: 100ms, max_interval: 800ms }
    hedge_policy:
      hedge_on_per_try_timeout: true
      initial_requests: 1
      additional_request_chance: { numerator: 5, denominator: HUNDRED } # 5%
clusters:
- name: upstream_api
  circuit_breakers:
    thresholds:
    - priority: DEFAULT
      max_connections: 500
      max_requests: 1000
      max_retries: 200
  outlier_detection:
    consecutive_5xx: 5
    interval: 5s
    base_ejection_time: 30s
    max_ejection_percent: 50
```
Istio (VirtualService timeout/retry, brief example):
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
spec:
  hosts: ["payments"]
  http:
  - route: [{ destination: { host: payments } }]
    timeout: 2s
    retries:
      attempts: 2
      perTryTimeout: 600ms
      retryOn: "5xx,connect-failure,refused-stream,reset"
```
Nginx Ingress (annotations):
```yaml
nginx.ingress.kubernetes.io/proxy-connect-timeout: "2"
nginx.ingress.kubernetes.io/proxy-read-timeout: "2"
nginx.ingress.kubernetes.io/proxy-next-upstream: "error timeout http_502 http_503 http_504"
nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "2"
```
8) Libraries and code (per-stack snippets)
Java (Resilience4j):
```java
var cb = CircuitBreaker.ofDefaults("psp");
var retry = Retry.of("psp-retry",
    RetryConfig.custom()
        .maxAttempts(2)
        // intervalFunction alone sets the delay; recent Resilience4j versions
        // reject a config that also sets waitDuration
        .intervalFunction(IntervalFunction.ofExponentialRandomBackoff(100, 2.0, 0.5)) // jitter
        .retryExceptions(SocketTimeoutException.class, IOException.class)
        .build());
Supplier<Response> decorated =
    CircuitBreaker.decorateSupplier(cb,
        Retry.decorateSupplier(retry, () -> client.call()));
return Try.ofSupplier(decorated)
    .recover(BusinessException.class, fallback())
    .get();
```
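Go (context deadline + backoff):
```go
// Fragment of a handler body; call and fastFail are the caller's own functions.
const attempts = 2
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second) // overall deadline
defer cancel()
var lastErr error
for i := 0; i < attempts; i++ {
	reqCtx, stop := context.WithTimeout(ctx, 600*time.Millisecond) // per-try timeout
	lastErr = call(reqCtx)
	stop()
	if lastErr == nil || i == attempts-1 {
		break // success, or no attempts left: do not sleep after the last try
	}
	sleep := time.Duration(rand.Intn(1<<uint(7+i))) * time.Millisecond // full jitter
	time.Sleep(min(sleep, 800*time.Millisecond))                       // built-in min needs Go 1.21+
}
if lastErr != nil {
	return fastFail()
}
```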
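Node.js (got + p-retry):
```js
import got from 'got';
import pRetry from 'p-retry';

// got's built-in retries are disabled so that p-retry is the only retry layer.
await pRetry(() => got(url, { timeout: { connect: 500, request: 2000 }, retry: { limit: 0 } }), {
  retries: 2,
  factor: 2,
  randomize: true,
  minTimeout: 100,
  maxTimeout: 800,
  // isBusiness is the caller's own predicate; throwing here stops further retries
  onFailedAttempt: e => { if (isBusiness(e)) throw e; }
});
```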
9) Retries and the SLO budget
Introduce retry tokens: each retry spends a token, and the pool is limited.
Tie it to the error budget: when the burn rate exceeds the threshold, disable retries, open the CB more readily, and enable degradation.
Canary releases: reduce attempts and tokens on canaries.
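One way to implement retry tokens is a budget that earns credit from ordinary requests and spends it on retries. The sketch below is an assumption about how such a pool could look, not a prescribed design: `RetryBudget` and its parameters are hypothetical, integer units are used to avoid floating-point drift, and the class is not thread-safe.

```python
class RetryBudget:
    """Each first attempt earns `percent` units; each retry costs 100 units.

    With percent=10, retries are held to roughly 10% of request volume.
    """

    RETRY_COST = 100

    def __init__(self, percent: int = 10, max_retries_banked: int = 10):
        self.percent = percent
        self.cap = max_retries_banked * self.RETRY_COST  # limits bursts after quiet periods
        self.balance = 0

    def record_request(self) -> None:
        """Call once per incoming request (first attempt only)."""
        self.balance = min(self.cap, self.balance + self.percent)

    def try_spend(self) -> bool:
        """Return True and spend a token if a retry is allowed right now."""
        if self.balance >= self.RETRY_COST:
            self.balance -= self.RETRY_COST
            return True
        return False
```

A retry loop would call `try_spend()` before each additional attempt and give up immediately when it returns False; a canary profile could simply use a smaller `percent`.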
10) Hedging (use with care)
Launch an additional request once the p95 delay has elapsed, and cancel the loser.
Only for reads and «safe» idempotent operations; cap the share (≤ 1-5%).
Monitor the extra load.
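A hedged read can be sketched with `asyncio`. This is an illustration under assumptions: `hedged`, `make_call`, and `hedge_chance` are hypothetical names, `make_call` must return a fresh coroutine for an idempotent read, and the first call to finish wins even if it finished with an error.

```python
import asyncio
import random


async def hedged(make_call, hedge_after_s: float,
                 hedge_chance: float = 0.05, rng=random.random):
    """Run make_call(); if it is still pending after hedge_after_s (e.g. the
    observed p95), optionally start one duplicate and return the first result."""
    primary = asyncio.ensure_future(make_call())
    done, _ = await asyncio.wait({primary}, timeout=hedge_after_s)
    if done:
        return primary.result()            # fast path: no hedge needed
    if rng() >= hedge_chance:
        return await primary               # share cap: most slow calls are not hedged
    backup = asyncio.ensure_future(make_call())
    done, pending = await asyncio.wait({primary, backup},
                                       return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()                      # cancel the loser
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()
```

The `hedge_chance` draw is a simple way to keep hedges to a small share of slow requests; a production setup would more likely enforce that share with a budget and export it as a metric.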
11) Observability
RED metrics per route: Rate, Error, Duration (p50/p95/p99).
CB metrics: state (open/half-open), trip frequency, passed/rejected requests.
Retries: attempts/request, retry rate, tokens burned.
Edge: outlier ejection, ejection rate.
Traces: record the `retry_attempt`, `cb_state`, and `hedged=true` annotations, and propagate `trace_id`.
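The retry metrics and trace annotations listed above could be derived as in this sketch. `RetryStats` and `span_attributes` are hypothetical helpers; defining the retry rate as extra attempts per logical request is an assumption, and a real service would export these through its metrics and tracing libraries.

```python
from dataclasses import dataclass


@dataclass
class RetryStats:
    requests: int = 0   # logical requests
    attempts: int = 0   # physical tries, including the first one

    def record(self, attempts_used: int) -> None:
        self.requests += 1
        self.attempts += attempts_used

    @property
    def attempts_per_request(self) -> float:
        return self.attempts / self.requests if self.requests else 0.0

    @property
    def retry_rate(self) -> float:
        """Extra attempts per logical request (assumed definition)."""
        if not self.requests:
            return 0.0
        return (self.attempts - self.requests) / self.requests


def span_attributes(retry_attempt: int, cb_state: str, hedged: bool) -> dict:
    """Annotations to attach to the trace span of a single attempt."""
    return {"retry_attempt": retry_attempt, "cb_state": cb_state, "hedged": hedged}
```

Comparing `retry_rate` against the retry budget gives a direct alert condition.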
12) Architecture integration
Bulkhead + CB for every critical flow.
Queues/async: for long operations, instead of excessively long timeouts.
Cache/stubs: for non-critical features in «fail-open» mode.
Autoscaling: does not compensate for bad retries; stop the «storm» first.
13) Anti-patterns
Retries without timeouts → «hung» connections and pool exhaustion.
Retrying non-idempotent operations (double charges).
Unbounded exponential growth with no cap and no jitter.
A single CB for all tasks → the failure spreads across the whole product.
Ignoring 429/`Retry-After`.
A client timeout longer than the upstream's (or none at all).
«Curing» business errors with retries.
14) Rollout checklist (0-30 days)
Days 0-7
Identify the routes and whether each is idempotent.
Set timeouts (connect/read/overall), enable minimal retries (×1) and a template CB.
Allocate pools/quotas for the main upstreams.
Days 8-20
Introduce jitter, a global retry budget, and retry-rate alerts.
Configure outlier ejection at the edge; fail fast for low-priority features.
RED + CB/Retry dashboards, tagged traces.
Days 21-30
Canary profiles (fewer attempts); a game day with the scenario «upstream is slow/flapping».
Document the policy: who retries what, the limits, the exceptions.
Review p95/p99 and the timeouts based on data, not by eye.
15) Maturity metrics
100% of routes have timeouts and a documented retry/CB policy.
The retry rate stays within budget (≤ 10-15%), with no spikes during incidents.
The CB trips before the whole pool goes down; there are no cascading failures.
Traces show retry/hedging attempts; p99 is stable at peak load.
Canary releases use a «cautious» retry profile.
16) Short configuration samples
Resilience4j YAML (Spring Boot, sketch):
```yaml
resilience4j:
  circuitbreaker:
    instances:
      psp:
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 100
        minimumNumberOfCalls: 50
        failureRateThreshold: 50
        waitDurationInOpenState: 30s
        permittedNumberOfCallsInHalfOpenState: 5
  retry:
    instances:
      psp:
        maxAttempts: 2
        waitDuration: 200ms
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2.0
        retryExceptions:
        - java.net.SocketTimeoutException
        - java.io.IOException
```
Envoy rate limit (sketch fragment):
```yaml
rate_limits:
- actions:
  - generic_key: { descriptor_value: "api.payments" }
```
17) Summary
Resilience is a discipline: timeout → retries (with jitter and a budget) → Circuit Breaker + bulkhead/quotas and fail-fast. Tune the edge (outlier ejection), put up RED/CB/Retry dashboards, pin down the idempotency policy, and do not forget the business SLIs. Then brief failures go unnoticed, and real incidents do not turn into cascading collapses.