Stream vs Batch Analytics
1) Summary
Stream: continuous processing of events within seconds of their occurrence: anti-fraud/AML, RG triggers, SLA alerts, operational dashboards.
Batch: periodic recomputation with full reproducibility: regulatory reporting (GGR/NGR), financial reconciliations, ML datasets.
Targets: Stream p95 end-to-end latency of 0.5-5 s; Batch ready at D+1 by 06:00 (local time).
2) Selection matrix (TL;DR)
80/20 rule: anything that does not require a reaction in under 5 minutes goes to Batch; everything else goes to Stream, backed by nightly Batch validation.
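The rule above can be written down as a tiny routing function. This is only a sketch: the function name `choose_mode` and the idea of expressing the requirement as a `timedelta` are assumptions made for illustration, and the 5-minute cut-off is the one stated in the rule.

```python
from datetime import timedelta
from typing import Optional

# Cut-off from the 80/20 rule: reactions needed in under 5 minutes go to Stream.
REACTION_CUTOFF = timedelta(minutes=5)


def choose_mode(required_reaction: Optional[timedelta]) -> str:
    """Return "stream" when a reaction is needed in under 5 minutes, else "batch".

    None means the use case has no reaction-time requirement at all.
    """
    if required_reaction is not None and required_reaction < REACTION_CUTOFF:
        return "stream"
    return "batch"


if __name__ == "__main__":
    print(choose_mode(timedelta(seconds=3)))   # e.g. an AML alert
    print(choose_mode(timedelta(hours=24)))    # e.g. a daily regulatory report
```

Stream use cases chosen this way still get the nightly Batch validation that the rule calls for.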
3) Architectures
3.1 Lambda
Stream for the online path + Batch for consolidation. Pro: flexibility. Con: the logic exists twice.
3.2 Kappa
Everything is a stream; Batch = a "replay" of the log. Pro: a single codebase. Con: the complexity and cost of replays.
3.3 Lakehouse-Hybrid (recommended)
Stream → operational OLAP mart (minutes) and Bronze/Silver; Batch rebuilds Gold (D+1) and publishes the reports.
4) Data and time
Stream
Windows: tumbling/hopping/session.
Watermarks: 2-5 minutes; late data is flagged and still emitted.
Stateful: CEP, dedup, TTL.
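To make the interaction between windows and watermarks concrete, here is a minimal pure-Python sketch. It is not how Flink or Beam implement it; the constants, the `(event_time, key, amount)` tuple layout and the function names are assumptions for illustration. The watermark trails the highest event time seen so far by a fixed delay, and an event whose window already closed under that watermark is flagged as late instead of being dropped.

```python
from collections import defaultdict

WINDOW_S = 600           # 10-minute tumbling windows (assumed size)
WATERMARK_DELAY_S = 120  # 2 minutes, the low end of the 2-5 minute range


def window_start(ts: int) -> int:
    """Start of the tumbling window that contains event time ts (seconds)."""
    return ts - ts % WINDOW_S


def process(events):
    """Aggregate (event_time_s, key, amount) tuples given in arrival order.

    Returns (on_time_sums, late_events). An event is late when the end of its
    window is already at or behind the current watermark.
    """
    sums = defaultdict(float)
    late = []
    max_ts = 0
    for ts, key, amount in events:
        max_ts = max(max_ts, ts)
        watermark = max_ts - WATERMARK_DELAY_S
        if window_start(ts) + WINDOW_S <= watermark:
            late.append((ts, key, amount))  # flagged, still emitted downstream
        else:
            sums[(key, window_start(ts))] += amount
    return dict(sums), late


if __name__ == "__main__":
    arrivals = [(10, "u1", 5.0), (590, "u1", 7.0), (800, "u1", 1.0), (20, "u1", 3.0)]
    on_time, late_events = process(arrivals)
    print(on_time)
    print(late_events)
```

In the sample run the last event (event time 20) arrives after the watermark has passed the end of its window, so it lands in the late list rather than in the window sum.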
Batch
Increments/CDC: `updated_at`, log-based replication.
SCD I/II/III: attribute history.
Snapshots: daily/monthly layers for "as-of" queries.
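As an illustration of how SCD Type II keeps attribute history and supports "as-of" lookups, the following sketch models a dimension as an in-memory list. The `DimRow` class, the function names and the `tier` attribute are hypothetical; a real implementation would be a MERGE against a Silver table.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    key: str
    attrs: dict
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(history: list, key: str, attrs: dict, as_of: date) -> None:
    """Close the current version and append a new one if the attributes changed."""
    current = next((r for r in history if r.key == key and r.valid_to is None), None)
    if current is not None:
        if current.attrs == attrs:
            return  # nothing changed, keep the current version open
        current.valid_to = as_of
    history.append(DimRow(key, dict(attrs), as_of))


def lookup_as_of(history: list, key: str, on: date) -> Optional[dict]:
    """Return the attributes that were valid for key on the given date."""
    for row in history:
        if row.key == key and row.valid_from <= on and (row.valid_to is None or on < row.valid_to):
            return row.attrs
    return None


if __name__ == "__main__":
    dim = []
    apply_scd2(dim, "u1", {"tier": "bronze"}, date(2024, 1, 1))
    apply_scd2(dim, "u1", {"tier": "gold"}, date(2024, 3, 1))
    print(lookup_as_of(dim, "u1", date(2024, 2, 15)))
```

Validity intervals are half-open (`valid_from` inclusive, `valid_to` exclusive), which is one common convention and is assumed here.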
5) Usage patterns in iGaming
AML/Anti-fraud: Stream (velocity/structuring) + Batch reconciliations and cases.
Responsible Gaming: Stream enforcement of limits/self-exclusions; Batch reporting registers.
Operations/SRE: Stream SLA alerts; Batch post-analysis of incidents and trends.
Product/marketing: Stream personalization/missions; Batch cohorts/LTV.
Finance/reporting: Batch (Gold D+1, WORM packages); Stream for operational dashboards.
6) DQ, reproducibility, replay
Stream DQ: schema validation, dedup on `(event_id, source)`, window completeness, late-ratio, dup-rate; critical failures → DLQ.
Batch DQ: uniqueness/FK/range/temporal checks, reconciliation with OLTP/providers; critical failures → fail the job + report.
Reproducibility:
- Stream: replay of topics by range + deterministic transformations.
- Batch: time travel / logic versioning (`logic_version`) + Gold snapshots.
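The Stream DQ steps listed above (validation, dedup on `(event_id, source)`, dup-rate, DLQ routing) could look roughly like this on a micro-batch of events. The function name, the dict-based event shape and the list of required fields are assumptions for the sketch, and the unbounded `seen` set stands in for state that would need a TTL in a real job.

```python
def run_stream_dq(events, required=("event_id", "source", "event_time")):
    """Split a list of event dicts into clean events and DLQ entries.

    Returns (clean, dlq, dup_rate). Events missing a required field go to the
    DLQ; repeated (event_id, source) pairs are dropped and counted.
    """
    seen = set()  # in a long-running job this state needs a TTL to stay bounded
    clean, dlq = [], []
    duplicates = 0
    for event in events:
        if any(event.get(field) is None for field in required):
            dlq.append(event)
            continue
        key = (event["event_id"], event["source"])
        if key in seen:
            duplicates += 1
            continue
        seen.add(key)
        clean.append(event)
    total = len(events)
    dup_rate = duplicates / total if total else 0.0
    return clean, dlq, dup_rate


if __name__ == "__main__":
    batch = [
        {"event_id": "e1", "source": "psp", "event_time": 1},
        {"event_id": "e1", "source": "psp", "event_time": 1},   # duplicate
        {"event_id": "e1", "source": "bank", "event_time": 2},  # same id, other source
        {"event_id": "e2", "source": "psp"},                    # missing event_time
    ]
    clean, dlq, dup_rate = run_stream_dq(batch)
    print(len(clean), len(dlq), dup_rate)
```

Because the function is deterministic for a given input order, replaying the same topic range through it yields the same clean set, which is the property the Stream reproducibility bullet relies on.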
7) Privacy and residency
Stream: pseudonymization, online masking, regional pipelines (EEA/UK/BR), timeouts on external PII lookups.
Batch: isolated PII mappings, RLS/CLS, DSAR/RTBF, Legal Hold, WORM archives.
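One common way to pseudonymize identifiers in the streaming path is a keyed hash, so the same user always maps to the same token while the raw ID never reaches the analytical layers. The sketch below uses HMAC-SHA256 from the Python standard library; the function names, the per-region secret and the masking format are assumptions, not something the text prescribes.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret: bytes) -> str:
    """Map a raw user ID to a stable pseudonym with HMAC-SHA256.

    The secret is assumed to be managed per region, outside the analytics
    platform; rotating or destroying it breaks the link to the raw ID.
    """
    return hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Online masking for display: keep the first character and the domain."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


if __name__ == "__main__":
    eea_secret = b"example-only-secret"  # hypothetical; never hard-code real keys
    print(pseudonymize("user-42", eea_secret))
    print(mask_email("alice@example.com"))
```

Keeping the pseudonym-to-ID mapping (or the key) in an isolated store is what makes DSAR/RTBF requests tractable on the Batch side.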
8) Cost engineering
Stream: avoid "hot" keys (salting), use async lookups, bound state with TTLs, pre-aggregate.
Batch: partitioning/clustering, small-file compaction, materialization of stable aggregates, quotas/run windows.
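Salting, mentioned above for hot keys, splits one heavy key into several sub-keys so the first aggregation stage spreads across workers, and a second stage merges the partial results. A minimal sketch, assuming a count aggregation and a hypothetical bucket count of 8:

```python
import random
from collections import Counter

SALT_BUCKETS = 8  # assumed fan-out per key; tune to the observed skew


def salted_key(key: str, rng: random.Random) -> str:
    """Append a random bucket suffix so one hot key spreads over several sub-keys."""
    return f"{key}#{rng.randrange(SALT_BUCKETS)}"


def two_stage_count(keys, seed: int = 0):
    """Count keys in two stages: per salted sub-key, then merged per original key."""
    rng = random.Random(seed)
    partial = Counter(salted_key(k, rng) for k in keys)  # stage 1: parallelizable
    final = Counter()
    for salted, count in partial.items():
        final[salted.rsplit("#", 1)[0]] += count         # stage 2: strip the salt
    return partial, final


if __name__ == "__main__":
    stream_keys = ["hot"] * 1000 + ["cold"] * 3
    partial, final = two_stage_count(stream_keys)
    print(len(partial), dict(final))
```

The trade-off is an extra shuffle for the second stage, so salting pays off only for keys that are actually skewed.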
9) Examples
9.1 Stream - Flink SQL (10-minute deposit velocity)
sql
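-- Deposit velocity per user over 10-minute tumbling windows.
-- Assumes event_time is the table's event-time attribute with a watermark.
SELECT user_id,
       TUMBLE_START(event_time, INTERVAL '10' MINUTE) AS win_start,
       COUNT(*) AS deposits_10m,
       SUM(amount_base) AS sum_10m
FROM stream.payments
GROUP BY user_id, TUMBLE(event_time, INTERVAL '10' MINUTE);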
9.2 Stream - CEP (AML pseudocode)
python
# Pseudocode: the helper functions and constants are placeholders, not a real API.
if count_deposits(10MIN) >= 3 and sum_deposits(10MIN) > THRESH \
        and all(d.amount < REPORTING_LIMIT for d in window):
    emit_alert("AML_STRUCTURING", user_id, snapshot())
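A runnable version of the same rule, as a sketch: it keeps a per-user sliding window of deposits and raises an alert when there are at least three deposits in 10 minutes, their sum exceeds a threshold, and each one stays below the reporting limit. The numeric values of `THRESH` and `REPORTING_LIMIT`, the `Deposit` class and the alert shape are assumptions; events are assumed to arrive in event-time order per user.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

WINDOW_S = 600            # 10 minutes
MIN_COUNT = 3
THRESH = 2500.0           # hypothetical total-amount threshold
REPORTING_LIMIT = 1000.0  # hypothetical per-deposit reporting limit


@dataclass
class Deposit:
    user_id: str
    ts: int        # event time in seconds
    amount: float


class StructuringDetector:
    def __init__(self) -> None:
        self.windows = {}  # user_id -> deque of deposits inside the window

    def on_deposit(self, d: Deposit) -> Optional[dict]:
        """Update the user's window and return an alert dict if the rule fires."""
        window = self.windows.setdefault(d.user_id, deque())
        window.append(d)
        while window and window[0].ts <= d.ts - WINDOW_S:
            window.popleft()  # evict deposits older than the window
        total = sum(x.amount for x in window)
        if (len(window) >= MIN_COUNT and total > THRESH
                and all(x.amount < REPORTING_LIMIT for x in window)):
            return {"rule": "AML_STRUCTURING", "user_id": d.user_id,
                    "deposits": len(window), "total": total}
        return None


if __name__ == "__main__":
    detector = StructuringDetector()
    for ts in (0, 100, 200):
        alert = detector.on_deposit(Deposit("u1", ts, 900.0))
    print(alert)
```

Three deposits of 900 within 200 seconds stay under the per-deposit limit but sum to 2700, so the third call returns an alert.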
9.3 Batch - MERGE (Silver increment)
sql
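-- Upsert the staged delta into Silver, keyed by transaction_id.
-- Assumes an engine with Delta Lake-style shorthand, where * means all columns.
MERGE INTO silver.payments s
USING stage.delta_payments d
ON s.transaction_id = d.transaction_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;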
9.4 Batch - Gold GGR (D+1)
sql
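-- Pre-aggregate each side per user/game/day so the join cannot multiply rows.
-- Assumes market is constant per user and day; otherwise payouts double-count.
CREATE OR REPLACE VIEW gold.ggr_daily AS
WITH bets AS (
  SELECT DATE(event_time) AS event_date, user_pseudo_id, game_id, market,
         SUM(stake_base) AS stake_base
  FROM silver.fact_bets
  GROUP BY 1, 2, 3, 4
), payouts AS (
  SELECT DATE(event_time) AS event_date, user_pseudo_id, game_id,
         SUM(amount_base) AS amount_base
  FROM silver.fact_payouts
  GROUP BY 1, 2, 3
)
SELECT
  b.event_date,
  b.market, g.provider_id,
  SUM(b.stake_base) AS stakes_eur,
  COALESCE(SUM(p.amount_base), 0) AS payouts_eur,
  SUM(b.stake_base) - COALESCE(SUM(p.amount_base), 0) AS ggr_eur
FROM bets b
LEFT JOIN payouts p
  ON p.user_pseudo_id = b.user_pseudo_id
 AND p.game_id = b.game_id
 AND p.event_date = b.event_date
JOIN dim.games g ON g.game_id = b.game_id
GROUP BY 1, 2, 3;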
10) Metrics and SLOs
Stream
p95 ingest→alert ≤ 2-5 s
window completeness ≥ 99.5%
schema errors ≤ 0.1%
late-ratio ≤ 1%
availability ≥ 99.9%
Batch
Gold.daily ready by 06:00 (local time).
completeness ≥ 99.5%
validity ≥ 99.9%
MTTR for DQ incidents ≤ 24-48 hours
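The Stream targets above can be checked mechanically from a handful of counters. The sketch below computes p95 with the nearest-rank method and compares it, together with completeness and late-ratio, against the stated limits (using 5 s, the upper end of the 2-5 s range). The function names and the shape of the inputs are assumptions.

```python
def p95(values):
    """95th percentile by the nearest-rank method (integer arithmetic only)."""
    if not values:
        raise ValueError("p95 needs at least one value")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def check_stream_slo(latencies_s, expected, received, late):
    """Compare observed Stream metrics with the SLO targets from this section."""
    completeness = received / expected if expected else 1.0
    late_ratio = late / received if received else 0.0
    return {
        "p95_latency_ok": p95(latencies_s) <= 5.0,  # upper bound of 2-5 s
        "completeness_ok": completeness >= 0.995,
        "late_ratio_ok": late_ratio <= 0.01,
    }


if __name__ == "__main__":
    sample_latencies = [1.0] * 95 + [10.0] * 5
    print(check_stream_slo(sample_latencies, expected=1000, received=996, late=5))
```

A check like this fits naturally into the alerting path: any `False` value would be the trigger for a lag/late/dup alert.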
11) Testing and releases
Contracts/schemas: consumer-driven tests; backward-compatibility CI.
Stream: canary rules, dark launches, replay simulator.
Batch: dry runs on samples, metric comparison, checksums (reconciliation).
12) Anti-patterns
Duplicated logic: separate Stream and Batch calculations whose formulas are never aligned.
Synchronous external API calls on the Stream hot path without a cache or timeouts.
Full reload "just in case" instead of increments.
No watermark/late-data policy.
PII in analytical layers; no CLS/RLS.
Gold marts that are modified retroactively.
13) Recommended hybrid (playbook)
1. Stream loop: ingest → bus → Flink/Beam (watermarks, dedup, CEP) →
OLAP (ClickHouse/Pinot) for 1-5 minute dashboards + Bronze/Silver (append).
2. Batch loop: increments/CDC → Silver normalization/SCD → Gold daily marts/reports (WORM).
3. Alignment: a single semantic layer for metrics; nightly Stream↔Batch reconciliations; discrepancies > threshold → tickets.
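The alignment step in the playbook above (nightly reconciliation, discrepancies above a threshold become tickets) can be sketched as a comparison of two metric dictionaries. The function name, the 0.5% relative threshold and the ticket shape are assumptions for illustration; Batch is treated as the reference value.

```python
def reconcile(stream_totals: dict, batch_totals: dict, rel_threshold: float = 0.005):
    """Compare Stream and Batch metric totals and return a ticket per discrepancy.

    Keys missing on one side count as 0.0 there. The relative difference is
    measured against the Batch value, which is treated as the reference.
    """
    tickets = []
    for key in sorted(set(stream_totals) | set(batch_totals)):
        stream_value = stream_totals.get(key, 0.0)
        batch_value = batch_totals.get(key, 0.0)
        denominator = max(abs(batch_value), 1e-9)  # avoid division by zero
        rel_diff = abs(stream_value - batch_value) / denominator
        if rel_diff > rel_threshold:
            tickets.append({"key": key, "stream": stream_value,
                            "batch": batch_value, "rel_diff": rel_diff})
    return tickets


if __name__ == "__main__":
    stream = {"DE": 100.0, "UK": 200.0}
    batch = {"DE": 100.2, "UK": 210.0, "BR": 5.0}
    for ticket in reconcile(stream, batch):
        print(ticket["key"], round(ticket["rel_diff"], 4))
```

In the sample data "DE" differs by about 0.2% and passes, while "UK" (about 4.8%) and "BR" (missing from Stream) would each open a ticket.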
14) RACI
R (Responsible): Streaming Platform (Stream infra), Data Engineering (Batch models), Domain Analytics (metrics/rules), MLOps (features/Feature Store).
A (Accountable): Head of Data / CDO.
C (Consulted): Compliance/Legal/DPO, Finance (FX/GGR), Risk (RG/AML), SRE (SLO/cost).
I (Informed): BI/Product/Marketing/Operations.
15) Roadmap
MVP (2-4 weeks):
1. Kafka/Redpanda + 2 critical topics (`payments`, `auth`).
2. Flink job: watermark + dedup + 1 CEP rule (AML or RG).
3. OLAP mart with 1-5 minute freshness + lag/late/dup dashboards.
4. Lakehouse Silver (ACID), first Gold.ggr_daily (D+1 by 06:00).
Phase 2 (4-8 weeks):
- Increments/CDC, SCD II, semantic layer for metrics.
- Streaming DQ and nightly Stream↔Batch reconciliation.
- Regionalization (EEA/UK/BR), DSAR/RTBF, Legal Hold.
- Replay simulator, canary/A-B releases of rules/metrics.
- Cost dashboards and quotas; tiered storage; DR drills.
- Auto-generation of mart/metric documentation and lineage.
16) Implementation checklist
- Schemas/contracts are in the registry; backward-compatibility tests are green.
- Stream: watermarks/allowed lateness, dedup, DLQ; OLAP dashboards in production.
- Batch: increments/CDC, SCD II, Gold D+1 and WORM exports.
- Single semantic layer for metrics; nightly Stream↔Batch reconciliations.
- DQ dashboards for Freshness/Completeness/Validity; lag/late/dup alerts.
- RBAC/ABAC, encryption, residency; DSAR/RTBF/Legal Hold.
- Cost under control (cost/GB, cost/query, state size, replays under quota).
17) Conclusion
Stream and Batch are not competitors but two gears of the same machine. Stream delivers the "here and now" reaction; Batch delivers the "morning truth". A hybrid Lakehouse approach, a single metrics layer, and DQ/lineage discipline make it possible to build analytical pipelines that are fast, reproducible, compliant, and well balanced between SLA and cost.