Ma’lumotlar ko’llari va oqimlar agregatsiyasi
1) Maqsadi va qiymati
Data Lake/Lakehouse - uzoq vaqt saqlash va kattalashtirilgan oʻqish uchun tayanch qatlami, bunda:- Mahsulotlar/o’yinlar/to’lovlar oqimi Bronzega «bo’lgani kabi» qo’nadi.
- Silver kelishilgan kalitlar va sifatni taʼminlash orqali normallashtiradi va boyitadi.
- Gold - BI, regulyator, antifrod/RG uchun agregatsiyalangan vitrinalar (shu jumladan real-/near-real-time).
Lakehouse-da oqimlarni agregatsiya qilish: hisobotlarning past kechikishi, oldindan aytib bo’ladigan qiymat, takrorlanuvchanlik va forensikani beradi.
2) Referens arxitektura
1. Ingest/Edge: HTTP/gRPC, OTel, batch endpoints → шина (Kafka/Redpanda).
2. Bronze (append-only): obyekt ombori + ACID-jadvallar (Delta/Iceberg/Hudi), partiyalar by date/market/tenant; boshlang’ich payloadni saqlash.
3. Stream Compute: Flink/Spark/Beam - deraza agregatlari, CEP, dedup, online-lookups.
4. Silver (clean/conform): valyutalar/taymzonlar normallashuvi, FK/ma’lumotnomalar, o’lchovlar uchun SCD.
5. Serving/OLAP: ClickHouse/Pinot/Druid - panellar uchun materiallashtirilgan daqiqali/soniyali agregatlar.
6. Gold (serve): kunduzgi/soatlik vitrinalar, tartibga soluvchi kesmalar, o’zgarmas eksport paketlari (WORM).
7. Boshqaruv konturlari: Schema Registry, DQ-kod, lineage, kataloglar, sirlar/KMS, RBAC/ABAC.
3) Kontraktlar va sxemalar
Schema-first: JSON/Avro/Protobuf; majburiy maydonlar:’event _ time (UTC)’,’event _ id’,’trace _ id’,’user _ pseudo _ id’,’market’,’schema _ version’.
Evolyutsiya: back-compatible → qoʻshish nullable; breaking → ’/v2’+ qoʻshaloq yozuv.
Katalog: domen tavsifi, egasi, yangi SLA, DQ qoidalari, lineage.
4) Oqimlarning ko’lga qo’nishi
Exactly-once pastki qismida: at-least-once nashri + idempotent sink (MERGE/upsert’event _ id’).
Dedup: strimdagi stateful + Silver’dagi o’ziga xoslik.
Fayllar kompaksiyasi: small files → oʻqish va qiymat uchun muntazam OPTIMIZE/VACUUM.
Time-travel: tuzatish, replay va auditni o’z ichiga oladi.
sql
CREATE TABLE bronze. payment_events (
event_id STRING, user_pseudo_id STRING, currency STRING,
amount DECIMAL(18,2), market STRING, event_time TIMESTAMP, payload STRING
)
PARTITIONED BY (days(event_time), market);
5) Oqimlar agregatsiyasi: derazalar va watermarks
Oynalar:- Tumbling - barqaror panellar uchun belgilangan (masalan, 1 min/5 min).
- Hopping - «silliq» metriklar uchun bir-birini qoplaydigan (qadam
- Session - harakatsizlik bo’yicha xulq-atvor uzilishlari.
- Watermarks: late data boshqaruvi (odatda 2-5 daqiqa), qo’shimcha emissiya/tuzatish qoidalari.
sql
SELECT market,
TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS ts_min,
COUNT() AS deposits_1m,
SUM(amount_base) AS sum_1m
FROM silver. payments
GROUP BY market, TUMBLE(event_time, INTERVAL '1' MINUTE);
6) Agregatlarni materiallashtirish
OLAP dvigateli (ClickHouse/Pinot/Druid): dashbordlar va tezkor tahlil uchun daqiqali/soniyali agregatlarni saqlaydi.
Lakehouse Gold: hisobot va taqqoslash uchun sutkalik/soatlik kesmalarni saqlaydi.
sql
CREATE MATERIALIZED VIEW mv_ggr_1m
ENGINE = AggregatingMergeTree()
PARTITION BY toDate(event_time)
ORDER BY (toStartOfMinute(event_time), market, provider_id) AS
SELECT toStartOfMinute(event_time) AS ts_min,
market,
provider_id,
sumState(stake_base) AS s_stake,
sumState(payout_base) AS s_payout
FROM stream. game_events
GROUP BY ts_min, market, provider_id;
Gold - kunduzgi kesim (Lakehouse):
sql
CREATE OR REPLACE VIEW gold. ggr_daily AS
SELECT
DATE(event_time) AS event_date,
market, provider_id,
SUM(stake_base) AS stakes_eur,
SUM(payout_base) AS payouts_eur,
SUM(stake_base) - SUM(payout_base) AS ggr_eur
FROM silver. fact_game_financials
GROUP BY 1,2,3;
7) Silver: normallashtirish va kelishish
Vaqt va valyuta:’event _ time (UTC)’,’amount _ base’,’fx _ rate _ used’,’fx _ source’.
Kalitlar/maʼlumotnomalar:’user _ pseudo _ id’,’game _ id’,’provider _ id’,’market’.
SCD II: o’lchovlarni tarixlashtirish (users/games/providers/RG/KYC).
DQ qoidalari: kalitlarning o’ziga xosligi, ma’lumotnomalar, summa diapazonlari, temporal-validlik.
8) Agregatlar reyestri va "to’g" ri "ta’riflar
Semantic Layer: GGR/NGR yagona formulalari, stavkalar/yutuqlar, konvertatsiya, ARPPU, latency p95.
Metriklarni versionlash:’metric _ version’va’as-of’hisoblash.
Doc-kartochkalar: owner, formula, manbalar, tayyorgarlikning SLA.
9) Exactly-once/idempotentlik va tartib
Shina: at-least-once + partiyalashtirish (mahalliy tartib).
Ishlov berish:’event _ id’(TTL 24-72 soat), SER/tuzatishlar kiritilgan deraza operatorlari.
Sink: tranzaksion kommitlar yoki idempotent upsert/merge.
Outbox/Inbox: OLTP domen voqealarini kafolatlangan holda chop etish.
10) Late data va tuzatishlar
Allowed lateness: tezkor vitrinalar uchun 2-5 min; Gold uchun kunlik qayta yig’ish.
Tuzatishlar: OLAPdagi qo’shimcha emissiyalar va Gold (idempotent) ni qayta tanlash.
Bayroqlar:’late = true’,’correction _ of = <event _ id>’audit uchun.
11) Kuzatuv va DQ
SLI/SLO (taxminlar):- p95 ingest → 1-min vitrin ≤ 2-5 s; Gold daily soat 06:00 gacha tayyor.
- Completeness ≥ 99. 5%; Schema validity ≥ 99. 9%; Trace coverage ≥ 98%.
- Payplaynlar metrikasi: lag/throughput/busy time/state size, late-ratio, dup-rate.
- DQ-dashbordlar: Freshness/Completeness/Validity, yo’qotish hunisi, «issiq» kalitlar xaritasi.
- Lineage: Bronzadan Gold/eksportgacha bo’lgan yo’l; o’zgarishlarda impact-tahlil.
12) Maxfiylik, rezidentlik, xavfsizlik
PII-minimallashtirish: taxalluslashtirish, alohida himoyalangan mapping.
Residency: EEA/UK/BR - alohida kataloglar va shifrlash kalitlari; asossiz kross-mintaqaviy join’onlarni taqiqlash.
Shifrlash: TLS in-transit; KMS/CMK at-rest; regulyatorda + WORM eksport imzolari.
DSAR/RTBF/Legal Hold: selektiv tahrirlash, o’chirishni muzlatish, tinglanadigan kirish.
13) Unumdorlik va qiymat
Partiyalashtirish: sana/bozor/tenant bo’yicha; tez-tez filtrlanadigan atributlar boʻyicha/Z-order klasterlash.
Kompaksiya: small files, muntazam OPTIMIZE/VACUUM.
Materiallashtirish: daqiqa/soniya - OLAP da; sutka/soat - Gold.
Tiered storage: hot/warm/cold, SLA tiklash, chargeback (cost/GB, cost/query).
Oldindan agregatsiya/eskizlar: HyperLogLog/approx-distinct istalgan joyda.
14) Misollar (parchalar)
Flink CEP - depozitlarni tuzish (10 daqiqa):python if count_deposits(window=10MIN) >= 3 \
and sum_deposits(window=10MIN) > THRESH \
and all(d. amount < REPORTING_LIMIT for d in window_events):
emit_alert("AML_STRUCTURING", user_id, snapshot())
SQL - Silver’ga yuklanayotganda:
sql
CREATE TABLE silver. payments AS
SELECT EXCEPT(rn) FROM (
SELECT p., ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY event_time) rn
FROM bronze. payment_events p
) WHERE rn = 1;
Iceberg/Delta - MERGE idempotent:
sql
MERGE INTO silver. fact_bets s
USING stage. fact_bets_delta d
ON s. bet_id = d. bet_id
WHEN MATCHED THEN UPDATE SET
WHEN NOT MATCHED THEN INSERT;
15) Jarayonlar va RACI
R (Responsible):- Data Platform (Lakehouse/katalog/ACID, kompaksiya),
- Streaming (agregatlar/CEP/dedup),
- Domain Analytics (metriklar/Gold).
- A (Accountable): Head of Data/CDO.
- C (Consulted): Compliance/Legal/DPO (PII/residency/Legal Hold), Finance (FX/GGR), SRE (SLO/стоимость), Security.
- I (Informed): BI/Mahsulot/Marketing/Operatsiyalar.
16) Joriy etish yo’l xaritasi
MVP (3-5 hafta):1. Lakehouse Bronze/Silver (ACID-jadvallar), Kafkadan ingest, registry sxemalari.
2. OLAPdagi bazaviy oqim agregatlari (1-5 daqiqa); Gold vitrinasi. ggr_daily (D + 1 dan 06:00 gacha).
3. DQ-Payments/Gameplay uchun kod sifatida, Freshness/Completeness dashbordlari.
4. Kompaksiya/OPTIMIZE, minimal cost-metriklar va alertlar lag/late/dup.
2-faza (5-10 hafta):- Silver kengaytmasi (users/games/providers uchun SCD II), lineage va impact-tahlil.
- Asinxron lookups (RG/KYC/ASN/BIN), late-tuzatishlarni boshqarish.
- Metriklarning semantik qatlami, eksport reglamenti (WORM/imzo).
- Multi-region, DR/replay-simulyator, auto-tuning va watermarks.
- Cost-dashbordlar, chargeback/kvotalar, tiered storage va arxivlash.
- Vitrinalar va metrik kartochkalar hujjatlarini avtogeneratsiya qilish.
17) Sotishdan oldingi chek-varaq
- Reyestrdagi sxemalar va kontraktlar; back-compat testlari yashil.
- Dedup, watermark/allowed lateness, DLQ kiritilgan.
- Kompaksiya/OPTIMIZE/VACUUM jadval boʻyicha oʻrnatilgan.
- SLO: p95 ingest→minute-view, Gold до 06:00; alertlar lag/late/dup/state size.
- DQ qoidalari faol; lineage Bronze dan eksportgacha koʻrinadi.
- RBAC/ABAC и KMS; rezidentlik va DSAR/RTBF/Legal Hold sinovdan o’tkazildi.
- Nazorat ostidagi qiymat (cost/GB, cost/query, cold ulushi), repleylar uchun limitlar.
18) Anti-patternlar va xavflar
Xom va hisobot maʼlumotlarini bitta jadvalda aralashtirish: reproducibility buziladi.
Kompaksiya yo’qligi: portlash small files → qimmatbaho so’rovlar.
FX ni «orqaga» hisoblash: tarix va hisobotlarni buzadi.
Watermarks/late siyosati yo’q: vitrinalar va alertlar «suzmoqda».
Full reload kerak emas: inkrementlar/MERGE va tuzatishlar ishlating.
Tahlilda PII: mappinglarni alohida saqlang, CLS/RLSni yoqing.
19) Lugʻat (qisqacha)
Lakehouse - data lake + ACID jadvallari va SQL dvigateli.
Bronze/Silver/Gold - xom/normallashtirilgan/serving qatlamlari.
Watermark - event-time boʻyicha oynalarning tayyorlik chegarasi.
Materialized View - tezkor o’qish uchun oldindan belgilangan vitrin.
Time-travel - jadvallarning tarixiy versiyalarini o’qish.
WORM - eksport artefaktlarini o’zgarmas saqlash.
20) Jami
To’g’ri oqim-agregatsiyali ma’lumotlar ko’li - bu qatlamlar va shartnomalar intizomi: Bronze «bo’lgani kabi», Silver - normallashtirish va sifat uchun, OLAP - daqiqali panellar uchun, Gold - takrorlanadigan hisobotlar uchun. Derazalar va watermarks, dedup va kompaksiya, maxfiylik va qiymatni boshqarish orqali siz mahsulot, komplayens va operatsion boshqaruv uchun tezkor, tekshiriladigan va komplayent vitrinalarni olasiz.