Maʼlumotlarni qayta ishlash
1) Maqsadi va qiymati
Batch-konveyerlar quyidagilar uchun ishonchli kundalik/soatbay vitrinalarni shakllantiradi:- Tartibga solish va moliyaviy hisobotlar (GGR/NGR, soliqlar, RG/AML reyestrlari).
- BI va oziq-ovqat analitikasi (kogortlar, LTV, konversion hunilar).
- Aniqlikni solishtirish (OLTP, DWH, provayderlar/PSP), tarixlashtirish (SCD).
- ML uchun fichlar va o’quv to’plamlarini tayyorlash.
Asosiy xossalari: bashorat qilish, to’liqlik, takrorlanuvchanlik, ma’lumotlar birligiga past qiymat.
2) Arxitektura (referens)
1. Ingest (raw capture): HTTP/gRPC, OLTP dan CDC, provayder yuklamalari → Bronze.
2. Lakehouse: Bronze (raw, append-only) → Silver (clean/conform) → Gold (serve).
3. Orkestr: Airflow/Dagster/Prefect (DAG’i, qaramlik, retray, SLA).
4. Ishlov berish: Spark/Trino/DBT/SQL-dvigatellar; partizatsiya va ACID formatlari (Delta/Iceberg/Hudi).
5. DQ va Kontraktlar: Schema Registry, DQ-qoidalar (YAML/SQL), consumer-tests.
6. Serving: BI/semantik qatlam, hisobot eksportlari (CSV/PDF/JSON + hash), API/GraphQL.
7. Kuzatilganlik: payplaynlar metrikasi, lineage, loglar, qiymati (cost/GB, cost/query).
3) Chastotalar va SLAs
Kundalik (D + 1 dan 06:00 gacha) : GGR hisobotlari, tartibga soluvchi yuklamalar, solishtirmalar.
Soatbay/kvazirealtaym: Ops/Moliya uchun operativ panellar.
Haftalik/oylik: moliyaviy konsolidatsiya, modellar va retro-protsesslar.
- Gold-kundalik vitrinalar mahalliy vaqt bilan soat 06:00 gacha tayyor.
- Freshness Silver p95 ≤ 15 min mikrobatchey uchun/ ≤ 2 soat kunduzgi uchun.
- Completeness ≥ 99. 5%, Validity (sxema) ≥ 99. 9%.
4) Inkremental yuklamalar va CDC
Yondashuvlar:- CDC (Change Data Capture): Debezium/log replikatsiyasi → Bronze → Silver.
- Watermark vaqti boʻyicha:’updated _ at> max_loaded_ts'.
- Xesh taqqoslash:’md5 (row)’oʻzgarishlar detektori uchun.
- Upsert/Merge: Silver/Gold.
sql
MERGE INTO silver. payments AS s
USING staging. payments_delta AS d
ON s. transaction_id = d. transaction_id
WHEN MATCHED THEN UPDATE SET
WHEN NOT MATCHED THEN INSERT;
5) SCD (o’lchovlarni tarixlashtirish)
SCD I: qayta yozish (imlo, minor tuzatishlar).
SCD II: Toʻliq funksional tarix (’valid _ from/valid _ to/is _ current’).
SCD III: qisqacha taqqoslash uchun «oldin/keyin».
sql
MERGE INTO dim. users_scd t
USING stage. users u
ON t. user_pseudo_id = u. user_pseudo_id AND t. is_current = TRUE
WHEN MATCHED AND (t. country <> u. country OR t. rg_status <> u. rg_status)
THEN UPDATE SET t. is_current = FALSE, t. valid_to = CURRENT_TIMESTAMP
WHEN NOT MATCHED
THEN INSERT (user_pseudo_id, country, rg_status, valid_from, valid_to, is_current)
VALUES (u. user_pseudo_id, u. country, u. rg_status, CURRENT_TIMESTAMP, NULL, TRUE);
6) Backfill и Reprocessing
Backfill: birlamchi toʻldirish/qoʻshimcha yuklash.
Reprocessing: mantiq/maʼlumotlar tuzatilgandan keyin vitrinalarni qayta hisoblash.
- Idempotentlik (MERGE/upsert), Bronze o’zgarmasligi, mantiqni versiyalash.
- takroriy progonlar uchun Time-travel; meta ma’lumotlar snapshotlari.
- Guardrails: diapazonlar, kvotalar va raqobatbardosh joblarni cheklash.
- Hujjatlar: runbook bosqichlari va tugash mezonlari.
7) Qatlamlarni modellashtirish
Bronze:- Append-only,’event _ date’,’jurisdiction’,’tenant’partiyalari.
- Boshlang’ich payload’ni saqlaymiz (forensika uchun),’ingested _ at’ni o’rnatamiz.
- Normallashtirish va standartlashtirish: FK/ma’lumotnomalar, dedup, FX/taymzonlar.
- Faktlar/o’lchovlar jadvallari (3NF/BCNF), asosiy o’lchovlar uchun SCD.
- BI/regulyator/moliya ostidagi denormallashtirilgan vitrinalar, tayyorgarlikning SLA.
- Agregatlarni materiallashtirish; o’zgarmas eksport artefaktlari (hash + WORM).
8) Ma’lumotlar sifati (DQ-kod sifatida)
Silver uchun YAML qoidalari namunasi:yaml table: silver. payments slo:
freshness_minutes: 15 completeness_percent: 99. 5 rules:
- name: amount_positive type: range column: amount_base min: 0. 01 severity: critical
- name: currency_whitelist type: in_set column: currency set: [EUR,USD,GBP,TRY,BRL]
severity: major
- name: unique_tx type: unique columns: [transaction_id]
severity: critical
- name: fk_user type: foreign_key column: user_pseudo_id ref_table: dim. users_scd severity: critical
Reaksiya siyosati: critical → fail job + DLQ; major/minor → tag + hisobot.
9) Semantik qatlam va hisobot
Semantic-layer/metrics-store da metriklarning yagona aniqliklari (GGR/NGR, ARPPU, Retention).
Metriklarni versionlash; BI/eksport paketlari bilan integratsiya qilish.
Hisobotlar: CSV/JSON/PDF + sha256, tushirish jurnali va zarur hollarda Legal Hold.
10) Maxfiylik, rezidentlik, xavfsizlik
PII-minimallashtirish: foydalanuvchilarni taxalluslashtirish; mapping - alohida himoyalangan konturda.
Data residency: EEA/UK/BR bo’yicha alohida kataloglar/kalitlar; huquqiy asossiz kross-mintaqaviy join’onlarni taqiqlash.
Shifrlash: TLS in-transit; KMS/CMK at-rest; eksportni nazorat qilish.
DSAR/RTBF: hisoblab chiqiladigan proyeksiyalar, selektiv tahrirlash; kirish auditi.
Legal Hold: tartibga soluvchi artefaktlar uchun WORM arxivlari.
11) Unumdorlik va qiymat
Sana/bozor/tenant bo’yicha partiyalashtirish; Tez-tez uchraydigan predikatlar boʻyicha Z-order/cluster.
Formatlar: Parquet + ACID jadvallari; kompressiya/statistika, OPTIMIZE/VACUUM.
Materiallashtirish: Goldda barqaror agregatsiyalar; «monolit» joblardan qochish.
Kvotalar/budjetlar: jamoalar bo’yicha chargeback; backfill/og’ir so’rovlar uchun limitlar.
Rejalashtirish: past yuklamali derazalar (kecha/dam olish kunlari), navbatlarning ustuvor yo’nalishlari.
12) Kuzatuv va boshqaruv
Payplaynlar metrikasi: duration, success rate, retries, rows processed, cost/query.
DQ-metriklar: completeness, validity, uniqueness, FK-xatolar, drift.
Freshness heatmap: domenlar va bozorlar bo’yicha; SLA dashbordlari.
Lineage: Bronzadan hisobotlargacha; oʻzgarishlardan oldin impact-tahlil.
Alertlar: SLO-byudjetlar, DQ tanazzullari, kechikishlar, qiymatning oshishi.
13) SQL/modellar namunalari
Valyutalarni normallashtirish (Silver):sql
CREATE OR REPLACE TABLE silver. payments AS
SELECT p. transaction_id,
p. user_pseudo_id,
p. currency,
p. amount_orig,
r. rate AS fx_rate_used,
p. amount_orig r. rate AS amount_base,
p. market,
CAST(p. event_time AS TIMESTAMP) AS event_time
FROM bronze. payment_events p
JOIN dim. fx_rates r
ON r. date = DATE(p. event_time)
AND r. ccy_from = p. currency AND r. ccy_to = 'EUR';
GGR (Gold) ning kundalik vitrini:
sql
CREATE OR REPLACE VIEW gold. ggr_daily AS
SELECT
DATE(b. event_time) AS event_date,
b. market,
g. provider_id,
SUM(b. stake_base) AS stakes_eur,
SUM(p. amount_base) AS payouts_eur,
SUM(b. stake_base) - SUM(p. amount_base) AS ggr_eur
FROM silver. fact_bets b
LEFT JOIN silver. fact_payouts p
ON p. user_pseudo_id = b. user_pseudo_id
AND p. game_id = b. game_id
AND DATE(p. event_time) = DATE(b. event_time)
JOIN dim. games g ON g. game_id = b. game_id
GROUP BY 1,2,3;
To’liqlikni nazorat qilish (DQ SQL):
sql
SELECT market, event_date, COUNT() AS n
FROM silver. fact_bets
GROUP BY market, DATE(event_time) AS event_date
HAVING n = 0;
14) Jarayonlar va RACI
R (Responsible): Data Engineering (DAG’i, Silver/Gold modellari), Data Platform (infra, sxemalar registri, DQ).
A (Accountable): Head of Data / Chief Data Officer.
C (Consulted): Compliance/Legal/DPO (PII/retention), Finance (FX/GGR), Risk (RG/AML), SRE (SLO/стоимость).
I (Informed): BI/Mahsulot/Marketing/Operatsiyalar.
15) Joriy etish yo’l xaritasi
MVP (4-6 hafta):1. Lakehouse Bronze/Silver (ACID formati), CDC/2-3 domen uchun inkrementlar.
2. DQ-kod sifatida: Payments/Gameplay + CI-validatsiya uchun 10-15 qoidalar.
3. Birinchi Gold-vitrin (GGR Daily) SLA dan 06:00 gacha; hisobot eksporti + hash.
4. Dashbordlar Freshness/Completeness/Cost, bazaviy alertlar.
2-faza (6-12 hafta):- SCD II для users/games/providers; domenlarni kengaytirish.
- Metriklarning semantik qatlami; OLTP/provayderlar (accuracy) bilan taqqoslash.
- backfill/reprocessing, lineage va impact-tahlil, hududlashtirish (EEA/UK) tartib-taomillari.
- Oʻzgarishlarni avtosimulyatsiya qilish (dry-run), byudjetlar/kvotalar, chargeback.
- Avtomatik hujjatlar (data product pages), DR-mashqlar va time-travel-tiklash.
- Qiymatni optimallashtirish (klaster, materiallashtirish, TTL, vakuum).
16) Sotishdan oldingi chek-varaq
- Registridagi kontraktlar va sxemalar, muvofiqlik testlari yashil rangda.
- Inkremental yuklash/CDC ishlamoqda, MERGE idempotentdir.
- DQ qoidalari faol; critical → fail + DLQ; qoidabuzarliklar to’g "risidagi hisobot.
- SLA/yangi/to’liq dashbordlar; Alertlar sozlangan.
- PII/DSAR/RTBF/Legal Hold siyosati Legal/DPO tomonidan tasdiqlangan.
- Runbook’va backfill/reprocessing/DR sinovdan o’tkazildi.
- Nazorat ostidagi narx (cost/query, cost/GB, kvoutlar).
17) Anti-patternlar va qanday qilib
Monolit tungi joblar: mustaqil qadamlarga bo’ling, partiyalarga parallel bo’ling.
Full-reload kerak emas :/CDC/merji inkrementlaridan foydalaning.
Tahlilda PII aralashmasi: mappinglarni alohida saqlang, CLS/RLS qo’llang.
DQ/lineage yo’qligi: DQ kodini kiriting va kelib chiqishini kuzating.
«Qo’lda ishlatiladigan» backfill’lar: avtomatlashtirish va hujjatlashtirish, diapazonlarni cheklash.
Boshqarilmaydigan qiymat: klasterlashtirish, materiallashtirish, retenshn-siyosat.
18) Lugʻat (qisqacha)
CDC - OLTP’dan oʻzgarishlarni olib tashlash.
SCD - sekin o’zgaruvchan o’lchovlar (I/II/III).
Lakehouse - data lake + ACID jadvallari.
MERGE/Upsert - indempotent yangilash operatsiyalari.
Time-travel - jadvallarning tarixiy versiyalarini o’qish.
WORM - artefaktlarni o’zgarmas saqlash.
19) Jami
Paketda qayta ishlash - bu bashorat qilinadigan, takrorlanadigan va komplayent konveyerlarning intizomi. Schema-first, inkrementlar/CDC, SCD-tarixlashtirish, DQ-kod, kuzatuv va ongli iqtisodiyot printsiplariga amal qilib, siz istalgan vaqtda tekshiriladigan va auditga tayyor bo’lgan barqaror Gold-vitrinalar va hisobotlarni olasiz.