GH GambleHub

ETL/ELT jarayonlari

1) Vazifasi va konteksti

ETL/ELT-konveyerlar hisobot (GGR/NGR, regulyatorlar), analitiklar/ML va operativ panellar uchun ma’lumotlarni oldindan aytib bo’ladigan yuklash, transformatsiya qilish va e’lon qilishni ta’minlaydi.

ETL: DWH/Lakehouse (kamdan-kam hollarda zamonaviy steklarda).
ELT: Avval Lakehouse (Bronze/Silver) ga yuklaymiz, so’ngra SQL/dvigatellarga aylantiramiz (tavsiya etilgan).

2) Etalon arxitekturasi

1. Ingest/Edge: HTTP/gRPC/Batch, OLTP dan CDC, provayder S3/FTP tushirish.
2. Bronze (raw, append-only): o’zgarmas payload’lar, sana/bozor/tenant bo’yicha partiyalar.
3. Silver (clean/conform): normallashtirish, dedup, maʼlumotnomalar, SCD, FX/taymzonlar.
4. Gold (serve): BI/regulyator/model ostidagi denormallashtirilgan vitrinalar.
5. Orkestr: Airflow/Dagster/Prefect (DAG’i, SLA, retray, siljish).
6. DQ/Contracts: Schema Registry + DQ-как-код, consumer-driven tests.
7. Kuzatilishi: payplaynlar metrikasi, lineage, loglar, cost-dashbordlar.

3) ETL vs ELT tanlash

MezonETLELT (tavsiya etilgan)
Qayta hisoblashning moslashuvchanligipastyuqori (time-travel, reprocessing)
Qiymatio’sishda qimmatroqkattalashtirishda maqbul
Sifat nazoratiingestSilver/Gold + DQ-kod sifatida
Tarixiylik/forensikacheklanganto’liq (Bronze append-only)

Amaliyot: iGaming - ELT + CDC: tezda yuklaymiz, keyin standartlashtiramiz va hisoblaymiz.

4) Inkrementlar va CDC

Deltalarga yondashuvlar:
  • CDC (Debezium/log replikatsiyasi): OLTP → Bronze → MERGE ning Silverdagi oʻzgarishlari.
  • Watermark vaqti boʻyicha:’updated _ at> max_loaded_ts'.
  • Hash-diff: o’zgarishlarni aniqlash uchun’md5 (row)’taqqoslash.
  • Upsert/MERGE: yuklashning idempotentligi.
MERGE (Delta/Iceberg) misoli:
sql
MERGE INTO silver. payments s
USING stage. payments_delta d
ON s. transaction_id = d. transaction_id
WHEN MATCHED THEN UPDATE SET
WHEN NOT MATCHED THEN INSERT;

5) Kontraktlar va sxemalar

Schema-first: JSON/Euro/Protobuf in Registry;’schema _ version’.
Evolyutsiya: back-compatible (nullable qoʻshimchalar); breaking - ’/v2’+ qoʻshaloq yozuv.
Majburiy maydonlar:’event _ time (UTC)’,’event _ id’,’trace _ id’,’user _ pseudo _ id’,’market’.

6) DQ-kod sifatida (minimal to’plam)

yaml table: silver. payments owner: data-payments slo:
freshness_minutes: 15 completeness_percent: 99. 5 rules:
- name: unique_tx # uniqueness of transactions type: unique columns: [transaction_id]
severity: critical
- name: currency_whitelist type: in_set column: currency set: [EUR,USD,GBP,TRY,BRL]
severity: major
- name: amount_positive type: range column: amount_base min: 0. 01 severity: critical
- name: fk_user type: foreign_key column: user_pseudo_id ref_table: dim. users_scd severity: critical

7) Orkestratsiya: DAG’i, qaramlik, SLA

DAG-dizayn: manbalardan vitrinalarga; vazifalar o’rtasidagi aniq bog’liqliklar.
Retray va idempotentlik: backoff, «sof» takrorlash, checkpoint’lar.
Siljishlar (catchup): o’tkazib yuborilgan davrlarning ehtiyotkorlik bilan o’rnatilishi.
SLA: masalan, Gold. daily mahalliy vaqt bilan soat 06:00 gacha tayyor; qoidabuzarliklar to’g "risida xabardor qilish.
Parametrlash: bozorlar/tenantlar/sanalar vars orqali; yagona job namunasi.

8) Idempotentlik va exactly-once

ingest: dublikatlar mumkin → dedup po’(event_id, source)’.
Ishlov berishda: upsert/merge; transformatsiyalarning «sof» funksiyalari.
V sink: tranzaksion kommitalar yoki idempotent writes; «ikki marta hisobga olish» nazorati.
Outbox/Inbox: OLTP domen voqealarini tranzaksion nashr etish.

9) Backfill и reprocessing

Backfill: birlamchi toʻldirish/tarixiy diapazonlar.
Reprocessing: mantiq oʻzgarganda/tuzatilganda qayta hisoblash.
Guardrails: diapazon chegaralari, kvotalar, vaqt oynalari, metrlarni taqqoslash bilan dry-run.
Markalash:’logic _ version’,’reprocessed _ at’,’recalc _ reason’.

10) Silver/Gold modellashtirish

Silver (3NF/BCNF): faktlar’fact _ bets/payments/payouts’, o’lchovlar’dim _ users/games/providers/markets (SCD II)’, valyutalar/taymzonlarni standartlashtirish.
Gold: BI/regulyator/model ostidagi denormallashtirilgan vitrinalar; o’zgarmas eksport paketlari (WORM) + imzo.

Gold misoli: GGR Daily

sql
CREATE OR REPLACE VIEW gold. ggr_daily AS
SELECT
DATE(b. event_time) AS event_date,
b. market,
g. provider_id,
SUM(b. stake_base) AS stakes_eur,
SUM(p. amount_base) AS payouts_eur,
SUM(b. stake_base) - SUM(p. amount_base) AS ggr_eur
FROM silver. fact_bets b
LEFT JOIN silver. fact_payouts p
ON p. user_pseudo_id = b. user_pseudo_id
AND p. game_id = b. game_id
AND DATE(p. event_time) = DATE(b. event_time)
JOIN dim. games g ON g. game_id = b. game_id
GROUP BY 1,2,3;

11) Xususiy va rezidentlik

PII-minimallashtirish: tokenlashtirish; izolyatsiya qilingan konturda real ID mappinglari.
RLS/CLS: rollar/yurisdiksiyalar bo’yicha foydalanish siyosati, yashirish.
Residency: EEA/UK/BR uchun alohida kataloglar/kalitlar; asossiz kross-mintaqaviy join’onlarni taqiqlash.
DSAR/RTBF & Legal Hold: selektiv tahrirlash, hisobot uchun WORM arxivlari, eksport auditi.

12) Kuzatuv va SLO

SLI/SLO belgilari:
  • Freshness Silver p95 ≤ 15 min; Gold daily soat 06:00 gacha tayyor. vaqt.
  • Completeness ≥ 99. 5%, Validity (sxema) ≥ 99. 9%.
  • Job’larning muvaffaqiyati ≥ 99. 0%, MTTR hodisalar ≤ 24-48 soat

Dashbordlar: Freshness heatmap, DQ-yo’qotish hunisi, cost/query & cost/GB, lineage-graf.

13) Unumdorlik va qiymat

Partiyalashtirish: sana/bozor/tenant; klaster/Z-order filtrlar bo’yicha.
Formatlar: Parquet + ACID (Delta/Iceberg/Hudi), siqish va statistika.
Kompaksiya: small files (OPTIMIZE/VACUUM) ga qarshi kurash.
Materiallashtirish: barqaror agregatlar; ulkan on-the-fly join’olardan qochish.
Chargeback: budjetlar, replay/backfill uchun kvotalar; past yuklamali derazalarda rejalashtirish.

14) DAGning namunaviy vazifalari misollari (Airflow psevdokodi)

python with DAG("elt_payments_daily", schedule="@daily", start_date=..., catchup=True) as dag:
extract = BashOperator(task_id="extract_cdc", bash_command="run_cdc_to_bronze. sh {{ ds }}")
load  = BashOperator(task_id="load_to_silver", bash_command="sql/run_merge_silver. sql {{ ds }}")
dq   = BashOperator(task_id="dq_checks", bash_command="dq/run_checks. sh silver. payments {{ ds }}")
gold  = BashOperator(task_id="build_gold_ggr", bash_command="sql/build_gold_ggr. sql {{ ds }}")
export = BashOperator(task_id="export_regulator", bash_command="export/run_worm_pack. sh {{ ds }}")

extract >> load >> dq >> gold >> export

15) Jarayonlar va RACI

R (Responsible): Data Engineering (DAG’i, Silver/Gold modellari), Data Platform (infra, Registry, DQ).
A (Accountable): Head of Data/CDO.
C (Consulted): Compliance/Legal/DPO (PII/residency/Legal Hold), Finance (FX/GGR), Risk (RG/AML), SRE (SLO/стоимость).
I (Informed): BI/Mahsulot/Marketing/Operatsiyalar.

16) Joriy etish yo’l xaritasi

MVP (3-5 hafta):

1. Lakehouse Bronze/Silver (ACID) + CDC/Payments/Gameplay uchun inkrementlar.

2. DQ-kod sifatida (10-15 qoidalar) va Freshness/Completeness bazaviy dashbordlari.

3. Birinchi Gold-vitrin (GGR Daily) SLA «06:00 gacha», WORM-eksport imzosi bilan.

4. DAG orkestratsiyasi va SLA/DQ alertasi.

2-faza (5-10 hafta):
  • Domenlarni kengaytirish, users/games/providers uchun SCD II.
  • Metriklarning semantik qatlami; lineage/impact-tahlil; backfill/reprocessing tartib-taomillari.
  • Hududlashtirish (EEA/UK), RLS/CLS, qiymatni nazorat qilish (kvotalar/chargeback).
3-faza (10-16 hafta):
  • Repley-simulyator (what-if), vitrin/metrik hujjatlarining avtogeneratsiyasi.
  • Cost-optimallashtirish (klaster, materiallashtirish, TTL, kompaksiya).
  • DR-mashqlar va time-travel tiklash.

17) Sotishdan oldingi chek-varaq

  • Registridagi kontraktlar/sxemalar, muvofiqlik testlari yashil rangda.
  • CDC/inkrementlar va MERGE idempotentdir; dedup ingest.
  • DQ qoidalari faol (critical → fail + DLQ), SLA dashbordlari sozlangan.
  • Gold-vitrinalar hujjatlashtirilgan, semantik qatlamdagi metrik formulalar.
  • RBAC/ABAC, shifrlash, rezidentlik, DSAR/RTBF/Legal Hold tekshirildi.
  • Jadval bo’yicha kompaksiya/OPTIMIZE/VACUUM; backfill/replay limitlari.
  • Runbook’va hodisalar va reprocessing, eksport auditi (WORM + hash).

18) Anti-patternlar va xavflar

Full reload: CDC/inkrementlardan foydalaning.
Xom va hisobot ma’lumotlarini aralashtirish: Bronze/Silver/Gold ni alohida saqlang.
DQ va lineage yo’qligi: isbotlanishi va takrorlanishi yo’q.
PII analitik qatlamlarda: mappinglarni izolyatsiya qiling, CLS/RLS qo’llang.
Monolit «tungi» joblar: partiyalar bo’yicha parallel bo’ling.
Qiymat ignori: small files-ni kuzatib boring, agregatlarni materiallashtiring, kvotalarni kiriting.

19) Lugʻat (qisqacha)

ETL/ELT - chiqarish/transformatsiya/yuklash (yuklashdan oldin/keyin).
CDC - oʻzgarishlarni ushlash.
SCD - o’lchovlarni tarixlashtirish (I/II/III).
WORM - hisobot paketlarining o’zgarmas saqlanishi.
Time-travel - jadvallarning tarixiy versiyalarini o’qish.

20) Jami

Zamonaviy ETL/ELT - bu skriptlar emas, balki boshqariladigan platforma: kontraktlar va DQ, idempotent inkrementlar/CDC, Bronze/Silver/Gold qatlamlari intizomi, kuzatuv va SLO, maxfiylik va tejamkorlik. Ushbu yoʻl-yoʻriqqa amal qilib, siz hisobotni barqaror taʼminlaydigan, mahsulot va modellarni kutilmagan hodisalarsiz takrorlanadigan va tinglanadigan konveyerlarga ega boʻlasiz.

Contact

Biz bilan bog‘laning

Har qanday savol yoki yordam bo‘yicha bizga murojaat qiling.Doimo yordam berishga tayyormiz.

Telegram
@Gamble_GC
Integratsiyani boshlash

Email — majburiy. Telegram yoki WhatsApp — ixtiyoriy.

Ismingiz ixtiyoriy
Email ixtiyoriy
Mavzu ixtiyoriy
Xabar ixtiyoriy
Telegram ixtiyoriy
@
Agar Telegram qoldirilgan bo‘lsa — javob Email bilan birga o‘sha yerga ham yuboriladi.
WhatsApp ixtiyoriy
Format: mamlakat kodi va raqam (masalan, +998XXXXXXXX).

Yuborish orqali ma'lumotlaringiz qayta ishlanishiga rozilik bildirasiz.