ETL/ELT amallary
1) Maksady we mazmuny
ETL/ELT-konweýerler hasabat (GGR/NGR, düzgünleşdirijiler), analitikler/ML we iş panelleri üçin maglumatlaryň öňünden ýüklenmegini, üýtgedilmegini we çap edilmegini üpjün edýär.
ETL: DWH/Lakehouse ýüklemezden ozal üýtgedýäris (seýrek häzirki zaman steklerinde).
ELT: Ilki bilen Lakehouse (Bronze/Kümüş) ýükleýäris, soňra SQL/hereketlendirijiler bilen üýtgedýäris (maslahat berilýär).
2) Salgylanma arhitekturasy
1. Ingest/Edge: HTTP/gRPC/Batch, ALTP-den CDC, üpjün ediji S3/FTP düşürmek.
2. Bronze (raw, append-only): üýtgemeýän payload 'lar, partiýa/bazar/tenant.
3. Silver (clean/conform): kadalaşma, deadup, gollanmalar, SCD, FX/wagt zolaklary.
4. Gold (serve): BI/sazlaýjy/modelleriň aşagyndaky denormallaşdyrylan penjireler.
5. Orkestr: Airflow/Dagster/Prefect (DAG 'i, SLA, retralar, üýtgeşmeler).
6. DQ/Contracts: Schema Registry + DQ-как-код, consumer-driven tests.
7. Syn edilişi: paýlaýynlaryň metrikleri, lineage, logi, cost-daşbordlar.
3) ETL vs ELT saýlamak
Tejribe: iGaming - ELT + CDC-de: çalt ýükleýäris, soňra standartlaşdyrýarys we hasaplaýarys.
4) Inkrementler we CDC
Deltalara çemeleşmeler:- CDC (Debezium/log-replikasiýa): ALTP → Bronze → MERGE-de Silver.
- Watermark wagty: 'updated _ at> max_loaded_ts'.
- Hash-diff: 'md5 (row)' deňeşdirmesi.
- Upsert/MERGE: downloads.
sql
MERGE INTO silver. payments s
USING stage. payments_delta d
ON s. transaction_id = d. transaction_id
WHEN MATCHED THEN UPDATE SET
WHEN NOT MATCHED THEN INSERT;
5) Şertnamalar we shemalar
Schema-first: JSON/Euro/Protobuf in Registry; 'schema _ version' wakalarda/faýllarda.
Ewolýusiýa: back-compatible (nullable); breaking - '/v2 '+ goşa ýazgy.
Hökmany meýdanlar: 'event _ time (UTC)', 'event _ id', 'trace _ id', 'user _ pseudo _ id', 'market'.
6) DQ-ýaly-kod (iň az toplum)
yaml table: silver. payments owner: data-payments slo:
freshness_minutes: 15 completeness_percent: 99. 5 rules:
- name: unique_tx # uniqueness of transactions type: unique columns: [transaction_id]
severity: critical
- name: currency_whitelist type: in_set column: currency set: [EUR,USD,GBP,TRY,BRL]
severity: major
- name: amount_positive type: range column: amount_base min: 0. 01 severity: critical
- name: fk_user type: foreign_key column: user_pseudo_id ref_table: dim. users_scd severity: critical
7) Orkestrasiýa: DAG 'i, endikler, SLA
DAG-dizaýn: çeşmelerden penjirelere; wezipeleriň arasyndaky aç-açan garaşlylyk.
Retraýlar we idempotentlik: backoff, "arassa" gaýtalanmalar, checkpoint's.
Süýşmeler (catchup): sypdyrylan döwürleriň takyk dogony.
SLA: Mysal üçin, Gold. daily ýerli wagt bilen 06: 00-a çenli taýýar; düzgün bozmalar barada habar bermek.
Parametrlemek: bazarlar/tenantlar/seneler vars arkaly; job's şablony.
8) Idempotentlik we exactly-once
ingest: dublikatlar mümkin → dedup po '(event_id, source)'.
Iş ýüzünde: upsert/merge; "arassa" özgeriş funksiýalary.
V sink: geleşik komissiýalary ýa-da idempotent writes; "goşa hasaba alynmaga" gözegçilik.
Outbox/Inbox: ALTP-den domen wakalarynyň geleşik çap edilmegi.
9) Backfill и reprocessing
Backfill: başlangyç doldurma/taryhy aralyklar.
Reprocessing: logika/düzedişler üýtgäninde gaýtadan hasaplamak.
Guardrails: aralyk çäkleri, kwotalar, wagt penjireleri, metrleri deňeşdirmek bilen dry-run.
Bellik: 'logic _ version', 'reprocessed _ at', 'recalc _ reason'.
10) Silver/Gold modellemek
Silver (3NF/BCNF): faktlar 'fact _ bets/payments/payouts', ölçegler 'dim _ users/games/providers/markets (SCD II)', walýuta/wagt zolagynyň standartlaşdyrylmagy.
Gold: BI/sazlaýjy/modeliň aşagyndaky denormallaşdyrylan penjireler; üýtgewsiz eksport paketleri (WORM) + gol.
Gold mysaly: GGR Daily
sql
CREATE OR REPLACE VIEW gold. ggr_daily AS
SELECT
DATE(b. event_time) AS event_date,
b. market,
g. provider_id,
SUM(b. stake_base) AS stakes_eur,
SUM(p. amount_base) AS payouts_eur,
SUM(b. stake_base) - SUM(p. amount_base) AS ggr_eur
FROM silver. fact_bets b
LEFT JOIN silver. fact_payouts p
ON p. user_pseudo_id = b. user_pseudo_id
AND p. game_id = b. game_id
AND DATE(p. event_time) = DATE(b. event_time)
JOIN dim. games g ON g. game_id = b. game_id
GROUP BY 1,2,3;
11) Gizlinlik we rezidentlik
PII-minimalizasiýa: tokenizasiýa; izolirlenen konturda hakyky ID mappingleri.
RLS/CLS: rollar/ýurisdiksiýalar boýunça giriş syýasaty, gizlemek.
Residency: EEA/UK/BR üçin aýratyn kataloglar/açarlar; sebitleýin join 'olaryň esassyz gadagan edilmegi.
DSAR/RTBF & Legal Hold: saýlama redaktirlemeler, hasabat üçin WORM arhiwleri, eksport barlagy.
12) Synlamak we SLO
SLI/SLO görkezmeleri:- Freshness Silver p95 ≤ 15 min; Gold daily 06: 00-a çenli taýýar. wagt.
- Completeness ≥ 99. 5%, Hakykylyk (shema) ≥ 99. 9%.
- Job 'olaryň üstünligi ≥ 99. 0%, MTTR hadysalary ≤ 24-48 sagat.
Daşbordlar: Freshness heatmap, DQ-ýitgiler hunisi, cost/query & cost/GB, lineage-graf.
13) Öndürijiligi we bahasy
Partiýa ýerleşişi: senesi/bazary/tenant; süzgüçler boýunça/Z-order.
Formatlar: Parquet + ACID (Delta/Iceberg/Hudi), gysyş we statistika.
Kompaksiýa: small files (OPTIMIZE/VACUUM) bilen göreş.
Materializasiýa: durnukly agregatlar; ullakan on-the-fly join 'olardan gaça durmak.
Chargeback: býudjetler, repleýler üçin kwotalar/backfill; pes ýükli penjirelerde meýilleşdirmek.
14) DAG-nyň adaty meseleleriniň mysallary ("Airflow" psevdokody)
python with DAG("elt_payments_daily", schedule="@daily", start_date=..., catchup=True) as dag:
extract = BashOperator(task_id="extract_cdc", bash_command="run_cdc_to_bronze. sh {{ ds }}")
load = BashOperator(task_id="load_to_silver", bash_command="sql/run_merge_silver. sql {{ ds }}")
dq = BashOperator(task_id="dq_checks", bash_command="dq/run_checks. sh silver. payments {{ ds }}")
gold = BashOperator(task_id="build_gold_ggr", bash_command="sql/build_gold_ggr. sql {{ ds }}")
export = BashOperator(task_id="export_regulator", bash_command="export/run_worm_pack. sh {{ ds }}")
extract >> load >> dq >> gold >> export
15) Amallar we RACI
R (Responsible): Data Engineering (DAG 'i, Silver/Gold modelleri), Data Platform (infra, Registry, DQ).
A (Accountable): Head of Data/CDO.
C (Consulted): Compliance/Legal/DPO (PII/residency/Legal Hold), Finance (FX/GGR), Risk (RG/AML), SRE (SLO/стоимость).
I (Informed): BI/Önüm/Marketing/Amallar.
16) Durmuşa geçirmegiň ýol kartasy
MVP (3-5 hepde):1. Lakehouse Bronze/Silver (ACID) + CDC/Payments/Gameplay üçin inkrementler.
2. DQ-kod (10-15 düzgün) we Freshness/Completeness esasy daşbordlary.
3. SLA-dan ilkinji Gold-vitrin (GGR Daily) "06: 00-a çenli", WORM-eksport gol bilen.
4. DAG orkestri we alertler SLA/DQ.
2-nji faza (5-10 hepde):- Domen giňeltmek, users/games/providers üçin SCD II.
- Metrikleriň semantik gatlagy; lineage/impact-analiz; backfill/reprocessing amallary.
- Sebitleşdirmek (EEA/UK), RLS/CLS, çykdajy gözegçiligi (kwotalar/chargeback).
- Repli-simulýator (what-if), vitrin/metrik resminamalarynyň awtogenerasiýasy.
- Cost-optimizasiýa (toplama, materializasiýa, TTL, kompaksiýa).
- DR-maşklar we time-travel dikeldiş.
17) Azyk önüminden öň çek-sanawy
- Registriýadaky şertnamalar/shemalar, laýyklyk synaglary ýaşyl.
- CDC/inkrementler we MERGE idempotent; dedup ingest.
- DQ düzgünleri işjeň (critical → fail + DLQ), SLA dashbordlary sazlandy.
- Altyn penjireler dokumentleşdirildi, semantik gatlakdaky metrik formulalar.
- RBAC/ABAC, şifrlemek, rezidentlik, DSAR/RTBF/Legal Hold barlandy.
- Programma boýunça kompaksiýa/OPTIMIZE/VACUUM; backfill/replay çäkleri.
- Runbook 'we hadysalar we reprocessing, eksport barlagy (WORM + hash).
18) Anti-patternler we töwekgelçilikler
Doly reload "mümkin boldugyça": CDC/inkrementleri ulanyň.
Çig we hasabat maglumatlaryny garyşdyrmak: Bronze/Kümüş/Gold-y aýratyn saklaň.
DQ we lineage ýoklugy: subut etmek we köpeltmek ýok.
Analitik gatlaklarda PII: mappingleri izolirläň, CLS/RLS ulanyň.
Monolit "gijeki" joblar: bölüň, partiýa boýunça paralel.
Çykdajy nyşany: small files-i yzarlaň, agregatlary materiallaşdyryň, kwotalary giriziň.
19) Sözlük (gysgaça)
ETL/ELT - çykarmak/üýtgetmek/ýüklemek (ýüklemezden öň/soň).
CDC - üýtgeşmeleri ele almak.
SCD - ölçegleriň taryhy (I/II/III).
WORM - hasabat paketlerini üýtgewsiz saklamak.
Time-travel - tablisalaryň taryhy wersiýalaryny okamak.
20) Jemleýji
Häzirki zaman ETL/ELT skriptlar däl-de, dolandyrylýan platforma: şertnamalar we DQ, idempotent inkrementler/CDC, Bronze/Kümüş/Altyn gatlaklarynyň düzgüni, syn edilişlik we SLO, gizlinlik we tygşytlylyk. Bu gollanmany ýerine ýetirip, siz hasabatlylygy durnukly üpjün edýän, önümi we modelleri garaşylmadyk derejede köpeldilýän we diňlenilýän konweýerleri alarsyňyz.