AI Pipelines and Training Automation
1) Purpose and principles
Goal: a reliable, reproducible chain of data → features → models → decisions → feedback, with minimal time-to-value and control over risk and cost.
Principles:
- Pipeline-as-Code: everything (DAGs, configs, tests, policies) lives in Git and goes through PRs and review.
- Determinism: pinned versions of data, code, containers, and dependencies.
- Separation of Concerns: DataOps, FeatureOps, TrainOps, DeployOps, MonitorOps.
- Guarded Automation: automate, but behind quality, security, and compliance checks.
- Privacy by Design: PII minimization, data residency, audit.
2) Pipeline layers and architecture
1. Ingest & Bronze: reliable event ingestion (CDC, message buses, retries, DLQ).
2. Silver (normalization/enrichment): SCD, currency/time normalization, cleaning, dedup.
3. Gold (data marts): subject-area tables and datasets for training and reporting.
4. Feature Store: a single set of feature definitions for online and offline use, with versions and SLOs.
5. Train & Validate: sample preparation, training, calibration, evaluation, and check gates.
6. Registry & Promotion: model registry, quality cards, promotion policy.
7. Serving: REST/gRPC/Batch, feature caches, feature flags, canary/shadow.
8. Monitor & Feedback: SLI/SLO, drift/calibration, online labels, auto-retrain.
3) Orchestration: DAG patterns
Daily CT (D+1): a nightly cycle of data → features → training → validation → registry candidate.
Event-Driven Retrain: triggered by PSI/ECE/expected-cost drift or by a schema release.
Rolling Windows: weekly/monthly retraining on a sliding window of data.
Blue/Green Artifacts: all artifacts are immutable (identified by hash), and versions can run in parallel.
Dual-write v1/v2: schemas and features are migrated through dual writes and an equivalence comparison.
```python
from airflow import DAG
from airflow.operators.bash import BashOperator

# start_date is elided in the source; supply a fixed datetime before deploying.
with DAG("ct_daily", schedule="@daily", start_date=..., catchup=False) as dag:
    # The trailing space keeps Airflow from loading "ingest.sh" as a Jinja template file.
    bronze = BashOperator(task_id="ingest_cdc", bash_command="ingest.sh ")
    silver = BashOperator(task_id="silver_norm", bash_command="dbt run --models silver")
    gold = BashOperator(task_id="gold_marts", bash_command="dbt run --models gold")
    feats = BashOperator(task_id="feature_store_publish", bash_command="features publish")
    ds = BashOperator(task_id="build_dataset", bash_command="dataset build --asof {{ ds }}")
    train = BashOperator(task_id="train", bash_command="trainer run --config conf.yaml")
    # Named `evaluate` rather than `eval` to avoid shadowing the Python builtin.
    evaluate = BashOperator(task_id="evaluate", bash_command="eval run --gate conf/gates.yaml")
    reg = BashOperator(task_id="register", bash_command="registry add --stage Staging")
    # Linear chain: each layer runs only after the previous one succeeds.
    bronze >> silver >> gold >> feats >> ds >> train >> evaluate >> reg
```
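The Blue/Green Artifacts pattern listed above depends on artifacts being immutable and identified by hash. A minimal sketch of one way to derive such an identifier, assuming the artifact is available as bytes plus a small metadata dict; `artifact_id` and the metadata values are hypothetical and not part of any specific registry API:

```python
import hashlib
import json


def artifact_id(payload: bytes, metadata: dict) -> str:
    """Return a content-derived ID: same bytes and same metadata give the same ID."""
    h = hashlib.sha256()
    # sort_keys makes the serialization independent of dict insertion order.
    h.update(json.dumps(metadata, sort_keys=True).encode("utf-8"))
    h.update(b"\0")  # separator between metadata and payload
    h.update(payload)
    return h.hexdigest()[:16]


# Hypothetical metadata: two versions differing only in logic_version.
meta = {"data_version": "v1", "logic_version": "v1", "asof_date": "2024-01-01"}
blue = artifact_id(b"model-weights", meta)
green = artifact_id(b"model-weights", {**meta, "logic_version": "v2"})
```

Because the ID changes whenever the content or its metadata changes, both versions can be stored side by side and neither can be overwritten in place.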
4) Datasets and sampling
Point-in-time joins and a "no future" rule for features and labels (no look-ahead leakage).
Stratification by market/tenant/time, a holdout set, and a time "gap" to prevent leakage.
Versioning: `data_version`, `logic_version`, `asof_date`; WORM snapshots.
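To illustrate the point-in-time rule, here is a minimal standard-library sketch that attaches to each label the latest feature value observed strictly before the label timestamp. The function name, the tuple layout, and the sample data are assumptions made for the example; a production pipeline would normally do this in the warehouse or the feature store.

```python
from bisect import bisect_left


def point_in_time_join(labels, feature_history):
    """Attach to each label the latest feature value observed strictly before it.

    labels: list of (entity_id, label_ts, label)
    feature_history: dict entity_id -> list of (feature_ts, value), sorted by feature_ts
    """
    rows = []
    for entity_id, label_ts, label in labels:
        history = feature_history.get(entity_id, [])
        timestamps = [ts for ts, _ in history]
        # bisect_left counts the feature timestamps that are < label_ts, so a
        # feature stamped exactly at label_ts is excluded ("no future").
        idx = bisect_left(timestamps, label_ts)
        value = history[idx - 1][1] if idx > 0 else None
        rows.append((entity_id, label_ts, value, label))
    return rows


# Hypothetical sample data with integer timestamps.
features = {"u1": [(1, 10.0), (5, 20.0), (9, 30.0)]}
labels = [("u1", 5, 1), ("u1", 6, 0), ("u1", 1, 0), ("u2", 3, 1)]
joined = point_in_time_join(labels, features)
```

Labels with no earlier feature value get `None` instead of a value borrowed from the future, which makes missing history visible in the dataset.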
5) Feature Store and online/offline equivalence
A single feature specification (name, formula, owner, SLO, tests).
Online = offline: shared transformation code and an equivalence test (MAE/MAPE).
TTL and caching: windows of 10m/1h/1d; timeouts and retries; a "last_known_good" fallback.
```yaml
name: bets_sum_7d
owner: ml-risk
offline: {source: silver.fact_bets, window: "[-7d,0)"}
online: {compute: "streaming_window: 7d", ttl: "10m"}
tests:
  - compare_online_offline_max_abs_diff: 0.5
slo: {latency_ms_p95: 20, availability: 0.999}
```
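The equivalence test can be sketched as a comparison of paired samples of the same feature computed offline and online for the same keys. This is an illustrative sketch only: `equivalence_report` and the sample values are hypothetical, and the 0.5 threshold is the `compare_online_offline_max_abs_diff` value from the spec above.

```python
def equivalence_report(offline, online):
    """Compare paired offline/online values of one feature for the same keys."""
    if len(offline) != len(online) or not offline:
        raise ValueError("expected two non-empty, equally sized samples")
    abs_diffs = [abs(a - b) for a, b in zip(offline, online)]
    mae = sum(abs_diffs) / len(abs_diffs)
    # MAPE skips rows where the offline reference is zero to avoid division by zero.
    pct = [d / abs(a) for d, a in zip(abs_diffs, offline) if a != 0]
    mape = sum(pct) / len(pct) if pct else None
    return {"max_abs_diff": max(abs_diffs), "mae": mae, "mape": mape}


# Hypothetical paired values for bets_sum_7d.
report = equivalence_report([100.0, 50.0, 0.0, 20.0], [100.0, 50.5, 0.25, 20.0])
passed = report["max_abs_diff"] <= 0.5
```

Reporting the maximum absolute difference alongside MAE and MAPE helps separate a single outlier key from a systematic offset between the two computation paths.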
6) Continuous training (CT) automation and quality gates
CT cycle: preparation → training → calibration → evaluation → candidate registration.
Gates (example):
- Offline: PR-AUC ≥ benchmark − δ; ECE ≤ 0.05; expected-cost ≤ limit.
- Slice/Fairness: metric drop on any slice ≤ Y%; disparate impact within the normal range.
- Feature equivalence: the online/offline check passes.
- Cost: time and resources ≤ budget.
```yaml
gates:
  pr_auc_min: 0.42
  ece_max: 0.05
  expected_cost_delta_max: 0.0
  slice_drop_max_pct: 10
  features_equivalence_p95_abs_diff_max: 0.5
```
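The `ece_max` gate needs an ECE value to compare against. A minimal sketch of the usual binned estimate for binary predictions, assuming equal-width bins on [0, 1]; the function name, the bin count, and the sample data are choices made for the example, not something the source prescribes:

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: weighted mean of |observed positive rate - mean predicted prob|."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("expected two non-empty, equally sized sequences")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Equal-width bins on [0, 1]; p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_prob = sum(p for p, _ in bucket) / len(bucket)
        pos_rate = sum(y for _, y in bucket) / len(bucket)
        # Each bin contributes in proportion to its share of the sample.
        ece += len(bucket) / len(probs) * abs(pos_rate - mean_prob)
    return ece


# Hypothetical sample: confident predictions at 0.9 are right only half the time.
ece = expected_calibration_error([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 0], n_bins=2)
```

With this toy sample the estimate is well above the 0.05 limit, so the candidate would be sent back for recalibration rather than registered.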
7) Model registry and promotion
Model card: data, windows, features, offline/online metrics, calibration, risks, owner.
Stages: `Staging → Production → Archived`; promotion happens only through verified gates.
Rollback policy: keep at least the last N production versions; one-click rollback.
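The stage transitions and the rollback policy can be sketched as a toy in-memory registry. This is an assumption-laden illustration, not the API of any real registry product: `ModelRegistry`, `keep_last_n`, and the method names are hypothetical.

```python
class ModelRegistry:
    """Toy in-memory registry: Staging -> Production -> Archived."""

    def __init__(self, keep_last_n=2):
        self.keep_last_n = keep_last_n  # N previous production versions kept as rollback targets
        self.stage = {}                 # version -> stage name
        self.prod_history = []          # versions that reached Production, oldest first

    def register(self, version):
        self.stage[version] = "Staging"

    def promote(self, version, gates_passed):
        if self.stage.get(version) != "Staging":
            raise ValueError(f"{version} is not in Staging")
        if not gates_passed:
            raise PermissionError(f"{version} has not passed its gates")
        self._set_production(version)

    def rollback(self):
        """Restore the most recent previous production version and return it."""
        if len(self.prod_history) < 2:
            raise RuntimeError("no earlier production version to roll back to")
        self.prod_history.pop()             # drop the current (faulty) version
        previous = self.prod_history.pop()  # re-added by _set_production below
        self._set_production(previous)
        return previous

    def _set_production(self, version):
        for v, s in self.stage.items():
            if s == "Production":
                self.stage[v] = "Archived"
        self.stage[version] = "Production"
        self.prod_history.append(version)
        # Keep the current version plus the last N previous ones.
        self.prod_history = self.prod_history[-(self.keep_last_n + 1):]


registry = ModelRegistry(keep_last_n=1)
for v in ("v1", "v2"):
    registry.register(v)
    registry.promote(v, gates_passed=True)
restored = registry.rollback()
```

Refusing promotion when `gates_passed` is false is what turns the gates from a report into an enforced policy.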
8) CI/CD/CT: how to connect them
CI (code/tests): unit, integration, and contract tests; linters; security scans.
CD (serving): Docker/K8s/Helm, feature flags, canary/shadow/blue-green.
CT (data/training): an orchestrator driven by schedule and events; artifacts → registry.
Promotion Gates: automatic release when online SLOs are green (canary ≥ X hours).
9) Multi-tenancy and residency
Tenants/regions: isolated pipelines and encryption keys (EEA/UK/BR); cross-region joins are prohibited unless there is a legal basis.
Secrets: KMS/CMK, Secret Manager; tokenized IDs in logs.
DSAR/RTBF policy: computable projections and selective redaction in features and logs; Legal Hold for cases that require it.
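Tokenized IDs in logs can be produced with a keyed hash, so that the same raw ID maps to a stable token within a region but cannot be correlated across regions. A minimal sketch using the standard `hmac` module; `tokenize_id`, the token format, and the demo keys are hypothetical, and real keys would come from KMS or a secret manager rather than source code:

```python
import hashlib
import hmac


def tokenize_id(raw_id: str, tenant_key: bytes) -> str:
    """Deterministic keyed token for logs; the raw ID never appears in the output."""
    digest = hmac.new(tenant_key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:24]


# Hypothetical per-region keys, hard-coded only for the example.
keys = {"EEA": b"eea-demo-key", "UK": b"uk-demo-key"}
eea_token = tokenize_id("player-42", keys["EEA"])
uk_token = tokenize_id("player-42", keys["UK"])
```

Using a separate key per region means tokens from EEA and UK logs do not match for the same user, which supports the isolation requirement above.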
10) Monitoring → feedback → retrain
SLI/SLO: latency p95/p99, 5xx rate, coverage, cost/request; PSI/KL, ECE, and expected-cost drift.
Online labels: proxy labels (hour/day) and delayed labels (D+7/D+30/D+90).
Automatic actions: recalibration/threshold update → shadow retrain → canary → promotion.
Runbooks: degradation scenarios (drift, calibration, feature cache, providers).
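PSI drift monitoring can be sketched as follows: bin a baseline sample, bin the current sample on the same edges, and sum (actual − expected) · ln(actual / expected) over the bins. The function below is a minimal sketch under stated assumptions: equal-width bins taken from the baseline range, and a small `eps` in place of empty bins. The 0.25 alert level mentioned after the code is a common rule of thumb, not a value from this document.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample and a current sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when the baseline is constant

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[max(0, min(idx, n_bins - 1))] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical samples: a uniform baseline and the same data shifted by 0.3.
baseline = [i / 100 for i in range(100)]
shifted = [v + 0.3 for v in baseline]
drift_score = psi(baseline, shifted)
```

A score above roughly 0.25 is often treated as a significant shift and could serve as the trigger for the event-driven retrain described earlier.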
11) Security, RG/AML, and decision policies
Guardrails: pre/post-filters, frequency caps, cooldowns, deny lists.
Policy Shielding: model → decision → policy filter → action.
Audit: `model_id/version`, `feature_version`, `threshold`, `policy_id`, reasons.
WORM archive: releases, quality reports, test and promotion logs.
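The model → decision → policy filter → action chain can be sketched as a small post-filter that applies a deny list, a cooldown, and a daily frequency cap before any action is taken. `PolicyShield`, its default limits, and the reason codes are hypothetical and exist only to illustrate the flow:

```python
from dataclasses import dataclass, field


@dataclass
class PolicyShield:
    """Post-filter between a model decision and the action actually taken."""
    deny_list: set = field(default_factory=set)
    max_actions_per_day: int = 3                 # frequency cap
    cooldown_s: int = 3600                       # minimum gap between actions per user
    history: dict = field(default_factory=dict)  # user_id -> list of action timestamps

    def decide(self, user_id, score, threshold, now_s):
        """Return (action, reason); the reason is what would go to the audit log."""
        if score < threshold:
            return "no_action", "below_threshold"
        if user_id in self.deny_list:
            return "blocked", "deny_list"
        past = self.history.setdefault(user_id, [])
        if past and now_s - past[-1] < self.cooldown_s:
            return "blocked", "cooldown"
        if sum(1 for t in past if now_s - t < 86400) >= self.max_actions_per_day:
            return "blocked", "frequency_cap"
        # Only actions that were actually taken count towards the limits.
        past.append(now_s)
        return "act", "ok"


shield = PolicyShield(deny_list={"u9"}, max_actions_per_day=2, cooldown_s=100)
first = shield.decide("u1", score=0.9, threshold=0.5, now_s=0)
second = shield.decide("u1", score=0.9, threshold=0.5, now_s=50)
```

Returning a reason with every decision gives the audit record the "reasons" field listed above without extra bookkeeping.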
12) Cost and performance
Path profiling: features (30-60%), inference (20-40%), IO/network.
Cost dashboards: cost/request, cost/feature, GPU/CPU hours, small files.
Optimization: materialize heavy features offline, cache hot windows, use INT8/FP16, set quotas on replay/backfill.
Chargeback: allocate the budget across teams and markets, and keep "expensive" features under control.
13) Examples (fragments)
Argo Workflow:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata: {name: ct-daily}
spec:
  entrypoint: pipeline
  templates:
    - name: pipeline
      dag:
        tasks:
          - name: gold
            template: task
            arguments: {parameters: [{name: cmd, value: "dbt run --models gold"}]}
          - name: features
            dependencies: [gold]
            template: task
            arguments: {parameters: [{name: cmd, value: "features publish"}]}
          - name: train
            dependencies: [features]
            template: task
            arguments: {parameters: [{name: cmd, value: "trainer run --config conf.yaml"}]}
          - name: eval
            dependencies: [train]
            template: task
            arguments: {parameters: [{name: cmd, value: "eval run --gate conf/gates.yaml"}]}
    - name: task
      inputs: {parameters: [{name: cmd}]}
      container: {image: "ml/ct:latest", command: ["/bin/bash","-lc"], args: ["{{inputs.parameters.cmd}}"]}
```
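Gate script (pseudocode):
```python
import sys

# Pseudocode: the metric values and `gate` (the parsed gates config) are assumed to be loaded earlier.
ok = (
    pr_auc >= gate.pr_auc_min
    and ece <= gate.ece_max
    and expected_cost_delta <= gate.expected_cost_delta_max
    and slice_drop_pct <= gate.slice_drop_max_pct
    and features_equivalence_p95_abs_diff <= gate.features_equivalence_p95_abs_diff_max
)
sys.exit(0 if ok else 1)  # a non-zero exit code fails the pipeline step
```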
Promotion policy (idea):
```yaml
promotion:
  require:
    - offline_gates_passed
    - canary_online_hours >= 24
    - slo_green: [latency_p95, error_rate, coverage]
    - drift_warn_rate <= 5%
```
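The promotion policy above can be evaluated with a small check that returns both the verdict and the list of unmet requirements. This is a sketch under assumptions: the `status` dict layout and the `can_promote` name are hypothetical, and the defaults mirror the policy values (24 canary hours, 5% drift warnings).

```python
def can_promote(status, min_canary_hours=24, max_drift_warn_rate=0.05,
                required_slos=("latency_p95", "error_rate", "coverage")):
    """Return (ok, failed_requirements) for a promotion candidate."""
    failed = []
    if not status.get("offline_gates_passed", False):
        failed.append("offline_gates_passed")
    if status.get("canary_online_hours", 0) < min_canary_hours:
        failed.append("canary_online_hours")
    # Every required SLO must be explicitly reported green; a missing one counts as red.
    slo = status.get("slo_green", {})
    failed += [f"slo_green:{name}" for name in required_slos if not slo.get(name, False)]
    # A missing drift rate defaults to 1.0 so that it fails the check.
    if status.get("drift_warn_rate", 1.0) > max_drift_warn_rate:
        failed.append("drift_warn_rate")
    return (not failed, failed)


# Hypothetical candidate status collected from the canary run.
candidate = {
    "offline_gates_passed": True,
    "canary_online_hours": 30,
    "slo_green": {"latency_p95": True, "error_rate": True, "coverage": True},
    "drift_warn_rate": 0.02,
}
ok, failed = can_promote(candidate)
```

Treating missing fields as failures keeps the check conservative: a candidate is promoted only when every requirement is positively confirmed.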
14) Processes and RACI
- R (Responsible):
  - Data Eng: Ingest/Silver/Gold, Feature Store, CDC/Backfill;
  - Data Science: sampling/training/calibration/gates;
  - MLOps: orchestration/registry/serving/observability.
- A (Accountable): Head of Data / CDO.
- C (Consulted): Compliance/DPO (PII/RG/AML/DSAR), Security (KMS/audit), SRE (SLO/cost), Finance (budgets/ROI), Product.
- I (Informed): Marketing/Operations/Support.
15) Implementation roadmap
MVP (3-6 weeks):
1. "Daily CT" DAG: Bronze→Silver→Gold→Feature Store→Train→Eval→Registry(Staging).
2. Feature Store v1 and an online/offline equivalence test.
3. Quality gates (PR-AUC/ECE/expected-cost/slice).
4. Model registry, model card, and a WORM archive of releases.
Phase 2 (6-12 weeks):
- Auto-recalibration/threshold update; canary promotion based on online SLOs.
- Event-driven retrain on drift; dual-write v1/v2 for migrations.
- Cost dashboards and backfill/replay quotas; multi-tenant isolation.
- Fairness policies by slice and automated reporting.
- Multi-region residency with separate keys (EEA/UK/BR).
- Auto-retrain on schedule and on events; auto-generated pipeline documentation.
16) Production readiness checklist
- Pipeline-as-Code in Git; CI tests in place (unit/integration/contract/security).
- Bronze/Silver/Gold and the Feature Store are stable; feature equivalence is green.
- Offline gates passed; model card filled in; WORM archive created.
- Canary ≥ 24 hours with green SLOs; rollback button and kill switch work.
- Monitoring of drift/ECE/expected-cost and of online labels is enabled.
- PII/residency/DSAR/RTBF/Legal Hold are covered; audit is configured.
- Cost is within budget; caches, quotas, and limits on features and replays are active.
17) Anti-patterns and risks
Manual, "one-off" steps outside the orchestrator; no Git history.
Training without gates and model cards; "manual" promotion.
Inconsistent online/offline features → discrepancies in production.
Ignoring drift, calibration, and expected-cost; tracking ROC-AUC only "for show".
No residency or PII policy; logging "raw" IDs.
Unbounded backfill/replay → cost blow-ups and SLA impact.
18) Summary
AI pipelines are a value conveyor, not a pile of notebooks. Formalize the data layers, the Feature Store, and CT/CI/CD; add quality and security gates; automate the response to drift; maintain online/offline equivalence and transparent economics. The result is a fast, predictable, and compliant "data → model → effect" loop.