Personalization models
Personalization is a system where data → models → display policy → action → feedback. The goal is to maximize incremental value (income/retention/satisfaction) while meeting constraints (ethics/RG, frequency caps, diversity, freshness, SLA).
1) Data and views
Raw materials: events (views/clicks/games/purchases/deposits), content catalog (attributes), user profiles, context (time/geo/device/channel), quality signals (bot/fraud).
Features:
- User: RFM, category preferences, price sensitivity, time of day, device.
- Item: genre/category, studio/provider, language, price/volatility, "freshness."
- Context: dow/hod, promo/events, session, login channel.
- Embeddings: user/item collaborative spaces (MF/item2vec/transformers), multimodal (text/images).
- Quality: point-in-time correctness (no feature leakage), UTC timestamps, event idempotence, PII masking.
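The point-in-time requirement can be illustrated with a minimal sketch: given timestamped feature snapshots, a training-time lookup must return only the snapshot valid strictly before the event. The snapshot layout and the name `pit_lookup` are illustrative, not a real feature-store API.

```python
from bisect import bisect_right

# Hypothetical snapshot store: user -> list of (as_of_ts, features),
# sorted by timestamp ascending.
snapshots = {
    "u1": [(100, {"rfm": 0.2}), (200, {"rfm": 0.5}), (300, {"rfm": 0.9})],
}

def pit_lookup(user, event_ts):
    """Return the feature snapshot valid strictly before event_ts (or None).

    Using anything at or after event_ts would leak future information
    into training features.
    """
    rows = snapshots.get(user, [])
    ts_list = [ts for ts, _ in rows]
    i = bisect_right(ts_list, event_ts - 1)  # snapshots strictly before event_ts
    return rows[i - 1][1] if i > 0 else None

print(pit_lookup("u1", 250))  # snapshot taken at ts=200 -> {'rfm': 0.5}
print(pit_lookup("u1", 50))   # no snapshot yet -> None
```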
2) Basic paradigms
1. Content-based - similarity between item attributes and the user profile.
2. Collaborative filtering (CF) - similar users/items based on interaction signals.
3. Matrix factorization/embeddings - hidden factors, dot-product/MLP for score.
4. Learning-to-Rank (LTR) - gradient boosting/neural networks for ranking lists (pairwise/listwise).
5. Re-ranking layer - post-processing that accounts for diversification/novelty/constraints.
6. Contextual bandits - online learning with exploration-exploitation.
7. RL/seq-recommendations - path/session optimization (multi-step reward).
3) Decision pipeline
1. Recall (fast candidate selection, 200-5k): ANN over embeddings, rule-based/category, popularity.
2. Rank (exact scoring, 20-200): LTR/MLP with rich features.
3. Re-rank/Policy (final list, 5-30): multi-objective optimization + constraints and diversification.
4. Action: show/push/email/personalized showcase with frequency caps and "quiet hours."
5. Feedback: implicit/explicit signals → retraining/bandit-update.
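As a toy illustration of the recall stage, brute-force cosine top-k over item embeddings stands in for a production ANN index (FAISS/ScaNN); the random embeddings, dimensions, and function names are assumptions.

```python
import numpy as np

# Illustrative item-embedding matrix (1000 items x 32 dims), L2-normalized
# so that a dot product equals cosine similarity.
rng = np.random.default_rng(0)
item_emb = rng.normal(size=(1000, 32)).astype("float32")
item_emb /= np.linalg.norm(item_emb, axis=1, keepdims=True)

def recall_candidates(user_vec, topk=500):
    """Return indices of the topk items by cosine similarity to user_vec."""
    u = user_vec / np.linalg.norm(user_vec)
    scores = item_emb @ u
    return np.argsort(-scores)[:topk]

user_vec = rng.normal(size=32).astype("float32")
cands = recall_candidates(user_vec, topk=500)
```

In production the `argsort` scan is replaced by an approximate index; the interface (query vector in, candidate ids out) stays the same.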
4) Multi-objective goals and constraints
Objectives: CTR/CTCVR, retention, revenue, margin, LTV, satisfaction, speed.
Constraints: contact frequency, RG/compliance, category diversity, brand/provider quotas, fairness.
[
\max \sum_i w_i \cdot \text{Objective}_i \quad
\text{s.t. } \text{caps, RG, diversity, SLA}
]
Practice: use policy-aware re-ranking (see § 7), where model scores are combined with rules.
5) Cold start and small data
New users: popularity by segment/channel/geo, content-based from a questionnaire/first clicks, bandit with broad exploration.
New items: content embeddings (text/tags), metadata, "look-alike" by provider/genre.
Few-shot: embedding transfer/shared tower.
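A minimal sketch of the "popularity by segment" cold-start fallback, assuming hypothetical click/impression counts. Beta smoothing toward a low prior CTR keeps sparsely shown items from dominating on noisy raw rates; the prior strength is a tunable.

```python
# Prior roughly matching a ~1% baseline CTR (prior mean = ALPHA / (ALPHA + BETA)).
ALPHA, BETA = 1.0, 100.0

# Hypothetical counts: segment -> item -> (clicks, impressions).
seg_stats = {
    "new_de_mobile": {
        "slot_a": (30, 1000),  # raw CTR 0.030
        "slot_b": (1, 5),      # raw CTR 0.200, but only 5 impressions
        "slot_c": (90, 2000),  # raw CTR 0.045
    },
}

def popular_by_segment(segment, k=2):
    """Rank items in a segment by smoothed CTR and return the top k."""
    stats = seg_stats.get(segment, {})
    scored = {
        item: (c + ALPHA) / (n + ALPHA + BETA)  # Beta-smoothed CTR
        for item, (c, n) in stats.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]

print(popular_by_segment("new_de_mobile"))  # slot_b is demoted despite raw CTR 0.2
```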
6) Scoring metrics
Offline
Classification/ranking: AUC/PR-AUC, NDCG @ k, MAP, Recall @ k.
Business: eCPM/eRPM, expected revenue/margin, LTV proxy.
Multi-objective: weighted metrics (e.g. NDCG with gain = value).
Calibration: Brier, ECE (for probabilities).
Lists: coverage/diversity/novelty/serendipity.
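Since NDCG @ k is the primary offline ranking metric here, a self-contained sketch of its computation (the graded relevance labels are toy values):

```python
import math

def dcg(rels):
    """Discounted cumulative gain over a list of relevance grades."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg_at_k(ranked_rels, k):
    """DCG of the model's ordering divided by DCG of the ideal ordering."""
    ideal = sorted(ranked_rels, reverse=True)
    denom = dcg(ideal[:k])
    return dcg(ranked_rels[:k]) / denom if denom > 0 else 0.0

print(ndcg_at_k([3, 2, 1, 0], k=4))  # perfect ordering -> 1.0
print(ndcg_at_k([1, 3, 2, 0], k=4))  # misordered -> below 1.0
```

To use "gain = value" as suggested above, replace the relevance grades with per-item business value.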
Online
A/B and bandit tests: CTR, CTCVR, income/session, D1/D7 retention, complaints/unsubscribes (guardrails), latency/SLA.
Incrementality: lift %, CUPED/quasi-experiments when randomization is complex.
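CUPED can be sketched in a few lines: adjust the in-experiment metric with a pre-experiment covariate to shrink variance without shifting the mean. The synthetic data below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(10, 3, size=5000)                  # pre-period metric (covariate)
y = 2.0 + 0.8 * x + rng.normal(0, 1, size=5000)   # in-experiment metric

# theta = cov(X, Y) / var(X); subtracting theta * (X - mean(X)) removes the
# variance explained by the pre-period behavior.
theta = np.cov(x, y)[0, 1] / np.var(x)
y_cuped = y - theta * (x - x.mean())

print(np.var(y_cuped) < np.var(y))  # True: same mean, tighter intervals
```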
7) Diversification and policy-aware re-ranking
MMR/PM-2/xQuAD: balance of "relevance × novelty."
Quotas: min/max by genre/provider/risk category.
Fairness: Limit shares to avoid systematic skewing.
[
\textstyle \text{Score} = \alpha \cdot \hat{p}_{\text{click}} + \beta \cdot \text{Value} - \gamma \cdot \text{Fatigue} + \delta \cdot \text{Novelty}
]
Hysteresis: avoid "flickering" lists; update positions with inertia.
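A minimal MMR-style re-rank sketch: greedily pick the item maximizing relevance minus similarity to what is already selected. The relevance scores and the pairwise similarity are hand-set toy values; `lam=1.0` degenerates to pure relevance ranking.

```python
def mmr(relevance, sim, lam=0.7, k=3):
    """Greedy Maximal Marginal Relevance over a candidate dict."""
    selected, remaining = [], set(relevance)
    while remaining and len(selected) < k:
        def score(i):
            # Similarity to the closest already-selected item (0 if none yet).
            max_sim = max((sim.get((i, j), sim.get((j, i), 0.0))
                           for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * max_sim
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

relevance = {"a": 0.9, "b": 0.85, "c": 0.5}
sim = {("a", "b"): 0.95}  # a and b are near-duplicates
print(mmr(relevance, sim, lam=0.7))  # the less relevant but novel c jumps ahead of b
```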
8) Contextual Bandits and RLs
Bandits (LinUCB, Thompson): fast online learning, controlled exploration. Good for first position/creative/channel.
Cascading bandits: top-k optimization.
RL (DQN/Policy Gradient/SlateQ): session personalization, multi-step reward optimization (return/revenue/long session).
Safety: off-policy evaluation (IPS/DR), simulators, exploration caps, safe RL.
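The off-policy evaluation mentioned above (IPS) can be sketched as follows: estimate a new policy's value from logs of the old policy by reweighting rewards with inverse propensities. The log schema and the target policy are hypothetical; weights are clipped as a basic variance-control measure.

```python
import numpy as np

# Hypothetical bandit logs: action taken, observed reward, and the logging
# policy's probability of that action (the propensity).
logs = [
    {"context": 0, "action": "a", "reward": 1.0, "propensity": 0.5},
    {"context": 1, "action": "b", "reward": 0.0, "propensity": 0.25},
    {"context": 2, "action": "a", "reward": 1.0, "propensity": 0.5},
]

def new_policy_prob(context, action):
    """Hypothetical target policy: deterministically plays 'a'."""
    return 1.0 if action == "a" else 0.0

def ips_value(logs, max_weight=10.0):
    """Clipped inverse-propensity estimate of the target policy's value."""
    terms = []
    for row in logs:
        w = new_policy_prob(row["context"], row["action"]) / row["propensity"]
        terms.append(min(w, max_weight) * row["reward"])
    return float(np.mean(terms))

print(ips_value(logs))
```

Doubly robust (DR) estimators add a reward model on top of this to reduce variance further.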
9) Personalization for causal effect
Uplift models: who should be touched (persuadables), Qini/AUUC, uplift @ k.
Treatment-aware ranking: rank by incremental effect instead of raw CTR.
Guardrails: Do-Not-Disturb segments, RG rules, fairness.
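A toy two-model (T-learner-style) illustration of treatment-aware ranking: segments are ordered by estimated treated-minus-control response rather than raw response. All counts are invented.

```python
# Hypothetical counts: segment -> (conversions, n) under treatment and control.
treated = {"s1": (60, 100), "s2": (50, 100)}
control = {"s1": (20, 100), "s2": (45, 100)}

def uplift(segment):
    """Estimated incremental effect: treated rate minus control rate."""
    ct, nt = treated[segment]
    cc, nc = control[segment]
    return ct / nt - cc / nc

ranked = sorted(treated, key=uplift, reverse=True)
print(ranked)  # s1 first: similar treated rates, very different increments
```

s2 converts at 0.50 when treated but 0.45 anyway; touching it buys almost nothing, which raw-CTR ranking would miss.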
10) Architecture and MLOps
Feature Store: online/offline parity, point-in-time, TTL for session features.
Candidate services: ANN/FAISS/ScaNN, caching/sharding by segment.
Ranker: gradient boosting/MLP/tower architectures, calibration.
Policy/Re-rank: rules/restrictions, diversification, bandit layer.
Orchestration: request idempotency, p95 latency ≤ 100-300 ms, DLQ/retries.
Observability: correlation_id tracing, PSI, quality metrics, kill switch.
11) Security, privacy, ethics
PII minimization: tokenization, RLS/CLS, masking.
Explainability: top-features/reasons for showing; path of appeal.
Ethics/RG: frequency caps, "quiet hours," prohibitions on aggressive offers from vulnerable groups.
Compliance: audit of decisions/logs, versions of policies and creatives.
12) Passports and decision tables
Recommender passport (example)
ID/version: `REC_HYBRID_RANK_v5`
Recall: ANN (user/item embeddings), top-500
Ranker: LTR-GBM + MLP (features: user RFM, item meta, context)
Re-rank: PM-2 (diversity), brand quotas, RG filters, frequency caps
Goals/Metrics: NDCG @ 10, eRPM, complaints ≤ X, latency p95 ≤ 150 ms
A/B: 14 days, CUPED; guardrails - RG/deliverability
Owners/Logging/Runbook
Decision table
13) Pseudo Code (sketch)
A. Hybrid recall + rank + re-rank
```python
# Recall: fast candidate generation (embeddings + rules), then dedup
cands_emb = ann.recall(user_embed, topk=500)
cands_rule = rule_based.popular_by_segment(user, k=200)
cands = dedup(cands_emb + cands_rule)

# Rank: exact scoring with rich user/item/context features
features = featurize(user, cands, context)
scores = ranker.predict(features)  # CTR/value scores

# Re-rank (policy-aware): constraints + multi-objective weights
final = rerank(
    cands, scores,
    constraints=dict(diversity_min={'category': 3},
                     brand_quota={'A': 0.3, 'B': 0.3},
                     rg_filter=True,
                     freq_caps=per_user_caps(user)),
    objective_weights=dict(ctr=0.6, value=0.3, novelty=0.1),
)
return final[:N]
```
B. Thompson Sampling for Creatives

```python
# Beta priors per creative: alpha[c] = clicks + 1, beta[c] = skips + 1
p_hat = {c: np.random.beta(alpha[c], beta[c]) for c in creatives}
chosen = max(p_hat, key=p_hat.get)  # sample once per creative, pick the best draw
show(chosen)
update(alpha, beta, chosen, reward=click)  # e.g. alpha[chosen] += click
```
14) Diagnostics and monitoring
Quality: NDCG/Recall @ k, eRPM, coverage/diversity, calibration.
Online: CTR/CTCVR, income/session, retention, complaints/unsubscribes, latency/timeout.
Drift: PSI/KL on key features, drop in offline↔online correlation.
Restrictions: fulfillment of quotas/diversity, impacts to RG filters, frequency caps.
Runbooks: recall degradation (ANN outage), rise in complaints, timeout spikes, emergency fallback (popular-safe).
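A compact PSI sketch for the drift checks above, binning by baseline quantiles. The 0.2 alert threshold is a common rule of thumb, not a universal standard; the synthetic distributions are illustrative.

```python
import numpy as np

def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index between a baseline and a current sample.

    Bins are the baseline's quantiles; eps avoids log(0) for empty bins.
    """
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e_cnt = np.histogram(np.clip(expected, edges[0], edges[-1]), bins=edges)[0]
    a_cnt = np.histogram(np.clip(actual, edges[0], edges[-1]), bins=edges)[0]
    e_frac = e_cnt / e_cnt.sum() + eps
    a_frac = a_cnt / a_cnt.sum() + eps
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(7)
base = rng.normal(0, 1, 10000)
drifted = rng.normal(0.5, 1, 10000)  # mean shifted by half a std

print(psi(base, base[:5000]))  # near 0: same distribution
print(psi(base, drifted))      # well above 0.1: investigate
```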
15) Frequent errors
Optimizing raw CTR instead of increment/value.
No re-ranking layer → poor variety, "tunnel vision."
Features from the future (leakage), timezone mixing, inconsistent signal definitions.
No calibration or thresholds → budgets and frequency caps degrade.
Ignoring RG/ethics and fairness → complaints, risk, regulatory issues.
Online/offline feature skew → failures in production.
16) Pre-Release Personalization Checklist
- Model passport (goals, limitations, metrics, owners, versions)
- Recall/Rank/Re-rank deployed; ANN and caches warmed
- PIT features and calibration, offline benchmarks (NDCG/PR-AUC) passed
- A/B design and guardrails; decision-ready report
- RG/frequency/diversity/quota constraints implemented and monitored
- Observability, alerts, kill switch, fallbacks (popular-safe)
- Documentation and runbooks, incremental improvement plan
Summary
Personalization models are effective only as a policy-aware system: rich data and embeddings → a Recall/Rank/Re-rank hybrid → bandits/RL for online adaptation → multi-objective goals under strict constraints and ethics → disciplined MLOps and monitoring. Such a loop delivers not just "recommendations" but manageable decisions that raise ROMI, LTV, and satisfaction - safely, transparently, and reproducibly.