Personalization models
Personalization is a system where data → models → display policy → action → feedback. The goal is to maximize incremental value (income/retention/satisfaction) while meeting constraints (ethics/RG, frequency caps, diversity, freshness, SLA).
1) Data and views
Raw materials: events (views/clicks/games/purchases/deposits), content catalog (attributes), user profiles, context (time/geo/device/channel), quality signals (bot/fraud).
Features:
- User: RFM, category preferences, price sensitivity, time of day, device.
- Item: genre/category, studio/provider, language, price/volatility, "freshness."
- Context: dow/hod, promo/events, session, login channel.
- Embeddings: user/item collaborative spaces (MF/item2vec/transformers), multimodal (text/images).
- Quality: point-in-time correctness (no feature leakage), UTC timestamps, event idempotence, PII masking.
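The point-in-time requirement can be illustrated with a minimal sketch: given timestamped feature snapshots, a training-time lookup must return only the snapshot valid strictly before the event. The snapshot layout and the name `pit_lookup` are illustrative, not a real feature-store API.

```python
from bisect import bisect_right

# Hypothetical snapshot store: user -> list of (as_of_ts, features),
# sorted by timestamp ascending.
snapshots = {
    "u1": [(100, {"rfm": 0.2}), (200, {"rfm": 0.5}), (300, {"rfm": 0.9})],
}

def pit_lookup(user, event_ts):
    """Return the feature snapshot valid strictly before event_ts (or None).

    Using anything at or after event_ts would leak future information
    into training features.
    """
    rows = snapshots.get(user, [])
    ts_list = [ts for ts, _ in rows]
    i = bisect_right(ts_list, event_ts - 1)  # snapshots strictly before event_ts
    return rows[i - 1][1] if i > 0 else None

print(pit_lookup("u1", 250))  # snapshot taken at ts=200 -> {'rfm': 0.5}
print(pit_lookup("u1", 50))   # no snapshot yet -> None
```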
2) Basic paradigms
1. Content-based - similarity between item attributes and the user profile.
2. Collaborative filtering (CF) - similar users/items based on interaction signals.
3. Matrix factorization/embeddings - hidden factors, dot-product/MLP for score.
4. Learning-to-Rank (LTR) - gradient boosting/neural networks for ranking lists (pairwise/listwise).
5. Re-ranking layer - post-processing that accounts for diversification/novelty/constraints.
6. Contextual bandits - online learning with exploration-exploitation.
7. RL/seq-recommendations - path/session optimization (multi-step reward).
3) Decision pipeline
1. Recall (fast candidate selection, 200-5k): ANN over embeddings, rule-based/category, popularity.
2. Rank (exact scoring, 20-200): LTR/MLP with rich features.
3. Re-rank/Policy (final list, 5-30): multi-objective optimization + constraints and diversification.
4. Action: show/push/email/personalized showcase with frequency caps and "quiet hours."
5. Feedback: implicit/explicit signals → retraining/bandit-update.
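As a toy illustration of the recall stage, brute-force cosine top-k over item embeddings stands in for a production ANN index (FAISS/ScaNN); the random embeddings, dimensions, and function names are assumptions.

```python
import numpy as np

# Illustrative item-embedding matrix (1000 items x 32 dims), L2-normalized
# so that a dot product equals cosine similarity.
rng = np.random.default_rng(0)
item_emb = rng.normal(size=(1000, 32)).astype("float32")
item_emb /= np.linalg.norm(item_emb, axis=1, keepdims=True)

def recall_candidates(user_vec, topk=500):
    """Return indices of the topk items by cosine similarity to user_vec."""
    u = user_vec / np.linalg.norm(user_vec)
    scores = item_emb @ u
    return np.argsort(-scores)[:topk]

user_vec = rng.normal(size=32).astype("float32")
cands = recall_candidates(user_vec, topk=500)
```

In production the `argsort` scan is replaced by an approximate index; the interface (query vector in, candidate ids out) stays the same.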
4) Multi-objective goals and constraints
Objectives: CTR/CTCVR, retention, revenue, margin, LTV, satisfaction, speed.
Constraints: contact frequency, RG/compliance, category diversity, brand/provider quotas, fairness.
[
\max \sum_i w_i \cdot \text{Objective}_i \quad
\text{s.t. } \text{caps, RG, diversity, SLA}
]
Practice: use policy-aware re-ranking (see § 7), where model scores are combined with rules.
5) Cold start and small data
New users: popularity by segment/channel/geo, content-based from a questionnaire/first clicks, bandit with broad exploration.
New items: content embeddings (text/tags), metadata, "look-alike" by provider/genre.
Few-shot: embedding transfer/shared tower.
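A minimal sketch of the "popularity by segment" cold-start fallback, assuming hypothetical click/impression counts. Beta smoothing toward a low prior CTR keeps sparsely shown items from dominating on noisy raw rates; the prior strength is a tunable.

```python
# Prior roughly matching a ~1% baseline CTR (prior mean = ALPHA / (ALPHA + BETA)).
ALPHA, BETA = 1.0, 100.0

# Hypothetical counts: segment -> item -> (clicks, impressions).
seg_stats = {
    "new_de_mobile": {
        "slot_a": (30, 1000),  # raw CTR 0.030
        "slot_b": (1, 5),      # raw CTR 0.200, but only 5 impressions
        "slot_c": (90, 2000),  # raw CTR 0.045
    },
}

def popular_by_segment(segment, k=2):
    """Rank items in a segment by smoothed CTR and return the top k."""
    stats = seg_stats.get(segment, {})
    scored = {
        item: (c + ALPHA) / (n + ALPHA + BETA)  # Beta-smoothed CTR
        for item, (c, n) in stats.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]

print(popular_by_segment("new_de_mobile"))  # slot_b is demoted despite raw CTR 0.2
```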
6) Scoring metrics
Offline
Classification/ranking: AUC/PR-AUC, NDCG @ k, MAP, Recall @ k.
Business: eCPM/eRPM, expected revenue/margin, LTV proxy.
Multi-objective: weighted metrics (e.g. NDCG with gain = value).
Calibration: Brier, ECE (for probabilities).
Lists: coverage/diversity/novelty/serendipity.
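Since NDCG @ k is the primary offline ranking metric here, a self-contained sketch of its computation (the graded relevance labels are toy values):

```python
import math

def dcg(rels):
    """Discounted cumulative gain over a list of relevance grades."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg_at_k(ranked_rels, k):
    """DCG of the model's ordering divided by DCG of the ideal ordering."""
    ideal = sorted(ranked_rels, reverse=True)
    denom = dcg(ideal[:k])
    return dcg(ranked_rels[:k]) / denom if denom > 0 else 0.0

print(ndcg_at_k([3, 2, 1, 0], k=4))  # perfect ordering -> 1.0
print(ndcg_at_k([1, 3, 2, 0], k=4))  # misordered -> below 1.0
```

To use "gain = value" as suggested above, replace the relevance grades with per-item business value.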
Online
A/B and bandit tests: CTR, CTCVR, income/session, D1/D7 retention, complaints/unsubscribes (guardrails), latency/SLA.
Incrementality: lift %, CUPED/quasi-experiments when randomization is complex.
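CUPED can be sketched in a few lines: adjust the in-experiment metric with a pre-experiment covariate to shrink variance without shifting the mean. The synthetic data below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(10, 3, size=5000)                  # pre-period metric (covariate)
y = 2.0 + 0.8 * x + rng.normal(0, 1, size=5000)   # in-experiment metric

# theta = cov(X, Y) / var(X); subtracting theta * (X - mean(X)) removes the
# variance explained by the pre-period behavior.
theta = np.cov(x, y)[0, 1] / np.var(x)
y_cuped = y - theta * (x - x.mean())

print(np.var(y_cuped) < np.var(y))  # True: same mean, tighter intervals
```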
7) Diversification and policy-aware re-ranking
MMR/PM-2/xQuAD: balance of "relevance × novelty."
Quotas: min/max by genre/provider/risk category.
Fairness: Limit shares to avoid systematic skewing.
[
\textstyle \text{Score} = \alpha \cdot \hat{p}_{\text{click}} + \beta \cdot \text{Value} - \gamma \cdot \text{Fatigue} + \delta \cdot \text{Novelty}
]
Hysteresis: avoid "flickering" lists; update positions with inertia.
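A minimal MMR-style re-rank sketch: greedily pick the item maximizing relevance minus similarity to what is already selected. The relevance scores and the pairwise similarity are hand-set toy values; `lam=1.0` degenerates to pure relevance ranking.

```python
def mmr(relevance, sim, lam=0.7, k=3):
    """Greedy Maximal Marginal Relevance over a candidate dict."""
    selected, remaining = [], set(relevance)
    while remaining and len(selected) < k:
        def score(i):
            # Similarity to the closest already-selected item (0 if none yet).
            max_sim = max((sim.get((i, j), sim.get((j, i), 0.0))
                           for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * max_sim
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

relevance = {"a": 0.9, "b": 0.85, "c": 0.5}
sim = {("a", "b"): 0.95}  # a and b are near-duplicates
print(mmr(relevance, sim, lam=0.7))  # the less relevant but novel c jumps ahead of b
```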
8) Contextual Bandits and RLs
Bandits (LinUCB, Thompson): fast online learning, controlled exploration. Good for first position/creative/channel.
Cascading bandits: top-k optimization.
RL (DQN/Policy Gradient/SlateQ): session personalization, multi-step reward optimization (return/revenue/long session).
Safety: off-policy evaluation (IPS/DR), simulators, exploration caps, safe RL.
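The off-policy evaluation mentioned above (IPS) can be sketched as follows: estimate a new policy's value from logs of the old policy by reweighting rewards with inverse propensities. The log schema and the target policy are hypothetical; weights are clipped as a basic variance-control measure.

```python
import numpy as np

# Hypothetical bandit logs: action taken, observed reward, and the logging
# policy's probability of that action (the propensity).
logs = [
    {"context": 0, "action": "a", "reward": 1.0, "propensity": 0.5},
    {"context": 1, "action": "b", "reward": 0.0, "propensity": 0.25},
    {"context": 2, "action": "a", "reward": 1.0, "propensity": 0.5},
]

def new_policy_prob(context, action):
    """Hypothetical target policy: deterministically plays 'a'."""
    return 1.0 if action == "a" else 0.0

def ips_value(logs, max_weight=10.0):
    """Clipped inverse-propensity estimate of the target policy's value."""
    terms = []
    for row in logs:
        w = new_policy_prob(row["context"], row["action"]) / row["propensity"]
        terms.append(min(w, max_weight) * row["reward"])
    return float(np.mean(terms))

print(ips_value(logs))
```

Doubly robust (DR) estimators add a reward model on top of this to reduce variance further.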
9) Personalization for causal effect
Uplift models: who should be touched (persuadables), Qini/AUUC, uplift @ k.
Treatment-aware ranking: rank by incremental effect instead of raw CTR.
Guardrails: Do-Not-Disturb segments, RG rules, fairness.
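A toy two-model (T-learner-style) illustration of treatment-aware ranking: segments are ordered by estimated treated-minus-control response rather than raw response. All counts are invented.

```python
# Hypothetical counts: segment -> (conversions, n) under treatment and control.
treated = {"s1": (60, 100), "s2": (50, 100)}
control = {"s1": (20, 100), "s2": (45, 100)}

def uplift(segment):
    """Estimated incremental effect: treated rate minus control rate."""
    ct, nt = treated[segment]
    cc, nc = control[segment]
    return ct / nt - cc / nc

ranked = sorted(treated, key=uplift, reverse=True)
print(ranked)  # s1 first: similar treated rates, very different increments
```

s2 converts at 0.50 when treated but 0.45 anyway; touching it buys almost nothing, which raw-CTR ranking would miss.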
10) Architecture and MLOps
Feature Store: online/offline parity, point-in-time, TTL for session features.
Candidate services: ANN/FAISS/ScaNN, caching/sharding by segment.
Ranker: gradient boosting/MLP/tower architectures, calibration.
Policy/Re-rank: rules/restrictions, diversification, bandit layer.
Orchestration: request idempotency, p95 latency ≤ 100-300 ms, DLQ/retries.
Observability: correlation_id tracing, PSI, quality metrics, kill switch.
11) Security, privacy, ethics
PII minimization: tokenization, RLS/CLS, masking.
Explainability: top-features/reasons for showing; path of appeal.
Ethics/RG: frequency caps, "quiet hours," prohibitions on aggressive offers from vulnerable groups.
Compliance: audit of decisions/logs, versions of policies and creatives.
12) Passports and decision tables
Recommender passport (example)
ID/version: `REC_HYBRID_RANK_v5`
Recall: ANN (user/item embeddings), top-500
Ranker: LTR-GBM + MLP (features: user RFM, item meta, context)
Re-rank: PM-2 (diversity), brand quotas, RG filters, frequency caps
Goals/Metrics: NDCG @ 10, eRPM, complaints ≤ X, latency p95 ≤ 150 ms
A/B: 14 days, CUPED; guardrails - RG/deliverability
Owners/Logging/Runbook
Decision table
13) Pseudo Code (sketch)
A. Hybrid recall + rank + re-rank
```python
# Recall: fast candidate generation (embeddings + rules), then dedup
cands_emb = ann.recall(user_embed, topk=500)
cands_rule = rule_based.popular_by_segment(user, k=200)
cands = dedup(cands_emb + cands_rule)

# Rank: exact scoring with rich user/item/context features
features = featurize(user, cands, context)
scores = ranker.predict(features)  # CTR/value scores

# Re-rank (policy-aware): constraints + multi-objective weights
final = rerank(
    cands, scores,
    constraints=dict(diversity_min={'category': 3},
                     brand_quota={'A': 0.3, 'B': 0.3},
                     rg_filter=True,
                     freq_caps=per_user_caps(user)),
    objective_weights=dict(ctr=0.6, value=0.3, novelty=0.1),
)
return final[:N]
```
B. Thompson Sampling for Creatives

```python
# Beta priors per creative: alpha[c] = clicks + 1, beta[c] = skips + 1
p_hat = {c: np.random.beta(alpha[c], beta[c]) for c in creatives}
chosen = max(p_hat, key=p_hat.get)  # sample once per creative, pick the best draw
show(chosen)
update(alpha, beta, chosen, reward=click)  # e.g. alpha[chosen] += click
```
14) Diagnostics and monitoring
Quality: NDCG/Recall @ k, eRPM, coverage/diversity, calibration.
Online: CTR/CTCVR, income/session, retention, complaints/unsubscribes, latency/timeout.
Drift: PSI/KL on key features, drop in offline↔online correlation.
Restrictions: fulfillment of quotas/diversity, impacts to RG filters, frequency caps.
Runbooks: recall degradation (ANN outage), rise in complaints, timeout spikes, emergency fallback (popular-safe).
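A compact PSI sketch for the drift checks above, binning by baseline quantiles. The 0.2 alert threshold is a common rule of thumb, not a universal standard; the synthetic distributions are illustrative.

```python
import numpy as np

def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index between a baseline and a current sample.

    Bins are the baseline's quantiles; eps avoids log(0) for empty bins.
    """
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e_cnt = np.histogram(np.clip(expected, edges[0], edges[-1]), bins=edges)[0]
    a_cnt = np.histogram(np.clip(actual, edges[0], edges[-1]), bins=edges)[0]
    e_frac = e_cnt / e_cnt.sum() + eps
    a_frac = a_cnt / a_cnt.sum() + eps
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(7)
base = rng.normal(0, 1, 10000)
drifted = rng.normal(0.5, 1, 10000)  # mean shifted by half a std

print(psi(base, base[:5000]))  # near 0: same distribution
print(psi(base, drifted))      # well above 0.1: investigate
```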
15) Frequent errors
Optimizing raw CTR instead of increment/value.
No re-ranking layer → poor variety, "tunnel vision."
Features from the future (leakage), timezone mixing, inconsistent signal definitions.
No calibration or thresholds → budgets and frequency caps degrade.
Ignoring RG/ethics and fairness → complaints, risk, regulatory issues.
Online/offline feature skew → failures in production.
16) Pre-Release Personalization Checklist
- Model passport (goals, limitations, metrics, owners, versions)
- Recall/Rank/Re-rank deployed; ANN and caches warmed
- PIT features and calibration, offline benchmarks (NDCG/PR-AUC) passed
- A/B design and guardrails; decision-ready report
- RG/frequency/diversity/quota constraints implemented and monitored
- Observability, alerts, kill switch, fallbacks (popular-safe)
- Documentation and runbooks, incremental improvement plan
Summary
Personalization models are effective only as a policy-aware system: rich data and embeddings → a Recall/Rank/Re-rank hybrid → bandits/RL for online adaptation → multi-objective goals under strict constraints and ethics → disciplined MLOps and monitoring. Such a loop delivers not just "recommendations" but manageable decisions that raise ROMI, LTV, and satisfaction - safely, transparently, and reproducibly.