Recommendation systems
A recommendation system is not just a "CTR model." It is a pipeline of data → candidates → ranking → policy → action → feedback that optimizes incremental value under real-world constraints (speed, frequency caps, diversity, ethics/compliance).
1) Data, signals and representations
Events: views/clicks/adds/purchases/deposits, dwell time, cancellations.
Content/catalog: attributes (categories/genres/studios/price/freshness/volatility).
User profiles: RFM, preferences, devices/channels, time slots.
Context: hour/day/holidays/matches, locale/TZ, display placement.
Quality: point-in-time snapshots, idempotent event ingestion, dedup/anti-bot filtering, PII masking.
Embeddings: user/item/context in a shared space (MF/word2vec-style/transformers), multimodal (text/images).
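To make the shared embedding space concrete, here is a minimal word2vec-style sketch: item vectors are trained on interaction sessions, and a user vector is the mean of the items the user touched. `sessions`, the dimensions, and the helper are illustrative assumptions, not a prescribed pipeline.

```python
# Word2vec-style item embeddings from interaction sessions (sketch).
# Assumes `sessions` is a list of item-id sequences, e.g. [["i1", "i7", "i3"], ...].
import numpy as np
from gensim.models import Word2Vec

model = Word2Vec(sessions, vector_size=64, window=5, min_count=1, sg=1, epochs=10)

def user_embedding(history: list[str]) -> np.ndarray:
    """User vector = mean of the vectors of items the user interacted with."""
    vecs = [model.wv[i] for i in history if i in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

# Nearest items to a user in the shared space:
# model.wv.similar_by_vector(user_embedding(history), topn=10)
```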
2) Architecture: Recall → Rank → Re-rank → Action
1. Candidate recall (200–5000 candidates): ANN (FAISS/ScaNN), popularity/trends, rule-based filters.
2. Ranking (20–200): LTR (GBM/NN), two-tower architectures, binary/multi-task targets (click, conversion, value).
3. Policy-aware re-rank (5–30 in the final list): diversification/novelty/serendipity, brand/category quotas, RG/compliance, frequency caps, fairness.
4. Action: show/push/email/personalized showcase with cooldowns and quiet hours.
5. Feedback: log the impression → click → action → value funnel plus negative feedback (skip, complaint); see the event-record sketch below.
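To keep the feedback loop analyzable, each funnel stage can be logged as one flat record. A minimal sketch; the field names are assumptions, not a fixed schema.

```python
# Hypothetical funnel-event record for the feedback log (sketch).
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FunnelEvent:
    user_id: str
    item_id: str
    stage: str                    # "impression" | "click" | "action"
    value: float = 0.0            # realized value, filled for "action" events
    negative: str | None = None   # "skip" | "hide" | "report"
    ts: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```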
3) Model paradigms
Content-based: similarity between item features and the user profile; ideal for item cold start.
Collaborative filtering: user-user/item-item over the interaction matrix.
Factorization/embeddings: MF/BPR/NeuMF, two-tower MLP (user tower × item tower); see the sketch after this list.
Learning-to-rank: pairwise/listwise (LambdaMART, RankNet), NDCG@k optimization.
Session/sequential: GRU4Rec, SASRec, transformers (T5-style); capture order and context within the session.
Contextual bandits: LinUCB/Thompson for fast online adaptation of items and creatives.
RL: SlateQ/DQN/policy gradient for multi-step reward (retention/LTV).
Causal/uplift approaches: recommendations that optimize incremental effect rather than raw CTR.
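A minimal two-tower sketch in PyTorch, assuming dense user and item feature vectors; the layer sizes are illustrative. The score is the dot product of the towers' outputs.

```python
# Two-tower scoring sketch: score = dot(user_tower(u), item_tower(i)).
import torch
import torch.nn as nn

class TwoTower(nn.Module):
    def __init__(self, user_dim: int, item_dim: int, emb_dim: int = 64):
        super().__init__()
        self.user_tower = nn.Sequential(
            nn.Linear(user_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))
        self.item_tower = nn.Sequential(
            nn.Linear(item_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))

    def forward(self, user_x: torch.Tensor, item_x: torch.Tensor) -> torch.Tensor:
        u = self.user_tower(user_x)   # [batch, emb_dim]
        v = self.item_tower(item_x)   # [batch, emb_dim]
        return (u * v).sum(dim=-1)    # dot-product score per pair
```

In production the item tower's outputs are typically precomputed and indexed with ANN (FAISS/ScaNN) so recall stays within the SLA.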
4) Objectives, constraints, and problem formulation
Objectives: CTR/CTCVR, revenue/margin/LTV, retention, satisfaction, speed.
Constraints: diversification, provider/category quotas, frequency caps, RG/compliance, fairness/ethics, SLA p95.
\[
Score = \alpha \cdot \hat p_{\text{click}} + \beta \cdot \text{Value} - \gamma \cdot \text{Fatigue} + \delta \cdot \text{Novelty} - \sum_j \lambda_j \cdot \text{Penalty}_j
\]
where each Penalty_j captures quota/RG/frequency/monotony violations, and the weights α, β, γ, δ, λ_j are tuned to business priorities.
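A minimal sketch of this blended score; it collapses the per-violation weights λ_j into a single λ for brevity, and all default weights are illustrative.

```python
# Blended re-rank score from the formula above (sketch; weights illustrative).
def blended_score(p_click: float, value: float, fatigue: float, novelty: float,
                  penalties: list[float],
                  a: float = 0.6, b: float = 0.3, g: float = 0.2,
                  d: float = 0.1, lam: float = 1.0) -> float:
    return a * p_click + b * value - g * fatigue + d * novelty - lam * sum(penalties)
```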
5) Metrics and scoring
Offline
Relevance/ranking: AUC/PR-AUC, Recall@k, MAP, NDCG@k (sketch below).
Business: eRPM/eCPM, proxy-LTV, expected margin.
Calibration: Brier, ECE (important for thresholds/policies).
List quality: coverage/diversity/novelty/serendipity.
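A compact NDCG@k implementation for graded relevance; `rels` is assumed to hold the relevance of items in the order the model ranked them.

```python
# NDCG@k: discounted gain of the model's order vs. the ideal order (sketch).
import numpy as np

def ndcg_at_k(rels: list[float], k: int) -> float:
    r = np.asarray(rels, dtype=float)
    top = r[:k]
    disc = 1.0 / np.log2(np.arange(2, len(top) + 2))  # 1/log2(rank + 1)
    dcg = float((top * disc).sum())
    ideal = np.sort(r)[::-1][:k]                      # best achievable top-k
    idcg = float((ideal * disc[:len(ideal)]).sum())
    return dcg / idcg if idcg > 0 else 0.0
```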
Online
A/B and multi-arm tests: CTR, CTCVR, revenue per session, retention, complaints/unsubscribes (guardrails), latency/timeouts.
Causal evaluation: CUPED, quasi-experiments (DiD/synthetic control) when randomization is limited; a CUPED sketch follows.
Uplift metrics: Qini/AUUC, uplift@k for treatment-aware recommendations.
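A minimal CUPED sketch: the experiment metric Y is adjusted with a pre-period covariate X (commonly the same metric measured before the test), which reduces variance without biasing the treatment effect.

```python
# CUPED adjustment: Y_adj = Y - theta * (X - mean(X)), theta = cov(X, Y) / var(X).
import numpy as np

def cuped_adjust(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())
```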
6) Cold start and sparseness
New users: popular-by-segment, a preference survey, content-based from the first clicks, a bandit with broad exploration; see the routing sketch after this list.
New items: metadata/text/image embeddings + look-alikes by studio/category.
Small domains: transfer learning, multi-task (shared towers), cross-domain distillation.
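A hypothetical routing sketch tying these fallbacks together; the thresholds and the helpers `content_similar` and `user_embed` are assumptions, while `ann` and `popular` follow the pseudo-code in section 10.

```python
# Cold-start routing (sketch): degrade gracefully from personalized recall
# to content-based and popular-by-segment candidates.
def recall_candidates(user, k=500):
    if user.n_interactions >= 20:   # warm user: ANN over learned embeddings
        return ann.recall(user_embed(user), topk=k)
    if user.n_interactions > 0:     # semi-cold: content-based on first clicks
        return content_similar(user.last_items, topk=k)
    return popular.by_segment(user.segment, k=k)  # fully cold
```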
7) Diversification, novelty, serendipity
Algorithms: MMR, xQuAD, PM-2; penalties for monotony (MMR sketch below).
Quotas: min/max by category/brand/risk class.
List stability: positional inertia, update hysteresis; avoid flickering the output between refreshes.
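A minimal MMR sketch: each step picks the item with the best trade-off between relevance and redundancy against what is already selected; `lam` near 1 favors relevance, lower values favor diversity.

```python
# Maximal Marginal Relevance re-ranking (sketch).
def mmr(scores: dict, sim: dict, k: int, lam: float = 0.7) -> list:
    """scores: item -> relevance; sim: (i, j) -> similarity in [0, 1]."""
    picked, rest = [], set(scores)
    while rest and len(picked) < k:
        def marginal(i):
            redundancy = max((sim.get((i, j), 0.0) for j in picked), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(rest, key=marginal)
        picked.append(best)
        rest.remove(best)
    return picked
```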
8) Infrastructure and MLOps
Feature Store: point-in-time snapshots, TTL for session features, online/offline parity.
ANN services: FAISS/ScaNN, sharding/caching, replication.
Ranker: real-time features, calibration, versioned signatures.
Policy/re-rank layer: limits/quotas/RG/frequency caps/diversity.
SLA: end-to-end p95 ≤ 100–300 ms; fallback (popular-safe) under degradation.
Observability: correlation_id traces, feature drift (PSI; sketch below), online quality metrics, a kill switch.
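A minimal PSI sketch for feature-drift monitoring: bin the reference distribution by quantiles and compare shares; PSI above roughly 0.2 is a commonly used alert threshold.

```python
# Population Stability Index between a reference and a live sample (sketch).
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e, _ = np.histogram(expected, bins=edges)
    a, _ = np.histogram(np.clip(actual, edges[0], edges[-1]), bins=edges)
    e = np.clip(e / e.sum(), 1e-6, None)   # avoid log(0)
    a = np.clip(a / a.sum(), 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))
```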
9) Security, privacy, ethics
PII minimization, RLS/CLS, masking.
RG/compliance filters before display, frequency caps, quiet hours.
Fairness diagnostics by segment; explanations of why an item was shown; an appeal path.
10) Pseudo-code: Recall → Rank → Re-rank hybrid
```python
# Recall: merge ANN candidates with rule-based/popular ones
cand_emb = ann.recall(user_embed, topk=500)
cand_rule = popular.by_segment(user.segment, k=200)
cands = dedup(cand_emb + cand_rule)

# Rank: score candidates on user/item/context features
features = featurize(user, cands, context)
scores = ranker.predict(features)  # p(click), value

# Policy-aware re-rank: hard constraints + blended objective
final = rerank(
    cands, scores,
    constraints=dict(
        diversity_min={'category': 3},
        brand_quota={'A': 0.3, 'B': 0.3},
        rg_filter=True,
        freq_caps=get_user_caps(user),
    ),
    objective_weights=dict(ctr=0.6, value=0.3, novelty=0.1),
)
return final[:N]
```
Thompson sampling for creatives (sketch)
```python
import numpy as np

# Beta priors per creative: (alpha, beta)
samples = {cr: np.random.beta(alpha[cr], beta[cr]) for cr in creatives}
chosen = max(samples, key=samples.get)     # creative with the best sampled CTR
show(chosen)
update(alpha, beta, chosen, reward=click)  # e.g. alpha[chosen] += click; beta[chosen] += 1 - click
```
11) Pseudo-SQL: negative feedback and frequency caps
```sql
-- Last impression and "hide"/"report" flags → ban the item for 7 days
WITH last_impr AS (
  SELECT user_id, item_id,
         MAX(ts)                  AS last_ts,
         BOOL_OR(feedback_hide)   AS hidden,
         BOOL_OR(feedback_report) AS reported
  FROM impressions
  GROUP BY 1, 2
)
SELECT i.*
FROM inventory i
LEFT JOIN last_impr l
  ON l.user_id = :uid AND l.item_id = i.item_id
WHERE COALESCE(l.hidden, false) = false
  AND COALESCE(l.reported, false) = false
  AND (l.last_ts IS NULL OR l.last_ts < NOW() - INTERVAL '7 day');
```
12) Decision table

| Situation | Recommended approach |
| --- | --- |
| Item cold start | Content-based: metadata/text/image embeddings, look-alikes |
| User cold start | Popular-by-segment, preference survey, bandit with broad exploration |
| Dense interaction history | Collaborative filtering / factorization (MF/BPR/NeuMF, two-tower) |
| Strong session/order effects | Sequential models (GRU4Rec, SASRec, transformers) |
| Fast online adaptation of creatives | Contextual bandits (LinUCB/Thompson) |
| Multi-step value (retention/LTV) | RL (SlateQ/DQN/policy gradient) |
| Incremental effect matters | Causal/uplift modeling (Qini/AUUC, uplift@k) |
13) Anti-patterns
No re-rank layer → monotonous lists and tunnel vision.
Optimizing raw CTR instead of uplift and value.
Features leaking from the future; mixed time zones; unversioned signal definitions.
Uncalibrated probabilities → incorrect thresholds/policies.
Ignoring RG/ethics/fairness → complaints, risks, fines.
Online/offline skew in features and metrics → quality drops in production.
No fallback and no kill switch.
14) Recommendation launch checklist
- System passport: objectives, constraints, metrics, owners, versions
- Recall/Rank/Re-rank separated; ANN warmed up, caches configured
- Point-in-time features, calibration, offline benchmarks (NDCG/PR-AUC) passed
- A/B design and guardrails; decision-ready report
- Constraints: diversity/quotas/RG/frequency caps implemented and monitored
- SLA p95, traces, alerts, kill switch and popular-safe fallback
- Documentation, runbooks, incremental improvement plan
Result
A strong recommendation system is a policy-aware pipeline: a hybrid Recall/Rank/Re-rank that optimizes incremental value under speed, ethics, and diversity constraints. Add bandits/RL for online adaptation, MLOps discipline, and sound causal evaluation, and you get not lists for their own sake but managed decisions that grow ROMI, LTV, and user satisfaction, stably and safely.