Player Retention Analysis
Retention is the core of the product economy: the longer a player stays active, the higher the LTV, the more stable the revenue, and the more predictable the planning. Below is a complete framework: from correct definitions to survival models and the reactivation loop.
1) Definitions and accounting units
Unit: player (user/master_id) by default; for short-term tasks an account/device is acceptable, but record this in the metric passport.
Activity: the return criterion (≥1 session / ≥1 bet / ≥1 deposit); record it explicitly.
Retention Dn: the share of the cohort returning on day n after the reference date.
Rolling vs Exact: Rolling D7 (active on any of days 1-7) vs Exact D7 (active exactly on day 7).
Churn: no activity for ≥T days (e.g. 14/30); fixed as a product rule.
Cohorts: by date of registration / first deposit / first game; choose per the marketing or product task.
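The Rolling vs Exact distinction above is easy to get wrong in code; a minimal pure-Python sketch with toy registration and activity data (all names and dates illustrative):

```python
from datetime import date, timedelta

# Toy events: registration date per user and a set of activity dates.
# Illustrative data only; real pipelines read these from event tables.
regs = {"u1": date(2024, 1, 1), "u2": date(2024, 1, 1), "u3": date(2024, 1, 1)}
activity = {
    "u1": {date(2024, 1, 8)},   # active exactly on day 7
    "u2": {date(2024, 1, 3)},   # active on day 2 only
    "u3": set(),                # never returned
}

def exact_d7(regs, activity):
    """Share of the cohort active exactly on day 7."""
    hit = sum(1 for u, d0 in regs.items() if d0 + timedelta(days=7) in activity[u])
    return hit / len(regs)

def rolling_d7(regs, activity):
    """Share of the cohort active on any of days 1-7."""
    hit = sum(
        1 for u, d0 in regs.items()
        if any(d0 + timedelta(days=k) in activity[u] for k in range(1, 8))
    )
    return hit / len(regs)

print(exact_d7(regs, activity))    # 1/3: only u1
print(rolling_d7(regs, activity))  # 2/3: u1 and u2
```

Rolling is always ≥ Exact on the same cohort, which is why the two must never be compared as equals.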
2) Basic analytics: cohorts and retention curves
Cohort heat maps: D1/D3/D7/D14/D30/D60; the diagonals make releases and campaigns comparable.
Survival curves: share of the cohort still active from day 0 to day N.
Curve geometry: "steps" from holidays/releases; an early collapse → onboarding problems; a long tail → a loyal core.
Pseudo-SQL: Exact D7 by cohort
sql
WITH regs AS (
  SELECT user_id, DATE_TRUNC('day', MIN(ts)) AS cohort_day
  FROM event_register
  GROUP BY 1
),
act AS (
  SELECT user_id, DATE_TRUNC('day', ts) AS act_day
  FROM event_activity
),
d7 AS (
  SELECT r.cohort_day,
         COUNT(DISTINCT r.user_id) AS cohort_size,
         COUNT(DISTINCT CASE WHEN a.act_day = r.cohort_day + INTERVAL '7 day'
                             THEN r.user_id END) AS retained_d7
  FROM regs r
  LEFT JOIN act a ON a.user_id = r.user_id
  GROUP BY 1
)
SELECT cohort_day, cohort_size,
       retained_d7::decimal / NULLIF(cohort_size, 0) AS exact_d7
FROM d7
ORDER BY cohort_day;
3) Survival and hazard models
Kaplan-Meier: a non-parametric survival estimate S(t); useful for reading the curve's shape and the median lifetime.
Cox PH / Accelerated Failure Time: explanatory models of how characteristics (country, channel, platform, bonuses, content) affect the hazard (churn risk).
Discrete-time hazard (daily logit): flexible for product analytics and calendar features.
Reactivation events: model separately (competing risks) or as a transition in a Markov chain.
4) Markov and semi-Markov models
New → Active → Dormant → Churned → Reactivated.
Transitions: probabilities per period (day/week).
Value: multiply the probability of staying in "Active" by the average spend/frequency to get the expected contribution to LTV.
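The value calculation above can be sketched as a simple chain walk; the states follow this section, while the transition probabilities and the per-week ARPU are illustrative assumptions:

```python
# Minimal sketch of the Markov value calculation; the transition
# probabilities and ARPU figure below are illustrative assumptions.
STATES = ["New", "Active", "Dormant", "Churned"]
P = {  # weekly transition probabilities, each row sums to 1
    "New":     {"New": 0.0, "Active": 0.6, "Dormant": 0.3, "Churned": 0.1},
    "Active":  {"New": 0.0, "Active": 0.7, "Dormant": 0.2, "Churned": 0.1},
    "Dormant": {"New": 0.0, "Active": 0.2, "Dormant": 0.5, "Churned": 0.3},
    "Churned": {"New": 0.0, "Active": 0.0, "Dormant": 0.0, "Churned": 1.0},
}

def expected_ltv(weeks=52, arpu_active=5.0):
    """Expected contribution: sum over weeks of P(Active) x ARPU per active week."""
    dist = {s: 0.0 for s in STATES}
    dist["New"] = 1.0  # everyone starts as New
    value = 0.0
    for _ in range(weeks):
        dist = {s: sum(dist[r] * P[r][s] for r in STATES) for s in STATES}
        value += dist["Active"] * arpu_active
    return value

print(round(expected_ltv(), 2))
```

"Churned" is absorbing here; adding a "Reactivated" state is a matter of giving Churned a small outgoing probability back to Active.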
5) Linking retention and LTV
LTV ≈ Σ_t (Retention_t × ARPU_t × discount_t).
Elasticity: a D7 increase of X pp → an LTV increase of Y% (from historical data/models).
Prioritization: improvements to early retention (D1-D7) are almost always the most profitable.
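A minimal sketch of the LTV formula above; the retention curve, ARPU values, and discount factor are illustrative, not benchmarks:

```python
def ltv(retention, arpu, monthly_discount=0.99):
    """LTV ~= sum over periods of retention_t * ARPU_t * discount^t.
    retention and arpu are per-period lists; figures below are illustrative."""
    return sum(r * a * monthly_discount ** t
               for t, (r, a) in enumerate(zip(retention, arpu)))

# Illustrative monthly curve: everyone present in month 0, decaying afterwards.
retention = [1.00, 0.40, 0.25, 0.18, 0.14]
arpu      = [12.0, 10.0,  9.0,  8.5,  8.0]
base = ltv(retention, arpu)

# Elasticity check: lifting month-1 retention by 5 pp.
lifted = ltv([1.00, 0.45, 0.25, 0.18, 0.14], arpu)
print(base, lifted)
```

Running the same lift against a late period shows why early-retention work dominates: the early terms carry the largest retention mass and the least discounting.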
6) Retention segmentation
Onboarding cohorts: first content/game category/behavioral pattern on day 0.
Geo/platform/channel: differences in UX and expectations; adjust for calendar/holidays.
Behavior/value: RFM (Recency-Frequency-Monetary), churn risk, profitability.
Response to incentives: segments by uplift reaction to offers/notifications.
7) Causality and experiments
A/B: onboarding, tutorials, push strategies; primary metric is D7/D14/D30 retention; guardrails: complaints, response time, RG.
Quasi-experiments: DiD/synthetic control when randomization is not possible (e.g. regional rollouts).
Uplift models: target the gain in return probability, not raw activity probability; evaluate with Qini/AUUC.
8) Reactivation: triggers and policy
Signals: frequency drop, no deposits for N days, abnormally low spend, completed onboarding without a second session.
Decision table (example): map each signal to an action and a channel per segment.
Hysteresis: different entry/exit thresholds for each signal so that triggers do not flap.
Channels: in-app, push, e-mail, SMS, call center, with rate limits and priorities.
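The hysteresis rule can be illustrated with a toy risk-score trace; the ENTER/EXIT thresholds below are assumptions, not recommended values:

```python
# Hysteresis sketch: a user enters the "at risk" state above ENTER and
# leaves it only below EXIT, so small fluctuations around one threshold
# do not make the trigger flap. Thresholds are illustrative.
ENTER, EXIT = 0.7, 0.5

def update_state(at_risk, risk_score):
    if not at_risk and risk_score >= ENTER:
        return True
    if at_risk and risk_score <= EXIT:
        return False
    return at_risk  # inside the band: keep the previous state

scores = [0.65, 0.72, 0.68, 0.55, 0.49, 0.69]
state, trace = False, []
for s in scores:
    state = update_state(state, s)
    trace.append(state)
print(trace)  # [False, True, True, True, False, False]
```

With a single 0.7 threshold the same trace would toggle four times; with hysteresis it fires once and clears once, which is exactly what contact rate limits need.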
9) Retention metrics
D1/D7/D30 (Rolling/Exact), WAU/MAU, Stickiness (DAU/MAU).
Survival median/quantiles; hazard by interval.
Reactivation rate (R30), dormancy share.
Reactivation ROMI, NNT (contacts needed per one return).
Fairness: metric differences by country/platform; exclude protected attributes from policies.
10) Retention dashboards
Cohort heat map + trend lines D1/D7/D30.
Survival/hazard graphs by segment.
Early-life funnel: install → registration → KYC → first game → first deposit.
Action map: signal → decision → channel → outcome (conversion to return).
Guardrails: data freshness, event coverage, complaints, RG indicators.
11) Data and quality
Events: canonical schema (UTC, versioning), idempotency, dedup.
Identities: user/device/e-mail/phone; ID bridges and a golden record.
Windows and time zones: store in UTC plus local views; a single holiday calendar.
Filters: bots/QA/fraud; exclude from both cohorts and activity.
Metric versioning: RET_D7_vN with a changelog.
12) Pseudo-SQL/Python recipes
Rolling D30 by cohort
sql
WITH base AS (
  SELECT user_id, DATE_TRUNC('day', MIN(ts)) AS cohort_day
  FROM event_register GROUP BY 1
),
act AS (
  SELECT user_id, DATE_TRUNC('day', ts) AS d
  FROM event_activity
),
roll30 AS (
  SELECT b.cohort_day,
         COUNT(DISTINCT b.user_id) AS cohort_size,
         COUNT(DISTINCT CASE WHEN a.d BETWEEN b.cohort_day + INTERVAL '1 day'
                                          AND b.cohort_day + INTERVAL '30 day'
                             THEN b.user_id END) AS any_1_30
  FROM base b LEFT JOIN act a ON a.user_id = b.user_id
  GROUP BY 1
)
SELECT cohort_day, any_1_30::decimal / NULLIF(cohort_size, 0) AS rolling_d30
FROM roll30
ORDER BY cohort_day;
Kaplan-Meier (sketch)
t_i — time to churn or censoring; e_i — event indicator; d_i — events at t_i; n_i — number at risk just before t_i:
S(t) = Π_{t_i ≤ t} (1 − d_i / n_i)
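The estimator in plain Python, with no libraries; the lifetimes below are synthetic, for illustration only:

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier: S(t) = prod over event times t_i <= t of (1 - d_i/n_i).
    durations: time to churn or censoring; events: 1 = churn observed, 0 = censored."""
    times = sorted({t for t, e in zip(durations, events) if e == 1})
    surv, s = [], 1.0
    for t in times:
        n_i = sum(1 for d in durations if d >= t)  # at risk just before t
        d_i = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        s *= 1.0 - d_i / n_i
        surv.append((t, s))
    return surv

# Illustrative lifetimes in days; events=0 marks users censored (still active).
durations = [2, 3, 3, 5, 8, 8, 12, 12]
events    = [1, 1, 0, 1, 1, 0,  1,  0]
for t, s in kaplan_meier(durations, events):
    print(t, round(s, 3))
```

For production use, a library such as lifelines adds confidence intervals and median-lifetime extraction on top of the same estimate.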
Discrete-time hazard (daily logit)
For each user, create one record per day up to the event or censoring:
target = 1 if churn occurred on that day; features: calendar, activity, promos, etc.
Train a logistic regression/GBM; the forecast p_t is the probability of churn on day t.
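The person-day expansion described above, as a minimal sketch (the user records and the horizon are illustrative; features are omitted for brevity):

```python
# One record per user per day at risk, with target = 1 only on the churn day.
# In practice each row also carries calendar/activity/promo features.
def person_days(users, horizon=30):
    """users: list of (user_id, churn_day or None if censored within the horizon)."""
    rows = []
    for uid, churn_day in users:
        last = churn_day if churn_day is not None else horizon  # censor at horizon
        for day in range(1, last + 1):
            rows.append({"user_id": uid,
                         "day": day,
                         "target": 1 if churn_day == day else 0})
    return rows

rows = person_days([("u1", 3), ("u2", None)], horizon=5)
# u1 contributes days 1-3 (target 1 on day 3); u2 contributes days 1-5, all 0.
print(len(rows))  # 8 records
```

Any binary classifier trained on these rows yields the daily hazard; chaining (1 − p_t) over days recovers the survival curve.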
13) Uplift targeting for retention
Zones: Persuadables (return only if contacted), Sure things (return anyway), Lost causes, Do-not-disturbs (contact does harm).
Metrics: uplift@k, Qini/AUUC; policy: contact the top k users by uplift within the budget.
Guardrails: cap on contact frequency, RG/ethics, explainability of the reason for contact.
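A toy sketch of uplift@k from the metrics above: rank by predicted uplift, then compare return rates of treated vs control within the top k. All scores and outcomes are synthetic:

```python
# uplift@k sketch: records are (uplift_score, treated 0/1, returned 0/1);
# higher score means "contact first". Data below is synthetic.
def uplift_at_k(records, k):
    top = sorted(records, key=lambda r: -r[0])[:k]
    treated = [r for r in top if r[1] == 1]
    control = [r for r in top if r[1] == 0]
    rate = lambda g: sum(r[2] for r in g) / len(g) if g else 0.0
    return rate(treated) - rate(control)

records = [
    (0.9, 1, 1), (0.8, 0, 0), (0.7, 1, 1), (0.6, 0, 1),
    (0.3, 1, 0), (0.2, 0, 0), (0.1, 1, 0), (0.05, 0, 0),
]
print(uplift_at_k(records, 4))  # treated top-4 return at 1.0 vs control at 0.5
```

Sweeping k over the whole ranking and accumulating the same difference is essentially how a Qini curve is built.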
14) Operations
SLO: retention dashboard refreshed by 06:00 local time; risk-scoring latency ≤ 300 ms; decision→action ≤ 5 s.
Monitoring: curve shifts by segment, PSI for feature drift, broken event streams.
Runbooks: D1 drop (onboarding/release), D7 drop (content/frequency), local failures of communication channels.
15) Frequent errors
Mixing units (sessions ↔ users), time zones, activity windows.
Comparing Rolling and Exact indicators as if they were the same.
Ignoring bots/fraud → inflated D1/D7.
Drawing conclusions from correlation without causal validation.
No hysteresis/cooldowns → contact fatigue.
No link to LTV: we optimize conversion but not value.
16) Pre-Release Retention Loop Checklist
- Metric passport (activity trigger, window, time zone, version)
- Cohort reports and survival/hazard by segment
- Churn-risk and uplift models; channel caps and guardrails
- Plan for A/B and/or quasi-experiments on interventions
- Dashboards for freshness/coverage/complaints/RG
- Incident runbooks; hysteresis and rate limits in the policy
- Retention linked to LTV and ROMI; prioritization by expected value
Summary
Retention analysis is not just a "cohort heat map" but a managed system: correct definitions, survival/hazard models, a link to value, targeted and ethical interventions, rigorous effect estimation, and operational guardrails. You build an "observe → understand → decide → act → learn" loop that steadily raises LTV and reduces churn.