Conversion Analytics
Conversion is not just "one number divided by another." It is a controlled system: clear definitions and an event schema → the correct denominator and time window → segmentation and attribution → a link to value (LTV/ROMI) → monitoring and experiments. Below is a framework that scales from product activation to payment and marketing funnels.
1) Definitions and basic formula
Funnel events: sequential steps (view → click → registration → verification → deposit → target action).
Step conversion: \( \text{CR}_{i \to j} = \frac{\text{unique entities that completed } j \text{ after } i}{\text{unique entities that reached } i} \)
End-to-end conversion: \( \text{CR}_{0 \to k} = \prod_{s=0}^{k-1} \text{CR}_{s \to s+1} \) (worked example after this list).
Unit of analysis: user / session / device / order; state it explicitly.
Time window: a limit between steps (e.g. registration → deposit ≤ 7 days).
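A quick worked example of the multiplicative property (the numbers are illustrative): if 30% of viewers register and 20% of registrants deposit within the window, then \( \text{CR}_{\text{view} \to \text{dep}} = 0.30 \times 0.20 = 0.06 \), i.e. 6% end-to-end.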
2) Metric passport (template)
METRIC: `CR_REG2DEP_7D_v2`
Definition: the share of registered users who made ≥1 deposit within 7 days.
Unit: user (`user_id`, `master_id`).
Window: 7 × 24 hours from `ts_registration`.
Exclusions: bots / fraud / test accounts / duplicates (see the sketch after this template).
Default segments: country, platform, acquisition channel.
Sources: `event_register`, `event_deposit`.
Guardrails: freshness ≤ 1 h, coverage ≥ 99%, anti-fraud FPR ≤ X%.
Version / owner / last-updated date.
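As a minimal sketch of how the passport's exclusions shape the denominator, assuming hypothetical exclusion tables `bot_users`, `fraud_flags`, and `test_accounts`:

```sql
-- Eligible registrants for CR_REG2DEP_7D_v2: the passport's exclusions
-- are applied before anything is counted (exclusion table names are illustrative).
WITH eligible_users AS (
  SELECT r.user_id, MIN(r.ts) AS ts_registration
  FROM event_register r
  WHERE r.user_id NOT IN (SELECT user_id FROM bot_users)
    AND r.user_id NOT IN (SELECT user_id FROM fraud_flags)
    AND r.user_id NOT IN (SELECT user_id FROM test_accounts)
  GROUP BY r.user_id
)
SELECT COUNT(*) AS denominator
FROM eligible_users;
```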
3) Event schema and data quality
Canonical schema: `event_id`, `user_id`, `device_id`, `session_id`, `ts`, `type`, `payload`, `source`, `version`.
Idempotence: dedup by `(source_id, checksum)`; keep a correction log (see the sketch after this list).
Cleaning: bot filters (speed, headless, known ASNs), fraud flags, test accounts.
Identities: a bridge `user_id ↔ device/email/phone`, plus a reference table for user splits/merges.
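One way to implement the dedup rule above, as a sketch over a hypothetical `raw_events` table keyed by `(source_id, checksum)`:

```sql
-- Keep one row per (source_id, checksum) in the hypothetical raw_events table;
-- duplicates (rn > 1) can be written to a correction log instead of being silently dropped.
WITH ranked AS (
  SELECT e.*,
         ROW_NUMBER() OVER (
           PARTITION BY source_id, checksum
           ORDER BY ts
         ) AS rn
  FROM raw_events e
)
SELECT *
FROM ranked
WHERE rn = 1;
```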
4) Correct denominators: frequent traps
Selection bias: putting only yesterday's active users in the denominator → CR is overestimated.
Survivorship: excluding those who churned before the step → CR is artificially inflated.
Mixing of units: denominator - sessions, numerator - users.
Double attribution: one success attributed to multiple channels.
Mean of averages: averaging CR across segments instead of aggregating numerators and denominators (see the sketch below).
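To make the last trap concrete, here is a sketch comparing the correct pooled CR with the mean-of-means anti-pattern, assuming a hypothetical per-segment rollup table `segment_funnel(segment, users_reg, users_dep_7d)`:

```sql
-- Pooled CR weights every segment by its denominator; the mean of per-segment
-- CRs treats a 100-user segment the same as a 100k-user one.
SELECT
  SUM(users_dep_7d)::decimal / NULLIF(SUM(users_reg), 0)  AS cr_pooled,        -- correct
  AVG(users_dep_7d::decimal / NULLIF(users_reg, 0))       AS cr_mean_of_means  -- anti-pattern, for comparison
FROM segment_funnel;  -- illustrative table: one row per country/channel
```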
5) Pseudo-SQL: funnel with windows and uniqueness
```sql
WITH regs AS (
  SELECT user_id, MIN(ts) AS ts_reg
  FROM event_register
  WHERE ts BETWEEN :from AND :to
  GROUP BY user_id
),
deps AS (
  SELECT user_id, MIN(ts) AS ts_dep
  FROM event_deposit
  GROUP BY user_id
),
eligible AS (
  SELECT r.user_id, r.ts_reg, d.ts_dep
  FROM regs r
  LEFT JOIN deps d
    ON d.user_id = r.user_id
   AND d.ts_dep BETWEEN r.ts_reg AND r.ts_reg + INTERVAL '7 day'
)
SELECT
  COUNT(*)                           AS users_reg,
  COUNT(ts_dep)                      AS users_dep_7d,
  COUNT(ts_dep)::decimal / COUNT(*)  AS cr_reg2dep_7d
FROM eligible;
```
Drop-off by step
```sql
-- Example: view → registration → deposit (7d)
WITH base  AS (...),  -- your source
     views AS (...),
     regs  AS (...),
     deps  AS (...)
SELECT
  COUNT(DISTINCT views.user_id) AS step0_view,
  COUNT(DISTINCT regs.user_id)  AS step1_reg,
  COUNT(DISTINCT CASE WHEN regs.ts BETWEEN views.ts AND views.ts + INTERVAL '24 hours'
                      THEN regs.user_id END) AS view2reg_24h,
  COUNT(DISTINCT CASE WHEN deps.ts BETWEEN regs.ts AND regs.ts + INTERVAL '7 day'
                      THEN deps.user_id END) AS reg2dep_7d
FROM views
LEFT JOIN regs ON regs.user_id = views.user_id
LEFT JOIN deps ON deps.user_id = regs.user_id;
```
6) Cohorts and segmentation
Cohorts: group by the date of the first event (registration / first visit) → compare conversion curves (see the sketch below).
Segments: Country/Channel/Platform/OS/Device/Content/Price/Partner.
Funnel by segment: CR and drop-off before/after promotions, releases, and UX changes.
Fairness: Check for error/CR differences across sensitive segments (ethics/compliance).
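A sketch of weekly registration cohorts and their 7-day deposit conversion, reusing the `event_register` / `event_deposit` sources from section 5 (PostgreSQL syntax assumed):

```sql
-- Weekly cohorts by first registration; 7-day deposit conversion per cohort.
WITH regs AS (
  SELECT user_id, MIN(ts) AS ts_reg FROM event_register GROUP BY user_id
),
deps AS (
  SELECT user_id, MIN(ts) AS ts_dep FROM event_deposit GROUP BY user_id
)
SELECT
  date_trunc('week', r.ts_reg) AS cohort_week,
  COUNT(*) AS users_reg,
  COUNT(d.user_id) FILTER (WHERE d.ts_dep <= r.ts_reg + INTERVAL '7 day') AS users_dep_7d,
  (COUNT(d.user_id) FILTER (WHERE d.ts_dep <= r.ts_reg + INTERVAL '7 day'))::decimal
    / COUNT(*) AS cr_reg2dep_7d
FROM regs r
LEFT JOIN deps d ON d.user_id = r.user_id
GROUP BY 1
ORDER BY 1;
```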
7) Attribution: who "deserved" the conversion
Single-touch: last/first click - simple, but distorts long cycles.
Position-based: U-shaped / linear / time decay (a linear sketch follows this list).
Data-driven (Shapley / Markov): estimates each channel's contribution from touch sequences.
Duplicate control: one success = one credit (or a shared split), with the algorithm version pinned.
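For contrast with the last-non-direct query in section 15, here is a sketch of linear (equal-credit) attribution over the same hypothetical `session_touchpoints` and `conversions` tables:

```sql
-- Each non-direct touch before the first conversion gets 1/n of the credit.
WITH touches AS (
  SELECT t.user_id, t.channel,
         COUNT(*) OVER (PARTITION BY t.user_id) AS n_touches
  FROM session_touchpoints t
  JOIN conversions c ON c.user_id = t.user_id
  WHERE t.is_direct = false
    AND t.ts <= c.first_conversion_ts
)
SELECT channel,
       SUM(1.0 / n_touches) AS attributed_conversions
FROM touches
GROUP BY channel
ORDER BY attributed_conversions DESC;
```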
8) Micro conversions and click quality
Micro steps: viewing pricing, adding to the cart, passing the KYC check, completing 50% of a form.
Traffic quality: bounce rate, engaged sessions, share of "valid" views, bot patterns.
Link to value: micro-conversions are only useful if they are correlated with or causally related to the business effect (LTV, GGR, Net); a quick check is sketched below.
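A quick sanity check of that link, sketched over hypothetical per-user tables `micro_flags(user_id, added_to_cart)` and `user_value(user_id, revenue_90d)`; correlation here is only a screening step, not proof of causality:

```sql
-- Pearson correlation between a micro-conversion flag and 90-day revenue
-- (table and column names are illustrative).
SELECT CORR((m.added_to_cart::int)::double precision,
            v.revenue_90d::double precision) AS corr_cart_revenue
FROM micro_flags m
JOIN user_value v USING (user_id);
```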
9) Linking conversion to money: CAC, LTV, ROMI
CAC: acquisition cost per conversion unit (registration / deposit / purchase); see the sketch below.
ROMI: \( \frac{\text{Incremental Revenue}}{\text{Marketing Costs}} - 1 \).
LTV-weighted conversion: prioritize segments/channels not by CR but by expected value.
Causality: estimate ROMI via A/B, DiD, or synthetic control; correlation alone is not enough.
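A sketch of CAC and ROMI by channel, assuming hypothetical rollup tables `marketing_spend(channel, spend)`, `channel_conversions(channel, conversions)`, and `channel_incremental_revenue(channel, incremental_revenue)` (the last one ideally estimated from an experiment rather than raw attribution):

```sql
-- CAC and ROMI per channel; all three tables are illustrative rollups.
SELECT
  s.channel,
  s.spend,
  a.conversions,
  s.spend / NULLIF(a.conversions, 0)              AS cac,
  r.incremental_revenue / NULLIF(s.spend, 0) - 1  AS romi
FROM marketing_spend s
LEFT JOIN channel_conversions a USING (channel)
LEFT JOIN channel_incremental_revenue r USING (channel);
```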
10) Experiments and uplift
A/B tests: randomization, MDE/power, accounting for seasonality and interference.
Metrics: the core CR plus guardrails (complaints, latency, anti-fraud FPR); a per-arm sketch follows this list.
Uplift models: target the conversion gain from the intervention, not the raw event probability; evaluate with Qini/AUUC and uplift@k.
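A minimal sketch of per-arm conversion in an experiment, assuming hypothetical `experiment_assignments(user_id, variant)` and `conversions_7d(user_id)` tables; the denominator is everyone assigned, not everyone who reached a later step:

```sql
-- Per-arm conversion rate (table names are illustrative).
SELECT
  a.variant,
  COUNT(*)                              AS users_assigned,
  COUNT(c.user_id)                      AS users_converted,
  COUNT(c.user_id)::decimal / COUNT(*)  AS cr
FROM experiment_assignments a
LEFT JOIN conversions_7d c ON c.user_id = a.user_id
GROUP BY a.variant;
```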
11) Time aspects and windows
Look-back/Look-forward: window between exposure (click/view) and conversion/deposit.
Hysteresis: use different on/off thresholds when enabling/disabling promo rules so they do not "blink" on and off.
Calendar: holidays, paydays, major events - mandatory regressors/flags.
12) Multi-device and deduplication
Cross-device: identifier graph (cookie/device/IDFA/email/phone).
One-to-one: count each target action once per user (or per order/payment).
Test and service accounts: maintain filtered lists of QA/operator/bot accounts - excluded from both numerator and denominator.
13) Visualizations and Reports
Step-bars/Sankey: drop-off in steps.
Cohort heat maps: CR on day 1/3/7/14/30.
Bridge graphs: contribution of factors to CR change (UX, promo, channel mix).
Dashboard: freshness timer, event coverage, guardrails, alerts.
14) Monitoring, SLOs and alerts
SLO on freshness: update lag ≤ N minutes/hours (a check is sketched below).
Quality guards: bursts of bots/fraud, identity mismatches, drops in coverage.
Alerts: CR deviating from the seasonal forecast, broken event tracking, growth in errors/latency.
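A sketch of a freshness check over the canonical event table from section 3 (the table name `events` and the 1-hour SLO are illustrative); any rows returned can feed an alert:

```sql
-- Sources whose latest event is older than the freshness SLO.
SELECT source,
       now() - MAX(ts) AS freshness_lag
FROM events
GROUP BY source
HAVING now() - MAX(ts) > INTERVAL '1 hour';  -- SLO threshold, illustrative
```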
15) Pseudo-SQL: last non-direct click attribution
```sql
WITH touch AS (
  SELECT user_id, channel, ts,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts DESC) AS rn
  FROM session_touchpoints
  WHERE is_direct = false
    AND ts <= (SELECT first_conversion_ts
               FROM conversions
               WHERE conversions.user_id = session_touchpoints.user_id)
),
credit AS (
  SELECT user_id, channel FROM touch WHERE rn = 1
)
SELECT channel,
       COUNT(*)                                   AS conv,
       COUNT(*)::decimal / SUM(COUNT(*)) OVER ()  AS share
FROM credit
GROUP BY channel
ORDER BY conv DESC;
```
16) Anti-patterns
Averaging CR across countries/channels without weights.
Mixing units (sessions vs users) and time zones.
Unversioned definitions and formulas (the metric "floats").
Ad-hoc, unfixed windows → incomparable CR.
No bot/fraud filters → inflated metrics.
Last-click attribution as the single source of truth for all decisions.
17) Checklist before publishing a conversion report
- Metric passport: definition, unit, window, exclusions, sources, version
- Event schema canonicalized; dedup/idempotence enabled
- Bots/fraud/QA accounts excluded; identities stitched
- Windows and denominators documented; time zones agreed
- Segments/cohorts tested; invariants hold (DAU ≤ MAU, sum of daily = monthly total)
- Attribution model chosen and described; double credit excluded
- Link to value: CAC/LTV/ROMI added, causal evaluation planned
- Dashboard: freshness, coverage, guardrails; alerts configured
18) Mini glossary
CR (Conversion Rate): the share that completed the target action.
Drop-off: the share lost between steps.
Attribution: the method of assigning credit for a conversion across touchpoints.
Cohort: a group defined by the date of its first event.
ROMI: return on marketing investment (incremental).
Uplift: conversion gain from intervention.
Guardrails: risk limiters (complaints, FPR, latency).
Result
Reliable conversion analytics rests on three pillars: correct definitions (denominators/windows/units), data discipline (idempotence, dedup, anti-bot and anti-fraud filters), and a link to value (LTV/CAC/ROMI and causality). Build funnels, cohorts, attribution, and monitoring on the framework described above, and you get metrics you can actually use to manage the product and marketing, not just watch the charts.