
Correlation and Causation

Correlation captures how variables change together. Causation answers a different question: what happens if we intervene? In analytics, product, and risk management, the value lies precisely in the causal effect: it lets you estimate the increment delivered by a decision, not just an association.

1) Basic concepts

Correlation (association): a statistical relationship with no claim about "why." It may arise from a common cause, reverse causation, or chance.

Treatment effect: the expected difference between the world "with intervention" and "without intervention."

Counterfactual: the unobservable outcome "what would have happened to the same object without the intervention."

Confounder: a variable that affects both the cause and the outcome → creates a spurious relationship.
Collider: a variable that is affected by both the cause and the outcome; conditioning on a collider distorts the association.
Simpson's paradox: the direction of an effect flips once a hidden variable/segment is taken into account (see the numeric sketch after this list).
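A minimal numeric sketch of Simpson's paradox; the segments, traffic split, and conversion counts are all made up for illustration:

```python
import pandas as pd

# Hypothetical data: conversions by variant, split into two traffic segments.
df = pd.DataFrame({
    "segment": ["organic"] * 2 + ["paid"] * 2,
    "variant": ["A", "B", "A", "B"],
    "visitors": [800, 200, 200, 800],
    "conversions": [80, 22, 10, 48],
})

# Within each segment, B converts better than A...
per_segment = df.assign(rate=df.conversions / df.visitors)
print(per_segment[["segment", "variant", "rate"]])

# ...but pooled over segments the ranking flips, because variant B was shown
# mostly to the low-converting "paid" segment (the hidden variable).
pooled = df.groupby("variant")[["visitors", "conversions"]].sum()
print(pooled.conversions / pooled.visitors)
```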

2) When correlation is sufficient and when it is not

Descriptive analytics, monitoring, EDA: correlations/ranks/heatmaps → surface hypotheses and risks.
Decision-making and impact assessment: causal methods (experiments or quasi-experiments) are required.
Prediction models: correlations are useful, but for ROI/policy decisions, move to causal estimates or uplift models.

3) Experiments: Gold Standard

A/B tests (randomization): eliminate confounding and make the groups comparable.
Guardrails: duration ≥ one behavioral cycle, stable exposure, control for seasonality and interference (spillover).
Metrics: effect size, confidence intervals, MDE/power, heterogeneity of the effect by segment (Heterogeneous Treatment Effect).
Practice: canary releases, phased rollouts, CUPED/covariate adjustment to reduce variance (see the sketch after this list).
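A hedged sketch of CUPED-style variance reduction on simulated A/B data; the metric, the pre-period covariate, and the effect size are all assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical A/B data: y is the post-experiment metric,
# x is the same metric measured before the experiment (covariate).
n = 10_000
x = rng.normal(100, 20, size=n)                  # pre-period value
treat = rng.integers(0, 2, size=n)               # randomized assignment
y = x + 2.0 * treat + rng.normal(0, 10, size=n)  # true effect = 2.0

# CUPED: subtract theta * (x - mean(x)) to strip out pre-period variance.
theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
y_cuped = y - theta * (x - x.mean())

def diff_in_means_se(metric, treat):
    a, b = metric[treat == 1], metric[treat == 0]
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return diff, se

print("raw:  ", diff_in_means_se(y, treat))
print("CUPED:", diff_in_means_se(y_cuped, treat))  # same estimate, much smaller SE
```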

4) When an experiment is not possible: quasi-experiments

Difference-in-Differences (DiD): the difference in before/after changes between the "test" and "control" groups. The key assumption is parallel trends before the intervention (a regression sketch follows this list).
Synthetic control: build a "synthetic" control as a weighted mixture of donor units; more robust when trend dynamics differ.
Regression Discontinuity (RDD): a threshold rule assigns the treatment; compare units just on either side of the threshold. Important: no "manipulation" of the threshold.
Instrumental variables (IV): a variable that affects the treatment but does not affect the outcome directly (only through the treatment). Requires relevance and validity of the instrument.
PSM/Matching: pair treated and control units with similar covariates; useful as preprocessing, but it does not remove hidden confounders.
Interrupted Time Series (ITS): estimate a trend break at the policy date, in the absence of other shocks.
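A minimal DiD sketch on simulated panel data, assuming statsmodels is available; the group structure, trends, and effect size are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Hypothetical panel: two groups observed before and after a policy change.
n = 4_000
df = pd.DataFrame({
    "treated": rng.integers(0, 2, size=n),
    "post": rng.integers(0, 2, size=n),
})
true_effect = 1.5
df["y"] = (
    2.0 * df.treated                      # level difference between groups
    + 1.0 * df.post                       # common time trend
    + true_effect * df.treated * df.post  # the causal effect we want
    + rng.normal(0, 1, size=n)
)

# The coefficient on treated:post is the DiD estimate of the effect,
# valid only under the parallel-trends assumption.
model = smf.ols("y ~ treated * post", data=df).fit(cov_type="HC1")
print(model.params["treated:post"], model.conf_int().loc["treated:post"])
```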

5) Causal Graphs and the Back-Door/Front-Door Criteria

DAG (directed acyclic graph): a visual map of causal relationships. It helps you choose which variables to control for.
Back-door criterion: block all back-door paths (confounders) to obtain an unbiased estimate of the effect.
Front-door criterion: use a mediator that fully carries the influence in order to bypass hidden confounders.
Do not control for colliders or descendants of the outcome: doing so introduces bias.
Practice: first draw a DAG with domain experts, then choose the minimal set of covariates (a simulated back-door adjustment follows this list).
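A sketch of back-door adjustment on a simulated DAG Z → X, Z → Y, X → Y; the structure and coefficients are assumptions made for the example, and statsmodels is assumed to be available:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Hypothetical DAG: Z is a confounder opening a back-door path between X and Y.
n = 20_000
z = rng.normal(size=n)
x = (z + rng.normal(size=n) > 0).astype(float)  # treatment influenced by Z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)      # true effect of X on Y is 1.0

# Naive (unadjusted) estimate is biased upward because Z is ignored.
naive = sm.OLS(y, sm.add_constant(x)).fit()
# Back-door adjustment: condition on Z, the minimal adjustment set here.
adjusted = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()

print("naive:   ", naive.params[1])
print("adjusted:", adjusted.params[1])  # close to the true effect 1.0
```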

6) Potential outcomes and effect estimates

ATE/ATT/ATC: the mean effect over everyone / the treated / the controls (see the simulation after this list).
CATE/HTE: the effect by segment (country, channel, risk class).
Uplift modeling: train a model to rank objects by the expected increment from the intervention, not by the baseline probability of the event.
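A small simulation to make the estimands concrete; in a simulation both potential outcomes are visible, which is exactly what real data never gives you. All numbers are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical potential outcomes with a heterogeneous effect.
n = 100_000
x = rng.normal(size=n)                            # covariate driving heterogeneity
y0 = x + rng.normal(size=n)                       # outcome without treatment
y1 = y0 + 1.0 + 0.5 * x                           # outcome with treatment
treated = rng.random(n) < 1 / (1 + np.exp(-x))    # selection into treatment depends on x

ate = np.mean(y1 - y0)                            # average effect over everyone
att = np.mean((y1 - y0)[treated])                 # average effect over the treated
atc = np.mean((y1 - y0)[~treated])                # average effect over the controls
print(ate, att, atc)                              # ATT > ATE > ATC: effect grows with x
```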

7) Frequent traps

Reverse causality: "discounts rise ↔ demand falls," where the discounts react to the fall, not the other way around.
Omitted variables: unrecorded inventory/seasonality/regional changes.
Survivorship bias: analyzing only those who "remained."
Leakage: using future information in training/evaluation.
Metric confusion: optimizing proxy metrics instead of the business effect (Goodhart's law).

Regression to the mean: a natural return to the trend can be mistaken for an "effect."

8) Causality in product, marketing and risk

Marketing/campaigns: uplift targeting, differentiated contact frequency, causal LTV estimates, ROMI via DiD/synthetic control.
Pricing/promotions: RDD (threshold rules), experiments sampled by SKU/region.
Recommendations: off-policy evaluation (IPS/DR) and bandits; account for interference (an IPS sketch follows this list).
Anti-fraud/RG policies: be careful with causality: blocks change behavior and data; use quasi-experiments and guardrails on FPR and appeals.
Operations management: ITS for releases and incidents; causal graphs for RCA.
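A minimal sketch of inverse propensity scoring (IPS) for off-policy evaluation; the logging policy, action set, and reward rates are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical logged bandit data: actions drawn by a known logging policy pi_0.
n_actions, n = 3, 50_000
logging_probs = np.array([0.5, 0.3, 0.2])        # pi_0(a)
actions = rng.choice(n_actions, size=n, p=logging_probs)
base_reward = np.array([0.10, 0.12, 0.20])       # true expected reward per action
rewards = rng.binomial(1, base_reward[actions])

# Target policy we want to evaluate offline: always recommend action 2.
target_probs = np.array([0.0, 0.0, 1.0])         # pi(a)

# IPS: reweight logged rewards by pi(a)/pi_0(a).
weights = target_probs[actions] / logging_probs[actions]
ips_estimate = np.mean(weights * rewards)
print(ips_estimate)  # ~0.20, the target policy's value, estimated without deploying it
```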

9) Analysis procedure: from hypothesis to solution

1. Formulate the question causally: "What is the effect of X on Y over horizon T?"
2. Draw a DAG: agree on it with the domain, mark confounders/mediators/colliders.
3. Select a design: RCT/A-B, DiD, RDD, IV, synthetic control, matching.
4. Define metrics: primary (the effect), guardrails (quality/ethics/operations), CATE segments.
5. Prepare data: point-in-time, covariates measured before the intervention, calendar and seasonality.
6. Estimate the effect: baseline models + robustness tests (placebo tests, sensitivity).
7. Check robustness: alternative specifications, exclusion of suspect covariates, leave-one-out.
8. Put it into action: policy/rollout, SLOs, monitoring and a retest when drift occurs.

10) Robust practices and verification

Pre-trend checks (for DiD): test/control trends are similar before the intervention.
Placebo/permutation tests: "fictitious dates" or "fictitious groups"; the effect must disappear.
Sensitivity analysis: how strong a hidden confounder would have to be to overturn the result.
Bounds/partial identification: partially identified models → report bounds on the effect.
Multiple testing: BH/Holm adjustments across multiple segments (see the sketch after this list).
External validity: portability of the effect to other markets/channels (meta-analysis).
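A short sketch of multiple-testing adjustment with statsmodels; the per-segment p-values are made up:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical per-segment p-values from a single experiment read-out.
pvalues = np.array([0.001, 0.012, 0.034, 0.045, 0.21, 0.48])

# Benjamini-Hochberg controls the false discovery rate across segments;
# Holm controls the family-wise error rate (stricter).
for method in ("fdr_bh", "holm"):
    reject, adjusted, _, _ = multipletests(pvalues, alpha=0.05, method=method)
    print(method, reject, np.round(adjusted, 3))
```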

11) Effect Reporting Metrics

Absolute effect: Δ in units (pp, currency units, minutes).
Relative effect: % of the baseline.
NNT/NNH: how many objects need to be treated to obtain one outcome/harm.
Cost-effectiveness: effect per unit of cost; used to prioritize budgets.
Uplift@k/Qini/AUUC: for targeted interventions (see the sketch after this list).
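A sketch of uplift@k on a scored hold-out from a randomized campaign; the scores, base rate, and lift pattern are all simulated assumptions:

```python
import numpy as np

def uplift_at_k(scores, treated, outcome, k=0.1):
    """Uplift in the top-k fraction ranked by predicted uplift:
    response rate of treated minus response rate of controls in that slice."""
    order = np.argsort(-scores)
    top = order[: max(1, int(len(scores) * k))]
    t, y = treated[top], outcome[top]
    return y[t == 1].mean() - y[t == 0].mean()

# Hypothetical scored hold-out: random assignment, model score per user.
rng = np.random.default_rng(5)
n = 20_000
scores = rng.random(n)                               # model's predicted uplift
treated = rng.integers(0, 2, size=n)
# Responders: base rate 5%, plus extra lift for treated high-score users.
p = 0.05 + 0.10 * treated * (scores > 0.8)
outcome = rng.binomial(1, p)

print(uplift_at_k(scores, treated, outcome, k=0.1))  # ~0.10 in the top decile
```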

12) Causality in ML practice

Causal features: do not always improve prediction accuracy, but are better suited to policies.
Causal Forest/meta-learners (T/X/S-learner): CATE estimates and individual uplift (a T-learner sketch follows this list).
Counterfactual fairness: fairness of models that accounts for causal paths; blocking "unfair" paths.
Do vs. predict: distinguish "predict Y" from "what happens if we do X." The latter requires causal models/simulators.
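A minimal T-learner sketch using scikit-learn on simulated randomized data; the features, effect pattern, and model choice are assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(6)

# Hypothetical randomized data with a heterogeneous treatment effect.
n = 20_000
X = rng.normal(size=(n, 3))
treat = rng.integers(0, 2, size=n)
tau = 1.0 + 2.0 * (X[:, 0] > 0)       # true CATE depends on the first feature
y = X[:, 1] + tau * treat + rng.normal(size=n)

# T-learner: fit separate outcome models for treated and control,
# then score CATE as the difference of their predictions.
m1 = GradientBoostingRegressor().fit(X[treat == 1], y[treat == 1])
m0 = GradientBoostingRegressor().fit(X[treat == 0], y[treat == 0])
cate = m1.predict(X) - m0.predict(X)

print(cate[X[:, 0] > 0].mean(), cate[X[:, 0] <= 0].mean())  # ~3.0 vs ~1.0
```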

13) Causal Checklist

  • The question is framed as the effect of an intervention/policy
  • A DAG has been built and agreed upon; the minimal covariate set (back-door) has been selected
  • A design has been selected (RCT/quasi-experiment) and its key assumptions tested
  • Point-in-time data; leakage excluded; calendar/seasonality taken into account
  • The effect and confidence intervals have been calculated; robustness checks performed
  • Effect heterogeneity (CATE) and risks (guardrails) assessed
  • Value quantified (ROI, NNT/NNH, cost of errors)
  • Implementation and monitoring plan; retest criteria

14) Mini glossary

Back-door/Front-door: criteria for selecting covariates for effect identification.
IV (instrumental variable): "lever" changing treatment but not outcome directly.
DiD: difference in before/after changes between groups.
RDD: effect estimate near the rule threshold.
Synthetic Control: control as a weighted combination of donors.
HTE/CATE: heterogeneous/conditional effect by segment.
Uplift: the expected increase from the impact, not the probability of an event.


Conclusion

Correlations help you find hypotheses; causality helps you make decisions. Build a DAG, choose an appropriate design (experiment or quasi-experiment), test assumptions and robustness, measure heterogeneous effects, and turn your conclusions into policy with guardrails and monitoring. Then analytics stops being merely "about associations" and becomes an engine of change.
