Data visualization
Visualization is a way to turn data into decisions. A good chart saves time, reduces cognitive load, and helps readers see real patterns rather than imagined ones. Below is a field guide: from goals and chart selection to design, storytelling, and running dashboards in a product.
1) Goals and audiences
Goals: exploration (EDA), explanation (insight → action), monitoring (dashboards), persuasion (presentations).
Audiences: management (high-level view and trends), product/marketing (funnels, cohorts), engineers/ML (SLA, drift, model metrics), compliance (risks/controls).
The golden rule: one visualization answers one main question.
2) Chart selection (cheat sheet)
Anti-patterns: 3D charts, dual axes without an obvious need, overloaded legends.
3) Composition and readability
Hierarchy: title → key insight → details.
Grid and spacing: remove unnecessary gridlines; use numeric labels sparingly, where they help.
Fonts: three sizes (title, axes, labels); avoid all caps and tiny type.
Annotations: label peaks and anomalous points, policy and campaign changes.
Dashboard layout: follow a Z- or F-pattern, 3-6 cards per screen, one NSM (North Star Metric) at the top.
4) Color and coding
Meaning of color: categorical data - qualitative palettes; ordinal data - sequential gradients; diverging palettes - for "above/below normal."
Contrast: ratio ≥ 4.5:1 for text; check that palettes are color-blind safe.
Minimal colors: ideally 1 accent + 1-2 supporting colors.
Encoding channels: position/length first, then angle/area; color as reinforcement.
Accent: highlight the main point; keep the rest gray.
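The 4.5:1 threshold for text corresponds to the contrast ratio defined in WCAG 2.x. A minimal sketch of that check follows, assuming colors arrive as `#rrggbb` sRGB hex strings; the function names are illustrative and not taken from any particular library.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color: 0.0 is black, 1.0 is white."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """(lighter + 0.05) / (darker + 0.05); ranges from 1 (same color) to 21."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_text_contrast(foreground: str, background: str, minimum: float = 4.5) -> bool:
    """True when the pair meets the minimum ratio (4.5 by default)."""
    return contrast_ratio(foreground, background) >= minimum
```

Running `passes_text_contrast` over every label/background pair in a palette is a cheap automated gate; it does not replace checking the palette under a color-blindness simulation.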
5) Storytelling
Structure: context → conflict (question/anomaly) → resolution (conclusion/action).
Narrative on the chart: headline title (the insight), subtitle (how to read it), notes (why it matters).
Comparisons: before/after, control/test, YoY/DoD, normalized values.
Units and scales: explicit units, reasonable rounding, zero baseline on bar charts.
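Why bar charts need a zero baseline is easiest to see with arithmetic: compare the ratio of the drawn bar lengths with the ratio in the data. A small worked sketch; the function name and the sample values are illustrative.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of drawn bar lengths for values a and b when the axis starts at `baseline`."""
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (b - baseline) / (a - baseline)


before, after = 98.0, 100.0
true_ratio = apparent_ratio(before, after)                 # axis starts at 0
truncated = apparent_ratio(before, after, baseline=95.0)   # axis starts at 95

print(f"zero baseline: second bar is {true_ratio - 1:.1%} longer")
print(f"axis from 95:  second bar is {truncated - 1:.1%} longer")
```

With these numbers a 2% difference in the data is drawn as a bar roughly 67% longer, which is why a truncated axis on a bar chart needs at least an explicit warning.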
6) Dashboards: from layout to operation
Layers: Executive (1-2 NSM + 3 drivers), Domain (funnels/cohorts), Ops/ML (SLA/drift/alerts).
Filters: time, segments (country/channel/platform), experiments.
Cards: KPI tiles with a trend/sparkline, drill-down on click.
States: empty (no data), error, loading.
Freshness: state the update frequency/lag (e.g. "updated 10 min ago").
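A freshness label like "updated 10 min ago" can be derived from the timestamp of the last successful load. A minimal sketch, assuming timezone-aware UTC timestamps; `freshness_label`, `is_stale`, and the rounding rules are hypothetical choices, not a standard.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def freshness_label(last_updated: datetime, now: Optional[datetime] = None) -> str:
    """Human-readable lag, e.g. 'updated 10 min ago'."""
    now = now or datetime.now(timezone.utc)
    seconds = int((now - last_updated).total_seconds())
    if seconds < 0:
        raise ValueError("last_updated is in the future")
    if seconds < 60:
        return "updated just now"
    if seconds < 3600:
        return f"updated {seconds // 60} min ago"
    if seconds < 86400:
        return f"updated {seconds // 3600} h ago"
    return f"updated {seconds // 86400} d ago"


def is_stale(last_updated: datetime, max_lag: timedelta,
             now: Optional[datetime] = None) -> bool:
    """True when the data is older than the agreed update lag."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated > max_lag
```

Pairing the label with `is_stale` lets a card switch to a visible warning state when the agreed lag is exceeded, instead of silently showing old numbers.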
7) Visualization quality metrics
Time to insight (TTI): seconds to understand "what is happening here."
Cognitive load: number of elements/legends; the goal is to minimize gaze shifts.
Reading accuracy: gap between values read by eye and the actual values.
Usage: clicks/scrolling/saves; which cards lead to decisions.
Trust: the proportion of correct interpretations in a user test.
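Three of these metrics (TTI, reading accuracy, trust) can be computed directly from user-test records. A rough sketch; the record fields (`tti_s`, `read`, `actual`, `correct`) and the choice of median and mean relative error are assumptions made for illustration.

```python
from statistics import median
from typing import Dict, List


def summarize_user_test(sessions: List[dict]) -> Dict[str, float]:
    """Aggregate per-participant results of a chart comprehension test."""
    if not sessions:
        raise ValueError("no sessions to summarize")
    n = len(sessions)
    # Time to insight: the median is less sensitive to one slow participant.
    tti = median(s["tti_s"] for s in sessions)
    # Reading accuracy: mean relative gap between the value read and the real one.
    reading_error = sum(
        abs(s["read"] - s["actual"]) / abs(s["actual"]) for s in sessions
    ) / n
    # Trust: share of participants whose interpretation was correct.
    trust = sum(1 for s in sessions if s["correct"]) / n
    return {"tti_median_s": tti, "reading_error": reading_error, "trust": trust}
```

Tracking these numbers per chart version gives a before/after comparison for redesigns.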
8) Accessibility and localization
Alt texts and descriptive headings.
Colors distinguishable under color blindness; reinforce color with shape/stroke.
Locale-aware numbers and dates; right-to-left layouts and scales for some languages.
Keyboard navigation and screen-reader labels for web dashboards.
9) Anti-patterns
Chartjunk: decorative elements that carry no meaning.
Pie charts with 7+ slices: replace with a bar chart.
Two Y-axes without a clear need: better to normalize or show two panels.
False precision: 12 decimal places, broken (truncated) axes without a warning.
Endless interactivity: it hides the main idea; lead with a static key view.
10) Visualization templates for common data tasks
Cohorts and retention: heatmap/calendar + D7/D30 trend lines.
Funnels: step bar + conversion deltas; annotations of experiments.
ML monitoring: metrics (PR-AUC, Recall@FPR≤x%), calibration (Reliability curve), drift (PSI heatmap), latency p95.
Finance: waterfall (bridge) for factor contributions to GGR/revenue.
Anomalies: line with a confidence band + event/release markers.
Segmentation: small multiples by segment; UMAP scatter colored by segment.
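For the drift heatmap in the ML monitoring template, each cell is a Population Stability Index (PSI) value for one feature in one period. A minimal sketch of the per-cell calculation, assuming pre-agreed bin cut points; the `eps` floor for empty bins and the function names are choices made for this example.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bin_shares(values: Sequence[float], edges: Sequence[float], eps: float) -> List[float]:
    """Share of values per bin; `edges` are sorted interior cut points."""
    if not values:
        raise ValueError("sample is empty")
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    # Floor at eps so empty bins do not cause log(0) or division by zero.
    return [max(c / len(values), eps) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    e = bin_shares(expected, edges, eps)
    a = bin_shares(actual, edges, eps)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Calling `psi(reference_sample, current_sample, edges)` for every feature and period yields the matrix behind the heatmap; identical distributions give 0, and larger values mean stronger drift. Alert thresholds are left out here because they are team-specific.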
11) Tools and stack
Research: notebooks + matplotlib/plotly, ggplot-like grammars.
BI/dashboards: Tableau/Power BI/Looker/Metabase/Superset.
Web front end: D3/Observable, Plotly.js, Vega-Lite; for production widgets - lightweight canvas/WebGL libraries.
Standards: a design system for charts (colors, grids, fonts), template components.
12) Performance and data
Calculate aggregates on the DWH side; lazily load large series.
Downsampling/binning for long series; small multiples instead of one giant heatmap.
Caching popular slices; precompute sparklines.
Limit the number of unique categories (≤ 12 per chart).
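Downsampling a long series can be as simple as averaging fixed-size buckets before the data reaches the chart. A minimal sketch under that assumption (bucket means over an already ordered series); `downsample_mean` is an illustrative name, and other reduction methods exist.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def downsample_mean(points: Sequence[Point], max_points: int) -> List[Point]:
    """Reduce an ordered (x, y) series to at most `max_points` bucket averages."""
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    n = len(points)
    if n <= max_points:
        return list(points)
    size = math.ceil(n / max_points)  # points per bucket
    result = []
    for start in range(0, n, size):
        bucket = points[start:start + size]
        mean_y = sum(y for _, y in bucket) / len(bucket)
        # Keep the x of the first point so the bucket stays on the time axis.
        result.append((bucket[0][0], mean_y))
    return result
```

Averaging smooths out spikes; when peaks matter (latency, anomalies), keeping each bucket's minimum and maximum is a common alternative.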
13) Uncertainty and comparison visualization
Confidence intervals/bands, error bars, box/violin plots for distributions.
Transparency/hatching for plan vs. actual.
Normalize units; for relative changes, index to a base period (t0 = 100).
Do not mix linear and logarithmic scales without explicit explanation.
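Indexing to t0 = 100 rescales each series so that its base period equals 100, which puts series with different units on one axis. A short sketch; the function name and the sample numbers are illustrative.

```python
from typing import List, Sequence


def index_to_base(values: Sequence[float], base_position: int = 0) -> List[float]:
    """Rescale a series so that values[base_position] becomes 100."""
    base = values[base_position]
    if base == 0:
        raise ValueError("cannot index to a zero base value")
    return [100 * v / base for v in values]


revenue = [250.0, 275.0, 300.0]   # e.g. currency units
users = [8000.0, 8400.0, 10000.0]  # e.g. head count

print(index_to_base(revenue))
print(index_to_base(users))
```

After indexing, both series start at 100 and every later point reads as "percent of the base period", so growth rates can be compared directly.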
14) Visualization review and governance
Review checklist: Is the goal clear? Is the chart type correct? Is the legend readable? Are units, source, and update date present?
Glossary: uniform KPI definitions; formula version shown on charts.
Versioning: "dashboard vX," release date, changelog.
Privacy: mask PII; aggregate to a safe level.
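One possible reading of "aggregate to a safe level" is small-group suppression: categories with too few records are pooled so that no bar or cell describes a handful of people. A sketch under that assumption; the threshold of 10 and the names `safe_counts` and `Other` are hypothetical and should follow your own privacy policy.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def safe_counts(records: Iterable[Mapping[str, str]], key: str,
                min_group_size: int = 10, other_label: str = "Other") -> Dict[str, int]:
    """Count records per category, pooling categories below `min_group_size`."""
    counts = Counter(r[key] for r in records)
    result: Dict[str, int] = {}
    pooled = 0
    for value, n in counts.items():
        if n >= min_group_size:
            result[value] = n
        else:
            pooled += n
    # Publish the pooled bucket only if it is itself large enough.
    if pooled >= min_group_size:
        result[other_label] = pooled
    return result
```

The output feeds a chart directly; the function never returns a category smaller than the threshold.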
15) Pre-publication checklist
- Title states the insight, not the chart type
- Axis labels, units, source, and update date are present
- Scales and zero baseline are correct; no misleading axes
- Colors have sufficient contrast and are color-blind safe; legend is minimal
- Annotations of key events/experiments added
- Empty/error states exist and an update SLA is agreed
- Visualization passes the "5-second comprehension test"
Mini Glossary
Small multiples: a series of identical graphs for different segments/periods.
Chartjunk: visual "garbage" that does not carry data.
Diverging palette: a palette with a neutral middle (below/above normal).
Sparklines: miniature trend charts shown alongside KPIs.
Summary
Strong visualization is not about "beautiful charts" but about a clear idea, a well-chosen chart type, disciplined composition and color, an honest depiction of uncertainty, and a tidy dashboard experience. Start with a simple view, emphasize the main point, document definitions, and monitor how the dashboard runs in production: this is how visualization becomes a management tool rather than decoration.