Materialized Views
A Materialized View (MV) is a physically stored query result (aggregation/projection) that is periodically or continuously updated and available for quick reads. In fact, this is "pre-calculated" data with controlled freshness and cost of reading.
The main goals are:- Stabilize reading latency (p95/p99).
- Unload hot OLTP tables.
- Give a predictable SLA for analytics, APIs and features (recommendations, counters, directories).
1) When to use MV (and when not)
Fit:- Frequently repeated heavy requests (join/agg/window) with a valid update delay.
- CQRS/product projections: dashboards, catalogs, ranked lists, counters.
- Multi-region reads: "local" copies of totals.
- Ultra-strict relevance "per record" without compensation logic → better indexes/OLTP + cache/streaming.
- Complex transactional invariants do not replace transactions when writing MV →.
2) MV vs cache vs projection
Cache: "response copy," managed by TTL/disability at the application level; no schema.
MV: "copy of data," managed by DBMS/engine; there is a scheme, indexes, transactionality refresh.
Projection (event sourcing/CQRS): computed from events; often implemented as a table + incremental updates (that is, essentially "manual MV").
3) Update methods
3. 1 Batch REFRESH (periodic)
Scheduler (cron/skeduler): 'REFRESH MATERIALIZED VIEW...'.
Pros: Simple, predictable, cheap. Cons: stale windows.
3. 2 Incremental refresh
Deltas by keys/time window, upserts in MV.
Source of changes: CDC (Debezium, logical replication), streaming (Kafka/Flink/Spark), triggers.
Pros: low latency and cost. Cons: more complex code and consistency.
3. 3 Streaming MV
In column/streaming ICEs: materialized streams/tables (ClickHouse/Kafka, Flink SQL, Materialize, BigQuery MV).
Pros: Seconds and below. Cons: requires stream infra and clear keys/watermarks.
4) Consistency and "freshness"
Strong consistency of MV occurs with the "atomic" refresh (read-switch to the new version).
More often - bounded staleness: "not older than Δ t/window." Communicate this in API/UX contracts.
For payments/strict invariants, keep the CP core in OLTP and use the MV as a read-plane.
5) Modeling and layout
Make MV narrow in purpose: one task - one MV.
Store temporary keys (event_time/watermark) and business keys (tenant_id entity_id).
Indexes for frequent filters/sorting; column DBMS - for aggregates/scans.
Participation by date/tenant/region for quick refresh and retention.
6) Incremental updates: upsert-projection pattern
1. Change arrives (CDC/event).
2. Consider the delta for the MV string (recompute/merge).
3. 'UPSERT' by key ('tenant _ id, entity_id, bucket').
4. Updating freshness metadata.
Idempotence is mandatory: delta repetition should not break the bottom line.
7) Examples (conceptually)
PostgreSQL (batch refresh)
sql
CREATE MATERIALIZED VIEW mv_sales AS
SELECT date_trunc('day', created_at) AS day,
tenant_id,
SUM(amount) AS revenue,
COUNT() AS orders
FROM orders
GROUP BY 1,2;
-- Быстрые чтения
CREATE INDEX ON mv_sales (tenant_id, day);
-- Без блокировок чтения при обновлении
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_sales;
ClickHouse (streaming MV из Kafka)
sql
CREATE TABLE events_kafka (..., ts DateTime, tenant_id String)
ENGINE = Kafka SETTINGS kafka_broker_list='...',
kafka_topic_list='events',
kafka_format='JSONEachRow';
CREATE MATERIALIZED VIEW mv_agg
ENGINE = AggregatingMergeTree()
PARTITION BY toDate(ts)
ORDER BY (tenant_id, toStartOfMinute(ts)) AS
SELECT tenant_id,
toStartOfMinute(ts) AS bucket,
sumState(amount) AS revenue_state
FROM events_kafka
GROUP BY tenant_id, bucket;
BigQuery MV (auto-update)
sql
CREATE MATERIALIZED VIEW dataset.mv_top_products
AS SELECT product_id, SUM(amount) AS revenue
FROM dataset.orders
WHERE _PARTITIONDATE BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE()
GROUP BY product_id;
8) Freshness in interfaces/contracts
Return'X-Data-Freshness: <seconds> '/field'as _ of'.
For critical screens - "update button" and "updated N from back" badge.
In the API, specify the SLO of freshness (e.g. p95 ≤ 60 s).
9) Multi-tenant and regions
The'tenant _ id'key in MV is required.
Fairness: quotas for refresh/stream by tenant; sheduling large MVs at night per tenant.
Residency: MV lives in the same region as the primary data; cross-region - aggregates only.
10) Observability
Metrics:- `freshness_age_ms` (p50/p95/p99), `refresh_latency_ms`, `rows_processed/s`, `refresh_errors`.
- MV/lot size, storage overhead.
- For streaming: connector lag, "water" (watermark), share of late events.
- Теги: `mv_name`, `tenant_id`, `partition`, `refresh_id`, `delta_size`.
- Reports on "recounts" and files for reasons (schema mismatch, timeout).
11) Testing and chaos
Correctness: comparison of MV vs source on subsamples; checksums.
Freshness under load: write load + SLO freshness guarantee.
Schema evolution: adding/renaming fields, CDC connector drop.
Late/Out-of-order: event replays, changing watermarks.
Idempotency: redelivery of deltas/batches.
12) Retention and cost
Store only the windows you need (for example, 90 days); archive old parties.
Regular vacuumization/merge (by engine).
Reduce MV to specific APIs/pages, avoiding the "universal monster."
13) Safety and compliance
Inherit source access policies (RLS/ACL) - do not distribute MVs wider than source tables.
Mask the PII when building MVs, especially for analytics/logs.
Audit refresh/redrives.
14) Typical errors
"One huge MV for everything →" expensive refresh and weak isolation.
Lack of indexes/parties → p99 jumps, refresh strangles the cluster.
Full recalculation instead of deltas where you can incrementally.
Undeclared freshness in API/UX → user complaints about "outdated" data.
Ignoring schema evolution/CDC errors → loss of consistency.
Trying to replace MV with transactions: MV is about reads, not about strict write operations.
15) Quick recipes
Product dashboard: MV by minute/hour buckets, refresh on schedule + on-demand for VIP, p95 freshness ≤ 60 s.
Directory/search: incremental projection from CDC (upsert), indexes by filters, lag ≤ 5-15 s.
Financial reporting: batch MVs with atomic 'REFRESH CONCURRENTLY', checksums, "as_of" in responses.
Global SaaS: regional MVs, aggregation cross-regionally asynchronous.
16) Pre-sale checklist
- Fresh SLA (Δt/p95) defined and reflected in API/UX.
- Selected mode: batch refresh/incremental/streaming; sources (CDC/events) are described.
- MV is designed "by task," there are indexes and partitions, storage is limited to a window.
- The identity of the upsert/aggregates is confirmed by tests; late/out-of-order processing.
- Observability: freshness/lag metrics, alerts, tracking refresh.
- Playbooks: recalculation of the part, redraw after a connector failure, evolution of the scheme.
- Access and PII correspond to the source; audit enabled.
- Cost under control: retention, compression, refresh window time.
- Documentation: what is true in MV, what is a derived layer, business expectations.
Conclusion
Materialized representations are an engineering trade-off between reading speed and relevance. With clear SLA freshness, correct scheme, incremental update and normal telemetry, MV turn heavy requests into predictable milliseconds - without sacrificing reliability and cost control.