Materialized Views

A Materialized View (MV) is a physically stored query result (aggregation/projection) that is periodically or continuously updated and available for fast reads. In essence, it is "pre-computed" data with controlled freshness and read cost.

The main goals are:
  • Stabilize reading latency (p95/p99).
  • Offload hot OLTP tables.
  • Give a predictable SLA for analytics, APIs and features (recommendations, counters, directories).

1) When to use MV (and when not)

Fit:
  • Frequently repeated heavy queries (join/agg/window) where some update delay is acceptable.
  • CQRS/product projections: dashboards, catalogs, ranked lists, counters.
  • Multi-region reads: "local" copies of totals.
Not suitable:
  • Strict per-record freshness with no room for compensation logic → better served by OLTP indexes + cache/streaming.
  • Complex transactional invariants → an MV does not replace transactions on the write path.

2) MV vs cache vs projection

Cache: a "response copy," managed by TTL/invalidation at the application level; no schema.
MV: a "data copy," managed by the DBMS/engine; it has a schema, indexes, and transactional refresh.
Projection (event sourcing/CQRS): computed from events; often implemented as a table + incremental updates (essentially a "manual MV").

3) Update methods

3.1 Batch REFRESH (periodic)

A scheduler (cron or similar) runs `REFRESH MATERIALIZED VIEW ...`.
Pros: simple, predictable, cheap. Cons: staleness windows.
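The batch approach reduces to a loop around the refresh statement. A minimal sketch, where `run_sql` is a hypothetical stub standing in for a real DB driver call (e.g. psycopg for PostgreSQL):

```python
import time

executed: list[str] = []

def run_sql(statement: str) -> None:
    # Hypothetical stub: a real deployment would execute this through a DB driver.
    executed.append(statement)

def batch_refresh_loop(mv_name: str, period_s: float, iterations: int = 3) -> None:
    # Staleness is bounded by period_s plus the duration of the refresh itself.
    for _ in range(iterations):
        run_sql(f"REFRESH MATERIALIZED VIEW CONCURRENTLY {mv_name}")
        time.sleep(period_s)

batch_refresh_loop("mv_sales", period_s=0.0)  # period 0 only to keep the demo instant
```

In production the loop is usually a cron job or scheduler task, and `period_s` is the staleness budget you promised in the SLA.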

3.2 Incremental refresh

Deltas by keys/time window, upserted into the MV.
Sources of changes: CDC (Debezium, logical replication), streaming (Kafka/Flink/Spark), triggers.
Pros: low latency and cost. Cons: more complex code and consistency handling.

3.3 Streaming MV

In columnar/streaming engines: materialized streams/tables (ClickHouse + Kafka, Flink SQL, Materialize, BigQuery MV).
Pros: latency of seconds and below. Cons: requires streaming infra and well-defined keys/watermarks.

4) Consistency and "freshness"

Strong consistency of an MV requires an "atomic" refresh (readers switch to the new version in one step).
More often the guarantee is bounded staleness: "no older than Δt / one window." Communicate this in API/UX contracts.
For payments and strict invariants, keep the CP core in OLTP and use the MV only as a read plane.
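The "atomic refresh" idea can be illustrated with a double-buffer swap: the new version is built fully off to the side, then a single pointer switch makes it visible. This is an in-memory sketch only; real engines achieve the same effect with table renames or MVCC:

```python
import threading

class AtomicMV:
    """Double-buffered snapshot: readers see a complete version, never a partial one."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._snapshot: dict[str, int] = {}

    def read(self, key: str) -> int:
        return self._snapshot.get(key, 0)

    def refresh(self, rebuild) -> None:
        new_snapshot = rebuild()           # build the full new version aside
        with self._lock:
            self._snapshot = new_snapshot  # switch readers atomically

mv = AtomicMV()
mv.refresh(lambda: {"revenue:acme": 120})
```

Between refreshes, readers observe a consistent but possibly stale snapshot, which is exactly the bounded-staleness contract described above.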

5) Modeling and layout

Make each MV narrow in purpose: one task, one MV.
Store time keys (event_time/watermark) and business keys (tenant_id, entity_id).
Add indexes for frequent filters/sorts; use a columnar DBMS for aggregates/scans.
Partition by date/tenant/region for fast refresh and retention.

6) Incremental updates: upsert-projection pattern

1. A change arrives (CDC/event).
2. Compute the delta for the affected MV row (recompute/merge).
3. `UPSERT` by key (`tenant_id, entity_id, bucket`).
4. Update the freshness metadata.

Idempotence is mandatory: replaying a delta must not corrupt the result.
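The steps above can be sketched as an idempotent upsert projection. The keys and fields (`tenant_id`, `entity_id`, `bucket`, `amount`) follow the text; deduplicating by a hypothetical `event_id` is one common way to make redelivery safe:

```python
from dataclasses import dataclass, field

@dataclass
class Projection:
    rows: dict = field(default_factory=dict)   # (tenant_id, entity_id, bucket) -> revenue
    seen: set = field(default_factory=set)     # processed event ids, for idempotence
    freshness: str = ""                        # metadata: last applied event time

    def apply(self, event: dict) -> None:
        if event["event_id"] in self.seen:     # redelivered delta: no-op
            return
        key = (event["tenant_id"], event["entity_id"], event["bucket"])
        self.rows[key] = self.rows.get(key, 0) + event["amount"]  # merge delta (UPSERT)
        self.seen.add(event["event_id"])
        self.freshness = event["event_time"]   # update freshness metadata

p = Projection()
e = {"event_id": "e1", "tenant_id": "t1", "entity_id": "x",
     "bucket": "2024-01-01", "amount": 50, "event_time": "2024-01-01T00:00:05Z"}
p.apply(e)
p.apply(e)  # redelivery: result unchanged
```

A real implementation would bound the `seen` set (e.g. by offset watermark or a TTL), since storing every event id forever does not scale.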

7) Examples (conceptually)

PostgreSQL (batch refresh)

sql
CREATE MATERIALIZED VIEW mv_sales AS
SELECT date_trunc('day', created_at) AS day,
       tenant_id,
       SUM(amount) AS revenue,
       COUNT(*)    AS orders
FROM orders
GROUP BY 1, 2;

-- Fast reads; REFRESH ... CONCURRENTLY also requires a unique index
CREATE UNIQUE INDEX ON mv_sales (tenant_id, day);

-- No read locks during the refresh
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_sales;

ClickHouse (streaming MV from Kafka)

sql
CREATE TABLE events_kafka (..., ts DateTime, tenant_id String)
ENGINE = Kafka SETTINGS kafka_broker_list='...',
kafka_topic_list='events',
kafka_format='JSONEachRow';

CREATE MATERIALIZED VIEW mv_agg
ENGINE = AggregatingMergeTree()
PARTITION BY toDate(bucket)
ORDER BY (tenant_id, bucket) AS
SELECT tenant_id,
       toStartOfMinute(ts) AS bucket,
       sumState(amount) AS revenue_state
FROM events_kafka
GROUP BY tenant_id, bucket;

BigQuery MV (auto-update)

sql
CREATE MATERIALIZED VIEW dataset.mv_top_products
AS SELECT product_id, SUM(amount) AS revenue
FROM dataset.orders
WHERE _PARTITIONDATE BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE()
GROUP BY product_id;

8) Freshness in interfaces/contracts

Return an `X-Data-Freshness: <seconds>` header or an `as_of` field.
For critical screens: a manual "refresh" button and an "updated N s ago" badge.
In the API, state a freshness SLO (e.g. p95 ≤ 60 s).
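Computing these contract fields is straightforward; a minimal sketch (the header name `X-Data-Freshness` comes from the text above and is a convention, not a standard):

```python
from datetime import datetime, timezone

def freshness_headers(last_refresh: datetime, now: datetime) -> dict:
    # Age in whole seconds since the MV was last refreshed,
    # plus the refresh timestamp itself as "as_of".
    age_s = int((now - last_refresh).total_seconds())
    return {"X-Data-Freshness": str(age_s), "as_of": last_refresh.isoformat()}

h = freshness_headers(
    datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 12, 0, 42, tzinfo=timezone.utc),
)
```

An alerting rule on the same quantity (age exceeding the SLO) closes the loop between the contract and observability.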

9) Multi-tenant and regions

A `tenant_id` key in the MV is mandatory.
Fairness: per-tenant quotas for refresh/streams; schedule large per-tenant MVs at night.
Residency: MV lives in the same region as the primary data; cross-region - aggregates only.

10) Observability

Metrics:
  • `freshness_age_ms` (p50/p95/p99), `refresh_latency_ms`, `rows_processed/s`, `refresh_errors`.
  • MV/lot size, storage overhead.
  • For streaming: connector lag, watermark progress, share of late events.
Logs/tracing:
  • Tags: `mv_name`, `tenant_id`, `partition`, `refresh_id`, `delta_size`.
  • Reports on recomputations and failures with reasons (schema mismatch, timeout).

11) Testing and chaos

Correctness: compare MV vs source on subsamples; checksums.
Freshness under load: apply write load and verify the freshness SLO still holds.
Schema evolution: adding/renaming fields, CDC connector drops.
Late/out-of-order: event replays, shifting watermarks.
Idempotency: redelivery of deltas/batches.
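The correctness check above (MV vs source) can be sketched as recomputing the same aggregation from raw rows and comparing. Field names follow the earlier `mv_sales` example:

```python
from collections import defaultdict

def recompute_from_source(orders: list[dict]) -> dict:
    # Ground truth: the same (tenant_id, day) -> revenue aggregation
    # the MV claims to store, rebuilt directly from source rows.
    agg: dict = defaultdict(int)
    for o in orders:
        agg[(o["tenant_id"], o["day"])] += o["amount"]
    return dict(agg)

def mv_matches_source(mv_rows: dict, orders: list[dict]) -> bool:
    return mv_rows == recompute_from_source(orders)

orders = [
    {"tenant_id": "t1", "day": "2024-01-01", "amount": 30},
    {"tenant_id": "t1", "day": "2024-01-01", "amount": 70},
]
ok = mv_matches_source({("t1", "2024-01-01"): 100}, orders)
```

On large tables the comparison is run on subsamples or per-partition checksums rather than full scans, as the text suggests.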

12) Retention and cost

Store only the windows you need (for example, 90 days); archive old partitions.
Schedule regular vacuum/merge maintenance (engine-dependent).

Scope each MV to specific APIs/pages; avoid building a "universal monster."
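Window-based retention amounts to dropping buckets older than a cutoff. A minimal sketch, using the 90-day window from the text (a real engine would drop or archive whole partitions instead of filtering rows):

```python
from datetime import date, timedelta

def prune(rows: dict, today: date, keep_days: int = 90) -> dict:
    # Keep only buckets inside the retention window; everything older
    # is a candidate for archiving or partition drop.
    cutoff = today - timedelta(days=keep_days)
    return {day: v for day, v in rows.items() if day >= cutoff}

rows = {date(2024, 1, 1): 10, date(2024, 5, 1): 20}
kept = prune(rows, today=date(2024, 5, 10))
```

Partition-by-date layouts (section 5) make this cheap: pruning is a metadata operation, not a row-by-row delete.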

13) Safety and compliance

Inherit the source's access policies (RLS/ACL): never expose an MV more widely than its source tables.
Mask PII when building MVs, especially for analytics/logs.
Audit refreshes and reprocessing runs.

14) Typical errors

"One huge MV for everything" → expensive refresh and weak isolation.
Missing indexes/partitions → p99 spikes, refresh strangles the cluster.
Full recomputation where incremental deltas would do.
Undeclared freshness in API/UX → user complaints about "stale" data.
Ignoring schema evolution/CDC errors → loss of consistency.
Treating an MV as a transaction substitute: MVs are about reads, not strict write semantics.

15) Quick recipes

Product dashboard: MV with minute/hour buckets, refresh on schedule + on-demand for VIPs, freshness p95 ≤ 60 s.
Directory/search: incremental projection from CDC (upsert), indexes on filters, lag ≤ 5-15 s.
Financial reporting: batch MVs with atomic `REFRESH CONCURRENTLY`, checksums, `as_of` in responses.
Global SaaS: regional MVs; cross-region aggregation is asynchronous.

16) Pre-launch checklist

  • Freshness SLA (Δt/p95) defined and reflected in API/UX.
  • Mode selected: batch refresh / incremental / streaming; sources (CDC/events) described.
  • MV designed per task; indexes and partitions in place; storage limited to a window.
  • Idempotence of upserts/aggregates confirmed by tests; late/out-of-order handling in place.
  • Observability: freshness/lag metrics, alerts, refresh tracking.
  • Playbooks: partition recomputation, replay after a connector failure, schema evolution.
  • Access and PII handling match the source; audit enabled.
  • Cost under control: retention, compression, refresh window duration.
  • Documentation: what the MV is the source of truth for vs. a derived layer; business expectations.

Conclusion

Materialized views are an engineering trade-off between read speed and freshness. With a clear freshness SLA, a correct schema, incremental updates, and solid telemetry, MVs turn heavy queries into predictable milliseconds without sacrificing reliability or cost control.
