
Materialized Views

A materialized view (MV) is a physically stored query result (an aggregation or projection) that is refreshed periodically or continuously and is available for fast reads. In effect, it is pre-computed data with controlled freshness and read cost.

The main goals are:
  • Stabilize read latency (p95/p99).
  • Offload hot OLTP tables.
  • Provide a predictable SLA for analytics, APIs, and product features (recommendations, counters, catalogs).

1) When to use MV (and when not)

Good fit:
  • Frequently repeated heavy queries (joins/aggregations/window functions) where some update delay is acceptable.
  • CQRS/product projections: dashboards, catalogs, ranked lists, counters.
  • Multi-region reads: "local" copies of totals.
Poor fit:
  • Ultra-strict per-record freshness with no room for compensation logic → better served by OLTP indexes plus cache/streaming.
  • Complex transactional invariants → an MV does not replace transactions on the write path.

2) MV vs cache vs projection

Cache: "response copy," managed by TTL/disability at the application level; no schema.
MV: "copy of data," managed by DBMS/engine; there is a scheme, indexes, transactionality refresh.
Projection (event sourcing/CQRS): computed from events; often implemented as a table + incremental updates (that is, essentially "manual MV").


3) Update methods

3.1 Batch REFRESH (periodic)

A scheduler (cron or a job runner) executes `REFRESH MATERIALIZED VIEW ...`.
Pros: simple, predictable, cheap. Cons: staleness windows.

3.2 Incremental refresh

Compute deltas by key/time window and upsert them into the MV.
Sources of change: CDC (Debezium, logical replication), streaming (Kafka/Flink/Spark), triggers.
Pros: low latency and cost. Cons: more complex code and consistency handling.

3.3 Streaming MV

In columnar/streaming engines: materialized streams/tables (ClickHouse + Kafka, Flink SQL, Materialize, BigQuery MV).
Pros: latency of seconds or less. Cons: requires streaming infrastructure and well-defined keys/watermarks.


4) Consistency and "freshness"

Strong MV consistency requires an "atomic" refresh: reads switch to the new version in one step, as in the sketch below.
More often you get bounded staleness: "no older than Δt / one window." Communicate this in API/UX contracts.
For payments and other strict invariants, keep the CP core in OLTP and use the MV as a read plane.
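
A minimal sketch of the atomic-switch pattern in PostgreSQL, assuming the MV is maintained as a plain table (all names here are hypothetical): build the new version aside, then swap names in one transaction so readers never observe a partial refresh.

```sql
-- Build the next version off to the side (readers still see the old one).
CREATE TABLE sales_rollup_next AS
SELECT date_trunc('day', created_at) AS day,
       tenant_id,
       SUM(amount) AS revenue
FROM orders
GROUP BY 1, 2;

-- Swap names in one transaction: renames are metadata-only,
-- so readers switch to the complete new version atomically.
BEGIN;
ALTER TABLE sales_rollup RENAME TO sales_rollup_prev;
ALTER TABLE sales_rollup_next RENAME TO sales_rollup;
COMMIT;

DROP TABLE sales_rollup_prev;
```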


5) Modeling and layout

Keep each MV narrow in purpose: one task, one MV.
Store time keys (event_time/watermark) and business keys (tenant_id, entity_id).
Add indexes for frequent filters/sorts; use a columnar DBMS for aggregates/scans.
Partition by date/tenant/region for fast refresh and retention (see the sketch below).
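
A minimal layout sketch in PostgreSQL (hypothetical names), assuming the projection is maintained as a plain table, since native PostgreSQL materialized views cannot be partitioned:

```sql
-- Projection table partitioned by day: refresh and retention touch
-- only the affected partitions.
CREATE TABLE proj_daily_revenue (
    tenant_id  text        NOT NULL,
    entity_id  text        NOT NULL,
    day        date        NOT NULL,
    revenue    numeric     NOT NULL DEFAULT 0,
    updated_at timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (tenant_id, entity_id, day)   -- must include the partition key
) PARTITION BY RANGE (day);

CREATE TABLE proj_daily_revenue_2024_06
    PARTITION OF proj_daily_revenue
    FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');

-- Index for the most common filter/sort pattern.
CREATE INDEX ON proj_daily_revenue (tenant_id, day DESC);
```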


6) Incremental updates: upsert-projection pattern

1. A change arrives (CDC/event).
2. Compute the delta for the affected MV row (recompute/merge).
3. `UPSERT` by key (`tenant_id, entity_id, bucket`).
4. Update the freshness metadata.

Idempotence is mandatory: replaying the same delta must not corrupt the totals (see the sketch below).
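
A minimal sketch in PostgreSQL, reusing the hypothetical proj_daily_revenue table from section 5. The trick that buys idempotence: recompute the affected bucket from the source and overwrite the MV row, instead of accumulating deltas, so replays converge to the same value.

```sql
-- Recompute the changed bucket from the source and overwrite the MV row.
-- Replaying the same event re-derives the same value: idempotent by design.
-- Caveat: deletions that empty a bucket need separate cleanup.
INSERT INTO proj_daily_revenue (tenant_id, entity_id, day, revenue, updated_at)
SELECT tenant_id,
       entity_id,
       date_trunc('day', created_at)::date AS day,
       SUM(amount)                         AS revenue,
       now()                               AS updated_at
FROM orders
WHERE tenant_id = $1                            -- keys taken from the CDC event
  AND entity_id = $2
  AND created_at >= $3 AND created_at < $4      -- the affected day window
GROUP BY 1, 2, 3
ON CONFLICT (tenant_id, entity_id, day)
DO UPDATE SET revenue    = EXCLUDED.revenue,    -- overwrite, never accumulate
              updated_at = EXCLUDED.updated_at;
```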


7) Examples (conceptually)

PostgreSQL (batch refresh)

```sql
CREATE MATERIALIZED VIEW mv_sales AS
SELECT date_trunc('day', created_at) AS day,
       tenant_id,
       SUM(amount) AS revenue,
       COUNT(*)    AS orders
FROM orders
GROUP BY 1, 2;

-- Fast reads; REFRESH ... CONCURRENTLY also requires a unique index
CREATE UNIQUE INDEX ON mv_sales (tenant_id, day);

-- Refresh without blocking readers
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_sales;
```

ClickHouse (streaming MV from Kafka)

```sql
CREATE TABLE events_kafka (..., ts DateTime, tenant_id String)
ENGINE = Kafka
SETTINGS kafka_broker_list = '...',
         kafka_topic_list  = 'events',
         kafka_group_name  = 'mv_agg_consumer',  -- consumer group (required; name is illustrative)
         kafka_format      = 'JSONEachRow';

CREATE MATERIALIZED VIEW mv_agg
ENGINE = AggregatingMergeTree()
PARTITION BY toDate(bucket)
ORDER BY (tenant_id, bucket)
AS SELECT tenant_id,
          toStartOfMinute(ts) AS bucket,
          sumState(amount)    AS revenue_state
FROM events_kafka
GROUP BY tenant_id, bucket;
```

BigQuery MV (auto-update)

```sql
CREATE MATERIALIZED VIEW dataset.mv_top_products AS
SELECT product_id, SUM(amount) AS revenue
FROM dataset.orders
WHERE _PARTITIONDATE BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
                         AND CURRENT_DATE()
GROUP BY product_id;
```

8) Freshness in interfaces/contracts

Return an `X-Data-Freshness: <seconds>` header and/or an `as_of` field.
For critical screens, add a manual refresh button and an "updated N ago" badge.
In the API, state the freshness SLO (e.g., p95 ≤ 60 s).


9) Multi-tenant and regions

A `tenant_id` key in the MV is mandatory.
Fairness: per-tenant quotas for refresh/stream work; schedule large MVs off-peak, per tenant.
Residency: the MV lives in the same region as the primary data; cross-region, ship aggregates only.


10) Observability

Metrics:
  • `freshness_age_ms` (p50/p95/p99), `refresh_latency_ms`, `rows_processed/s`, `refresh_errors` (see the probe sketch below).
  • MV/batch size, storage overhead.
  • For streaming: connector lag, watermark progress, share of late events.
Logs/tracing:
  • Tags: `mv_name`, `tenant_id`, `partition`, `refresh_id`, `delta_size`.
  • Report recomputations and failures with reasons (schema mismatch, timeout).
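
A minimal freshness probe in PostgreSQL over the hypothetical proj_daily_revenue table from section 5; export the result as `freshness_age_ms` and alert when it breaches the SLO:

```sql
-- Age of the newest MV row per tenant, in milliseconds.
SELECT tenant_id,
       EXTRACT(EPOCH FROM (now() - max(updated_at))) * 1000 AS freshness_age_ms
FROM proj_daily_revenue
GROUP BY tenant_id;
```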

11) Testing and chaos

Correctness: compare MV vs source on subsamples; checksums (see the sketch below).
Freshness under load: write load plus verification of the freshness SLO.
Schema evolution: adding/renaming fields, CDC connector failure.
Late/out-of-order data: event replays, shifting watermarks.
Idempotency: redelivery of deltas/batches.
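
A minimal correctness check in PostgreSQL for the mv_sales example from section 7: recompute the aggregate from the source and report any buckets that disagree (an empty result means the MV matches).

```sql
-- Rows where the MV and a fresh aggregation of the source disagree.
SELECT tenant_id,
       day,
       m.revenue AS mv_revenue,
       s.revenue AS src_revenue
FROM mv_sales m
FULL JOIN (
    SELECT tenant_id,
           date_trunc('day', created_at) AS day,
           SUM(amount) AS revenue
    FROM orders
    GROUP BY 1, 2
) s USING (tenant_id, day)
WHERE m.revenue IS DISTINCT FROM s.revenue;   -- NULL-safe comparison
```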


12) Retention and cost

Store only the windows you need (for example, 90 days); archive or drop old partitions (see the sketch below).
Run regular vacuuming/merges (engine-specific).

Scope each MV to specific APIs/pages; avoid building a "universal monster."
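
A minimal retention sketch in PostgreSQL for the hypothetical partitioned proj_daily_revenue table from section 5: detaching and dropping a partition is near-instant, unlike a bulk DELETE.

```sql
-- Retire a month that has aged out of the retention window.
ALTER TABLE proj_daily_revenue DETACH PARTITION proj_daily_revenue_2024_06;
DROP TABLE proj_daily_revenue_2024_06;
```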


13) Security and compliance

Inherit the source's access policies (RLS/ACL); never expose an MV more widely than its source tables (see the sketch below).
Mask PII when building MVs, especially for analytics/logs.
Audit refreshes and replays.
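
A minimal RLS sketch in PostgreSQL (hypothetical names), assuming the projection is a plain table, since native PostgreSQL materialized views do not support row-level security, and assuming the tenant id is carried in a session setting:

```sql
-- Tenants can read only their own rows from the projection.
ALTER TABLE proj_daily_revenue ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON proj_daily_revenue
    FOR SELECT
    USING (tenant_id = current_setting('app.tenant_id'));
```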


14) Typical errors

"One huge MV for everything →" expensive refresh and weak isolation.
Lack of indexes/parties → p99 jumps, refresh strangles the cluster.
Full recalculation instead of deltas where you can incrementally.
Undeclared freshness in API/UX → user complaints about "outdated" data.
Ignoring schema evolution/CDC errors → loss of consistency.
Trying to replace MV with transactions: MV is about reads, not about strict write operations.


15) Quick recipes

Product dashboard: MV with minute/hour buckets, scheduled refresh plus on-demand for VIPs, p95 freshness ≤ 60 s.
Catalog/search: incremental projection from CDC (upsert), indexes on the filters, lag ≤ 5-15 s.
Financial reporting: batch MVs with atomic `REFRESH ... CONCURRENTLY`, checksums, `as_of` in responses.
Global SaaS: regional MVs; cross-region aggregation is asynchronous.


16) Pre-release checklist

  • Freshness SLA (Δt/p95) defined and reflected in the API/UX.
  • Mode selected: batch refresh/incremental/streaming; sources (CDC/events) described.
  • MV designed per task; indexes and partitions in place; storage bounded to a window.
  • Idempotence of upserts/aggregates confirmed by tests; late/out-of-order handling in place.
  • Observability: freshness/lag metrics, alerts, refresh tracing.
  • Playbooks: partial recomputation, replay after a connector failure, schema evolution.
  • Access and PII handling match the source; audit enabled.
  • Cost under control: retention, compression, refresh window timing.
  • Documentation: what the MV holds, that it is a derived layer, and what the business should expect.

Conclusion

Materialized views are an engineering trade-off between read speed and freshness. With a clear freshness SLA, a correct schema, incremental updates, and solid telemetry, MVs turn heavy queries into predictable milliseconds without sacrificing reliability or cost control.
