Optimization of analytical queries
1) Why optimize (iGaming context)
Business speed: GGR/NET reports, providers/games, RG/AML and marketing in p95 SLA.
Cost: less scanned bytes and shafl → below $/request.
Reliability: stable peak hours, no BI freezes.
Scale: Dozens of brands/markets, billions of lines, minutes of freshness.
2) Load profile and SLO
Describe the "first 90%" of requests: windows (7/28/90d), filters ('brand, country, provider, psp, status'), join's, JSON attributes, top K and percentiles.
SLO examples: p95 ≤ 1. 2 s for dashboard, scanned bytes ≤ 256 MB/request, freshness ≤ 5 min.
3) Anatomy of plans: what to look for
Predict/Projection pushdown - Filters and column list are omitted to the source.
Partition pruning & data skipping (min-max/bloom/manifest).
Vectorized scan/late materialization: column reads deferred by JOIN/PROJECT.
Join strategy: Broadcast Hash (BHJ), Sort-Merge (SMJ), Nested Loop (NLJ — избегать).
Spill & shuffle: The volume of shuffling and spilling onto disk is SLA's main enemy.
Adaptive query execution: change of strategy in runtime (BHJ↔SMJ switching, dynamic coales).
The plan should show: how many bytes we read, where is the shaflim, what we cache.
4) Parties, sorting, cluster cases
Parties: by 'date' + 1-2 access dimensions (for example, 'brand, country').
Sorting/clustering: 'ORDER BY/CLUSTER BY/Z-order' by frequent filters/joins ('provider, game_id, occurred_at').
Reclassification and compaction: regular transfer for data skipping; Target file size is 128-1024 MB.
5) JOIN patterns
Broadcast Hash Join (BHJ): small dimension (≤ hundreds of MB) → broadcast to fact.
sql
/ hint if engine supports/
SELECT /+ BROADCAST(dim_provider) /...
Sort-Merge Join (SMJ): large sets, compatible key sorting/cluster cases → minimal shaft.
Pre-join/denormalization: move stable attributes from 'dim _' to the actual snapshot (projection/materialized view) - minus JOIN on the critical path.
Anti/semijoins: rewrite 'NOT IN/EXISTS' into explicit semi-/anti-join plans.
Elimination of a cardinal explosion: check duplicate keys in dimensions, use surrogate-keys.
6) GROUP BY, aggregates and preaggregations
Rollup/Cube/Grouping Sets: one phase instead of multiple aggregations.
sql
SELECT brand, country, DATE(ts) d, SUM(amount)
FROM gold. payments
WHERE ts >= NOW() - INTERVAL '7 days'
GROUP BY GROUPING SETS ((brand,country,d),(brand,d),(d));
Materialized views (MV )/projections: 'payments _ 7d _ by _ brand _ psp', 'rounds _ 1d _ by _ provider _ game'.
Partial → Final aggregation: allow the engine to aggregate partially on workers (local) and finally on the coordinator.
Approximate: HLL for 'COUNT (DISTINCT user)', TDigest for percentiles - multiple cheaper and enough for BI.
7) Window functions (neat)
PARTITION BY exactly on keys with high selectivity; ORDER BY - by column sorting.
Replace heavy windows with preaggregates and semi-joins where possible.
sql
-- Instead of window distinct
SELECT brand, COUNT() users
FROM (SELECT DISTINCT brand, user_id FROM gold. sessions WHERE d>=CURRENT_DATE-7) t
GROUP BY brand;
8) Filters, pagination and TOP-K
Filter order is not important for CBO, but selectivity and indexes/sorting are.
LIMIT … WITH TIES/APPROX TOP-K - shorten the scan.
Pagination: 'keyset pagination' instead of 'OFFSET/LIMIT' for large tables.
sql
-- keyset
SELECT FROM t WHERE (date, id) > (:last_date,:last_id) ORDER BY date, id LIMIT 1000;
9) JSON/Semi-Structured
Materialize hot paths into columns ('device. os`, `psp. method`).
Use inverted/GIN indexes on JSON paths if the engine supports.
Avoid UDF by line: better projection with highlighted attributes.
10) Approx and Sampling
HLL/Theta Sketch: Cheap 'COUNT DISTINCT'.
TDigest/KLL: percentiles p95/p99 without full sort.
Reservoir/stratified sampling: interactive research and previews.
11) Memory, Strait and Concarrency
Spill-guard: memory limits on join/agg; when spilling - reduce batch/parallelism, increase sorting by key.
Concurrency & QoS: pools for "hot" dashboards and heavy hell-hoc; scan/time limits; kill-switch to "forgotten" requests.
Result cache/query cache: enable for repeatable BI templates, disable by freshness token.
12) Regression tests and "double run"
Store reference profiles (plan/scan bytes/time) for top N queries.
Before releasing indexes/clusters - A/B run: compare p95, scanned bytes, skipped share, shuffle.
Create "fail-fast" thresholds: if p95 rose> X% - rollback.
13) Observability and SLO
SLI:- p50/p95/p99 latency, scanned bytes/query, skipped bytes %, files touched;
- shuffle bytes, spilled bytes, peak memory;
- cache hit-rate; accuracy approach-aggregates.
Alerts: rise in scanned bytes, fall in skipped share, frequent NLJs, spillage> thresholds.
14) iGaming cases (recipes)
14. 1 Payments/PSPs: "waiver peaks"
WHERE: `ts BETWEEN now()-7d AND now()`, `brand,country,psp,status`.
Party: day; ORDER/Z-order: `(brand,country,ts)`; bitmap: `psp,status`; bloom: `transaction_id`.
MV: `payments_7d_by_brand_psp(status)`.
Result: p95 → ~ 1s, scanned bytes ↓ 5-10 ×, zero strait.
14. 2 Game Rounds: Top K Games/Hour
ORDER BY / cluster по `(provider, game_id, occurred_at)`; projection for preaggregates.
Approx Top-K + TDigest for p95 round duration.
Bottom line: sub-second graphs on the hot cache.
14. 3 RG/AML Active Limits
JSON'reason '→ column; bitmap `rg_state`, `kyc_level`; semi-join with last state.
Result: report "for 30 days" - seconds, without full scan.
15) Optimization checklist (daily)
1. Collection of top N requests and their profiles (plan/bytes/shafl).
2. Batches by date + agreed sorting/cluster cases.
3. Checking pushdown and projection pruning (only the required columns).
4. JOIN strategy: broadcast small, sort for SMJ, no NLJ.
5. Pre-aggregation/MV for hot dashboards.
6. Approx where valid (distinct/percentiles/top-k).
7. JSON → columns and/or inverted indices.
8. Compaction/reclassification; skipped bytes target ≥ 70%.
9. Results cache and separate concarrency pools.
10. Monitoring: p95, scanned bytes, shuffle, spill, hit-rate.
16) Templates (ready to use)
16. 1 Optimization Policy (YAML)
yaml workload: bi_hot slo:
p95_latency_ms: 1200 scanned_bytes_max_mb: 256 skipped_bytes_share_min: 0. 70 storage:
partition_by: ["date"]
cluster_by: ["brand","country","occurred_at"]
indexes:
bloom: ["transaction_id","user_surrogate_id"]
bitmap: ["psp","status","rg_state"]
aggregation:
mv:
- name: mv_payments_7d_brand_psp window: "7d"
group_by: ["brand","psp","status"]
approx:
count_distinct: "hll"
percentile: "tdigest"
concurrency:
pools: {bi_hot: 50, adhoc: 10}
timeout_s: 120
16. 2 Query regression test (pseudo-SQL)
sql
-- baseline: p95<=1200ms, scanned_bytes<=256MB
EXPLAIN ANALYZE
SELECT brand, psp, status, COUNT() cnt, SUM(amount) amt
FROM gold. payments
WHERE ts >= NOW() - INTERVAL '7 days'
AND brand =:brand AND country =:country
GROUP BY brand, psp, status;
16. 3 DISTINCT rewriting
sql
-- Bad: Heavy COUNT (DISTINCT user_id)
SELECT COUNT(DISTINCT user_id) FROM gold. sessions WHERE d>=CURRENT_DATE-7;
-- Better: HLL sketch/preaggregate
SELECT hll_union(user_hll) FROM agg. sessions_7d_user_hll WHERE d>=CURRENT_DATE-7;
16. 4 Keyset pagination
sql
SELECT
FROM gold. game_rounds
WHERE (occurred_at, round_id) > (:ts,:rid)
AND brand=:brand AND country=:country
ORDER BY occurred_at, round_id
LIMIT 1000;
17) Anti-patterns
'SELECT'in prod; lack of projection pruning.
OFFSET pagination on millions of lines.
COUNT DISTINCT without sketches; percentiles through full sort.
NLJ on large sets; join by JSON expressions.
Small batches and scattered files (metadata storm).
UDF strings in WHERE instead of materializing columns.
Ignore statistics/ANALYZE - blind optimizer and full scan.
No regression tests and no rollback thresholds.
18) Implementation Roadmap
0-30 days (MVP)
1. Measurement of top N requests and installation of SLO/SLI.
2. Batches by date + sorting/cluster cases; enable data skipping/bloom.
3. One MV per hot payment report; HLL/TDigest в BI.
4. Split query pools, enable result cache.
30-90 days
1. Heavy windows census/JSON → preaggregation/columns.
2. Broadcast-join small dimensions; SMJ for large; elimination of NLJ.
3. Schedule compaction and reclassification; Key Advisor.
4. Observability and alerts of degradation, A/B plans, auto-rollback.
3-6 months
1. Projection/MV catalog with versioning and SLA.
2. Approx kernel for distinct/percentile/top-k on all dashboards.
3. Uniform templates for regression tests and budgets $/request.
4. JSON and UDF permanent hygiene: materialization and indices.
19) RACI
Data Platform (R): partitions/clustering/compression, MV/projections, caches, monitoring.
Analytics/BI (R): SQL rewriting, approx aggregates, regression tests.
Domain Owners (C): requirements for sections and accuracy.
Security/DPO (A/R): privacy/PII, k-anonymity of aggregates.
SRE/Observability (C): SLO/alerting, concarrency and capacity.
Finance (C): budgets for $/request and economic effect.
20) Related Sections
Analytical Storage Indexing, Data Schemas and Evolution, Data Validation, DataOps Practices, Data Clustering, Dimension Reduction, Analytics and Metrics API, MLOps: Model Exploitation.
Total
Query optimization is not a "magic hint," but a system: competent data markup (partitions/clusters), preaggregation and approximate algorithms, correct JOIN strategies, cache/concarrency and constant monitoring of p95 and scanned bytes. For iGaming, this means fast and stable metrics for payments, games and compliance - within SLA and budget.