Indexing and Query Optimization

1) Indexing and optimization goals

Latency: P50/P95/P99 reduction.
Throughput: QPS growth without scale-out.
Predictability: stable plans and no "jumps" in response time.
Savings: Less IO/CPU, less cloud bill.
Reliability: reducing locks and deadlocks due to correct access.

Invariants:

Any optimization must maintain correctness and consistency.
Track the effect in metrics and plan logs.

2) Basic index structures and when to apply them

2. 1 B-Tree (default)

Equals/ranges, sort, 'ORDER BY'.
Good for most time/ID/status filters.

2. 2 Hash

Pure equalities ('='), cheaper in memory but out of order (PG: constraints removed but still niche choice).

2. 3 GIN / GiST (PostgreSQL)

GIN: arrays/JSONB keys, full text (tsvector), containment ('@>').
GiST: geo, ranges, kNN.

2. 4 BRIN (PostgreSQL)

Super-cheap index by "naturally sorted" tables (append-only by time). Good for time-series with large tables.

2. 5 Bitmap (MySQL/InnoDB: none native; DW-DBMS/OLAP)

Effective for low cardinality and facets, more often in column storage.

2. 6 Column indices (ClickHouse)

Primary key + data skipping (minmax), secondary через `skip indexes` (bloom, set).
OLAP queries with aggregations and ranges.

2. 7 Inverted indexes (Elasticsearch/OpenSearch)

Full text, facets, hybrid search. For precise filters, use keyword fields and doc values.

2. 8 MongoDB

Single, compound, multikey (arrays), partial, TTL, text, hashed (for uniform key sharding).

3) Key and composite index design

3. 1 Left prefix rule

The order of the fields in the index determines usability.
Query'WHERE tenant_id =? AND created_at >=? ORDER BY created_at DESC` → индекс `(tenant_id, created_at DESC, id DESC)`.

3. 2 Tie-breaker

Add a unique tail (usually 'id') for stable sorting and seek pagination.

3. 3 Partial/filtered indices

Index only "hot" subsets:

sql
CREATE INDEX idx_orders_paid_recent
ON orders (created_at DESC, id DESC)
WHERE status = 'paid' AND created_at > now() - interval '90 days';

3. 4 Covering indices

Include "readable" fields in the index (MySQL: 'INCLUDE' none; PG 11+: `INCLUDE`):

sql
CREATE INDEX idx_user_lastseen_inc ON users (tenant_id, last_seen DESC) INCLUDE (email, plan);

3. 5 Functional/calculated

Normalize keys in index:

sql
CREATE INDEX idx_norm_email ON users (lower(email));

4) Partitioning and sharding

4. 1 Partitioning (PG native/table inheritance; MySQL RANGE/LIST)

Rotation of parties by time ('daily/weekly') simplifies' VACUUM/DELETE '.
Indexes are local partitions → smaller than B-Tree, faster plan.

sql
CREATE TABLE events (
tenant_id bigint,
ts timestamptz,
...
) PARTITION BY RANGE (ts);

4. 2 Partitioning key

In OLTP - by'tenant _ id '(load localization).
In time-series/OLAP - by 'ts' (range queries).
Hybrid: '(tenant_id, ts)' + sub-parties.

4. 3 Sharding

Consistent hashing/range-shard by 'tenant _ id' or by time.
Cross-shard query → scatter-gather and k-way merge; hold the per-shard cursor.

5) Statistics, cardinality and plans

5. 1 Up-to-date statistics

Enable auto-analysis ('autovacuum/autoanalyze'), increase 'default _ statistics _ target' for dirty distributions.

5. 2 Advanced Statistics (PG)

Correlated columns:

sql
CREATE STATISTICS stat_user_country_city (dependencies) ON country, city FROM users;
ANALYZE users;

5. 3 Execution plan

See 'EXPLAIN (ANALYZE, BUFFERS, VERBOSE)'; key fields:

`Rows`, `Loops`, `Actual time`, `Shared Read/Hit`, `Recheck Cond`.
Типы join: Nested Loop, Hash Join, Merge Join.
Seq Scan vs Index Scan/Only Scan/Bitmap Heap Scan.

5. 4 Stability of plans

Parameterization (prepared statements) can "stick" on a bad plan. Use plan cache guardrails (PG: 'plan _ cache _ mode = force_custom_plan' for problem queries) or "forwarding" constants.

6) Optimization of joins and sorts

6. 1 Strategies

Nested Loop: small external, fast index on internal.
Hash Join: large sets, enough memory for hash table.
Merge Join: sorted entries, advantageous in the order already available.

6. 2 Indexes under join

For'A JOIN B ON B.a_id = A.id ', → the index to'B (a_id)'.
For the filter after join - the index on the columns of the filter of the internal table.

6. 3 Triage

Avoid'ORDER BY'without a corresponding index; sorting on large sets is expensive by memory/disk.

7) Query rewrite

Get rid of the "snowflakes" of subqueries; expand in JOIN.
Use CTE-inline (PG ≥12 CTE default inlines, but'MATERIALIZED 'can commit an intermediate result if needed).
Remove'SELECT '→ list the fields (IO/network savings).
Transfer calculations from'WHERE 'to the indexed form (pre-computed columns).
Aggregations: preliminary summary tables/materialized views with incremental update.

8) Butching, limiting and pagination

Batch-insert/update: 500-5000 batches instead of one by one.
Seek pagination by '(sort_key, id)' instead of deep 'OFFSET'.
Limit the dialing before sorting/joyne (push-down 'LIMIT').

9) Caching and denormalization

Application-level query-cache (key = SQL + bind-vars + rights version).
Materialized views for heavy aggregates; rotation/reference plan.
Denormalization - Store frequently read calculated fields (price including discount) but with trigger/background task for consistency.
Redis as L2 for hot keys (with TTL and event disability).

10) The specifics of popular engines

10. 1 PostgreSQL

Индексы: B-Tree, Hash, GIN/GiST, BRIN, partial, functional, INCLUDE.

Example:

sql
CREATE INDEX idx_orders_tenant_created_desc
ON orders (tenant_id, created_at DESC, id DESC)
INCLUDE (amount, status);

Full Text:

sql
CREATE INDEX idx_docs_fts ON docs USING GIN (to_tsvector('russian', title          ' '          body));

10. 2 MySQL/InnoDB

Composite, spanning indexes (by including fields in the key), invisible indexes for tests:

sql
ALTER TABLE orders ALTER INDEX idx_old INVISIBLE; -- check risk-free plans

Histogram statistics ('ANALYZE TABLE... UPDATE HISTOGRAM` в 8. 0).

10. 3 ClickHouse

Primary key = sort; 'ORDER BY (tenant_id, ts, id)'.

Skip indexes:

sql
CREATE TABLE events (
tenant_id UInt64,
ts DateTime64,
id UInt64,
payload String,
INDEX idx_bloom_payload payload TYPE bloom_filter GRANULARITY 4
) ENGINE = MergeTree()
ORDER BY (tenant_id, ts, id);

10. 4 MongoDB

Composite/cartoons: order is important, filter and sort must match index:

js db. orders. createIndex({ tenant_id: 1, created_at: -1, _id: -1 });
db. orders. createIndex({ status: 1 }, { partialFilterExpression: { archived: { $ne: true } } });

Use 'hint ()' for diagnostics, watch for 'covered query'.

10. 5 Elasticsearch/OpenSearch

Keyword vs text fields; doc_values for sorting/aggregates.
Heap segmentation: aggregations - heavy; restrict'size 'and use'composite' aggregations (paging).
Do not include analyzers where an accurate comparison is required.

11) Competitiveness, interlocks and MVCC

Short transactions; avoid "long" reads under 'REPEATABLE READ' needlessly.
Index operations also take locks (write throughput reduction).
Plan online indexing: 'CREATE INDEX CONCRETELY' (PG), 'ALGORITHM = INPLACE '/' ONLINE' (MySQL).
Inserts in the tail for an hour/ID → "hot pages" of the index; distribute the key (UUIDv7/salt).

12) Observability and SLO

Metrics:

'db _ query _ latency _ ms' (P50/P95/P99) by query name.
`rows_examined`, `rows_returned`, `buffer_hit_ratio`.
`deadlocks`, `lock_wait_ms`, `temp_sort_disk_usage`.
Share of plans with'Seq Scan'where'Index Scan' was expected.
Regression alerts when changing the version/parameters of the DBMS.

Logs/tracing:

Enable the slow query log with a threshold (for example, 200 ms).
Correlation of queries with spans (trace_id).
Remove problem query plans and save to object storage for retrospective.

SLO example:

Read P95 '<= 150 ms' with'LIMIT <= 50'and hot tenant.
P95 records' <= 200 ms' with batches up to 1000 lines.

13) Safety and multi-tenancy

Indexes on access control fields ('tenant _ id', 'owner _ id') are required.
Policies (RLS/ABAC) must be pre-filter; otherwise, the optimizer plans incorrectly.
Do not index sensitive fields in clear text; use hashes/tokens.

14) Anti-patterns

Deep 'OFFSET' without seek-cursor alternative.
"One index for all" - memory overload and write-path.
'SELECT'in critical paths.
Functions above column in'WHERE 'without function index.
Unstable plans due to old statistics.
Missing'ORDER BY'while waiting for stable order.
Indexes for the sake of indexes: ROI <0 due to expensive write/support.

15) Implementation checklist

1. Top N requests by QPS and time → select 3-5 candidates.
2. Remove plans' EXPLAIN ANALYZE ', check cardinality vs actual.
3. Design indexes: field order, INCLUDE/partial/functional.
4. Implement partitioning for large tables (temporary/tenant keys).
5. Overwrite queries: remove'SELECT ', inline simple CTEs, restrict set.
6. Enable butching and seek pagination.
7. Configure cache: L1/L2, disability by events.
8. Introduce monitoring of plans and slow-log, alerts for regressions.
9. Perform load tests with real data distribution.
10. Update development guidelines (ORM hints, indexing, limits).

16) Before/after examples

Before:

sql
SELECT FROM orders
WHERE status = 'paid'
ORDER BY created_at DESC
LIMIT 50 OFFSET 5000;

After:

sql
-- Индекс: (status, created_at DESC, id DESC) INCLUDE (amount, currency)
SELECT id, amount, currency, created_at
FROM orders
WHERE status = 'paid'
AND (created_at, id) < (:last_ts,:last_id)   -- seek
ORDER BY created_at DESC, id DESC
LIMIT 50;

17) ORM and API protocols

Avoid N + 1: greedy samples ('includes', 'JOIN FETCH', 'preload').
Explicit field projections, paginate by cursor.
gRPC/REST: limit 'page _ size', fix 'sort', use opaque tokens.
Plan cache: use parameterization; do not generate "unique" SQL per call.

18) Migrations and Operations

Add indexes online and mark as INVISIBLE/CONCURRENT, test plans, then switch.
Index revisions - regular sanitary cleaning: duplicates, unused, "dead" for old features.
Party rotation plan (drop old) and 'VACUUM/OPTIMIZE' schedule.

19) Summary

Query optimization is systems engineering: correct keys and indexes, neat plans, thoughtful partitioning and sharding, discipline in queries and ORM, caching and observability. By following the described patterns, you will get a fast, predictable and economical system that is resistant to data growth and load.

Indexing and Query Optimization

Get in Touch

Quick Contact

The video will be updated soon

We are currently very busy with projects