Hot/Warm/Cold Vaults
1) Why divide data by Hot/Warm/Cold
Different access patterns coexist in the same cluster: interactive requests for fresh data, analytics over recent periods, and rare access to the archive. Tiering lets you:
- Optimize cost: the fast, expensive tier holds only the hot working set.
- Meet SLOs: p95/throughput targets for online traffic, longer deadlines for history.
- Simplify scaling: scale out the cheap tiers horizontally without overloading the "front."
- Mitigate risk: different failure/replication domains, independent protection policies.
- Hot - the most recent, frequent reading/writing, minimal latency.
- Warm - changes less often, a lot of reading over time ranges.
- Cold - archive, cheap storage, high TTFB, slow recovery.
2) Profiles and SLOs by level
Hot
Access: milliseconds (p95 ≤ 5-20 ms on KV/indexes; ≤ 100-300 ms on complex queries).
Operations: frequent upsert/append, indexing, OLTP/stream-ingest.
Media: NVMe/SSD, memory, fast network.
Replication: increased (e.g. RF = 3) for RPO≈0, RTO minutes.
Warm
Access: Tens to hundreds of milliseconds/second.
Operations: window reads, batches, OLAP on recent history (7-90 days).
Media: SATA SSD/fast HDD/object storage with local cache.
Replication: moderate (RF = 2), compression enabled.
Cold
Access: seconds to hours; often offline retrieval ("retrieve-and-scan").
Operations: rare reads, regulatory compliance (multi-year retention).
Media: object/archive (S3 Glacier/Deep Archive, Azure Archive, GCS Coldline).
Replication: Regional/Interregional, WORM/Legal Hold.
3) Typical techniques by layer
Hot: PostgreSQL (OLTP, partitions), MySQL/InnoDB, Redis/Memcached (cache), Elasticsearch/OpenSearch hot nodes, ClickHouse hot partitions, Kafka local log.
Warm: ClickHouse columnar storage, BigQuery/Snowflake recent partitions, Elasticsearch warm nodes, S3 + Presto/Trino with cache, tiered storage (Kafka/Pulsar).
Cold: S3/Glacier, GCS Nearline/Coldline/Archive, Azure Cool/Archive, HDFS archive, long-term backups.
4) Lifecycle Policies (ILM) and Automation
4.1 Concepts
Time partitioning (day/week/month) is the main lever for moving data between tiers.
ILM rules: rollover (by volume/age), shrink/merge, freeze, delete.
Deduplication and compression: enable on warm/cold, avoiding CPU bottlenecks on hot.
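The rollover rule "by volume/age" can be sketched as a predicate: the conditions are OR-ed, so crossing either threshold triggers rollover (the ILM example below uses 7d / 50gb). A minimal sketch, with hypothetical function and parameter names:

```python
def should_rollover(index_age_days: float, index_size_gb: float,
                    max_age_days: float = 7, max_size_gb: float = 50) -> bool:
    """Roll over when EITHER the age or the size threshold is exceeded,
    mirroring how ILM rollover conditions are OR-ed together."""
    return index_age_days >= max_age_days or index_size_gb >= max_size_gb
```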
4.2 Examples
Elasticsearch ILM (hot→warm→cold→delete)

```json
{
  "policy": {
    "phases": {
      "hot":    { "actions": { "rollover": { "max_age": "7d", "max_size": "50gb" } } },
      "warm":   { "min_age": "7d",   "actions": { "allocate": { "require": { "box_type": "warm" } }, "forcemerge": { "max_num_segments": 1 } } },
      "cold":   { "min_age": "30d",  "actions": { "allocate": { "require": { "box_type": "cold" } }, "freeze": {} } },
      "delete": { "min_age": "365d", "actions": { "delete": {} } }
    }
  }
}
```
S3 Lifecycle (Standard→Infrequent Access→Glacier→Expire)

```json
{
  "Rules": [{
    "ID": "logs-lifecycle",
    "Filter": { "Prefix": "logs/" },
    "Status": "Enabled",
    "Transitions": [
      { "Days": 7,  "StorageClass": "STANDARD_IA" },
      { "Days": 30, "StorageClass": "GLACIER" }
    ],
    "Expiration": { "Days": 365 }
  }]
}
```
Kafka Tiered Storage (sketch)

```properties
log.segment.bytes=1073741824
log.retention.ms=259200000
tiered.storage.enable=true
remote.log.storage.system=s3
remote.log.storage.bucket=topic-archive
```
PostgreSQL partitions by date

```sql
CREATE TABLE events (
  id bigserial, at timestamptz NOT NULL, payload jsonb
) PARTITION BY RANGE (at);

CREATE TABLE events_2025_10 PARTITION OF events
  FOR VALUES FROM ('2025-10-01') TO ('2025-11-01')
  TABLESPACE ts_hot; -- later: ALTER TABLE ... SET TABLESPACE ts_warm per ILM
```
5) Cost and performance modeling
5.1 A simple TCO model
TCO = media (CapEx/OpEx) + network (egress) + CPU for compression/scans + management + DR/replication.
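The formula above can be turned into a small spreadsheet-style calculation. All prices and volumes below are made-up placeholders; substitute your provider's actual rates.

```python
# Hypothetical per-tier inputs: $/TB-month storage price, TB stored,
# monthly egress cost ($), and management/DR overhead ($).
TIERS = {
    "hot":  {"storage": 100.0, "tb": 5,   "egress": 0.0,  "mgmt": 200.0},
    "warm": {"storage": 25.0,  "tb": 40,  "egress": 30.0, "mgmt": 100.0},
    "cold": {"storage": 4.0,   "tb": 400, "egress": 90.0, "mgmt": 50.0},
}

def monthly_tco(tiers: dict) -> float:
    """TCO = media + egress + management, summed over tiers (simplified:
    CPU for compression/scans is folded into the management figure)."""
    return sum(t["storage"] * t["tb"] + t["egress"] + t["mgmt"]
               for t in tiers.values())
```

Note how cold dominates by volume (400 TB) yet hot dominates by unit price; the model makes the trade-off explicit before you move a single byte.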
5.2 Balancing latency and price
A hot set of roughly 5-20% of the data typically serves 80-95% of the queries.
The goal is to keep the working set in Hot/cache (CPU/RAM/NVMe) and shift the rest to Warm/Cold.
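The 5-20% figure is not a constant; it should be measured. A minimal sketch of sizing the hot tier from per-key access counts (function name and the 90% coverage default are illustrative):

```python
def working_set_fraction(access_counts: list[int], coverage: float = 0.9) -> float:
    """Smallest fraction of keys (taken in descending popularity order)
    whose accesses cover `coverage` of all requests -- a data-driven way
    to size the hot tier."""
    counts = sorted(access_counts, reverse=True)
    total = sum(counts)
    running, needed = 0, coverage * total
    for i, c in enumerate(counts, start=1):
        running += c
        if running >= needed:
            return i / len(counts)
    return 1.0
```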
5.3 Metrics
hit_ratio_hot, pct_hot_of_total_bytes, cost_per_TB_month{tier}, scan_cost_per_TB, time_to_first_byte{tier}, promotion_rate (cold→warm), demotion_rate (hot→warm/cold).
6) Partitioning, indexing, and caching
Time partitions + secondary indices for "fresh" slices.
The golden rule of querying: filter by time first, then by selective keys.
Hierarchical cache: in-proc → Redis → edge; pin caches for hot keys/aggregates.
Bloom filters/skip indexes (ClickHouse, Parquet) to reduce reads to warm/cold.
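The hierarchical cache (in-proc → Redis → edge) can be sketched as a read-through chain; the class below is a pure-Python stand-in with a dict as L1 and any callable as the slower backing layer (names are illustrative, not a real library API):

```python
class TieredCache:
    """Minimal two-level read-through cache: an in-process dict in front
    of a slower backing store (stand-in for Redis / object storage)."""

    def __init__(self, backing_fetch):
        self.local = {}               # L1: in-proc
        self.backing = backing_fetch  # L2: remote/slow path (a callable)
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.local:
            self.hits += 1
            return self.local[key]
        self.misses += 1
        value = self.backing(key)     # fall through to the slower tier
        self.local[key] = value       # populate L1 on the way back
        return value
```

Pinning hot keys, as the list above suggests, would amount to never evicting certain entries from `local`; eviction itself is omitted here for brevity.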
7) Replication, fault tolerance and DR
Hot: synchronous replication (multi-zone), RPO ≈ 0, fast failover.
Warm: asynchronous cross-zone/cross-region replica; RPO of minutes.
Cold: cross-region with WORM (Write Once Read Many), Legal Hold for compliance.
DR plans: runbooks for restoring "cold" archives (hours), periodic fire drills.
8) Safety and compliance
PII/PCI: encryption at rest (KMS), key policies at each stage, masking when moving down.
Retention and deletion: automatic expiry for cold, provable erasure (erasure reports).
Jurisdictions: in-region storage (EU-only, RU-only, BY-region, etc.), geo-isolation of buckets.
9) Usage patterns
9.1 Logs and telemetry
Hot: last 24-72 h in Elasticsearch/ClickHouse on NVMe.
Warm: 30-180 days on SSD/HDD + Parquet in S3.
Cold: > 180 days in Glacier; queries via Trino/Presto on demand.
9.2 Transactions/Orders
Hot: OLTP database (PostgreSQL/MySQL) with a short history.
Warm: denormalized snapshots for BI.
Cold: legal archive, export to object storage.
9.3 ML feature store
Hot: online features in Redis/a low-latency DB.
Warm: offline features in columnar/object storage.
Cold: source datasets, versioned (Delta/Iceberg/Hudi).
10) Interaction with clusters and Kubernetes
Label StorageClasses by tier: 'gold-nvme' (hot), 'silver-ssd' (warm), 'bronze-object' (cold).
Plan node pools (taints/labels) for hot/warm/cold workloads.
Sidecar caches (e.g., a local SSD cache) in front of requests to object storage.
Example PVC

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-hot
spec:
  storageClassName: gold-nvme
  accessModes: [ ReadWriteOnce ]
  resources:
    requests:
      storage: 500Gi
```
11) Observability
Dashboards: distribution of bytes/requests by tier, latency per tier, offload to warm/cold, cost/month.
Alerts: a drop in the hot hit ratio, a rise in promotion rate (is the hot tier large enough?), rising TTFB on warm, slow cold restores (SLO breach).
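The first alert from the list can be expressed as a simple threshold check; the 0.8 default and the function name are illustrative, not a recommendation:

```python
def hot_hit_ratio_alert(hits: int, misses: int, threshold: float = 0.8) -> bool:
    """Fire when the hot-tier hit ratio falls below `threshold` -- a sign
    the working set no longer fits in hot (cross-check promotion_rate)."""
    total = hits + misses
    if total == 0:
        return False  # no traffic, nothing to alert on
    return hits / total < threshold
```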
12) Anti-patterns
"All in hot": excessive cost, IO overload.
"Deep cold without indexes": cheap to store, expensive to read; no fast slice paths.
"No ILM": manual transfers, human errors.
"Uniform replication policy" for all levels: overpayment and uneven RPOs.
Mixing prod and archive queries in one compute pool: interference.
"Unaccounted egress" from cold clouds: surprises in the bill.
13) Implementation checklist
- Classify data sets: SLA, access frequency, storage requirements.
- Select media and engines per layer (NVMe/SSD/HDD/Object/Archive).
- Design time/key partitions, indexes, and formats (Parquet/ORC/Delta).
- Define ILM rules (rollover/transition/expire) and automate.
- Enable compression/encoding (ZSTD/LZ4; stronger levels in cold).
- Define replication/RPO/RTO and DR procedures.
- Configure the cache hierarchy and pin for hot aggregates.
- Cost/latency metrics and tier alerts.
- Security policies (KMS, legal retention, geo-isolation).
- Regularly review transfer thresholds (seasonality, growth).
14) FAQ
Q: How do you define the boundaries between hot and warm?
A: From real query distributions: the "hot working set" = the top 5-20% of keys/partitions that serve 80-95% of requests. Everything outside it is a candidate for warm.
Q: Can I read directly from cold?
A: Yes, but plan SLAs in minutes/hours and budget for egress cost; it is often cheaper to pull the fragment back to warm (staging) before analysis.
Q: What to choose for analytics 30-180 days?
A: Column formats (Parquet/ORC) on object + query engine (Trino/Presto/ClickHouse) with cache; indexes/skip-data to save IO.
Q: How to avoid "warm-up storms" when resampling from cold?
A: Use prefetch/prepare jobs, rate-limit requests, time shards, request coalescing, and pinned caches on warm.
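Request coalescing, mentioned above, means collapsing concurrent fetches of the same cold object into one backend call. A minimal thread-safe sketch (class and attribute names are illustrative):

```python
import threading

class Coalescer:
    """Collapse concurrent fetches of the same key into a single backend
    call; non-leader callers wait for the in-flight result instead of
    issuing duplicate cold reads ("warm-up storm" protection)."""

    def __init__(self, fetch):
        self.fetch = fetch
        self.lock = threading.Lock()
        self.inflight = {}  # key -> (done event, result holder)

    def get(self, key):
        with self.lock:
            entry = self.inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self.inflight[key] = entry
        event, holder = entry
        if leader:
            holder["value"] = self.fetch(key)  # only the leader hits cold
            with self.lock:
                del self.inflight[key]
            event.set()
        else:
            event.wait()                        # followers reuse the result
        return holder["value"]
```

In production the same idea appears as "singleflight"-style helpers; combine it with a warm staging cache so repeated requests never reach cold at all.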
15) Totals
The Hot/Warm/Cold architecture means matching cost to the access profile, plus automatic lifecycle management. Clear per-tier SLOs, partitioning and ILM, sensible replication, and a cache hierarchy keep "hot" fast, "warm" affordable, and "cold" cheap and secure.