Data retention and retention policies
1) Principles
1. Purpose & Minimization. We store only the data we need, and only for as long as the processing purpose requires.
2. Policy as Code. Retention is an executable policy, not a PDF.
3. Defense in Depth. TTL/ILM + encryption + audits + Legal Hold.
4. Reversibility & Proof. Deletion is verifiable: action logs, crypto-shredding, compliance reports.
5. Cost & Carbon Aware. Retention accounts for $/GB-month and the carbon footprint of storage and egress.
2) Data classification and the retention map
Split datasets into classes, each with its purpose and legal basis:
- Operational (OLTP): orders, payments, sessions.
- Analytical (DWH/data lake): events, fact tables, slices.
- Personal (PII/finance/health): require special retention periods and support for data-subject rights.
- Technical: logs, metrics, traces, CI artifacts.
- Documents/media: WORM/archive/legacy.
For each class, define: owner, purpose, legal basis, retention periods, protection level, current and target storage.
3) ILM Data Lifecycle
Typical pipeline:
1. Ingest (hot) → NVMe/SSD, high request rate.
2. Warm → read less often; compression, columnar formats.
3. Cold/Archive → object storage/tape; slow access.
4. Purge/Delete → guaranteed deletion (including replicas/backups).
Example of an ILM profile (YAML):
yaml
dataset: events_main
owner: analytics
purpose: "product analytics"
classification: "pseudonymized"
lifecycle:
  - { phase: hot, duration: 7d, storage: nvme, format: row }
  - { phase: warm, duration: 90d, storage: ssd, format: parquet, compress: zstd }
  - { phase: cold, duration: 365d, storage: object, glacier: true }
  - { phase: purge, duration: 0d }
privacy:
  pii: false
  dp_delete_window: 30d # SLA for personal-data deletions if linkages to individuals appear
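A minimal sketch of how a lifecycle job might resolve which phase a record is in, assuming the profile above has already been loaded into plain dicts and that phase durations are consecutive (hot for the first 7 days, then warm for the next 90, and so on). The helpers `parse_duration` and `phase_for_age` are hypothetical names, and only the `d` (days) unit is handled.

```python
from datetime import timedelta

# Lifecycle phases as they might look after loading the ILM profile (assumed shape).
LIFECYCLE = [
    {"phase": "hot", "duration": "7d", "storage": "nvme"},
    {"phase": "warm", "duration": "90d", "storage": "ssd"},
    {"phase": "cold", "duration": "365d", "storage": "object"},
    {"phase": "purge", "duration": "0d"},
]


def parse_duration(text: str) -> timedelta:
    """Parse a day-based duration such as '90d'; other units are rejected."""
    if not text.endswith("d"):
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(days=int(text[:-1]))


def phase_for_age(age: timedelta, lifecycle=LIFECYCLE) -> str:
    """Return the phase for a record of the given age.

    Boundaries are cumulative: hot [0, 7d), warm [7d, 97d), cold [97d, 462d);
    anything older falls through to the last phase (purge).
    """
    boundary = timedelta(0)
    for phase in lifecycle:
        boundary += parse_duration(phase["duration"])
        if age < boundary:
            return phase["phase"]
    return lifecycle[-1]["phase"]
```

If the profile is instead meant as absolute ages (for example, "cold until 365 days after ingest"), drop the accumulation and compare `age` to each duration directly.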
4) Policies as code (useful sketches)
4.1 Admission policy (required tags/TTL)
yaml
policy: require-retention-tags
deny_if_missing: [owner, purpose, classification, retention]
default_retention:
  logs: "30d"
  traces: "7d"
  metrics: "90d"
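One way an admission controller might apply this policy, sketched in Python under two assumptions: each dataset carries a hypothetical `kind` field (`logs`, `traces`, `metrics`, ...), and `default_retention` is filled in before the required-tag check, so only kinds without a default are denied for a missing `retention`.

```python
REQUIRED_TAGS = ("owner", "purpose", "classification", "retention")
DEFAULT_RETENTION = {"logs": "30d", "traces": "7d", "metrics": "90d"}


def admit(dataset: dict) -> tuple[dict, list[str]]:
    """Apply default retention by kind, then report any required tags still missing.

    Returns the (possibly enriched) dataset and a list of deny reasons;
    an empty list means the dataset is admitted.
    """
    enriched = dict(dataset)  # never mutate the caller's manifest
    kind = enriched.get("kind")
    if not enriched.get("retention") and kind in DEFAULT_RETENTION:
        enriched["retention"] = DEFAULT_RETENTION[kind]
    reasons = [f"missing required tag: {tag}" for tag in REQUIRED_TAGS if not enriched.get(tag)]
    return enriched, reasons
```

Empty strings count as missing, which mirrors the `retention == ""` check in the CI gate below.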
4.2 CI/CD gate (Rego): block deploys without retention
rego
package policy.retention

import rego.v1

# Deny the deploy when any dataset declares an empty retention value
deny contains msg if {
  some d
  input.datasets[d].retention == ""
  msg := sprintf("Retention missing for dataset %s", [d])
}
4.3 S3/object storage (lifecycle fragment)
yaml
Rules:
  - ID: logs-ttl
    Status: Enabled
    Filter: { Prefix: "logs/" }
    Transitions:
      # S3 requires at least 30 days in STANDARD before a transition to STANDARD_IA
      - { Days: 30, StorageClass: STANDARD_IA }
      - { Days: 90, StorageClass: GLACIER }
    Expiration: { Days: 180 }
    NoncurrentVersionExpiration: { NoncurrentDays: 30 }
5) Retention in streams and queues
Kafka:
- `retention.ms`/`retention.bytes`: time- and size-based retention window.
- Compaction (`cleanup.policy=compact`): keeps the latest value for each key.
- Tiered Storage: offloads the log "tail" to a cold storage tier.
- DLQ: has its own retention and TTL.
properties
cleanup.policy=delete,compact
# 7 days: closed segments older than this are deleted (.properties comments must start a line)
retention.ms=604800000
min.cleanable.dirty.ratio=0.5
segment.ms=86400000
Guarantees:
- Set retention for key topics to roughly the business replay/recomputation window.
- For billing/audit events, use a separate long retention or WORM storage.
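A small Python check for the first guarantee, suitable for a CI job that reads topic configs: it reports how many days a topic's `retention.ms` falls short of the replay window. It assumes the broker default of 7 days (`log.retention.hours=168`) when the topic does not override `retention.ms`; a cluster with a different broker default would need that constant changed. The function name is hypothetical.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000
# Assumed broker default (log.retention.hours=168) for topics without retention.ms
KAFKA_DEFAULT_RETENTION_MS = 7 * MS_PER_DAY


def retention_shortfall_days(topic_config: dict, replay_window_days: int) -> float:
    """Days by which retention.ms falls short of the replay window (0.0 if covered)."""
    retention_ms = int(topic_config.get("retention.ms", KAFKA_DEFAULT_RETENTION_MS))
    if retention_ms == -1:  # -1 disables time-based deletion entirely
        return 0.0
    shortfall_ms = replay_window_days * MS_PER_DAY - retention_ms
    return max(shortfall_ms, 0) / MS_PER_DAY
```

Because Kafka deletes whole segments, data can outlive `retention.ms` by up to roughly `segment.ms`, so this check gives a lower bound on what is actually replayable.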
6) Databases and retention
Relational:
- Partition by date/range; detach and drop old partitions.
- Index the date columns that TTL queries filter on.
- Temporal (system-versioned) tables plus purge policies for old versions.
sql
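-- Monthly partitions (partition names below are illustrative)
CREATE TABLE audit_events (id bigserial, occurred_at timestamptz NOT NULL, payload jsonb) PARTITION BY RANGE (occurred_at);
CREATE TABLE audit_events_2025_01 PARTITION OF audit_events FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
-- 365-day retention: detach and drop whole partitions past the window (cheap, no table bloat)
ALTER TABLE audit_events DETACH PARTITION audit_events_2024_01;
DROP TABLE audit_events_2024_01;
-- Row-level alternative; plain VACUUM avoids the exclusive lock and full rewrite of VACUUM FULL
DELETE FROM audit_events WHERE occurred_at < now() - interval '365 days';
VACUUM (ANALYZE) audit_events;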
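A sketch of how a scheduled job might pick which monthly partitions to detach and drop, assuming partitions follow a hypothetical `audit_events_YYYY_MM` naming scheme and that day granularity is precise enough for the cutoff. A partition is dropped only when its entire range is older than the cutoff, so no unexpired rows are lost.

```python
from datetime import date, timedelta


def month_after(first_day: date) -> date:
    """First day of the month following first_day."""
    return date(first_day.year + (first_day.month == 12), first_day.month % 12 + 1, 1)


def partitions_to_drop(existing, today, retention_days, prefix="audit_events_"):
    """Names of monthly partitions whose whole range is older than the retention cutoff.

    A partition covers [first_day, month_after(first_day)); it is safe to drop
    only if that exclusive upper bound is at or before the cutoff.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in existing:
        year, month = (int(part) for part in name[len(prefix):].split("_"))
        if month_after(date(year, month, 1)) <= cutoff:
            expired.append(name)
    return sorted(expired)
```

Each returned name would then feed a `DETACH PARTITION` followed by `DROP TABLE`; rows in the boundary partition that straddles the cutoff can be handled by the row-level `DELETE`.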
NoSQL/Time-series:
- TTL at the key level (MongoDB TTL index, Redis `EXPIRE`, Cassandra TTL).
- Downsampling for metrics (raw 7d → aggregates 90d → long-term aggregates 365d).
- Retention policies in TSDBs (InfluxDB retention policies; ClickHouse TTL and materialized views with dedup/aggregation).
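A minimal downsampling sketch in Python, assuming raw points arrive as `(unix_seconds, value)` pairs. It rolls them into fixed buckets (5 minutes by default) and keeps min/avg/max/count rather than just the average, since averages of averages are lossy when 5m rollups are later rolled into 1h.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_seconds=300):
    """Roll raw (unix_ts, value) points into fixed buckets with min/avg/max/count."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)  # align to bucket start
    return [
        {"ts": start, "min": min(vals), "avg": mean(vals), "max": max(vals), "count": len(vals)}
        for start, vals in sorted(buckets.items())
    ]
```

Once a rollup is written, the raw points for that window can expire on the shorter raw TTL.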
7) Logs, metrics, traces
Logs: limit fields, mask personal data (PD), TTL 7-30d, archive 90-180d.
Metrics: raw high-frequency data 7-14d; downsampled (5m/1h) 90-365d.
Traces: tail sampling; keep "interesting" traces (errors, latency tails) longer.
yaml
observability:
  logs: { ttl: "30d", archive: "90d", pii_mask: true }
  metrics: { raw: "14d", rollup_5m: "90d", rollup_1h: "365d" }
  traces: { sample: "tail-10%", ttl: "7d", error_ttl: "30d" }
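The `pii_mask: true` flag implies a masking step before log lines are shipped. A minimal sketch, assuming only two illustrative patterns (email addresses and 13-19 digit card numbers, of which the last four digits are kept); a real deployment would need a vetted, locale-aware rule set and field-level masking for structured logs.

```python
import re

# Illustrative patterns only; not a complete PD detector.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 12-15 digits (optionally separated by spaces/dashes) followed by the last 4 digits
CARD = re.compile(r"\b(?:\d[ -]?){12,15}(\d{4})\b")


def mask_pd(line: str) -> str:
    """Replace emails with a placeholder and card numbers with their last four digits."""
    line = EMAIL.sub("[email]", line)
    return CARD.sub(lambda m: "****" + m.group(1), line)
```

Masking at the shipper keeps PD out of every downstream copy, which is easier than deleting it from archives later.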
8) Deletion: types and guarantees
Logical (soft delete): marks a record as deleted; convenient for recovery, but does not satisfy the "right to erasure."
Physical (hard delete): actual deletion of data, versions, and replicas.
Cryptographic (crypto-erasure): delete or replace the encryption keys, after which the data cannot be recovered.
Cascade: end-to-end deletion of derived data (caches, indexes, analytics).
request → locate subject data (index by subject_id) → revoke tokens & unsubscribe jobs → delete in OLTP → purge caches → enqueue erasure in DWH/lakes → crypto-shred keys (per-tenant/per-dataset) → emit audit proof (receipt)
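The erasure flow above can be sketched as an orchestrator that runs steps in order and emits a receipt. This is a minimal illustration, not a production workflow engine: the step functions are hypothetical stand-ins for OLTP deletes, cache purges, DWH jobs, and key shredding, and the SHA-256 digest is meant to be stored in an append-only audit log so the receipt can later be checked against it.

```python
import hashlib
import json
from datetime import datetime, timezone


def erase_subject(subject_id: str, steps: list) -> dict:
    """Run (name, callable) erasure steps in order and return an audit receipt.

    Each callable takes the subject_id and returns the number of affected items.
    Execution stops at the first failure so the request can be retried later.
    """
    log = []
    for name, step in steps:
        try:
            affected = step(subject_id)
        except Exception as exc:
            log.append({"step": name, "status": "failed", "error": str(exc)})
            break
        log.append({"step": name, "status": "ok", "affected": affected})
    completed = len(log) == len(steps) and all(e["status"] == "ok" for e in log)
    receipt = {
        "subject_id": subject_id,
        "completed": completed,
        "steps": log,
        "finished_at": datetime.now(timezone.utc).isoformat(),
    }
    # Digest over canonical JSON; keep a copy in the audit log to verify the receipt later.
    receipt["digest"] = hashlib.sha256(json.dumps(receipt, sort_keys=True).encode()).hexdigest()
    return receipt
```

Stopping at the first failure keeps ordering guarantees simple (for example, keys are not shredded before the DWH job has been queued), at the cost of requiring retries.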
9) Right to Erasure, Legal Hold and eDiscovery
Right to erasure/rectification: an execution SLA (for example, ≤30 days), traceable actions, receipts.
Legal Hold: on legal request, deletion is frozen for the specified datasets/keys; the hold takes priority over TTL.
eDiscovery: data catalog, full-text/attribute search over artifacts, export in consistent formats.
yaml
legal_hold:
  dataset: payments
  scope: ["txn_id:123", "user:42"]
  from: "2025-10-31"
  until: "2026-03-31"
  reason: "regulatory investigation"
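A sketch of the "hold beats TTL" rule that a purge job might consult before deleting a key, assuming holds like the one above have been parsed into dicts with `date` objects. The function name and the in-memory `HOLDS` list are hypothetical; in practice the holds would come from a central registry.

```python
from datetime import date

HOLDS = [
    {"dataset": "payments", "scope": ["txn_id:123", "user:42"],
     "from": date(2025, 10, 31), "until": date(2026, 3, 31)},
]


def may_purge(dataset: str, key: str, ttl_expired: bool, today: date, holds=HOLDS) -> bool:
    """A key may be purged only if its TTL has expired and no active hold covers it."""
    if not ttl_expired:
        return False
    for hold in holds:
        active = hold["from"] <= today <= hold["until"]
        if active and hold["dataset"] == dataset and key in hold["scope"]:
            return False  # Legal Hold takes priority over TTL
    return True
```

Once a hold expires, the key becomes purgeable again on the next run, so the TTL job itself does not need to know why data was retained.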
10) Backups vs archives vs WORM
Backups: for recovery from loss or corruption; short retention, fast RTO.
Archives: long-term retention for audit/analytics; cheap, slow access.
WORM: immutable storage for compliance (finance/reporting) under "write once, read many" policies.
- Do not treat backups as an "archive for centuries."
- Rehearse recovery (DR days) and report restore time and completeness.
- Keep a catalog of backups with their retention and encryption, and store the keys separately from the backup storage.
11) Privacy and anonymization
Pseudonymization: PII is linked late through a key (mapping) table, which allows crypto-erasure by key.
Anonymization: irreversible techniques (k-anonymity, noise, generalization); document the method, the re-identification risk, and the expiry date.
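A minimal Python sketch of checking k-anonymity after generalization, assuming rows are dicts and the quasi-identifiers are chosen by the data owner. The 10-year age bucketing is an illustrative generalization rule, not a recommendation; k-anonymity alone also does not prevent attribute disclosure when a whole group shares the same sensitive value.

```python
from collections import Counter


def is_k_anonymous(rows: list[dict], quasi_identifiers: list[str], k: int) -> bool:
    """True if every combination of quasi-identifier values occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a bucket such as '30-39'."""
    low = age - age % width
    return f"{low}-{low + width - 1}"
```

Usage: generalize the quasi-identifiers first, then re-run the check; if it still fails, widen the buckets or suppress the outlier rows.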
12) Compliance monitoring and reporting
Dashboards: share of datasets with valid retention, volumes by ILM phase, deletion errors.
Alerts: hot-tier volume above target, "stuck" deletions, expiring Legal Holds.
Reports: monthly deletion audit (number of requests, average time to completion, failures), crypto-shredding log.
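A sketch of how a few of these figures could be computed from catalog and request data, assuming each dataset is a dict with an optional `retention` tag and each deletion request records a hypothetical `days_to_complete` (None if still open) and a `failed` flag.

```python
from statistics import mean


def compliance_report(datasets: list[dict], deletion_requests: list[dict]) -> dict:
    """Retention coverage plus the monthly deletion-audit figures."""
    with_retention = sum(1 for d in datasets if d.get("retention"))
    finished = [r["days_to_complete"] for r in deletion_requests if r["days_to_complete"] is not None]
    return {
        # Share of datasets that declare a non-empty retention
        "retention_coverage": with_retention / len(datasets) if datasets else 1.0,
        "requests": len(deletion_requests),
        "avg_days_to_complete": mean(finished) if finished else None,
        "failures": sum(1 for r in deletion_requests if r["failed"]),
    }
```

An alert on `retention_coverage < 1.0` pairs naturally with the design gate described next.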
13) Integration into processes: gates and reviews
Design gate: a new dataset does not pass review without `owner`/`purpose`/`retention`.
Release gate: migrations that increase retention without an owner or justification are blocked.
Cost gate: hot/warm volume exceeding the budget triggers ILM tightening.
Security gate: personal data (PD) must not go into logs/traces without masking and TTL.
14) Anti-patterns
"We keep everything forever; it might come in handy someday."
Hard-coded TTLs in applications that are not reflected in policies.
PD in logs and traces without masking/TTL/deletion.
Incomplete deletion (data left in caches/DWH/backups).
No Legal Hold: data erased while under investigation.
One shared encryption key for everything: targeted crypto-erasure becomes impossible.
Zero observability: "we believe we deleted it," but there is no proof.
15) Architect checklist
1. Does each dataset have an owner, purpose, classification, retention, and storage tier?
2. Are ILM/TTL policies declared as code and applied automatically?
3. Is PD masked in logs/traces and banned outside allow-listed datasets?
4. Are there processes for personal-data deletion requests (SLA, audit, receipts)?
5. Is crypto-erasure possible (per-tenant/per-dataset keys, KMS, rotation)?
6. Backups: schedule, encryption, restore tests, separate keys?
7. Legal Hold/eDiscovery: supported, takes precedence over TTL, activity logs kept?
8. Kafka/queues: retention/compaction/tiering configured, separate DLQ policies?
9. Are metrics and alerts configured for retention compliance and per-tier volumes?
10. Do SDLC reviews and gates block artifacts without retention?
16) Mini recipes
16.1 ClickHouse: cut off the tail older than 180 days
sql
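-- One-off cleanup: an asynchronous mutation that rewrites the affected parts
ALTER TABLE events DELETE WHERE event_date < today() - 180;
-- Ongoing: declarative table TTL, applied during background merges
ALTER TABLE events MODIFY TTL event_date + INTERVAL 180 DAY;
-- Forces merges so the TTL is applied now; expensive on large tables
OPTIMIZE TABLE events FINAL;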
16.2 Redis: TTL and lazy purge
bash
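# Expires after 3600 s; Redis removes expired keys lazily on access and by periodic sampling
redis-cli SET session:123 value EX 3600
# Eviction under memory pressure (separate from TTL expiry); volatile-ttl evicts only keys that have a TTL
redis-cli CONFIG SET maxmemory-policy allkeys-lru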
16.3 Tail sampling for traces
yaml
tail_sampling:
  policies:
    - name: keep-errors-and-slow
      latency_threshold_ms: 500
      status_codes: ["5xx"]
  rate_limit_per_min: 5000
  default_ttl: "7d"
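A Python sketch of the decision this policy describes, made only after the whole trace has been buffered (which is what makes it "tail" sampling). Spans are assumed to be dicts with hypothetical `status_code`, `start_ms`, and `end_ms` fields; the 30-day TTL for error traces borrows `error_ttl` from the observability profile in section 7 and is an assumption here, as is the per-minute limiter's fixed-window behavior.

```python
def sampling_decision(spans, latency_threshold_ms=500, default_ttl="7d", error_ttl="30d"):
    """Return (keep, ttl) for a complete trace; errors are kept longest."""
    has_error = any(500 <= s["status_code"] <= 599 for s in spans)  # "5xx"
    duration_ms = max(s["end_ms"] for s in spans) - min(s["start_ms"] for s in spans)
    if has_error:
        return True, error_ttl
    if duration_ms >= latency_threshold_ms:
        return True, default_ttl
    return False, None


class PerMinuteLimit:
    """Caps kept traces per wall-clock minute (mirrors rate_limit_per_min)."""

    def __init__(self, limit: int):
        self.limit = limit
        self.minute = None
        self.count = 0

    def allow(self, now_s: float) -> bool:
        minute = int(now_s // 60)
        if minute != self.minute:  # new window: reset the counter
            self.minute, self.count = minute, 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True
```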
16.4 Crypto-erasure (idea)
keys:
  dataset: users_pii
  key_id: kms://pii/users/tenant-42

erase(user_id=42):
  rotate_or_destroy(key_id)          # former blocks can no longer be decrypted
  purge_indexes("user_id=42")
  audit("crypto-erasure", user_id)
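To make the idea concrete, here is a toy Python sketch with one key per subject: once the key is destroyed, the stored ciphertext can no longer be decrypted. Loud assumptions: `KeyStore` is a hypothetical in-memory stand-in for a KMS, and the cipher is a toy (SHAKE-256 used as an unauthenticated keystream) chosen only so the example needs no third-party packages. A real system would use envelope encryption with an authenticated cipher such as AES-GCM and KMS-managed keys.

```python
import hashlib
import secrets


class KeyStore:
    """Hypothetical stand-in for a KMS: one 256-bit data key per subject."""

    def __init__(self):
        self._keys = {}

    def key_for(self, subject_id: str) -> bytes:
        return self._keys.setdefault(subject_id, secrets.token_bytes(32))

    def get(self, subject_id: str):
        return self._keys.get(subject_id)

    def destroy(self, subject_id: str) -> None:
        self._keys.pop(subject_id, None)  # the crypto-erasure step


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # TOY cipher for illustration only: unauthenticated SHAKE-256 keystream.
    return hashlib.shake_256(key + nonce).digest(length)


def encrypt(store: KeyStore, subject_id: str, plaintext: bytes) -> tuple[bytes, bytes]:
    nonce = secrets.token_bytes(16)
    ks = _keystream(store.key_for(subject_id), nonce, len(plaintext))
    return nonce, bytes(a ^ b for a, b in zip(plaintext, ks))


def decrypt(store: KeyStore, subject_id: str, nonce: bytes, ciphertext: bytes) -> bytes:
    key = store.get(subject_id)
    if key is None:
        raise KeyError(f"key for {subject_id} destroyed; data is unrecoverable")
    ks = _keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, ks))
```

Because every copy of the ciphertext (replicas, backups, DWH extracts) depends on the same per-subject key, destroying that one key erases them all, which is why a single shared key makes targeted erasure impossible.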
Conclusion
Retention policies are the "skeleton" of your data platform: they describe how long each class of data lives, where it resides at any moment, how it gets cheaper to store over time, and when it disappears without a trace, legally, transparently, and verifiably. Treat retention as policy as code, connect ILM with security and cost, add observability and gates, and you get a system that is efficient, compliant, and ready to grow.