Retention and retention policies

1) Principles

1. Purpose & Minimization. We store only what the processing purpose requires, and only as much of it as needed.
2. Policy as Code. Retention is an executable policy, not a PDF.
3. Defense in Depth. TTL/ILM + encryption + audits + Legal Hold.
4. Reversibility & Proof. Deletion is verifiable: action logs, crypto-shredding, compliance reports.
5. Cost & Carbon Aware. Retention takes into account $/GB-month and carbon footprint of storage/egress.

2) Data classification and the "retention map"

Split datasets into classes, each with a purpose and a legal basis:

Operational (OLTP): orders, payments, sessions.
Analytical (DWH/data lake): events, fact logs, snapshots.
Personal (PII/financial/health): subject to special retention periods and data-subject rights.
Technical: logs, metrics, traces, CI artifacts.
Documents/media: WORM/archive/legacy.

For each class, define: owner, purpose, legal basis, retention period, protection level, current and target storage.
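A retention map entry can be modeled as a small record that is checked for completeness. The following is a minimal sketch, assuming one flat record per data class; the field names and the sample values are illustrative, not taken from any specific catalog tool.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RetentionMapEntry:
    """One row of the retention map; field names are illustrative."""
    data_class: str        # e.g. "operational", "analytical", "personal"
    owner: str
    purpose: str
    legal_basis: str
    retention: str         # e.g. "365d"
    protection_level: str
    current_storage: str
    target_storage: str


def missing_fields(entry: RetentionMapEntry) -> list[str]:
    """Return the names of attributes that are empty or whitespace-only."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


# Hypothetical entry with the retention period left blank.
payments = RetentionMapEntry(
    data_class="operational", owner="billing-team", purpose="payment processing",
    legal_basis="contract", retention="", protection_level="high",
    current_storage="postgres", target_storage="postgres",
)
```

Calling `missing_fields(payments)` lists the attributes that still need an answer before the entry counts as complete.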

3) Data lifecycle (ILM)

Typical pipeline:

1. Ingest (hot) → NVMe/SSD, high request rate.
2. Warm → read less often; compression, columnar formats.
3. Cold/Archive → object storage/tape, slow access.
4. Purge/Delete → guaranteed deletion (including replicas/backups).

Example of an ILM profile (YAML):

yaml
dataset: events_main
owner: analytics
purpose: "product analytics"
classification: "pseudonymized"
lifecycle:
  - { phase: hot, duration: 7d, storage: nvme, format: row }
  - { phase: warm, duration: 90d, storage: ssd, format: parquet, compress: zstd }
  - { phase: cold, duration: 365d, storage: object, glacier: true }
  - { phase: purge, duration: 0d }
privacy:
  pii: false
  dp_delete_window: 30d  # SLA for personal-data deletions if links to individuals appear
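To show how such a profile might be evaluated, here is a minimal sketch that maps a record's age to its lifecycle phase. It assumes each `duration` is the time spent in that phase (so the boundaries are cumulative); if durations are meant as absolute ages, the boundary arithmetic changes. The names `LIFECYCLE` and `phase_for_age` are hypothetical.

```python
from datetime import timedelta

# Mirrors the lifecycle of the profile above: (phase, time spent in that phase).
LIFECYCLE = [
    ("hot", timedelta(days=7)),
    ("warm", timedelta(days=90)),
    ("cold", timedelta(days=365)),
]


def phase_for_age(age: timedelta, lifecycle=LIFECYCLE) -> str:
    """Walk the phases in order; data older than all of them is due for purge."""
    boundary = timedelta(0)
    for phase, duration in lifecycle:
        boundary += duration
        if age < boundary:
            return phase
    return "purge"
```

Under this reading, data leaves the hot phase at 7 days, the warm phase at 97 days, and is due for purge at 462 days.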

4) Policies as code (useful sketches)

4.1 Admission policy (required tags/TTL)

yaml
policy: require-retention-tags
deny_if_missing: [owner, purpose, classification, retention]
default_retention:
  logs: "30d"
  traces: "7d"
  metrics: "90d"
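The admission policy above could be enforced by a check along these lines. This is a sketch under stated assumptions: datasets arrive as plain dictionaries, a hypothetical `kind` field selects the default retention, and durations are whole days such as "30d".

```python
import re

REQUIRED_TAGS = ("owner", "purpose", "classification", "retention")
DEFAULT_RETENTION = {"logs": "30d", "traces": "7d", "metrics": "90d"}
DURATION_RE = re.compile(r"^\d+d$")  # assumption: whole days only, e.g. "30d"


def admit(dataset: dict) -> tuple[dict, list[str]]:
    """Return (dataset with defaults applied, list of violations)."""
    tagged = dict(dataset)
    # Fill in retention from the defaults for known technical kinds.
    if not tagged.get("retention") and tagged.get("kind") in DEFAULT_RETENTION:
        tagged["retention"] = DEFAULT_RETENTION[tagged["kind"]]
    violations = [f"missing tag: {t}" for t in REQUIRED_TAGS if not tagged.get(t)]
    retention = tagged.get("retention")
    if retention and not DURATION_RE.match(retention):
        violations.append(f"invalid retention: {retention!r}")
    return tagged, violations
```

A dataset is admitted only when the returned violation list is empty; the caller decides whether to reject the request or only warn.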

4.2 Gate in CI/CD (Rego): block deploys without retention

rego
package policy.retention

deny[msg] {
    some d
    input.datasets[d].retention == ""
    # %v formats d whether datasets is an object keyed by name or an array
    msg := sprintf("Retention missing for dataset %v", [d])
}

4.3 S3/object storage (lifecycle fragment)

yaml
Rules:
  - ID: logs-ttl
    Status: Enabled
    Filter: { Prefix: "logs/" }
    Transitions:
      # S3 rejects transitions to STANDARD_IA earlier than 30 days after creation
      - { Days: 30, StorageClass: STANDARD_IA }
      - { Days: 90, StorageClass: GLACIER }
    Expiration: { Days: 180 }
    NoncurrentVersionExpiration: { NoncurrentDays: 30 }

5) Retention in streams and queues

Kafka:

`retention.ms`/`retention.bytes`: time- and size-based retention window.
Compaction (`cleanup.policy=compact`): keeps the latest value per key.
Tiered Storage: offloads the older "tail" of the log to a cold tier.
DLQ: has its own retention and TTL.

Example:

properties
cleanup.policy=delete,compact
# 7 days before the tail is removed
retention.ms=604800000
min.cleanable.dirty.ratio=0.5
segment.ms=86400000

Guarantees:

Set retention on key topics to roughly match the business window for replay/recomputation.
For billing/audit events, use a separate long retention or WORM.
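Kafka expects these windows in milliseconds, which is easy to get wrong by hand. A small sketch of the conversion, with `retention_ms` as a hypothetical helper name:

```python
from datetime import timedelta


def retention_ms(window: timedelta) -> int:
    """Convert a retention window to the millisecond value used by retention.ms."""
    # Integer division of two timedeltas gives an exact whole number of milliseconds.
    return window // timedelta(milliseconds=1)


replay_window = timedelta(days=7)      # business window for replay/recomputation
segment_window = timedelta(days=1)     # candidate value for segment.ms
```

Here `retention_ms(replay_window)` gives 604800000 and `retention_ms(segment_window)` gives 86400000, matching the properties example.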

6) Databases and retention

Relational:

Partitioning by date/range; detach and drop old partitions.
Index the date columns used by TTL queries.
Temporal tables (system-versioned) plus purge policies for old versions.

SQL Sketch (PostgreSQL):

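sql
-- Monthly partitions
CREATE TABLE audit_events (id bigserial, occurred_at timestamptz NOT NULL, payload jsonb) PARTITION BY RANGE (occurred_at);
CREATE TABLE audit_events_2024_10 PARTITION OF audit_events FOR VALUES FROM ('2024-10-01') TO ('2024-11-01'); -- example partition name
-- Preferred cleanup: detach and drop partitions older than 365 days
ALTER TABLE audit_events DETACH PARTITION audit_events_2024_10;
DROP TABLE audit_events_2024_10;
-- Fallback for rows in partitions that must stay: delete, then a plain VACUUM
-- (VACUUM FULL rewrites the table under an exclusive lock)
DELETE FROM audit_events WHERE occurred_at < now() - interval '365 days';
VACUUM (ANALYZE) audit_events;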

NoSQL/Time-series:

TTL at the key level (MongoDB TTL index, Redis `EXPIRE`, Cassandra TTL).
Downsampling for metrics (raw 7d → aggregates 90d → long-term 365d).
Retention policies in the TSDB (InfluxDB retention policies, ClickHouse materialized views with dedup/aggregation).
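Downsampling replaces raw points with per-bucket aggregates so that the raw series can expire early. A minimal in-memory sketch, assuming points are `(unix_timestamp, value)` pairs and 5-minute buckets; a real TSDB would do this with rollup jobs or materialized views.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_seconds=300):
    """Aggregate (unix_ts, value) points into fixed-size time buckets.

    Returns a list of (bucket_start, min, mean, max, count), sorted by bucket start.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return [
        (start, min(values), mean(values), max(values), len(values))
        for start, values in sorted(buckets.items())
    ]
```

Keeping min, max and count alongside the mean preserves enough information for most dashboards after the raw data has been purged.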

7) Logs, metrics, traces

Logs: limit fields, mask personal data (PD), TTL 7-30d, archive 90-180d.
Metrics: raw high-frequency data for 7-14d; downsampled (5m/1h) for 90-365d.
Traces: tail-sampling; keep the "interesting" ones (errors, tail latencies) longer.

Policy (example):

yaml
observability:
  logs: { ttl: "30d", archive: "90d", pii_mask: true }
  metrics: { raw: "14d", rollup_5m: "90d", rollup_1h: "365d" }
  traces: { sample: "tail-10%", ttl: "7d", error_ttl: "30d" }

8) Deletion: types and guarantees

Logical (soft-delete): marks a record as deleted; convenient for recovery, but does not satisfy the "right to erasure."
Physical (hard-delete): actually removes the data, its versions and replicas.
Cryptographic (crypto-erasure): destroy or replace the encryption keys, after which the data cannot be recovered.
Cascade: end-to-end deletion of derived data (caches, indexes, analytics).

Personal-data deletion workflow (pseudocode):

request → locate subject data (index by subject_id) → revoke tokens & unsubscribe jobs → delete in OLTP → purge caches → enqueue erasure in DWH/lakes → crypto-shred keys (per-tenant/per-dataset) → emit audit proof (receipt)
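The workflow above can be sketched as an ordered list of steps that produces an audit receipt. This is an illustration under assumptions: each step is a plain callable taking the subject ID, execution stops at the first failure so the request can be retried, and `run_erasure` is a hypothetical name. A production receipt would likely store a hashed subject reference rather than the raw ID.

```python
import hashlib
import json
from datetime import datetime, timezone


def run_erasure(subject_id: str, steps) -> dict:
    """Run (name, callable) erasure steps in order and return an audit receipt."""
    log = []
    for name, step in steps:
        try:
            step(subject_id)
            log.append({"step": name, "status": "ok"})
        except Exception as exc:
            # Record the failure and stop; later steps are not attempted.
            log.append({"step": name, "status": "failed", "error": str(exc)})
            break
    receipt = {
        "subject_id": subject_id,
        "completed": len(log) == len(steps) and all(e["status"] == "ok" for e in log),
        "steps": log,
        "finished_at": datetime.now(timezone.utc).isoformat(),
    }
    # A digest over the receipt body makes later tampering detectable.
    body = json.dumps(receipt, sort_keys=True).encode()
    receipt["digest"] = hashlib.sha256(body).hexdigest()
    return receipt
```

The steps would typically wrap the stages named in the workflow (OLTP delete, cache purge, DWH erasure queue, key shredding), each idempotent so that a retry after a partial failure is safe.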

9) Right to erasure, Legal Hold and eDiscovery

Right to erasure/rectification: execution SLA (for example, ≤30 days), traceable actions, receipts.
Legal Hold: on a legal request, freezes deletion for the specified datasets/keys; takes priority over TTL.
eDiscovery: data catalog, full-text/attribute search over artifacts, export in consistent formats.

Example of a Legal Hold marker (YAML):

yaml
legal_hold:
  dataset: payments
  scope: ["txn_id:123", "user:42"]
  from: "2025-10-31"
  until: "2026-03-31"
  reason: "regulatory investigation"
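Because a hold takes priority over TTL, every purge job needs a check before it deletes anything. A minimal sketch, assuming holds are loaded into memory, records carry keys in the same `type:id` form as the `scope` entries, and the `until` date is inclusive; `deletion_allowed` is a hypothetical name.

```python
from datetime import date

# In-memory copy of the hold shown above.
LEGAL_HOLDS = [
    {
        "dataset": "payments",
        "scope": {"txn_id:123", "user:42"},
        "from": date(2025, 10, 31),
        "until": date(2026, 3, 31),
    },
]


def deletion_allowed(dataset: str, keys: set[str], today: date, holds=LEGAL_HOLDS) -> bool:
    """Deletion may proceed only if no active hold covers any of the record's keys."""
    for hold in holds:
        active = hold["from"] <= today <= hold["until"]
        if active and hold["dataset"] == dataset and hold["scope"] & keys:
            return False
    return True
```

A TTL job would call this for each candidate record (or batch) and skip anything that returns `False`, logging the skip for the audit trail.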

10) Backups vs archives vs WORM

Backups: for recovery from loss or corruption; short retention, fast RTO.
Archives: long-term retention for audit/analytics; cheap, slow access.
WORM: immutable storage for compliance (finance/reporting); "write once, read many" policies.

Rules:

Do not treat a backup as an "archive for the ages."
Run recovery rehearsals (DR days) and report on recovery time and completeness.
Keep a catalog of backups with their retention and encryption; store keys separately from the storage.

11) Privacy and anonymization

Pseudonymization: late binding of PII through a key table (enables crypto-erasure per key).
Anonymization: irreversible techniques (k-anonymity, noise, generalization); document the method, the re-identification risk and the expiry date.
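The late-binding idea can be illustrated with a small key table. This sketch swaps encryption keys for random tokens, which is an assumption made to keep it dependency-free: deleting a row of the table plays the same role as destroying a key, because downstream rows keep the pseudonym but can no longer be linked to the subject. The class name `PseudonymTable` is hypothetical.

```python
import secrets


class PseudonymTable:
    """Key table that maps subject IDs to random pseudonyms (late binding)."""

    def __init__(self):
        self._by_subject: dict[str, str] = {}

    def pseudonym(self, subject_id: str) -> str:
        """Return the stable pseudonym for a subject, creating one on first use."""
        if subject_id not in self._by_subject:
            self._by_subject[subject_id] = secrets.token_hex(16)
        return self._by_subject[subject_id]

    def resolve(self, pseudonym: str):
        """Re-identify a pseudonym, or return None if the mapping was erased."""
        for subject_id, token in self._by_subject.items():
            if token == pseudonym:
                return subject_id
        return None

    def erase(self, subject_id: str) -> bool:
        """Drop the mapping; returns True if there was something to erase."""
        return self._by_subject.pop(subject_id, None) is not None
```

Erasing the mapping does not by itself anonymize the remaining rows: quasi-identifiers in the data can still allow re-identification, which is why the re-identification risk needs to be documented.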

12) Compliance monitoring and reporting

Dashboards: share of datasets with a valid retention, volumes by ILM phase, deletion errors.
Alerts: target volume exceeded in the hot tier, "stuck" deletions, Legal Holds about to expire.
Reports: monthly deletion audit (number of requests, average turnaround, failures), crypto-shredding log extract.
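Two of these signals are simple to compute from catalog data. A sketch under assumptions: datasets and deletion requests are plain dictionaries, the field names (`retention`, `requested_on`, `completed_on`) are illustrative, and the SLA defaults to 30 days.

```python
from datetime import date, timedelta


def retention_coverage(datasets: list[dict]) -> float:
    """Share of datasets (0.0 to 1.0) that declare a non-empty retention."""
    if not datasets:
        return 1.0
    covered = sum(1 for d in datasets if d.get("retention"))
    return covered / len(datasets)


def overdue_deletions(requests: list[dict], today: date,
                      sla: timedelta = timedelta(days=30)) -> list[str]:
    """IDs of deletion requests that are still open past the SLA."""
    return [
        r["id"]
        for r in requests
        if r.get("completed_on") is None and today - r["requested_on"] > sla
    ]
```

The coverage ratio feeds a dashboard panel, while a non-empty result from `overdue_deletions` is a natural trigger for the "stuck deletions" alert.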

13) Integration into processes: gates and reviews

Design-gate: a new dataset does not pass review without 'owner/purpose/retention'.
Release-gate: migrations that increase retention without an owner/justification are blocked.
Cost-gate: volume in hot/warm exceeding the budget triggers ILM tightening.
Security-gate: PD is prohibited in logs/traces without masking and a TTL.
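As an illustration of the release gate, the check below compares retention before and after a migration. It is a sketch with assumed inputs: both sides are dictionaries of dataset name to policy, durations use day granularity such as "90d", and the `justification` field is a hypothetical convention.

```python
def parse_days(value: str) -> int:
    """Parse a duration such as "90d" into days (assumption: day granularity only)."""
    if not value.endswith("d") or not value[:-1].isdigit():
        raise ValueError(f"unsupported duration: {value!r}")
    return int(value[:-1])


def release_gate(old: dict, new: dict) -> list[str]:
    """Return blocking errors for retention increases lacking owner/justification."""
    errors = []
    for name, policy in new.items():
        previous = old.get(name)
        if previous is None:
            continue  # new datasets are covered by the design gate
        if parse_days(policy["retention"]) > parse_days(previous["retention"]):
            if not policy.get("owner") or not policy.get("justification"):
                errors.append(
                    f"{name}: retention increased from {previous['retention']} "
                    f"to {policy['retention']} without owner/justification"
                )
    return errors
```

A CI job would fail the release when the returned list is non-empty; shortening retention passes without extra paperwork.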

14) Anti-patterns

"We keep everything forever, it might come in handy someday."
Hard-coded TTLs in applications that are not reflected in policies.
PD in logs and traces without masking/TTL/deletion.
Incomplete deletion (data left in caches/DWH/backups).
No Legal Hold: data gets erased while under investigation.
One shared encryption key for everything: targeted crypto-erasure becomes impossible.
Zero observability: "we believe we deleted it," but there is no evidence.

15) Architect checklist

1. Does every dataset have an owner, purpose, classification, retention and storage tier?
2. Are ILM/TTL policies declared as code and applied automatically?
3. Is PD masked in logs/traces and banned outside allowlisted datasets?
4. Are there personal-data deletion processes (SLA, audit, receipts)?
5. Is crypto-erasure possible (per-tenant/per-dataset keys, KMS/rotation)?
6. Backups: schedule, encryption, recovery tests, separate keys?
7. Legal Hold/eDiscovery: supported, take priority over TTL, activity logs maintained?
8. Kafka/queues: retention/compaction/tiering specified, DLQs have separate policies?
9. Are metrics and alerts configured for retention compliance and per-tier volumes?
10. Do reviews and gates in the SDLC block artifacts without retention?

16) Mini recipes

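16.1 ClickHouse: "cut off the tail" older than 180 days

sql
-- Mutation: asynchronously rewrites parts without rows older than 180 days
ALTER TABLE events DELETE WHERE event_date < today() - 180;
-- Optional and expensive: force a merge of all parts
OPTIMIZE TABLE events FINAL;
-- For recurring cleanup, a table TTL is usually cheaper: ALTER TABLE events MODIFY TTL event_date + INTERVAL 180 DAY;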

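16.2 Redis: TTL and lazy purge

bash
# Session key expires after one hour
redis-cli SET session:123 value EX 3600
# Under memory pressure, evict least-recently-used keys (volatile-lru limits eviction to keys with a TTL)
redis-cli CONFIG SET maxmemory-policy allkeys-lru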

16.3 Tail-sampling for traces

yaml
tail_sampling:
  policies:
    - name: keep-errors-and-slow
      latency_threshold_ms: 500
      status_codes: ["5xx"]
  rate_limit_per_min: 5000
  default_ttl: "7d"

16.4 Crypto-erasure (idea)

keys:
  dataset: users_pii
  key_id: kms://pii/users/tenant-42

erase(user_id=42):
  rotate_or_destroy(key_id)          # former blocks can no longer be restored
  purge_indexes("user_id=42")
  audit("crypto-erasure", user_id)

Conclusion

Retention policies are the "skeleton" of your data platform: they describe how long different classes of data live, where they reside at each moment, how they become cheaper to store over time, and when they disappear without a trace, legally, transparently, and verifiably. Treat retention as policy as code, tie ILM to security and cost, enable observability and gates, and you get a system that is efficient, compliant, and ready to grow.
