Caching strategies

1) Why cache and where to do it

A cache is a fast storage layer that reduces latency and load on expensive resources (CPU, databases, external APIs). Key goals:

Speed (lower p95/p99), cost (less egress/CPU), stability (less pressure on dependencies during peaks).
Peak smoothing and isolation from "noisy neighbors."

Typical levels:

1. Client (browser/mobile) - HTTP cache, IndexedDB, local storage.

2. Edge/CDN - POP nodes close to the user cache static assets and some API responses.

3. L7 gateway/reverse proxy - Nginx/Envoy/Varnish (micro-caching, SWR).

4. Service cache - Redis/Memcached within the cluster.

5. In-process - in-memory (Caffeine/Guava/LRU map).

6. Cache in the database - materialized views, secondary indexes.

Rule: cache as close to the consumer as possible, but keep a single source of truth.

2) Cache patterns

2.1 Cache-aside (“lazy loading”)

The application reads from the cache first; on a miss it reads from the source and then writes the result to the cache.
Pros: simplicity, control. Cons: cold starts, windows of inconsistency between the cache and the source.
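A minimal cache-aside sketch, assuming an in-process dict stands in for Redis and a hypothetical `load_user` function stands in for the database. The application code owns the check, load and populate steps:

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """The application checks the cache, falls back to the source on a miss,
    then writes the loaded value back with a TTL."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str, load: Callable[[str], Any]) -> Any:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                          # hit
        value = load(key)                            # miss: go to the source
        self._store[key] = (value, now + self.ttl)   # populate for the next readers
        return value

    def invalidate(self, key: str) -> None:
        self._store.pop(key, None)


source_reads = []


def load_user(key: str) -> dict:
    """Hypothetical stand-in for a database query."""
    source_reads.append(key)
    return {"id": key, "name": "Ada"}


cache = CacheAside(ttl_seconds=60)
cache.get("user:1", load_user)
cache.get("user:1", load_user)   # served from the cache, no second source read
```

With a real shared cache the same steps map to a `GET`, a query to the source, and a `SET` with an expiry.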

2.2 Read-through

All reads go through the cache, which itself fetches from the source on a miss (a library or proxy layer).
This makes it easy to centralize TTL and serialization policies.
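For contrast, a read-through sketch under similar assumptions (a dict instead of a real cache, a hypothetical `load_price` loader, JSON as the serialization policy). Callers only call `get`; the loader, TTL and serialization are configured once in the cache object:

```python
import json
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """Callers only talk to the cache; the cache owns the loader, TTL and
    serialization, so those policies live in one place."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._store: Dict[str, Tuple[bytes, float]] = {}  # key -> (serialized value, expires_at)

    def get(self, key: str) -> Any:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is None or entry[1] <= now:
            value = self._loader(key)                     # the cache goes to the source itself
            entry = (json.dumps(value).encode(), now + self._ttl)
            self._store[key] = entry
        return json.loads(entry[0])                       # callers never see the serialized form


loader_calls = []


def load_price(sku: str) -> dict:
    """Hypothetical source of truth."""
    loader_calls.append(sku)
    return {"sku": sku, "price": 10}


prices = ReadThroughCache(load_price, ttl_seconds=30)
print(prices.get("sku-1"))  # {'sku': 'sku-1', 'price': 10}
```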

2.3 Write-through / Write-back (write-behind)

Write-through: write to the cache and the source synchronously → stronger consistency, but higher write latency.
Write-back: write to the cache and flush to the source asynchronously → fast, but with a risk of data loss and write conflicts.
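A sketch of both write modes, assuming plain dicts stand in for the cache and the source of truth and an in-memory `queue.Queue` stands in for a durable write log. The class and method names are illustrative:

```python
import queue
from typing import Any, Dict


class WriteThroughCache:
    def __init__(self, db: Dict[str, Any]) -> None:
        self.db = db
        self.cache: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Persist first, then cache, so a failed source write never leaves an
        # unpersisted value in the cache. The caller waits for both writes.
        self.db[key] = value
        self.cache[key] = value


class WriteBackCache:
    def __init__(self, db: Dict[str, Any]) -> None:
        self.db = db
        self.cache: Dict[str, Any] = {}
        self.pending: queue.Queue = queue.Queue()  # writes not yet in the source

    def put(self, key: str, value: Any) -> None:
        # Fast path: only the cache is updated now; anything still queued
        # is lost if the process dies before flush() runs.
        self.cache[key] = value
        self.pending.put((key, value))

    def flush(self) -> int:
        """Drain queued writes to the source (normally done by a background worker)."""
        flushed = 0
        while True:
            try:
                key, value = self.pending.get_nowait()
            except queue.Empty:
                return flushed
            self.db[key] = value
            flushed += 1
```

The in-memory queue in the write-back sketch is exactly the data-loss risk mentioned above; a real system would replace it with a durable log.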

2.4 Refresh-ahead (proactive)

Detects that a key's TTL will expire soon and refreshes it in the background before it does, preventing a stampede.
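A minimal refresh-ahead sketch, assuming a single process, one background thread per refresh, and an injectable `clock` so the behaviour can be tested; `refresh_window` and the other names are illustrative:

```python
import threading
import time
from typing import Any, Callable, Dict, Optional, Set, Tuple


class RefreshAheadCache:
    """When a hit lands within `refresh_window` seconds of expiry, return the
    cached value and reload the key in a background thread."""

    def __init__(self, loader: Callable[[str], Any], ttl: float, refresh_window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl
        self._window = refresh_window
        self._clock = clock
        self._store: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self._refreshing: Set[str] = set()
        self._lock = threading.Lock()
        self.last_refresh: Optional[threading.Thread] = None  # kept for the demo/tests

    def _load(self, key: str) -> Any:
        try:
            value = self._loader(key)
            with self._lock:
                self._store[key] = (value, self._clock() + self._ttl)
            return value
        finally:
            with self._lock:
                self._refreshing.discard(key)  # allow future refreshes even on failure

    def get(self, key: str) -> Any:
        now = self._clock()
        with self._lock:
            entry = self._store.get(key)
            needs_refresh = (entry is not None and entry[1] > now
                             and entry[1] - now < self._window
                             and key not in self._refreshing)
            if needs_refresh:
                self._refreshing.add(key)      # at most one background refresh per key
        if entry is None or entry[1] <= now:
            return self._load(key)             # absent or expired: synchronous load
        if needs_refresh:
            self.last_refresh = threading.Thread(target=self._load, args=(key,), daemon=True)
            self.last_refresh.start()
        return entry[0]
```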

2.5 Negative caching

Caching "no data/404/empty" results with a short TTL reduces load on the source, for example from repeated lookups of nonexistent IDs.
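A negative-caching sketch, assuming a loader that returns `None` for missing records (a hypothetical `find_user` in the test) and a dict as the cache. A sentinel distinguishes "cached as not found" from "not cached at all", and the negative entry gets its own shorter TTL:

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

_NOT_FOUND = object()  # sentinel stored when the source has no such record


class NegativeCachingStore:
    def __init__(self, loader: Callable[[str], Optional[Any]], ttl: float,
                 negative_ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader            # returns None when the record does not exist
        self._ttl = ttl
        self._negative_ttl = negative_ttl
        self._clock = clock
        self._store: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return None if entry[0] is _NOT_FOUND else entry[0]
        value = self._loader(key)
        if value is None:
            # Remember the miss, but only briefly, so newly created records appear soon.
            self._store[key] = (_NOT_FOUND, now + self._negative_ttl)
            return None
        self._store[key] = (value, now + self._ttl)
        return value
```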

2.6 Micro-caching

Very short TTLs (0.5-5 s) at the L7 proxy for "almost dynamic" content (lists, the home page) sharply reduce latency tails.

3) HTTP cache: headers and control

3.1 Basic headers

`Cache-Control`: `max-age`, `s-maxage` (for shared caches), `public`/`private`, `no-store`, `stale-while-revalidate`, `stale-if-error`.
Validators: `ETag` (e.g. a content hash), `Last-Modified`.
Conditional requests: `If-None-Match`, `If-Modified-Since` → `304 Not Modified`.

3.2 Vary and keys

`Vary: Accept-Encoding, Authorization, Cookie, Accept-Language` makes the cache store a separate variant for each combination of these header values. Keep `Vary` minimal so the number of variants (the key cardinality) does not blow up.

3.3 HTTP Response Example

```http
Cache-Control: public, max-age=60, s-maxage=300, stale-while-revalidate=60
ETag: "a1b2c3"
Vary: Accept-Encoding
```
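A framework-free, origin-side sketch that emits the headers from the example above and answers conditional requests. Deriving the `ETag` from a truncated SHA-256 of the body is an assumption (any stable version identifier works), and `respond` is a hypothetical helper, not any framework's API:

```python
import hashlib
from typing import Dict, Optional, Tuple

CACHE_CONTROL = "public, max-age=60, s-maxage=300, stale-while-revalidate=60"


def etag_for(body: bytes) -> str:
    """Strong ETag from a content hash; HTTP requires the value to be quoted."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    etag = etag_for(body)
    headers = {"Cache-Control": CACHE_CONTROL, "ETag": etag, "Vary": "Accept-Encoding"}
    if if_none_match is not None:
        # If-None-Match may list several tags and uses weak comparison,
        # so a W/ prefix is ignored when matching.
        tags = [t.strip() for t in if_none_match.split(",")]
        tags = [t[2:] if t.startswith("W/") else t for t in tags]
        if "*" in tags or etag in tags:
            return 304, headers, b""   # Not Modified: the client reuses its copy
    return 200, headers, body
```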

4) Key design and TTL

4.1 Keys

Structure: `tenant:user:{id}:profile:v3` (include the schema version).
Avoid PII in keys.
For collections, use the key plus the query parameters (normalized and sorted).
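A sketch of these key conventions, assuming Python's `urllib.parse` for query normalization and a hash of the normalized query to keep collection keys short. The function names and the `v3` constant are illustrative:

```python
import hashlib
from urllib.parse import parse_qsl, urlencode

SCHEMA_VERSION = "v3"  # bump on incompatible changes to the cached value's shape


def entity_key(tenant: str, entity: str, entity_id: str, kind: str) -> str:
    # Opaque IDs only: no e-mails, names or other PII in the key.
    return f"{tenant}:{entity}:{entity_id}:{kind}:{SCHEMA_VERSION}"


def collection_key(tenant: str, route: str, query_string: str) -> str:
    # parse_qsl drops blank values by default; sorting makes
    # "b=2&a=1" and "a=1&b=2" produce the same key.
    params = sorted(parse_qsl(query_string))
    normalized = urlencode(params)
    # Hashing bounds the key length when there are many or long parameters.
    digest = hashlib.sha256(normalized.encode()).hexdigest()[:16]
    return f"{tenant}:{route}:{digest}:{SCHEMA_VERSION}"


print(entity_key("acme", "user", "42", "profile"))  # acme:user:42:profile:v3
```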

4.2 TTL and consistency

A short TTL reduces staleness but increases misses.
For critical data, use validators (`ETag`) and SWR (stale-while-revalidate).
For rarely changing data, use a long TTL plus explicit invalidation on change.

4.3 Versioning/busting

For incompatible changes, change the key prefix/version (`v2` → `v3`).
For static resources, put a content hash in the file name.
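A sketch of content-hash file naming for static assets; the function name and the 8-character digest length are illustrative choices:

```python
import hashlib


def hashed_filename(filename: str, content: bytes, digest_len: int = 8) -> str:
    """app.js -> app.<hash>.js. Any content change yields a new URL, so the file
    itself never needs to be invalidated; only the HTML referencing it changes."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    stem, dot, ext = filename.rpartition(".")
    if not dot:                    # no extension: just append the hash
        return f"{filename}.{digest}"
    return f"{stem}.{digest}.{ext}"
```

Because the URL changes whenever the content does, the hashed file can be served with a very long `max-age`.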

5) Invalidation: strategies and practices

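5.1 Direct deletion

`DEL key` in Redis or `PURGE` on the proxy. Danger: races between the deletion and concurrent readers; a reader that loaded the old value just before the delete can write it back into the cache right after.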

5.2 Surrogate keys

Associate each document with a set of tags (category, author) and invalidate by tag.
In Varnish or at the edge: `Surrogate-Key: article:42 tag:author:7`, then ban by tag (`BAN tag:author:7`).
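An in-memory sketch of the bookkeeping behind tag-based (surrogate key) invalidation, assuming a single process. Varnish and CDNs implement this inside the proxy, so `TaggedCache` and its `ban` method are illustrative only, not any product's API:

```python
from typing import Any, Dict, Iterable, Optional, Set


class TaggedCache:
    """Each entry carries a set of tags; ban(tag) drops every entry with that tag."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._tags_by_key: Dict[str, Set[str]] = {}
        self._keys_by_tag: Dict[str, Set[str]] = {}

    def set(self, key: str, value: Any, tags: Iterable[str]) -> None:
        self.delete(key)                  # forget tag links of the previous version
        self._values[key] = value
        self._tags_by_key[key] = set(tags)
        for tag in self._tags_by_key[key]:
            self._keys_by_tag.setdefault(tag, set()).add(key)

    def get(self, key: str) -> Optional[Any]:
        return self._values.get(key)

    def delete(self, key: str) -> None:
        self._values.pop(key, None)
        for tag in self._tags_by_key.pop(key, set()):
            keys = self._keys_by_tag.get(tag)
            if keys is not None:
                keys.discard(key)
                if not keys:
                    del self._keys_by_tag[tag]

    def ban(self, tag: str) -> int:
        """Invalidate all entries tagged with `tag`; returns how many were dropped."""
        keys = list(self._keys_by_tag.get(tag, ()))
        for key in keys:
            self.delete(key)
        return len(keys)
```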

5.3 Event-driven invalidation

Pub/Sub (Kafka/NATS): when the source changes, publish an "invalidate" event.
Cache consumers subscribe and delete or update the affected keys.
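A single-process sketch of event-driven invalidation, where `queue.Queue` stands in for a Kafka/NATS topic and a JSON message format `{"type": "invalidate", "key": ...}` is assumed. A real deployment would use the broker's client library instead:

```python
import json
import queue
import threading
from typing import Any, Dict

# Stand-in for a topic; None is used as a shutdown signal for the demo.
invalidations: queue.Queue = queue.Queue()
local_cache: Dict[str, Any] = {"product:1": {"price": 10}, "product:2": {"price": 20}}


def on_source_change(key: str) -> None:
    """Called by the writer after the source of truth has been updated."""
    invalidations.put(json.dumps({"type": "invalidate", "key": key}))


def invalidation_consumer(cache: Dict[str, Any], q: queue.Queue) -> None:
    while True:
        raw = q.get()
        if raw is None:
            break
        event = json.loads(raw)
        if event.get("type") == "invalidate":
            cache.pop(event["key"], None)   # the next read misses and reloads


consumer = threading.Thread(target=invalidation_consumer, args=(local_cache, invalidations))
consumer.start()
on_source_change("product:1")
invalidations.put(None)
consumer.join()
print(sorted(local_cache))  # ['product:2']
```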

5.4 Two-phase invalidation

First mark the key as stale (soft TTL) and keep serving the stale value while it is refreshed in the background; then atomically replace it with the new value.

6) Dealing with stampede/dogpile and hot keys

6.1 Request coalescing (singleflight)

One producer refreshes the key while the others wait for its result (a mutex or an "updating" flag).
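An in-process singleflight sketch using threads, modeled on the idea behind Go's `golang.org/x/sync/singleflight` package; the class and method names here are illustrative. It only coalesces callers inside one process; across instances, a distributed lock such as the Redis + Lua example in section 6.5 plays the same role:

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[Exception] = None


class SingleFlight:
    """Concurrent callers asking for the same key share one in-flight load:
    the first caller (the leader) runs `fn`, the others wait for its result."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._calls: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = self._calls[key] = _Call()
        if leader:
            try:
                call.value = fn()
            except Exception as exc:      # followers receive the same error
                call.error = exc
            finally:
                with self._lock:
                    del self._calls[key]  # the next miss starts a fresh load
                call.done.set()
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.value
```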

6.2 TTL jitter

Add randomness (±10-20%) to TTLs so that keys written at the same time do not all expire at once.
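A TTL-jitter helper, assuming a uniform spread around the base TTL; the default `spread` of 0.15 is an illustrative value inside the ±10-20% range:

```python
import random
from typing import Optional


def jittered_ttl(base_ttl: float, spread: float = 0.15,
                 rng: Optional[random.Random] = None) -> float:
    """Scale base_ttl by a uniform factor in [1 - spread, 1 + spread] so keys
    written at the same moment do not all expire at the same moment."""
    if not 0 <= spread < 1:
        raise ValueError("spread must be in [0, 1)")
    uniform = rng.uniform if rng is not None else random.uniform
    return base_ttl * uniform(1 - spread, 1 + spread)
```

The returned value would be used as the expiry when the key is written.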

6.3 Soft-TTL + hard-TTL

Until the soft TTL, serve from the cache; between the soft and hard TTL, keep serving the cached value while triggering a background refresh; after the hard TTL, treat the entry as a miss.
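A soft/hard TTL sketch, assuming an injectable `clock` and a `schedule` callback for background work (for example a thread pool's `submit`; the test passes a list's `append` so the refresh can be run by hand). It returns a status string mirroring `X-Cache` values; all names are illustrative:

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class Entry:
    value: Any
    soft_expiry: float  # after this: still served, but a refresh is triggered
    hard_expiry: float  # after this: treated as a miss


class SoftHardTTLCache:
    def __init__(self, loader: Callable[[str], Any], soft_ttl: float, hard_ttl: float,
                 schedule: Callable[[Callable[[], Any]], Any],
                 clock: Callable[[], float] = time.monotonic) -> None:
        if not 0 < soft_ttl < hard_ttl:
            raise ValueError("need 0 < soft_ttl < hard_ttl")
        self._loader, self._soft, self._hard = loader, soft_ttl, hard_ttl
        self._schedule = schedule
        self._clock = clock
        self._store: Dict[str, Entry] = {}

    def _fill(self, key: str) -> Any:
        value = self._loader(key)
        now = self._clock()
        self._store[key] = Entry(value, now + self._soft, now + self._hard)
        return value

    def get(self, key: str) -> Tuple[Any, str]:
        now = self._clock()
        entry = self._store.get(key)
        if entry is None or now >= entry.hard_expiry:
            return self._fill(key), "MISS"
        if now >= entry.soft_expiry:
            self._schedule(lambda: self._fill(key))   # refresh off the request path
            return entry.value, "STALE"
        return entry.value, "HIT"
```

Every STALE read schedules a refresh in this sketch; combining it with request coalescing (6.1) keeps that to one refresh per key.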

6.4 Hot Keys

Local caches in front of the shared cache (two-tier).
Replicate a hot key to multiple shards and pick one replica at random (for reads only).
Rate-limit refreshes of a specific key.
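A sketch of hot-key replication, assuming a dict stands in for a sharded store and a hypothetical `#r<i>` suffix names the replicas. In Redis Cluster the suffixed names must not share a `{...}` hash tag, otherwise they would all hash to the same slot:

```python
import random
from typing import Any, Dict, Optional


class HotKeyReplicas:
    """Writes a hot key under N suffixed names, which a sharded store spreads
    across shards; each read picks one replica at random."""

    def __init__(self, store: Dict[str, Any], replicas: int,
                 rng: Optional[random.Random] = None) -> None:
        self.store = store              # stand-in for a sharded Redis/Memcached client
        self.replicas = replicas
        self.rng = rng if rng is not None else random.Random()

    def _name(self, key: str, i: int) -> str:
        return f"{key}#r{i}"            # hypothetical naming scheme, no {hash tag}

    def set(self, key: str, value: Any) -> None:
        for i in range(self.replicas):  # writes fan out to every replica
            self.store[self._name(key, i)] = value

    def get(self, key: str) -> Optional[Any]:
        return self.store.get(self._name(key, self.rng.randrange(self.replicas)))
```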

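6.5 Example of Redis + Lua (singleflight sketch)

```lua
-- Lock with a TTL (SET ... NX EX) so only one caller refreshes the key;
-- the TTL prevents a crashed lock holder from blocking everyone (no deadlock).
-- KEYS[1] = lock key, ARGV[1] = lock TTL in seconds
local ok = redis.call("SET", KEYS[1], "1", "NX", "EX", ARGV[1])
if ok then
  return "LOCKED"  -- this caller refreshes the value
else
  return "WAIT"    -- another caller is refreshing; wait or serve stale
end
```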

7) Eviction and admission policies

7.1 Eviction

LRU: simple and good for locality.
LFU: Better for "long-lived" hot keys.
ARC/TinyLFU: recency/frequency balance.
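A minimal LRU sketch built on `collections.OrderedDict`. Production in-process caches such as Caffeine use more elaborate frequency-aware policies, so this only illustrates the recency rule:

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCache:
    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()  # oldest first

    def get(self, key: Hashable, default: Optional[Any] = None) -> Optional[Any]:
        if key not in self._data:
            return default
        self._data.move_to_end(key)           # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)    # evict the least recently used entry
```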

7.2 Admission

Do not admit large, rarely used objects (TinyLFU/Bloom filters).
Compress large values (LZ4/Zstd) above a size threshold where the savings outweigh the added latency.
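A sketch of both admission ideas, with stated simplifications: the frequency filter is a plain counter rather than TinyLFU's count-min sketch with aging, and `zlib` from the standard library stands in for LZ4/Zstd. Class names, thresholds and the one-byte format header are illustrative:

```python
import zlib
from collections import Counter


class AdmissionFilter:
    """Reject oversized values; admit a key only after `min_requests` requests."""

    def __init__(self, max_value_bytes: int, min_requests: int = 2) -> None:
        self.max_value_bytes = max_value_bytes
        self.min_requests = min_requests
        self._requests: Counter = Counter()   # unbounded here; a real sketch bounds memory

    def record_request(self, key: str) -> None:
        self._requests[key] += 1

    def admit(self, key: str, size: int) -> bool:
        return size <= self.max_value_bytes and self._requests[key] >= self.min_requests


RAW, ZLIB = b"\x00", b"\x01"   # 1-byte header telling the reader how to decode


def encode_value(raw: bytes, threshold: int = 1024) -> bytes:
    """Compress only values at or above `threshold`, and only if it actually helps."""
    if len(raw) >= threshold:
        packed = zlib.compress(raw, 1)        # level 1: favour speed over ratio
        if len(packed) < len(raw):
            return ZLIB + packed
    return RAW + raw


def decode_value(blob: bytes) -> bytes:
    return zlib.decompress(blob[1:]) if blob[:1] == ZLIB else blob[1:]
```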

8) Sharding and topologies

8.1 Consistent hashing

Distributes keys across nodes stably and minimizes key movement when the cluster grows or shrinks.
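A consistent-hashing sketch with virtual nodes, assuming SHA-256 truncated to 64 bits as the ring hash and 100 virtual nodes per server. The class name is illustrative; ketama implementations in Memcached clients use their own hash and point layout:

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


class HashRing:
    """Each node owns many points on a ring; a key belongs to the first
    point clockwise from the key's hash."""

    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []   # sorted (point, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(point, n) for point, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self._ring, (self._hash(key),))
        return self._ring[i % len(self._ring)][1]   # wrap around past the last point
```

When a node is added, only the keys that now fall on its points move; every other key keeps its previous node.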

8.2 Redis/Memcached topologies

Redis Cluster (slots/shards), Sentinel (failover), read replicas.
Memcached: client-side sharding (ketama hashing), no server-side replication.

8.3 Local + Distributed

Cascade: in-process (micro-TTL/LRU) → Redis (longer TTL) → source.
Coordinate TTLs across the tiers and be careful with cache validators.
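A two-tier cascade sketch, assuming a dict shared between instances stands in for Redis and an injectable clock for testing. It caps the local expiry at the shared entry's expiry, one simple way to keep TTLs coordinated across tiers:

```python
import time
from typing import Any, Callable, Dict, Tuple

Slot = Tuple[Any, float]  # (value, expires_at)


class TwoTierCache:
    """In-process dict (short TTL) -> shared store (longer TTL) -> source."""

    def __init__(self, shared: Dict[str, Slot], loader: Callable[[str], Any],
                 local_ttl: float, shared_ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if local_ttl > shared_ttl:
            raise ValueError("local TTL should not exceed the shared TTL")
        self.local: Dict[str, Slot] = {}
        self.shared = shared              # stand-in for Redis, shared by all instances
        self._loader = loader
        self._local_ttl, self._shared_ttl = local_ttl, shared_ttl
        self._clock = clock

    def get(self, key: str) -> Tuple[Any, str]:
        now = self._clock()
        slot = self.local.get(key)
        if slot is not None and slot[1] > now:
            return slot[0], "local"
        slot = self.shared.get(key)
        tier = "shared"
        if slot is None or slot[1] <= now:
            slot = (self._loader(key), now + self._shared_ttl)
            self.shared[key] = slot
            tier = "source"
        # Never let the local copy outlive the shared entry it came from.
        self.local[key] = (slot[0], min(now + self._local_ttl, slot[1]))
        return slot[0], tier
```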

9) Edge, CDN and L7 cache

9.1 Micro-cache in Nginx

```nginx
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api:100m inactive=10m;

# Write methods must neither be served from nor stored in the cache.
map $request_method $skip_cache {
    default 0;
    POST    1;
    PUT     1;
    DELETE  1;
}

server {
    location /api/list {
        proxy_cache api;
        proxy_cache_bypass $skip_cache;          # go straight to the upstream
        proxy_no_cache     $skip_cache;          # and do not store the response
        proxy_cache_valid 200 2s;                # micro-cache
        proxy_cache_use_stale error timeout updating;
        proxy_cache_background_update on;        # SWR: refresh in the background
        add_header X-Cache $upstream_cache_status;
        proxy_pass http://upstream;
    }
}
```

9.2 Envoy (SWR and conditional requests)

```yaml
http_filters:
- name: envoy.filters.http.cache
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.cache.v3.CacheConfig
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.http.cache.file_system_http_cache.v3.FileSystemHttpCacheConfig
      cache_path: "/var/cache/envoy"
      # other cache settings omitted
# ... remaining http_filters, ending with envoy.filters.http.router
```

9.3 Varnish (Surrogate keys)

Use `Surrogate-Key` headers and bans on tags for bulk invalidation.

10) Cache and data consistency

10.1 Read-your-writes

For user profiles and shopping carts, use short TTLs, write-through, or a client-side marker (bypass the cache for N seconds after a write).
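A sketch of the client-marker variant, assuming a hypothetical `rw_until` cookie holding a timestamp and plain dicts for the cache and the database. Only the client that wrote bypasses the cache; other clients may still see the cached value until it expires, which is the accepted trade-off:

```python
from typing import Any, Callable, Dict, Tuple

BYPASS_SECONDS = 10.0  # hypothetical window, longer than the expected refresh/replication lag


def cookies_after_write(now: float) -> Dict[str, str]:
    """Set on the response to a successful write (the cookie name is hypothetical)."""
    return {"rw_until": f"{now + BYPASS_SECONDS:.3f}"}


def read_profile(user_id: str, cookies: Dict[str, str], cache: Dict[str, Any],
                 load_from_db: Callable[[str], Any], now: float) -> Tuple[Any, str]:
    try:
        rw_until = float(cookies.get("rw_until", "0"))
    except ValueError:
        rw_until = 0.0                              # ignore a malformed marker
    if now < rw_until:
        return load_from_db(user_id), "BYPASS"      # this client just wrote: read the source
    if user_id in cache:
        return cache[user_id], "HIT"
    value = load_from_db(user_id)
    cache[user_id] = value
    return value, "MISS"
```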

10.2 Eventual vs Strong

For recommendations/analytics: eventual consistency + long TTL.
For balances and order statuses: short TTL, revalidation, and sometimes no cache at all on critical paths.

10.3 Invariants

Do not cache fields that affect security/ACLs without strict TTLs and re-validation.

11) Observability, SLO and management

11.1 Metrics

hit_ratio (overall and per route), byte_hit_ratio, miss_rate.
stampede_prevented_total, refresh_ahead_total, ban/purge_total.
Latency: p50/p95/p99 from cache vs from source.
hot_keys_topN and their QPS/bytes.
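A sketch of computing hit ratios from counters, assuming responses are labeled with `X-Cache`-style statuses and that `STALE` responses count as served from cache (an assumption; some teams track them separately). In practice these counters would be exported to a metrics system rather than computed in-process:

```python
from collections import Counter
from typing import Optional

SERVED_FROM_CACHE = {"HIT", "STALE"}  # assumption: stale responses count as hits


class CacheStats:
    def __init__(self) -> None:
        self.requests: Counter = Counter()   # (route, status) -> request count
        self.bytes: Counter = Counter()      # (route, status) -> response bytes

    def record(self, route: str, status: str, nbytes: int) -> None:
        self.requests[(route, status)] += 1
        self.bytes[(route, status)] += nbytes

    @staticmethod
    def _ratio(counter: Counter, route: Optional[str]) -> float:
        selected = {k: v for k, v in counter.items() if route is None or k[0] == route}
        total = sum(selected.values())
        cached = sum(v for (_, status), v in selected.items() if status in SERVED_FROM_CACHE)
        return cached / total if total else 0.0

    def hit_ratio(self, route: Optional[str] = None) -> float:
        return self._ratio(self.requests, route)

    def byte_hit_ratio(self, route: Optional[str] = None) -> float:
        return self._ratio(self.bytes, route)
```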

11.2 Logs and traces

Log `X-Cache: HIT/MISS/STALE/UPDATING`.
In traces, mark where the response came from (`cache=true`, `tier=edge|service|local`).

11.3 SLO approach

Example: "for API/catalog: p99 ≤ 250 ms, cache hit ratio ≥ 85%, stampedes ≤ 0.1% of requests."

11.4 Runbooks

"Misses are growing" → check TTLs, warm-up/invalidation, hot keys, cache size and admission policy.

12) Security and multi-tenancy

Embed the tenant ID in keys (and in `Vary` for HTTP).
Do not cache private responses as `public`.
Encrypt cached sensitive data, or cache only non-PII data and IDs.

13) Typical recipes

13.1 Catalog/feed (almost dynamic)

Edge micro-cache for 1-3 s + SWR; internally, Redis for 15-60 s with invalidation on update events.

13.2 User profile

Cache-aside with a TTL of 30-120 s, plus a cache bypass for 5-10 s after a profile update (via cookie/header), or write-through.

13.3 Exchange rates/reference data

Long TTL (minutes to hours) + targeted invalidation when new data is published; `ETag` for conditional GETs.

13.4 Search results

Edge micro-cache for 1-2 s; internally, refresh-ahead and coalescing, with normalized query parameters in the key.

14) Anti-patterns

Cache without invalidation, relying on TTL alone → long windows of stale data.
Huge `Vary`: an "explosion" of variants → low hit rate.
One cache shared by production and experiments → contamination.
No stampede protection → load spikes on the source when TTLs expire.
Caching money/permissions/ACLs without strict guarantees.
Compressing everything indiscriminately → extra CPU and worse p99 for small objects.

15) Implementation checklist

Define the cache levels and their targets (edge/service/local).
Design keys (versioning, tenant, parameter normalization).
Select the pattern (cache-aside/read-through/refresh-ahead).
Configure TTL/soft-TTL/jitter, enable SWR.
Implement coalescing/singleflight, stampede protection.
Organize invalidation (events, tags, purge/ban).
Add hit-ratio/latency metrics and `X-Cache` dashboards.
Load-test hot keys.
Write SLOs and runbooks.
Check security/tenant isolation and `Vary`.

16) FAQ

Q: What to choose - cache-aside or read-through?
A: For simple services, cache-aside. If you need centralization and a single policy, read-through.

Q: How to understand the optimal TTL?
A: Start from permissible obsolescence, frequency of updates and target hit-rate; add jitter and observe p95/p99/cost.

Q: When is write-back appropriate?
A: For high-load write streams where eventual consistency is acceptable and there is a reliable queue/log for replaying pending writes.

Q: Can authorized responses be cached?
A: Yes, but mark them `private` and/or include the tenant/user in the key or `Vary`. For truly private data, use the client cache.

Q: How to warm up the cache?
A: Lists of popular keys, a background warmer, replaying requests from logs, and warming up before releases and peaks (Black Friday, etc.).

17) Summary

Effective caching is key design + reasonable TTLs + a well-chosen pattern, reinforced by event-driven invalidation, SWR/refresh-ahead and stampede protection. Tier the cache (client/edge/service), add observability and SLOs, and you get stable latency tails, predictable cost and resilience to peaks.
