Optimize infrastructure spending

Brief Summary

The financial efficiency of infrastructure rests on three things:

1. Transparent measurability (tags, showback/chargeback, $/unit of value).

2. Engineering discipline (rightsizing, auto-scale, correct storage/cache/network classes).

3. Architectural decisions (where bytes and milliseconds "flow").

The goal is to lower TCO while maintaining SLO and development speed.

Business metrics and unit-economics

$/1000 RPS - cost of sustaining 1000 requests per second on key routes.
$/ms p95 - cost of shaving 1 ms off the p95 latency tail (important for conversion).
$/player/month or $/deposit - for iGaming/fintech.
TCO = compute + storage + network egress + managed services + licenses + support.
Capitalization of technical debt: record what unaccounted-for latency and log leakage actually cost.

Example:

If the API costs $120/h and serves 60k RPS at the target p95, then $/1000 RPS ≈ $2/h. Every optimization should be compared against this "unit price."
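The arithmetic above can be sketched in a few lines of Python. The function name and the $96/h "after optimization" figure are illustrative assumptions, not values from a real bill.

```python
def usd_per_1k_rps(cost_usd_per_hour: float, rps: float) -> float:
    """Hourly cost attributed to each 1000 requests/second of sustained throughput."""
    if rps <= 0:
        raise ValueError("rps must be positive")
    return cost_usd_per_hour / (rps / 1000)


# Baseline from the example: $120/h at 60k RPS.
baseline = usd_per_1k_rps(120, 60_000)

# Hypothetical candidate: the same 60k RPS served for $96/h after an optimization.
candidate = usd_per_1k_rps(96, 60_000)

saving_pct = (baseline - candidate) / baseline * 100
print(f"baseline={baseline:.2f} candidate={candidate:.2f} saving={saving_pct:.0f}%")
```

Comparing options only makes sense when both are measured at the same target p95; a cheaper unit price at a worse latency is a different product.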

Inventory and tagging

Required tags: 'env', 'owner', 'product', 'service', 'region', 'cost-center', 'tier'.
Showback/chargeback: weekly reports per team/service.
Control of ownerless ("orphaned") resources: no tags means no deployment and no extension.
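A tag gate of this kind could look roughly like the sketch below. The tag keys come from the list above; the function name and the idea of calling it from a deploy pipeline are assumptions for illustration.

```python
REQUIRED_TAGS = ("env", "owner", "product", "service", "region", "cost-center", "tier")


def missing_tags(tags: dict) -> list:
    """Return the required tag keys that are absent or blank, in a stable order."""
    return [key for key in REQUIRED_TAGS if not str(tags.get(key, "")).strip()]


# Hypothetical resource description, e.g. parsed from an IaC plan.
resource_tags = {"env": "prod", "owner": "payments-team", "service": "psp-gateway"}

problems = missing_tags(resource_tags)
if problems:
    print("deploy blocked, missing tags:", ", ".join(problems))
```

Running the check in CI, before anything is provisioned, is usually cheaper than hunting for untagged resources after the bill arrives.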

SQL sketch for a DWH report (idea):

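sql
-- Cost per 1000 RPS by env/product/service over a date range.
-- :from and :to are bind parameters supplied by the caller.
SELECT env, product, service,
       SUM(cost_usd) AS cost_month,
       SUM(rps) AS rps_month,
       -- 1000.0 avoids integer division; NULLIF guards against zero traffic
       SUM(cost_usd) / NULLIF(SUM(rps) / 1000.0, 0) AS usd_per_1k_rps
FROM finops_daily
WHERE usage_date BETWEEN :from AND :to
GROUP BY 1, 2, 3;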

Rightsizing and instance classes

CPU/memory profiles: capture profiles under load; reduce requests/limits until CPU sits at an operating point of 50-70%.
Instance size: several small instances are often cheaper than a few large ones (better bin-packing with the Cluster Autoscaler).
ARM instances: cheaper at comparable performance if the stack is compatible.
Hot/cold pools: keep a small warm reserve instead of permanent "fat."
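As a rough illustration of the 50-70% operating point, the sketch below derives a CPU request from observed p95 usage. The function name, the 0.6 default and the 50-millicore rounding step are assumptions, not a standard.

```python
import math


def recommend_cpu_request(p95_usage_millicores: float,
                          target_utilization: float = 0.6,
                          step: int = 50) -> int:
    """CPU request (millicores) that puts p95 usage at the target utilization.

    The result is rounded up to a multiple of `step` to keep manifests tidy.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if p95_usage_millicores < 0 or step <= 0:
        raise ValueError("usage must be >= 0 and step must be > 0")
    raw = p95_usage_millicores / target_utilization
    # round() absorbs floating-point noise before taking the ceiling.
    return math.ceil(round(raw / step, 9)) * step


# Hypothetical service: p95 usage of 310m under load, currently requesting 2000m.
print(recommend_cpu_request(310))
```

The input should come from profiles taken under realistic load; sizing from idle-hour metrics will undershoot.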

Discounts and consumption patterns

Reserved/Savings Plans/Committed Use: commit to the stable baseline (40-70% savings).
Spot/Preemptible: for non-critical/asynchronous tasks, CI, analytics, cache workers.
Mix strategy: baseline on reserved, peaks on on-demand, background on spot.
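The effect of the mix can be estimated with a small model. Everything numeric below is an assumption for illustration: the unit counts, the on-demand price, and the spot discount; the 40% reserved discount is the low end of the range quoted above.

```python
def blended_hourly_cost(base_units: float, peak_units: float, background_units: float,
                        on_demand_price: float,
                        reserved_discount: float, spot_discount: float) -> float:
    """Hourly cost when the base runs on commitments, peaks on on-demand, background on spot."""
    reserved = base_units * on_demand_price * (1 - reserved_discount)
    on_demand = peak_units * on_demand_price
    spot = background_units * on_demand_price * (1 - spot_discount)
    return reserved + on_demand + spot


# Hypothetical fleet: 10 base units, 4 peak units, 6 background units at $1.00/unit/hour.
mixed = blended_hourly_cost(10, 4, 6, on_demand_price=1.0,
                            reserved_discount=0.4, spot_discount=0.7)
all_on_demand = (10 + 4 + 6) * 1.0
print(f"mixed=${mixed:.2f}/h vs on-demand=${all_on_demand:.2f}/h")
```

The model ignores spot interruptions and unused commitments; both shift the result and are worth tracking separately.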

Auto-scaling and elasticity

HPA/KEDA on SLO signals (latency, queue lag, RPS), not only on CPU.
Cluster Autoscaler with warm pools and image pre-pull for fast starts.
Scale-down with hysteresis to keep clusters from oscillating (anti-flapping).
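The anti-flapping idea can be sketched as a stabilization window: scale up immediately, but scale down only to the highest recommendation seen over the last few evaluations. This is a simplified model in the spirit of an autoscaler's scale-down stabilization, not the actual HPA implementation; the class and method names are hypothetical.

```python
from collections import deque


class ScaleDownStabilizer:
    """Scale up at once; scale down only after the window agrees it is safe."""

    def __init__(self, window: int):
        if window < 1:
            raise ValueError("window must be >= 1")
        self.recent = deque(maxlen=window)

    def decide(self, current: int, recommended: int) -> int:
        self.recent.append(recommended)
        if recommended >= current:
            return recommended  # scale-up (or no change) is never delayed
        if len(self.recent) < self.recent.maxlen:
            return current      # not enough history yet: hold
        # Drop only to the highest recommendation within the window.
        return min(current, max(self.recent))


stabilizer = ScaleDownStabilizer(window=3)
replicas = 10
for recommendation in [6, 10, 6, 6, 6]:
    replicas = stabilizer.decide(replicas, recommendation)
    print(recommendation, "->", replicas)
```

A brief dip in load therefore does not remove capacity that would have to be re-added a minute later, which is where flapping costs both money and latency.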

Network and egress - a quiet budget eater

CDN/tiered cache/origin shield reduce egress from the origin.
Compression (Brotli/gzip), webp/avif, diff API (transfer only modified fields).
Group calls to external APIs; use keepalive and a retry budget.
Less chattiness inside the DC: event-driven design, batching, event aggregation.
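A retry budget, mentioned above, caps retries as a fraction of regular requests so that a failing dependency does not multiply traffic. The sketch below is a deliberately simple cumulative counter; the class name and the 10% default are assumptions, and a production version would normally use a sliding time window.

```python
class RetryBudget:
    """Allow retries only while they stay within `ratio` of the requests seen."""

    def __init__(self, ratio: float = 0.1):
        if ratio < 0:
            raise ValueError("ratio must be >= 0")
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def record_request(self) -> None:
        self.requests += 1

    def try_retry(self) -> bool:
        """Return True and consume budget if one more retry is allowed."""
        if self.retries + 1 <= self.ratio * self.requests:
            self.retries += 1
            return True
        return False


budget = RetryBudget(ratio=0.5)
for _ in range(4):
    budget.record_request()
print([budget.try_retry() for _ in range(3)])
```

When the budget is exhausted the caller fails fast instead of retrying, which limits both egress and the load on the struggling upstream.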

Storage and data

Storage classes: hot (NVMe), warm (gp2/gp3), cold (S3/Glacier/archive).
Lifecycle policies: automatic transition of "old" objects to cheaper classes.
Compression/partitioning in the DWH, TTL on temporary tables/snapshots.
Avoid redundant replication: a reasonable RF, economical snapshot policies.
Caching: Redis/Memcached for the hot set instead of "expensive" database reads.
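A lifecycle rule boils down to mapping an object's age to a storage class. The sketch below shows that decision in isolation; the 30- and 90-day thresholds and the function name are assumptions, and real object stores express the same rule declaratively in their lifecycle configuration.

```python
from datetime import date


def target_storage_class(last_access: date, today: date,
                         warm_after_days: int = 30,
                         cold_after_days: int = 90) -> str:
    """Pick hot/warm/cold from the number of days since the last access."""
    idle_days = (today - last_access).days
    if idle_days >= cold_after_days:
        return "cold"
    if idle_days >= warm_after_days:
        return "warm"
    return "hot"


today = date(2024, 6, 1)
for last in (date(2024, 5, 25), date(2024, 4, 15), date(2023, 12, 1)):
    print(last, "->", target_storage_class(last, today))
```

Thresholds should follow actual access patterns: cold classes often charge for retrieval, so moving data that is still read regularly can cost more than it saves.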

Logs, metrics, traces - pay wisely

Log sampling (rate-limit by level/pattern), structured logs instead of chatter.
Tail-based sampling for traces (keep p99 tails and errors, cut the rest aggressively).
Metric downsampling: aggregate at the push gateway, keep high-resolution data for only 7-14 days.
PII filtering reduces both risk and volume.
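Tail-based sampling decides after a trace has finished, when its duration and error status are known. A minimal sketch, assuming the p99 threshold is supplied by the caller and a 1% base rate for ordinary traces (both hypothetical):

```python
import hashlib


def keep_trace(trace_id: str, duration_ms: float, has_error: bool,
               p99_ms: float, base_rate: float = 0.01) -> bool:
    """Keep every error and every slow-tail trace; sample the rest at base_rate."""
    if has_error or duration_ms >= p99_ms:
        return True
    # Hash the trace ID so every collector makes the same decision for a trace.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < base_rate


print(keep_trace("trace-1", duration_ms=40, has_error=True, p99_ms=500))
print(keep_trace("trace-2", duration_ms=900, has_error=False, p99_ms=500))
```

Hashing instead of calling a random generator keeps the decision reproducible, which helps when several collectors see spans of the same trace.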

Architecture and "millisecond cost"

HTTP/2/3 + TLS session resumption: fewer handshakes → less CPU/egress/latency.
Cache key and TTL: a high hit ratio is direct money (less origin and DB load).
gRPC/protobuf for service-to-service calls: fewer bytes.
Batch/stream for background tasks; idempotency → fewer retries.
Database choice: do not store "everything in one place" - cheap KV/caches for frequent reads, analytics in a columnar DWH.
Data schemas: short fields/compact types, control over index cardinality.
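Why hit ratio is "direct money" is easiest to see with numbers: only misses reach the origin, so the origin bill scales with (1 - hit ratio). The per-request origin cost below is a made-up figure for illustration.

```python
def origin_cost_per_million(hit_ratio: float, origin_cost_per_request: float) -> float:
    """Origin-side cost of one million client requests at a given cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be within [0, 1]")
    misses = 1_000_000 * (1.0 - hit_ratio)
    return misses * origin_cost_per_request


# Hypothetical origin cost: $0.00002 per request (compute + DB + egress).
for ratio in (0.80, 0.90, 0.95):
    cost = origin_cost_per_million(ratio, 0.00002)
    print(f"hit ratio {ratio:.0%}: ${cost:.2f} per million requests")
```

Moving from 90% to 95% halves the miss traffic, so a few points of hit ratio near the top of the range are worth more than they look.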

DR, reserves and multi-region

Business goal: RTO/RPO → cost of DR. Do not overpay for active-active if active-passive is enough.
Keep cold backups in a cheap storage class; replicate differentially.
Plan PoPs/regions as one pool: each zone can carry ≥60% of peak, so the platform withstands the loss of a neighbor without "gold-plated" redundancy.
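Whether a given layout really withstands a neighbor failure is a quick check. The sketch below assumes capacity is expressed as a fraction of peak load per zone and that the largest zones fail first (worst case); the function name is hypothetical.

```python
def survives_zone_loss(zone_shares: list, failed: int = 1) -> bool:
    """True if the remaining zones still cover 100% of peak after `failed` zones are lost.

    zone_shares: capacity of each zone as a fraction of peak load (0.6 = 60%).
    """
    if failed < 0 or failed >= len(zone_shares):
        return False
    # Worst case: the zones with the most capacity are the ones that fail.
    remaining = sorted(zone_shares)[: len(zone_shares) - failed]
    return sum(remaining) >= 1.0


print(survives_zone_loss([0.6, 0.6, 0.6]))  # three zones at 60% each
print(survives_zone_loss([0.6, 0.6]))       # two zones at 60% each
```

Note that the 60% figure only works with three or more zones: two zones at 60% each leave 60% of peak after one fails.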

Environments and CI/CD

Hibernate staging/preview environments; apply auto-TTL.
CI runners on spot, artifact cache, concurrency limits.
Keep test data compact and generated on the fly, not stored by the gigabyte.

Manage vendors and licenses

Review volumes and pricing models quarterly.
A viable alternative provider is leverage in negotiations.
Licenses (APM/security): count $ per useful signal, not for "all the logs in the world."

Processes and management

FinOps ceremonies: weekly team report, monthly Cost Review (top 10 "leaks," action items).
Guardrails: project/namespace quotas, budget alerts, a ban on deploying untagged resources.
Blameless postmortems for "cost incidents" (log leaks, runaway autoscaling).
IaC: all limits, classes, and TTLs live in the repository and go through PR review.

Savings checklist

Tags/showback/chargeback are in place; there are no ownerless resources.
Rightsizing is based on profiles; ARM/other instance types have been evaluated.
Commitments cover the baseline; spot covers background/analytics/CI.
HPA/KEDA on SLO metrics, CA with warm pools.
CDN/tiered cache, compression, cache key without noise.
Storage: classes, lifecycle, TTL, caches for the hot set.
Logs/traces: sampling, tail-based sampling, PII filters.
DR sized by RTO/RPO, cold backups in a cheap class.
Environments with auto-TTL, CI on spot.
FinOps rhythms and guardrails in IaC.

Common errors

"Optimization without metrics": no $/1000 RPS → cannot compare options.
Detached/unused resources linger for months.
Storing "everything" in the hot class, with no lifecycle policy.
Logs as a "black hole": 100% ingest, 0% use.
Autoscaling on CPU alone, ignoring latency/queues → overpayment and SLO regression.
Overly aggressive DR without business justification.
Microservices "for show": more interservice traffic and overhead.

Mini playbooks

1) Quick account audit (48 hours)

1. Break down costs by top 10 services/regions.
2. For each: $/1000 RPS, CDN hit ratio, egress.
3. Roll out TTLs/cache keys, turn off noisy logs.
4. Enable lifecycle policies on S3/object storage.

2) 25% egress reduction

1. Tiered cache + origin shield, `stale-while-revalidate`.
2. Convert images to webp/avif.
3. Diff API and gzip/Brotli for text.
4. Check for repeated requests/retries.

3) Cut DB costs

1. Top queries (p95/IO) → indexes/batching.
2. Hot set in Redis.
3. Archive old data (TTL).
4. Read replicas on a cheaper stack.

4) Stop autoscaler flapping

1. Increase stabilization/cooldown.
2. minReplicas > 0 at peak.
3. Pre-warm connections/TLS.
4. Cut excess retries.

Example of "economical" Nginx (compression, cache, SWR)

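nginx
# Note: brotli directives require the third-party ngx_brotli module.
# ssl_certificate / ssl_certificate_key and the origin_catalog upstream are omitted for brevity.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=EDGE:512m max_size=50g inactive=7d;

server {
    listen 443 ssl http2 reuseport;

    # Compression
    brotli on;
    brotli_comp_level 5;
    gzip on;

    # Static: one year, immutable
    location /assets/ {
        add_header Cache-Control "public, max-age=31536000, immutable" always;
        try_files $uri =404;
    }

    # Semi-dynamic: s-maxage + SWR
    location /catalog/ {
        proxy_cache EDGE;
        # Assumption: cache 200 responses for 10 min, matching s-maxage below
        proxy_cache_valid 200 10m;
        # Serve stale content from this cache while refreshing in the background
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
        proxy_cache_background_update on;
        add_header Cache-Control "public, s-maxage=600, max-age=120, stale-while-revalidate=900, stale-if-error=86400" always;
        # Safe only if catalog responses carry no per-user cookies
        proxy_ignore_headers Set-Cookie;
        proxy_pass https://origin_catalog;
    }
}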

iGaming/fintech specific

Peaks (matches/tournaments): raise 'minReplicas' in advance and warm up CDN/TLS, but keep headroom targeted: only on hot paths (catalogs, lobbies, matches), with the rest in degraded mode.
Payments/PSP: cache reference data (BIN, limits); idempotency reduces the cost of retries; use a separate egress pool for provider whitelists.
Anti-fraud/bots: "gray" routes and cheap challenges at the edge instead of an expensive deep check on every request.
Live content/providers: cache at the edge and limit update frequency; renegotiate CDN contracts ahead of large events.

Summary

Cost optimization is not a one-off cleanup but a continuous FinOps process: measure value ($/unit), automate cost-saving mechanisms (cache/TTL/sampling), use discounts and the right resource classes, keep elasticity tied to SLOs, and do not complicate the architecture where it does not pay off. This lowers TCO while preserving product speed and platform stability.