Latency and API Response Optimization

1) What is "latency" and why it matters

Latency is the total request delay: network (DNS + TCP + TLS + RTT), balancer/gateway, application, DB/caches/queues, and external integrations. For the business, P95/P99 matter more than the average: it is the "tail" that hurts UX, conversion rate (CR) and SLOs.

Basic SLIs:

'SLI_latency_P95 = P95(response_time)' over 5/30-minute windows
'SLI_latency_P99 = P99(response_time)'
'SLI_queue_time = P95(worker_queue_time)'
'SLI_ext_call_P95 = P95(external_provider_latency)'
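
As a rough illustration of why the SLIs above are percentiles rather than averages, here is a minimal sketch that computes P95/P99 with the nearest-rank method. The sample window is invented for the example, and a production system would normally take these values from histogram metrics rather than raw samples.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_slis(response_times_ms):
    """Compute the two latency SLIs for one measurement window."""
    return {
        "SLI_latency_P95": percentile(response_times_ms, 95),
        "SLI_latency_P99": percentile(response_times_ms, 99),
    }


# Hypothetical window: 98 fast requests and 2 very slow ones (milliseconds).
window = [20] * 98 + [900, 1500]
slis = latency_slis(window)
mean = sum(window) / len(window)
print(slis, mean)
```

In this invented window the mean stays low while P99 exposes the slow requests, which is the effect the tail metrics are meant to catch.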

2) Map of latency sources (and where to dig)

1. Network and protocols: DNS, TCP handshakes, TLS, head-of-line blocking (HTTP/1.1), packet loss, BBR/ECN.
2. Gateway/balancer: slow health checks, incorrect timeouts, hot nodes.
3. Application: locks, GC/stop-the-world pauses, synchronous I/O, contention.
4. Storage: slow database queries, missing indexes, cold pages.
5. External services: PSP/KYC, third-party APIs (limited SLAs).
6. Queues and background jobs: overloaded workers, no backpressure.
7. Cache/edge: cache misses, poorly chosen TTLs, incorrect invalidation.

3) Network and protocols

3.1 DNS/TCP/TLS

DNS prefetch/preconnect on the frontend; long-lived IPs toward PSPs.
Keep-Alive/connection pooling in clients; on the server, aggregate connections.
TLS: resumption/session tickets, modern cipher suites; avoid 0-RTT for non-idempotent (unsafe) operations, since early data can be replayed.
TCP: disable Nagle ('TCP_NODELAY') for chats/small packets; tune the initial congestion window; enable BBR where appropriate.
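
The 'TCP_NODELAY' point can be sketched with the standard socket API. This is only an illustration of the socket options; the helper name is hypothetical, and most HTTP client libraries set or expose these options themselves.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled and keep-alive enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Send small writes immediately instead of coalescing them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Let the OS probe idle connections so pooled connections fail fast when dead.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


sock = make_low_latency_socket()
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
print(nodelay_enabled)
```

Disabling Nagle trades a little bandwidth efficiency for lower per-message delay, so it fits chatty, small-payload traffic better than bulk transfers.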

3.2 HTTP/2 and HTTP/3

HTTP/2: multiplexing reduces HTTP/1.1 head-of-line blocking; monitor stream priorities.
HTTP/3/QUIC: less impact from packet loss and RTT; useful on mobile/international networks.
Header compression: HPACK/QPACK, but keep header sizes reasonable.

3.3 Balancing/routing

Locality-aware (zone-aware) routing; EWMA/least-request against hot instances.
Sticky sessions: only if there is state; otherwise stateless + shared cache/sessions.
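
One way to picture EWMA/least-request balancing is the "power of two choices" sketch below: sample two backends and pick the one with the lower cost, where cost combines smoothed latency and in-flight requests. The class, the cost formula and the latency figures are assumptions made for the example, not the algorithm of any particular proxy.

```python
import random


class EwmaBalancer:
    """Pick the cheaper of two random backends, scored by EWMA latency and load."""

    def __init__(self, backends, alpha=0.3, rng=None):
        self.alpha = alpha
        self.rng = rng or random.Random()
        self.ewma_ms = {b: 0.0 for b in backends}  # 0.0 means "not measured yet"
        self.in_flight = {b: 0 for b in backends}

    def _cost(self, backend):
        # Slow or busy backends get a higher cost; unmeasured ones cost 0 so they get probed.
        return self.ewma_ms[backend] * (self.in_flight[backend] + 1)

    def pick(self):
        a, b = self.rng.sample(list(self.ewma_ms), 2)
        chosen = a if self._cost(a) <= self._cost(b) else b
        self.in_flight[chosen] += 1
        return chosen

    def report(self, backend, latency_ms):
        """Record a finished request and update the smoothed latency."""
        self.in_flight[backend] -= 1
        prev = self.ewma_ms[backend]
        if prev == 0.0:
            self.ewma_ms[backend] = latency_ms
        else:
            self.ewma_ms[backend] = self.alpha * latency_ms + (1 - self.alpha) * prev


# Hypothetical fleet: two healthy instances and one "hot" (slow) instance.
latency_ms = {"a": 10.0, "b": 12.0, "hot": 200.0}
balancer = EwmaBalancer(latency_ms, rng=random.Random(1))
counts = {name: 0 for name in latency_ms}
for _ in range(1000):
    backend = balancer.pick()
    counts[backend] += 1
    balancer.report(backend, latency_ms[backend])
print(counts)
```

In this simplified model the slow instance is probed once and then avoided; a real balancer would also decay old measurements so that a recovered instance gets traffic again.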

4) Formats, payload, compression

Compression: Brotli (text), Gzip as a fallback; binary formats: Protobuf/Avro for gRPC/internal APIs.
Reduce payload: selective fields ('fields=...'), pagination, conditional GET (ETag/If-None-Match), delta responses.
GraphQL: persisted queries, no "fat" fragments, limits on depth and complexity.
Avoid N + 1: joins/prefetching, batch endpoints for aggregates.
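
Field selection and conditional GET can be sketched together in a framework-free handler. The function names and the sample resource are hypothetical, and real 'If-None-Match' handling also covers weak validators and lists of tags, which this sketch ignores.

```python
import hashlib
import json


def select_fields(resource, fields):
    """Keep only the comma-separated fields requested via 'fields=...'."""
    if not fields:
        return resource
    wanted = set(fields.split(","))
    return {k: v for k, v in resource.items() if k in wanted}


def make_etag(body: bytes) -> str:
    """Strong ETag derived from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(resource, fields=None, if_none_match=None):
    """Return (status, headers, body); 304 with an empty body when the client copy is current."""
    payload = select_fields(resource, fields)
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Type": "application/json"}, body


# Hypothetical resource with one heavy field the client does not need.
player = {"id": 7, "name": "Ann", "balance": 120, "history": list(range(50))}
status1, headers1, body1 = handle_get(player, fields="id,balance")
status2, headers2, body2 = handle_get(
    player, fields="id,balance", if_none_match=headers1["ETag"]
)
print(status1, body1, status2, body2)
```

The first call returns a small body with an ETag; the second call, which presents that ETag, gets a 304 and no body.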

5) Timeouts, retries, idempotency

Timeout chain: each hop gets a shorter timeout than its caller (client > gateway > app > storage/external call), so inner calls fail first and the caller still has time to react.
Retries with backoff + jitter, only for transient errors; set retry budgets.
Idempotency: request key/token + stored result; retries must not duplicate operations (especially financial ones).
Circuit breaker: opens on degradation; hedged/backup requests for tails (send a duplicate once the P95 delay has passed).
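
The retry and idempotency rules above can be sketched as follows. All names here ('TransientError', 'RetryBudget', 'IdempotentStore', the 'deposit-42' key) are hypothetical, and the in-memory store stands in for what would normally be a durable table with expiry.

```python
import random
import time


class TransientError(Exception):
    """Temporary failure (timeout, 503) that is safe to retry."""


class RetryBudget:
    """Allow retries only while they stay under a fraction of all requests."""

    def __init__(self, ratio=0.2, min_retries=3):
        self.ratio, self.min_retries = ratio, min_retries
        self.requests = 0
        self.retries = 0

    def try_spend(self):
        if self.retries < max(self.min_retries, self.ratio * self.requests):
            self.retries += 1
            return True
        return False


def call_with_retries(op, budget, max_attempts=3, base=0.05, cap=0.2,
                      sleep=time.sleep, rng=random.random):
    """Retry transient failures with capped exponential backoff and full jitter."""
    budget.requests += 1
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            out_of_attempts = attempt == max_attempts - 1
            if out_of_attempts or not budget.try_spend():
                raise
            sleep(rng() * min(cap, base * 2 ** attempt))


class IdempotentStore:
    """Server side: run an operation once per key and replay the stored result."""

    def __init__(self):
        self.results = {}

    def execute(self, key, operation):
        if key not in self.results:
            self.results[key] = operation()
        return self.results[key]


# Demo: the deposit is applied, but the first response is "lost" on the network.
balance = {"value": 0}
attempts = {"n": 0}
store = IdempotentStore()


def apply_deposit():
    balance["value"] += 100
    return {"status": "ok", "balance": balance["value"]}


def flaky_deposit():
    attempts["n"] += 1
    result = store.execute("deposit-42", apply_deposit)
    if attempts["n"] == 1:
        raise TransientError("response lost")
    return result


result = call_with_retries(flaky_deposit, RetryBudget(), sleep=lambda s: None)
print(result, attempts["n"])
```

The client retries once, yet the balance is credited only once because the second attempt replays the stored result for the same key.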

6) Queues, asynchrony and backpressure

Do not block the synchronous path: move heavy operations (KYC scans, reporting) to the background.
Backpressure: limit consumption from the queue, cap concurrency.
Batching/coalescing: combine small operations (for example, updating balances with aggregation).
Outbox/Inbox: guaranteed delivery of events in case of failures.
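
Backpressure with capped concurrency can be sketched with a bounded 'asyncio.Queue': the producer blocks when the queue is full, and a fixed number of workers drain it. The function and job names are illustrative only.

```python
import asyncio


async def run_pipeline(jobs, handler, queue_size=10, concurrency=3):
    """Process jobs through a bounded queue with a fixed number of workers."""
    queue = asyncio.Queue(maxsize=queue_size)
    results = []

    async def worker():
        while True:
            job = await queue.get()
            try:
                results.append(await handler(job))
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    for job in jobs:
        # put() waits while the queue is full: this is the backpressure on the producer.
        await queue.put(job)
    await queue.join()  # wait until every queued job has been handled
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


# Demo handler that records how many jobs run at the same time.
stats = {"in_flight": 0, "peak": 0}


async def slow_job(n):
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.001)
    stats["in_flight"] -= 1
    return n * 2


results = asyncio.run(run_pipeline(range(20), slow_job, queue_size=5, concurrency=3))
print(sorted(results), stats["peak"])
```

The peak number of concurrent jobs never exceeds the worker count, regardless of how fast jobs are submitted.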

7) Application: runtimes and pools

Connection pools for databases/caches/HTTP; cap them so as not to "choke" the backend.
JVM: profile GC (G1/ZGC), avoid large allocations; .NET: ThreadPool/async; Node.js: do not block the event loop, offload CPU-heavy work.
Python: asynchronous drivers (asyncpg/httpx), uvloop; CPU tasks via a worker pool.
Warm-up: warm up the JIT/caches; keep "warm pools" of instances ready for peaks.

8) Databases and caches

Indexes and plans: regular 'EXPLAIN', autovacuum/analyze, scan limits.
Connection pooling (PgBouncer/multiplexing), short transactions.
Cache strategies: read-through, write-through/write-behind; TTL + event-driven invalidation.
Sharding/replicas: read from replicas; for "hot keys," use local caches (near-cache).
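
A read-through cache with TTL plus event-driven invalidation might look like the sketch below. The class, the injected clock and the 'player:7' key are assumptions for the example; a shared cache such as Redis would replace the in-process dictionary in practice.

```python
import time


class ReadThroughCache:
    """On a miss, load from the backing store and keep the value for ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self.entries.get(key)
        if entry is not None and entry[0] > self.clock():
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self.loader(key)
        self.entries[key] = (self.clock() + self.ttl, value)
        return value

    def invalidate(self, key):
        """Call this from the write path or an event handler when the data changes."""
        self.entries.pop(key, None)


# Demo with a fake database and a controllable clock.
db = {"player:7": {"balance": 100}}
now = [0.0]
cache = ReadThroughCache(lambda k: dict(db[k]), ttl_seconds=30, clock=lambda: now[0])

first = cache.get("player:7")    # miss: loaded from db
second = cache.get("player:7")   # hit
db["player:7"]["balance"] = 250  # a write happens...
cache.invalidate("player:7")     # ...and the change event invalidates the key
third = cache.get("player:7")    # miss: fresh value
now[0] = 31.0
fourth = cache.get("player:7")   # miss: TTL expired
print(third, cache.hits, cache.misses)
```

The TTL bounds staleness when an invalidation event is lost, while the explicit invalidation keeps the common case fresh.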

9) Caching and edge

CDN/edge for static assets/reference data; cache API responses (where safe) using 'Cache-Control' and 'ETag'.
Stale-while-revalidate and stale-if-error for a stable UX.
Geo-distribution: the nearest PoP/region reduces RTT.
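
The stale-while-revalidate and stale-if-error behaviour can be modelled in application code roughly as below. This is a simplified sketch with hypothetical names: refreshes are queued and run explicitly instead of in a real background task, and the stale-if-error branch has no upper bound on staleness, which a real policy would add.

```python
import time


class SwrCache:
    """Serve fresh values directly, stale values immediately while queuing a refresh."""

    def __init__(self, loader, max_age, stale_window, clock=time.monotonic):
        self.loader = loader
        self.max_age = max_age
        self.stale_window = stale_window
        self.clock = clock
        self.entries = {}  # key -> (stored_at, value)
        self.pending = []  # keys waiting for a background refresh

    def _load(self, key):
        value = self.loader(key)
        self.entries[key] = (self.clock(), value)
        return value

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return self._load(key), "miss"
        age = self.clock() - entry[0]
        if age <= self.max_age:
            return entry[1], "fresh"
        if age <= self.max_age + self.stale_window:
            if key not in self.pending:
                self.pending.append(key)  # revalidate later, off the request path
            return entry[1], "stale"
        try:
            return self._load(key), "miss"
        except Exception:
            return entry[1], "stale-if-error"  # origin failed: prefer old data to an error

    def run_pending(self):
        """Stand-in for the background refresher."""
        while self.pending:
            self._load(self.pending.pop())


# Demo with a versioned origin and a controllable clock.
version = {"n": 1}
now = [0.0]
cache = SwrCache(lambda k: f"{k}-v{version['n']}", max_age=10, stale_window=60,
                 clock=lambda: now[0])
r1 = cache.get("catalog")
version["n"] = 2
now[0] = 5.0
r2 = cache.get("catalog")
now[0] = 20.0
r3 = cache.get("catalog")
cache.run_pending()
r4 = cache.get("catalog")
print(r1, r2, r3, r4)
```

The request at t=20 is answered from the stale copy without waiting for the origin, and the next request sees the refreshed value.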

10) Architectural patterns vs. P99 tails

Hedged requests: duplicate a slow request to another instance after a threshold.
Request collapsing: one "leading" request goes to the database, the rest wait for its result (avoids storms).
Prioritization: VIP/critical operations get a dedicated pool/priority.
Graceful degradation: trim minor fields/widgets when overloaded.
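
A hedged request can be sketched with 'asyncio': send to the primary, and if it has not answered within the hedge delay (for instance, the observed P95), send a duplicate to a second replica and take whichever finishes first. The replica names and delays are invented, and this simplified version propagates the first completed outcome even if it is an error. Hedging is only appropriate for idempotent requests.

```python
import asyncio


async def hedged_call(make_request, replicas, hedge_after):
    """Return the first result from the primary or, after hedge_after seconds, a backup."""
    primary = asyncio.create_task(make_request(replicas[0]))
    done, _ = await asyncio.wait({primary}, timeout=hedge_after)
    if done:
        return primary.result()  # primary was fast enough: no duplicate sent

    backup = asyncio.create_task(make_request(replicas[1]))
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # drop the loser to free the upstream
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


async def fake_request(replica):
    """Simulated upstream: one replica is stuck in the latency tail."""
    delay = {"slow": 0.5, "fast": 0.01}[replica]
    await asyncio.sleep(delay)
    return f"response from {replica}"


async def demo():
    loop = asyncio.get_running_loop()
    start = loop.time()
    result = await hedged_call(fake_request, ["slow", "fast"], hedge_after=0.05)
    return result, loop.time() - start


result, elapsed = asyncio.run(demo())
print(result, round(elapsed, 3))
```

Because a duplicate is sent only after the threshold, the extra load stays limited to the slowest share of requests.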

11) Configs (approximate)

11.1 NGINX (timeouts/compression)

nginx
proxy_connect_timeout 1s;
proxy_send_timeout    2s;
proxy_read_timeout    2s;
send_timeout          2s;

gzip on;
gzip_types application/json text/plain text/css application/javascript;

11.2 Envoy (hedge + retry budget)

yaml
# Route-level fragment (approximate); durations use the "0.3s" form.
retry_policy:
  retry_on: 5xx,reset,connect-failure
  num_retries: 2
  per_try_timeout: 0.3s
  retry_back_off:
    base_interval: 0.05s
    max_interval: 0.2s
  retry_priority:
    name: envoy.retry_priorities.previous_priorities
hedge_policy:
  hedge_on_per_try_timeout: true
  initial_requests: 1
  additional_request_chance: { numerator: 20, denominator: HUNDRED }  # 20%

11.3 gRPC (client)

json
{
  "methodConfig": [{
    "name": [{"service": "payments.Service"}],
    "timeout": "0.8s",
    "retryPolicy": {
      "maxAttempts": 3,
      "initialBackoff": "0.05s",
      "maxBackoff": "0.2s",
      "backoffMultiplier": 2.0,
      "retryableStatusCodes": ["UNAVAILABLE", "DEADLINE_EXCEEDED"]
    }
  }]
}

12) Observability: Measure correctly

RED/USE metrics + OTel traces: propagate 'trace_id' through gateway → service → database → external APIs.
Separate labels: 'api_version', 'region', 'partner', 'endpoint'.
Dashboards: P50/P95/P99, queue time, error mix, retry rate, cache hit rate.
Synthetics from target countries/ASNs (TR/BR/EU) and along critical paths (registration → deposit, payout).

SLO example:

Core API: 'P95 ≤ 250ms', 'P99 ≤ 500ms' (30 days)
PSP webhook processing: 'P99 ≤ 60s' including retries
Catalog freshness: 'P95 lag ≤ 30s'

13) FinOps and latency

Milliseconds are worth money: estimate the gain in CR/ARPPU per millisecond saved ($/ms).
Right-sizing: faster does not always mean more expensive; well-designed caching and formats are both cheaper and faster.
Egress/edge: a CDN reduces RTT and the cost of outgoing traffic from the region.

14) Optimization checklist (step by step)

1. Set SLO and measure tails (P95/P99) by endpoints/regions/partners.
2. Turn on HTTP/2/3, TLS resumption, long-lived connections.
3. Compress and slim down responses: Brotli/Gzip, fields on demand, pagination, ETag.
4. Set timeouts/retries/breakers; add idempotency.
5. Cache/edge: hit-rate and correct TTL; stale modes.
6. DB: indexes, plans, pools, replicas; eliminate N + 1.
7. Move heavy work to async: queues, batching, backpressure.
8. Hedge/collapse/priority for critical paths.
9. Warm-up and predictive scaling for peaks (tournaments/matches).
10. Synthetics and alerts on P99 and queue time; regular perf reviews.

15) Anti-patterns

One global timeout "for everything" and uncontrolled retries (a self-inflicted DDoS).
Sticky sessions without need → hot nodes.
Large JSON payloads without compression or field filters.
Synchronous calls to slow external APIs in the "hot path."
Missing indexes/limits in the database; N + 1 in the ORM.
No cache/edge or ETag; always returning full responses.
Mixing business and technical errors into one "retryable" bucket.

16) iGaming/fintech context: practical notes

Registration → deposit (CR): route priority, dedicated pool, 'P99 ≤ 500ms'; on degradation, disable UI "decorations."
PSP integrations: concurrency limits, retries on transient error codes, warm connections, regional egress IPs.
VIP operations: guaranteed pool/priority, bypassing public queues.
Tournaments/events: predictive scaling, cache warm-up, prefetch.
Reporting: async with an SLA on freshness; it must not block the production path.

Summary

Latency optimization is a balancing discipline: network (HTTP/2/3, TLS), protocols and caching, timeouts/retries with idempotency, DB/caches, asynchronous patterns and P95/P99 observability. By focusing on the tails and eliminating bottlenecks, you stabilize response times, improve conversion and lower the cost per millisecond where it really affects the business.
