GH GambleHub

→ Latency Technologies and Infrastructure and API Response Optimization

Latency and API Response Optimization

1) What is "latency" and why it matters

Latency - total request delay: network (DNS + TCP + TLS + RTT), balancer/gateway, application, DB/caches/queues, external integrations. P95/P99 are critical for business, not average: it is the "tail" that destroys UX, CR and SLO.

Basic SLIs:
  • 'SLI _ latency _ P95 = P95 (response _ time) 'in 5/30 minutes
  • 'SLI _ latency _ P99 = P99 (response _ time) '
  • 'SLI _ queue _ time = P95 (worker _ queue _ time) '
  • 'SLI _ ext _ call _ P95 = P95 (external _ provider _ latency) '

2) Delay source map (and where to dig)

1. Network and protocols: DNS, TCP handshakes, TLS, head-of-line (HTTP/1. 1), packet loss, BBR/ECN.
2. Gateway/balancer: slow health-check, invalid timeouts, hot bottoms.
3. Application: locks, GC/stop-the-world, synchronous I/O, contention.
4. Repositories: slow database queries, no indexes, cold pages.
5. External services: PSP/KYC, third-party APIs (narrow SLAs).
6. Queues and background jabs: overloaded workers, no backpressure.
7. Cache/edge: cache misses, weak TTL, invalid disability.

3) Network and protocols

3. 1 DNS/TCP/TLS

DNS prefetch/preconnect at the front, long-lived IP to PSP.
Keep-Alive/connection pooling in clients; on the server - aggregate connections.
TLS: resumption/Session Tickets, a modern cipher package; avoid 0-RTT for unsafe idempotent operations.
TCP: disable Nagle ('TCP _ NODELAY') for chats/small packets; tune'initial window ', enable BBR where appropriate.

3. 2 HTTP/2 и HTTP/3

HTTP/2: multiplexing reduces HTTP/1 HOL locks. 1; monitor thread priorities.
HTTP/3/QUIC: lower impact of losses/RTT; useful on a mobile/international network.
Header compression: HPACK/QPACK, but keep a reasonable header size.

3. 3 Balancing/routing

Locality-aware (zoning), EWMA/least-request versus hot instances.
Sticking sessions - only if there is a state; otherwise stateless + shared cache/sessions.

4) Formats, payload, compression

Squeeze: Brotli (text), Gzip as fallback; binary formats: Protobuf/Avro for gRPC/internal APIs.
Reduce payload: selective fields ('fields =...'), pagination, conditional GET (ETag/If-None-Match), delta responses.
GraphQL: persisted queries, prohibition of "fat" fragments, limits of depth and complexity.
Avoid N + 1: Joyns/preposition, butch endpoints for aggregates.

5) Timeouts, retreats, idempotence

Chain timeouts client <gateway <appa <storage/external call.
Retrai with backoff + jitter, only for temporary errors; expose budgets on retrayes.
Idempotence: query key/token + save result; retreats should not duplicate operations (especially finances).
Circuit Breaker: Open when degraded; hedged/backup requests for tails (send duplicate via P95).

6) Queues, asynchrony and backpressure

Do not block the synchronous path: heavy operations (KYC scans, reporting) - in the background.
Backpressure: limit consumption from the queue, fix concurrency.
Batching/coalescing - Combine small transactions (for example, updating balances with aggregation).
Outbox/Inbox: guaranteed delivery of events in case of failures.

7) Application: runtimes and pools

Pools of connections to databases/caches/HTTP; limit them so as not to "strangle" the backend.
JVM: profile GC (G1/ZGC), avoid large allocations; .NET - ThreadPool/async; Node. js - do not block the event loop, take out the CPU-heavy.
Python: asynchronous drivers (asyncpg/httpx), uvloop; CPU tasks via worker-pool.
Warm-up: warm up JIT/caches, "warm pools" instances to peaks.

8) Databases and caches

Indexes and plans: regular 'EXPLAIN', auto-vacuum/analysis, scan limit.
Connection pooling (PgBouncer/Multiplexing), short transactions.
Cache strategies: read-through, write-through/write-behind; TTL + disability by event.
Sharding/replicas: reading from slaves, "hot keys" - local caches (near-cache).

9) Caching and edge

CDN/edge for static/directories, cache API responses (if safe) for 'Cache-Control', 'ETag'.
Stale-while-revalidate and stale-if-error for UX-stability.
Geo-allocation: nearest RRR/region reduces RTT.

10) Architectural patterns vs. P99 tails

Hedged requests - Duplicate a slow request to another instance after the threshold.
Request collapsing: one "leading" request to the database, the rest are waiting for the result (avoids storms).
Priority: VIP/critical operations - dedicated pool/priority.
Graceful degradation: Trim minor fields/widgets when overloaded.

11) Configs (approximately)

11. 1 NGINX (Timeouts/Compression)

nginx proxy_connect_timeout  1s;
proxy_send_timeout   2s;
proxy_read_timeout   2s;
send_timeout      2s;

gzip on;
gzip_types application/json text/plain text/css application/javascript;

11. 2 Envoy (hedge + retry budget)

yaml
RetryPolicy:
retry_on: 5xx,reset,connect-failure num_retries: 2 per_try_timeout: 300ms retry_back_off: { base_interval: 50ms, max_interval: 200ms }
retry_priority:
name: envoy. retry_priorities. previous_priorities
HedgePolicy:
hedge_on_per_try_timeout: true initial_requests: 1 additional_request_chance: 0. 2

11. 3 gRPC (client)

json
{
"methodConfig": [{
"name": [{"service": "payments. Service"}],
"timeout": "0. 8s",
"retryPolicy": {
"maxAttempts": 3,
"initialBackoff": "0. 05s",
"maxBackoff": "0. 2s",
"backoffMultiplier": 2. 0,
"retryableStatusCodes": ["UNAVAILABLE","DEADLINE_EXCEEDED"]
}
}]
}

12) Observability: Measure correctly

RED/USE metrics + OTel trails: 'trace _ id' through gateway-service-database-external APIs.
Individual labels: 'api _ version', 'region', 'partner', 'endpoint'.
Dashboards: P50/P95/P99, queue time, error mix, retry rate, cache hit.
Synthetics from target countries/ASN (TR/BR/EU) and by critical paths (reg→depozit, payout).

SLO example:
  • Core API: 'P95 ≤ 250ms', 'P99 ≤ 500ms' (30 days)
  • PSP webhook processing: 'P99 ≤ 60s' with retras
  • Freshness catalogue: 'P95 lag ≤ 30s'

13) FinOps и latency

Milliseconds are worth the money: estimate the $/ms winnings in CR/ARPPU.
Right-sizing: always faster ≠ more expensive; competent cache/formats are cheaper and faster.
Egress/edge: CDN reduces RTT and cost of outgoing traffic from the region.

14) Optimization checklist (step by step)

1. Set SLO and measure tails (P95/P99) by endpoints/regions/partners.
2. Turn on HTTP/2/3, TLS resumption, long-lived connections.
3. Squeeze and lose weight answers: Brotli/Gzip, fields on demand, pagination, ETag.
4. Set timeouts/retreats/breakers; add idempotency.
5. Cache/edge: hit-rate and correct TTL; stale modes.
6. DB: indexes, plans, pools, replicas; eliminate N + 1.
7. Asynchronize heavy: queues, butching, backpressure.
8. Hedge/collapse/priority for critical paths.
9. Warm-up and predictive scaling to picks (tournaments/matches).
10. Synthetics and alerts on P99 and queue time; regular perf reviews.

15) Anti-patterns

One global timeout "for all" and uncontrolled retrays (DDOS itself).
Sticking sessions without the need to → hot nodes.
Large JSONs without compression and field filters.

Synchronous calls to slow external APIs in the "hot path."

Absence of indexes/limits in the database; N + 1 in ORM.
No cache/edge and ETag; persistent complete responses.
A mix of business and technical errors into one "retractable" basket.

16) iGaming context/fintech: practical notes

Reg→depozit (CR): route priority, individual pool, 'P99 ≤ 500ms'; degradation - disable UI "decorations."

PSP integrations: concarrency limits, retrays by time codes, warm connections, regional egress-IP.
VIP operations: guaranteed pool/priority, bypassing public queues.
Tournaments/events: predictive scale, cache warm-up, prefetch.
Reporting: async and SLA on freshness, does not block the production path.

Total

Latency optimization is a discipline of balance: network (HTTP/2/3, TLS), protocols and cache, timeouts/retrays with idempotency, DB/caches, asynchronous patterns and P95/P99 observability. By focusing on the tails and eliminating the "narrow necks," you stabilize the response, improve conversion and lower the cost per millisecond - where it really affects the business.

Contact

Get in Touch

Reach out with any questions or support needs.We are always ready to help!

Telegram
@Gamble_GC
Start Integration

Email is required. Telegram or WhatsApp — optional.

Your Name optional
Email optional
Subject optional
Message optional
Telegram optional
@
If you include Telegram — we will reply there as well, in addition to Email.
WhatsApp optional
Format: +country code and number (e.g., +380XXXXXXXXX).

By clicking this button, you agree to data processing.