GambleHub

Connection pools and latency

1) Why pools are needed

Connections are expensive (TCP/TLS handshakes, authentication, warm-up). A pool lets you:
  • Reuse established connections (keep-alive) → lower TTFB.
  • Control concurrency and apply backpressure instead of an avalanche of retries.
  • Reduce p95/p99 tails through correct sizing and timeouts.

Key risks: wait queues in the pool, head-of-line blocking, contention for connections, and retry storms.

2) The math: how to size the pool

We use Little's law: `L = λ × W`. For a pool this means:
  • `λ` is the average request rate (RPS).
  • `W` is the average time a connection is busy per request (service time, including network latency and the remote service's work).
  • Minimum pool size: `N_min ≈ λ × W`.
  • Add headroom for variance and p99: 20–50%.
  • Example: 300 RPS, average hold-time 40 ms → `N_min = 300 × 0.04 = 12`. With 50% headroom → 18 connections.

If the tails are large, size critical paths by `W_p95` or `W_p99`; the pools grow accordingly.
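The sizing arithmetic above is easy to script; a small Python sketch (the function name is ours, for illustration):

```python
import math

def pool_size(rps: float, hold_time_s: float, headroom: float = 0.3) -> int:
    """Little's law: N_min = λ × W, plus headroom, rounded up."""
    n_min = rps * hold_time_s
    return math.ceil(n_min * (1.0 + headroom))

# The example from the text: 300 RPS, 40 ms hold-time, 50% headroom → 18.
print(pool_size(300, 0.040, headroom=0.5))  # → 18
```

For tail-driven sizing, pass `W_p95` or `W_p99` as `hold_time_s` instead of the mean.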

3) General design principles

1. Short data path: reuse connections (keep-alive, HTTP/2/3 multiplexing).
2. Bounded parallelism: better to reject quickly (429/503) than to overload the backend.
3. Timeouts over retries: set short timeouts and rare retries with jitter.
4. Keep client queues shorter than server queues (fail fast).
5. Backpressure: when the pool is full, return an immediate NACK/error/"come back later".
6. Isolate pools by target: DB, cache, and external PSPs each get their own limits.

4) HTTP/1.1 vs HTTP/2/3, keep-alive

HTTP/1.1: one request per connection at a time (in practice); you need a pool with multiple connections per host.
HTTP/2: stream multiplexing over one TCP connection; fewer connections, but TCP-level HOL blocking is possible under packet loss.
HTTP/3 (QUIC): independent streams over UDP; fewer HOL problems, faster first bytes.

Settings that help:
  • keep-alive timeout 30–90 s (by profile), a per-connection request limit (graceful recycling).
  • Pre-warming (preconnect) at worker start.
  • A cap on concurrent streams per HTTP/2 connection (e.g., 100–200).
NGINX (upstream keepalive):

```nginx
upstream backend {
    server app-1:8080;
    server app-2:8080;
    keepalive 512;
    keepalive_requests 1000;
    keepalive_timeout 60s;
}

# in the proxying location/server block:
proxy_http_version 1.1;
proxy_set_header Connection "";
```
Envoy (HTTP/2 pool):

```yaml
http2_protocol_options:
  max_concurrent_streams: 200
common_http_protocol_options:
  idle_timeout: 60s
  max_connection_duration: 3600s
```

5) DB Pools: PgBouncer, HikariCP, drivers

The goal is to limit concurrent transactions and keep connection holds short.

5.1 PgBouncer (PostgreSQL)

Modes: `session` / `transaction` / `statement`. For APIs, `transaction` is the usual choice.
Important parameters: `pool_size`, `min_pool_size`, `reserve_pool_size`, `server_idle_timeout`, `query_wait_timeout`.

```ini
[databases]
appdb = host=pg-primary port=5432 dbname=appdb

[pgbouncer]
pool_mode = transaction
max_client_conn = 5000
default_pool_size = 100
min_pool_size = 20
reserve_pool_size = 20
query_wait_timeout = 0.5
server_idle_timeout = 60
server_reset_query = DISCARD ALL
```

5.2 HikariCP (Java)

A small pool, fast acquisition, strict timeouts.

```properties
dataSourceClassName=org.postgresql.ds.PGSimpleDataSource
maximumPoolSize=30
minimumIdle=5
connectionTimeout=250
validationTimeout=200
idleTimeout=30000
maxLifetime=1800000
leakDetectionThreshold=5000
```
Rules:
  • `maximumPoolSize ≈ RPS × W × headroom`.
  • `connectionTimeout` in the hundreds of milliseconds, not seconds.
  • Enable leak detection (`leakDetectionThreshold`).

5.3 Go/Node/Python examples

Go http.Client (reuse + timeouts):

```go
tr := &http.Transport{
    MaxIdleConns:        512,
    MaxIdleConnsPerHost: 128,
    IdleConnTimeout:     60 * time.Second,
    TLSHandshakeTimeout: 2 * time.Second,
}
c := &http.Client{
    Transport: tr,
    Timeout:   2 * time.Second, // overall request timeout
}
```
Node.js keep-alive agent:

```js
const http = require('http');

const agent = new http.Agent({
  keepAlive: true,
  maxSockets: 200,
  maxFreeSockets: 64,
  timeout: 60000,
});
```
psycopg / SQLAlchemy (Python):

```python
from sqlalchemy import create_engine

engine = create_engine(
    url,
    pool_size=30,
    max_overflow=10,
    pool_recycle=1800,
    pool_pre_ping=True,
    pool_timeout=0.25,
)
```

6) Waiting queues and tail-latency

Tails occur when:
  • The pool is smaller than `λ × W` → the queue waiting for a connection grows.
  • Load is bursty and there is no buffer or limit.
  • Long requests hold a connection and create head-of-line blocking.
Countermeasures:
  • Separate pools by request type (fast/slow).
  • Implement a client-side timeout; on expiry, fail fast with a NACK.
  • Outlier detection and circuit breaking on routes (Envoy, HAProxy).
  • Quotas for "heavy" routes; a separate pool for reports/exports.
Envoy circuit breaker (example):

```yaml
circuit_breakers:
  thresholds:
    - priority: DEFAULT
      max_connections: 200
      max_pending_requests: 100
      max_requests: 1000
      max_retries: 2
```
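The fast-NACK idea also works on the client side: cap in-flight requests and reject immediately when the cap is hit, instead of queueing. A minimal Python sketch (class and exception names are ours):

```python
import threading

class PoolFullError(Exception):
    """Raised immediately instead of queueing when the limiter is saturated."""

class ConcurrencyLimiter:
    """Caps in-flight requests; fails fast instead of queueing (backpressure)."""

    def __init__(self, max_in_flight: int, max_wait_s: float = 0.0):
        self._sem = threading.BoundedSemaphore(max_in_flight)
        self._max_wait_s = max_wait_s  # 0 → reject immediately when full

    def __enter__(self):
        if self._max_wait_s > 0:
            ok = self._sem.acquire(timeout=self._max_wait_s)
        else:
            ok = self._sem.acquire(blocking=False)  # immediate NACK
        if not ok:
            raise PoolFullError("pool saturated; retry later")
        return self

    def __exit__(self, *exc):
        self._sem.release()

limiter = ConcurrencyLimiter(max_in_flight=2)
with limiter:
    pass  # call the backend here
```

The caller catches `PoolFullError` and maps it to 429/503, which keeps the client queue shorter than the server queue.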

7) Timeouts and retries (in the correct order)

1. Connect timeout (short: 50–250 ms inside the DC).
2. TLS handshake timeout (500–1000 ms outside the DC).
3. Request/read timeout (aligned with the route's SLO).
4. Retry: at most once, only for idempotent methods; backoff with jitter.
5. Retry budget: a global limit as a percentage of RPS (for example, ≤ 10%).
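Rule 4 can be sketched in a few lines of Python ("full jitter" backoff; the function names are ours, not a specific library):

```python
import random
import time

def backoff_with_jitter(attempt: int, base_s: float = 0.05, cap_s: float = 1.0) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0.0, min(cap_s, base_s * (2 ** attempt)))

def call_with_retry(fn, *, max_retries: int = 1, idempotent: bool = True):
    """At most one retry by default, only for idempotent calls, with jittered backoff."""
    attempts = max_retries + 1 if idempotent else 1
    last_exc = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # in real code, catch only retriable errors
            last_exc = exc
            if attempt + 1 < attempts:
                time.sleep(backoff_with_jitter(attempt))
    raise last_exc
```

A retry budget (rule 5) would sit above this: a shared counter that refuses further retries once retried requests exceed, say, 10% of RPS.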

8) Keep-alive, Nagle, protocols

Disable Nagle (TCP_NODELAY) for small-message RPCs.
Enable HTTP keep-alive wherever possible.
Watch TIME_WAIT; tune `tcp_tw_reuse`/`tcp_tw_recycle` only if you understand the consequences. Better to reuse connections than to tune the kernel.
TLS: use session resumption and ALPN.
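Disabling Nagle is a one-line socket option; a self-contained Python sketch (the loopback socket pair exists only to make the example runnable):

```python
import socket

# A loopback listener so the client has something to connect to.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())

# Disable Nagle: flush small writes immediately instead of coalescing them.
cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Enable TCP keep-alive probes on the same socket.
cli.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

assert cli.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
cli.close()
srv.close()
```

Most HTTP clients and gRPC channels set TCP_NODELAY by default; checking beats assuming.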

9) OS/Kernel tuning (with caution)

`net.core.somaxconn`, `net.ipv4.ip_local_port_range`, `net.ipv4.tcp_fin_timeout`.
Descriptors: `nofile` ≥ 64k per proxy process.
IRQ balancing, GRO/LRO: depends on the traffic profile.
Profile first; tuning without metrics is often harmful.

10) Observability: what to measure

Pool utilization: busy/total, p50/p95 wait for a connection.
In-flight requests and their hold-time (sliced by route).
Retry budget: the share of retried requests.
Connection churn: creates/closes per second.
TCP/TLS: SYN RTT, handshakes, session reuse.
For the DB: active connections, waiters, long transactions, locks.

Charts: "RPS vs pool wait", "hold-time distribution", "reuse ratio", "circuit trips".
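Pool wait is measured at the acquisition point; a toy queue-backed pool shows the idea (all names are illustrative, not a real driver API):

```python
import queue
import time

class TimedPool:
    """Toy connection pool that records how long callers wait for a connection."""

    def __init__(self, connections):
        self._q = queue.Queue()
        for conn in connections:
            self._q.put(conn)
        self.wait_times_s = []  # feed these samples into a p50/p95 histogram

    def acquire(self, timeout_s: float = 0.25):
        start = time.monotonic()
        conn = self._q.get(timeout=timeout_s)  # raises queue.Empty on timeout
        self.wait_times_s.append(time.monotonic() - start)
        return conn

    def release(self, conn):
        self._q.put(conn)

pool = TimedPool(["conn-1", "conn-2"])
c = pool.acquire()
pool.release(c)
```

Real pools expose the same signal differently (HikariCP's wait metrics, PgBouncer's `SHOW POOLS` wait columns); the point is to chart wait time separately from request time.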

11) Case recipes

11.1 API gateway → backend

HTTP/2 to backends, `max_concurrent_streams = 200`.
A pool of 20–40 connections per service per gateway node.
Timeouts: connect 100 ms, per-try 300–500 ms, overall 1–2 s, 1 retry with jitter.

11.2 PostgreSQL → service via PgBouncer

`pool_mode = transaction`, `default_pool_size` by the formula (RPS × W × 1.3).
On the client, `connectionTimeout` ≤ 250 ms; keep transactions short (< 100 ms).
Heavy reporting queries: a separate pool/replica.

11.3 gRPC internal

One channel (HTTP/2) per target host with a stream limit of 100–200.
Per-RPC deadlines from the route SLO; retry only idempotent calls.
Trace sampling for long RPCs, plus hold-time metrics.

12) Implementation checklist (0-30 days)

0-7 days

Measure `W` (hold-time) on key routes/clients.
Calculate `N_min = λ × W` and add 30–50% headroom.
Enable keep-alive and short connection timeouts.

8-20 days

Separate the pools (fast/slow/external).
Introduce circuit breakers and retry budgets.
Add dashboards: pool wait p95, reuse ratio, in-flight.

21-30 days

Load runs with bursts; a chaos test for a backend going down.

Tail optimization: isolate heavy routes, add local caches.
Document the formulas and limits in the runbooks.

13) Anti-patterns

Pool sized at random, with no headroom.
Large connection-wait timeouts → long tails instead of fast failures.
Many retries without jitter or idempotency → a retry storm.
One shared pool for all request types.
Long transactions holding a connection (DB) → starvation for everyone else.
Disabled keep-alive or too short an idle timeout → connection churn and TTFB growth.

14) Maturity metrics

Pool wait p95 in prod < 10% of the route's total p95.
Reuse ratio > 90% for internal HTTP; > 80% for external.
DB txn time p95 < 100–200 ms; share of long transactions < 1%.
Retry rate < 5% (and within budget); timeout errors stable and predictable.
Documented pool sizing for all critical clients.

15) Conclusion

Effective connection pooling is queue engineering plus timeout discipline. Measure `W`, size the pool as `λ × W` with headroom, enable keep-alive/HTTP/2+, separate the slow paths, keep timeouts short and retries minimal with jitter. Add "pool wait vs latency" observability and circuit breakers, and you get low TTFB, a controlled p99 tail, and resilience to surges without overheating the backends.
