Connection pools and latency
1) Why pools are needed
Connections are expensive (TCP/TLS handshakes, authentication, warm-up). A pool lets you:
- Reuse established connections (keep-alive) → lower TTFB.
- Control concurrency and apply backpressure instead of an avalanche of retries.
- Reduce p95/p99 tails through correct sizing and timeouts.
Key risks: waiting queues inside the pool, head-of-line blocking, contention for connections, and retry storms.
2) The math: how to size a pool
We use Little's law: `L = λ × W`. For a pool this means:
- `λ` is the average request rate (RPS).
- `W` is the average time a connection is held per request (service time, including network latency and the remote service's work).
- The minimum pool size is `N_min ≈ λ × W`.
- Add headroom for variance and p99 tails: 20-50%.
- Example: 300 RPS, average hold-time 40 ms → `N_min = 300 × 0.04 = 12`. With 50% headroom → 18 connections.
If tails are large, consider `W_p95` or `W_p99` for critical paths - the pool grows accordingly.
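The sizing rule above can be sketched as a small helper (a minimal sketch; the function name `pool_size` is my own, not from any library):

```python
import math

def pool_size(rps: float, hold_time_s: float, headroom: float = 0.3) -> int:
    """Little's law: N_min = lambda * W, plus headroom, rounded up."""
    n_min = rps * hold_time_s
    return math.ceil(n_min * (1 + headroom))

# The worked example from the text: 300 RPS, 40 ms hold-time, 50% headroom.
pool_size(300, 0.040, headroom=0.5)  # → 18
```

For critical paths, pass `W_p95` instead of the mean hold-time and the pool grows accordingly.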
3) General design principles
1. Short data path: reuse connections (keep-alive, HTTP/2/3 multiplexing).
2. Limit parallelism: it is better to refuse quickly (429/503) than to overload the backend.
3. Timeouts over retries: set short timeouts and rare retries with jitter.
4. Keep client queues shorter than server queues (fail fast).
5. Backpressure: when the pool is full - an immediate NACK/error/"try later" response.
6. Isolate pools per target: DB, cache, external PSP - each with its own limits.
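Principles 2, 4, and 5 together amount to a bounded limiter with a short client-side wait. A minimal sketch (the class name `FastFailPool` and the timings are illustrative, not from any library):

```python
import threading

class FastFailPool:
    """Bounded concurrency limiter: wait briefly for a slot, then fail fast."""
    def __init__(self, limit: int, max_wait_s: float = 0.05):
        self._sem = threading.BoundedSemaphore(limit)
        self._max_wait = max_wait_s

    def run(self, fn, *args):
        # Short client-side queue: if no slot frees up quickly, reject
        # instead of letting the wait queue (and tail latency) grow.
        if not self._sem.acquire(timeout=self._max_wait):
            raise RuntimeError("pool saturated: reply 429/503 upstream")
        try:
            return fn(*args)
        finally:
            self._sem.release()

pool = FastFailPool(limit=2, max_wait_s=0.01)
pool.run(lambda: "ok")  # a slot is free → runs immediately, returns "ok"
```

When the limiter rejects, the caller surfaces a 429/503 upstream rather than queueing, which is exactly the backpressure described above.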
4) HTTP/1.1 vs HTTP/2/3, keep-alive
HTTP/1.1: one request per connection at a time (in practice); you need a pool with multiple connections per host.
HTTP/2: stream multiplexing over one TCP connection; fewer connections, but TCP-level HOL blocking is possible under packet loss.
HTTP/3 (QUIC): independent streams over UDP - fewer HOL problems, faster first bytes.
- keep-alive timeout 30-90 s (by profile), limit on requests per connection (graceful recycle).
- Pre-warming (preconnect) at worker start.
- Limit the maximum concurrent streams per HTTP/2 connection (e.g. 100-200).
nginx (keep-alive to upstreams):

```nginx
upstream backend {
    server app-1:8080;
    server app-2:8080;
    keepalive 512;
    keepalive_requests 1000;
    keepalive_timeout 60s;
}

proxy_http_version 1.1;
proxy_set_header Connection "";
```
Envoy (HTTP/2 pool):

```yaml
http2_protocol_options:
  max_concurrent_streams: 200
common_http_protocol_options:
  idle_timeout: 60s
  max_connection_duration: 3600s
```
5) DB pools: PgBouncer, HikariCP, drivers
The goal is to limit concurrent transactions and keep connection hold times short.
5.1 PgBouncer (PostgreSQL)
Modes: `session`/`transaction`/`statement`. For APIs, `transaction` is the usual choice.
Key parameters: `pool_size`, `min_pool_size`, `reserve_pool_size`, `server_idle_timeout`, `query_wait_timeout`.
```ini
[databases]
appdb = host=pg-primary port=5432 dbname=appdb

[pgbouncer]
pool_mode = transaction
max_client_conn = 5000
default_pool_size = 100
min_pool_size = 20
reserve_pool_size = 20
query_wait_timeout = 0.5
server_idle_timeout = 60
server_reset_query = DISCARD ALL
```

Note: PgBouncer timeouts are expressed in seconds, so `query_wait_timeout = 0.5` corresponds to the 500 ms intent.
5.2 HikariCP (Java)
Small, fast pools with hard timeouts.

```properties
dataSourceClassName=org.postgresql.ds.PGSimpleDataSource
maximumPoolSize=30
minimumIdle=5
connectionTimeout=250
validationTimeout=200
idleTimeout=30000
maxLifetime=1800000
leakDetectionThreshold=5000
```
Rules:
- `maximumPoolSize ≈ RPS × W × headroom`.
- `connectionTimeout` in the hundreds of milliseconds, not seconds.
- Enable leak detection (`leakDetectionThreshold`).
5.3 Go/Node/Python - examples
Go `http.Client` (reuse + timeouts):

```go
tr := &http.Transport{
	MaxIdleConns:        512,
	MaxIdleConnsPerHost: 128,
	IdleConnTimeout:     60 * time.Second,
	TLSHandshakeTimeout: 2 * time.Second,
}
c := &http.Client{
	Transport: tr,
	Timeout:   2 * time.Second, // overall request deadline
}
```
Node.js keep-alive agent:

```js
const http = require('http');
const agent = new http.Agent({
  keepAlive: true,
  maxSockets: 200,
  maxFreeSockets: 64,
  timeout: 60000,
});
```
psycopg / SQLAlchemy (Python):

```python
engine = create_engine(
    url,
    pool_size=30,
    max_overflow=10,
    pool_recycle=1800,
    pool_pre_ping=True,
    pool_timeout=0.25,
)
```
6) Waiting queues and tail-latency
Tails occur when:
- The pool is smaller than `λ × W` → the connection-wait queue grows.
- The load is bursty and there is no buffer or limits.
- Long requests occupy connections and create head-of-line blocking.
Mitigations:
- Separate pools by request type (fast/slow).
- Implement a client-side wait timeout; when it expires - fast NACK.
- Outlier detection and circuit breaking on routes (Envoy, HAProxy).
- Quotas for "heavy" routes; a separate pool for reports/exports.
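The fast/slow separation can be sketched as two independent limiters chosen per route (the route prefixes and limits here are assumptions for illustration):

```python
import threading

# Isolate heavy routes (reports/exports) in their own small pool so they
# cannot starve the fast path of connections.
POOLS = {
    "fast": threading.BoundedSemaphore(30),
    "slow": threading.BoundedSemaphore(4),
}

def classify(path: str) -> str:
    # Hypothetical routing rule: anything under /reports or /export is "slow".
    return "slow" if path.startswith(("/reports", "/export")) else "fast"

def handle(path: str, fn):
    sem = POOLS[classify(path)]
    if not sem.acquire(timeout=0.05):  # fail fast instead of queueing
        raise RuntimeError("503: pool for this route class is full")
    try:
        return fn()
    finally:
        sem.release()
```

A slow report can now exhaust only its own 4 slots; the 30-slot fast pool keeps serving.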
Envoy circuit-breaker example:

```yaml
circuit_breakers:
  thresholds:
  - priority: DEFAULT
    max_connections: 200
    max_pending_requests: 100
    max_requests: 1000
    max_retries: 2
```
7) Timeouts and retries (correct order)
1. Connect timeout (short: 50-250 ms inside the DC).
2. TLS handshake timeout (500-1000 ms outside the DC).
3. Request/read timeout (close to the route's SLO).
4. Retry: at most once, only for idempotent methods; jitter + backoff.
5. Retry budget: a global limit as a percentage of RPS (for example, ≤ 10%).
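Points 4 and 5 together - a single jittered retry gated by a global retry budget - can be sketched as follows (the class and function names are illustrative):

```python
import random
import time

class RetryBudget:
    """Global cap: retries may not exceed `ratio` of total requests."""
    def __init__(self, ratio: float = 0.10):
        self.ratio, self.requests, self.retries = ratio, 0, 0

    def allow_retry(self) -> bool:
        return self.retries < self.ratio * max(self.requests, 1)

def call_with_retry(fn, budget: RetryBudget, base_delay: float = 0.05):
    budget.requests += 1
    try:
        return fn()
    except Exception:
        # At most one retry, and only if the global budget allows it.
        if not budget.allow_retry():
            raise
        budget.retries += 1
        time.sleep(random.uniform(0, base_delay))  # full jitter
        return fn()
```

When errors spike, the budget check turns mass retries back into plain failures, preventing a retry storm.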
8) Keep-alive, Nagle, protocols
Disable Nagle's algorithm (TCP_NODELAY) for RPCs with small messages.
Enable HTTP keep-alive wherever possible.
Watch TIME_WAIT - tune `tcp_tw_reuse`/`tcp_tw_recycle` only if you understand the consequences; better to reuse connections than to tune the kernel.
TLS: use session resumption and ALPN.
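Setting TCP_NODELAY from application code is a one-liner; a stdlib Python sketch:

```python
import socket

# Disabling Nagle's algorithm (TCP_NODELAY) avoids batching small writes,
# which otherwise adds latency to small request/response RPCs.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
assert s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
s.close()
```

Most RPC frameworks and HTTP clients set this for you; check before adding it yourself.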
9) OS/Kernel tuning (with caution)
`net.core.somaxconn`, `net.ipv4.ip_local_port_range`, `net.ipv4.tcp_fin_timeout`.
File descriptors: `nofile` ≥ 64k per proxy process.
IRQ balancing, GRO/LRO - according to the traffic profile.
Profile first; tuning without metrics is often harmful.
10) Observability: what to measure
Pool utilization: busy/total, p50/p95 pool wait time.
In-flight requests and their hold-time (sliced by route).
Retry budget: the share of retried requests.
Connection churn: creates/closes per second.
TCP/TLS: SYN RTT, handshakes, session reuse.
For the DB: active connections, waiting, long transactions, locks.
Dashboards: "RPS vs pool wait", "hold-time distribution", "reuse ratio", "circuit trips".
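A minimal in-process sketch of the pool-wait and hold-time metrics above (names are illustrative; in production these would be Prometheus histograms):

```python
def percentile(xs, q):
    """Nearest-rank style percentile; fine for dashboards, not for billing."""
    xs = sorted(xs)
    return xs[min(len(xs) - 1, int(q / 100 * (len(xs) - 1)))]

class PoolMetrics:
    """Record per-request pool-wait and connection hold-times,
    then report the percentiles the dashboards need."""
    def __init__(self):
        self.wait, self.hold = [], []

    def observe(self, wait_s, hold_s):
        self.wait.append(wait_s)
        self.hold.append(hold_s)

    def snapshot(self):
        return {
            "pool_wait_p95": percentile(self.wait, 95),
            "hold_time_p50": percentile(self.hold, 50),
            "hold_time_p95": percentile(self.hold, 95),
        }
```

The `hold_time_p95` output feeds straight back into the `λ × W` sizing formula from section 2.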
11) Case recipes
11.1 API gateway → backend
HTTP/2 to backends, `max_concurrent_streams = 200`.
A pool of 20-40 connections per service per gateway node.
Timeouts: connect 100 ms, per-try 300-500 ms, overall 1-2 s, 1 retry with jitter.
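The per-try vs overall timeout split can be sketched as a shrinking deadline budget (the helper `with_deadline` is illustrative; `fn` stands for one attempt that accepts a timeout):

```python
import time

def with_deadline(fn, overall_s=1.0, per_try_s=0.4, tries=2):
    """Enforce both a per-try timeout and an overall deadline:
    each attempt gets at most per_try_s, but never more than what is
    left of the overall budget."""
    deadline = time.monotonic() + overall_s
    last_err = None
    for _ in range(tries):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            return fn(timeout=min(per_try_s, remaining))
        except TimeoutError as e:
            last_err = e
    raise last_err or TimeoutError("overall deadline exceeded")
```

This keeps the single retry from ever pushing the request past the route's overall SLO.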
11.2 PostgreSQL → service via PgBouncer
`pool_mode = transaction`, `default_pool_size` by the formula (RPS × W × 1.3).
Client-side `connectionTimeout ≤ 250 ms`, short transactions (<100 ms).
Heavy reporting queries - a separate pool/replica.
11.3 gRPC internal
One channel (HTTP/2) per target host with a stream limit of 100-200.
Deadline per RPC based on the route's SLO; retry only idempotent calls.
Trace sampling for long RPCs and hold-time metrics.
12) Implementation checklist (0-30 days)
0-7 days
Measure `W` (hold-time) on key routes/clients.
Calculate `N_min = λ × W` and add 30-50% headroom.
Enable keep-alive and short connection timeouts.
8-20 days
Separate pools (fast/slow/external).
Introduce circuit breakers and retry budgets.
Add dashboards: pool wait p95, reuse ratio, in-flight.
21-30 days
Load runs with bursts; a chaos test of a backend failure.
Tail optimization: isolation of heavy routes, local caches.
Document the formulas and limits in the runbooks.
13) Anti-patterns
Pool size "at random" and no headroom.
Large connection waiting timeouts → long tails instead of fast failures.
Many retreats without jitter and idempotency → a storm.
One shared pool for all request types.
Long transactions keep the connection (DB) → starvation of the rest.
Disabled keep-alive or too small idle → churn limits and TTFB growth.
14) Maturity metrics
Pool wait p95 in prod is <10% of the route's total p95.
Reuse ratio >90% for internal HTTP; >80% for external.
DB transaction time p95 < 100-200 ms; share of long transactions <1%.
Retry rate <5% (and ≤ budget); timeout errors are stable and predictable.
Pool sizing documented for all critical clients.
15) Conclusion
Effective connection pooling is queue engineering plus timeout discipline. Measure `W`, size the pool as `λ × W` with headroom, enable keep-alive/HTTP2+, separate slow paths, keep timeouts short and retries minimal with jitter. Add "pool wait vs latency" observability and circuit breakers - and you get low TTFB, a controlled p99 tail, and surge resistance without overheating the backends.