GambleHub

Connection pools and latency

1) Why pools are needed

Connections are expensive (TCP/TLS handshakes, authentication, warm-up). A pool lets you:
  • Reuse established connections (keep-alive) → lower TTFB.
  • Control concurrency and apply backpressure instead of an avalanche of retries.
  • Reduce p95/p99 tails through correct sizing and timeouts.

Key risks: wait queues in the pool, head-of-line blocking, contention for connections, and retry storms.

2) The math: how to size the pool

We use Little's law: `L = λ × W`. For a pool this means:
  • `λ` is the average request rate (RPS).
  • `W` is the average time a connection is busy per request (service time, including network latency and the remote service's work).
  • Minimum pool size: `N_min ≈ λ × W`.
  • Add headroom for variance and p99: 20–50%.
  • Example: 300 RPS, average hold-time 40 ms → `N_min = 300 × 0.04 = 12`. With 50% headroom → 18 connections.

If the tails are large, size critical paths by `W_p95` or `W_p99`; the pools grow accordingly.
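The sizing arithmetic above is easy to script; a small Python sketch (the function name is ours, for illustration):

```python
import math

def pool_size(rps: float, hold_time_s: float, headroom: float = 0.3) -> int:
    """Little's law: N_min = λ × W, plus headroom, rounded up."""
    n_min = rps * hold_time_s
    return math.ceil(n_min * (1.0 + headroom))

# The example from the text: 300 RPS, 40 ms hold-time, 50% headroom → 18.
print(pool_size(300, 0.040, headroom=0.5))  # → 18
```

For tail-driven sizing, pass `W_p95` or `W_p99` as `hold_time_s` instead of the mean.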

3) General design principles

1. Short data path: reuse connections (keep-alive, HTTP/2/3 multiplexing).
2. Bounded parallelism: better to reject quickly (429/503) than to overload the backend.
3. Timeouts over retries: set short timeouts and rare retries with jitter.
4. Keep client queues shorter than server queues (fail fast).
5. Backpressure: when the pool is full, return an immediate NACK/error/"come back later".
6. Isolate pools by target: DB, cache, and external PSPs each get their own limits.

4) HTTP/1.1 vs HTTP/2/3, keep-alive

HTTP/1.1: one request per connection at a time (in practice); you need a pool with multiple connections per host.
HTTP/2: stream multiplexing over one TCP connection; fewer connections, but TCP-level HOL blocking is possible under packet loss.
HTTP/3 (QUIC): independent streams over UDP; fewer HOL problems, faster first bytes.

Settings that help:
  • keep-alive timeout 30–90 s (by profile), a per-connection request limit (graceful recycling).
  • Pre-warming (preconnect) at worker start.
  • A cap on concurrent streams per HTTP/2 connection (e.g., 100–200).
NGINX (upstream keepalive):

```nginx
upstream backend {
    server app-1:8080;
    server app-2:8080;
    keepalive 512;
    keepalive_requests 1000;
    keepalive_timeout 60s;
}

# in the proxying location/server block:
proxy_http_version 1.1;
proxy_set_header Connection "";
```
Envoy (HTTP/2 pool):

```yaml
http2_protocol_options:
  max_concurrent_streams: 200
common_http_protocol_options:
  idle_timeout: 60s
  max_connection_duration: 3600s
```

5) DB Pools: PgBouncer, HikariCP, drivers

The goal is to limit concurrent transactions and keep connection holds short.

5.1 PgBouncer (PostgreSQL)

Modes: `session` / `transaction` / `statement`. For APIs, `transaction` is the usual choice.
Important parameters: `pool_size`, `min_pool_size`, `reserve_pool_size`, `server_idle_timeout`, `query_wait_timeout`.

```ini
[databases]
appdb = host=pg-primary port=5432 dbname=appdb

[pgbouncer]
pool_mode = transaction
max_client_conn = 5000
default_pool_size = 100
min_pool_size = 20
reserve_pool_size = 20
query_wait_timeout = 0.5
server_idle_timeout = 60
server_reset_query = DISCARD ALL
```

5.2 HikariCP (Java)

A small pool, fast acquisition, strict timeouts.

```properties
dataSourceClassName=org.postgresql.ds.PGSimpleDataSource
maximumPoolSize=30
minimumIdle=5
connectionTimeout=250
validationTimeout=200
idleTimeout=30000
maxLifetime=1800000
leakDetectionThreshold=5000
```
Rules:
  • `maximumPoolSize ≈ RPS × W × headroom`.
  • `connectionTimeout` in the hundreds of milliseconds, not seconds.
  • Enable leak detection (`leakDetectionThreshold`).

5.3 Go/Node/Python examples

Go http.Client (reuse + timeouts):

```go
tr := &http.Transport{
    MaxIdleConns:        512,
    MaxIdleConnsPerHost: 128,
    IdleConnTimeout:     60 * time.Second,
    TLSHandshakeTimeout: 2 * time.Second,
}
c := &http.Client{
    Transport: tr,
    Timeout:   2 * time.Second, // overall request timeout
}
```
Node.js keep-alive agent:

```js
const http = require('http');

const agent = new http.Agent({
  keepAlive: true,
  maxSockets: 200,
  maxFreeSockets: 64,
  timeout: 60000,
});
```
psycopg / SQLAlchemy (Python):

```python
from sqlalchemy import create_engine

engine = create_engine(
    url,
    pool_size=30,
    max_overflow=10,
    pool_recycle=1800,
    pool_pre_ping=True,
    pool_timeout=0.25,
)
```

6) Waiting queues and tail-latency

Tails occur when:
  • The pool is smaller than `λ × W` → the queue waiting for a connection grows.
  • Load is bursty and there is no buffer or limit.
  • Long requests hold a connection and create head-of-line blocking.
Countermeasures:
  • Separate pools by request type (fast/slow).
  • Implement a client-side timeout; on expiry, fail fast with a NACK.
  • Outlier detection and circuit breaking on routes (Envoy, HAProxy).
  • Quotas for "heavy" routes; a separate pool for reports/exports.
Envoy circuit breaker (example):

```yaml
circuit_breakers:
  thresholds:
    - priority: DEFAULT
      max_connections: 200
      max_pending_requests: 100
      max_requests: 1000
      max_retries: 2
```
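The fast-NACK idea also works on the client side: cap in-flight requests and reject immediately when the cap is hit, instead of queueing. A minimal Python sketch (class and exception names are ours):

```python
import threading

class PoolFullError(Exception):
    """Raised immediately instead of queueing when the limiter is saturated."""

class ConcurrencyLimiter:
    """Caps in-flight requests; fails fast instead of queueing (backpressure)."""

    def __init__(self, max_in_flight: int, max_wait_s: float = 0.0):
        self._sem = threading.BoundedSemaphore(max_in_flight)
        self._max_wait_s = max_wait_s  # 0 → reject immediately when full

    def __enter__(self):
        if self._max_wait_s > 0:
            ok = self._sem.acquire(timeout=self._max_wait_s)
        else:
            ok = self._sem.acquire(blocking=False)  # immediate NACK
        if not ok:
            raise PoolFullError("pool saturated; retry later")
        return self

    def __exit__(self, *exc):
        self._sem.release()

limiter = ConcurrencyLimiter(max_in_flight=2)
with limiter:
    pass  # call the backend here
```

The caller catches `PoolFullError` and maps it to 429/503, which keeps the client queue shorter than the server queue.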

7) Timeouts and retries (in the correct order)

1. Connect timeout (short: 50–250 ms inside the DC).
2. TLS handshake timeout (500–1000 ms outside the DC).
3. Request/read timeout (aligned with the route's SLO).
4. Retry: at most once, only for idempotent methods; backoff with jitter.
5. Retry budget: a global limit as a percentage of RPS (for example, ≤ 10%).
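Rule 4 can be sketched in a few lines of Python ("full jitter" backoff; the function names are ours, not a specific library):

```python
import random
import time

def backoff_with_jitter(attempt: int, base_s: float = 0.05, cap_s: float = 1.0) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0.0, min(cap_s, base_s * (2 ** attempt)))

def call_with_retry(fn, *, max_retries: int = 1, idempotent: bool = True):
    """At most one retry by default, only for idempotent calls, with jittered backoff."""
    attempts = max_retries + 1 if idempotent else 1
    last_exc = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # in real code, catch only retriable errors
            last_exc = exc
            if attempt + 1 < attempts:
                time.sleep(backoff_with_jitter(attempt))
    raise last_exc
```

A retry budget (rule 5) would sit above this: a shared counter that refuses further retries once retried requests exceed, say, 10% of RPS.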

8) Keep-alive, Nagle, protocols

Disable Nagle (TCP_NODELAY) for small-message RPCs.
Enable HTTP keep-alive wherever possible.
Watch TIME_WAIT; tune `tcp_tw_reuse`/`tcp_tw_recycle` only if you understand the consequences. Better to reuse connections than to tune the kernel.
TLS: use session resumption and ALPN.
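Disabling Nagle is a one-line socket option; a self-contained Python sketch (the loopback socket pair exists only to make the example runnable):

```python
import socket

# A loopback listener so the client has something to connect to.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())

# Disable Nagle: flush small writes immediately instead of coalescing them.
cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Enable TCP keep-alive probes on the same socket.
cli.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

assert cli.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
cli.close()
srv.close()
```

Most HTTP clients and gRPC channels set TCP_NODELAY by default; checking beats assuming.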

9) OS/Kernel tuning (with caution)

`net.core.somaxconn`, `net.ipv4.ip_local_port_range`, `net.ipv4.tcp_fin_timeout`.
Descriptors: `nofile` ≥ 64k per proxy process.
IRQ balancing, GRO/LRO: depends on the traffic profile.
Profile first; tuning without metrics is often harmful.

10) Observability: what to measure

Pool utilization: busy/total, p50/p95 wait for a connection.
In-flight requests and their hold-time (sliced by route).
Retry budget: the share of retried requests.
Connection churn: creates/closes per second.
TCP/TLS: SYN RTT, handshakes, session reuse.
For the DB: active connections, waiters, long transactions, locks.

Charts: "RPS vs pool wait", "hold-time distribution", "reuse ratio", "circuit trips".
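Pool wait is measured at the acquisition point; a toy queue-backed pool shows the idea (all names are illustrative, not a real driver API):

```python
import queue
import time

class TimedPool:
    """Toy connection pool that records how long callers wait for a connection."""

    def __init__(self, connections):
        self._q = queue.Queue()
        for conn in connections:
            self._q.put(conn)
        self.wait_times_s = []  # feed these samples into a p50/p95 histogram

    def acquire(self, timeout_s: float = 0.25):
        start = time.monotonic()
        conn = self._q.get(timeout=timeout_s)  # raises queue.Empty on timeout
        self.wait_times_s.append(time.monotonic() - start)
        return conn

    def release(self, conn):
        self._q.put(conn)

pool = TimedPool(["conn-1", "conn-2"])
c = pool.acquire()
pool.release(c)
```

Real pools expose the same signal differently (HikariCP's wait metrics, PgBouncer's `SHOW POOLS` wait columns); the point is to chart wait time separately from request time.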

11) Case recipes

11.1 API gateway → backend

HTTP/2 to backends, `max_concurrent_streams = 200`.
A pool of 20–40 connections per service per gateway node.
Timeouts: connect 100 ms, per-try 300–500 ms, overall 1–2 s, 1 retry with jitter.

11.2 PostgreSQL → service via PgBouncer

`pool_mode = transaction`, `default_pool_size` by the formula (RPS × W × 1.3).
On the client, `connectionTimeout` ≤ 250 ms; keep transactions short (< 100 ms).
Heavy reporting queries: a separate pool/replica.

11.3 gRPC internal

One channel (HTTP/2) per target host with a stream limit of 100–200.
Per-RPC deadlines from the route SLO; retry only idempotent calls.
Trace sampling for long RPCs, plus hold-time metrics.

12) Implementation checklist (0-30 days)

0-7 days

Measure `W` (hold-time) on key routes/clients.
Calculate `N_min = λ × W` and add 30–50% headroom.
Enable keep-alive and short connection timeouts.

8-20 days

Separate the pools (fast/slow/external).
Introduce circuit breakers and retry budgets.
Add dashboards: pool wait p95, reuse ratio, in-flight.

21-30 days

Load runs with bursts; a chaos test for a backend going down.

Tail optimization: isolate heavy routes, add local caches.
Document the formulas and limits in the runbooks.

13) Anti-patterns

Pool sized at random, with no headroom.
Large connection-wait timeouts → long tails instead of fast failures.
Many retries without jitter or idempotency → a retry storm.
One shared pool for all request types.
Long transactions holding a connection (DB) → starvation for everyone else.
Disabled keep-alive or too short an idle timeout → connection churn and TTFB growth.

14) Maturity metrics

Pool wait p95 in prod < 10% of the route's total p95.
Reuse ratio > 90% for internal HTTP; > 80% for external.
DB txn time p95 < 100–200 ms; share of long transactions < 1%.
Retry rate < 5% (and within budget); timeout errors stable and predictable.
Documented pool sizing for all critical clients.

15) Conclusion

Effective connection pooling is queue engineering plus timeout discipline. Measure `W`, size the pool as `λ × W` with headroom, enable keep-alive/HTTP/2+, separate the slow paths, keep timeouts short and retries minimal with jitter. Add "pool wait vs latency" observability and circuit breakers, and you get low TTFB, a controlled p99 tail, and resilience to surges without overheating the backends.
