GambleHub

Serverless functions and cold start

1) What is cold start and why it occurs

Cold start is the extra latency incurred when a new isolated execution environment (sandbox/container/micro-VM) has to be created before an event can be processed. The typical pipeline:

1. Environment allocation (container/micro-VM, runtime loading).

2. VPC/ENI priming, secrets, files, configuration.

3. Code initialization (import of modules, connection to the database, loading of models).

4. Handler execution.

Warm start (reuse) skips steps 1-3. The probability of a cold start rises during traffic peaks, after idle periods, when concurrency grows, and after code/config updates.

2) How to measure and set targets (SLO)

Metrics: init_duration (initialization), duration_total, cold-start share, p95/p99 latency, dependency-connection errors after idle periods.
Collecting telemetry: platform logs plus your own labels (for example, cold=true/false via a context.isColdStart-style flag or your own flag in a static closure).
SLO targets (example): "login" API p95 ≤ 200 ms, cold share ≤ 3%; background jobs p95 ≤ 1 s. "Money" routes get separate, stricter targets.
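As a sketch of how the cold share and percentile targets above can be computed from per-invocation telemetry (the record shape `{ coldStart, durationMs }` and field names are illustrative assumptions, not a platform API):

```js
// Aggregate per-invocation records into the SLO metrics described above.
function summarize(records) {
  const sorted = records.map(r => r.durationMs).sort((a, b) => a - b);
  const pct = p => sorted[Math.min(sorted.length - 1, Math.ceil(p * sorted.length) - 1)];
  const cold = records.filter(r => r.coldStart).length;
  return {
    coldSharePct: +(100 * cold / records.length).toFixed(2),
    p95Ms: pct(0.95),
    p99Ms: pct(0.99),
  };
}

// Example: 2 cold starts out of 100 invocations, cold runs much slower.
const records = Array.from({ length: 100 }, (_, i) => ({
  coldStart: i < 2,
  durationMs: i < 2 ? 900 : 40 + i,
}));
console.log(summarize(records)); // → { coldSharePct: 2, p95Ms: 136, p99Ms: 900 }
```

Note how p99 exposes the cold-start tail even when p95 looks healthy; alerting on both, plus the cold share itself, catches regressions earlier.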

3) The main levers of cold start reduction

3.1 Concurrency control and warming

Provisioned Concurrency / Min Instances: holds N warm environments. Use for critical handlers.
Warmers/warm-up: scheduled calls (cron/scheduler) that keep workers warm. Do it judiciously (per region, time window, expected load).
Burst buffers: raise concurrency limits in advance of expected peaks.
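A warmer ping needs a cheap exit path in the handler so warm-up calls do not run business logic. A minimal sketch, assuming the scheduler marks its events with `source: 'warmup'` (an illustrative convention, not a platform field):

```js
let warm = false; // lives in the sandbox's global scope across warm invocations

const handler = async (event) => {
  if (event && event.source === 'warmup') {   // assumed warmer marker
    const wasWarm = warm;
    warm = true;
    return { warmed: true, wasWarm };         // cheap no-op response
  }
  warm = true;
  // ... real business logic here ...
  return { statusCode: 200 };
};
```

Returning `wasWarm` also gives the warmer free telemetry: a stream of `wasWarm: false` responses means the schedule is too sparse for the actual eviction rate.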

3.2 Packaging and dependencies

Small deploy artifact: tree-shaking, production-only dependencies, layers (AWS Layers) for large libs.
Lazy init: import heavy modules inside the handler on first access; open connections lazily.
Warm resources: cache SDK/connection clients in the global scope for reuse on warm starts.
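The lazy-init and reuse points combine into one idiom: cache the import promise itself so the cost is paid once, on first use, instead of during every cold start's init phase. A sketch with an injected `loader` (in real code it would be `() => import('heavy-lib')`, module name illustrative):

```js
let heavyPromise; // shared across warm invocations in the same sandbox

function getHeavy(loader) {
  if (!heavyPromise) heavyPromise = loader(); // first call triggers the import
  return heavyPromise;                        // later calls reuse the cached promise
}
```

Caching the promise (not the resolved value) also deduplicates concurrent first calls: if two requests race on a cold instance, only one import runs.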

3.3 Network and VPC

Skip the VPC for functions that do not need private networking (otherwise ENI attachment adds tens to hundreds of ms).
If a VPC is required, use the provider's ENI optimizations (ENI pools), a database proxy (RDS Proxy / Cloud SQL Auth Proxy) and connection pooling.

3.4 Languages and runtimes

Node.js/Go start fastest; Python is usually fast but sensitive to large imports; Java/.NET are heavier without GraalVM/AOT and profiling.
For the JVM, consider SnapStart/CRaC/GraalVM Native; for .NET, trimmed self-contained deployments.

3.5 Initialization and state

Put expensive initialization into the init phase, not into the request path.
Load configs/secrets on demand with a local cache (TTL).
Do not keep user state in memory; the global scope is only for caches and connections.
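The on-demand config/secret loading with a TTL cache can be sketched as follows. `fetchSecret` is an injected loader (in production, a call to the provider's secrets API); the TTL and clock injection are illustrative:

```js
// On-demand secret loading with an in-memory TTL cache.
function makeSecretCache(fetchSecret, ttlMs = 60_000, now = Date.now) {
  const cache = new Map(); // name -> { value, expiresAt }
  return async function getSecret(name) {
    const hit = cache.get(name);
    if (hit && hit.expiresAt > now()) return hit.value; // fresh: no network call
    const value = await fetchSecret(name);              // only on miss/expiry
    cache.set(name, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

A short TTL keeps rotation windows small while ensuring that warm invocations almost never pay the secrets-API round trip.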

4) Architectural patterns that reduce the impact of cold start

4.1 Asynchrony and queues

Accept the request → validate → put it in a queue/bus (SQS/PubSub/Queue Storage) → respond 202 Accepted → process it in the background.
Suitable for non-interactive operations (payments, reports, heavy calculations).
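The accept → validate → enqueue → 202 flow can be sketched like this. The queue client is injected (in production it would wrap SQS/PubSub); the payload fields and job-id format are illustrative:

```js
// Intake handler: validate fast, enqueue, acknowledge with 202 Accepted.
function makeIntakeHandler(queue) {
  return async (event) => {
    const body = JSON.parse(event.body || '{}');
    if (!body.userId || !body.amount) {
      return { statusCode: 400, body: JSON.stringify({ error: 'userId and amount required' }) };
    }
    const jobId = `job-${body.userId}-${Date.now()}`;
    await queue.send({ jobId, ...body });      // a background worker picks this up
    return { statusCode: 202, body: JSON.stringify({ jobId }) };
  };
}
```

The intake function stays tiny and fast (so its cold starts are cheap), while the heavy worker behind the queue can afford slower, rarer cold starts.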

4.2 Precompute/Pre-cache

Generate access data/catalogs/feature flags in advance via triggers (cron/events) and store them in a KV store/cache/edge.

4.3 Fan-out/Fan-in

Split a long operation into several short functions (Map/Reduce-style) → less risk of timeouts and repeated cold starts.
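A minimal in-process sketch of the pattern: split work into chunks handled by short-lived workers in parallel (fan-out), then combine the partial results (fan-in). `processChunk` stands in for an invoked function; in production each chunk would be a separate invocation:

```js
async function fanOutFanIn(items, chunkSize, processChunk) {
  const chunks = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));             // fan-out: split the work
  }
  const partials = await Promise.all(chunks.map(processChunk)); // parallel short runs
  return partials.reduce((a, b) => a + b, 0);               // fan-in: combine results
}
```

Each chunk finishes well inside the per-invocation timeout, so a single cold start delays one chunk rather than the whole job.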

4.4 Edge offload

Run the simplest checks (JWT/HMAC, geo-redirects, anti-bot) at the edge (Workers / Functions@Edge) to save RTT and offload the origin.

5) Practice: configs and techniques

5.1 AWS Lambda (provisioned + RDS Proxy)

```hcl
# Terraform sketch: enable provisioned concurrency on the prod alias
resource "aws_lambda_provisioned_concurrency_config" "api" {
  function_name                     = aws_lambda_function.api.function_name
  qualifier                         = aws_lambda_alias.prod.name
  provisioned_concurrent_executions = 20
}

# RDS Proxy for connection pooling
resource "aws_db_proxy" "rds_proxy" {
  name                = "pg-proxy"
  engine_family       = "POSTGRESQL"
  idle_client_timeout = 1800
  require_tls         = true
}
```
Node.js (lazy initialization and reuse):

```js
let pgClient;     // reused between warm invocations
let cold = true;

exports.handler = async (event, ctx) => {
  const isCold = cold;
  cold = false;
  if (!pgClient) {
    const { Client } = await import('pg');   // lazy import
    pgClient = new Client({ host: process.env.PG_PROXY, ssl: true });
    await pgClient.connect();
  }
  const t0 = Date.now();
  await pgClient.query('select 1');          // cheap readiness query
  return {
    statusCode: 200,
    headers: { 'x-cold-start': String(isCold), 'x-elapsed-ms': String(Date.now() - t0) },
    body: JSON.stringify({ ok: true })
  };
};
```

5.2 GCP Cloud Run / Cloud Functions (min instances)

```yaml
# Cloud Run service.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: api
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "5"       # keep warm
        run.googleapis.com/cpu-throttling: "false"
    spec:
      containerConcurrency: 80
      containers:
        - image: gcr.io/proj/api:latest
          env:
            - { name: DB_HOST, value: "10.0.0.5" }
```

5.3 Azure Functions (Always On / pre-warmed)

Premium/Elastic Premium plans with Always On; set pre-warmed instances ≥ predicted p95 concurrency.

6) Timeouts, retries, deadlines

Propagate the overall (client-side) deadline via a header (x-deadline-ms / grpc-timeout) and shrink the per-hop timeout inside the function.
Retry only idempotent operations; use an Idempotency-Key and deduplication.
For front-facing APIs: hedging (duplicate the request after p90) and a circuit breaker for remote dependencies.
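A sketch of deadline propagation: derive the remaining budget from the incoming header and cap each downstream call with an AbortController. The header semantics (absolute epoch ms in `x-deadline-ms`) and the safety margin are assumed conventions:

```js
// Remaining time budget for this hop, with a safety margin for response overhead.
function remainingBudget(headers, now = Date.now, safetyMs = 50) {
  const deadline = Number(headers['x-deadline-ms'] || 0);
  return deadline ? Math.max(0, deadline - now() - safetyMs) : 5000; // default budget
}

// Run a downstream call under the budget; abort it when the budget is exhausted.
async function callWithDeadline(fn, budgetMs) {
  const ac = new AbortController();
  const timer = setTimeout(() => ac.abort(), budgetMs);
  try {
    return await fn(ac.signal);   // e.g., fetch(url, { signal })
  } finally {
    clearTimeout(timer);
  }
}
```

Each hop subtracting its own margin before forwarding the header is what keeps retries and hedged requests from overshooting the client's overall deadline.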

7) Working with databases/caches/secrets

Pools/proxies (RDS Proxy / Cloud SQL Proxy / pgBouncer) instead of thousands of short-lived connections.
Short-TTL secrets + an in-memory cache with background refresh.
Cache (Redis/Memcached/KV): load "heavy" reference data at init, but with a time budget.
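The "load at init, but with a time budget" point can be sketched with a race: warm the cache if it responds fast enough, otherwise give up and fall back to lazy loading so a slow cache cannot inflate the cold start. `loadFn` stands in for the actual cache/directory fetch:

```js
// Warm a cache during init with a hard cap on the time spent.
async function warmWithBudget(loadFn, budgetMs) {
  let timer;
  const timeout = new Promise(resolve => { timer = setTimeout(() => resolve(null), budgetMs); });
  const result = await Promise.race([loadFn(), timeout]);
  clearTimeout(timer);
  return result; // null => not warmed; the handler loads on demand instead
}
```

This turns a flaky dependency into a best-effort optimization: the worst case is a normal lazy load, never a cold start stretched by a hung connection.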

8) Code organization and build

Separate handlers for narrow use cases; one "fat" bundle = long init.
esbuild/Rollup: exclude unused code, bundle only what is critical.
Layers/Extensions for large libraries (e.g., OpenSSL, models, SDKs) to reuse the provider's cache.

9) Peak testing and simulation

Synthetic "cold" starts: forcibly disable min instances and ramp parallel traffic in steps.
A/B: compare cold share, p95, DB/secret connection errors, and cost.
GameDay: peak load ×2 above the all-time high, with warm-up disabled.

10) Cost (FinOps)

Min instances / provisioned concurrency cost money: enable them only for hot routes.
Reduce execution time: caching, short timeouts, no unnecessary SDK calls.
Account for egress (calls to external APIs) and logging (log volume grows quickly during cold peaks).

11) Antipatterns

One monolithic handler with tens of megabytes of dependencies.
A fresh database connection on every call (no reuse/proxy).

A VPC for all functions "just in case."

Long timeouts and blind retries → latency tails and phantom charges.
Warming up everything indiscriminately around the clock.
Loading secrets in the request path (if init takes > 100 ms, move it to the init phase or a cache).

12) Specifics of iGaming/Finance

Money paths (deposits/withdrawals): keep provisioned/min instances, separate SLOs, strict limits on timeouts and retries (idempotency is mandatory).
KYC/PSP: unstable external APIs, so wrap them in a queue + worker; on the front, respond 202 and use polling/webhooks.
Regulation and audit: immutable logs (WORM), an inbound event log with Idempotency-Key, trace_id correlation.
Data residency: deploy functions that process PII in regional accounts/projects; no edge caches with PII.
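The mandatory idempotency on money paths can be sketched as a wrapper keyed by the Idempotency-Key, with an injected store (in production: Redis/DynamoDB with a TTL). Duplicate deliveries replay the stored result instead of charging twice:

```js
// Wrap a money-path handler so repeated deliveries with the same key are no-ops.
function makeIdempotent(store, handler) {
  return async (key, payload) => {
    const prev = await store.get(key);
    if (prev !== undefined) return prev;   // duplicate: replay the original result
    const result = await handler(payload); // first delivery: do the real work
    await store.set(key, result);
    return result;
  };
}
```

With this in place, aggressive retries and hedged requests become safe on deposits/withdrawals: at-least-once delivery from the queue no longer risks double charges.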

13) Prod Readiness Checklist

  • SLI/SLO defined: p95/p99, cold share, per-route targets.
  • Provisioned/min instances enabled on critical functions; concurrency forecasting in place.
  • Bundle minimized; heavy dependencies moved into layers; lazy import/initialization.
  • SDK/DB clients reused; RDS/SQL Proxy configured; connection pooling.
  • VPC only where needed; ENIs/proxies optimized; secrets via a manager + local TTL cache.
  • Timeouts/deadlines/retries: backoff + jitter; retries for idempotent operations only.
  • Synthetic "cold" runs + load tests; alerts on growth in cold share and p99.
  • Runbooks: how to increase provisioned concurrency, change minScale, enable degradation mode.
  • For iGaming: separate SLOs/dashboards for money paths, Idempotency-Key, WORM audit.

14) TL;DR

Cold start is inevitable but manageable: keep warm instances where it matters, shrink the bundle, use lazy init and connection reuse, avoid unnecessary VPCs, offload heavy operations to queues/workers, and use the edge for lightweight rules. For critical financial paths: separate SLOs, idempotency and strict timeouts; measure the cold share and enable warming only where it pays off.
