Service Mesh: Istio, Linkerd
1) What a service mesh is and when you need one
A service mesh is a data-plane/control-plane network layer that provides end-to-end mTLS, routing, fault tolerance, and observability between services without changing application code.
Objectives:
- Security by default (zero-trust, service identities, access policies).
- Traffic management (canary/blue-green, A/B, shadowing).
- Reliability (retries, timeouts, circuit breaking).
- Observability (metrics, logs, traces).
- Operational standardization (policies as code, GitOps).

When it is needed:
- Many polyglot microservices with an mTLS requirement.
- Advanced routing/experimentation scenarios without changing the application.
- Audit/policy requirements at the network level.
2) Istio vs Linkerd: a brief comparison
- Istio: the richest L7 feature set (routing, policy, EnvoyFilter/WASM extensions), at the cost of higher resource consumption and operational complexity; Ambient mode reduces the overhead.
- Linkerd: a lightweight Rust proxy, simpler to operate, mTLS on by default; fewer advanced L7 capabilities (e.g. no native rate limiting).
3) Architecture and deployment models
3.1 Sidecar mesh (classic)
Each Pod gets a proxy sidecar.
Pros: maturity, full L7 control.
Cons: per-Pod CPU/RAM overhead, operational and debugging complexity.
3.2 Istio Ambient Mesh
ztunnel (L4) on each node + waypoint proxies (L7) where required.
Pros: lower cost and complexity, gradual opt-in to L7.
Cons: newer; some L7 features are unavailable without a waypoint.
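As a sketch, enrolling a namespace in ambient mode is a single label (the `payments` namespace name is an example):

```yaml
# Illustrative example: enroll the "payments" namespace in Istio Ambient.
# Pods in this namespace get L4 mTLS via the node-level ztunnel with no sidecar;
# a waypoint proxy is added later only where L7 policy/routing is needed.
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    istio.io/dataplane-mode: ambient
```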
4) Identity and mTLS (zero-trust)
4.1 SPIFFE/SPIRE and certificates
Each workload is assigned a SPIFFE ID: `spiffe://cluster.local/ns/NS/sa/SA`.
Authentication: mutual TLS between services.
Key rotation is automatic (short TTLs).
4. 2 Istio (PeerAuthentication + DestinationRule)
yaml apiVersion: security. istio. io/v1 kind: PeerAuthentication metadata: { name: default, namespace: payments }
spec:
mtls: { mode: STRICT }
apiVersion: networking. istio. io/v1 kind: DestinationRule metadata: { name: payments-dr, namespace: payments }
spec:
host: payments. svc. cluster. local trafficPolicy:
tls: { mode: ISTIO_MUTUAL }
4.3 Linkerd: mTLS by default
Enabled automatically after `linkerd install` + `linkerd inject`.
Each cluster has its own trust anchor, with automatic rotation.
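For comparison, Linkerd injection is usually driven by an annotation; a minimal sketch (the namespace name is assumed):

```yaml
# Illustrative example: annotate a namespace so the Linkerd proxy injector
# adds the sidecar to every new Pod; mTLS between meshed Pods is then automatic.
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  annotations:
    linkerd.io/inject: enabled
```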
5) Traffic management
5. 1 Istio: VirtualService (routes, canaries)
yaml apiVersion: networking. istio. io/v1 kind: VirtualService metadata: { name: payments }
spec:
hosts: ["payments"]
http:
- route:
- destination: { host: payments, subset: v1 } # stable weight: 90
- destination: { host: payments, subset: v2 } # canary weight: 10 retries: { attempts: 2, perTryTimeout: 300ms }
timeout: 2s
DestinationRule (LB/CB):

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: payments
spec:
  host: payments
  subsets:
  - name: v1
    labels: { version: v1 }
  - name: v2
    labels: { version: v2 }
  trafficPolicy:
    loadBalancer:
      simple: LEAST_CONN
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 5s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
```
5. 2 Linkerd: ServiceProfile + TrafficSplit
yaml apiVersion: linkerd. io/v1alpha2 kind: ServiceProfile metadata:
name: payments. default. svc. cluster. local spec:
routes:
- name: POST /withdraw condition:
method: POST pathRegex: "/withdraw"
isRetryable: true timeout: 2s apiVersion: split. smi-spec. io/v1alpha2 kind: TrafficSplit metadata: { name: payments }
spec:
service: payments backends:
- service: payments-v1 weight: 90
- service: payments-v2 weight: 10
6) Ingress/Egress and API gateways
Istio Gateway (ingress/egress) - controls incoming/outgoing traffic, TLS termination, mTLS passthrough.
Linkerd works with existing ingress controllers (NGINX/Contour/Traefik); egress - via NetworkPolicy/egress-gateway-patterns.
Egress policies: domain whitelists, SNI-policy, direct internet ban.
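A minimal sketch of an egress allow-list in Istio, assuming `meshConfig.outboundTrafficPolicy.mode: REGISTRY_ONLY` (deny by default) and a hypothetical external domain:

```yaml
# With outboundTrafficPolicy REGISTRY_ONLY, only hosts registered in the mesh
# are reachable; this ServiceEntry explicitly allows one external HTTPS API.
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: allow-external-payments-api
  namespace: payments
spec:
  hosts: ["api.payments-provider.example"]  # hypothetical external domain
  location: MESH_EXTERNAL
  resolution: DNS
  ports:
  - number: 443
    name: tls
    protocol: TLS
```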
7) Authorization and policy
7.1 Istio AuthorizationPolicy (RBAC/ABAC)

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-withdraw
  namespace: payments
spec:
  selector:
    matchLabels: { app: payments }
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/api/sa/gateway"]
    to:
    - operation:
        methods: ["POST"]
        paths: ["/withdraw"]
    when:
    - key: request.auth.claims[role]
      values: ["cashout"]
```
7. 2 Linkerd policy (server + serverauthorization)
yaml apiVersion: policy. linkerd. io/v1beta3 kind: Server metadata: { name: payments-server, namespace: payments }
spec:
podSelector: { matchLabels: { app: payments } }
port: 8080 apiVersion: policy. linkerd. io/v1beta3 kind: ServerAuthorization metadata: { name: allow-gateway, namespace: payments }
spec:
server: { name: payments-server }
client:
meshTLS:
identities: [".ns. api. serviceaccount. identity. linkerd. cluster. local"]
8) Observability and telemetry
8.1 Metrics
Istio Telemetry API → Prometheus: `istio_requests_total`, `istio_request_duration_milliseconds_bucket`, `istio_tcp_received_bytes_total`.
Linkerd viz: `request_total`, latency p50/p95/p99, `success_rate`.
8.2 Traces and logs
Propagate W3C Trace Context.
Istio/Envoy export via OTLP to the OpenTelemetry Collector; Linkerd relies on sidecar loggers / the application SDK.
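A minimal OpenTelemetry Collector pipeline for this setup might look like the following (the `tempo:4317` backend address is an assumption):

```yaml
# Receive spans from Envoy/app SDKs over OTLP and forward them to a tracing
# backend; the exporter endpoint is environment-specific.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch: {}
exporters:
  otlp:
    endpoint: tempo:4317  # assumed backend address
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```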
8.3 Exemplars
Attach trace_id exemplars to duration histograms for jump-to-trace.
9) Rate limits, WAF, custom filters
Istio: EnvoyFilter/WASM for local rate limits, eksternal-rate-limit service (Redis), as well as WAF logic (Lua/WASM).
Linkerd: limited native support; rate limit - at ingress/gateway level.
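For the Linkerd case, a sketch of ingress-level rate limiting with ingress-nginx annotations (host, service name, and the 20 RPS value are placeholders):

```yaml
# ingress-nginx enforces a per-client requests-per-second limit before traffic
# reaches the mesh; useful when the mesh itself has no native rate limiting.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: payments
  namespace: payments
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "20"  # per-client RPS
spec:
  ingressClassName: nginx
  rules:
  - host: payments.example.com  # placeholder host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: payments
            port:
              number: 8080
```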
10) Multi-cluster
Istio: east-west gateway, shared PKI or trust-bundle, service discovery via ServiceEntry, Federation.
Linkerd: `linkerd multicluster link`, gateway per cluster, service-mirror контроллер.
Use-cases: asset-regions, traffic localization, federated zero-trust.
11) Performance and cost
Sidecar mesh: CPU/RAM overhead per Pod, added latency (typically +1-3 ms per hop in steady state).
Ambient (Istio): lower consumption at L4; L7 is enabled selectively.
Linkerd: the lightweight proxy generally has less overhead, but fewer advanced L7 capabilities.
Practice: measure p95/CPU before and after, and keep SLO gates against degradation.
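One way to make the before/after comparison repeatable is a Prometheus recording rule over the Istio latency histogram (a sketch; label selection depends on your setup):

```yaml
# Records the server-side p95 per destination service so it can be compared
# before and after enabling the mesh (or between sidecar and ambient modes).
groups:
- name: mesh-overhead
  rules:
  - record: service:request_duration_ms:p95
    expr: |
      histogram_quantile(0.95,
        sum(rate(istio_request_duration_milliseconds_bucket{reporter="destination"}[5m]))
        by (le, destination_service))
```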
12) Security
mTLS everywhere, short TTLs, automatic rotation.
Policy as Code (OPA/Gatekeeper, Kyverno) to forbid overly permissive rules such as an allow-all AuthorizationPolicy.
Secrets via CSI/Vault, not in manifests.
Egress control: deny by default, explicit allow-lists.
Separate trust domains per environment (prod/stage).
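The deny-by-default egress point can be sketched with a plain NetworkPolicy (DNS is still allowed so names resolve; labels and the namespace are assumptions):

```yaml
# Blocks all egress from the namespace except DNS to kube-system; explicit
# allow rules (e.g. to an egress gateway) are then added per destination.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: payments
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
```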
13) Integration with releases and SLO gating
Canary/blue-green are implemented via mesh routes (see the examples above).
Metric analysis (Prometheus/SpanMetrics) in an Argo Rollouts AnalysisTemplate: promotion/rollback on burn rate/p95/5xx.
Release annotations in Grafana: compare `version=stable` vs `version=canary`.
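A sketch of such an AnalysisTemplate gating on the 5xx rate from Istio metrics (the Prometheus address and the 1% threshold are assumptions):

```yaml
# Argo Rollouts evaluates this query during a canary; if the error ratio
# violates successCondition, the rollout is aborted and rolled back.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: payments-error-rate
spec:
  metrics:
  - name: error-rate
    interval: 1m
    failureLimit: 1
    successCondition: result[0] < 0.01  # assumed SLO threshold
    provider:
      prometheus:
        address: http://prometheus:9090  # assumed in-cluster address
        query: |
          sum(rate(istio_requests_total{destination_service=~"payments.*",response_code=~"5.."}[5m]))
          /
          sum(rate(istio_requests_total{destination_service=~"payments.*"}[5m]))
```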
14) Anti-patterns
Enabling the mesh "everywhere at once" → infrastructure shock.
Ignoring the cardinality of proxy metrics/logs → overloading the TSDB/log storage.
Leaving mTLS in PERMISSIVE mode forever.
Building complex WAF/business logic inside EnvoyFilter instead of the gateway/application.
No egress policy → internet leaks and compliance bypass.
Proxy admin/debug ports (`:15000`) exposed externally.
15) Implementation checklist (0-60 days)
0-15 days
- Model selection: sidecar vs Ambient (Istio) / Linkerd, based on the load profile.
- Enable mTLS STRICT and basic authorization policies for 1-2 critical services.
- Basic routes (timeouts/retries), RED/SLO dashboards.
16-30 days
- Canary/TrafficSplit, outlier detection / circuit breaking on hot paths.
- OTel integration: traces + exemplars; burn-rate alerts.
- Egress gateways and domain allow-lists; deny by default.
31-60 days
- Multi-cluster link (if needed), trust federation.
- Policy as Code for AuthorizationPolicy/ServerAuthorization.
- Game day: simulate an incident and roll back routes/policies.
16) Maturity metrics
mTLS coverage (STRICT, auto-rotation) ≥ 95% of services.
Share of traffic through canary/progressive releases ≥ 80%.
Average p95 overhead < +5% of baseline (after optimization).
0 open egress without an explicit allow; 100% of services with basic AuthZ.
RCA "from dashboard to trace" ≤ 2 minutes (p50).
17) Examples of policy as code
Gatekeeper (forbid PERMISSIVE in prod; assumes a matching ConstraintTemplate)

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sIstiomTLSStrict
metadata:
  name: deny-permissive-prod
spec:
  match:
    kinds:
    - apiGroups: ["security.istio.io"]
      kinds: ["PeerAuthentication"]
    namespaces: ["prod-*"]
  parameters:
    allowedModes: ["STRICT"]
```
Kyverno (required labels on VirtualService/DestinationRule)

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-mesh-labels
spec:
  rules:
  - name: vs-dr-labels
    match:
      any:
      - resources:
          kinds: ["VirtualService", "DestinationRule"]
    validate:
      message: "owner and service labels required"
      pattern:
        metadata:
          labels:
            owner: "?*"
            service: "?*"
```
18) Operational tips
Version policies and routes (semver); promote through GitOps.
Proxy observability: dedicated "proxy saturation" dashboards (CPU/heap, retries, 429/503).
Cardinality budget: keep the `route`, `code`, `destination` labels templated only (no raw paths or IDs).
Network limits and namespace quotas (NetworkPolicy/LimitRange).
Team documentation: a runbook for rolling back routes, policies, and mTLS keys.
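The NetworkPolicy/LimitRange point can be sketched as a default resource envelope for proxy-bearing Pods (values are illustrative):

```yaml
# Default requests/limits for containers in the namespace, so sidecar proxies
# cannot silently consume unbounded CPU/RAM; tune values per load profile.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: payments
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    default:
      cpu: 500m
      memory: 256Mi
```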
19) Conclusion
Istio and Linkerd solve the same problem (standardizing the security, reliability, and observability of cross-service communication) but at different depths and costs of ownership.
If you need rich L7 capabilities and flexible policies, take Istio (and consider Ambient to reduce overhead).
If you need simplicity and minimal overhead, take Linkerd.
Whichever mesh you choose: enable mTLS by default, manage routing as code, link metrics to traces, close off egress, and add SLO gating to releases. Then the network layer stops being a "black box" and becomes a predictable tool for stability and speed of change.