Execution Policies and Runtime Restrictions
1) Purpose
Runtime policies make service behavior predictable, safe, and economical: they limit "noisy neighbors," prevent resource leaks and overload, ensure compliance, and preserve SLOs as load grows.
Key objectives: isolation, fair resource allocation, controlled degradation, reproducibility, auditability.
2) Scope
Compute and memory: CPU, RAM, GC pauses, thread limits.
Disk/storage: IOPS/throughput, quotas, filesystem policies (read-only).
Network: egress/ingress, bandwidth shaping, network policies.
Processes/system calls: seccomp, capabilities, ulimit.
Orchestration: Kubernetes QoS, requests/limits, priorities, taints/affinity.
API/gateways: rate limits, quotas, timeouts/retries, circuit breakers.
Data/ETL/streams: batch/stream concurrency, consumer lag budgets.
Security: AppArmor/SELinux, rootless, secrets/configs.
Policy-as-Code: OPA/Gatekeeper, Kyverno, Conftest.
3) Basic principles
Fail-safe by default: better to shed excess requests than to crash the service.
Budget-driven: timeouts and retries must fit within the request time budget and the SLO error budget.
Small blast radius: namespace/pool/host/shard isolation.
Declarative and auditable: all restrictions live in code (repository) with a change log.
Multi-tenant fairness: no single tenant or team can consume the entire cluster.
4) Compute and memory
4.1 Kubernetes and cgroup v2
requests/limits: requests guarantee a share of CPU/memory; limits trigger throttling (CPU) or the OOM killer (memory).
QoS classes: Guaranteed/Burstable/BestEffort; keep critical workloads in Guaranteed/Burstable.
CPU: `cpu.weight` (the cgroup v2 successor to `cpu.shares`), `cpu.max` (throttling), `cpuset` for pinning.
Memory: `memory.max`, `memory.swap.max` (swap usually disabled), `oom_score_adj` for kill priority.
4.2 Patterns
Keep 20-30% headroom per node; use anti-affinity to spread replicas.
GC limits: JVM `-Xmx` below the k8s memory limit; Go: `GOMEMLIMIT`; Node.js: `--max-old-space-size` (see the fragment below).
ulimit: `nofile`, `nproc`, `fsize` set per service profile.
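A minimal sketch of how these limits land in a container spec: requests equal to limits puts the pod in the Guaranteed QoS class, and `GOMEMLIMIT` keeps the Go heap under the cgroup ceiling. The pod name, image, and values are illustrative assumptions, not prescriptions.

```yaml
# Sketch only: Guaranteed QoS (requests == limits) plus a runtime memory ceiling.
apiVersion: v1
kind: Pod
metadata:
  name: payments-service            # illustrative name
spec:
  containers:
    - name: app
      image: registry.example.com/payments:1.4.2   # hypothetical image
      resources:
        requests:                   # requests == limits => Guaranteed QoS
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"
      env:
        - name: GOMEMLIMIT          # keep the Go runtime below memory.max
          value: "450MiB"
```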
5) Disk and storage
IOPS/throughput quotas on PVCs and cluster storage; separate volumes for logs and data.
Read-only root filesystem, tmpfs for temporary files, size limit on `/tmp` (see the fragment below).
Filesystem watchdog: alerts on volume fill level and inode growth.
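A minimal sketch of the read-only-root pattern, assuming an in-memory `emptyDir` is acceptable for `/tmp`; names, image, and the 64Mi cap are illustrative.

```yaml
# Sketch only: read-only root filesystem with a size-capped tmpfs for /tmp.
apiVersion: v1
kind: Pod
metadata:
  name: readonly-example            # illustrative name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # hypothetical image
      securityContext:
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir:
        medium: Memory              # tmpfs-backed
        sizeLimit: 64Mi             # cap temporary files
```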
6) Network and traffic
NetworkPolicy (ingress/egress): zero-trust for east-west traffic (default deny, as in the sketch below).
Bandwidth limits: tc/egress policies, QoS/DSCP for critical flows.
Egress controller: allowlist of domains/subnets, DNS auditing.
mTLS and TLS policies: encryption and an enforced minimum protocol version.
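A sketch of the default-deny baseline plus a narrow DNS exception; the namespace and labels are assumptions, and a real cluster will need further allow rules for legitimate traffic.

```yaml
# Sketch only: deny all ingress/egress in a namespace, then allow DNS egress.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments               # illustrative namespace
spec:
  podSelector: {}                   # every pod in the namespace
  policyTypes: ["Ingress", "Egress"]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: payments
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```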
7) Process safety
Seccomp (syscall allowlist), AppArmor/SELinux profiles.
Drop Linux capabilities (keep the minimum), `runAsNonRoot`, `readOnlyRootFilesystem`.
Rootless containers, signed images and attestations.
Secrets only via Vault/KMS, temporary tokens with short TTL (see the sketch below).
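A sketch of a hardened security context, complementing the fragment in section 17 with the `RuntimeDefault` seccomp profile and a non-root UID; the name, image, and UID are illustrative.

```yaml
# Sketch only: default seccomp profile, non-root user, dropped capabilities.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example            # illustrative name
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault          # the runtime's default syscall allowlist
    runAsNonRoot: true
    runAsUser: 10001                # illustrative non-root UID
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # hypothetical image
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```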
8) Time policies: timeouts, retries, budgets
Timeout budget: the sum of all hop timeouts must not exceed the end-to-end SLA.
Retries with backoff and jitter, with a maximum number of attempts per error class.
Circuit breaker: opens when the error rate or p95 timeout exceeds a threshold, then fails fast.
Bulkheads: separate connection pools/queues for critical paths.
Backpressure: throttle producers when consumers fall behind (see the sketch below).
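Assuming an Istio-style mesh handles the hop (an assumption; any proxy or client library with equivalent knobs works), timeout and retry budgets can be declared per route so the retries stay inside the total budget. Host names and values are illustrative.

```yaml
# Sketch only (assumes Istio): bounded retries that fit inside the route timeout.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payments-timeouts           # illustrative name
spec:
  hosts: ["payments.svc.cluster.local"]
  http:
    - route:
        - destination:
            host: payments.svc.cluster.local
      timeout: 2s                   # total budget for this hop
      retries:
        attempts: 2                 # bounded retry count
        perTryTimeout: 800ms        # 2 x 800ms stays under the 2s budget
        retryOn: 5xx,reset,connect-failure
```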
9) Rate limits, quotas and priorities
Algorithms: token/leaky bucket, GCRA; local plus distributed (Redis/Envoy/global).
Granularity: API key/user/organization/region/endpoint.
Priority tiers: payment/authorization flows are gold, analytics is bronze.
Quotas per day/month, "burst" and "sustained" limits; respond with 429 + Retry-After (see the descriptor sketch below).
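If the distributed limiter is Envoy's ratelimit service (one of the options the list mentions; the choice here is an assumption), per-key and per-endpoint limits are expressed as descriptors. The domain, endpoint, and numbers are illustrative.

```yaml
# Sketch only (assumes envoyproxy/ratelimit): 50 rps per API key,
# with a stricter limit layered on a heavy endpoint.
domain: public-api                  # illustrative domain
descriptors:
  - key: api_key
    rate_limit:
      unit: second
      requests_per_unit: 50
  - key: api_key
    descriptors:
      - key: endpoint
        value: /v1/reports          # hypothetical heavy endpoint
        rate_limit:
          unit: second
          requests_per_unit: 5
```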
10) Orchestration and scheduler
PriorityClass: protects P1 pods from preemption.
PodDisruptionBudget: bounds voluntary disruptions during updates.
Taints/tolerations and (anti-)affinity: workload isolation.
RuntimeClass: gVisor/Firecracker/Wasm for sandboxes.
Horizontal/vertical autoscaling with guard thresholds and max replicas (see the sketch below).
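A sketch of the PriorityClass/PDB pair; the class value, names, and label selector are illustrative assumptions.

```yaml
# Sketch only: a high priority class for P1 workloads plus a disruption budget.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: p1-critical                 # illustrative name
value: 1000000                      # higher value = less likely to be preempted
globalDefault: false
description: "P1 workloads protected from preemption by lower-priority pods"
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-pdb
spec:
  minAvailable: 2                   # keep at least two replicas through voluntary disruptions
  selector:
    matchLabels:
      app: payments                 # illustrative label
```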
11) Data/ETL/Stream Policies
Concurrency per job/topic, maximum batch size, checkpoint interval.
Consumer lag budgets: warning/critical thresholds; DLQ and retry limit.
Freshness SLA for data marts; pause heavy jobs during production traffic peaks (see the sketch below).
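There is no single standard schema for these knobs, so the fragment below is a purely hypothetical pipeline config, every field name included, showing how the limits above can be captured declaratively and reviewed in a repository.

```yaml
# Purely hypothetical pipeline config: all field names are illustrative.
pipeline: orders-enrichment
concurrency:
  max_parallel_tasks: 8
  max_batch_size: 5000
checkpoint_interval: 60s
consumer_lag_budget:
  warning: 10000                    # messages behind
  critical: 50000
retries:
  max_attempts: 3
  dead_letter_topic: orders-enrichment-dlq
freshness_sla: 15m                  # data mart must be no staler than this
blackout_windows:                   # pause heavy jobs at prod traffic peaks
  - "Mon-Fri 09:00-11:00"
```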
12) Policy-as-Code and admission control
OPA Gatekeeper/Kyverno: reject pods without requests/limits or `readOnlyRootFilesystem`, and pods with `hostNetwork` or `:latest` images.
Conftest for pre-commit checks of Helm/K8s/Terraform.
Mutation policies: auto-inject sidecars (mTLS), annotations, `seccompProfile`.
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resources
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-limits
      match:
        resources:
          kinds: ["Pod"]
      validate:
        message: "resources.requests/limits for CPU and memory are required"
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "?*"
                    memory: "?*"
                  limits:
                    cpu: "?*"
                    memory: "?*"
```
Example of OPA (Rego): timeouts ≤ 800 ms:

```rego
package policy.timeout

deny[msg] {
  input.kind == "ServiceConfig"
  input.timeout_ms > 800
  msg := sprintf("timeout %dms exceeds the 800ms budget", [input.timeout_ms])
}
```
13) Observability and compliance metrics
Compliance %: percentage of pods with correct requests/limits/labels.
Security posture: share of pods with seccomp/AppArmor/rootless.
Rate-limit hit %, shed %, throttle %, share of 429s.
p95 timeouts/retries, circuit-open duration.
OOM kills/evictions, CPU throttle seconds (an example alert is sketched below).
Network egress denied events, egress allowlist misses.
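Assuming Prometheus Operator and cAdvisor metrics are available (both assumptions), the CPU-throttling metric can be wired to an alert rule; names and thresholds are illustrative.

```yaml
# Sketch only (assumes Prometheus Operator + cAdvisor metrics):
# alert when a pod spends more than 25% of CPU periods throttled.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: runtime-policy-alerts       # illustrative name
spec:
  groups:
    - name: cpu-throttling
      rules:
        - alert: HighCPUThrottling
          expr: |
            sum(rate(container_cpu_cfs_throttled_periods_total[5m])) by (namespace, pod)
              /
            sum(rate(container_cpu_cfs_periods_total[5m])) by (namespace, pod)
              > 0.25
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.namespace }}/{{ $labels.pod }} CPU-throttled > 25% for 15m"
```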
14) Checklists
Before deploying a service
- Requests/limits are set; QoS ≥ Burstable
- Timeouts and retries fit within the end-to-end SLA
- Circuit breaker/bulkhead enabled for external dependencies
- NetworkPolicy (ingress/egress) and mTLS
- Seccomp/AppArmor, drop capabilities, non-root, read-only FS
- Rate-limits and quotas on API gateway/service
- PDB/priority/affinity specified; autoscaling is configured
Monthly
- Audit policy exceptions (TTL)
- Review time and error budgets
- Fire-drill test: shed/backpressure/circuit-breaker
- Rotate secrets/certificates
15) Anti-patterns
No requests/limits: bursts eat up neighbors → cascading failures.
Global retries without jitter: a retry storm across dependencies.
Infinite timeouts: hanging connections and pool exhaustion.
`:latest` and mutable tags: unpredictable builds at runtime.
Open egress: data leaks and unmanaged dependencies.
No PDB: updates knock out the entire pool.
16) Mini playbooks
A. CPU throttle % growth on payments-service
1. Check limits/requests and profile the hot paths.
2. Temporarily raise requests, enable autoscaling on p95 latency.
3. Add caching for limit/rate lookups, reduce query complexity.
4. Post-fix: denormalization/indexes, revise limits.
B. 429 growth and API complaints
1. Break the report down by key/organization → identify who hit the quota.
2. Introduce hierarchical quotas (per-org → per-key), raise burst for the gold tier.
3. Communicate backoff guidance to clients; enable adaptive limiting.
C. Mass OOM kills
1. Reduce concurrency, enable the heap limit and profiling.
2. Recalculate `-Xmx`/`GOMEMLIMIT` from real peak usage.
3. Retune GC/pools, keep swap off, and add soft-limit alerts.
17) Configuration examples
K8s container with secure settings (fragment):

```yaml
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
```
Envoy rate-limit (conceptual fragment):

```yaml
rate_limit_policy:
  actions:
    - request_headers:
        header_name: "x-api-key"
        descriptor_key: "api_key"
```
Nginx ingress: timeouts and rate limits (annotations):

```yaml
nginx.ingress.kubernetes.io/proxy-connect-timeout: "2s"
nginx.ingress.kubernetes.io/proxy-read-timeout: "1s"
nginx.ingress.kubernetes.io/limit-rps: "50"
```
18) Integration with change and incident management
Any policy relaxation goes through RFC/CAB as a temporary exception with a TTL.
Policy violation incidents → post-mortem and rule updates.
Compliance dashboards are connected to the release calendar.
19) The bottom line
Execution policies are guardrails for the platform: they do not stop you from driving fast, but they keep you from going over the edge. Declarative constraints, automatic enforcement, good metrics, and exception discipline turn chaotic operations into a manageable, predictable system with controlled cost and sustainable SLOs.