Hardening the production environment and auditing

1) Objectives and area of responsibility

Production is not only the "most stable environment" but also the most attacked one. Our goals:

minimize the attack surface and blast radius;
protect channels, accounts, secrets and delivery artifacts;
detect and respond to incidents faster than the MTTR targets;
demonstrate compliance (GDPR/PCI DSS/local rules);
preserve auditability of all critical actions.

Key principles: Zero Trust, Least Privilege, Segmentation, Everything-as-Code, Security-by-Default.

2) Network perimeter and segmentation

Segments: Edge (WAF, bot management, DDoS), DMZ (gateway), App (microservices), Data (DB/caches), Backoffice/Ops (CI/CD, observability).
L4/L7 policies: deny-by-default, explicit allow by service/namespace/port.
mTLS inside the cluster; TLS 1.2+ on the perimeter, HSTS, secure cipher suites.
Ingress filtering: WAF (OWASP Top 10), anti-bot, rate limits, geo/ASN blocks, CAPTCHA on risky paths.
DDoS protection: always-on + auto-mitigation, separate profiles for API/static content.
Egress control: allow only the external hosts required by providers (PSP/KYC/games).
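
The deny-by-default egress rule can be illustrated with a small allowlist check. This is a minimal sketch, not a real policy engine: the host names and the `is_egress_allowed` helper are hypothetical, and in practice the rule would live in NetworkPolicies or an egress proxy.

```python
# Hypothetical allowlist: exact hosts and wildcard suffixes for provider endpoints.
ALLOWED_EGRESS = {
    "api.psp-1.example",       # assumed PSP endpoint
    "*.kyc-provider.example",  # assumed KYC provider domain
}


def is_egress_allowed(host: str, allowlist: set = ALLOWED_EGRESS) -> bool:
    """Deny by default: allow only exact matches or explicit wildcard suffixes."""
    host = host.lower().rstrip(".")
    for entry in allowlist:
        if entry.startswith("*."):
            # "*.example" matches "a.example" but not "example" itself.
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False
```

Anything not listed is rejected, so adding a new provider requires an explicit, reviewable change to the allowlist.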

3) Identities, access and privileges (IAM/PAM)

SSO (OIDC/SAML) + MFA for humans; OIDC tokens/Workload Identity for services.
RBAC/ABAC: roles with the minimum required permissions; "break-glass" access is audited and time-limited (TTL).
PAM: on-demand check-out of privileged sessions, with full recording and logging.
CIEM (cloud): detection of excessive permissions and unused roles, auto-remediation.
Access to production data: only through an approved jump host/proxy, with PII masking.
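
The "break-glass under audit and TTL" idea can be sketched as a grant record that is valid only for a bounded time window. The `BreakGlassGrant` type and its fields are assumptions made for illustration; a real PAM system would own this state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class BreakGlassGrant:
    user: str
    role: str
    granted_at: datetime
    ttl: timedelta
    ticket: str  # hypothetical reference to the approval/audit record


def is_grant_active(grant: BreakGlassGrant, now: datetime) -> bool:
    """A grant is valid only inside [granted_at, granted_at + ttl)."""
    return grant.granted_at <= now < grant.granted_at + grant.ttl
```

Because the expiry is part of the grant itself, elevated access lapses without anyone having to remember to revoke it.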

4) Secrets and cryptography

KMS/HSM: key storage, envelope encryption, rotation with notifications.
Secret manager: short-lived credentials; keep secrets out of Git and logs.
Signatures: artifacts (cosign), webhooks (HMAC), service tokens.
PAN/PII fields: tokenization/encryption at rest; masking in logs and previews.
Rotation policies: keys/certificates/passwords, both scheduled and forced.
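
HMAC signing of webhooks, mentioned above, can be sketched with the Python standard library. The message layout (`<timestamp>.<body>`) and the function names are assumptions for this example; each provider defines its own scheme.

```python
import hashlib
import hmac


def sign_webhook(secret: bytes, timestamp: str, body: bytes) -> str:
    """Return a hex HMAC-SHA256 over "<timestamp>.<body>" (assumed layout)."""
    message = timestamp.encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, timestamp: str, body: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_webhook(secret, timestamp, body)
    return hmac.compare_digest(expected, signature)
```

Including the timestamp in the signed message lets the receiver also reject stale deliveries, which complements the signature check.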

5) Containers and Kubernetes (CWPP/KSPM)

Base images: minimal, with vulnerability scanning in CI; rootless wherever possible.
Admission policies (OPA/Gatekeeper/Kyverno): prohibit ':latest', 'privileged', hostPath; require image signatures.
NetworkPolicies: service-to-service communication only where needed.
PodSecurity: limited capabilities, read-only FS, seccomp, AppArmor.
Secrets: from Secret Store CSI (KMS); no plaintext secrets in manifests.
Runtime protection: behavioral rules (eBPF), alerts on anomalies.

Example of an OPA rule (reject images that do not come from the trusted registry path for signed images):

```rego
package k8sadmission

# Deny Pods whose container images are outside the trusted "signed" registry path.
deny[msg] {
  input.request.kind.kind == "Pod"
  some c
  image := input.request.object.spec.containers[c].image
  not startswith(image, "registry.company.com/signed/")
  msg := sprintf("Image must be signed and come from trusted registry: %v", [image])
}
```
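
The other admission rules in this section (no ':latest', no 'privileged', no hostPath) can be prototyped as a pre-merge check on an already parsed Pod manifest. This is a sketch under assumptions: `pod_violations` is a hypothetical helper operating on a plain dict, and it does not replace enforcement by OPA/Gatekeeper/Kyverno in the cluster.

```python
def pod_violations(pod: dict) -> list:
    """Return human-readable violations for a parsed Pod manifest."""
    spec = pod.get("spec", {})
    problems = []
    for c in spec.get("containers", []):
        name, image = c.get("name", "?"), c.get("image", "")
        # The tag follows ":" in the last path segment; an untagged image defaults to "latest".
        last_part = image.rsplit("/", 1)[-1]
        tag = last_part.split(":", 1)[1] if ":" in last_part else "latest"
        # Images pinned by digest ("@sha256:...") are not subject to the tag check.
        if "@" not in image and tag == "latest":
            problems.append(f"{name}: image must not use ':latest'")
        if c.get("securityContext", {}).get("privileged") is True:
            problems.append(f"{name}: privileged containers are prohibited")
    for v in spec.get("volumes", []):
        if "hostPath" in v:
            problems.append(f"volume {v.get('name', '?')}: hostPath is prohibited")
    return problems
```

Running such a check in CI gives developers feedback before the manifest ever reaches the admission controller.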

6) Supply chain: trust but verify

SBOM per build, stored and linked to the release.
Image/manifest signatures, verified by the admission controller.
SLSA attestations: provable provenance of artifacts.
Policy-as-Code: Conftest/OPA against Terraform/Helm/K8s before merge.
No "last-minute patching" in prod: all changes go through the pipeline only.
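
Linking an SBOM to a release can be sketched as storing digests of both the artifact and its SBOM in the release record and checking them later. The record layout (`artifact_sha256`, `sbom_sha256`) is an assumption for illustration; this covers integrity only and does not replace signature verification with a tool such as cosign.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify_release(record: dict, artifact: bytes, sbom: bytes) -> bool:
    """Check that the artifact and its SBOM match the digests stored with the release."""
    return (
        record.get("artifact_sha256") == sha256_hex(artifact)
        and record.get("sbom_sha256") == sha256_hex(sbom)
    )
```

A mismatch on either digest means the artifact or the SBOM is not the one that was produced by the recorded build.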

7) Vulnerability and patch management

SCA/SAST/DAST in CI; blocking thresholds for critical/high findings.
Weekly update batches (images, OS packages, libraries) + emergency out-of-band updates.
Applied fixes → tickets/releases linked to CVE/SBOM.
EASM: external view of the attack surface (subdomains, open ports, certificates).
Regular pen tests: at least once a year, plus targeted tests of critical flows (payments/KYC).
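
A blocking threshold for critical/high findings, with dated exceptions, can be sketched as a small CI gate. The finding format (`id`, `severity`) and the exception map are assumptions; real scanners each have their own report schema.

```python
from datetime import date

BLOCKING = {"CRITICAL", "HIGH"}


def blocking_findings(findings: list, exceptions: dict, today: date) -> list:
    """Return ids of findings that must block the build.

    A finding blocks when its severity is critical/high and it has no
    exception, or its exception has already expired.
    """
    blocked = []
    for f in findings:
        if f["severity"].upper() not in BLOCKING:
            continue
        expires = exceptions.get(f["id"])
        if expires is None or expires < today:
            blocked.append(f["id"])
    return blocked
```

The pipeline would fail the build whenever the returned list is non-empty, so an expired exception starts blocking again without manual follow-up.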

8) Logs, metrics, traces and storage of audit artifacts

Standardized logs (JSON) with 'trace_id', 'request_id', user/tenant/geo (pseudonymized), no PII/PAN.
Metrics: p50/p95/p99, error rate, saturation, DLQ, retries, business KPIs (Time-to-Wallet).
OTel: end-to-end tracing for critical routes (deposit/KYC/withdrawal).
SIEM: event correlation (authentication, role changes, admin actions, WAF/bot rules).
SOAR: automated responses (isolating the affected segment, token revocation, IP/ASN block, release freeze).
Retention: operational logs are kept 30-90 days in hot storage; audit artifacts are kept longer, according to policy.

Minimum log format (example):

```json
{
  "ts": "2025-11-05T15:00:00Z",
  "sev": "WARN",
  "svc": "payments-api",
  "route": "POST /v1/payments",
  "trace_id": "2f9f...e1",
  "user": "anon",
  "tenant": "eu-casino-12",
  "geo": "EU",
  "event": "circuit_breaker_open",
  "provider": "psp-1"
}
```
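
A logger that emits this kind of record while keeping PAN out of the output might look like the sketch below. The regular expression is a rough assumption (13-19 digits with optional spaces or dashes), not a full card-number validator, and `log_line` is a hypothetical helper rather than part of any logging library.

```python
import json
import re
from datetime import datetime, timezone

# Rough PAN pattern (assumption): 13-19 digits, optionally separated by spaces or dashes.
PAN_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def mask_pan(text: str) -> str:
    """Replace anything that looks like a card number with a fixed marker."""
    return PAN_RE.sub("[PAN-MASKED]", text)


def log_line(sev: str, svc: str, event: str, **fields) -> str:
    """Build one JSON log line; string values are passed through the PAN mask."""
    record = {
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "sev": sev,
        "svc": svc,
        "event": event,
    }
    for key, value in fields.items():
        record[key] = mask_pan(value) if isinstance(value, str) else value
    return json.dumps(record)
```

A pattern this broad will also mask other long digit runs, so it is better treated as a safety net behind field-level rules than as the only control.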

9) Anti-bot, fraud and defensive scenarios

Bot management: signatures/behavior, device-fingerprint, dynamic challenges.
Rate limits/quotas: per user/tenant/IP; adaptive when anomalies are detected.
RASP sensors on critical endpoints (webhook signature bypass attempts, clock drift, replayed deliveries).
Fraud signals: correlation by channels (logins, payments, KYC), auto-escalation.
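
Per-user/tenant/IP rate limits are often implemented as token buckets. The sketch below is one such illustration, not the method this document prescribes: the class, the in-memory `buckets` map and the default rates are assumptions, and a production limiter would typically live at the gateway with shared state.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets = {}  # one bucket per key, e.g. "user:42", "tenant:eu-1", "ip:203.0.113.7"


def allow_request(key: str, rate: float = 5.0, capacity: float = 10.0,
                  now: Optional[float] = None) -> bool:
    """Look up (or create) the bucket for this key and try to take one token."""
    bucket = buckets.get(key)
    if bucket is None:
        bucket = buckets[key] = TokenBucket(rate, capacity, now)
    return bucket.allow(now)
```

Making the limit adaptive then amounts to lowering `rate` or `capacity` for keys that trip anomaly signals.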

10) Protection, DR and BCP

RTO/RPO targets are defined and tested (for example, RTO ≤ 1 hour, RPO ≤ 5 minutes for payment databases).
Backups: encrypted, periodically copied to offline storage; regular restore tests.
Geo-redundancy: active-passive/active-active across regions; DNS failover with TTL control.
Catalog of critical dependencies (PSP/KYC/game aggregators) and failover plans.
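
Whether a restore test met its RTO/RPO targets can be computed directly from three timestamps. This is a minimal sketch with hypothetical function names; the thresholds in the usage note come from the example targets above.

```python
from datetime import datetime, timedelta, timezone


def rpo_met(last_backup: datetime, failure: datetime, rpo: timedelta) -> bool:
    """Data written after the last good backup is lost, so that gap must fit in the RPO."""
    return failure - last_backup <= rpo


def rto_met(failure: datetime, restored: datetime, rto: timedelta) -> bool:
    """Time from failure to restored service must fit in the RTO."""
    return restored - failure <= rto
```

For the payment database example, a drill passes when `rpo_met(..., timedelta(minutes=5))` and `rto_met(..., timedelta(hours=1))` both return True; recording these results gives the restore-test evidence that audits ask for.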

11) Incidents and response

Runbooks: for provider outages, latency growth, token compromise, DDoS.
On-call: 24/7 with rotations and blast pages; joint "war-room" drills.
Communications: message templates for customers/partners and regulators.
Post-mortem (blameless): actions to prevent repetition, updating policies/playbooks.

12) Compliance and privacy

GDPR: data minimization, consent registers, rights to erasure and portability; DPIA for new providers.
PCI DSS: PAN tokenization/isolation zones, network segments, strict access logs.
Local requirements (market jurisdictions): data storage in the region, reporting, update windows.
Data lineage: where and how PII/PAN flow; diagrams and DPIAs in the DevPortal.

13) Audit: Types, Artifacts and Cycle

Audit types:

Internal (quarterly): compliance with policies, control of changes, accesses, secrets, logs, pipelines.
External (annually/by requirements): PCI/GDPR/local regulators, pen tests, SOC reports of providers.

Key artifacts (what to prepare in advance):

Security policies, IAM role matrix, exception list with expiration dates.
Infrastructure change logs (IaC), CI/CD reports (SBOM, signatures, tests).
Register of providers (PSP/KYC/games), DPIA/vendor-risk assessments, contracts and SLAs.
Production access logs, secret rotation results, SIEM/SOAR reports.
DR/BCP plans and records of recent restore tests.

Audit approach:

"Evidence-first": each practice is backed by a verifiable artifact.
"No humans in prod": as much as possible goes through pipelines and approved requests; all sessions are logged.
"Trace everything": map changes to incidents/metrics.
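
Mapping changes to incidents can start as a simple time-window lookup over the change log. This sketch assumes each change is a dict with hypothetical `id` and `deployed_at` fields; proximity in time is only a hint for the investigation, not proof of cause.

```python
from datetime import datetime, timedelta


def changes_before_incident(changes: list, incident_start: datetime,
                            window: timedelta = timedelta(hours=2)) -> list:
    """Return ids of changes deployed within `window` before the incident, newest first."""
    candidates = [
        c for c in changes
        if incident_start - window <= c["deployed_at"] <= incident_start
    ]
    candidates.sort(key=lambda c: c["deployed_at"], reverse=True)
    return [c["id"] for c in candidates]
```

The same lookup works in the other direction for audits: given a release, list the incidents that started shortly after it.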

14) Guardrails-as-Code: Examples

Conftest for Terraform (public database ban):

```rego
package terraform.deny

# Deny RDS instances that are publicly accessible.
deny[msg] {
  input.resource.type == "aws_db_instance"
  input.resource.publicly_accessible == true
  msg := "RDS must not be public"
}
```

AdmissionPolicy (K8s): require security labels and resource limits

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: enforce-security-labels-and-limits
spec:
  rules:
    - name: require-labels
      match: {resources: {kinds: ["Deployment", "StatefulSet"]}}
      validate:
        message: "security labels required"
        pattern:
          metadata:
            labels:
              # "?*" matches any non-empty value
              security.tier: "?*"
              data.classification: "?*"
    - name: require-limits
      match: {resources: {kinds: ["Deployment", "StatefulSet"]}}
      validate:
        message: "resources limits/requests required"
        pattern:
          spec:
            template:
              spec:
                containers:
                  - resources:
                      limits:
                        cpu: "?*"
                        memory: "?*"
                      requests:
                        cpu: "?*"
                        memory: "?*"
```

15) Daily hygiene checklist

WAF/bot policies are active and signatures are up to date; anti-DDoS is in always-on mode.
Admission controllers in the cluster run in enforce mode, not audit.
All production images are signed; the SBOM is available and tied to the release.
Critical/high vulnerabilities: none open, or covered by exceptions with an expiry date.
Rotation of secrets/certificates is on schedule, with no delays.
SIEM correlates IAM/login/release/change events; SOAR playbooks are tested.
Backups have completed and restore tests are on schedule; the DR plan is up to date.
Access to prod is only via SSO + MFA/PAM; all sessions are recorded.
"No PII in logs" is validated by scanners; masking is enabled.
Release gates and observability are maintained "as code".

16) Maturity model (brief)

1. Basic - manual changes, single perimeter, partial monitoring.
2. Advanced - segmentation, IAM/RBAC, signed artifacts, WAF/DDoS, SIEM, regular patches.
3. Expert - Zero Trust, guardrails-as-code, SLSA attestation, runtime protection, SOAR automation, "no humans in prod", continuous audit.

17) Implementation Roadmap

M0-M1 (MVP): network segmentation, WAF/DDoS, SSO + MFA, KMS, basic admission policies, standardized logs/metrics/traces, SIEM.
M2-M3: image signatures and admission verification, SBOM, Conftest/OPA on IaC, PAM, rotation plan, regular patches, first DR tests.
M4-M6: SOAR playbooks, eBPF/runtime detection, EASM, compliance package (PCI/GDPR), full set of audit artifacts, ring-DR by region.
M6+: Zero Trust network (mTLS everywhere), CIEM, automated audit trail reports, continuous "purple-team" testing.

Summary

A strong prod is not a set of "ironclad" rules but a system: segmentation, strict identities and secrets, secure delivery, managed containers, observability, and automated response. Add verifiability (audit artifacts, SBOM/signatures, logs), and the production environment becomes predictable, manageable, and ready for external audits, without compromising release speed or business SLOs.
