
Configuration auditing

1) Purpose and value

Configuration auditing provides provable accountability and repeatability of change: who changed what and when, why the change was justified, how it was tested, and how to roll it back. This reduces the risk of incidents, secret leaks, compliance gaps, and "hidden" edits in production.

Key results:

A single source of truth (SoT) for configs.
End-to-end traceability of every change.
Predictable releases and fast rollback.
Compliance with security and regulatory policies.

2) Scope

Infrastructure: Terraform/Helm/Ansible/K8s manifests, network ACLs/WAF/CDN.
Application configs: YAML/JSON/`.properties` files, feature flags, limits/quotas.
Secrets and keys: Vault/KMS, certificates, tokens, passwords.
Data pipelines: schemas, transformations, ETL/stream schedules.
Integrations: PSP/KYC and other providers, webhooks, retry/timeout policies.
Observability: alert rules, dashboards, SLOs/SLAs.

3) Principles

Config as Data: declarative, versioned, testable artifacts.
Immutability and idempotency: the environment can be reproduced from code.
Schemas and contracts: strict validation (JSON Schema/Protobuf), backward/forward compatibility.
Minimal manual edits: changes go only through MR/PR.
Separation of duties (SoD) and 4-eyes: author != deployer; review is mandatory.
Attribution and signatures: signed commits/releases, artifact attestations.

4) Audit architecture

1. SCM (Git) as SoT: all configs live in the repository; the `main` branch is protected.
2. Registries:
   Config Registry (catalog of configs, owners, SLAs, environments),
   Schema Registry (config/event schema versions),
   Policy Engine (OPA/Conftest): the set of checks.
3. CI/CD gates: format/schema → static checks → policy checks → secret scan → dry-run → change plan.
4. Delivery: GitOps (e.g. ArgoCD/Flux) with drift detection and sync audit logs.
5. Evidence Store: a repository of audit artifacts (plans, logs, signatures, builds, SBOMs).
6. Action log: an immutable (append-only) log of `CREATE/APPROVE/APPLY/ROLLBACK/ACCESS` events.
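One common way to make an append-only action log tamper-evident is to hash-chain its entries, so that each entry commits to the hash of the previous one. The Python sketch below is a minimal, in-memory illustration; the field names (`actor`, `action`, `change_id`, `ts`, `prev_hash`) and the `ActionLog` class are assumptions for illustration, and a real log would sit on append-only/WORM storage rather than in a Python list.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Event types named in the action-log description.
ACTIONS = {"CREATE", "APPROVE", "APPLY", "ROLLBACK", "ACCESS"}
GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) keeps the hash deterministic.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class ActionLog:
    """Hypothetical in-memory append-only log; each entry commits to its predecessor."""
    entries: list = field(default_factory=list)

    def append(self, actor: str, action: str, change_id: str, ts: str) -> dict:
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"actor": actor, "action": action, "change_id": change_id,
                "ts": ts, "prev_hash": prev_hash}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; an edited, reordered or removed (non-final) entry breaks it."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

Dropping entries from the tail is not detectable from the chain alone, so the latest hash would also need to be anchored somewhere independent, for example in the Evidence Store or a signed release record.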

5) Audit data model (minimum)

Entities: `ConfigItem(id, env, service, owner, schema_version, sensitivity)`

Events: `change_id, actor, action, ts, diff_hash, reason, approvals[]`

Artifacts: `plan_url, test_report_url, policy_report, signature, release_tag`

Links: RFC/ticket ↔ PR ↔ deploy (commit SHA) ↔ release record ↔ SLO monitoring.
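The minimal model above maps naturally onto immutable records. A Python sketch using frozen dataclasses; the field names come from the model, while the example values in comments, the `Evidence.missing()` helper and the use of SHA-256 for `diff_hash` are illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ConfigItem:
    id: str
    env: str             # e.g. "prod", "stage"
    service: str
    owner: str
    schema_version: str
    sensitivity: str     # e.g. "low" / "high" (illustrative values)


@dataclass(frozen=True)
class ChangeEvent:
    change_id: str       # ties together RFC/ticket, PR and deploy
    actor: str
    action: str          # CREATE / APPROVE / APPLY / ROLLBACK / ACCESS
    ts: str              # ISO-8601 timestamp
    diff_hash: str       # SHA-256 of the applied diff (assumed hash choice)
    reason: str
    approvals: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Evidence:
    plan_url: str
    test_report_url: str
    policy_report: str
    signature: str
    release_tag: str

    def missing(self) -> list:
        """Names of empty evidence fields; a deploy should not proceed if any exist."""
        return [name for name, value in vars(self).items() if not value]


def diff_hash(diff_text: str) -> str:
    return hashlib.sha256(diff_text.encode("utf-8")).hexdigest()
```

Freezing the records means an event cannot be mutated after it is written; corrections are recorded as new events instead.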

6) Change process (end-to-end)

1. RFC/ticket → goal, risk, backout plan.
2. PR/MR → linting, schema validation, policy checks, secret scan.
3. Plan/preview → dry-run/plan, resource diff, cost/impact estimate.
4. Approve (4-eyes/SoD, CAB label for high-risk changes).
5. Deploy (per release window/calendar) → GitOps applies the change; drift alerts are enabled.
6. Verify → smoke tests/SLO guardrails, confirmation of the result.
7. Archive evidence → evidence store; update the config catalog.
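The ordering of the steps above can be enforced mechanically, for example by a pipeline check that refuses to run a step until the previous ones are recorded. A minimal Python sketch, assuming each change carries the list of stages it has completed; the stage names are hypothetical labels for steps 1-7, and backout/rollback branches are deliberately left out.

```python
from typing import List, Optional

# Hypothetical labels for steps 1-7 of the change process.
STAGES = ["RFC", "PR", "PLAN", "APPROVE", "DEPLOY", "VERIFY", "ARCHIVE"]


def next_stage(history: List[str]) -> Optional[str]:
    """Return the next required stage, or None when the change is fully closed.

    Raises ValueError if the history skips a stage or records one out of order.
    """
    for i, stage in enumerate(history):
        if i >= len(STAGES):
            raise ValueError(f"unexpected extra stage: {stage}")
        if stage != STAGES[i]:
            raise ValueError(f"expected {STAGES[i]} at step {i + 1}, got {stage}")
    return STAGES[len(history)] if len(history) < len(STAGES) else None
```

A deploy job could, for instance, call `next_stage` and proceed only when it returns `"DEPLOY"`.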

7) Policies and rules (examples)

SoD: the PR author cannot push their own change to prod.
Time window: no production changes during a change freeze.
Scope: changing sensitive keys requires 2 approvals from Security/Compliance.
Secrets: must not be stored in the repo; only Vault path and access-role references are allowed.
Network: ingress from `0.0.0.0/0` is not allowed without a temporary exception with a TTL.
Alerts: downgrading the severity of a P1 alert requires CAB approval.
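In practice these rules would live in the policy engine (OPA/Conftest), but their logic is easy to see in plain code. The Python sketch below evaluates the rules listed above against one change request; the shape of the `change` dict (`author`, `merger`, `approvals`, `ingress_cidrs`, `exception_expires_at`, `alert_severity_from/to`, `cab_approved`) and the freeze-window list are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple

SECURITY_TEAMS = {"security", "compliance"}


def evaluate_policies(change: dict,
                      freeze_windows: Iterable[Tuple[datetime, datetime]] = (),
                      now: Optional[datetime] = None) -> List[str]:
    """Return the violated rules for one change request (all dict fields are hypothetical)."""
    now = now or datetime.now(timezone.utc)
    violations = []
    prod = change.get("env") == "prod"
    # SoD: the author must not be the one who merges/deploys to prod.
    if prod and change.get("author") == change.get("merger"):
        violations.append("SoD: PR author cannot push to prod")
    # Time window: [start, end) intervals during which prod is frozen.
    if prod and any(start <= now < end for start, end in freeze_windows):
        violations.append("Time window: prod change during a freeze")
    # Scope: two distinct Security/Compliance approvers for sensitive keys.
    if change.get("touches_sensitive_keys"):
        reviewers = {a["user"] for a in change.get("approvals", [])
                     if a["team"] in SECURITY_TEAMS}
        if len(reviewers) < 2:
            violations.append("Scope: sensitive keys need 2 Security/Compliance approvals")
    # Network: 0.0.0.0/0 only with an exception that has not expired yet.
    if "0.0.0.0/0" in change.get("ingress_cidrs", []):
        expires = change.get("exception_expires_at")
        if expires is None or expires <= now:
            violations.append("Network: 0.0.0.0/0 ingress without a valid exception TTL")
    # Alerts: P1 downgrades need CAB.
    if (change.get("alert_severity_from") == "P1"
            and change.get("alert_severity_to") != "P1"
            and not change.get("cab_approved")):
        violations.append("Alerts: P1 downgrade requires CAB")
    return violations
```

Returning every violation, rather than stopping at the first, gives the PR author the full list in a single CI run.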

8) Secret control

Vault/KMS storage, short TTLs, automatic rotation.
Secret scanning in CI (known key patterns, high-entropy strings).
Isolation of secrets by environment/role; least privilege.
Encryption in transit and at rest; access-restricted audit logs of secret access.
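Secret scanners typically combine known key patterns with a high-entropy check. A minimal Python sketch of both ideas: the single AWS-style access key pattern, the token regex and the 4.0 bits-per-character threshold are illustrative assumptions. Dedicated tools such as gitleaks or trufflehog ship much larger, tuned rule sets.

```python
import math
import re
from collections import Counter

# Known-format pattern: AWS access key IDs are "AKIA" + 16 uppercase letters/digits.
KEY_PATTERNS = {"aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b")}
# Candidate tokens for the entropy check: long runs of base64/hex-like characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")
ENTROPY_THRESHOLD = 4.0  # bits per character; assumed value, tune per codebase


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution of s, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def scan_line(line: str) -> list:
    """Return (rule_name, match) pairs for one line of a config file."""
    findings = []
    for name, pattern in KEY_PATTERNS.items():
        findings.extend((name, m.group()) for m in pattern.finditer(line))
    for m in TOKEN_RE.finditer(line):
        if shannon_entropy(m.group()) >= ENTROPY_THRESHOLD:
            findings.append(("high_entropy", m.group()))
    return findings
```

The entropy threshold trades recall for noise: long but repetitive identifiers score low, while random tokens approach log2 of their alphabet size.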

9) Tools (interchangeable options)

Lint/Schema: `yamllint`, `jsonschema`, `ajv`, `cue`.
Policy: OPA/Conftest, Checkov/tfsec/kube-policies.
GitOps: ArgoCD/Flux (drift detection, audit, RBAC).
Secrets: HashiCorp Vault, cloud KMS, cert managers.
Scanners: trufflehog, gitleaks (secrets); OPA/Regula (rules).
Reporting: export logs to DWH/BI; link them to the incident and change management systems.

10) Examples of rules and artifacts

JSON-Schema for Limit Configuration

json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "limits",
  "type": "object",
  "required": ["service", "region", "rate_limit_qps"],
  "properties": {
    "service": {"type": "string", "pattern": "^[a-z0-9-]+$"},
    "region": {"type": "string", "enum": ["eu", "us", "latam", "apac"]},
    "rate_limit_qps": {"type": "integer", "minimum": 1, "maximum": 5000},
    "timeouts_ms": {"type": "integer", "minimum": 50, "maximum": 10000}
  },
  "additionalProperties": false
}
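In CI this schema would normally be checked by a standard validator (for example `jsonschema` in Python or `ajv`, as listed under Tools). The dependency-free Python sketch below only spells out what this particular schema enforces, so the rules are visible in code; `validate_limits` is a hypothetical helper, and it ignores JSON Schema corner cases such as `5.0` counting as an integer.

```python
import re

REQUIRED = ("service", "region", "rate_limit_qps")
ALLOWED = {"service", "region", "rate_limit_qps", "timeouts_ms"}  # additionalProperties: false
REGIONS = {"eu", "us", "latam", "apac"}
SERVICE_RE = re.compile(r"[a-z0-9-]+")  # used with fullmatch, i.e. ^[a-z0-9-]+$


def _int_in_range(value, lo, hi):
    # Schema "integer": Python bools are ints, so exclude them explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and lo <= value <= hi


def validate_limits(cfg: dict) -> list:
    """Return human-readable errors; an empty list means the config is valid."""
    errors = [f"missing required field: {k}" for k in REQUIRED if k not in cfg]
    errors += [f"unexpected field: {k}" for k in sorted(set(cfg) - ALLOWED)]
    if "service" in cfg and not (isinstance(cfg["service"], str)
                                 and SERVICE_RE.fullmatch(cfg["service"])):
        errors.append("service must match ^[a-z0-9-]+$")
    if "region" in cfg and (not isinstance(cfg["region"], str)
                            or cfg["region"] not in REGIONS):
        errors.append(f"region must be one of {sorted(REGIONS)}")
    if "rate_limit_qps" in cfg and not _int_in_range(cfg["rate_limit_qps"], 1, 5000):
        errors.append("rate_limit_qps must be an integer in [1, 5000]")
    if "timeouts_ms" in cfg and not _int_in_range(cfg["timeouts_ms"], 50, 10000):
        errors.append("timeouts_ms must be an integer in [50, 10000]")
    return errors
```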

Conftest/OPA (rego) - deny `0.0.0.0/0` in ingress

rego
package policy.network

import rego.v1

deny contains msg if {
	input.kind == "IngressRule"
	input.cidr == "0.0.0.0/0"
	msg := "Ingress 0.0.0.0/0 is not allowed. Specify explicit CIDRs or request a temporary exception with a TTL."
}

Conftest/OPA - SoD

rego
package policy.sod

import rego.v1

deny contains msg if {
	input.env == "prod"
	input.pr.author == input.pr.merger
	msg := "SoD: the PR author cannot push the change to prod."
}

SQL (DWH) - who downgraded P1 alerts this month

sql
SELECT actor, COUNT(*) AS cnt
FROM audit_events
WHERE action = 'ALERT_SEVERITY_CHANGED'
AND old_value = 'P1' AND new_value IN ('P2','P3')
AND ts >= date_trunc('month', now())
GROUP BY 1
ORDER BY cnt DESC;

Git commit message example (required fields)


feat(config/payments): raise PSP_B timeout to 800ms in EU

RFC: OPS-3421
Risk: Medium (PSP_B only, EU region)
Backout: revert PR + restore timeout=500ms
Tests: schema ok, conftest ok, e2e ok
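The required fields in this example can be enforced with a `commit-msg` hook, which Git invokes with the path of the commit message file and which aborts the commit on a non-zero exit. A minimal Python sketch; the subject regex and the `PROJECT-123` ticket format are assumptions based on the example above.

```python
import re
import sys

REQUIRED_FIELDS = ("RFC", "Risk", "Backout", "Tests")
SUBJECT_RE = re.compile(r"^[a-z]+\([a-z0-9/_-]+\): .+")  # e.g. feat(config/payments): ...
TICKET_RE = re.compile(r"[A-Z]+-\d+")                     # e.g. OPS-3421 (assumed format)


def check_commit_message(message: str) -> list:
    """Return a list of problems; an empty list means the message is acceptable."""
    lines = message.strip().splitlines()
    problems = []
    if not lines or not SUBJECT_RE.match(lines[0]):
        problems.append("subject must look like 'type(scope): summary'")
    fields = {}
    for line in lines[1:]:
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    problems += [f"missing required field: {name}"
                 for name in REQUIRED_FIELDS if name not in fields]
    if "RFC" in fields and not TICKET_RE.fullmatch(fields["RFC"]):
        problems.append("RFC must be a ticket key such as OPS-3421")
    return problems


# Runs only when invoked as a hook, i.e. when Git passes the message file path.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        issues = check_commit_message(f.read())
    for issue in issues:
        print(issue, file=sys.stderr)
    sys.exit(1 if issues else 0)
```

Local hooks can be bypassed with `--no-verify`, so the same check is worth repeating as a CI gate on the PR.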

11) Monitoring and alerting

Drift detection: live config in the cluster ≠ Git → P1/P2 signal plus auto-remediation (reconcile).
High-risk changes: changes to networks/secrets/policies → notification in #security-ops.
Missing evidence: a deploy without plan/signature/reports → block or alert.
Expiring assets: certificate/key expiry dates → proactive alerts.
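At its core, drift detection is a key-by-key comparison of the desired state from Git with the live state. A minimal Python sketch, assuming both sides have already been fetched and parsed into nested dicts (GitOps tools such as ArgoCD/Flux do this natively); the mapping of `network`/`secrets` paths to P1 is an illustrative assumption.

```python
MISSING = object()  # sentinel: key absent on one side


def flatten(d: dict, prefix: str = "") -> dict:
    """Turn nested dicts into {"a.b.c": value} for key-by-key comparison."""
    out = {}
    for key, value in d.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out


def detect_drift(desired: dict, live: dict) -> dict:
    """Return {path: (desired_value, live_value)} for every key that differs."""
    want, have = flatten(desired), flatten(live)
    drift = {}
    for path in sorted(want.keys() | have.keys()):
        a, b = want.get(path, MISSING), have.get(path, MISSING)
        if a != b:
            drift[path] = (a, b)
    return drift


def drift_severity(drift: dict, high_risk_prefixes=("network", "secrets")):
    """P1 if any drifted path is under a high-risk prefix, P2 otherwise, None if clean."""
    if not drift:
        return None
    for path in drift:
        if any(path == p or path.startswith(p + ".") for p in high_risk_prefixes):
            return "P1"
    return "P2"
```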

12) Metrics and KPIs

Audit Coverage% - the share of configs covered by schemas/policies/scanners.
Drift MTTR - mean time to resolve drift.
Policy Compliance% - the share of PRs that pass policy checks.
Secrets Leak MTTR - time from a leak to revocation/rotation.
Backout Rate - the share of config changes that were rolled back.
Mean Change Size - average diff in lines/resources (smaller is better).
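Several of these KPIs reduce to simple aggregations over audit data. A Python sketch under assumed input shapes (the dict keys and the `(detected_at, resolved_at)` pairs are hypothetical); "covered" is taken here to mean that a config has a schema, a policy and a scanner, which is one possible reading of Audit Coverage%.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Tuple


def audit_coverage_pct(configs: List[dict]) -> float:
    """configs: dicts with boolean 'has_schema', 'has_policy', 'scanned' (assumed keys)."""
    covered = sum(1 for c in configs
                  if c["has_schema"] and c["has_policy"] and c["scanned"])
    return 100.0 * covered / len(configs) if configs else 0.0


def drift_mttr(drifts: List[Tuple[datetime, datetime]]) -> timedelta:
    """Mean of (resolved_at - detected_at) over resolved drift events."""
    durations = [resolved - detected for detected, resolved in drifts]
    return sum(durations, timedelta()) / len(durations) if durations else timedelta()


def backout_rate(changes: List[dict]) -> float:
    """Share of changes with 'rolled_back' set (assumed key)."""
    return sum(1 for c in changes if c["rolled_back"]) / len(changes) if changes else 0.0


def mean_change_size(changes: List[dict]) -> float:
    """Average 'lines_changed' per change (assumed key)."""
    return mean(c["lines_changed"] for c in changes) if changes else 0.0
```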

13) Reporting and Compliance

Audit trails: retain for at least 1-3 years (per applicable requirements) in immutable storage.
Regulatory: ISO 27001/27701, SOX-style SoD, GDPR (PII), industry requirements (iGaming: tracking changes to GGR/NGR calculations, limits, and bonus rules).
Monthly reports: top changes, policy violations, drift, expiring certificates, rotation status.

14) Playbooks

A. Drift detected in prod

1. Block auto-deploy for the affected service.
2. Take a snapshot of the current state.
3. Compare with Git; trigger `reconcile` or roll back.
4. Open a P2 incident and identify the drift source (manual kubectl/console change).
5. Harden against direct changes (PSP/ABAC) and notify the owners.

B. PSP certificate expired

1. Switch to the backup route/PSP; lower timeouts/retries.
2. Issue a new certificate through the PKI process and update the config through Git.
3. Run smoke tests, restore traffic, close the incident, hold a post-mortem.

C. Secret leaked into a PR

1. Revoke the key/token and rotate it.
2. Rewrite Git history and purge the artifact from caches; write an RCA.
3. Add a rule to the secret scanner and train the team.

15) Anti-patterns

Manual edits in prod without a trace or rollback path.
Configs without schemas or validation.
Secrets in Git/CI variables instead of KMS/Vault.
Monorepos where access amounts to global "superuser" rights.
"Silent" GitOps without drift alerts or sync logs.
Huge "all at once" PRs: unclear attribution and high risk.

16) Checklists

Before merge

Schema validation and linters passed
OPA/Conftest policies are green
Secret scan is clean
Plan/diff attached, risk assessed, backout ready
2 approvals (prod) and SoD satisfied

Before deploy

Release window and calendar checked
Drift monitoring is active
SLO guardrails configured, smoke tests ready

Monthly

Scheduled rotation of keys/certificates
Inventory of owners and permissions
Review of OPA rules and exceptions (TTL)
Fire drill

17) Design tips

Split changes into small diffs; one PR, one goal.
Mandatory PR/commit templates with RFC/risk/rollback.
For dynamic configs, use "config centers" with audit and rollback.
Version schemas; prohibit breaking changes without migrations.
Visualize the "config map": what lives where and who owns it.

18) Integration with Change and Incident Management

PR ↔ RFC ↔ release calendar ↔ incidents/post-mortems.
Automatically correlate metrics (SLO/business) with config releases.
Automatically create tasks to remove stale flags and expired exceptions (TTL).
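TTL-based cleanup can be a small scheduled job that turns expired items into tickets. A Python sketch; the `TemporaryItem` record and the task dict are hypothetical shapes, and the call to an actual ticketing system is omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class TemporaryItem:
    kind: str            # e.g. "feature_flag" or "policy_exception" (illustrative)
    name: str
    owner: str
    expires_at: datetime


def cleanup_tasks(items: List[TemporaryItem], now: Optional[datetime] = None) -> List[dict]:
    """Build one cleanup task per expired item, oldest expiry first."""
    now = now or datetime.now(timezone.utc)
    tasks = []
    for item in sorted(items, key=lambda i: i.expires_at):
        if item.expires_at <= now:
            tasks.append({
                "title": f"Remove expired {item.kind}: {item.name}",
                "assignee": item.owner,
                "overdue_days": (now - item.expires_at).days,
            })
    return tasks
```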

19) The bottom line

Configuration auditing is not "paperwork" but an operational reliability mechanism: configs are data, changes are controlled and traceable, secrets are locked down, and the full change history is transparent and verifiable. This is how a stable, compliant, and predictable platform is built.
