Secret Management
1) Why it matters and what exactly counts as a "secret"
A secret is any material whose disclosure leads to compromise of the system or its data: passwords, API tokens, OAuth/JWT private keys, SSH keys, certificates, encryption keys (KEK/DEK), webhook signing keys, database/cache DSNs, vendor keys (payment, mail/SMS providers), cookie salts/peppers, bot/chat tokens, licenses.
Secrets end up in code, configs, environment variables, container images, CI/CD, Terraform/Ansible, and logs/dumps. Secret management therefore covers the whole chain: inventory → storage → delivery → use → rotation → response → audit → disposal.
2) Architecture principles
Centralization. One trusted layer (Vault/Cloud Secret Manager/KMS) for storage, issuance, and auditing.
Least privilege (PoLP). Access is granted only to the services/roles that need it, for the minimum period.
Short lifetime. Dynamic/temporary secrets with a TTL/lease are preferred.
Crypto-agility. Ability to change algorithms/key lengths without downtime.
Separation of secrets from code/images. No passwords in repositories or Docker images.
Observability and audit. Every issuance/read of a secret is logged and traceable.
Automatic rotation. Rotation is a pipeline process, not a manual action.
3) Typical solutions and component roles
KMS/HSM. Root of trust; encryption and key-wrapping operations (envelope encryption).
Secret Manager/Vault. Versioned secret store, ACLs, audit, dynamic secrets (DB, cloud IAM, PKI), rotation templates.
PKI/CA. Issues short-lived certificates and signing keys for mTLS/SSH/JWT.
Agent/sidecar. Delivery of secrets to runtime (tmpfs, in-memory k/v, hot-reload files).
CSI drivers/operators. Integration with Kubernetes (Secrets Store CSI Driver, cert-manager).
Encryption layer in Git. SOPS/age, git-crypt (for infrastructure code).
4) Classification and policy
Classify secrets by criticality (P0/P1/P2) and blast radius (tenant-scoped, environment-scoped, org-wide). For each class, specify:
- TTL/lease and rotation frequency;
- issuance method (dynamic vs. static), format, medium;
- access policy (who/where/when/why), mTLS and mutual authentication requirements;
- audit (what we log, how long we retain it, who reviews it);
- break-glass and revocation procedures.
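One way to make such a policy machine-checkable is a small table keyed by class. The sketch below is illustrative only: the class names P0/P1/P2 come from the text, while the TTL and rotation values, the metadata field names, and the `check_secret` helper are assumptions rather than any product's API.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ClassPolicy:
    max_ttl: timedelta          # upper bound for TTL/lease
    rotation_every: timedelta   # rotation frequency for static secrets
    dynamic_required: bool      # must the secret be issued dynamically?


# Hypothetical values; choose real ones from your own risk assessment.
POLICIES = {
    "P0": ClassPolicy(timedelta(hours=1), timedelta(days=30), True),
    "P1": ClassPolicy(timedelta(days=1), timedelta(days=90), False),
    "P2": ClassPolicy(timedelta(days=7), timedelta(days=180), False),
}

REQUIRED_METADATA = ("owner", "scope", "criticality")


def check_secret(meta: dict, ttl: timedelta, dynamic: bool) -> list[str]:
    """Return a list of policy violations for one secret (empty means OK)."""
    problems = [f"missing metadata: {key}" for key in REQUIRED_METADATA if not meta.get(key)]
    policy = POLICIES.get(meta.get("criticality", ""))
    if policy is None:
        problems.append("unknown criticality class")
        return problems
    if ttl > policy.max_ttl:
        problems.append("TTL exceeds class maximum")
    if policy.dynamic_required and not dynamic:
        problems.append("class requires a dynamic secret")
    return problems
```

A check like this could run in CI against the secret inventory, so that a secret without an owner or with an over-long TTL fails the build instead of surfacing in an audit.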
5) Secret life cycle
1. Creation: through the Secret Manager API with metadata (owner, tags, scope).
2. Storage: encrypted (envelope: DEK wrapped with KEK from KMS/HSM).
3. Delivery: at the request of an authorized entity (OIDC/JWT, SPIFFE/SVID, mTLS).
4. Usage: in memory or tmpfs only; never in logs or dumps.
5. Rotation: by TTL or on an event (compromise), with support for parallel versions (N and N-1).
6. Revocation/blocking: immediate lease expiration, account/key disablement in the target system.
7. Disposal: destruction of versions/material, with a clear audit trail.
6) Dynamic secrets (recommended by default)
The idea: the secret is issued for a short time and expires automatically. Examples:
- Database credentials (Postgres/MySQL) with a TTL of 15-60 minutes.
- Temporary cloud keys (AWS/GCP/Azure) issued per service role.
- SSH certificates (5-30 minutes), X.509 certificates (hours/days).
- Short-lived JWTs for signing requests; session tickets for brokers.
Pros: minimal blast radius and simpler revocation (nothing is left lingering in the wild).
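The lease mechanics can be illustrated with a toy in-memory issuer. This is a sketch under stated assumptions, not how Vault or any cloud manager is implemented: the `DynamicCredentialIssuer` class and its methods are hypothetical, and a real issuer would also create and drop the account in the target system.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class Lease:
    lease_id: str
    username: str
    password: str
    expires_at: float


class DynamicCredentialIssuer:
    """Toy in-memory issuer; a real one would create the user in the target database."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self.leases: dict[str, Lease] = {}

    def issue(self, role: str) -> Lease:
        lease = Lease(
            lease_id=secrets.token_hex(8),
            username=f"{role}-{secrets.token_hex(4)}",  # unique user per lease
            password=secrets.token_urlsafe(24),
            expires_at=self.clock() + self.ttl_seconds,
        )
        self.leases[lease.lease_id] = lease
        return lease

    def is_valid(self, lease_id: str) -> bool:
        lease = self.leases.get(lease_id)
        return lease is not None and self.clock() < lease.expires_at

    def revoke(self, lease_id: str) -> None:
        # Revocation only drops the lease; no long-lived credential remains.
        self.leases.pop(lease_id, None)
```

Because every lease gets its own username, audit records can be tied to a single issuance, and revoking one lease does not affect other consumers of the same role.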
7) Delivery of secrets in runtime
Kubernetes:
- Secrets Store CSI Driver → mounts secrets from an external manager into the pod as files (tmpfs).
- Avoid Kubernetes Secret as the only source (base64 ≠ encryption); if you must use it, enable the KMS provider for etcd.
- Sidecar agent (Vault Agent/Secrets Store) with automatic lease renewal and hot-reload.
VM/bare metal: system agent + mTLS to Vault/Secret Manager, in-memory cache, minimal TCB.
Serverless: cloud integration that transparently injects secrets as environment variables/files; avoid long-lived envvars and prefer files/in-memory delivery.
Example (Kubernetes + CSI, conceptual)
```yaml
apiVersion: v1
kind: Pod
metadata: { name: app }
spec:
  serviceAccountName: app-sa  # is associated with a role in Secret Manager
  volumes:
    - name: secrets
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: app-spc
  containers:
    - name: app
      volumeMounts:
        - mountPath: /run/secrets
          name: secrets
          readOnly: true
```
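On the application side, file-based delivery only helps if the process notices when the mounted file changes. The following is a minimal sketch of such a hot-reload reader; the `FileSecret` class and the file name `db-password` are hypothetical, and only the `/run/secrets` mount path comes from the example above.

```python
from pathlib import Path


class FileSecret:
    """Reads a secret from a mounted file and re-reads it when the file changes."""

    def __init__(self, path: str):
        self.path = Path(path)
        self._stamp = None   # (mtime, size, inode) of the last version read
        self._value = None

    def get(self) -> str:
        stat = self.path.stat()  # follows symlinks, so atomic swaps are visible
        stamp = (stat.st_mtime_ns, stat.st_size, stat.st_ino)
        if stamp != self._stamp:
            self._value = self.path.read_text(encoding="utf-8").strip()
            self._stamp = stamp
        return self._value


# Usage (path is an assumption): call get() whenever a new connection is opened.
# db_password = FileSecret("/run/secrets/db-password")
# connect(password=db_password.get())
```

Calling `get()` at connection time rather than caching the value at startup is what lets a rotated secret take effect without a restart.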
8) CI/CD and IaC integrations
CI: workers obtain short-lived tokens via OIDC (Workload Identity). Do not rely on "masked" secrets, which can still end up in logs; add a leak-scan step (trufflehog/gitleaks).
CD: the deploy step fetches secrets at deploy time and does not write them into artifacts.
IaC: Terraform reads sensitive variables from Secret Manager; state is encrypted and access to it is restricted.
SOPS/age: store manifests in the repository encrypted, with the keys under KMS control.
Example (SOPS fragment)
```yaml
apiVersion: v1
kind: Secret
metadata: { name: app }
data:
  PASSWORD: ENC[AES256_GCM,data:...,sops:...]
sops:
  kms:
    - arn: arn:aws:kms:...
  encrypted_regex: '^(data|stringData)$'
  version: '3.8.0'
```
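To show what a leak-scan step in CI does in principle, here is a deliberately simplified pattern scanner. It is a stand-in for illustration and not how trufflehog or gitleaks work internally: the three rules, the `scan_text` function, and the choice to skip lines containing `ENC[` (treating them as SOPS-encrypted) are all assumptions.

```python
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password-assignment": re.compile(r"(?i)\bpassword\s*[:=]\s*['\"]?[^\s'\"]{8,}"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every suspicious line."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if "ENC[" in line:
            continue  # assumption: SOPS-encrypted values are safe to commit
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings
```

In a pipeline, a non-empty result would fail the job before the commit reaches a shared branch.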
9) Access policies and workload authentication
Workload identity: SPIFFE/SPIRE, Kubernetes SA → OIDC → IAM role, mTLS.
Temporary tokens: short TTL, narrow scope.
ABAC/RBAC in Secret Manager: "who can read the X secret in the Y environment" is separate from "who can create/rotate."
Multi-tenancy: separate namespaces/key-rings per tenant; individual policies and reporting.
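The separation between reading a secret and creating or rotating it can be sketched as a default-deny rule check. The `Rule` structure, the `is_allowed` function, and the SPIFFE IDs under `example.org` are hypothetical and only illustrate the idea; real secret managers have their own policy languages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    principal: str      # workload identity, e.g. a SPIFFE ID
    environment: str    # environment the rule applies to
    path_prefix: str    # secrets covered by the rule
    actions: frozenset  # subset of {"read", "create", "rotate"}


def is_allowed(rules, principal, environment, secret_path, action) -> bool:
    """Default deny: allow only if some rule matches all four attributes."""
    return any(
        r.principal == principal
        and r.environment == environment
        and secret_path.startswith(r.path_prefix)
        and action in r.actions
        for r in rules
    )


RULES = [
    # The app may read its own secrets in prod but cannot rotate them.
    Rule("spiffe://example.org/app", "prod", "secret/app/", frozenset({"read"})),
    # A separate rotation job may create and rotate, but never read.
    Rule("spiffe://example.org/rotator", "prod", "secret/app/", frozenset({"create", "rotate"})),
]
```

Keeping the reader and the rotator as different identities means that compromising one of them does not grant both knowledge of the secret and the ability to replace it.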
10) Rotation, versions and compatibility
Separate the secret ID and its version ('secret/app/db # v17').
Support two active versions (N and N-1) for non-stop rotation.
Rotate on events as well as on schedule: employee offboarding, compromise, provider change, algorithm migration.
Automate: cron/backend rotation in Vault/Secret Manager plus webhook triggers for application restart/renewal.
Mini recipe: "two-key" webhook rotation
```text
T0: publish two secrets in the provider: current and next
T1: the application starts accepting signatures made with either current or next
T2: the external system switches signing to next
T3: promote next -> current and issue a new next
```
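Step T1 of the recipe, accepting signatures from both keys, might look like the sketch below. It assumes the webhook is signed with HMAC-SHA256 over the raw body and that the signature travels as a hex string; the recipe itself does not fix the scheme, and the `sign`/`verify` names are illustrative.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> str:
    """HMAC-SHA256 signature as a hex string."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, signature: str, accepted_keys: list[bytes]) -> bool:
    """Accept the signature if it matches any currently accepted key."""
    matched = False
    for key in accepted_keys:
        expected = sign(key, payload).encode()
        # compare_digest avoids leaking the mismatch position through timing
        if hmac.compare_digest(expected, signature.encode()):
            matched = True
    return matched
```

During the rotation window the application calls `verify(body, sig, [current, next])`; after T3 the old key is simply dropped from the list, so signatures made with it stop validating.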
11) Storage outside runtime: backups and artifacts
Secrets must never end up in artifacts (images, log archives, dumps).
Secret Manager backups: encrypt them and store the keys outside the same perimeter (separation of duties).
Tags and DLP scans: detecting secrets in S3/Blob/GCS, Git, CI artifacts.
12) Observability, audit and SLO
Metrics: issuances per secret/service, share of expired leases, average TTL, rotation time, convergence time (seconds/minutes until the new version is accepted everywhere).
Audit logs: who/what/when/where/why; stored separately and also encrypted.
SLO: 99% of issuances complete in < 200 ms; 0 leaks in logs; 100% of secrets have an owner/TTL/tags; 100% of critical secrets are dynamic or rotated every ≤ 30 days.
Alerts: secret expires in < 7 days (for static secrets), spike in authentication failures against the store, no reads of a secret for > N days (dead secret), unexpected geo/ASN sources.
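Two of these alerts, imminent expiry and dead secrets, reduce to simple date arithmetic over the inventory. The sketch below assumes a hypothetical `SecretRecord` shape and picks 90 days for N purely as an example; the 7-day window comes from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SecretRecord:
    name: str
    expires_at: Optional[datetime]    # None for secrets without expiry
    last_read_at: Optional[datetime]  # None if the secret was never read


def find_alerts(records, now, expiry_window=timedelta(days=7), dead_after=timedelta(days=90)):
    """Return (name, reason) pairs for expiring and unused secrets."""
    alerts = []
    for rec in records:
        # Already-expired secrets also land here, since the difference is negative.
        if rec.expires_at is not None and rec.expires_at - now < expiry_window:
            alerts.append((rec.name, "expires soon"))
        if rec.last_read_at is None or now - rec.last_read_at > dead_after:
            alerts.append((rec.name, "no recent reads"))
    return alerts
```

Passing `now` in explicitly keeps the function deterministic, which makes it easy to test and to replay against historical inventory snapshots.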
13) Frequent mistakes and how to avoid them
Secrets in Git/images. Use SOPS/age and scanners; enforce a policy that prohibits plaintext secret strings.
Envvars as a long-term medium. Prefer tmpfs/in-memory files; scrub the environment before forking and in dumps.
Same secrets for dev/stage/prod. Divide by environment.
Long-lived static passwords. Switch to dynamic/short-lived.
A single master key "for everything." Divide by tenant/project/service.
No hot-reload. If the application needs a restart to pick up a new secret, rotation opens a vulnerability window.
14) Examples of integrations (schematic)
Vault dynamic Postgres access
```hcl
# Vault: the role issues a database user with a TTL of 30m and privileges limited to the app
path "database/creds/app-role" {
  capabilities = ["read"]
}
# The application requests database/creds/app-role -> receives (user, pass, ttl)
```
JWT request signing (short-lived)
The private key is stored in Secret Manager; the service requests a short-lived signing token, and a local agent signs the payload (the key is never passed to the application as a string).
SSH certificates for admins
SSH certificates are issued for 10 minutes via SSO (OIDC), with no permanent keys distributed.
15) Security at the edges
Logs/traces/metrics: sanitizers and filters for known keys/patterns; mask "secret" fields in APM.
Dumps/crash reports: disabled by default; if they are needed, encrypt and scrub them.
Client/mobile applications: minimize offline secrets, use platform storage (Keychain/Keystore), device binding, and TLS pinning with an emergency rollover path.
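A log sanitizer of the kind mentioned for logs/traces/metrics can be sketched with the standard `logging` module. The two patterns and the `RedactingFilter` class are illustrative assumptions; a production filter would need rules for the key formats actually in use.

```python
import logging
import re

# Illustrative (pattern, replacement) pairs; extend with your own key formats.
SECRET_PATTERNS = [
    (re.compile(r"(?i)\b(password|token|api[_-]?key)=\S+"), r"\1=[REDACTED]"),
    (re.compile(r"(?i)\bBearer\s+\S+"), "Bearer [REDACTED]"),
]


class RedactingFilter(logging.Filter):
    """Masks secret-looking substrings before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # message with %-args already applied
        for pattern, replacement in SECRET_PATTERNS:
            message = pattern.sub(replacement, message)
        # Store the sanitized text and drop args so it is not formatted again.
        record.msg, record.args = message, None
        return True  # keep the record, only its content is changed


# Usage: attach to handlers so records from all loggers pass through it.
# handler = logging.StreamHandler()
# handler.addFilter(RedactingFilter())
```

Attaching the filter to handlers rather than to a single logger matters because logger-level filters are not applied to records propagated from child loggers.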
16) Compliance
PCI DSS: never store PAN/secrets unencrypted; strict access control and rotation.
ISO 27001/SOC 2: asset management, logging, access control, change-management requirements.
GDPR/local regulators: minimization, access as needed, audit.
17) Processes and runbook
Rollout
1. Inventory of secrets (repositories, CIs, images, runtime, backups).
2. Classification and tags (owner, environment, tenant, rotation-policy).
3. Vault/Cloud SM + KMS/HSM integration.
4. Set up issuance based on workload identity (OIDC/SPIRE).
5. Enable dynamic secrets for DB/Cloud/PKI.
6. Enable auto-rotation and hot-reload; alert on expiration.
7. Set up leak scanners and Data Catalog/ET.
Emergency scenarios
Suspected leak: add the affected access to a deny list, rotate immediately, revoke certificates/keys, re-issue tokens, enable enhanced auditing, run an RCA.
Secret Manager unavailable: in-memory local cache with a short TTL, degraded functionality, restriction of new connections, manual break-glass steps.
Root key compromise: regenerate the key hierarchy, rewrap all DEKs, and review everything that was exposed during the risk window.
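The outage scenario with a short-lived in-memory cache can be sketched as a thin wrapper around whatever client fetches secrets. `CachedSecretClient` and the `fetch` callable are hypothetical, and treating `ConnectionError` as the outage signal is an assumption; the point is that the cache is consulted only when the backend fails and that stale entries fail closed.

```python
import time


class CachedSecretClient:
    """Serves a recently fetched secret from memory when the backend is down."""

    def __init__(self, fetch, ttl_seconds: float, clock=time.monotonic):
        self.fetch = fetch              # callable(name) -> str; raises on outage
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._cache = {}                # name -> (value, fetched_at)

    def get(self, name: str) -> str:
        try:
            value = self.fetch(name)
        except ConnectionError:
            cached = self._cache.get(name)
            if cached and self.clock() - cached[1] <= self.ttl_seconds:
                return cached[0]        # degrade: reuse the short-lived copy
            raise                       # cache too old or empty: fail closed
        self._cache[name] = (value, self.clock())
        return value
```

Keeping the TTL short bounds how long a revoked secret can still be served from the cache during an outage.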
18) Checklists
Before production
- Secrets removed from code/images; leak scanners enabled.
- Dynamic mechanisms are enabled for critical secrets.
- Delivery via sidecar/CSI/tmpfs with hot-reload, no long-lived envvars.
- IAM/ABAC policies configured, bound to workload identity.
- Auto-rotation and dual versions (N, N-1) for compatibility.
- Metrics/alerts/audits enabled; degradation tests passed.
Operation
- Monthly report: owners, TTLs, expired secrets, unused secrets.
- Periodic rotations and penetration tests of leakage paths (logs, dumps, artifacts).
- Crypto-agility plan and emergency replacement of CA/roots.
19) FAQ
Q: Is Secret Manager without KMS enough?
A: For a basic level, yes, but envelope encryption is better: KEK in KMS/HSM, secrets wrapped. This simplifies revocation and compliance.
Q: What should I choose: static or dynamic?
A: Dynamic by default. Keep static secrets only where no provider supports dynamic issuance, and cut their TTL down to days/hours with automatic rotation.
Q: How do I safely deliver secrets to a microservice?
A: Workload identity → mTLS to Secret Manager → sidecar/CSI → files in tmpfs + hot-reload. No logs, no "forever" envvars.
Q: Can I keep secrets in Kubernetes Secret?
A: Only with etcd encryption enabled with KMS provider and strict policies. Prefer external storage and CSI.
Q: How do you "crypto-erase" a tenant's access?
A: Revoke/block its policies in Secret Manager, invalidate all leases, key rotation/regeneration; when using KMS - disable unwrap of the corresponding KEK.