TLS certificates and automatic renewal
Why do you need it?
TLS encrypts client↔service traffic, authenticates the server (and, with mTLS, the client), and protects against spoofing. The main risks are expired certificates, weak keys, an incorrect trust chain, and manual procedures. This article describes an architecture in which certificates are always valid and rotations go unnoticed by users.
Basic concepts
CA/Issuer: certificate authority (public or internal).
Chain (fullchain): leaf certificate + intermediate(s) + root (the root normally lives in client trust stores, so the server does not need to send it).
SAN (Subject Alternative Name): the list of domains/IPs covered by one certificate (multi-SAN).
Wildcard: `*.example.com`, convenient for many subdomains; requires DNS validation (DNS-01 in ACME).
OCSP stapling: the server attaches (staples) a recent revocation status to the handshake, which reduces latency and dependency on external OCSP responders.
HPKP: obsolete and no longer used; rely on CT logs and key hygiene instead.
CT (Certificate Transparency): public issuance logs, important for detecting mis-issued certificates.
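To make the SAN and wildcard definitions concrete, here is a minimal sketch of how a hostname can be checked against a certificate's SAN list. The function names `san_matches` and `covered` are illustrative, and the sketch assumes DNS names only (no IP SANs) and the common rule that a wildcard stands for exactly one left-most label.

```python
from typing import Iterable


def san_matches(pattern: str, hostname: str) -> bool:
    """Return True if a single SAN entry covers the hostname."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not pattern.startswith("*."):
        # Plain entry: exact match only.
        return pattern == hostname
    # Wildcard entry: "*" replaces exactly one left-most label,
    # so "*.example.com" covers "app.example.com" but not
    # "example.com" or "a.b.example.com".
    head, _, rest = hostname.partition(".")
    return bool(head) and rest == pattern[2:]


def covered(sans: Iterable[str], hostname: str) -> bool:
    """Return True if any SAN entry covers the hostname."""
    return any(san_matches(p, hostname) for p in sans)
```

This is why a wildcard certificate usually lists the apex domain as a separate SAN: `covered(["*.example.com"], "example.com")` is false.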
Crypto profile and keys
Algorithms:
- ECDSA (P-256): fast and compact; preferred for modern clients.
- RSA-2048/3072: still the most widely compatible; can be served alongside ECDSA as a dual-cert setup (RSA + ECDSA).
Key generation: generate keys only on the target host (never transfer private keys over the network) and restrict file permissions (`0600`).
HSM/KMS: for critical areas (payments/PII), store keys in an HSM/KMS and enable audit logging of key operations.
Lifetimes: short-lived certificates (90 days, or 30 days for internal ones) encourage frequent rotation and reduce the impact of a compromise.
Architectural models of TLS management
1. Public CA via ACME (Let's Encrypt/Buypass/etc.)
Validation: HTTP-01 (via the web server/Ingress) or DNS-01 (for wildcards and domains that are not reachable over HTTP).
Pros: free, automated, broadly trusted. Cons: external dependencies.
2. Internal Corporate CA
Tools: HashiCorp Vault PKI, Smallstep (step-ca), Microsoft AD CS, CFSSL.
Pros: custom policies, mTLS, short TTLs, issuance for internal domains. Cons: root distribution and trust management.
3. Hybrid
Public CA for external users; internal CA for service-to-service traffic (mTLS), inter-cluster channels, and admin interfaces.
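The hybrid model boils down to a routing decision: which issuer handles a given hostname. The sketch below is one possible way to express it. The suffix list and the issuer labels `internal-ca` and `public-acme` are hypothetical placeholders, not names used by any particular tool.

```python
# Hypothetical internal DNS suffixes; replace with your own zones.
INTERNAL_SUFFIXES = (".svc.cluster.local", ".internal.example")


def choose_issuer(hostname: str) -> str:
    """Pick an issuer label for a hostname in a hybrid setup."""
    host = hostname.lower().rstrip(".")
    for suffix in INTERNAL_SUFFIXES:
        # Match the zone itself or any name below it.
        if host == suffix.lstrip(".") or host.endswith(suffix):
            return "internal-ca"
    return "public-acme"
```

Matching on the dotted suffix keeps lookalike names such as `notinternal.example` on the public path.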
Automatic renewal patterns (renew)
General principles
Renewal threshold: start renewing when ≤ 30 days remain before expiry; for critical services, when ≤ 45 days remain.
Zero-downtime: issue the new certificate, replace it atomically, and reload gracefully without dropping connections.
Dual slots (blue/green): keep both the current and the next certificate; switch via a symlink or a versioned secret.
Alerting: warnings at 45/30/14/7/3/1 days before expiry; a separate alert when an ACME challenge fails.
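The renewal threshold above can be sketched as a small decision function. This is an illustration under stated assumptions: the constant and function names are made up for the example, and the caller is expected to supply timezone-aware timestamps.

```python
from datetime import datetime, timedelta, timezone

RENEW_BEFORE = timedelta(days=30)           # regular services
RENEW_BEFORE_CRITICAL = timedelta(days=45)  # critical services


def renewal_due(not_after: datetime, now: datetime, critical: bool = False) -> bool:
    """Return True once the remaining lifetime is within the renewal window."""
    threshold = RENEW_BEFORE_CRITICAL if critical else RENEW_BEFORE
    return not_after - now <= threshold


# Example: 40 days left, so only a critical service renews now.
expiry = datetime(2030, 3, 1, tzinfo=timezone.utc)
today = datetime(2030, 1, 20, tzinfo=timezone.utc)
needs_renewal = renewal_due(expiry, today, critical=True)
```

A wider window for critical services leaves more retry attempts before the first page-level alert.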
ACME clients and their application
certbot / acme.sh / lego: lightweight agents for VMs/bare metal.
cert-manager (Kubernetes): an operator that works with Issuer/ClusterIssuer resources; automates issuance and renewal and writes the result to a Secret.
step-ca / Vault Agent: automatic issuance/rotation with short TTLs; sidecar patterns for updating keys and chains.
Processes for Kubernetes
cert-manager (Issuer example for Let's Encrypt HTTP-01 via Ingress):
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: le-http01
spec:
  acme:
    email: devops@example.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: le-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
```
Certificate request:
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-cert
  namespace: prod
spec:
  secretName: app-tls
  dnsNames:
    - app.example.com
  issuerRef:
    name: le-http01
    kind: ClusterIssuer
  privateKey:
    algorithm: ECDSA
    size: 256
  renewBefore: 720h # 30 days
```
Hot swapping in NGINX Ingress happens automatically when the `Secret` is updated. Add `ssl-ecdh-curve: prime256v1` (the OpenSSL name for P-256/secp256r1) and enable OCSP stapling via the controller ConfigMap or annotations.
Processes for VM/Bare-metal
Certbot (HTTP-01):
```bash
sudo certbot certonly --webroot -w /var/www/html -d example.com -d www.example.com \
  --deploy-hook "systemctl reload nginx"
```
Run `certbot renew` periodically via a systemd timer.
For wildcards, use DNS-01 (with the DNS provider's plugin) and a similar `--deploy-hook`.
acme.sh (DNS-01, wildcard):
```bash
export CF_Token="" # example for Cloudflare
acme.sh --issue --dns dns_cf -d example.com -d '*.example.com' \
  --keylength ec-256 --ecc \
  --reloadcmd "systemctl reload nginx"
```
NGINX Atomic Replacement
Keep `fullchain.pem` and `privkey.pem` under stable paths (symlinks to versioned files), then run `nginx -s reload`.
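The symlink pattern can be sketched as follows. This is a minimal illustration for POSIX systems, where renaming over an existing symlink is atomic. The function name `activate_version` and the directory layout are assumptions made for the example.

```python
import os
from pathlib import Path


def activate_version(live_link: Path, version_dir: Path) -> None:
    """Atomically repoint live_link (a symlink) at version_dir."""
    tmp_link = live_link.with_name(live_link.name + ".tmp")
    # Clean up a leftover temporary link from an interrupted run.
    if os.path.lexists(tmp_link):
        os.unlink(tmp_link)
    os.symlink(version_dir, tmp_link, target_is_directory=True)
    # rename(2) swaps the link in one step, so readers see either the
    # old or the new target and never a missing path.
    os.replace(tmp_link, live_link)
```

With a layout such as `/etc/nginx/certs/app/live -> versions/2030-01-20` (hypothetical), the NGINX config keeps pointing at `live/fullchain.pem`, and a rollback is just another call with the previous version directory followed by a reload.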
Internal PKI and mTLS
HashiCorp Vault PKI (sample role):
```bash
vault secrets enable pki
vault secrets tune -max-lease-ttl=87600h pki
vault write pki/root/generate/internal common_name="Corp Root CA" ttl=87600h
vault write pki/roles/service \
  allowed_domains="svc.cluster.local,internal.example" allow_subdomains=true \
  max_ttl="720h" require_cn=false key_type="ec" key_bits=256
```
Auto-issuance: via Vault Agent Injector (K8s) or a sidecar; the application re-reads the certificate from a file, for example with a filesystem watcher.
Short TTL: 24-720 hours, which enforces frequent rotation and reduces the value of a stolen key.
mTLS: issue client certificates to specific services/roles; at the entry point, enforce mutual TLS in the ingress or sidecar proxy.
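On the application side, re-reading a rotated certificate can be as simple as polling the file for changes. The sketch below is an assumption-laden illustration: the class `CertWatcher` is hypothetical, it polls instead of using OS file notifications, and it compares content hashes so that it does not depend on timestamp granularity.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


class CertWatcher:
    """Polls a certificate file and fires a callback when its content changes."""

    def __init__(self, cert_path: Path, on_change: Callable[[Path], None]) -> None:
        self.cert_path = cert_path
        self.on_change = on_change
        self._fingerprint = self._digest()

    def _digest(self) -> Optional[str]:
        try:
            return hashlib.sha256(self.cert_path.read_bytes()).hexdigest()
        except FileNotFoundError:
            # The file may be briefly absent while a sidecar rewrites it.
            return None

    def poll(self) -> bool:
        """Check once; return True if a reload was triggered."""
        current = self._digest()
        if current is not None and current != self._fingerprint:
            self._fingerprint = current
            self.on_change(self.cert_path)
            return True
        return False
```

In a Python service, the callback could call `ssl.SSLContext.load_cert_chain` on the live context so that new connections pick up the rotated certificate; call `poll()` from a timer that is short relative to the certificate TTL.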
Safe operation
Secret separation: private keys stay on the host/pod only; access follows the principle of least privilege.
File permissions: `600` for the key; the owner is the process user.
Grace period: set `renewBefore` large enough to absorb DNS/ACME/provider failures.
OCSP stapling: enable on front-end servers; monitor the freshness of the stapled response (usually 12-72 hours).
HSTS: roll out gradually (without `preload` at the start), after verifying that all content is served correctly over HTTPS.
Dual-cert (RSA + ECDSA): improves compatibility and performance; serve ECDSA to modern clients.
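The file-permission rule is easy to verify automatically, for example in a startup check or a periodic audit. Below is a minimal POSIX-only sketch; the function name `key_file_is_private` and the optional owner check are assumptions for illustration.

```python
import os
import stat
from typing import Optional


def key_file_is_private(path: str, expected_uid: Optional[int] = None) -> bool:
    """Return True if the key is unreadable by group/others (0600 or stricter)."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)
    # Any group or other permission bit violates the rule.
    if mode & 0o077:
        return False
    # Optionally require that the process user owns the file.
    if expected_uid is not None and st.st_uid != expected_uid:
        return False
    return True
```

A service can refuse to start when the check fails, which turns a silent misconfiguration into a visible deployment error.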
Monitoring and SLO
Metrics and checks:
- Days until expiry (gauge) for each domain/secret; SLO: "no certificate with fewer than 7 days to expiry."
- Chain validity (linting) and SAN coverage of the required domains/IPs.
- OCSP stapling status (freshness of the response).
- Share of successful/failed ACME challenges.
- TLS handshake latency, protocol versions/ciphers (audit).
Alert levels:
- Warn: 30 days until expiry.
- Crit: 7 days until expiry, or a failed `renew`.
- Page: 72 hours until expiry, an invalid chain in production, or missing OCSP stapling.
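The expiry gauge and the thresholds listed above can be combined into one small classifier. This is a sketch, not a monitoring integration: `days_left` and `alert_level` are illustrative names, and the `notAfter` string is assumed to be in the format that Python's `ssl.SSLSocket.getpeercert()` returns.

```python
import ssl


def days_left(not_after: str, now_ts: float) -> float:
    """Days until expiry, given a notAfter string and a UNIX timestamp."""
    return (ssl.cert_time_to_seconds(not_after) - now_ts) / 86400


def alert_level(days: float, renew_failed: bool = False,
                chain_valid: bool = True, ocsp_stapled: bool = True) -> str:
    """Map certificate state to ok / warn / crit / page."""
    # Page: 72 hours or less, an invalid chain, or missing OCSP stapling.
    if days <= 3 or not chain_valid or not ocsp_stapled:
        return "page"
    # Crit: 7 days or less, or a failed renewal attempt.
    if days <= 7 or renew_failed:
        return "crit"
    # Warn: 30 days or less.
    if days <= 30:
        return "warn"
    return "ok"
```

Exporting `days_left` as the gauge and deriving the level in alert rules keeps the thresholds in one place.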
Incidents and rollbacks
Expired certificate: reissue and deploy manually as a temporary fix, then record the root cause (why renewal did not run: DNS blocking, API limits).
Key compromise: immediate reissue and revocation, rotation of secrets, access audit, rotation of DNS provider and ACME account tokens.
Incorrect chain: urgently deploy the correct `fullchain` and force a reload of the front-end servers.
Lock-in to a DNS provider: keep a backup validation path (HTTP-01) or a secondary DNS provider.
Auto-renewal implementation checklist
1. Select the model (public CA via ACME/internal PKI/hybrid).
2. Define the crypto profile: ECDSA-P256, if necessary dual-cert with RSA-2048.
3. Configure the automation agent (cert-manager, certbot, acme.sh, Vault Agent).
4. Organize zero-downtime replacement (symlink pattern, hot-reload ingress/NGINX/Envoy).
5. Turn on OCSP stapling and HSTS (in stages).
6. Add alerts on expiry dates and challenge statuses; define the SLO.
7. Document the break-glass and manual release processes.
8. Run failure drills: broken DNS-01, ACME outage, expired root/intermediate.
9. Review access to private keys, rotate DNS provider tokens and ACME accounts.
Features for iGaming/fintech
PCI DSS/PII: strict cipher suites, enforced TLS 1.2+/1.3, weak ciphers and compression disabled, session resumption without compromising security.
Domain segmentation: separate certificates for payment subdomains and admin panels; isolated chains for content providers.
Audit and logging: record issuance/revocation/rotation events; sign CI/CD artifacts.
Multi-region: local Issuers per region, so that issuance does not depend on cross-region failures.
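For services that terminate TLS in application code, the protocol and cipher requirements above can be applied when building the context. The following is a minimal sketch using Python's standard `ssl` module. The cipher string is one reasonable choice rather than a PCI-mandated list, and the function name is illustrative.

```python
import ssl


def strict_server_context() -> ssl.SSLContext:
    """Server-side context limited to TLS 1.2+ with AEAD cipher suites."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse TLS 1.0/1.1 and anything older.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Disable TLS-level compression (CRIME-style attacks).
    ctx.options |= ssl.OP_NO_COMPRESSION
    # Applies to TLS 1.2; TLS 1.3 suites are managed separately by OpenSSL.
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20")
    return ctx
```

Before serving traffic, the context still needs a certificate and key via `load_cert_chain`; for mTLS, also set `verify_mode = ssl.CERT_REQUIRED` and load the internal CA with `load_verify_locations`.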
Sample Configurations
NGINX (RSA+ECDSA, OCSP stapling)
```nginx
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;
ssl_ecdh_curve prime256v1;  # P-256 (secp256r1) under its OpenSSL name
# ECDSA certificate, offered to clients that support it
ssl_certificate     /etc/nginx/certs/app_ecdsa/fullchain.pem;
ssl_certificate_key /etc/nginx/certs/app_ecdsa/privkey.pem;
# RSA certificate, fallback for older clients
ssl_certificate     /etc/nginx/certs/app_rsa/fullchain.pem;
ssl_certificate_key /etc/nginx/certs/app_rsa/privkey.pem;
ssl_stapling on;
ssl_stapling_verify on;
add_header Strict-Transport-Security "max-age=31536000" always;
```
OpenSSL: CSR (ECDSA-P256)
```bash
# Generate a P-256 private key, then a CSR with two SAN entries
openssl ecparam -name prime256v1 -genkey -noout -out privkey.pem
openssl req -new -key privkey.pem -out csr.pem -subj "/CN=app.example.com" \
  -addext "subjectAltName=DNS:app.example.com,DNS:www.example.com"
```
CFSSL: profile and issuance
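```json
{
  "signing": {
    "profiles": {
      "server": {
        "usages": ["digital signature", "key encipherment", "server auth"],
        "expiry": "2160h"
      }
    }
  }
}
```
```bash
# Assumes the profile above is saved as ca.json and that the CA
# certificate and key are in ca.pem / ca-key.pem (file names are examples)
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca.json -profile=server csr.json \
  | cfssljson -bare server
```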
FAQ
Do I need a wildcard?
If new subdomains often appear, yes (via DNS-01). Otherwise, use multi-SAN for explicit domains.
What to choose: cert-manager or certbot?
Kubernetes → cert-manager. VMs/services outside K8s → certbot/lego/acme.sh. Internal PKI → Vault/step-ca.
Can TTL be reduced to a day?
For internal mTLS, yes, if automation/sidecar guarantees rotation and applications can hot-reload.
How to secure DNS-01?
Use a dedicated token with minimal access to the zone, rotate keys, restrict API access by IP, and audit usage.
Summary
Reliable TLS management is a combination of the correct crypto profile, automated issuance and renewal, zero-downtime rotation, observability, and clear incident-response procedures. Build an ACME/PKI pipeline, add strict alerting, and regularly rehearse emergency scenarios, and expired certificates will no longer be a source of night-time pages.