SOC Threat and Alert Monitoring
Brief Summary
An effective SOC rests on three pillars: complete telemetry, high-quality detections, and operational discipline (prioritization, escalation, post-incident review, and continuous improvement). The goal is to identify intruders quickly by behavioral and signature indicators, respond within SLO, and minimize false positives without losing coverage.
SOC monitoring architecture
SIEM - event reception, normalization and correlation; dashboards, search, alerting.
UEBA - user/host behavioral analytics, baseline profiles, and anomalies.
SOAR - automation of response: enrichment of alerts (TI, CMDB), orchestration of containment actions.
TI (Threat Intelligence) - IOC/TTP/critical vulnerability feeds; context for rules and enrichment.
Storage - "hot" for 7-30 days for investigations, "cold" for 90-365+ days for compliance and retrospective analysis.
Log sources (minimum sufficient)
- Identity and access: IdP/SSO (OIDC/SAML), MFA, PAM, VPN/ZTNA, directories (AD/AAD).
- Endpoints: EDR/AV, Sysmon/ETW (Windows), auditd/eBPF (Linux), MDM (mobile devices).
- Network: firewalls (L3/L7), WAF/WAAP, load balancers (NGINX/Envoy), DNS, proxy, NetFlow/sFlow/Zeek.
- Cloud: CloudTrail/Activity Logs, KMS/Key Vault, IAM events, Kubernetes (audit, API server), container security.
- Applications and databases: admin audit, access to PII/payments, DDL/permission changes, critical business events (withdraw, bonus, payout).
- Email: phishing/spam detection, DLP, URL clicks, attachments.
Normalization: a single format (for example, ECS/CEF) with mandatory fields: 'timestamp', 'src_ip/dst_ip', 'user', 'action', 'resource', 'result', 'request_id/trace_id'.
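A minimal sketch of the normalization step, assuming a hypothetical NGINX-style source whose raw keys are mapped to the mandatory fields listed above. The snake_case field names, the NGINX_FIELD_MAP mapping, and both function names are illustrative assumptions, not part of ECS or CEF.

```python
from datetime import datetime, timezone

# Mandatory fields from the text; the exact snake_case names are an assumption.
REQUIRED_FIELDS = ("timestamp", "src_ip", "dst_ip", "user", "action",
                   "resource", "result", "trace_id")

# Hypothetical mapping from one source's raw keys to the common schema.
NGINX_FIELD_MAP = {
    "time": "timestamp",
    "remote_addr": "src_ip",
    "server_addr": "dst_ip",
    "remote_user": "user",
    "request_method": "action",
    "uri": "resource",
    "status": "result",
    "request_id": "trace_id",
}


def normalize(raw: dict, field_map: dict) -> dict:
    """Rename source-specific keys and convert the timestamp to UTC ISO 8601."""
    event = {field_map[key]: value for key, value in raw.items() if key in field_map}
    if "timestamp" in event:
        # Expects an ISO 8601 string with a UTC offset; a naive timestamp
        # would be interpreted as local time by astimezone().
        parsed = datetime.fromisoformat(event["timestamp"])
        event["timestamp"] = parsed.astimezone(timezone.utc).isoformat()
    return event


def missing_fields(event: dict) -> list:
    """List mandatory fields that are absent or empty, in schema order."""
    return [name for name in REQUIRED_FIELDS if event.get(name) in (None, "")]
```

Events with a non-empty missing_fields() result can be routed to a quarantine index instead of being dropped, so gaps in a source's telemetry stay visible.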
Threat taxonomy and ATT&CK mapping
Organize rules and dashboards by MITRE ATT&CK tactic: Initial Access, Execution, Persistence, Privilege Escalation, Defense Evasion, Credential Access, Discovery, Lateral Movement, Command and Control (C2), Collection/Exfiltration/Impact.
For each tactic, maintain a minimum set of detections and "coverage vs. fidelity" dashboards.
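Coverage per tactic can be computed from a rule inventory. The sketch below assumes each rule is tagged with the ATT&CK technique IDs it detects; the rule names and the small technique subset are hypothetical and do not represent a full matrix.

```python
def attack_coverage(techniques_by_tactic, rules):
    """Return {tactic: (covered fraction, list of uncovered techniques)}."""
    covered = {tech for rule in rules for tech in rule["techniques"]}
    report = {}
    for tactic, techniques in techniques_by_tactic.items():
        gaps = [tech for tech in techniques if tech not in covered]
        fraction = 1 - len(gaps) / len(techniques) if techniques else 0.0
        report[tactic] = (fraction, gaps)
    return report


# Illustrative subset of key techniques per tactic.
KEY_TECHNIQUES = {
    "Execution": ["T1059", "T1204"],
    "Credential Access": ["T1110"],
}

# Hypothetical rule inventory.
RULES = [
    {"name": "web_server_spawns_shell", "techniques": ["T1059"]},
    {"name": "failed_login_surge", "techniques": ["T1110"]},
]

for tactic, (fraction, gaps) in attack_coverage(KEY_TECHNIQUES, RULES).items():
    print(f"{tactic}: {fraction:.0%} covered, gaps: {gaps}")
```

The gap list is the useful part for planning: it names the techniques that have no detection at all. Fidelity (how noisy each rule is) would need a separate measure.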
Alert policy and prioritization
Severity:
- P1 (Critical): active C2, successful ATO/token theft, encryption (ransomware), exfiltration of payment data/PII.
- P2 (High): intrusion into infrastructure/cloud, privilege escalation, MFA bypass.
- P3 (Medium): suspicious anomaly, repeated failed attempts, rare behavior.
- P4 (Low): noise, hypotheses, unconfirmed TI matches.
Escalation: P1 - immediately to on-call (24×7); P2 - within 1 hour during working hours; the rest - through queues.
Roll-up: aggregate alerts by object/session to avoid alert storms.
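One possible way to implement roll-up is to group alerts that share an entity and a rule, and to start a new group only after a quiet gap. The alert fields ('ts', 'entity', 'rule') and the 10-minute default window are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Rollup:
    key: tuple          # (entity, rule)
    first_seen: float   # epoch seconds of the first alert in the group
    last_seen: float    # epoch seconds of the latest alert in the group
    count: int = 1


def roll_up(alerts, window_seconds=600):
    """Merge alerts with the same (entity, rule) while gaps stay within the window."""
    groups = []
    latest_by_key = {}
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        key = (alert["entity"], alert["rule"])
        group = latest_by_key.get(key)
        if group is not None and alert["ts"] - group.last_seen <= window_seconds:
            group.last_seen = alert["ts"]
            group.count += 1
        else:
            # First alert for this key, or the previous group has gone quiet.
            group = Rollup(key, alert["ts"], alert["ts"])
            latest_by_key[key] = group
            groups.append(group)
    return groups
```

An analyst then sees one case per group, with a counter, instead of one ticket per raw alert.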
SOC SLI/SLO/SLA
SLI: mean time to detect (MTTD), mean time to acknowledge (MTTA), mean time to contain (MTTC), and the share of false positives (FP) and false negatives (FN) per scenario cluster.
SLO (examples):
- MTTD P1 ≤ 5 min; MTTC P1 ≤ 30 min.
- FP rate for high-severity rules ≤ 2% per day.
- Coverage of key ATT&CK techniques ≥ 90% (at least one detection per technique).
SLA (external): agree with the business (e.g. P1 notification to owners within 15 min).
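The SLIs above can be derived from incident timestamps. This sketch assumes each incident record carries 'occurred', 'detected', 'acknowledged' and 'contained' ISO 8601 timestamps (hypothetical field names), and it measures MTTA and MTTC from the moment of detection, which is one common convention rather than the only one.

```python
from datetime import datetime
from statistics import mean


def _minutes(start_iso, end_iso):
    delta = datetime.fromisoformat(end_iso) - datetime.fromisoformat(start_iso)
    return delta.total_seconds() / 60


def sli_report(incidents, alerts_total, alerts_false_positive):
    """Compute mean times in minutes and the FP rate; needs at least one incident."""
    return {
        "mttd_min": mean([_minutes(i["occurred"], i["detected"]) for i in incidents]),
        "mtta_min": mean([_minutes(i["detected"], i["acknowledged"]) for i in incidents]),
        "mttc_min": mean([_minutes(i["detected"], i["contained"]) for i in incidents]),
        "fp_rate": alerts_false_positive / alerts_total if alerts_total else 0.0,
    }


# Example P1 targets taken from the SLO list above.
P1_SLO = {"mttd_min": 5, "mttc_min": 30}


def slo_breaches(report, slo):
    """Return the names of SLIs whose value exceeds the SLO limit."""
    return [name for name, limit in slo.items() if report[name] > limit]
```

Running the report per severity and per scenario cluster, rather than globally, keeps a noisy low-severity rule from hiding a slow P1 response.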
Detection rules: signatures, heuristics, behavior
Sigma (example: suspicious access to the admin panel outside the country)
yaml
title: Admin Panel Access Outside Allowed Country
id: 5c6c9c1d-8f85-4a0e-8c3a-1b7cabc0b001
status: stable
logsource:
    category: webserver
detection:
    selection:
        http.request.uri: /admin
        http.response.status: 200
    # Allow-list of countries. Negating a plain list avoids a negative-lookahead
    # regex, which many SIEM backends do not support.
    allowed_country:
        geoip.country:
            - UA
            - PL
            - GE
    condition: selection and not allowed_country
fields: [user, ip, geoip.country, user_agent, trace_id]
falsepositives:
    - Service connections from SOC/VPN (add allow-list)
level: high
KQL (example: surge of failed logins + different accounts from the same IP)
kusto
SigninLogs
| where ResultType != "0"   // ResultType is a string; "0" means a successful sign-in
| summarize fails = count(), users = dcount(UserPrincipalName) by bin(TimeGenerated, 10m), IPAddress
| where fails > 50 and users > 5
Application (SQL, off-schedule PII access)
sql
SELECT user_id, count(*) AS reads
FROM audit_pii
WHERE ts > now() - interval '1 hour'
  -- exclude roles that are expected to read PII
  AND user_id NOT IN (SELECT user_id FROM roster WHERE role IN ('DPO','Support'))
  -- keep only reads outside the 08:00-21:59 working window
  AND extract(hour from ts) NOT BETWEEN 8 AND 21
GROUP BY 1
HAVING count(*) > 5;
UEBA and Context
Baseline activity profiles per user/role/service (working hours, ASN, device).
Anomalies: rare IP/ASN, new device, unusual API sequences, sharp shift in activity hours.
Event risk score = sum of signals (TI, anomaly, resource sensitivity) × weights.
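As a rough illustration of the scoring formula, the sketch below computes a weighted sum of signals normalized to a 0-100 scale. The signal names, the integer weights, and the assumption that each signal is scaled to the range 0..1 are all hypothetical choices.

```python
# Hypothetical weights; tune them against labeled incidents.
WEIGHTS = {"ti_match": 5, "anomaly": 3, "resource_sensitivity": 2}


def risk_score(signals, weights=WEIGHTS):
    """Weighted sum of signals (each clamped to 0..1), scaled to 0..100."""
    total = 0.0
    for name, weight in weights.items():
        value = min(max(signals.get(name, 0.0), 0.0), 1.0)  # missing signal counts as 0
        total += weight * value
    return round(100 * total / sum(weights.values()), 1)


# A confirmed TI hit with a moderate anomaly on a highly sensitive resource.
print(risk_score({"ti_match": 1.0, "anomaly": 0.5, "resource_sensitivity": 1.0}))
```

Dividing by the sum of weights keeps the score comparable when a weight is added or changed, so severity thresholds do not have to be re-derived each time.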
SOAR and response automation
Enrichment: TI reputation of IP/domain/hash, CMDB (who owns the host/service), HR (employee status), IAM role.
Actions: host isolation (EDR), IP/ASN/JA3 blocking, temporary revocation of tokens/sessions, forced rotation of secrets, blocking of withdrawals/freezing of bonuses.
Guard rails: two-person approval for critical actions; a TTL on every block.
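A minimal sketch of both guard rails, reading the approval requirement as sign-off by two distinct people (an interpretation of the text, not a confirmed design). The class name, method names, and injected clock are hypothetical; a real SOAR platform would provide its own primitives.

```python
import time


class TtlBlockList:
    """In-memory block list where every entry expires after its TTL."""

    def __init__(self, clock=time.time):
        self._expiry = {}     # indicator -> epoch seconds when the block lapses
        self._clock = clock   # injectable so tests can control time

    def block(self, indicator, ttl_seconds, critical=False, approvers=()):
        # Assumed guard rail: critical actions need two distinct approvers.
        if critical and len(set(approvers)) < 2:
            raise PermissionError("critical action requires two distinct approvers")
        self._expiry[indicator] = self._clock() + ttl_seconds

    def is_blocked(self, indicator):
        expires_at = self._expiry.get(indicator)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            del self._expiry[indicator]  # lazy cleanup of the expired entry
            return False
        return True
```

Because every block carries an expiry, a stale TI match cannot cut legitimate traffic indefinitely; extending a block has to be a deliberate action.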
SOC Processes
1. Triage: context checking, deduplication, TI reconciliation, primary ATT&CK classification.
2. Investigation: collection of artifacts (PCAP/EDR/logs), hypotheses, timeline, damage assessment.
3. Containment/Eradication: isolation, key/token revocation, patching, blocks.
4. Recovery: verification that systems are clean, credential rotation, monitoring for recurrence.
5. RCA/Lessons: post-incident review, update rules/dashboards, add test cases.
Tuning and quality of detections
Shadow mode for new rules: log matches, but do not block.
Regression pack: a library of "good/bad" events for CI rule tests.
FP remediation: exclusions by path/role/ASN; "deny by default" rules go live only after a canary rollout.
Drift monitoring: when baseline activity changes, adapt thresholds/models.
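A regression pack can be as simple as a list of events with the expected verdict for each. In the sketch below the rule is a hypothetical Python predicate for admin-panel access from outside an allow-list of countries; the event field names are assumptions.

```python
def rule_admin_outside_allowed(event, allowed=frozenset({"UA", "PL", "GE"})):
    """Fire on a successful /admin request from a country outside the allow-list."""
    return (
        event.get("uri", "").startswith("/admin")
        and event.get("status") == 200
        and event.get("country") not in allowed
    )


# (event, expected verdict): the "bad" events must fire, the "good" ones must not.
REGRESSION_PACK = [
    ({"uri": "/admin", "status": 200, "country": "BR"}, True),
    ({"uri": "/admin", "status": 200, "country": "PL"}, False),
    ({"uri": "/admin", "status": 403, "country": "BR"}, False),
    ({"uri": "/home", "status": 200, "country": "BR"}, False),
]


def run_pack(rule, pack):
    """Return (event, expected, actual) for every case where the rule disagrees."""
    failures = []
    for event, expected in pack:
        actual = rule(event)
        if actual != expected:
            failures.append((event, expected, actual))
    return failures
```

In CI, a non-empty result from run_pack() fails the build, so an exclusion added to silence a false positive cannot silently remove a true detection.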
Dashboards and reviews
Operational: active alerts, P1/P2, attack map (geo/ASN), "top talkers," TI-match feed.
Tactical: ATT&CK coverage, FP/FN trends, MTTD/MTTC, noisy sources.
Business: incidents by product/region, impact on KPIs (conversion, Time-to-Wallet, payment failures).
Storage, privacy and compliance
Retention: at least 90 days of "warm" logs, ≥ 1 year archive where required (fintech/regulators).
PII/secrets: tokenization/masking, role access, encryption.
Legal requirements: incident reporting, retention of decision chains, clock consistency (NTP).
Purple Team and Coverage Check
Threat hunting: TTP hypotheses (e.g. T1059 PowerShell), ad-hoc queries in SIEM.
Purple Team: joint Red + Blue sprints - executing TTPs, checking that detections fire, refining rules.
Automated detection tests: periodic replay of reference events (atomic tests) in non-prod and in "shadow" prod.
iGaming/fintech specificity
Critical domains: login/registration, deposits/withdrawals, promotions, access to PII/financial reports.
Scenarios: ATO/credential stuffing, card testing, bonus abuse, insider access to payments.
Rules: velocity limits on '/login' and '/withdraw', idempotency and HMAC signatures for webhooks, mTLS to the PSP, detection of access to tables with PAN/PII.
Business triggers: a sharp increase in payment failures/chargebacks, anomalies in conversion, bursts of "zero" deposits.
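The webhook controls named above (HMAC and idempotency) might look like the following sketch. It assumes the PSP signs the raw request body with HMAC-SHA256 and sends a hex digest plus a unique event ID; the actual header names, signature encoding, and ID field depend on the provider and are not specified here.

```python
import hashlib
import hmac


def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check an HMAC-SHA256 hex signature over the raw body in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


class IdempotencyGuard:
    """Remember processed event IDs so a replayed webhook is applied only once."""

    def __init__(self):
        self._seen = set()  # in production this would be a shared store with a TTL

    def first_time(self, event_id: str) -> bool:
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True
```

Verification must run on the raw bytes as received, before any JSON parsing or re-serialization, otherwise a semantically identical body can fail the check.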
Examples of runbooks (abbreviated)
P1: Confirmed ATO and withdrawals
1. SOAR blocks the session, revokes refresh tokens, freezes withdrawals (TTL 24 h).
2. Notify the product/finance owner; start password reset/2FA re-binding.
3. Check related accounts via the device/IP/ASN graph; extend the block to whole clusters.
4. RCA: add detections for recurrence, tighten the velocity threshold on '/withdraw'.
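The velocity threshold on '/withdraw' from step 4 can be sketched as a sliding-window counter per account. The class name, the key format, and the example limits are hypothetical; in production the counters would live in a shared store rather than in process memory.

```python
from collections import defaultdict, deque


class VelocityLimiter:
    """Flag a key once it exceeds max_events within a sliding time window."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps inside the window

    def hit(self, key, ts):
        """Record one event at time ts; return True if the limit is now exceeded."""
        timestamps = self._events[key]
        timestamps.append(ts)
        # Drop timestamps that have fallen out of the window (ts - window, ts].
        while timestamps and timestamps[0] <= ts - self.window_seconds:
            timestamps.popleft()
        return len(timestamps) > self.max_events


# Example: more than 3 withdrawal requests per minute for one account is suspicious.
limiter = VelocityLimiter(max_events=3, window_seconds=60)
```

Tightening the threshold after an incident then means lowering max_events or widening the window for the affected endpoint only, without touching other routes.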
P2: Execution on server (T1059)
1. EDR isolation, memory/artifact capture.
2. Inventory of recent deployments/secrets; key rotation.
3. Hunt for IOCs across the fleet; check for C2 in DNS/proxy logs.
4. Post-incident: rule "Parent = nginx → bash" + Sigma for Sysmon/Linux audit.
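The "Parent = nginx → bash" rule from step 4 could be prototyped as a predicate over process-creation events before it is written in Sigma. The event field names ('parent_image', 'image') and the extra web servers and shells in the sets are assumptions that go beyond the nginx/bash pair named in the runbook.

```python
import posixpath

# Assumed sets; the runbook itself only names nginx and bash.
WEB_SERVER_PARENTS = {"nginx", "apache2", "httpd", "php-fpm"}
SHELLS = {"sh", "bash", "dash", "zsh"}


def is_web_server_shell_spawn(proc_event):
    """True when a web server process is the direct parent of a shell."""
    parent = posixpath.basename(proc_event.get("parent_image", ""))
    child = posixpath.basename(proc_event.get("image", ""))
    return parent in WEB_SERVER_PARENTS and child in SHELLS


suspicious = {"parent_image": "/usr/sbin/nginx", "image": "/bin/bash"}
benign = {"parent_image": "/usr/sbin/sshd", "image": "/bin/bash"}
print(is_web_server_shell_spawn(suspicious), is_web_server_shell_spawn(benign))
```

Matching on the basename keeps the check independent of install paths, at the cost of trusting the process name; pairing it with a hash or signer check would reduce evasion by renamed binaries.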
Frequent mistakes
Overloading the SIEM with noise: no normalization, no TTL on stored data.
Detections not mapped to ATT&CK → blind spots.
No SOAR/enrichment → long MTTA and manual routine work.
Ignoring UEBA/behavior → "slow" insiders are missed.
Rigid global TI blocks without a TTL → legitimate business traffic gets cut off.
No regression tests for rules.
Implementation Roadmap
1. Log inventory and normalization (ECS/CEF), "minimum set."
2. ATT&CK coverage matrix and basic high-risk detections.
3. SLO and queues: P1-P4, on-call and escalation.
4. SOAR playbooks: enrichment, containment actions, TTL blocks.
5. UEBA and risk scoring: profiles, anomalies, drift monitoring.
6. Purple Team/detection tests: shadow mode, canaries, regression pack.
7. Reporting and compliance: retention, privacy, business dashboards.
Result
A mature SOC is complete telemetry + high-quality detections + response discipline. Link rules to MITRE ATT&CK, automate enrichment and containment in SOAR, measure results with SLO metrics, and regularly verify coverage with Purple Team exercises. Your monitoring will then be resistant to noise, respond quickly to real threats, and protect business metrics.