Bot detection and anti-fraud logic

Brief Summary

Effective protection against bots and fraud combines several layers: signal collection (client, network, device, behavior), real-time risk scoring, deterministic rules plus probabilistic ML models, graph analysis of connections, and strict escalation processes. The goal is to block harm while preserving UX and conversion.

Threats and vectors

Bots and scrapers: mass registration, login brute-forcing, promo code farming, balance inflation, automated creation of requests/bets.
Account Takeover (ATO): credential stuffing, phishing, session theft.
Payment fraud: stolen cards, limit testing, chargeback farming.
Bonus abuse: multi-accounting, "families" of devices/addresses, proxies/emulators.
Affiliate/CPA abuse: fake registrations/deposits, click fraud.

Anti-bot/anti-fraud stack architecture

Layers and components:

1. Sensors and telemetry: front-JS/SDK (human signals), mobile SDK, network/HTTP metrics, backend events.

2. Feature store (online/offline): normalization, aggregates over several time windows (1 min, 1 h, 24 h).

3. Real-time engine: rules + ML inference (low latency), orchestration of challenges.

4. Graph engine: user connections by devices, payments, IP/ASN, cookies, addresses.

5. Incident storage and labeling: active learning for models, RCA.

6. Response orchestrator: block/challenge/freeze/limit/manual review.

7. Observability/SLO: quality metrics (TP/FP/FN), decision time, impact on conversion.

Signals and fingerprints

Client and Device

Device fingerprint: User-Agent derivations, platform/CPU/GPU, Canvas/WebGL rendering, fonts, timezone, language, sensors; rotation resistance.
Browser dynamics: mouse/touch events, input speed/rhythm, focus/blur, scrolling, transition sequences, idle patterns.
Mobile metrics: jailbreak/root, emulator indicators, debug flags, SDK signals.
Network: IP/ASN/geo, proxy/VPN/hosting ASN, frequency of IP changes, RTT stability, JA3/TLS fingerprints.
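
As a minimal sketch of how the device attributes above might be combined into one identifier, the snippet below hashes a canonical subset of fields. The attribute names in `STABLE_KEYS` are hypothetical (a real SDK defines its own schema), and leaving out volatile fields such as the full User-Agent string is just one simple way to approximate rotation resistance.

```python
import hashlib
import json

# Hypothetical attribute names; a real SDK defines its own schema.
STABLE_KEYS = ("platform", "gpu", "canvas_hash", "fonts", "timezone", "language")

def device_fingerprint(attrs: dict) -> str:
    """Hash a canonical subset of device attributes into a short ID."""
    stable = {k: attrs.get(k) for k in STABLE_KEYS}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]

# Two sessions that differ only in the browser version map to the same ID.
fp_a = device_fingerprint({"platform": "Win32", "gpu": "ANGLE", "timezone": "UTC",
                           "user_agent": "Chrome/120"})
fp_b = device_fingerprint({"platform": "Win32", "gpu": "ANGLE", "timezone": "UTC",
                           "user_agent": "Chrome/121"})
```

An exact hash like this breaks as soon as any included attribute changes, so production systems often add fuzzy matching on top; that part is out of scope for the sketch.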

Behavior and business context

Velocity metrics (registrations/logins/deposits/bets per window).
Anomalies in time zones/locales/currencies, geo/device mismatch.
Repeating path/query patterns, form sequences (typical of scripts).
Economics of actions: LTV mismatch, unnatural promo/withdrawal combinations.
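
Velocity metrics are usually sliding-window counts per key. The sketch below is an in-memory illustration only; the class name `VelocityCounter` is an assumption, timestamps are assumed to arrive in non-decreasing order, and a real deployment would more likely keep these counters in a shared low-latency store.

```python
from collections import defaultdict, deque

class VelocityCounter:
    """Counts events per key within a sliding time window (seconds)."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self._events = defaultdict(deque)

    def hit(self, key: str, ts: float) -> int:
        """Record an event at time ts and return the count inside the window."""
        q = self._events[key]
        q.append(ts)
        # Drop timestamps that have fallen out of the window.
        while q and q[0] <= ts - self.window_s:
            q.popleft()
        return len(q)

logins_by_ip = VelocityCounter(window_s=600)  # 10-minute window
counts = [logins_by_ip.hit("203.0.113.7", t) for t in (0, 100, 200, 700)]
```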

Graph Analysis (Families and Clusters)

Vertices (nodes): users, devices, IP/ASN, payment instruments, addresses, cookies.

Edges: "logged in with," "paid through," "shared the device," "matched fingerprint."

Examples of rules:

k-core ≥ 3 among users sharing a payment instrument → manual review.

A connected component of X accounts formed in < 24 h → promo freeze and KYC review.

High concentration of registrations on a single IP node (Gini index) → anti-bot challenge.
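
A minimal sketch of how "families" could be extracted from such a graph: connected components computed with union-find over user-to-entity edges. The `user:`, `device:`, and `card:` prefixes are an assumed naming convention, not something the rules above prescribe.

```python
from collections import defaultdict

def find(parent: dict, x: str) -> str:
    """Find the component root, compressing the path as we go."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def families(edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group users connected through shared devices, cards, or IPs."""
    parent: dict[str, str] = {}
    for a, b in edges:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
    groups = defaultdict(set)
    for node in list(parent):
        if node.startswith("user:"):
            groups[find(parent, node)].add(node)
    # Only components with more than one user count as a family.
    return [g for g in groups.values() if len(g) > 1]

edges = [("user:1", "device:A"), ("user:2", "device:A"),
         ("user:2", "card:X"), ("user:3", "card:X"), ("user:4", "device:B")]
result = families(edges)
```

Here users 1, 2, and 3 end up in one family through a shared device and a shared card, while user 4 stays alone. Component size and creation time can then feed rules like the ones listed above.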

Rules (deterministic) and scoring (ML)

Characteristics of the hybrid approach

Rules: fast and explainable (KYC/compliance, outright blocks).
ML: catches "gray areas" and new patterns; run it in shadow mode before letting it trigger actions.
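
Shadow mode can be sketched as follows: the model is scored and logged on every request, but only the rule decision is enforced until the flag is flipped. The function name `decide`, the 0.7 threshold, and the "stricter decision wins" policy are illustrative assumptions.

```python
import logging

log = logging.getLogger("antifraud.shadow")

def decide(features: dict, rule_decision, model_score, shadow: bool = True) -> str:
    """Return the enforced action; the model only logs while in shadow mode."""
    enforced = rule_decision(features)
    score = model_score(features)
    model_action = "challenge" if score >= 0.7 else "allow"  # assumed threshold
    if shadow:
        log.info("shadow rule=%s model=%s score=%.2f", enforced, model_action, score)
        return enforced
    # Once enabled, the model can only tighten an "allow" from the rules.
    return model_action if enforced == "allow" else enforced

rules_allow = lambda f: "allow"
risky_model = lambda f: 0.9
in_shadow = decide({}, rules_allow, risky_model, shadow=True)
enabled = decide({}, rules_allow, risky_model, shadow=False)
```

Comparing the logged rule and model decisions offline shows how many extra challenges the model would have caused before any user sees one.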

Typical rules (example pseudocode)

```yaml
# Pseudocode: every entry under `when` is an expression; all must hold for the rule to fire.
- id: ATO_LoginBurst
  when:
    - path == "/login"
    - failures_last_10m_by_ip > 20
    - distinct_accounts_last_10m_by_ip > 5
  action: challenge_mfa

- id: Bonus_MultiAccount
  when:
    - promo_code == "WELCOME100"
    - devices_shared_with_accounts >= 2
    - first_deposit_time_delta < 10m
  action: freeze_bonus_and_review

- id: Payment_CardTesting
  when:
    - card_decline_rate_30m_by_ip > 0.6
    - unique_cards_attempted_30m_by_ip > 5
  action: block_24h_and_notify
```
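
To show how rules of this shape could be evaluated, here is a small sketch in which each condition is a `(metric, operator, threshold)` triple. The in-code representation and the `evaluate` function are assumptions made for illustration; the rule IDs, metrics, and thresholds are taken from the examples above.

```python
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "==": operator.eq}

RULES = [
    {"id": "ATO_LoginBurst",
     "when": [("path", "==", "/login"),
              ("failures_last_10m_by_ip", ">", 20),
              ("distinct_accounts_last_10m_by_ip", ">", 5)],
     "action": "challenge_mfa"},
    {"id": "Payment_CardTesting",
     "when": [("card_decline_rate_30m_by_ip", ">", 0.6),
              ("unique_cards_attempted_30m_by_ip", ">", 5)],
     "action": "block_24h_and_notify"},
]

def evaluate(event: dict, rules=RULES) -> list[str]:
    """Return the actions of all rules whose conditions all hold for the event."""
    actions = []
    for rule in rules:
        # A missing metric means the condition is not met (fail closed per rule).
        if all(name in event and OPS[op](event[name], value)
               for name, op, value in rule["when"]):
            actions.append(rule["action"])
    return actions

burst = {"path": "/login", "failures_last_10m_by_ip": 25,
         "distinct_accounts_last_10m_by_ip": 8}
fired = evaluate(burst)
```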

ML features (with examples)

Time: frequencies/intervals, seasonality by hour/day.
Categorical: ASN, country, device, browser.
Graph: node degree, clustering coefficient, IP node/device pagerank.
Technical: session length, entropy of input data, rarity of click sequences.
Financial: average transaction amount, variance, time-to-withdrawal, share of declined payments.
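
As one concrete feature from the list, the entropy of input data can be computed as Shannon entropy over observed values (for example, a sequence of click targets or binned keystroke intervals). This is a minimal sketch; how the raw signal is binned is an assumption left to the caller.

```python
import math
from collections import Counter

def shannon_entropy(values) -> float:
    """Entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

scripted = shannon_entropy("aaaaaaaa")   # one repeated action
varied = shannon_entropy("abcdabcd")     # four equally likely actions
```

Very low entropy suggests a script repeating the same action; unusually high entropy on a normally structured form can also be a signal, so the feature is best left to the model rather than hard-coded as a threshold.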

Response orchestration

Soft: JS challenge, proof-of-work, revalidation of e-mail/phone, rate limit/quota.
Strong: MFA/JIT-KYC, temporary funds/bonus freeze, temporary ban.
Adaptive: stricter thresholds for high-risk sources (Tor/hosting ASN), grace lists for VIP/partners.
UX principles: invisible checks by default; explicit challenges only when risk is elevated.
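
A proof-of-work challenge, one of the soft responses above, can be sketched in a hashcash-like form: the client must find a nonce so that the SHA-256 hash of the challenge and nonce starts with a given number of zero bits. The challenge format `"{challenge}:{nonce}"` and the difficulty values are assumptions for illustration.

```python
import hashlib
import itertools

def leading_zero_bits(digest: bytes) -> int:
    """Count leading zero bits in a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty

def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce; expected cost is about 2**difficulty hashes."""
    for nonce in itertools.count():
        if verify(challenge, nonce, difficulty):
            return nonce

nonce = solve("session-abc", difficulty=8)
```

The asymmetry is the point: verification costs one hash, while solving costs many, so raising `difficulty` for riskier traffic makes bulk automation more expensive without an explicit challenge for the user.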

Anti-fraud for promo and gaming

Promo integration: promo limits per device/per payment instrument; promo eligibility tied to KYC status.
Multi-accounting: device/IP graphs, similarity of behavioral trajectories; "family" → reward limit/freeze.
Boosting winnings: abnormal correlation of bets between related accounts → investigation.
iGaming KPI: conversion protection (registration→deposit), Time-to-Wallet; do not "choke" legitimate players.
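
The "abnormal correlation of bets" check can be illustrated with a plain Pearson correlation between the per-round results of two linked accounts. This is a sketch only; which series to compare and what value counts as abnormal are assumptions that depend on the game.

```python
import math

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

# Hypothetical per-round net results of two accounts from the same "family".
account_a = [10.0, -20.0, 30.0, -40.0]
account_b = [-10.0, 20.0, -30.0, 40.0]
r = pearson(account_a, account_b)
```

A correlation close to -1 or +1 between accounts that already share a device or payment instrument is a reason to investigate, not proof of abuse on its own.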

Payment anti-fraud (in short)

3-D Secure/multifactor: dynamic by risk.
mTLS/signature of PSP webhooks: mandatory.
Idempotency: an idempotency key on withdrawal/deposit operations.
Payment signals: BIN/issuer, AVS/CVV results, failure rate, geo-discrepancy.
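
The webhook-signature and idempotency points can be sketched together. The HMAC-SHA256-over-raw-body scheme is an assumption (each PSP documents its own signing format), and the in-memory dictionary stands in for a durable store.

```python
import hashlib
import hmac

def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check an HMAC-SHA256 signature using a constant-time comparison."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

_processed: dict[str, str] = {}  # in-memory stand-in for a durable store

def apply_once(idempotency_key: str, operation) -> str:
    """Run `operation` at most once per key; replays return the stored result."""
    if idempotency_key not in _processed:
        _processed[idempotency_key] = operation()
    return _processed[idempotency_key]

secret = b"shared-secret"           # hypothetical value
body = b'{"amount": 100}'
signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
executions = []
first = apply_once("dep-1", lambda: executions.append(1) or "credited")
replay = apply_once("dep-1", lambda: executions.append(1) or "credited")
```

With this shape a redelivered webhook cannot credit the same deposit twice, and a tampered body fails verification before any business logic runs.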

Data, feature store, aggregation windows

Online aggregates (low-latency): 1/5/15 minutes for velocity, uniqueness, failures.
Near-real-time: 1-24 hours for promo and bonus logic.
Offline features: 7-90 days to train models.
Data quality: event deduplication, protection against redelivery, schema validation.
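
Event deduplication and redelivery protection can be sketched as a seen-ID set with a time-to-live. The class name `Deduplicator` and the linear eviction are simplifications for illustration; a shared cache with native key expiry would be the more likely production choice.

```python
class Deduplicator:
    """Drops events whose ID was already seen within `ttl_s` seconds."""

    def __init__(self, ttl_s: float):
        self.ttl_s = ttl_s
        self._seen: dict[str, float] = {}

    def is_new(self, event_id: str, ts: float) -> bool:
        # Evict expired IDs so memory stays bounded.
        expired = [k for k, t in self._seen.items() if ts - t > self.ttl_s]
        for k in expired:
            del self._seen[k]
        if event_id in self._seen:
            return False
        self._seen[event_id] = ts
        return True

dedup = Deduplicator(ttl_s=60)
results = [dedup.is_new("evt-1", 0), dedup.is_new("evt-1", 10), dedup.is_new("evt-1", 100)]
```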

Observability, SLO and quality metrics

Technical SLI/SLO:

p95 anti-fraud decision latency ≤ 50 ms on critical paths (login, deposits).
Scoring engine availability ≥ 99.95%/month.
Share of "incognito" events (arriving without features) ≤ 0.1%.
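
To make these targets tangible, the sketch below computes a nearest-rank p95 and converts an availability target into a downtime budget. For 99.95% over a 30-day month the budget works out to about 21.6 minutes. The function names and the 30-day month are assumptions.

```python
def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sample."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]

def downtime_budget_minutes(availability: float, days: int = 30) -> float:
    """Allowed downtime for an availability target over `days` days."""
    return (1 - availability) * days * 24 * 60

sample = list(range(1, 101))  # 1..100 ms
latency_p95 = p95(sample)
budget = downtime_budget_minutes(0.9995)
```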

Anti-fraud quality:

TP/FP/FN for ATO/promo/payment scenarios; business cost of false positives.
Conversion impact (Δ registration→deposit, Δ checkout success).
Challenge hit rate (share of challenges that confirm the risk).
Drift monitoring (features/scores/latency).
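
The quality metrics above reduce to a few ratios. The sketch below derives precision and recall from TP/FP/FN counts and attaches a business cost to errors; the per-error costs are hypothetical inputs that each team has to estimate for itself.

```python
def quality(tp: int, fp: int, fn: int, cost_fp: float, cost_fn: float) -> dict:
    """Precision, recall, and total business cost of errors."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall,
            "cost": fp * cost_fp + fn * cost_fn}

# Example: 80 caught, 20 legitimate users blocked, 20 fraud cases missed.
report = quality(tp=80, fp=20, fn=20, cost_fp=5.0, cost_fn=50.0)
```

Tracking the cost figure next to precision and recall makes the trade-off explicit: a threshold change that improves recall may still lose money if false positives are expensive.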

Privacy and compliance

Data minimization: store only what you need; tokenize/encrypt PII.
Transparency: explainability of decisions (especially for refusals and restrictions).
GDPR/PCI DSS: data domain segmentation, role-based access only; logging of access and rule changes.
Ethics and bias: regular audits of features/thresholds for discrimination.
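
One simple way to keep PII out of the feature store is a keyed hash: the same e-mail always maps to the same token, so joins and velocity counts still work, but the raw value is not stored. This sketch is pseudonymization rather than reversible tokenization, and the normalization step and key handling are assumptions.

```python
import hashlib
import hmac

def tokenize(value: str, key: bytes) -> str:
    """Deterministic keyed token: same input and key give the same token."""
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

key = b"rotate-me"  # hypothetical; keep real keys in a secrets manager
token_a = tokenize(" User@Example.com ", key)
token_b = tokenize("user@example.com", key)
```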

Operations and Incidents

Runbooks: ATO spike, card-testing, promo storm, SDK degradation.
Feature flags: fast weakening/strengthening of rules, switching models, "kill-switch" challenges.
Drills: replay of historical attacks, "gray" campaigns, sudden feature drift.
RCA/labeling: label borderline cases and feed them back into the training dataset (active learning).

Examples of artifacts

1) SQL scoring aggregates (concept)

```sql
-- velocity of logins by IP in 10 minutes
SELECT COUNT(*) AS logins_10m
FROM auth_events
WHERE ip = :ip AND ts > now() - interval '10 minutes';

-- unique accounts by device_id in 24 hours
SELECT COUNT(DISTINCT user_id) AS accounts_24h
FROM sessions
WHERE device_id = :device_id AND ts > now() - interval '24 hours';
```

2) Rule in OPA/Rego (simplified)

```rego
package antifraud.login

import rego.v1

default action := "allow"

# All three conditions must hold for the IP to count as high risk.
high_risk_ip if {
    input.ip.asn in {"AS9009", "AS14061", "AS16509"} # example
    input.metrics.failures_10m_by_ip > 20
    input.metrics.distinct_accounts_10m_by_ip > 5
}

action := "challenge_mfa" if { high_risk_ip }
```

3) Pseudocode of challenge orchestration

```python
# score, block, challenge, throttle, and allow are placeholders for real implementations.
risk = score(features)  # 0..1
if risk >= 0.9:
    block()
elif risk >= 0.7:
    challenge("MFA")
elif risk >= 0.5:
    throttle(rate="low")
else:
    allow()
```

Common errors

Relying only on CAPTCHA: bots bypass it; a multi-factor signal stack is needed.
Long scoring delays: UX suffers and abandonment grows.
Permanent global IP/ASN bans: they cut off legitimate traffic; use TTLs and periodic review.
No graph: multi-accounts remain "invisible."
Strict rules without canaries/shadow mode: FP surge in production.
No feedback loop: models are not retrained, rules are not updated.
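
The advice about TTLs instead of permanent bans can be sketched as a ban list whose entries expire on their own. The class name `TtlBanList` and the in-memory storage are assumptions for illustration.

```python
class TtlBanList:
    """Bans that expire automatically instead of lasting forever."""

    def __init__(self):
        self._until: dict[str, float] = {}

    def ban(self, key: str, now: float, ttl_s: float) -> None:
        self._until[key] = now + ttl_s

    def is_banned(self, key: str, now: float) -> bool:
        until = self._until.get(key)
        if until is None:
            return False
        if now >= until:
            del self._until[key]  # expired: lift the ban
            return False
        return True

bans = TtlBanList()
bans.ban("AS64500", now=0, ttl_s=24 * 3600)  # AS64500 is a documentation ASN
during = bans.is_banned("AS64500", now=3600)
after = bans.is_banned("AS64500", now=25 * 3600)
```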

Implementation Roadmap

1. Inventory of risk paths: registration, login, promo, deposits/withdrawals.
2. Signal and SDK collection: front-end JS/mobile, network, server events; a unified schema.
3. Online feature store: 1/5/15/60-minute windows; deduplication and feature SLAs.
4. Basic rule profile: velocity + anomalies + simple graph heuristics.
5. ML in shadow mode: compare ROC/PR, evaluate business effect, enable gradually.
6. Graph analysis: family clustering, auto-labeling with manual confirmation.
7. Response orchestration: matrix (risk × scenario → action), A/B control of UX impact.
8. Observability and SLO: quality and technical dashboards, alerting, post-incident test case pools.
9. Privacy/compliance: PII minimization, tokenization, role access, reporting.

Result

A strong anti-fraud system is a multi-layered, adaptive loop in which sensor and behavioral signals become features, decisions are made by a hybrid of rules and ML, and the connection graph reveals families of abuse. Add real-time response orchestration, observability with SLOs, and privacy controls, and you can balance security, UX, and business metrics even under pressure from well-organized bots and fraud networks.