Common network benchmarks
1) Why we need common benchmarks
Disparate metrics produce disparate results and disputes over "honesty". Common benchmarks are standardized scenarios, loads, measurement techniques, and report formats that make it possible to:
- compare domains/nodes/providers against a single SLO;
- manage network settings (rates, quotas, limits) based on facts;
- identify regressions before they cause product incidents;
- make incentives (bonuses/penalties) and trust transparent.
2) Taxonomy of metrics
2.1 Performance
Latency: p50/p95/p99, tails, cold-start.
Throughput: msgs/s, tx/s, GB/s (DA/storage), RPS (API).
Availability: SLO success rate, share of timeouts/retries.
Ordering & Exactly-Once: out-of-order %, duplicate ratio.
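As a minimal sketch of how the performance metrics above could be computed from raw samples, assuming the nearest-rank percentile definition and assuming that each delivery carries a message id and a per-key sequence number (the function names are illustrative, not part of any existing tool):

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def duplicate_ratio(message_ids: Sequence[str]) -> float:
    """Share of deliveries that repeat an already delivered message id."""
    if not message_ids:
        return 0.0
    return 1 - len(set(message_ids)) / len(message_ids)


def out_of_order_pct(sequence_numbers: Sequence[int]) -> float:
    """Percentage of deliveries that arrive after a higher sequence number was already seen."""
    if not sequence_numbers:
        return 0.0
    late = 0
    highest = sequence_numbers[0]
    for seq in sequence_numbers[1:]:
        if seq < highest:
            late += 1
        else:
            highest = seq
    return 100 * late / len(sequence_numbers)
```

Other percentile definitions (for example, linear interpolation) give slightly different values on small samples, so a report should state which one it uses.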
2.2 Reliability and stability
SLA breaches per 1k events, MTBF/MTTR, QoS degradation.
Backpressure efficiency: stabilization time after a burst.
2.3 Security
Integrity and order-tampering incidents (bridge, x-domain).
Authentication/authorization quality: share of rejected requests and false admissions.
Anti-fraud signals: TPR/FPR of behavioral-pattern detection.
2.4 Economics
Cost-to-Serve per request, margin per message, revenue per DA byte.
Resource efficiency: CPU/GPU utilization, IOPS/GB, egress per request.
Fairness: "noisy neighbor" index, quota allocation.
2.5 Governance and processes
Parameter-convergence speed, share of releases without rollback,
proposal processing time, share of votes with an R-modifier.
3) Traffic profiles and QoS classes
Q4 (critical commands): small messages, strict deadlines.
Q3 (ordered flows): key-partitioning, order guarantee.
Q2 (effectively exactly-once): idempotency + dedup.
Q1 (at-least-once): telemetry, mass events.
For each class, we set reference profiles: message size, frequencies, proportion of synchronous/asynchronous calls, bursts, correlations.
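A reference profile can be captured as a small typed record. The sketch below is one possible shape, assuming hypothetical field names; the numbers are placeholders for illustration, not recommended values, and correlations between flows are left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class QoSClass(Enum):
    Q4 = "critical commands"
    Q3 = "ordered flows"
    Q2 = "effectively exactly-once"
    Q1 = "at-least-once"


@dataclass(frozen=True)
class TrafficProfile:
    qos: QoSClass
    message_bytes: int            # typical message size
    rate_per_s: float             # steady-state message frequency
    sync_share: float             # proportion of synchronous calls, 0..1
    burst_factor: float           # peak rate divided by steady rate
    deadline_ms: Optional[float]  # strict deadline, None if the class has none

    def __post_init__(self) -> None:
        if not 0.0 <= self.sync_share <= 1.0:
            raise ValueError("sync_share must be within [0, 1]")
        if self.burst_factor < 1.0:
            raise ValueError("burst_factor must be >= 1")

    @property
    def peak_rate_per_s(self) -> float:
        """Rate the generator must reach during a burst."""
        return self.rate_per_s * self.burst_factor


# Placeholder numbers for illustration only (assumptions, not reference values).
REFERENCE_PROFILES = {
    QoSClass.Q4: TrafficProfile(QoSClass.Q4, 256, 50.0, 1.0, 2.0, 200.0),
    QoSClass.Q1: TrafficProfile(QoSClass.Q1, 1024, 5000.0, 0.0, 5.0, None),
}
```

Freezing the dataclass keeps a profile immutable once a run starts, which helps when the same profile is reused across compared runs.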
4) Bench Suite
1. Messaging Core: 1→N and N→1; RPS ramp-up to saturation; measurement of p95 and duplicate ratio.
2. Low-Latency API: read/write mix, cold/warm cache, limits and degradation.
3. DA/Storage: batch publication, throughput (GB) and finality measurement.
4. X-Domain/Bridge: proofs, finality, challenge periods, losses/redeliveries.
5. ML-Inference Edge: PoP latency/throughput, degradation under overload.
6. Batch & Stream: ETL windows, consumer lags, backpressure efficiency.
7. Security & Abuse: synthetic fraud patterns, anti-fraud load, FPR/TPR.
8. Failover/Chaos: AZ/pool outage, kill switches, time to return to SLO.
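The "ramp-up to saturation" step of the Messaging Core scenario can be sketched as a step-load loop. This is an illustration under stated assumptions: `measure` is a hypothetical callback that applies an offered rate to the system under test and returns the achieved rate and p95 latency, and saturation is declared when one more step raises throughput by less than `min_gain`.

```python
from typing import Callable, List, Tuple

# (offered_rps, achieved_rps, p95_ms) for each load step
Step = Tuple[float, float, float]


def ramp_to_saturation(
    measure: Callable[[float], Tuple[float, float]],
    start_rps: float,
    step_rps: float,
    max_rps: float,
    min_gain: float = 0.05,
) -> List[Step]:
    """Raise the offered load step by step until throughput stops growing."""
    history: List[Step] = []
    offered = start_rps
    prev_achieved = None
    while offered <= max_rps:
        achieved, p95_ms = measure(offered)
        history.append((offered, achieved, p95_ms))
        # Stop once an extra step yields less than min_gain relative growth.
        if prev_achieved is not None and achieved < prev_achieved * (1 + min_gain):
            break
        prev_achieved = achieved
        offered += step_rps
    return history


def fake_system(offered_rps: float) -> Tuple[float, float]:
    """Stand-in for a real system: capacity 500 RPS, latency jumps past it."""
    achieved = min(offered_rps, 500.0)
    p95_ms = 50.0 if offered_rps <= 500.0 else 400.0
    return achieved, p95_ms
```

With `fake_system`, `ramp_to_saturation(fake_system, 100, 100, 2000)` stops at the 600 RPS step, where achieved throughput stays at 500 RPS. With a fixed step size the relative gain shrinks at high rates, so `min_gain` and `step_rps` need to be chosen together.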
5) Measurement methodology
5.1 Replicability
Fixed versions of schemas/SDK/configs; "seeded" load generators.
Warm-up ≥ N minutes; measurements in the stable phase ≥ M minutes.
Trace/span and log correlation.
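A "seeded" load generator can be as simple as a private random number generator created from the run's seed. The sketch below assumes Poisson arrivals and a small set of message sizes; both are illustrative choices, and `seeded_load` and `stable_phase` are hypothetical names.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Event = Tuple[float, int]  # (send_time_s, message_bytes)


def seeded_load(
    seed: int,
    rate_per_s: float,
    duration_s: float,
    sizes: Sequence[int] = (128, 512, 4096),
) -> Iterator[Event]:
    """Yield a send schedule with Poisson arrivals that is identical for a given seed."""
    rng = random.Random(seed)  # private RNG, independent of global random state
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        yield t, rng.choice(sizes)
        t += rng.expovariate(rate_per_s)


def stable_phase(events: Sequence[Event], warmup_s: float) -> List[Event]:
    """Drop the warm-up window so that only stable-phase events are measured."""
    return [(t, size) for t, size in events if t >= warmup_s]
```

Recording the seed in the run report, next to the schema/SDK/config versions, lets another party regenerate the same schedule on the same Python version.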
5.2 Honesty and anti-gaming
Separate the setup phase from the blind run (hidden load profile).
Hidden control tasks (to detect cache "wrappers" and special-case optimizations keyed to benchmark signatures).
Black-box test set: unexpected fields, micro-bursts, "rare" sizes.
5.3 Formulas
SuccessRate = 1 − (timeouts + errors) / requests
TailAmplification = p99 / p50
Headroom = (cap − current) / cap
Cost/Req = Σ (resource × rate) / successful_requests
FairnessIndex (Jain) = (Σ xᵢ)² / (n · Σ xᵢ²), computed over the quota/bandwidth shares xᵢ of n tenants.
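The formulas above translate directly into code. In this sketch the cost term is read as resource usage multiplied by a per-unit rate, which is an assumption about the intended meaning, and the Jain index uses its standard definition (Σx)² / (n · Σx²); the resource names in the usage example are made up.

```python
from typing import Mapping, Sequence


def success_rate(requests: int, timeouts: int, errors: int) -> float:
    return 1 - (timeouts + errors) / requests


def tail_amplification(p99: float, p50: float) -> float:
    return p99 / p50


def headroom(cap: float, current: float) -> float:
    return (cap - current) / cap


def cost_per_request(
    usage: Mapping[str, float],   # consumed units per resource
    rates: Mapping[str, float],   # price per unit of each resource (assumed meaning)
    successful_requests: int,
) -> float:
    total = sum(usage[name] * rates[name] for name in usage)
    return total / successful_requests


def jain_index(allocations: Sequence[float]) -> float:
    """1.0 means a perfectly even split; 1/n means one tenant holds everything."""
    n = len(allocations)
    squares = sum(x * x for x in allocations)
    if n == 0 or squares == 0:
        raise ValueError("need at least one non-zero allocation")
    return sum(allocations) ** 2 / (n * squares)
```

For example, `jain_index([10, 10, 10, 10])` is 1.0, while `jain_index([40, 0, 0, 0])` is 0.25, the minimum for four tenants.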
6) SLO and reference targets (benchmarks)
Q4 API: p95 ≤ 200 ms, success ≥ 99.99%, errors ≤ 1/10⁴.
Messaging Q3: order violations ≤ 10⁻⁶ per message, p95 ≤ 500 ms.
DA publications: finality ≤ 3 × T_block, throughput ≥ X GB/h.
Bridge: false confirmations = 0; MTTR for anomalies ≤ 1 h.
Stream: lag ≤ 2 × window; drop = 0 for critical topics.
Batch: window jobs fit into T_window with a margin ≥ 20%.
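Targets like these can be checked mechanically to produce pass/fail rows with a delta to the reference. The sketch below encodes only the two fully numeric Q4 API targets from the list above; the `Target` record and `slo_table` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Target:
    metric: str
    limit: float
    is_upper_bound: bool  # True: measured <= limit passes; False: measured >= limit passes


# Q4 API targets as stated in the list above.
Q4_API_TARGETS = [
    Target("p95_ms", 200.0, True),
    Target("success_pct", 99.99, False),
]


def slo_table(targets: List[Target], measured: Dict[str, float]) -> List[dict]:
    """Build one pass/fail row per target, with the signed delta to the reference."""
    rows = []
    for t in targets:
        value = measured[t.metric]
        passed = value <= t.limit if t.is_upper_bound else value >= t.limit
        rows.append({
            "metric": t.metric,
            "measured": value,
            "target": t.limit,
            "delta": round(value - t.limit, 6),
            "status": "pass" if passed else "fail",
        })
    return rows
```

A run with p95 = 180 ms and 99.95% success passes the latency target with a delta of -20 ms and fails the success target with a delta of -0.04 percentage points.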
7) Artifacts and report format
Run passport: versions, configs, date/time, geo.
Graphs: latency (pXX), throughput, lags, resource utilization.
SLO mapping tables: pass/fail + delta to reference.
Major regressions: list with RCA and fix plan.
Economics: Cost-to-Serve, margin per message, hotspot nodes.
Conclusion: "Ready for release/Tuning needed/Blocker" status.
8) Relationship with tariffs and limits
If TailAmplification grows → automatically lower quotas or raise prices for "noisy" tenants.
Nodes with SLA breaches lose their share of rewards (slashing) until they recover.
Domains with stable quality receive a reduced take-rate (quality bonus).
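One way to wire the first two rules into automation is sketched below. Every threshold here is a hypothetical placeholder (`growth_tolerance`, `cut`, `min_quota`), and full forfeiture of rewards during a breach is an assumption; the text only says the share is lost until recovery.

```python
def next_quota(
    quota: float,
    tail_amp_before: float,
    tail_amp_now: float,
    growth_tolerance: float = 1.2,  # assumed: growth beyond 20% counts as "grows"
    cut: float = 0.8,               # assumed: quota is reduced by 20% per step
    min_quota: float = 1.0,         # assumed floor so a tenant is never cut to zero
) -> float:
    """Lower the tenant's quota when TailAmplification grows past the tolerance."""
    if tail_amp_now > tail_amp_before * growth_tolerance:
        return max(min_quota, quota * cut)
    return quota


def reward_share(base_share: float, sla_breached: bool) -> float:
    """Assumed policy: a node earns nothing while its SLA breach is unresolved."""
    return 0.0 if sla_breached else base_share
```

For instance, a tenant whose TailAmplification goes from 4.0 to 6.0 has a quota of 1000 cut to 800, while a move from 4.0 to 4.5 stays within tolerance and leaves the quota unchanged.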
9) Observability of benchmarks
End-to-end tracing of all benchmark requests.
DLQ/Replay for failed events and idempotence confirmation.
Dashboards: BenchRun Live, Tail Heatmap, Backpressure Monitor, Bridge Risk, DA Throughput.
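Idempotence confirmation after a DLQ replay can be illustrated with a toy consumer: replaying the same quarantined events twice must not change state the second time. `IdempotentLedger` and `replay` are hypothetical names, and the in-memory set stands in for whatever dedup store a real consumer would use.

```python
from typing import Iterable, Set, Tuple


class IdempotentLedger:
    """Toy consumer that applies each event id at most once."""

    def __init__(self) -> None:
        self.balance = 0
        self._seen: Set[str] = set()

    def apply(self, event_id: str, amount: int) -> bool:
        """Return True if the event changed state, False if it was a duplicate."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        self.balance += amount
        return True


def replay(dlq: Iterable[Tuple[str, int]], ledger: IdempotentLedger) -> int:
    """Re-apply quarantined events and return how many of them changed state."""
    return sum(1 for event_id, amount in dlq if ledger.apply(event_id, amount))
```

A benchmark can assert that a second replay of the same DLQ contents returns 0 and leaves the balance untouched.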
10) Governance processes
Pre-release gate: a release is allowed only when 'SLO_pass >= target threshold' and there are no security blockers.
Change Impact: every significant configuration/version change passes a short smoke bench.
Sunset-SLO: temporarily increased requirements for pilots; auto-rollback by date.
R-modifier of votes: in disputes about the metric, participants with a high R-reputation for quality have more weight.
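The pre-release gate reduces to a single boolean check. The sketch assumes that SLO_pass is the share of SLO checks with a "pass" status and that security blockers arrive as a list of open items; both readings are assumptions, as are the function names.

```python
from typing import Sequence


def slo_pass_ratio(statuses: Sequence[str]) -> float:
    """Share of SLO checks whose status is 'pass' (assumed meaning of SLO_pass)."""
    if not statuses:
        raise ValueError("no SLO checks were run")
    return sum(1 for s in statuses if s == "pass") / len(statuses)


def release_allowed(
    statuses: Sequence[str],
    target_threshold: float,
    security_blockers: Sequence[str],
) -> bool:
    """Gate: enough SLO checks pass and no security blocker is open."""
    return slo_pass_ratio(statuses) >= target_threshold and len(security_blockers) == 0
```

Raising an error on an empty result set is deliberate: a run that measured nothing should not pass the gate by default.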
11) Benchmark launch playbook
1. Requirements gathering: critical-path maps, QoS classes, business SLOs.
2. Profile design: message sizes, R/W mix, bursts, x-domain share.
3. Load tools: generators, data fixtures, synthetic fraud patterns.
4. Observability: tracing, metrics, policy logs, error budget.
5. Benchmark targets: SLOs, economic thresholds, fairness corridors.
6. Pilot run: calibration, bottleneck detection, fix.
7. Regular cadence: nightly/weekly benches + reporting to treasury/governance.
8. Incidents: chaos add-ons, post-mortems, test updates.
12) Anti-gaming and measurement ethics
Prohibition of "special optimizations for the bench signature" without improving real production traffic.
Blind loads, random "noise" parameters, control events.
Public reports with methodology; arbitration committee for controversial cases.
13) Typical "red flags"
p95 is stable, but p99.9 is growing sharply → hidden resource contention.
Throughput is high, but duplicate ratio ↑ → incorrect idempotency.
Latency is good, but Cost/Req does not converge → cross-dependency/double counting.
Lag is low, but DLQ depth is growing → errors in retries/quarantine.
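Three of these red flags can be detected by comparing two consecutive runs. The sketch below assumes a hypothetical `RunStats` record and a 10% tolerance for "stable"; the Cost/Req flag is left out because it needs cost attribution data that this record does not carry.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunStats:
    p95_ms: float
    p999_ms: float
    throughput: float
    duplicate_ratio: float
    lag_s: float
    dlq_depth: int


def _growth(before: float, now: float) -> float:
    """Relative change; 0.0 when there is no baseline to compare against."""
    return (now - before) / before if before else 0.0


def red_flags(prev: RunStats, cur: RunStats, tol: float = 0.10) -> List[str]:
    """Compare the current run with the previous one and list suspicious patterns."""
    flags = []
    p95_stable = abs(_growth(prev.p95_ms, cur.p95_ms)) <= tol
    if p95_stable and _growth(prev.p999_ms, cur.p999_ms) > tol:
        flags.append("p99.9 grows while p95 is stable: possible resource contention")
    if cur.throughput >= prev.throughput and cur.duplicate_ratio > prev.duplicate_ratio:
        flags.append("duplicate ratio rises at high throughput: check idempotency")
    if cur.lag_s <= prev.lag_s and cur.dlq_depth > prev.dlq_depth:
        flags.append("DLQ depth grows despite low lag: check retries/quarantine")
    return flags
```

The output is a list of hints for a human reviewer, not a verdict: each flag still needs an RCA before it is treated as a regression.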
14) Benchmarking program KPIs
Coverage: share of critical paths with regular benches ≥ X%.
Timeliness: report published ≤ Y hours after the run.
Quality: number of regressions caught before an incident; mean delta to SLO after a fix.
Economics: reduction in Cost-to-Serve per request and in "noisy neighbor" cases.
Governance: speed of response to bench regressions; transparency of public reports.
15) Delivery checklist
- Fixed load profiles and QoS classes
- Configured Trace, Metrics, DLQ/Replay
- SLOs/thresholds and fairness corridors defined
- Anti-gaming protection and blind tests enabled
- Report format and release gate process described
- Regular (nightly/weekly) runs
- Integrated chaos/failover unit
- Public post-mortems and performance test improvement
16) Glossary
Bench Suite: a set of reference scenarios and load profiles.
TailAmplification: p99/p50 ratio (tail strength).
FairnessIndex (Jain): a metric of how evenly resources are distributed.
DLQ/Replay: quarantine and reprocessing of events.
SLO/SLA: target service levels/contractual guarantees.
Blind-run: a run with a hidden load profile, used to counter gaming.
Bottom line: common benchmarks turn network performance and stability into manageable parameters, linking technology, governance, and economics. Standardized scenarios, transparent reports, and anti-gaming policies ensure comparability of results, member trust, and ecosystem evolution without guesswork or "magic".