Shared Computing Resources
1) What are "shared computing resources"
Shared computing resources (OVR) are a logically single pool of CPU/GPU/memory/storage/network/DA (data availability) capacity provided to multiple roles (developers, node operators, data/content providers, analysts, ML teams) through standardized interfaces, policies and incentive economics. The goal is to increase utilization, reduce costs and provide predictable performance in multi-tenant and cross-chain scenarios.
2) Resource taxonomy
Compute: CPU (general purpose), GPU (training/inference), NPU/TPU (ML accelerators).
Memory and storage: RAM, local NVMe, object/block storage, caches (Redis/KeyDB).
Network: bandwidth, egress/ingress, QoS classes, private channels.
Data and DA: quotas for publications, replications, snapshots and storage of evidence.
Service limits: number of pods/containers, open file descriptors, GPU partitioning (MIG).
3) Workload types
Online/low latency: API, matchmaking, game/fintech circuits, cross-chain messaging.
Streaming/real-time: event processing, anti-fraud, telemetry, real-time analytics.
Batch: ETL/ELT, reporting, periodic calculations, preparation of features.
ML/AI: training (GPU-intensive), inference (low latency/high throughput).
Storage and caches: OLTP/OLAP, lakehouse, CDN/edge cache.
SLOs, priorities, isolation and tariffs are set for each class.
4) Orchestration and planning
Scheduling by priority and QoS class: EDF/LLF for deadline-driven jobs, priority queues, guaranteed minimums.
Resource requests: 'requests/limits' for CPU/Memory, GPU quotas and shares, preemptible/spot pools for savings.
Noise mitigation: cgroup limits against "noisy neighbor" effects, NUMA pinning, network policies.
Topology and locality: data and computation co-location, affinity/anti-affinity, edge binding.
Autoscaling: horizontal (HPA), vertical (VPA), cluster (CA), autopilot for GPU/DA batches.
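The scheduling ideas above can be sketched minimally. This is a hypothetical illustration (the `Job` fields and class name are not from the source): an Earliest-Deadline-First ordering with a priority tie-break, built on a heap.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Job:
    deadline: float           # EDF: the earliest deadline is scheduled first
    priority: int             # lower = more important; breaks deadline ties
    name: str = field(compare=False)

def edf_order(jobs):
    """Return job names in Earliest-Deadline-First order."""
    heap = list(jobs)         # copy so the caller's list is untouched
    heapq.heapify(heap)
    return [heapq.heappop(heap).name for _ in range(len(heap))]

jobs = [
    Job(deadline=30.0, priority=2, name="batch-report"),
    Job(deadline=5.0,  priority=1, name="api-inference"),
    Job(deadline=5.0,  priority=0, name="fraud-stream"),
]
print(edf_order(jobs))  # ['fraud-stream', 'api-inference', 'batch-report']
```

A production scheduler would add preemption, quota checks and locality constraints on top of this ordering.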
5) Multi-tenancy and isolation
Tiers: namespace → project → org (budgets/quotas/ACLs).
Isolation: containers, VM, sandboxes (gVisor/Firecracker), network (VPC/NetworkPolicy), storage (CSI policies).
Noise reduction policies: IOPS/egress limits, fair-share planning, dedicated tiers for critical services.
Error/resource budgets: per-tenant error budget and resource budget with auto-degradation.
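The per-tenant budget with auto-degradation could look like the following sketch (names and the degradation target are hypothetical, assuming degradation means dropping to the best-effort class rather than hard-failing):

```python
class TenantBudget:
    """Per-tenant resource budget: once spend exceeds the budget,
    the tenant is auto-degraded to best-effort instead of being rejected."""
    def __init__(self, budget_units: float):
        self.budget = budget_units
        self.spent = 0.0

    def charge(self, units: float) -> str:
        self.spent += units
        # auto-degradation: over-budget traffic falls to Q0 (best effort)
        return "Q0-best-effort" if self.spent > self.budget else "normal"

t = TenantBudget(budget_units=100.0)
print(t.charge(60.0))   # normal
print(t.charge(50.0))   # Q0-best-effort (110 > 100)
```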
6) QoS, Prioritization and SLO/SLA
QoS classes: Q4 (critical real-time), Q3 (ordered), Q2 (exactly-once-effective), Q1 (at-least-once), Q0 (best effort).
SLO examples: API p95 latency ≤ 200 ms (Q4), GPU queue wait ≤ 2 minutes (Q3), batch completion within window T ≤ 30 minutes (Q1).
QoS→resources contract: each class is assigned guaranteed quotas and emergency stop switches.
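One way to express such a QoS→resources contract in code (the shares and thresholds below are invented for illustration, not taken from the source):

```python
# Hypothetical contract: each QoS class gets a guaranteed quota share and
# a cluster-utilization threshold at which its traffic is throttled.
QOS_CONTRACT = {
    "Q4": {"guaranteed_share": 0.40, "throttle_at_util": 0.95},
    "Q3": {"guaranteed_share": 0.25, "throttle_at_util": 0.90},
    "Q2": {"guaranteed_share": 0.15, "throttle_at_util": 0.85},
    "Q1": {"guaranteed_share": 0.10, "throttle_at_util": 0.80},
    "Q0": {"guaranteed_share": 0.00, "throttle_at_util": 0.70},
}

def should_throttle(qos: str, cluster_util: float) -> bool:
    """Lower classes are throttled first as cluster utilization rises."""
    return cluster_util >= QOS_CONTRACT[qos]["throttle_at_util"]

# At 85% utilization, best-effort traffic is shed while Q4 keeps running.
print(should_throttle("Q0", 0.85))  # True
print(should_throttle("Q4", 0.85))  # False
```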
7) Economics and monetization (billing/incentives)
Charging units: vCPU-seconds, RAM GiB-hours, GPU-minutes, storage GB-months, egress GB, DA bytes/publications.
Tariff plans: pay-as-you-go, subscriptions with quotas and overage, reservations (commit), spot/preemptible with discounts.
RevShare for hardware providers/data centers: share of revenue, SLA bonuses/penalties.
Capacity marketplace: node/cluster listings, quality ratings, GPU slot auctions.
- U-token - payment of quotas/limits, discounts.
- S-token - pledges for SLA nodes/pools (slashing for downtime/violations).
- R-token - reputation of the provider/tenant (price/priority modifier).
- RNFT contracts - individual "resource↔obligation" agreements (limits, price, term, KPIs, exit terms).
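Metering against these charging units reduces to usage × rate with a token discount applied. A minimal sketch, assuming hypothetical unit rates (real tariffs would come from the billing service):

```python
# Hypothetical unit rates per charging unit; illustrative only.
RATES = {
    "vcpu_sec": 0.00001,     # per vCPU-second
    "ram_gib_hour": 0.005,   # per RAM GiB-hour
    "gpu_min": 0.05,         # per GPU-minute
    "egress_gb": 0.08,       # per GB of egress
}

def invoice(usage: dict, discount: float = 0.0) -> float:
    """Sum metered usage x unit rate, then apply a U-token discount."""
    total = sum(usage[k] * RATES[k] for k in usage)
    return round(total * (1.0 - discount), 6)

usage = {"vcpu_sec": 3_600_000, "ram_gib_hour": 200, "gpu_min": 120, "egress_gb": 50}
# 36 + 1 + 6 + 4 = 47; with a 10% U-token discount -> 42.3
print(invoice(usage, discount=0.10))  # 42.3
```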
8) Kernel contracts and services
Resource Registry: resource types, machine/GPU classes, accessible zones/edge-POP.
Quota Manager: quotas/limits per tenant/project, budget egress/IOPS/DA.
Scheduler/Placement: pods/jobs/pools, priorities, locality, anti-noise.
Billing & Metering: unit meters, tariffs, overspending, budget alerts.
Rewards Router: distribution of payments to providers, penalties for SLA breaches.
Compliance Gate: regions, personal data (PII), age/CCM restrictions, export reports.
Observability Hub: metrics/traces/logs, DLQ for jobs, replays.
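The Quota Manager's check-before-placement role can be sketched as follows (a hypothetical in-memory model; the real service would be a persistent, multi-domain contract):

```python
class QuotaManager:
    """Minimal sketch: per-tenant quotas checked before the scheduler places work."""
    def __init__(self):
        self.quota = {}   # tenant -> {resource: limit}
        self.used = {}    # tenant -> {resource: consumed}

    def set_quota(self, tenant, resource, limit):
        self.quota.setdefault(tenant, {})[resource] = limit
        self.used.setdefault(tenant, {}).setdefault(resource, 0)

    def try_allocate(self, tenant, resource, amount) -> bool:
        used = self.used[tenant][resource]
        if used + amount > self.quota[tenant][resource]:
            return False          # over quota: scheduler must queue or reject
        self.used[tenant][resource] = used + amount
        return True

qm = QuotaManager()
qm.set_quota("team-a", "gpu", 4)
print(qm.try_allocate("team-a", "gpu", 3))  # True
print(qm.try_allocate("team-a", "gpu", 2))  # False (3 + 2 > 4)
```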
9) Safety and compliance
Authentication/authorization: mTLS/OIDC, ABAC/RBAC, "least privileges."
Network segmentation: VPC, private links, service mesh with traffic policies.
Data: at-rest/in-transit encryption, key rotation, masking/dummy data for tests.
GPU/CPU isolation: disabling direct access, DMA/IOMMU control, side-channel protection.
Compliance: audit logs, regional data localization, retention/deletion policies, ZK proofs for audits without disclosure.
10) Observability and performance management
Metrics: CPU %, GPU utilization, RAM/cache hit rate, disk IOPS/throughput, network p95 RTT/egress, GPU queue and batch lag.
SLO/SLA-dashboards: "health" by QoS classes and tenants, error budgets.
Profiling: flamegraph snapshots, hot-path analysis, automatic right-sizing recommendations.
Alerts: lag thresholds exceeded, GPU queue buildup, egress spikes, "noisy neighbor" flags.
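A p95-based SLO check, as used in the dashboards and alerts above, can be computed with the nearest-rank method. A self-contained sketch (function names are illustrative; the 200 ms threshold matches the Q4 SLO example earlier in this document):

```python
import math

def p95(samples):
    """p95 via the nearest-rank method on sorted samples."""
    s = sorted(samples)
    rank = math.ceil(0.95 * len(s)) - 1   # 0-indexed nearest rank
    return s[rank]

def check_slo(latencies_ms, slo_ms=200):
    """Return True (alert) when p95 latency breaches the SLO."""
    return p95(latencies_ms) > slo_ms

lat = [50] * 90 + [180] * 5 + [250] * 5   # 5% of requests are slow
print(p95(lat))          # 180 -- still within the 200 ms SLO
print(check_slo(lat))    # False
```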
11) Anti-fraud and abuse
Sybil/bot load: S-pledges, R-reputation, behavioral signatures.
Egress abuse/network scanning: rate limits/IDS, quarantine segments.
Farming spot discounts: anti-arbitrage policies, cooldowns, limits on "hopping" between pools.
Dishonest providers: verification of declared specs, synthetic probes, slashing and RNFT blacklists.
12) Inter-chain scenarios (multi-chain/edge)
Transfer of access rights: RNFT rights and quotas are transferred via cross-chain messaging; reputation (R) remains in the trust domain.
DA quotas and publications: charging per byte/frequency, finality/time locks.
Edge computing: POP nodes with local buffers, pushing inference closer to the user.
Cross-domain dedup and idempotency: global 'x_job_id', seen-tables at the endpoints, challenge periods.
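The seen-table idea above reduces to an at-most-once guard keyed by the global job id. A minimal sketch (class and method names are hypothetical; a real seen-table would be persistent and scoped by challenge period):

```python
class DedupLedger:
    """Seen-table at a receiving domain: a job identified by a global
    x_job_id runs at most once even if the message is replayed."""
    def __init__(self):
        self.seen = set()

    def process(self, x_job_id: str, handler) -> bool:
        if x_job_id in self.seen:
            return False          # duplicate delivery: skip side effects
        self.seen.add(x_job_id)
        handler()
        return True

runs = []
ledger = DedupLedger()
ledger.process("chainA:job-42", lambda: runs.append(1))
ledger.process("chainA:job-42", lambda: runs.append(1))  # replay, ignored
print(runs)  # [1]
```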
13) Capacity Planning and Sustainability
Capacity planning: consumption trends, seasonality, N weeks of headroom, p95 "red lines."
Game-days and stress tests: GPU/egress/DA overload, AZ/POP shutdown, degradation scenarios.
Degradation by design: graceful fallback (less accurate models/cache), Q4/Q3 priorities.
Green efficiency: recycling, carbon-aware scheduling, cooling/energy costs, shifting batch jobs to green windows.
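Shifting batch work into green windows can be sketched as picking the lowest-carbon hour that still meets the deadline (the forecast values and function are invented for illustration):

```python
def pick_green_window(carbon_intensity, deadline_hour):
    """Choose the hour with the lowest grid carbon intensity (gCO2/kWh)
    among hours that still meet the batch deadline."""
    candidates = list(enumerate(carbon_intensity[:deadline_hour + 1]))
    return min(candidates, key=lambda hc: hc[1])[0]

# Hypothetical 8-hour forecast; the batch must start by hour 6.
forecast = [420, 390, 300, 180, 210, 350, 400, 410]
print(pick_green_window(forecast, deadline_hour=6))  # 3 (180 gCO2/kWh)
```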
14) Metrics and KPIs of the OVR ecosystem
Utilization: CPU/GPU busy %, RAM/cache hit rate, IOPS/GB of storage used.
Efficiency: cost-to-serve per request, spot utilization, margin per GPU-minute.
Quality: p50/p95 latency by class, SLA breaches per 1000 requests, job queue/start time.
Fairness: "noisy neighbor" index, share of incidents by tenant, allocation of quotas.
Economics: income per resource-unit, NRR/GRR by plan, share of recurring revenue.
Safety: frequency of isolations, egress anomalies, reputational slash events.
15) Governance of resources
Parameter-proposals: change in tariffs/quotas/corridors through voting.
R-modifier: reputation limits the influence of "raw capital" in sensitive changes.
Sunset clauses: temporary promotions/discounts with auto-rollback.
Public reporting: quarterly reports of the OVR treasury, SLA audit.
16) Launch playbook
1. Mapping needs. Task classes, SLO, data locality.
2. Pools design. Machine classes, GPU tiers, storage/network levels, edge-POP.
3. Policies and quotas. QoS classes, budgets, egress/IOPS/DA limits.
4. Economics. Tariffs, spot/reserves, incentives to providers, RNFT contracts.
5. Safety and compliance. mTLS/OIDC, encryption, audit logs, geo-policies.
6. Observability. KPI/SLO dashboards, alerts, profiling.
7. Pilot and scaling. One task class (for example, inference) → expand to batch/streaming.
8. Incidents and post-mortems. Game-days, replays, policy/tariff adjustments.
17) Delivery checklist
- QoS/SLOs defined for all task types
- Quotas/limits and fair-share planning included
- Configured spot/preemptible pools and anti-arb policies
- Implemented RNFT contracts, billing and Rewards Router
- Isolation, encryption and compliance reporting provided
- Utilization/quality/economics dashboards available
- Incident handling rehearsed: emergency stops, degradation, post-mortems
- Multi-chain rights transfers, DA quotas and edge distribution configured
18) Glossary
OVR (shared computing resources): A single pool of capacity for an ecosystem.
RNFT: a "relationship" contract for rights to resources/limits/terms.
S-token: collateral against SLA/provider/node liability.
R-token: non-transferable reputation for quality/reliability.
DA: data availability layer (publication/storage of evidence).
Spot/Preemptible: cheap but interruptible resources with renewal policies.
Bottom line: shared computing resources turn the ecosystem into a self-balancing computing factory, where utilization is high, quality is predictable, incentives are aligned, and security and compliance are built into the protocol. Proper orchestration, governance and economics allow multi-tenant workloads to scale without losing performance and trust.