Data flows between nodes
(Section: Ecosystem and Network)
1) Essence and goals
Data flows between nodes are managed channels of events, states, and artifacts between ecosystem roles (validators/readers/indexers/bridges/gateways/storages/analytics). Objectives:- Predictability: Stable SLOs by delay/success/freshness.
- Reliability: resistance to losses, duplicates, reorgs.
- Security and compliance: encryption, signatures, residency.
- Scalability: geo-distribution, partitioning, QoS.
2) Flow taxonomy
1. Control Plane: configs, phicheflags, routing/limit policies.
2. Data Plane - event: domain events ('deposit.', 'payout.', 'bridge.').
3. Data Plane - stream: long-lived streams (gRPC/WebSocket) for signals and live metrics.
4. Batch/Backfill: downloads of historical slices, replays, snapshots.
5. Replication/anti-entropy: state sync, merclization, CRDT streams.
6. Telemetry/observability: logs/metrics/trails side-band, do not interfere with the main UX.
Each type has QoS classes and its own retray/order rules.
3) Topologies and routing
Hub-and-Spoke: regional hubs as tires; plays - role nodes.
Mesh/P2P: partial mesh for replication/gossip.
Edge-Tiered: thin edge gateways (rate-limit/cache) → thick regional clusters.
Geo-Routing: Anycast/Latency-Aware LB + residency rules.
Key - partitioning: 'partition _ key = chainId' tenant 'topic' entityId 'gives predictable order and scale.
4) Transport and formats
HTTP/2/3, gRPC/QUIC - low latency, multiplexing, keepalive.
Kafka/Pulsar/NATS - queues with persistence/parties/consumer groups.
WebSocket - push events and live feeds.
Formats: Protobuf/Avro (schemes with evolution), JSON for external APIs.
Hash addressing and Merkle receipts for integrity verification.
5) Order, delivery and finalization
Delivery model:- At-least-once (default; idempotency/deadup required).
- Exactly-once effect via Outbox/Inbox + idempotent consumer.
- Order: guaranteed within the party; inter-party order is not guaranteed.
- Finalization: statuses' observed → confirmed (K) → finalized → invalid (reorg) '; for optimistic - dispute window.
6) Idempotence and dedup
Idempotence key for events:- `idempotency_key = ${chainId}|${block}|${tx}|${logIndex}|${type}`
- Upsert by key, TTL of the deduplication window ≥ 72 hours.
- For a conflict, payload is the "source of truth" policy (priority, version, signature).
- For HTTP requests, the header is'Idempotency-Key '+ response log.
7) Queues, backpressure and quotas
Queues: parties by key; DLQ for "poisonous" messages.
Backpressure: credits/tokens, max-inflight limit, circuit-breaker.
Quotas/QoS: P0 (critical), P1 (product), P2 (bulk). Split pools/RPS limits/bytes/s/subscriptions.
Admission control: early refusal of "expensive" requests, guard by range/size.
8) Consistency and data models
Read-you-write within the party/node.
Eventual Consistency between regions/parties.
CRDT for conflict-free replication of some sets (counters, sets).
Snapshots + logs for fast bootstrap and deterministic replay.
9) Security and trust
mTLS between nodes, key pinning, rotation.
Message/webhook signatures, timestamp and anti-replay windows.
Encryption on the go/at rest; segregation of regional keys.
PII-minimization: tokenization, prohibition of personal data in labels/metrics.
10) Efficiency: packaging, compression, cache
Batching: grouping of small messages to reduce overhead.
Compression: zstd/gzip with safe dictionaries.
Cash: negative answers and "hot" directories; TTL and disability by event.
11) Data diagrams (references)
Flow/lot register
sql
CREATE TABLE streams (
name TEXT PRIMARY KEY,
partitions INT,
qos TEXT, -- P0 P1 P2 retention_days INT,
schema_version TEXT
);
CREATE TABLE offsets (
stream TEXT, partition INT, consumer_group TEXT,
offset BIGINT, updated_at TIMESTAMPTZ,
PRIMARY KEY (stream, partition, consumer_group)
);
Event log (idempotent upsert)
sql
CREATE TABLE events_core (
id UUID PRIMARY KEY,
idempotency_key TEXT UNIQUE,
ts TIMESTAMPTZ,
partition_key TEXT,
type TEXT,
payload JSONB,
status TEXT, -- observed confirmed finalized invalidated signature TEXT
);
DLQ/Quarantine
sql
CREATE TABLE dlq (
id UUID PRIMARY KEY,
stream TEXT, partition INT, offset BIGINT,
reason TEXT, payload JSONB, ts TIMESTAMPTZ
);
12) Policies (YAML)
QoS and Limits
yaml qos:
P0: { ack_timeout_ms: 2000, retries: 3, backoff_ms: [100,400,800], rps_per_org: 1500 }
P1: { ack_timeout_ms: 5000, retries: 2, rps_per_org: 800 }
P2: { best_effort: true, rps_per_org: 200 }
limits:
max_message_bytes: 1048576 max_stream_subscriptions_per_client: 20
Finalization and windows
yaml finality:
eth-mainnet: { k: 12 }
polygon: { k: 256 }
optimistic: { k: 0, challenge_minutes: 20 }
Routing/Residency
yaml routing:
prefer_local_region: true fallback: [nearest_healthy, master_hub]
residency:
eu: ["eu"]
uk: ["uk"]
13) Observability: SLI/SLO
SLI (core):- Latency p95/p99 (ingress→egress, per-stream/QoS).
- Success Rate / Drop Rate.
- Queue Lag p95 and consumer lag by party.
- Freshness p95 (ingest→consume).
- Reorg/Invalid Rate (if onchain).
- Dedup Efficiency (% of takes absorbed idempotently).
- Geo-Hit Ratio (locally serviced).
- P0 latency p95 ≤ 400 ms; Success ≥ 99. 95%; Queue-lag p95 ≤ 2 с; Freshness p95 ≤ 60 с.
- Dedup efficiency ≥ 99%; DLQ ≤ 0. 1% of traffic.
Dashboards: Streams Core/Lag & Freshness/QoS & Errors/Geo/Security (mTLS/signatures).
14) Consumer patterns
Outbox/Inbox: atomic publishing and idempotent application.
Exactly-once effect: Store the last applied key and version.
Watermarks: late data.
Idempotent Side-Effects: external queries with only key and response log.
15) Degradation modes
Finalized-only mode: we issue only finalized events.
Cache-only for reference books, freezing heavy methods.
Throttle P2 and "diet mode" for streams (reduced refresh rate).
Read-only for secondary APIs.
16) Downtime-free releases and migrations
Blue-Green/Canary by Flows and Consumers.
Schema-first: add fields only; MAJOR - parallel versions of topics.
Offset migrations: shadow consumers, lag/success comparison, switching.
17) Operating Regulations
Daily: SLO report (latency/success/lag/freshness), signature audit, DLQ check.
Weekly: revision of batches/quotas, DR test (bootstrap from snapshot), Dedup Efficiency analysis.
Monthly: chaos tests (loss/jitter, broker failure, reorg-burst), revision of finality windows.
Before release: canary ≥120 min, SLO gates, rollback plan.
18) Playbook incidents
A. Queue-Lag/Consumer-Lag explosion
1. Increase consumers/KEDA; 2) redistribute parties; 3) freeze P2 and bulk jobs; 4) analysis of "hot" keys.
B. Growth of p95 Latency P0
1. P2-throttle, P0 prioritization; 2) scale gateways/brokers; 3) cache for reference books only; 4) outlier-ejection.
C. High DLQ/dubbing
1. Check the idempotence key/TTL; 2) strengthen dedup; 3) limit the noisy producer; 4) replay after fix.
D. Drift schemes/contracts
1. Enable strict-mode (cut off invalid ones); 2) notify the producer; 3) release the adapter; 4) update linters.
E. Violation of residency/signatures
1. Export/channel unit; 2) rotation of keys/serts; 3) audit and post-mortem; 4) updating policies.
19) Implementation checklist
1. Define the stream types and the partitioning key.
2. Enable idempotence/dedup and finalization with K/dispute windows.
3. Configure queues, QoS, quotas, and backpressure.
4. Run mTLS/Signatures and Residence Policy.
5. Enter schemas/registers (streams, offsets, dlq) and SLI/SLO telemetry.
6. Organize canary/blue-green and downtime-free circuit migrations.
7. Work out degradation modes and incident playbooks.
20) Glossary
Backpressure - input load control (credits/tokens/limits).
DLQ - "dead queue" for problem messages.
CRDT - data structures with conflict resolution without coordination.
Finality - irreversibility of the event/state.
Exactly-once effect - repeat-safe result over at-least-once delivery.
Watermark - Mark processing progress for late events.
Outlier-ejection - exclusion of degraded instances from the pool.
Bottom line: data flows between nodes are not just a "queue and listener," but a systemic discipline of order, finalization, idempotency, security and observability. Standard partitioning keys, QoS/quotas, strict schemes and SLOs, together with degradation modes and playbooks, give the ecosystem stable data transmission channels at scale and under audit.