Merging data from different chains
(Section: Ecosystem and Network)
1) Why merging is needed
Cross-chain merging combines events and states from different chains, bridges and services into a single consistent data model for financial reporting, analytics, anti-fraud, observability and product scenarios. Objectives:
- A single source of truth (canonical facts) despite heterogeneous logs.
- Resilience to reorgs and delays: correct finalization and recalculation.
- Comparability of metrics across networks and assets.
- Transparent lineage and quality control for audits and regulators.
2) Data sources and classes
1. Onchain: blocks, transactions, contract logs, headers, states.
2. Bridges/relayers: requests, receipts, proofs, finalization statuses.
3. L2/DA layers: batches, publications, proofs, challenge windows.
4. PSP/KYC/KYB/AML: payment statuses, checks, sanction hits.
5. Product events: onboarding, deposits/payments, gaming and behavioral events.
6. Reference catalogs: networks, assets, decimals, chainId, addresses, SDK versions.
For each source, the owner, schema, changelog, finalization window, proof format and SLO are recorded.
3) Merge pipeline architecture
Ingest (agents/indexers/webhooks) → Raw/Bronze (immutable raw data) → Clean/Silver (normalization and dedup) → Merge/Core/Gold (canonical facts and links) → Marts (finance/product/risk/operations) → Serve (OLAP/API/search).
Key properties: idempotency, schema versioning, replay/backfill, late data handling.
4) Canonical schemas (simplified)
4.1 Events (YAML)
yaml
event:
  id: uuid
  observed_at: timestamp    # when we saw it
  event_at: timestamp       # when it happened (per the source)
  chain_id: string          # 'eth-mainnet' | 'polygon' | ...
  block_height: long
  tx_hash: string
  log_index: int
  type: string              # transfer | bridge.lock | bridge.mint | ...
  status: string            # observed | confirmed | finalized | invalid
  src: string               # address/peer-id/org_id
  dst: string
  asset: string             # canonical symbol (USDC)
  amount: decimal
  usd_value: decimal        # normalized at the rate as of observed_at
  meta: object              # gas, fee, contract, sdk_version...
  idempotency_key: string   # chainId|block|tx|logIndex|type
  proof_ref: string         # proof/anchor reference
4.2 Transfers and bridges (SQL)
sql
CREATE TABLE bridge_transfers (
id TEXT PRIMARY KEY,
src_chain TEXT, dst_chain TEXT,
asset TEXT, amount NUMERIC,
created_at TIMESTAMPTZ,
finalized_at TIMESTAMPTZ,
status TEXT, -- requested | inflight | finalized | failed | reversed
src_tx TEXT, dst_tx TEXT,
proof_ref TEXT, meta JSONB
);
4.3 Asset/network catalog (YAML)
yaml
catalog:
  assets:
    - symbol: USDC
      decimals: { eth-mainnet: 6, polygon: 6 }
      contracts: { eth-mainnet: "0xA0b8...", polygon: "0x2791..." }
  networks:
    - id: eth-mainnet
      k_confirmations: 12
    - id: polygon
      k_confirmations: 256
5) Finalization, reorgs and statuses
States: `observed → confirmed(K) → finalized → invalidated(reorg)` (+ `challenged` for optimistic L2s).
Policies:
- K confirmations per network/asset/risk.
- Delayed finalization for large amounts.
- Reorg handling: automatic invalidation and replay.
- Proof coverage: share of records with proofs/anchors ≥ target SLO.
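The status transitions above can be sketched as a small classifier. This is a minimal illustration, not the author's implementation: the text does not say what separates `confirmed` from `finalized`, so the sketch assumes a record finalizes once it has K confirmations, and that large amounts wait for a hypothetical `extra_confirmations` depth on top of K.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class FinalityRule:
    k: int                                        # confirmations required (K)
    delayed_for_usd_gt: Optional[Decimal] = None  # delayed-finalization threshold
    extra_confirmations: int = 0                  # hypothetical extra depth for large amounts


def classify(confirmations: int, usd_value: Decimal,
             rule: FinalityRule, reorged: bool = False) -> str:
    """Return the status of an event under the given finality rule."""
    if reorged:
        return "invalidated"
    if confirmations < rule.k:
        return "observed"
    is_large = (rule.delayed_for_usd_gt is not None
                and usd_value > rule.delayed_for_usd_gt)
    # Assumption: large amounts stay 'confirmed' until the extra depth is reached.
    if is_large and confirmations < rule.k + rule.extra_confirmations:
        return "confirmed"
    return "finalized"
```

For example, with `FinalityRule(k=12, delayed_for_usd_gt=Decimal("100000"), extra_confirmations=20)`, a 250,000 USD transfer at 12 confirmations is reported as `confirmed`, while a 50 USD transfer at the same depth is already `finalized`.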
6) Normalization of time and currencies
Time: all timestamps in UTC; store both `observed_at` and `event_at`.
FX/asset prices: `usd_value` is converted at the rate as of `observed_at` (or `event_at` for reporting, as defined by policy).
Decimals/scale: strict canonicalization of amounts for comparability.
Time zones in reports: resolved at query time (in the marts), not in core.
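A minimal sketch of these normalization rules, assuming raw on-chain amounts arrive as integers in the asset's smallest unit. The `DECIMALS` table mirrors the catalog in 4.3; the function names and the two-decimal USD rounding are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

# Decimals per (asset, chain), as in the catalog from section 4.3.
DECIMALS = {("USDC", "eth-mainnet"): 6, ("USDC", "polygon"): 6}


def to_canonical_amount(raw_units: int, asset: str, chain_id: str) -> Decimal:
    """Scale an integer amount in smallest units to the canonical decimal amount."""
    return Decimal(raw_units).scaleb(-DECIMALS[(asset, chain_id)])


def to_utc(ts: datetime) -> datetime:
    """Convert an aware timestamp to UTC; naive timestamps are rejected."""
    if ts.tzinfo is None:
        raise ValueError("naive timestamp: the source must declare its time zone")
    return ts.astimezone(timezone.utc)


def usd_value(amount: Decimal, rate: Decimal) -> Decimal:
    """Value in USD at the given rate, rounded to cents (assumed precision)."""
    return (amount * rate).quantize(Decimal("0.01"))
```

Using `Decimal` rather than floats keeps amounts exact, which matters when the same transfer is compared across chains.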
7) Identity and deduplication
Base deduplication key:
- `idempotency_key = chainId|block_height|tx_hash|log_index|type`
- Duplicates from multiple indexers: upsert by idempotency_key.
- On a payload conflict, the source-of-truth policy (source priority/version/time) is applied.
- The deduplication window is kept for at least 48-72 hours to catch late-arriving ("wandering") duplicates.
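The key and upsert rules above can be sketched in memory as follows. The `source` and `payload` fields are hypothetical, the priority order is taken from the merge policy config in section 14, and sending the losing record to quarantine is an assumption of this sketch rather than something the text prescribes.

```python
from typing import Dict, List

# Lower index = higher priority (order from the merge policy config).
SOURCE_PRIORITY = ["onchain", "bridge", "psp", "product"]

KEY_FIELDS = ("chain_id", "block_height", "tx_hash", "log_index", "type")


def idempotency_key(event: dict) -> str:
    """Build chainId|block_height|tx_hash|log_index|type."""
    return "|".join(str(event[field]) for field in KEY_FIELDS)


def upsert(store: Dict[str, dict], event: dict, quarantine: List[dict]) -> str:
    """Insert an event, absorb exact duplicates, resolve conflicts by source priority."""
    key = idempotency_key(event)
    current = store.get(key)
    if current is None:
        store[key] = event
        return "inserted"
    if current["payload"] == event["payload"]:
        return "duplicate"  # same fact delivered again: nothing to do
    # Payload conflict: the higher-priority source wins, the loser is quarantined.
    if SOURCE_PRIORITY.index(event["source"]) < SOURCE_PRIORITY.index(current["source"]):
        store[key] = event
        loser = current
    else:
        loser = event
    quarantine.append(loser)
    return "conflict"
```

Calling `upsert` repeatedly with the same event leaves the store unchanged, which is what makes replay and backfill safe.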
8) Entity Resolution
Addresses → actors: wallet/contract → user/organization/role.
Cross-chain links: hard-link (signature/kyc), soft-link (behavior/graph).
Pseudonymization: stable PID/ORG_ID; PII is stored by the data controller.
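One possible way to implement stable pseudonyms and hard links is sketched below. Keyed HMAC-SHA256 for the PID and a union-find structure for linking addresses are choices made for this illustration; the text does not specify either, and `stable_pid`, `ActorLinks` and the `pid_` prefix are hypothetical names.

```python
import hashlib
import hmac


def stable_pid(secret: bytes, subject_id: str) -> str:
    """Derive a stable pseudonymous ID; the secret stays with the data controller."""
    digest = hmac.new(secret, subject_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "pid_" + digest[:16]


class ActorLinks:
    """Union-find over (chain_id, address) pairs joined by hard links."""

    def __init__(self) -> None:
        self._parent = {}

    def _find(self, node):
        self._parent.setdefault(node, node)
        while self._parent[node] != node:
            # Path halving keeps lookups fast as the link graph grows.
            self._parent[node] = self._parent[self._parent[node]]
            node = self._parent[node]
        return node

    def link(self, a, b) -> None:
        """Record a hard link (for example a signature or KYC match)."""
        self._parent[self._find(a)] = self._find(b)

    def same_actor(self, a, b) -> bool:
        return self._find(a) == self._find(b)
```

Soft links (behavior/graph) would need a confidence score and should not be merged this way, since union-find makes every link transitive and irreversible.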
9) Merger rules and priorities (Policy)
1. The source of truth for whether a transfer took place is the on-chain event in status `finalized` plus its proof.
2. The source of truth for aggregates is the core tables (`transfers`, `bridge_transfers`), not the raw layer.
3. Time conflict (event_at vs observed_at): resolved by report policy (finance uses event_at; operations uses observed_at).
4. Amount/asset conflict: put the record on hold and quarantine it until the asset catalog is reconciled.
5. Bridge pairs: receipts from both sides (src/dst) and receipt matching are required.
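The rules above translate into a few small predicates. This is a sketch under assumptions: events and transfers are plain dictionaries shaped like the schemas in section 4, and the report names `finance` and `operations` are illustrative.

```python
# Which timestamp each report family uses (rule 3).
TIME_FIELD_BY_REPORT = {"finance": "event_at", "operations": "observed_at"}


def report_time(event: dict, report: str):
    """Pick the timestamp that the given report family should use."""
    return event[TIME_FIELD_BY_REPORT[report]]


def is_source_of_truth(event: dict) -> bool:
    """Rule 1: only finalized events that carry a proof count as facts."""
    return event["status"] == "finalized" and bool(event.get("proof_ref"))


def amount_conflict_action(a: dict, b: dict) -> str:
    """Rule 4: any asset or amount mismatch goes to quarantine."""
    if a["asset"] != b["asset"] or a["amount"] != b["amount"]:
        return "quarantine"
    return "accept"


def bridge_pair_complete(transfer: dict) -> bool:
    """Rule 5: both the source and destination receipts must be present."""
    return bool(transfer.get("src_tx")) and bool(transfer.get("dst_tx"))
```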
10) Pseudo-queries and algorithms
10.1 Rolling up events into a canonical "operation"
sql
-- Requires a UNIQUE constraint on core_events.idempotency_key.
-- Note: a single batch must hold at most one row per key, because
-- ON CONFLICT DO UPDATE cannot touch the same row twice in one statement.
WITH base AS (
  SELECT e.*,
         CONCAT(e.chain_id, '|', e.block_height, '|', e.tx_hash, '|', e.log_index, '|', e.type) AS idem
  FROM raw_events e
)
INSERT INTO core_events AS c (id, observed_at, event_at, chain_id, block_height,
  tx_hash, log_index, type, status, src, dst, asset, amount, usd_value, meta, idempotency_key, proof_ref)
SELECT gen_random_uuid(), observed_at, event_at, chain_id, block_height,
  tx_hash, log_index, type, status, src, dst, asset, amount, usd_value, meta, idem, proof_ref
FROM base
ON CONFLICT (idempotency_key) DO UPDATE
SET status = EXCLUDED.status,
    -- Keep existing values when the new row has none.
    usd_value = COALESCE(EXCLUDED.usd_value, c.usd_value),
    proof_ref = COALESCE(EXCLUDED.proof_ref, c.proof_ref),
    -- JSONB merge: keys from the new row override existing ones.
    meta = c.meta || EXCLUDED.meta;
10.2 Matching bridge pairs (source↔destination)
sql
-- Open an in-flight transfer for every bridge.lock event on a known route.
INSERT INTO bridge_transfers (id, src_chain, dst_chain, asset, amount, created_at, status, src_tx, proof_ref)
SELECT
  CONCAT('br:', e.tx_hash) AS id,
  e.chain_id, b.dst_chain, e.asset, e.amount, e.event_at, 'inflight', e.tx_hash, e.proof_ref
FROM core_events e
JOIN bridge_book b ON e.type = 'bridge.lock' AND e.asset = b.asset AND e.chain_id = b.src_chain
ON CONFLICT (id) DO NOTHING;
-- Close the transfer when the matching bridge.mint arrives on the destination chain.
UPDATE bridge_transfers bt
SET finalized_at = e.event_at,
    dst_tx = e.tx_hash,
    status = 'finalized'
FROM core_events e
JOIN bridge_book b ON b.asset = e.asset AND b.dst_chain = e.chain_id
WHERE e.type = 'bridge.mint'
  AND bt.status = 'inflight'
  AND bt.asset = e.asset
  AND bt.src_chain = b.src_chain
  AND bt.dst_chain = b.dst_chain
  AND abs(e.amount - bt.amount) < 1e-9;
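Matching on asset, route and amount alone is ambiguous when several identical transfers are in flight at once. The sketch below shows one way to make the choice explicit, assuming an oldest-first rule that is not stated in the text; the dictionaries mirror the `bridge_transfers` and event schemas, and `match_mint` is a hypothetical helper.

```python
from decimal import Decimal
from typing import List, Optional

TOLERANCE = Decimal("0.000000001")  # same 1e-9 tolerance as the SQL above


def match_mint(transfers: List[dict], mint: dict) -> Optional[dict]:
    """Attach a bridge.mint event to the oldest matching in-flight transfer."""
    candidates = [
        t for t in transfers
        if t["status"] == "inflight"
        and t["asset"] == mint["asset"]
        and t["dst_chain"] == mint["chain_id"]
        and abs(t["amount"] - mint["amount"]) < TOLERANCE
    ]
    if not candidates:
        return None  # no lock seen yet: keep the mint for a later retry
    transfer = min(candidates, key=lambda t: t["created_at"])  # assumed: oldest first
    transfer.update(status="finalized", dst_tx=mint["tx_hash"],
                    finalized_at=mint["event_at"])
    return transfer
```

Where the bridge exposes a transfer ID or nonce in both receipts, matching on that identifier is more reliable than any amount-based heuristic.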
10.3 Reorg handling
sql
UPDATE core_events
SET status='invalidated'
WHERE chain_id=$1 AND block_height BETWEEN $2 AND $3
AND status IN ('observed','confirmed','finalized');
-- Recompute the affected aggregates (example)
CALL recompute_materialized_views($1, $2, $3);
11) Schema management and evolution
Versioning: `schema_version` in the dataset header; migrations are logged.
Compatibility policy: `BACKWARD` for events (fields may only be added).
Data contracts with sources: contract tests in CI, schema linters.
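A contract test for the "add fields only" rule can be as small as the check below. It is a sketch that assumes a schema is represented as a flat mapping of field name to type name; real schema registries apply richer rules.

```python
from typing import Dict, List


def backward_violations(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List the changes that break the add-fields-only policy; empty means compatible."""
    problems = []
    for name, type_name in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name] != type_name:
            problems.append(f"type changed: {name} {type_name} -> {new[name]}")
    return problems
```

In CI, a non-empty result would fail the build before the new schema version reaches the Raw/Bronze layer.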
12) Data quality: SLI/SLO
SLI (example):
- Freshness p95: ingest→Gold lag (min).
- Completeness %: share of records that reached `finalized` within the window.
- Correctness %: valid schemas/signatures/proofs.
- Proof Coverage %: share of canonical records with proofs/anchors.
- Dedup Efficiency: share of duplicates absorbed idempotently.
- Reorg Handling Success %: share of correctly handled invalidations and replays.
SLO (targets): Freshness ≤ 3 min (stream) / 15 min (batch); Completeness ≥ 99.7%; Correctness ≥ 99.9%; Proof Coverage ≥ 99.0%; Reorg Success ≥ 99.9%; Merge MTTR (incident) ≤ 30 min.
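Three of these indicators can be computed from a window of records as sketched below. The nearest-rank percentile method and the record shape (`status`, `proof_ref`) are assumptions of this example.

```python
from typing import Callable, Dict, List, Sequence


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def percent(records: List[dict], predicate: Callable[[dict], bool]) -> float:
    """Share of records satisfying the predicate, in percent."""
    if not records:
        return 0.0
    return 100.0 * sum(1 for r in records if predicate(r)) / len(records)


def quality_slis(records: List[dict], lags_min: Sequence[float]) -> Dict[str, float]:
    """Freshness p95, completeness and proof coverage for one window."""
    return {
        "freshness_p95_min": p95(lags_min),
        "completeness_pct": percent(records, lambda r: r["status"] == "finalized"),
        "proof_coverage_pct": percent(records, lambda r: bool(r.get("proof_ref"))),
    }
```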
13) Dashboards (layouts)
Merge Ops (real-time/hourly): Freshness, Queue lag, Dedup rate, Finalized %, Reorg spikes, Error-budget burn.
Proof & Finality: proof coverage, p95 finality per chain, challenge/reorg events.
Catalog Health: discrepancies between asset mappings, decimals, SDK versions.
Quality & Drift: completeness/correctness, schema drift, late data.
Finance Lens: GTV, Net Flow, TVL by chain/bridge (`finalized` only).
14) Configurations (YAML)
Finalization windows
yaml
finality:
  eth-mainnet: { k: 12, delayed_for_usd_gt: 100000 }
  polygon: { k: 256 }
  optimistic-L2:
    k: 0
    challenge_minutes: 20
    delayed_for_usd_gt: 50000
Merge and priority policy
yaml
merge_policy:
  source_priority: [onchain, bridge, psp, product]
  conflict:
    time: { prefer: "event_at" }
    amount: { action: "quarantine" }
  proof_required_for: ["bridge_transfers", "payouts"]
  quarantine_topics: ["asset_mismatch", "decimals_mismatch", "time_skew_gt_5m"]
Idempotency/dedup
yaml
dedup:
  key_template: "${chain_id}|${block_height}|${tx_hash}|${log_index}|${type}"
  ttl_hours: 72
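The dedup settings can be applied with a small time-bounded key cache. This is an in-memory sketch: the `DedupWindow` class is hypothetical, the key uses the `|` separator from section 7, and a production system would typically keep the keys in a shared store rather than in process memory.

```python
from string import Template
from typing import Dict

# Key layout from section 7: chainId|block_height|tx_hash|log_index|type.
KEY_TEMPLATE = Template("${chain_id}|${block_height}|${tx_hash}|${log_index}|${type}")


class DedupWindow:
    """Remembers keys for ttl_hours and reports repeats inside that window."""

    def __init__(self, ttl_hours: int) -> None:
        self.ttl_seconds = ttl_hours * 3600
        self._first_seen: Dict[str, float] = {}  # key -> unix seconds

    def is_duplicate(self, event: dict, now: float) -> bool:
        # Drop keys that have aged out of the window.
        expired = [k for k, t in self._first_seen.items() if now - t > self.ttl_seconds]
        for key in expired:
            del self._first_seen[key]
        key = KEY_TEMPLATE.substitute(event)
        if key in self._first_seen:
            return True
        self._first_seen[key] = now
        return False
```

A repeat that arrives after the TTL has expired is treated as new by this cache, so the idempotent upsert in the core layer still has to act as the final safeguard.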
15) Privacy and compliance
PII minimization: use PID/ORG_ID; no PII in metrics/labels.
Data residency: region segregation (EU/ROW), encryption at rest and in transit.
Right to erasure: tombstone/redaction events with provable application.
Audit: immutable logs, hash anchoring, role access checking.
16) Operating procedures
Daily: reconcile proof coverage, chain finalization, the bridge registry and config drift.
Weekly: revision of the asset catalog/decimals, correctness of FX normalization.
Monthly: reorg/replay tests, SLO check and performance stress test.
Change Management: timelock for merge policy changes, decision log.
17) Incident playbook
A. Asset/decimals desynchronization
Halt processing for the affected assets, roll back the catalog, recalculate the affected windows, report within 24 hours.
B. Drop in Proof Coverage
Restart merkleization/anchoring, raise log verbosity, manually sample 100 cases, report.
C. Reorg/challenge spikes
Increase `k`/the challenge window, enable delayed finalization for large amounts, notify stakeholders.
D. Surge of duplicates/retries
Tighten the dedup TTL/key, throttle "noisy" sources, enable the quarantine stream.
E. Time skew
NTP/PTP synchronization, window recalculation, temporary switch of the policy to `prefer: observed_at`.
18) Implementation checklist
1. Record sources, finalization windows and proof formats.
2. Implement canonical event schema and idempotency key.
3. Configure dedup and the merge policy with a quarantine stream.
4. Set up the asset/network catalog and FX normalization.
5. Implement replay/backfill and late data processing.
6. Define SLI/SLO and quality dashboards.
7. Run regular anchoring and audit logs.
8. Run a pilot with reorg/bridge-delay simulations and record the MTTR.
19) Glossary
Finality - irreversibility of the state/event.
Reorg - chain reorganization that discards some previously accepted blocks.
Idempotency - repeated delivery of the same record does not change the result.
Proof Coverage - the proportion of records with valid evidence.
Entity Resolution - mapping addresses/accounts to a single entity.
Delayed Finalization - deferred acceptance into aggregates for high-risk amounts.
Quarantine - an isolated stream for conflicting/suspicious records.
Bottom line: correctly merging cross-chain data is a manageable discipline built on a canonical schema, finalization and proofs, strict idempotency, a transparent merge policy and observable quality. By following this framework, the ecosystem gains a single, verifiable and resilient data layer: the basis for auditing, analytics and safe scaling of products.