Exactly-once vs At-least-once
1) Why even discuss semantics
Delivery semantics determine how many times the recipient sees a message in the presence of crashes and retries:
- At-most-once - no duplicates, but loss is possible (rarely acceptable).
- At-least-once - no loss, but duplicates are possible (the default for most brokers/queues).
- Exactly-once - each message is processed exactly once in terms of the observed effect.
The key truth: in a distributed system without global transactions and synchronous consistency, "pure" end-to-end exactly-once is unattainable. Instead we build effectively exactly-once: we allow duplicates in transport, but make processing idempotent so that the observed effect is "as if once."
2) Failure model and where duplicates occur
Redeliveries appear because of:
- Lost acks/commits (the producer, broker, or consumer "did not hear" the confirmation).
- Leader/replica re-election and recovery after network partitions.
- Timeouts/retries on any hop (client→broker→consumer→sink).
Consequence: you cannot rely on the transport to deliver each message only once. Manage the effects instead: writing to the database, debiting money, sending an email, etc.
3) Exactly-once in brokers and what it really is
3.1 Kafka
Provides building blocks:
- Idempotent producer (`enable.idempotence=true`) - prevents duplicates on the producer side when it retries.
- Transactions - atomically publish messages to several partitions and commit consumer offsets (read-process-write pattern without "gaps").
- Log compaction - retains the latest value per key.
But the "end of the chain" (sink: DB/payment/mail) still requires idempotency. Otherwise a duplicate run of the handler produces a duplicate effect.
3.2 NATS / RabbitMQ / SQS
The default is at-least-once with ack/redelivery. Exactly-once is achieved at the application level: keys, a dedup store, upsert.
Conclusion: exactly-once transport ≠ exactly-once effect. The latter is achieved in the handler.
4) How to build effectively exactly-once over at-least-once
4.1 Idempotency key
Each command/event carries a natural key: `payment_id`, `order_id#step`, `saga_id#n`. The handler:
- Checks "already seen?" against a dedup store (Redis/DB) with a TTL/retention period.
- If already seen, returns the previously computed result or does a no-op.
Redis (Lua script):
    -- SET key if not exists; expires in 24h
    local ok = redis.call("SET", KEYS[1], ARGV[1], "NX", "EX", 86400)
    if ok then return "PROCESS" else return "SKIP" end
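The same check can be sketched without Redis. The following is a minimal Python sketch, assuming an in-memory store: `DedupStore`, `set_if_absent`, and `handle_once` are hypothetical names that mimic the behaviour of `SET ... NX EX`, not a real client API.

```python
import time
from typing import Callable, Dict, Optional


class DedupStore:
    """In-memory stand-in for Redis `SET key value NX EX ttl` (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._expires_at: Dict[str, float] = {}
        self._clock = clock

    def set_if_absent(self, key: str, ttl_seconds: float) -> bool:
        """Return True if the key was newly claimed, False if it is still held."""
        now = self._clock()
        expiry: Optional[float] = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # seen before and not yet expired: a duplicate
        self._expires_at[key] = now + ttl_seconds
        return True


def handle_once(store: DedupStore, key: str, effect: Callable[[], None]) -> str:
    """Run `effect` only for the first delivery of `key` within the TTL window."""
    if store.set_if_absent(key, ttl_seconds=86400):
        effect()
        return "PROCESS"
    return "SKIP"
```

Note that claiming the key before running the effect leaves a gap: a crash between the two steps would leave the key set without the effect. Storing the key and the effect in one transaction, as an Inbox table does, is one way to close that gap.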
4.2 Upsert in the database (idempotent sink)
Writes go through UPSERT/ON CONFLICT with a version/amount check.
PostgreSQL:
    INSERT INTO payments(id, status, amount, updated_at)
    VALUES ($1, $2, $3, now())
    ON CONFLICT (id) DO UPDATE
    SET status = EXCLUDED.status,
        updated_at = now()
    WHERE payments.status <> EXCLUDED.status;
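To see the upsert absorb a duplicate, here is a small runnable sketch using Python's built-in `sqlite3` instead of PostgreSQL (an assumption made for portability; it needs SQLite 3.24 or newer for `ON CONFLICT ... DO UPDATE`). The `writes` column and the helper names are hypothetical and exist only to make the no-op visible.

```python
import sqlite3
from typing import Optional, Tuple

UPSERT_SQL = """
INSERT INTO payments (id, status, amount, writes)
VALUES (?, ?, ?, 1)
ON CONFLICT (id) DO UPDATE
SET status = excluded.status,
    writes = payments.writes + 1
WHERE payments.status <> excluded.status
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a payments table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments ("
        " id TEXT PRIMARY KEY, status TEXT NOT NULL,"
        " amount REAL NOT NULL, writes INTEGER NOT NULL)"
    )
    return conn


def apply_payment(conn: sqlite3.Connection, payment_id: str, status: str, amount: float) -> None:
    """Apply a payment event; replaying the same event leaves the row untouched."""
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT_SQL, (payment_id, status, amount))


def read_payment(conn: sqlite3.Connection, payment_id: str) -> Optional[Tuple[str, float, int]]:
    """Return (status, amount, writes) for a payment, or None if absent."""
    return conn.execute(
        "SELECT status, amount, writes FROM payments WHERE id = ?", (payment_id,)
    ).fetchone()
```

Applying the same event twice leaves `writes` at 1; only an event with a different status touches the row again.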
4.3 Transactional Outbox/Inbox
Outbox: the business state change and the event-to-publish record are written in the same database transaction. A background publisher reads the outbox and sends to the broker, so the state and the published events cannot diverge.
Inbox: for incoming commands, record the `message_id` (and the result, once computed) in the same transaction as the side effect; a redelivery sees the record and does not repeat the side effects.
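The Outbox half can be sketched as follows. This is a minimal illustration, assuming SQLite as the database and a caller-supplied `send` function standing in for the broker client; the table layout and the names `place_order` and `publish_pending` are hypothetical.

```python
import json
import sqlite3
from typing import Callable


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a business table and an outbox table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def place_order(conn: sqlite3.Connection, order_id: str) -> None:
    """Write the state change and the event record in one transaction."""
    event = {"type": "order_placed", "order_id": order_id}
    with conn:  # both inserts commit together or not at all
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def publish_pending(conn: sqlite3.Connection, send: Callable[[dict], None]) -> int:
    """Relay unpublished events in order; return how many were pending."""
    rows = conn.execute(
        "SELECT seq, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, payload in rows:
        send(json.loads(payload))  # if this raises, the row stays unpublished
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

The relay itself is at-least-once: a crash after `send` but before the row is marked published resends the event, so consumers still need deduplication.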
4.4 Consistent chain processing (read→process→write)
Kafka: a transaction wraps "read from an offset → write the results → commit the offset" into one atomic unit.
Without transactions: "first write the result/Inbox record, then ack"; after a crash, the redelivered duplicate finds the Inbox record and ends as a no-op.
4.5 Saga/compensations
When idempotency is impossible (the external provider has already charged the money), we use compensating operations (refund/void) and idempotent external APIs (a repeated `POST` with the same `Idempotency-Key` gives the same result).
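A compensation flow can be sketched as below. This is a simplified illustration, assuming each step is an (action, compensation) pair run in-process; `run_saga` and the step names in the usage are hypothetical, and a real saga would also persist its progress.

```python
from typing import Callable, List, Tuple

# Each step pairs an action with the compensation that undoes it.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()  # compensations should themselves be idempotent
            return False
        completed.append(compensate)
    return True
```

For example, if "reserve" and "charge" succeed and a third step fails, the sketch runs "refund" and then "release".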
5) When at-least-once is enough
- Updates of caches/materialized views with key-based compaction.
- Counters/metrics where a repeated increment is acceptable (or store deltas with a version).
- Notifications where a second email is not critical (it is still better to attach a key).
Rule: if a duplicate does not change the business outcome, or we can easily detect and correct it → at-least-once + partial protection.
6) Performance and cost
Exactly-once (even "effectively") costs more: extra writes (Inbox/Outbox), key storage, transactions, and harder diagnostics.
At-least-once is cheaper and simpler, with better throughput and p99.
Evaluate: cost of a duplicate × probability of a duplicate vs cost of protection.
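The comparison can be written down as simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and any numbers passed in are estimates you would supply yourself, not measurements from this article.

```python
def expected_duplicate_cost(messages: int, p_duplicate: float, cost_per_duplicate: float) -> float:
    """Expected loss from duplicates over a period: volume x probability x unit cost."""
    return messages * p_duplicate * cost_per_duplicate


def protection_pays_off(
    messages: int,
    p_duplicate: float,
    cost_per_duplicate: float,
    protection_cost: float,
) -> bool:
    """True when the expected duplicate loss exceeds the cost of protection."""
    return expected_duplicate_cost(messages, p_duplicate, cost_per_duplicate) > protection_cost
```

With a high per-duplicate cost (payments), protection tends to pay off even at low duplicate rates; with a negligible per-duplicate cost (cache refreshes), it usually does not.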
7) Sample configurations and code
7.1 Kafka producer (idempotence + transactions)
Producer properties:
    enable.idempotence=true
    acks=all
    retries=2147483647
    max.in.flight.requests.per.connection=5
    transactional.id=orders-writer-1
Java:
    producer.initTransactions();
    producer.beginTransaction();
    producer.send(recordA);
    producer.send(recordB);
    // consumer offsets can also be committed atomically in the same
    // transaction (producer.sendOffsetsToTransaction)
    producer.commitTransaction();
7.2 Inbox consumer (pseudocode)
    if inbox.exists(msg.id): ack(msg); return inbox.result(msg.id)   # duplicate
    begin tx
        if !inbox.insert(msg.id) then rollback; return inbox.result(msg.id)
        result = handle(msg)
        sink.upsert(result)                # idempotent sink
        inbox.set_result(msg.id, result)
    commit
    ack(msg)                               # ack only after the commit
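The pseudocode above could be made concrete roughly as follows. This is a sketch under stated assumptions: SQLite holds both the inbox and the sink (a hypothetical `ledger` table) so that one transaction covers both, `ack` is a caller-supplied callback standing in for the broker acknowledgement, and a single consumer is assumed.

```python
import sqlite3
from typing import Callable


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with an inbox table and a sink table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE inbox (message_id TEXT PRIMARY KEY, result TEXT NOT NULL);
        CREATE TABLE ledger (payment_id TEXT PRIMARY KEY, amount REAL NOT NULL);
        """
    )
    return conn


def consume(
    conn: sqlite3.Connection,
    message_id: str,
    payment_id: str,
    amount: float,
    ack: Callable[[], None],
) -> str:
    """Process a message at most once in effect; ack only after the commit."""
    row = conn.execute(
        "SELECT result FROM inbox WHERE message_id = ?", (message_id,)
    ).fetchone()
    if row is not None:
        ack()  # duplicate: effect already recorded, just stop the redelivery
        return row[0]
    result = f"charged:{payment_id}"
    with conn:  # inbox record and sink write commit together or not at all
        conn.execute(
            "INSERT INTO inbox (message_id, result) VALUES (?, ?)", (message_id, result)
        )
        conn.execute(
            "INSERT INTO ledger (payment_id, amount) VALUES (?, ?) "
            "ON CONFLICT (payment_id) DO NOTHING",
            (payment_id, amount),
        )
    ack()
    return result
```

If the ack is lost after the commit, the broker redelivers; the second call finds the inbox record, returns the stored result, and writes nothing new.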
7.3 HTTP Idempotency-Key (external APIs)
POST /payments
Idempotency-Key: 7f1c-42-...
Body: { "payment_id": "p-123", "amount": 10.00 }
Repeated POST with the same key → the same result/status.
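On the server side, one common way to get this behaviour is to store the first response under the key and replay it. The sketch below assumes an in-memory map and a fake charge counter; `PaymentsApi`, `post_payment`, and the response shape are hypothetical and do not describe any particular provider.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Response:
    status: int
    body: dict


class PaymentsApi:
    """Toy endpoint that replays the stored response for a repeated key."""

    def __init__(self) -> None:
        self._responses: Dict[str, Response] = {}
        self.charges_made = 0  # counts real side effects, for illustration

    def post_payment(self, idempotency_key: str, body: dict) -> Response:
        cached = self._responses.get(idempotency_key)
        if cached is not None:
            return cached  # repeated POST: same result, no second charge
        self.charges_made += 1  # the side effect happens only here
        response = Response(201, {"payment_id": body["payment_id"], "status": "captured"})
        self._responses[idempotency_key] = response
        return response
```

A production version would also persist the key with the charge in one transaction and decide what to do when the same key arrives with a different body.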
8) Observability and metrics
- `duplicate_attempts_total` - how many duplicates were caught (by Inbox/Redis).
- `idempotency_hit_rate` - the share of redeliveries absorbed by idempotency.
- `txn_abort_rate` (Kafka/DB) - the share of rolled-back transactions.
- `outbox_backlog` - publication lag.
- `exactly_once_path_latency{p95,p99}` vs `at_least_once_path_latency` - overhead.
- Audit logs: the linked set of `message_id`, `idempotency_key`, `saga_id`, `attempt`.
9) Test playbooks (Game Days)
- Send replay: force producer retries with artificial timeouts.
- Crash between "sink write and ack": make sure Inbox/Upsert prevents a duplicate effect.
- Redelivery: increase the redelivery rate in the broker; check deduplication.
- Idempotency of external APIs: a repeated POST with the same key returns the same answer.
- Leader change/network partition: check the behavior of Kafka transactions and consumers.
10) Anti-patterns
- Relying on transport: "we have Kafka with exactly-once, so we can skip keys" - no.
- Ack before the write: the message is acked, but the sink write fails → loss.
- No DLQ and no retries with jitter: endless redeliveries and retry storms.
- Random UUIDs generated per attempt instead of natural keys: nothing to deduplicate on.
- Mixing Inbox/Outbox with unindexed production tables: lock contention and p99 tails.
- Business transactions against external providers that offer no idempotent API.
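For the retry anti-pattern, a retry policy with jitter and a dead-letter cutoff might look like the sketch below. It uses exponential backoff with "full jitter" as one reasonable choice; the function names, the default base, cap, and attempt limit are all assumptions for illustration.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 0.1,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Delay in seconds for a 0-based retry attempt: uniform in [0, min(cap, base * 2^attempt)]."""
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, ceiling)


def should_dead_letter(attempt: int, max_attempts: int = 5) -> bool:
    """Stop retrying and route the message to a DLQ after max_attempts tries."""
    return attempt >= max_attempts
```

Randomizing the delay spreads retries from many consumers over time, and the attempt limit keeps a poison message from being redelivered forever.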
11) Selection checklist
1. Cost of a duplicate (money/legal/UX) vs cost of protection (latency/complexity/spend).
2. Is there a natural event/operation key? If not, define a stable one.
3. Does the sink support upsert/versioning? If not - Inbox + compensation.
4. Do you need global transactions? If not, split the flow into a saga.
5. Need replay/long retention? Kafka + Outbox. Need fast RPC/low latency? NATS + Idempotency-Key.
6. Multi-tenancy and quotas: isolate keys/namespaces per tenant.
7. Observability: idempotency and backlog metrics are in place.
12) FAQ
Q: Is it possible to achieve "mathematical" exactly-once end-to-end?
A: Only in narrow scenarios with one consistent store and transactions all the way. In the general case, no; use effectively exactly-once through idempotency.
Q: Which is faster?
A: At-least-once. Exactly-once adds transactions and key storage → higher p99 and cost.
Q: Where to store idempotency keys?
A: In a fast store (Redis) with a TTL, or in an Inbox table (PK = `message_id`). For payments, keep them longer (days/weeks).
Q: How to choose the TTL for dedup keys?
A: Minimum = maximum redelivery time + operational margin (typically 24-72 hours). For finance, more.
Q: Do I need a key if I have compaction by key in Kafka?
A: Yes. Compaction reduces storage, but it does not make your sink idempotent.
13) Summary
At-least-once is the baseline, reliable transport semantics.
Exactly-once as a business effect is achieved at the handler level: Idempotency-Key, Inbox/Outbox, upsert/versions, saga/compensation.
The choice is a trade-off between cost ↔ risk of duplicates ↔ ease of operation. Design natural keys, make sinks idempotent, add observability, and run game days regularly - then your pipelines will stay predictable and safe even in a storm of retries and failures.