Message order guarantees

1) What "order" is and why it matters

Message order is a "what must be processed before what" relation, defined either for the events of one entity (an order, a user, a wallet) or for the entire stream. It matters for invariants such as "status A before B", "balance before write-off", "version n before n + 1".

In distributed systems, a global total order is expensive and rarely needed; a local per-key order is usually sufficient.

2) Types of ordering guarantees

1. Per-partition (local order within a log partition): in Kafka, order is preserved within a partition but not across partitions.
2. Per-key (ordering key / message group): all messages with the same key are routed to one processing "thread" (Kafka key, SQS FIFO MessageGroupId, Pub/Sub ordering key).
3. Global total order: the whole system sees a single order (distributed log/sequencer). Expensive; it degrades availability and throughput.
4. Causal order: "B comes after A if B observes the effect of A". Achievable through metadata (versions, Lamport timestamps/vector clocks) without a global sequencer.
5. Best-effort order: the broker tries to preserve order, but failures can cause reordering (common with NATS Core, or RabbitMQ with several consumers).
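To make causal order more concrete, here is a minimal vector-clock sketch in Java. It assumes every producer/node has a stable string ID; the class and method names are illustrative, not a library API. `happenedBefore` is the check a consumer could use to decide whether B must wait for A, while events that are `concurrentWith` each other need no mutual order.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal vector clock (sketch): one counter per node; node IDs are hypothetical strings.
final class VectorClock {
    private final Map<String, Long> counters = new HashMap<>();

    // Local event (including a send) on `node`: bump that node's component.
    VectorClock tick(String node) {
        counters.merge(node, 1L, Long::sum);
        return this;
    }

    // On receive: component-wise maximum with the sender's clock; then tick() the receiver.
    VectorClock merge(VectorClock other) {
        other.counters.forEach((node, c) -> counters.merge(node, c, Math::max));
        return this;
    }

    // true if this clock causally precedes `other`: every component <=, at least one <.
    boolean happenedBefore(VectorClock other) {
        boolean strictlyLess = false;
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            long theirs = other.counters.getOrDefault(e.getKey(), 0L);
            if (e.getValue() > theirs) return false;
            if (e.getValue() < theirs) strictlyLess = true;
        }
        for (Map.Entry<String, Long> e : other.counters.entrySet()) {
            // a component that only `other` has (and is > 0) makes it strictly greater
            if (!counters.containsKey(e.getKey()) && e.getValue() > 0) strictlyLess = true;
        }
        return strictlyLess;
    }

    // Neither precedes the other: concurrent events, no order between them is required.
    boolean concurrentWith(VectorClock other) {
        return !counters.equals(other.counters)
                && !happenedBefore(other) && !other.happenedBefore(this);
    }
}
```

For example, if node B merges A's clock before ticking, B's event is ordered after A's; an event from an unrelated node C is concurrent with both and can be processed in any order.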

3) Where order breaks down

Parallel consumers of the same queue (RabbitMQ: several consumers per queue → interleaving).
Retries/redeliveries (at-least-once), `ack` timeouts, requeueing.
Rebalance/failover (Kafka: partition or leader moves).
DLQ/reprocessing: a "poisonous" message goes to the DLQ while the following ones proceed → a logical gap.
Multi-region and replication: different delays → misalignment.

4) Designing the ordering key

The key defines the "unit of ordering". Recommendations:

Use natural keys: `order_id`, `wallet_id`, `aggregate_id`.
Watch out for "hot keys": a single key can block the flow (head-of-line blocking). If necessary, split the key into `order_id#shard(0..k-1)` and reconstruct the order deterministically at the sink.
In Kafka, one key → one partition, so order is preserved within a key (as long as the number of partitions does not change).

Example (Kafka, Java):

```java
producer.send(new ProducerRecord<>("orders", orderId, eventBytes));
```

(Using `orderId` as the key guarantees local order per order.)

5) Order vs. throughput

Strong guarantees often conflict with throughput and availability:

One consumer per queue preserves order but reduces concurrency.
At-least-once + concurrency improves throughput but requires idempotency and/or reordering.
A global order adds a hop to the sequencer → ↑ latency and risk of failure.

Compromise: per-key order, parallelism = number of partitions/groups, + idempotent sinks.

6) Ordering controls in specific brokers

Kafka

Order is preserved within a partition.
Keep `max.in.flight.requests.per.connection ≤ 5` with `enable.idempotence=true` so that producer retries do not change the order.
Consumer group: one partition → one worker at a time. Redeliveries are possible → keep a sequence/version in the business layer.
Read-process-write transactions keep reads, writes and committed offsets consistent, but do not create a global order.

Production minimum (`producer.properties`):

```properties
enable.idempotence=true
acks=all
retries=2147483647
max.in.flight.requests.per.connection=5
```

RabbitMQ (AMQP)

Order is guaranteed within one queue for a single consumer. With several consumers, messages can arrive interleaved.

For order: one consumer with prefetch = 1, acking only after processing finishes. For concurrency, split traffic into separate queues by key (sharding exchanges / consistent-hash exchange).

NATS / JetStream

NATS Core: best-effort, low latency; order may be violated.
JetStream: ordering within a stream (by stream sequence); on redeliveries the consumer may see events out of order → use the sequence number and a recovery buffer.

SQS FIFO

Exactly-once processing (effectively, via deduplication) and order within a MessageGroupId. Concurrency equals the number of groups; within a group, messages are subject to head-of-line blocking.

Google Pub/Sub

An ordering key gives order within that key; if a publish fails, the client pauses publishing for that key until the application resumes it, so watch out for backpressure.

7) Patterns for preserving and restoring order

7.1 Sequence/versioning

Each event carries a `seq`/`version`. The consumer:

applies an event only if `seq = last_seq + 1`;
if `seq > last_seq + 1`, puts it into a wait buffer until the missing event (`last_seq + 1`) arrives;
if `seq ≤ last_seq`, treats it as a duplicate and skips it.

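Pseudocode:

```pseudo
if seq == last + 1: apply(ev); last++
    while last + 1 in buffer: apply(buffer.pop(last + 1)); last++   // drain the closed gap
else if seq > last + 1: buffer[seq] = ev                            // wait for missing events
else: skip                                                          // seq <= last: duplicate, ack it
```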
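The same rule as a small Java sketch. It assumes one buffer instance per key, gap-free `seq` values assigned by the producer, and a hypothetical `apply` handler supplied by the application. The `maxPending` bound is an added assumption: without some limit, a permanently lost event would make the buffer grow forever.

```java
import java.util.TreeMap;
import java.util.function.Consumer;

// Per-key reorder buffer (sketch): applies events strictly in seq order.
// Assumes one instance per key and that seq values start right after `lastSeq`.
final class ReorderBuffer<E> {
    enum Action { APPLY, BUFFER, SKIP } // mirrors the action=apply|buffer|skip log field

    private long lastSeq;                                     // last applied sequence number
    private final TreeMap<Long, E> pending = new TreeMap<>(); // events waiting for a gap to close
    private final int maxPending;                             // bound, so a lost event cannot grow memory forever
    private final Consumer<E> apply;                          // business handler (hypothetical)

    ReorderBuffer(long lastSeq, int maxPending, Consumer<E> apply) {
        this.lastSeq = lastSeq;
        this.maxPending = maxPending;
        this.apply = apply;
    }

    Action offer(long seq, E event) {
        // seq <= lastSeq: already applied (redelivery); already buffered: duplicate
        if (seq <= lastSeq || pending.containsKey(seq)) return Action.SKIP;
        if (seq > lastSeq + 1) {
            if (pending.size() >= maxPending) {
                // the gap did not close in time: escalate (e.g. a by-key DLQ, see section 8)
                throw new IllegalStateException("missing seq " + (lastSeq + 1));
            }
            pending.put(seq, event);
            return Action.BUFFER;
        }
        apply.accept(event);
        lastSeq = seq;
        // the gap may now be closed: drain every buffered event that is contiguous
        E next;
        while ((next = pending.remove(lastSeq + 1)) != null) {
            apply.accept(next);
            lastSeq++;
        }
        return Action.APPLY;
    }

    long lastSeq() { return lastSeq; }

    int pendingSize() { return pending.size(); }
}
```

A `TreeMap` keeps buffered events sorted, so draining is a sequence of lookups for `lastSeq + 1`; events must be non-null because `remove` returning `null` ends the drain.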

7.2 Buffers and windows (stream processing)

Time window + watermark: accept out-of-order events within the window; once the watermark passes the window's end, "close" the window and sort its events.
Allowed lateness: a separate channel for late arrivals (recompute or ignore).
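The watermark mechanics can be sketched without a stream-processing framework (real engines expose this through their own APIs). This Java sketch assumes tumbling windows, event timestamps in milliseconds, and a fixed bound on out-of-orderness, `maxOutOfOrderMs`; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Tumbling event-time windows closed by a watermark (sketch; not a framework API).
// watermark = newest event time seen - maxOutOfOrderMs (bounded out-of-orderness assumption).
final class WatermarkWindows {
    record Event(long ts, String payload) {}

    private final long windowMs;
    private final long maxOutOfOrderMs;
    private long maxTs = Long.MIN_VALUE;
    private long watermark = Long.MIN_VALUE;
    private final TreeMap<Long, List<Event>> open = new TreeMap<>(); // window start -> events
    final List<List<Event>> emitted = new ArrayList<>();             // closed windows, sorted by ts
    final List<Event> late = new ArrayList<>();                      // arrived after their window closed

    WatermarkWindows(long windowMs, long maxOutOfOrderMs) {
        this.windowMs = windowMs;
        this.maxOutOfOrderMs = maxOutOfOrderMs;
    }

    void accept(Event e) {
        long start = Math.floorDiv(e.ts(), windowMs) * windowMs;
        if (start + windowMs <= watermark) { // window already closed: allowed-lateness channel
            late.add(e);
            return;
        }
        open.computeIfAbsent(start, s -> new ArrayList<>()).add(e);
        maxTs = Math.max(maxTs, e.ts());
        watermark = Math.max(watermark, maxTs - maxOutOfOrderMs);
        // close every window that ends at or before the watermark, oldest first
        while (!open.isEmpty() && open.firstKey() + windowMs <= watermark) {
            List<Event> events = open.pollFirstEntry().getValue();
            events.sort(Comparator.comparingLong(Event::ts));
            emitted.add(events);
        }
    }
}
```

With 10 ms windows and a 5 ms bound, events at 3, 1, 12, 16 close the window [0, 10) once the watermark reaches 11 and emit it as 1, 3; an event at 4 that arrives afterwards lands in `late`.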

7.3 Sticky routing by key

Routing by `hash(key) % shards` sends all events of a key to a single worker.
In Kubernetes, keep stickiness at the queue/shard level, not at the L4 HTTP load balancer.
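A sketch of `hash(key) % shards` routing in Java. It assumes string keys and a fixed shard count, and uses CRC32 from `java.util.zip` as a hash that is stable across processes and restarts; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Sticky routing sketch: the same key always maps to the same shard/worker.
final class KeyRouter {
    private final int shards;

    KeyRouter(int shards) {
        if (shards <= 0) throw new IllegalArgumentException("shards must be > 0");
        this.shards = shards;
    }

    // CRC32 of the UTF-8 key bytes: deterministic, so every instance picks the same shard.
    int shardFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % shards); // getValue() is non-negative, so the result is too
    }
}
```

With plain modulo, changing `shards` remaps most keys, so order can break while traffic is resharded; consistent hashing (as in RabbitMQ's consistent-hash exchange) limits how many keys move.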

7.4 Actor model / "one stream per key"

For critical aggregates (e.g. a wallet), an actor processes messages sequentially; overall parallelism equals the number of actors.
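A lightweight way to get "one stream per key" without an actor framework is to chain each key's tasks on a shared thread pool. This Java sketch assumes in-process workers; the class name is hypothetical, and a failed task is simply stepped over, where a production system would apply the poison-message policy from section 8.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// "One stream per key" (sketch): tasks of the same key run one after another,
// tasks of different keys can run in parallel on the shared pool.
final class PerKeySerialExecutor {
    private final Executor pool;
    private final Map<String, CompletableFuture<Void>> tails = new HashMap<>(); // last task per key

    PerKeySerialExecutor(Executor pool) {
        this.pool = pool;
    }

    synchronized CompletableFuture<Void> submit(String key, Runnable task) {
        CompletableFuture<Void> tail =
                tails.getOrDefault(key, CompletableFuture.completedFuture(null));
        // Run after the previous task of this key; exceptionally() stops one failure
        // from blocking the key forever (the caller still sees it in the returned future).
        CompletableFuture<Void> next = tail.exceptionally(err -> null).thenRunAsync(task, pool);
        tails.put(key, next);
        // Drop the entry once done, if it is still the tail, so idle keys do not leak memory.
        next.whenComplete((v, err) -> {
            synchronized (this) {
                tails.remove(key, next);
            }
        });
        return next;
    }
}
```

Parallelism here is bounded by the pool size and the number of active keys, which matches the "parallelism = number of actors" trade-off above.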

7.5 Idempotence + reordering

Even when order is restored, duplicates are still possible. Combine UPSERT by key + version with an Inbox (see Exactly-once vs At-least-once).
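A minimal in-memory sketch of UPSERT by key + version combined with an Inbox of processed message IDs. The class stands in for a database table and all names are hypothetical; in a real store, the Inbox insert and the row update would have to happen in one transaction.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Stand-in for a table "key -> (version, state)" plus an Inbox of processed message IDs.
final class VersionedStore {
    record Row(long version, String state) {}

    private final Map<String, Row> rows = new HashMap<>();
    private final Set<String> inbox = new HashSet<>(); // message IDs already handled

    // Returns true if the event changed state. Duplicates (same messageId) and stale
    // versions (older than or equal to the stored one) are ignored, so redelivery is harmless.
    synchronized boolean upsert(String messageId, String key, long version, String state) {
        if (!inbox.add(messageId)) return false;                            // Inbox: seen before
        Row current = rows.get(key);
        if (current != null && current.version() >= version) return false; // stale or equal
        rows.put(key, new Row(version, state));
        return true;
    }

    synchronized Row get(String key) {
        return rows.get(key);
    }
}
```

Dropping stale versions is only safe when each event carries the full state of the aggregate; for delta events (e.g. "debit 10"), order still has to be restored first, for example with the sequence buffer from 7.1.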

8) Handling "poisonous" messages (poison pills)

Preserving order raises the question: "what do we do if one message cannot be processed?"

Strict order: the key's flow is blocked (SQS FIFO: the entire group). The solution is a by-key DLQ: move only the problem key/group to a separate queue for manual analysis.
Flexible order: allow skipping/compensation; log and continue (not for finance or critical aggregates).
Retry policy: a bounded `max-deliver` + backoff + idempotent effects.
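A sketch of the strict-order policy: a bounded number of attempts with exponential backoff, after which the whole key is parked (a by-key DLQ) while other keys continue. It is synchronous and single-threaded for clarity; `maxDeliver` mirrors the idea of `max-deliver` rather than any broker's API, and the class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Strict per-key order with a by-key DLQ (sketch): a message that keeps failing parks its key.
// Later messages of that key are parked behind it (order kept); other keys keep flowing.
final class KeyedRetryPolicy<M> {
    private final int maxDeliver;     // total attempts per message, in the spirit of max-deliver
    private final long baseBackoffMs; // first retry delay; doubles per attempt
    private final Map<String, Deque<M>> parked = new HashMap<>(); // key -> messages for DLQ/manual repair

    KeyedRetryPolicy(int maxDeliver, long baseBackoffMs) {
        this.maxDeliver = maxDeliver;
        this.baseBackoffMs = baseBackoffMs;
    }

    // Returns true if processed, false if the key is (now) parked.
    // The handler must be idempotent: a retry re-runs it.
    boolean process(String key, M msg, Consumer<M> handler) {
        Deque<M> queue = parked.get(key);
        if (queue != null) {          // key already blocked: keep order behind the poison message
            queue.addLast(msg);
            return false;
        }
        for (int attempt = 1; attempt <= maxDeliver; attempt++) {
            try {
                handler.accept(msg);
                return true;
            } catch (RuntimeException e) {
                if (attempt == maxDeliver) break;
                try {
                    Thread.sleep(backoffMs(attempt));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // stop retrying and park the key
                    break;
                }
            }
        }
        parked.computeIfAbsent(key, k -> new ArrayDeque<>()).addLast(msg);
        return false;
    }

    // Exponential backoff: base, 2*base, 4*base, ... capped at 30 s.
    long backoffMs(int attempt) {
        return Math.min(baseBackoffMs << Math.min(attempt - 1, 20), 30_000L);
    }

    Deque<M> parkedFor(String key) {
        return parked.getOrDefault(key, new ArrayDeque<>());
    }
}
```

For the flexible-order variant, the final step would log the message and continue instead of parking the key.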

9) Multi-region and global systems

Cluster linking/replication (Kafka) does not guarantee a global cross-region order. Prefer local per-key order and idempotent sinks.
For a truly global order, use a sequencer (central log), but this costs availability (CAP: you lose A during network partitions).
Alternative: causal order + CRDTs for some domains (counters, sets), where no strict order is needed.
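As an example of a domain that needs no strict order, a grow-only counter (G-Counter) converges no matter in which order, or how many times, replicas exchange state. This Java sketch assumes one slot per region with stable region IDs; names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Grow-only counter (G-Counter) CRDT: each region increments only its own slot;
// replicas merge by taking the per-region maximum, so merge order does not matter.
final class GCounter {
    private final String region;
    private final Map<String, Long> slots = new HashMap<>();

    GCounter(String region) {
        this.region = region;
    }

    void increment(long by) {
        if (by < 0) throw new IllegalArgumentException("a G-Counter only grows");
        slots.merge(region, by, Long::sum);
    }

    // Commutative, associative and idempotent: safe under reordering and redelivery.
    void merge(GCounter other) {
        other.slots.forEach((r, v) -> slots.merge(r, v, Math::max));
    }

    long value() {
        return slots.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

Because merge takes a maximum rather than adding, a duplicated or out-of-order replication message cannot double-count an increment.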

10) Observing order

Metrics: `out_of_order_total`, `reordered_in_window_total`, `late_events_total`, `buffer_size_current`, `blocked_keys_total`, `fifo_group_backlog`.
Logs: `key`, `seq`, `expected_seq`, `action=apply|buffer|skip|dlq`.
Tracing: span attributes `order_key`, `partition`, `offset`, `seq`, plus links to retry attempts.

11) Anti-patterns

One queue + many consumers without sharding by key: order breaks immediately.
Retries by republishing to the same queue without idempotency: duplicates + out-of-order.
A global order "just in case": a blow-up in latency and cost with no real benefit.
SQS FIFO with one group for everything: full head-of-line blocking. Use a MessageGroupId per key.
Ignoring hot keys: one "wallet" slows everything down; split the key into sub-keys where possible.
Mixing critical and bulk streams in the same queue/group: mutual interference and loss of order.

12) Implementation checklist

Guarantee chosen: per-key / per-partition / causal / global?
Ordering key and hot-key mitigation strategy designed.
Routing configured: partitioning / MessageGroupId / ordering key.
Consumers isolated by key (sticky routing, shard workers).
Idempotency and/or Inbox/UPSERT enabled on sinks.
Sequence/version and reordering buffer implemented (if needed).
By-key DLQ policy and retries with backoff.
Metrics and alerts for out-of-order events, blocked_keys, late_events.
Game day: rebalance, node loss, poisonous message, network delays.
Documentation: order invariants, window bounds, impact on SLAs.

13) Configuration examples

13.1 Kafka consumer (minimizing order violations)

```properties
max.poll.records=500
# commit manually, after the batch has been processed successfully
enable.auto.commit=false
isolation.level=read_committed
```

💡 Make sure that one worker processes whole partitions and that your operations are idempotent.

13.2 RabbitMQ (order at the cost of concurrency)

One consumer per queue + `basic.qos(prefetch=1)`.

For concurrency: several queues and a consistent-hash exchange:

```bash
rabbitmq-plugins enable rabbitmq_consistent_hash_exchange
# then publish with the routing key (or header) that the consistent hash is computed from
```

13.3 SQS FIFO

Set MessageGroupId = key. Concurrency = number of groups.
Use MessageDeduplicationId to protect against duplicates (within the provider's deduplication window, 5 minutes for SQS).

13.4 NATS JetStream (ordered consumer, sketch)

```bash
nats consumer add ORDERS ORD-KEY-42 --filter "orders.42.>" --pull \
  --ack explicit --max-deliver 6
```

Track the `sequence` and the reordering buffer in the application.

14) FAQ

Q: Do I need a global order?
A: Almost never. Per-key order is almost always enough; a global order is expensive and hurts availability.

Q: What about a "poisonous" message under strict order?
A: Move only its key/group to the DLQ; everything else keeps going.

Q: Can you get order and scale at the same time?
A: Yes: per-key order + many keys/partitions + idempotent operations, with reordering buffers where necessary.

Q: Which is more important: order or exactly-once?
A: For most domains, per-key order + effectively exactly-once effects (idempotency/UPSERT). The transport can stay at-least-once.

15) Summary

Order is a local guarantee around a business key, not an expensive global discipline. Design keys and partitions, limit hot keys, use idempotence and, where necessary, sequence numbers + a reordering buffer. Monitor out-of-order and blocked-key metrics, test failure scenarios, and you get predictable processing without sacrificing performance or availability.
