Message brokers
1) Why message brokers
A broker decouples producers and consumers in time, speed, and reliability:
- Buffering and smoothing of load peaks; backpressure.
- Independent scaling of reads and writes.
- Observability and replay of events.
- Architectural patterns: event-driven, CQRS, event sourcing, outbox/inbox.
2) Basic models and terms
2.1 Kafka (log model)
Topic → partitions (ordered logs) → consumer offsets.
Consumer group: parallel reads, with partitions balanced across group members.
Retention by time/size; log compaction by key.
Semantics: at-least-once as the baseline; with the right settings, effectively exactly-once (idempotent producers + transactions).
Order: guaranteed only within a partition.
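The following minimal Go sketch makes the consumer-group idea concrete by spreading partitions across group members round-robin. It is a simplification in the spirit of Kafka's round-robin assignor, not the client's actual code. Real assignors (range, round-robin, sticky, cooperative-sticky) also handle multiple topics and rebalance history. The member names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// assignRoundRobin spreads partitions 0..numPartitions-1 across group members
// in round-robin order. Simplified illustration only.
func assignRoundRobin(members []string, numPartitions int) map[string][]int {
	sorted := append([]string(nil), members...)
	sort.Strings(sorted) // deterministic plan regardless of join order
	out := make(map[string][]int, len(sorted))
	if len(sorted) == 0 {
		return out
	}
	for p := 0; p < numPartitions; p++ {
		m := sorted[p%len(sorted)]
		out[m] = append(out[m], p)
	}
	return out
}

func main() {
	plan := assignRoundRobin([]string{"c2", "c1", "c3"}, 6)
	fmt.Println(plan["c1"], plan["c2"], plan["c3"]) // [0 3] [1 4] [2 5]

	// More consumers than partitions: the extra member gets nothing.
	idle := assignRoundRobin([]string{"a", "b", "c"}, 2)
	fmt.Println(len(idle["c"])) // 0
}
```

With more consumers than partitions, the extra members sit idle. This is why the partition count caps a group's read parallelism.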
2.2 NATS (subjects, low latency)
Subjects form a dot-separated hierarchy with wildcards: `foo.*` matches exactly one token, and `foo.>` matches one or more trailing tokens.
Modes: pub/sub, queue groups (work distribution: each message goes to one member of the group), request-reply (fast RPC).
Core NATS: ephemeral, ultra-low latency. JetStream: persistence, retention, replay.
Order: best-effort, with no strong global guarantee. JetStream orders messages within a stream, but rare reordering is possible after failures (e.g. on redelivery).
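The wildcard rules can be sketched as a small matcher. This illustrative Go function is not the server's implementation. It assumes well-formed subjects and skips validation (empty tokens, `>` in the middle, and so on).

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches reports whether a concrete subject matches a NATS-style
// pattern: "*" matches exactly one token; ">" (only as the last token)
// matches one or more remaining tokens.
func subjectMatches(pattern, subject string) bool {
	pt := strings.Split(pattern, ".")
	st := strings.Split(subject, ".")
	for i, p := range pt {
		if p == ">" {
			// ">" must be last and must consume at least one token.
			return i == len(pt)-1 && len(st) > i
		}
		if i >= len(st) {
			return false
		}
		if p != "*" && p != st[i] {
			return false
		}
	}
	return len(pt) == len(st)
}

func main() {
	fmt.Println(subjectMatches("orders.*", "orders.created"))    // true
	fmt.Println(subjectMatches("orders.*", "orders.eu.created")) // false
	fmt.Println(subjectMatches("orders.>", "orders.eu.created")) // true
	fmt.Println(subjectMatches("orders.>", "orders"))            // false
}
```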
3) Delivery semantics and consistency
Idempotence and deduplication remain the responsibility of the application/sink, even with "exactly-once" in Kafka.
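The following is a minimal Go sketch of an idempotent consumer. It assumes that each event carries a stable, producer-assigned ID; the `Event` type is hypothetical. A redelivered event is recognized and skipped, so at-least-once delivery does not double the side effect. In a real service, the processed-ID record and the business change must be committed atomically (for example, in one database transaction). Here both live in memory.

```go
package main

import "fmt"

// Event is a hypothetical message; ID must be unique and stable per logical event.
type Event struct {
	ID     string
	Amount int
}

// IdempotentConsumer applies each event at most once, even under redelivery.
type IdempotentConsumer struct {
	seen    map[string]bool // processed IDs (a dedup store in production)
	Balance int             // business state
}

func NewIdempotentConsumer() *IdempotentConsumer {
	return &IdempotentConsumer{seen: make(map[string]bool)}
}

// Handle returns false when the event was a duplicate and was skipped.
func (c *IdempotentConsumer) Handle(e Event) bool {
	if c.seen[e.ID] {
		return false
	}
	c.Balance += e.Amount // the side effect
	c.seen[e.ID] = true
	return true
}

func main() {
	c := NewIdempotentConsumer()
	for _, e := range []Event{{"o-1", 10}, {"o-2", 5}, {"o-1", 10}} { // "o-1" redelivered
		c.Handle(e)
	}
	fmt.Println(c.Balance) // 15
}
```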
4) Order, partitioning and keys
Kafka
The message key determines the partition → strict order per key.
Keys: `aggregate_id`, `tenant_id`, `order_id`. Avoid hot keys.
Balance: number of partitions N ≈ the desired level of read parallelism.
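The key-to-partition mapping is essentially `hash(key) mod N`. Kafka's default partitioner hashes the serialized key with murmur2. The Go sketch below uses FNV-1a from the standard library purely as a stand-in, so its partition numbers will not match a real cluster. It does show that the same key always lands on the same partition.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a key to a partition. FNV-1a is a stand-in for Kafka's
// murmur2, so results differ from a real Kafka producer.
func partitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash writes never return an error
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	const partitions = 12
	for _, k := range []string{"order-42", "order-7", "order-42"} {
		fmt.Printf("%s -> partition %d\n", k, partitionFor(k, partitions))
	}
	// Same key -> same partition -> per-key order is preserved.
	// Changing the partition count remaps keys and breaks that guarantee
	// for data already in flight.
}
```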
NATS
In Core NATS, queue groups do the load balancing.
JetStream streams are organized by subjects. The emphasis is on wide fan-out/fan-in with low latency.
5) Retention, replay and compaction
Kafka
Retention: `retention.ms` / `retention.bytes`.
Compaction: keeps the latest value per key (suitable for snapshots, caches, saga state).
Replay: any consumer can rewind its offsets.
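Log compaction can be sketched as "keep only the newest record per key, and drop keys whose newest record is a tombstone". The Go code below is a simplified model. Real Kafka compacts in the background, never touches the active segment, and keeps tombstones for `delete.retention.ms` before removing them. As in Kafka, surviving records keep their original offsets.

```go
package main

import "fmt"

// Record is one entry in a partition log; a nil Value is a tombstone.
type Record struct {
	Offset int64
	Key    string
	Value  *string
}

// compact keeps the newest record per key (in offset order) and drops keys
// whose newest record is a tombstone. Simplified model of Kafka compaction.
func compact(partitionLog []Record) []Record {
	latest := make(map[string]int64) // key -> offset of its newest record
	for _, r := range partitionLog {
		latest[r.Key] = r.Offset
	}
	var out []Record
	for _, r := range partitionLog {
		if latest[r.Key] == r.Offset && r.Value != nil {
			out = append(out, r)
		}
	}
	return out
}

func str(s string) *string { return &s }

func main() {
	partitionLog := []Record{
		{0, "user-1", str("v1")},
		{1, "user-2", str("a")},
		{2, "user-1", str("v2")},
		{3, "user-2", nil}, // tombstone: delete user-2
	}
	for _, r := range compact(partitionLog) {
		fmt.Println(r.Offset, r.Key, *r.Value) // 2 user-1 v2
	}
}
```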
JetStream
Streams: file or memory storage; retention limits by time, bytes, or message count.
Consumers: pull/push, durable/ephemeral, filtering by subject (including wildcards).
Replay: redelivery, or reading from the beginning or from a given stream sequence (the analogue of an offset).
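Here is a rough in-memory model of a stream with limits-based retention and the "discard old" policy. Once the message limit is exceeded, the oldest messages are dropped, while sequence numbers keep growing, so a reader can start from any sequence that is still stored. The `limitsStream` type is hypothetical and models only a message-count limit (real streams also enforce byte and age limits).

```go
package main

import "fmt"

// Msg is a stored message with its stream sequence number.
type Msg struct {
	Seq  uint64
	Data string
}

// limitsStream: limits retention + discard-old, by message count only.
type limitsStream struct {
	MaxMsgs int
	nextSeq uint64
	msgs    []Msg
}

// Publish appends a message and drops the oldest ones beyond MaxMsgs.
func (s *limitsStream) Publish(data string) uint64 {
	s.nextSeq++
	s.msgs = append(s.msgs, Msg{Seq: s.nextSeq, Data: data})
	if len(s.msgs) > s.MaxMsgs {
		s.msgs = s.msgs[len(s.msgs)-s.MaxMsgs:] // discard old
	}
	return s.nextSeq
}

// From returns stored messages with Seq >= start; if start was already
// discarded, replay begins at the oldest message still stored.
func (s *limitsStream) From(start uint64) []Msg {
	var out []Msg
	for _, m := range s.msgs {
		if m.Seq >= start {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	s := &limitsStream{MaxMsgs: 3}
	for _, d := range []string{"a", "b", "c", "d", "e"} {
		s.Publish(d)
	}
	fmt.Println(s.From(1)) // [{3 c} {4 d} {5 e}]: seq 1 and 2 were discarded
	fmt.Println(s.From(4)) // [{4 d} {5 e}]
}
```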
6) Transactions, outbox and consistency
Kafka
Idempotent producer (`enable.idempotence=true`): protects against duplicates caused by producer retries.
Transactions: atomically write to several partitions and commit consumer offsets → read-process-write without gaps or duplicates.
Transactional outbox: write the business change and an outbox row in one database transaction; a worker then publishes the outbox rows to Kafka.
NATS
There are no cross-stream transactions as in Kafka; use outbox/inbox and idempotent consumers (keys, a dedup store).
7) RPC and request-response
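Kafka is awkward for RPC (high overhead; matching replies to requests and ordering them is harder). Use asynchronous commands/events instead.
NATS is a natural fit for request-reply (millisecond latency, built-in reply correlation, timeouts).
```go
// nc is a connected *nats.Conn; err is non-nil (e.g. nats.ErrTimeout) if no reply arrives within 200 ms
resp, err := nc.Request("profile.get", []byte(`{"id":42}`), 200*time.Millisecond)
```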
8) Operation and topologies
8.1 Kafka
Cluster: brokers + ZooKeeper (older versions) or KRaft (the built-in metadata quorum in newer versions).
Replication: RF ≥ 3 spread across zones; ISR, controllers.
Multi-region: MirrorMaker 2/Cluster Linking; active-passive or active-active with conflict policies.
Disk/network capacity: estimate from `throughput × retention × replicas`.
8.2 NATS
Cluster: many nodes; superclusters for geo-distribution; leaf nodes for the periphery/edge.
JetStream: stream placement on selected node sets, replication (R = 1..5).
WAN: predictably low latency, easy federation.
9) Security
Kafka
TLS (mTLS), SASL: SCRAM, OAuthBearer.
ACL on topics/groups/transactions.
Encryption "at rest" (OS/disks) + network policies.
NATS
nkey/JWT identities, operator-accounts, per-subject ACL.
mTLS between nodes and clients.
Tenant isolation (accounts) + limits.
10) Observability and performance metrics
Kafka
Broker: `BytesIn/Out`, `RequestQueue`, `UnderReplicatedPartitions`, GC/FS stats.
Topic/partition: `logEndOffset`, consumer lag (critical).
Producer/consumer: retries, `batch.size`, `linger.ms`, `fetch.min.bytes`, errors.
Tools: JMX, Cruise Control (re-balance), Schema Registry.
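Consumer lag per partition is the distance between the log end offset and the group's committed offset; the group total is the sum. Below is a small Go sketch of that arithmetic. The input maps stand in for the LOG-END-OFFSET and CURRENT-OFFSET columns that `kafka-consumer-groups.sh --describe` reports. Treating a missing commit as offset 0 is an assumption of the sketch; in practice, the result depends on `auto.offset.reset` and the log start offset.

```go
package main

import "fmt"

// groupLag returns lag per partition (logEnd - committed, never negative)
// and the total for the group. Inputs map partition -> offset.
func groupLag(logEnd, committed map[int]int64) (perPartition map[int]int64, total int64) {
	perPartition = make(map[int]int64, len(logEnd))
	for p, end := range logEnd {
		c := committed[p] // assumption: a missing commit counts as offset 0
		lag := end - c
		if lag < 0 {
			lag = 0
		}
		perPartition[p] = lag
		total += lag
	}
	return perPartition, total
}

func main() {
	logEnd := map[int]int64{0: 1500, 1: 980, 2: 2000}
	committed := map[int]int64{0: 1500, 1: 900, 2: 1200}
	per, total := groupLag(logEnd, committed)
	fmt.Println(per[0], per[1], per[2], total) // 0 80 800 880
}
```

Alerting on the total alone can hide a single stuck partition, so per-partition lag is usually worth watching as well.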
NATS/JetStream
Server: conn/msgs/sec, RTT, CPU/mem, slow consumer detection.
JetStream: per stream/consumer — lag, redeliveries, acks, storage bytes.
Monitoring: built-in HTTP monitoring endpoints, the `nats` and `nsc` CLIs, dashboards.
11) Performance and tuning
Kafka
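Large batches and `linger.ms` raise throughput and can lower p99 under load, at the cost of up to `linger.ms` of extra latency per send.
Compression (lz4/zstd) saves network and disk.
`num.partitions`: match it to the number of consumers/cores, but do not over-partition (each partition adds overhead).
Drives: NVMe preferred, XFS/ext4 with `noatime`.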
NATS
Small messages, many connections are the norm; keep queue groups "wide."
JetStream: tune `max_ack_pending`, pull vs push, size of batches.
Backpressure: `FlowControl`, `IdleHeartbeat`, server-side limits.
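The backpressure effect of `max_ack_pending` can be sketched as a window of unacknowledged deliveries. Once the window is full, nothing new is delivered until an ack frees a slot. The Go type below is a hypothetical model of that rule, not the server's code.

```go
package main

import "fmt"

// ackWindow models max_ack_pending: at most `limit` deliveries may be
// outstanding (delivered but not yet acked) at any time.
type ackWindow struct {
	limit   int
	pending map[uint64]bool // stream sequences awaiting ack
}

func newAckWindow(limit int) *ackWindow {
	return &ackWindow{limit: limit, pending: make(map[uint64]bool)}
}

// TryDeliver returns false when the window is full and seq must wait.
func (w *ackWindow) TryDeliver(seq uint64) bool {
	if len(w.pending) >= w.limit {
		return false
	}
	w.pending[seq] = true
	return true
}

// Ack frees a slot in the window.
func (w *ackWindow) Ack(seq uint64) { delete(w.pending, seq) }

func main() {
	w := newAckWindow(2)
	fmt.Println(w.TryDeliver(1), w.TryDeliver(2), w.TryDeliver(3)) // true true false
	w.Ack(1)
	fmt.Println(w.TryDeliver(3)) // true
}
```

A small window protects slow workers but limits throughput; a large one does the opposite, which is why the value is tuned per consumer.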
12) Integration patterns
Outbox/Inbox (in both Kafka and NATS).
SAGA: event orchestration; deduplication by `saga_id + step`.
Change Data Capture (CDC): Debezium → Kafka; in NATS, a custom publisher fed by database triggers/logs.
Stream processing: Kafka Streams/Flink/Spark; in NATS, third-party processors/functions and JetStream consumers.
Dead Letter Queue (DLQ) and retry policies (exponential backoff + jitter).
13) Configuration examples
13.1 Kafka: creating a topic and configuring a producer
```bash
kafka-topics.sh --bootstrap-server broker:9092 --create --topic orders \
  --partitions 12 --replication-factor 3 \
  --config cleanup.policy=delete \
  --config retention.ms=604800000  # 7 days
```
```properties
# producer.properties
bootstrap.servers=broker:9092
acks=all
enable.idempotence=true
batch.size=65536
linger.ms=10
compression.type=zstd
```
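13.2 Kafka Streams: idempotent processing (sketch)
```java
// builder is a StreamsBuilder; Order is the application's value type (with a Serde configured).
// Exactly-once needs processing.guarantee=exactly_once_v2 in the Streams config.
builder.<String, Order>stream("orders")
    .groupByKey()
    .aggregate(/* ... */)
    .toStream()
    .to("orders-agg");
```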
13.3 NATS JetStream: stream + consumer (nats CLI)
```bash
nats stream add ORDERS --subjects "orders.*" --retention limits \
  --storage file --max-bytes 100GB --replicas 3 --discard old
nats consumer add ORDERS ORDERS-WORKERS --filter "orders.created" \
  --pull --deliver all --ack explicit --max-deliver 6 --backoff "1s,5s,30s,2m"
```
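13.4 NATS Request-Reply (Go)
```go
nc, err := nats.Connect("tls://nats:4222", nats.Secure(tlsConf)) // tlsConf: *tls.Config built elsewhere
if err != nil {
	log.Fatal(err)
}
defer nc.Close()
if _, err := nc.QueueSubscribe("calc.sum", "workers", func(m *nats.Msg) {
	m.Respond([]byte("42")) // ... process m.Data first ...
}); err != nil {
	log.Fatal(err)
}
```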
14) Kafka vs NATS: a quick selection guide
Need replay, long-term retention, compaction, heavy stream processing → Kafka.
Need fast RPC, fan-out/fan-in with very low latency, simple operations, edge/IoT → NATS (Core).
Need persistence + fan-out, but without a heavyweight log platform → NATS JetStream.
Need strict per-key ordering and transactions → Kafka.
15) Capacity planning (simplified)
Kafka
1. Throughput: `inbound_MBps × RF × retention_days × 86400` → total disk (MB, before headroom).
2. Partitions: `target_concurrency` × 1.5–2× headroom.
3. Network: peak (p99) inbound traffic plus replication traffic, reduced by producer compression.
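The storage formula in step 1 works out as the following small Go calculation. The inputs (50 MB/s after producer compression, RF = 3, 7 days, six brokers) and the 1.3× headroom factor for free space, indexes, and growth are illustrative assumptions, not recommendations.

```go
package main

import "fmt"

// requiredDiskTB: inbound MB/s × RF × retention seconds × headroom, in TB (decimal).
func requiredDiskTB(inboundMBps float64, rf int, retentionDays, headroom float64) float64 {
	mb := inboundMBps * float64(rf) * retentionDays * 86400
	return mb * headroom / 1e6 // MB -> TB
}

func main() {
	// 50 MB/s (already compressed), RF=3, 7 days, 30% headroom, 6 brokers.
	total := requiredDiskTB(50, 3, 7, 1.3)
	fmt.Printf("cluster: %.1f TB, per broker: %.1f TB\n", total, total/6)
	// cluster: 117.9 TB, per broker: 19.7 TB
}
```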
NATS/JetStream
1. Messages/sec and average message size → throughput.
2. Retention × replicas → storage.
3. Consumer limits (ack pending, redeliveries), CPU for serialization.
16) Safe operation: checklist
- TLS/mTLS enabled, secrets rotated.
- ACL/accounts/quotas (per-tenant).
- Idempotent consumers, a DLQ, and retries with jitter.
- Lag/throughput/error monitoring; alerts on URP (Kafka), redelivery storm (NATS).
- Capacity dashboards: partitions, storage, p99.
- Node/zone failure tests, game-days, replay/backfill.
- Schemas (Schema Registry/JSON Schema) and keys are documented.
- Retention/compression/TTL policies are aligned with compliance.
- Broker/client versions are updated regularly; wire protocol compatibility verified.
17) Anti-patterns
Hot key (all events share one ID) → one overloaded partition. Shard the key or buffer.
Retries without idempotency → duplicate side effects.
Huge messages (tens of MB) → memory fragmentation/GC pauses. Store the payload in object storage and send references.
Mixing RPC and streaming in Kafka → complicated life cycle and ordering.
JetStream as a long-term data warehouse → misuse; keep long-term data in object or columnar storage.
No DLQ → "poisonous" messages spin endlessly.
Forgotten retention → disks are full, cluster stop.
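Sharding a hot key is often done by "salting": appending a small suffix so one logical key spreads over several partitions. The following Go sketch relies on arbitrary choices (the `#` separator and the use of a per-event sequence number). The trade-off is that ordering then holds only per sub-key, so consumers must merge or tolerate reordering across shards.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// saltedKey spreads a hot logical key over `shards` physical keys.
func saltedKey(key string, seq uint64, shards int) string {
	return key + "#" + strconv.FormatUint(seq%uint64(shards), 10)
}

// logicalKey strips the salt to recover the original key on the consumer side.
func logicalKey(salted string) string {
	if i := strings.LastIndex(salted, "#"); i >= 0 {
		return salted[:i]
	}
	return salted
}

func main() {
	for seq := uint64(0); seq < 4; seq++ {
		fmt.Println(saltedKey("tenant-big", seq, 3))
	}
	// tenant-big#0, tenant-big#1, tenant-big#2, tenant-big#0
	fmt.Println(logicalKey("tenant-big#2")) // tenant-big
}
```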
18) FAQ
Q: Can I do "exactly-once" at the end of the pipeline?
A: In practice, effectively yes: Kafka (idempotent producer + transactions) plus idempotent sinks (key, upsert). In NATS, through idempotence/dedup in the application.
Q: What to choose for a million small RPCs/sec?
A: NATS Core: Microlatency, request-reply, light connections and queue-groups.
Q: Need compaction and state snapshots?
A: Kafka with `cleanup.policy=compact`, key = aggregate/resource ID.
Q: How to deal with lag?
A: Increase the number of partitions/workers, reduce processing time, use batching and prefetch, optimize deserialization, and scale brokers/drives vertically.
Q: Multi-region and DR?
A: Kafka: MirrorMaker 2/Cluster Linking, active-passive with an RPO of seconds. NATS: superclusters/leaf nodes; JetStream mirroring and replicas across zones.
19) Totals
Kafka and NATS cover different needs. Kafka provides durable event logs, high throughput, transactions, and replay; NATS is an ultralight bus for low latency, RPC, and simple fan-out, with JetStream for persistence. Base your choice on delivery semantics, ordering and retention, latency, and operating costs. Design keys/partitions, retention, DLQ, and observability, and your event architecture will be predictable, scalable, and reliable.