GH GambleHub

Webhooks: retries and acknowledgements

1) Basic delivery model

At-least-once (default): each event is delivered one or more times. Exactly-once effects are achieved through receiver idempotency.
Acknowledgement (ACK): only a 2xx response (usually 200 or 204) from the recipient means success. Anything else is treated as a failure and triggers a retry.
Fast ACK: respond with 2xx as soon as the event is placed in a queue, not after full business processing.

2) Event format and required headers

Payload (example)

json
{
  "id": "evt_01HXYZ",
  "type": "order.created",
  "occurred_at": "2025-11-03T18:10:12Z",
  "sequence": 128374,
  "source": "orders",
  "data": { "order_id": "o_123", "amount": "49.90", "currency": "EUR" },
  "schema_version": 1
}

Sender Headers

`X-Webhook-Id: evt_01HXYZ` - unique event ID (use for deduplication).
`X-Webhook-Seq: 128374` - monotonic sequence number (per subscription/topic).
`X-Signature: sha256=<base64(hmac_sha256(body, secret))>` - HMAC signature.
`X-Retry: 0,1,2...` - attempt number.
`X-Webhook-Version: 1` - contract version.
(optional) `Traceparent` - trace correlation.
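
The `X-Signature` format above can be produced and checked roughly as follows. This is a minimal sketch: the function names `sign_body` and `verify_signature` are illustrative, and it assumes the signature is computed over the exact raw request bytes with a shared secret.

```python
import base64
import hashlib
import hmac


def sign_body(body: bytes, secret: bytes) -> str:
    """Return the X-Signature value for a raw request body."""
    digest = hmac.new(secret, body, hashlib.sha256).digest()
    return "sha256=" + base64.b64encode(digest).decode("ascii")


def verify_signature(body: bytes, header_value: str, secret: bytes) -> bool:
    """Check an X-Signature header against the raw body in constant time."""
    expected = sign_body(body, secret)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), header_value.encode())
```

Verify against the bytes as received, before any JSON parsing or re-serialization, since even a whitespace change alters the HMAC.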

Response from recipient

2xx - accepted successfully (no further retries for this `id`).
410 Gone - endpoint deleted or inactive → the sender stops retrying and deactivates the subscription.
429/5xx/timeout - the sender retries according to the retry policy.

3) Retry policy

Recommended backoff ladder (+ jitter)

`1s, 3s, 10s, 30s, 2m, 10m, 30m, 2h, 6h, 24h` (stop once the limit is reached, for example after 48-72 hours).

Rules:
  • Exponential backoff + random jitter (±20-30%) to avoid the "thundering herd" effect.
  • Retry only on temporary failures (for example, 5xx or a network timeout).
  • Respect 429: wait at least `max(Retry-After header, next backoff window)`, so a retry never arrives earlier than the recipient asked.
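
The ladder and rules above can be sketched as a single delay function. The names `BACKOFF_LADDER` and `next_delay` are illustrative, the jitter default of 25% is one value from the suggested range, and `retry_after` is assumed to be already parsed into seconds.

```python
import random

# Ladder from the text, in seconds: 1s, 3s, 10s, 30s, 2m, 10m, 30m, 2h, 6h, 24h.
BACKOFF_LADDER = [1, 3, 10, 30, 120, 600, 1800, 7200, 21600, 86400]


def next_delay(attempt, retry_after=None, jitter=0.25, rng=random):
    """Return seconds to wait before retry `attempt` (1-based), or None to give up."""
    if attempt < 1 or attempt > len(BACKOFF_LADDER):
        return None  # ladder exhausted: the caller should move the event to the DLQ
    base = BACKOFF_LADDER[attempt - 1]
    # Spread retries over [base * (1 - jitter), base * (1 + jitter)].
    delay = base * rng.uniform(1 - jitter, 1 + jitter)
    if retry_after is not None:
        # Never retry sooner than the recipient asked for.
        delay = max(delay, retry_after)
    return delay
```

For example, `next_delay(3, retry_after=60)` waits 60 seconds even though the third rung of the ladder is only about 10 seconds.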

Timeouts and sizes

Connection timeout ≤ 3-5 seconds; total response timeout ≤ 10 seconds.

Body size is limited by the contract (for example, ≤ 256 KB); a larger body gets 413, so fall back to chunking or a "pull URL".

4) Idempotency and deduplication

Idempotent application: processing a repeat of the same `id` must return the same result and must not change state again.
Dedup storage on the recipient's side: store `(X-Webhook-Id, processed_at, checksum)` with a TTL ≥ the retry window (24-72 hours).
Composite key: if there are several topics, use `(subscription_id, event_id)`.
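
A dedup store with a TTL and a composite key might look like the sketch below. It is an in-memory illustration only: the class name `DedupStore` and its method are hypothetical, and a real receiver would more likely use a shared store (a database table or a cache with expiry) so that all instances see the same keys.

```python
import time


class DedupStore:
    """In-memory set of seen event keys with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires = {}  # (subscription_id, event_id) -> expiry timestamp

    def seen_before(self, subscription_id, event_id):
        """Record the key; return True if it was already present and unexpired."""
        now = self._clock()
        # Drop expired entries so the map does not grow without bound.
        # (Linear scan: fine for a sketch, too slow for a busy receiver.)
        for key in [k for k, exp in self._expires.items() if exp <= now]:
            del self._expires[key]
        key = (subscription_id, event_id)
        if key in self._expires:
            return True
        self._expires[key] = now + self._ttl
        return False
```

With `ttl_seconds=72 * 3600` the store covers the longest retry window mentioned above.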

5) Ordering and "exactly-once effects"

It is difficult to guarantee strict order in distributed systems. Use:
  • Partition by key: events for the same logical entity (for example, the same `order_id`) always go through one delivery "channel".
  • Sequence: reject events whose `X-Webhook-Seq` is older than what has already been applied, and hold events that arrive early in a "parking lot" until the missing ones arrive.
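
The "parking lot" idea can be sketched as a small reorder buffer per partition key. This assumes sequence numbers within a partition are consecutive with no gaps, which the text does not guarantee; the class name `Sequencer` is illustrative.

```python
class Sequencer:
    """Releases events in sequence order for one partition key."""

    def __init__(self, next_seq):
        self.next_seq = next_seq  # the sequence number we expect next
        self.parked = {}          # seq -> event, waiting for earlier events

    def accept(self, seq, event):
        """Return the events that are now ready to apply, in order."""
        if seq < self.next_seq or seq in self.parked:
            return []  # stale or duplicate: nothing new to apply
        self.parked[seq] = event
        ready = []
        # Drain the parking lot for as long as the next expected event is there.
        while self.next_seq in self.parked:
            ready.append(self.parked.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

In practice the parking lot also needs a size or age limit, after which the receiver requests a replay instead of waiting indefinitely.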
Exactly-once effects are achieved through:
  • log of applied operations (outbox/inbox pattern),
  • transactional upsert by `event_id` in the database,
  • sagas/compensations for complex processes.
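
As an illustration of the inbox approach, the sketch below records the event ID and applies the business change in one database transaction, so a repeat delivery becomes a no-op. The table names, the `orders` schema, and the functions `open_db` and `apply_once` are assumptions made for the example; SQLite is used only because it ships with Python.

```python
import sqlite3


def open_db(path=":memory:"):
    """Create the inbox and a sample business table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS inbox (event_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id TEXT PRIMARY KEY, amount_cents INTEGER)"
    )
    return conn


def apply_once(conn, event_id, order_id, amount_cents):
    """Apply an order event at most once; return True if it was applied now."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO inbox (event_id) VALUES (?)", (event_id,)
        )
        if cur.rowcount == 0:
            return False  # event_id already recorded: skip the side effect
        conn.execute(
            "INSERT INTO orders (order_id, amount_cents) VALUES (?, ?)",
            (order_id, amount_cents),
        )
        return True
```

Because the inbox row and the order row commit together, a crash between the two statements leaves neither behind and the next retry applies the event cleanly.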

6) Error handling by status code (table)

| Response code | Meaning for the sender | Action |
|---|---|---|
| 2xx | ACK received | Consider the event delivered; stop retrying |
| 4xx (except 408/410/429) | Persistent error (payload/authorization) | Put in the DLQ; notify the integrator |
| 410 | Endpoint deleted/deprecated | Stop retrying; deactivate the subscription |
| 408/429 | Temporary overload/timeout | Retry with backoff + jitter; honor `Retry-After` |
| 5xx | Temporary server error | Retry with backoff + jitter |
| 3xx | Redirects are not used for webhooks | Treat as a configuration error |
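
The table translates directly into a small classification function on the sender side. The `Action` enum and `classify` are illustrative names; `None` stands for a timeout or network error, and unexpected codes fall back to a retry, following the rule that anything other than 2xx counts as a failure.

```python
from enum import Enum


class Action(Enum):
    DELIVERED = "delivered"
    RETRY = "retry"
    DEACTIVATE = "deactivate"
    DEAD_LETTER = "dead_letter"
    CONFIG_ERROR = "config_error"


def classify(status):
    """Map an HTTP status (None for a timeout/network error) to a sender action."""
    if status is None:
        return Action.RETRY
    if 200 <= status < 300:
        return Action.DELIVERED
    if 300 <= status < 400:
        return Action.CONFIG_ERROR  # redirects are not followed for webhooks
    if status == 410:
        return Action.DEACTIVATE
    if status in (408, 429) or 500 <= status < 600:
        return Action.RETRY
    if 400 <= status < 500:
        return Action.DEAD_LETTER  # persistent client-side error
    return Action.RETRY  # anything unexpected is treated as a failure
```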

7) Channel security

HMAC signature on every message; verify it at the receiver within a "time window" (protects against MITM and replay attacks).
mTLS for sensitive domains (LCC/payments).
IP allowlist of outgoing addresses, TLS 1.2+, HSTS.
PII minimization: do not send unnecessary personal data; mask it in logs.
Rotation of secrets: keep two valid keys (active/next) and use the `X-Key-Id` header to indicate which one signed the message.
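
On the receiver, rotation with `X-Key-Id` can be handled by looking up the secret by key ID before verifying. This is a sketch under assumptions: the function name `verify_with_rotation` and the `keys` mapping are hypothetical, and the signature format is the `sha256=<base64>` form described for `X-Signature`.

```python
import base64
import hashlib
import hmac


def verify_with_rotation(body, signature, key_id, keys):
    """Verify a signature using the secret named by X-Key-Id.

    `keys` maps key ID -> secret bytes and holds the active and next keys.
    """
    secret = keys.get(key_id)
    if secret is None:
        return False  # unknown or already retired key
    digest = hmac.new(secret, body, hashlib.sha256).digest()
    expected = "sha256=" + base64.b64encode(digest).decode("ascii")
    return hmac.compare_digest(expected.encode(), signature.encode())
```

During a rotation both keys stay in `keys`; once the sender has switched to the new one, the old entry is removed and signatures made with it stop verifying.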

8) Queues, DLQs and Replays

Events must be written to an output queue/log on the sender side (for reliable replay).
When the maximum number of retries is exceeded, the event goes to the DLQ (Dead Letter Queue) together with the failure reason.
Replay API (for the recipient/operator): resend by `id`, time range, or topic, with an RPS limit and additional signature/authorization.

Replay API Example (Sender):

POST /v1/webhooks/replay
{ "subscription_id": "sub_123", "from": "2025-11-03T00:00:00Z", "to": "2025-11-03T12:00:00Z" }
→ 202 Accepted

9) Contract and versioning

Version both the event (the `schema_version` field) and the transport (`X-Webhook-Version`).
Add new fields only as optional; when removing a field, use a minor migration with a transition period (dual-write).
Document event types, examples, schemas (JSON Schemas), error codes.

10) Observability and SLO

Sender Key Metrics:
  • `delivery_success_rate` (2xx/all attempts), `first_attempt_success_rate`
  • `retries_total`, `max_retry_age_seconds`, `dlq_count`
  • `latency_p50/p95` (occurred_at → ack_received_at)
Recipient Key Metrics:
  • `ack_latency` (receive → 2xx), `processing_latency` (enqueue → done)
  • `duplicates_total`, `invalid_signature_total`, `out_of_order_total`
SLO examples:
  • 99.9% of events receive the first ACK within 60 seconds (28-day window).
  • DLQ ≤ 0.1% of the total; DLQ replay within 24 hours.
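
The two sender success rates could be computed from a log of delivery attempts as sketched below. The record shape `(event_id, attempt_number, status)` and the function name are assumptions; `first_attempt_success_rate` is taken here to mean the share of first attempts (attempt number 0) that got a 2xx, which the text does not spell out.

```python
def delivery_rates(attempts):
    """Compute success rates from (event_id, attempt_number, status) records.

    `status` is the HTTP status code, or None for a timeout/network error.
    """
    total = ok = first_total = first_ok = 0
    for _event_id, attempt, status in attempts:
        success = status is not None and 200 <= status < 300
        total += 1
        ok += success
        if attempt == 0:
            first_total += 1
            first_ok += success
    return {
        "delivery_success_rate": ok / total if total else None,
        "first_attempt_success_rate": first_ok / first_total if first_total else None,
    }
```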

11) Time and network outages

Use UTC in time fields; keep clocks synchronized with NTP.
Send `occurred_at` and record `delivered_at` to measure the lag.
During long network or endpoint outages, accumulate events in the queue and limit its growth (backpressure + quotas).

12) Recommended limits and hygiene

RPS per subscription (e.g. 50 RPS, burst 100) + concurrency (e.g. 10).
Max. body: 64-256 KB; for anything larger, send a "notification + URL" and sign the download.
Event names in `snake.case` or `dot.type` (`order.created`).
Strict idempotency for the receiver's write operations.
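
A per-subscription RPS limit with a burst allowance is commonly implemented as a token bucket. The sketch below uses the example figures from the text (50 RPS, burst 100); the class name `TokenBucket` is illustrative, and the concurrency limit is a separate mechanism that is not shown.

```python
import time


class TokenBucket:
    """Rate limiter: `rate` tokens per second, up to `burst` tokens stored."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self._clock = clock
        self._tokens = float(burst)  # start full so an initial burst is allowed
        self._last = clock()

    def try_acquire(self):
        """Take one token if available; return whether the send may proceed."""
        now = self._clock()
        # Refill for the elapsed time, capped at the burst size.
        self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

A sender would keep one bucket per subscription, for example `TokenBucket(rate=50, burst=100)`, and leave an event in the queue whenever `try_acquire()` returns `False`.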

13) Examples: Sender and Receiver

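13.1 Sender (pseudocode)

python
import base64
import hashlib
import hmac
import json
import requests  # third-party HTTP client, assumed here

def send_event(event, endpoint, secret, attempt=0):
    # mark_delivered, deactivate_subscription and schedule_retry are
    # application-specific helpers that are not shown; secret is bytes.
    body = json.dumps(event).encode("utf-8")
    digest = hmac.new(secret, body, hashlib.sha256).digest()
    sig = base64.b64encode(digest).decode("ascii")
    headers = {
        "X-Webhook-Id": event["id"],
        "X-Webhook-Seq": str(event["sequence"]),
        "X-Retry": str(attempt),
        "X-Signature": f"sha256={sig}",
        "Content-Type": "application/json",
    }
    try:
        res = requests.post(endpoint, data=body, headers=headers, timeout=10)
    except requests.RequestException:
        schedule_retry(event, attempt + 1)  # network error or timeout
        return
    if 200 <= res.status_code < 300:
        mark_delivered(event["id"])
    elif res.status_code == 410:
        deactivate_subscription()
    else:
        schedule_retry(event, attempt + 1)  # backoff + jitter, respect 429 Retry-After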

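13.2 Receiver (pseudocode)

python
from flask import Flask, request  # a Flask-style app is assumed

app = Flask(__name__)

# verify_hmac, dedup_store, enqueue_for_processing and secret are defined elsewhere.
@app.post("/webhooks")
def handle():
    body = request.get_data()
    if not verify_hmac(body, request.headers.get("X-Signature", ""), secret):
        return "", 401  # bad signature: reject without processing
    evt_id = request.headers["X-Webhook-Id"]
    if dedup_store.exists(evt_id):
        return "", 204  # duplicate: ACK without reprocessing
    enqueue_for_processing(body)  # fast path: enqueue, then ACK
    dedup_store.put(evt_id, ttl=72 * 3600)
    return "", 202  # or 204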

14) Testing and chaos practices

Negative cases: invalid signature, 429/5xx, timeout, 410, large payloads.
Behavioral: out-of-order delivery, duplicates, delays of 1-10 minutes, a 24-hour outage.
Load: 10× burst; check backpressure and DLQ persistence.
Contracts: JSON Schema, required headers, stable event types.

15) Implementation checklist

  • 2xx = ACK, with a quick return after enqueue
  • Exponential backoff + jitter, respect `Retry-After`
  • Receiver idempotency and dedup by `X-Webhook-Id` (TTL ≥ retry window)
  • HMAC signatures, secret rotation, optional mTLS
  • DLQ + Replay API, monitoring and alerts
  • Limits: timeouts, RPS, body size
  • Order: partition by key or `sequence` + "parking lot"
  • Documentation: schemas, examples, error codes, versions
  • Chaos tests: delays, duplicates, network failure, long replay

16) Mini-FAQ

Do I always need to answer 200?
Any 2xx counts as a success. Returning 202 or 204 is common practice for "accepted into the queue."

Can retries be stopped?
Yes, with a 410 response and/or through the sender's console/API (unsubscribe).

What about large payloads?
Send a "notification + secure URL," sign the download request and set a TTL.

How to ensure order?
Partition by key + `sequence`; in case of a gap, use the "parking lot" and replay.

Summary

Reliable webhooks rest on clear ACK (2xx) semantics, sensible retries with backoff + jitter, strict idempotency and deduplication, sound security (HMAC/mTLS), queues + DLQ + replays, and transparent observability. Fix the contract, set limits and metrics, run chaos scenarios regularly, and your integrations will stop falling apart at the first failure.
