Error Handling and Status Codes
1) Why standardize errors
A single error contract speeds up client debugging, reduces spurious retries, and makes root-cause analysis (RCA) reproducible. A good system:
- predictably encodes the type of problem,
- gives the client actionable hints (what to do next),
- protects against leaking internal details,
- is compatible with retries and idempotency.
2) Design principles
1. One error schema for all services (REST/GraphQL/gRPC/webhooks).
2. Clear retry semantics: which codes may be retried and which may not.
3. Fail-closed on write operations: a 4xx/5xx is better than silent inconsistency.
4. No leaks: do not disclose SQL, stack traces, configs, or internal IDs.
5. Tracing: always return `trace_id`/`correlation_id`.
6. Localization of messages is optional, but codes and `reason` remain stable.
3) Single format (Problem Details/JSON)
Recommended base format (RFC 7807 compliant):
```json
{
  "type": "https://errors.example.com/auth/invalid-token",
  "title": "Invalid access token",
  "status": 401,
  "code": "AUTH_INVALID_TOKEN",
  "detail": "Token expired or signature invalid.",
  "instance": "/api/v1/payments/12345",
  "trace_id": "01HX3...ABC",
  "hint": "Obtain a new token via OAuth2 refresh.",
  "meta": {
    "scope": "payments:write",
    "policy": "deny-by-default"
  }
}
```
Explanations:
- `type` - a stable URL identifying the error class.
- `code` - a short, machine-readable domain code (stable across releases).
- `hint` - what the client should do next (retry, refresh the token, change parameters).
- `meta` - safe additional details (no secrets or PII).
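The format above can be produced by one shared helper so that every service emits the same members. The following is a minimal sketch, not a definitive implementation: the function name `make_problem` and the `ERROR_BASE_URL` constant are hypothetical, and omitting empty optional members is an assumption of this sketch.

```python
from typing import Any, Optional

# Base URL taken from the example above; treat it as a placeholder.
ERROR_BASE_URL = "https://errors.example.com"


def make_problem(
    status: int,
    code: str,
    title: str,
    type_path: str,
    trace_id: str,
    detail: Optional[str] = None,
    instance: Optional[str] = None,
    hint: Optional[str] = None,
    meta: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a Problem Details body; empty optional members are omitted."""
    problem: dict[str, Any] = {
        "type": f"{ERROR_BASE_URL}/{type_path.lstrip('/')}",
        "title": title,
        "status": status,
        "code": code,
        "trace_id": trace_id,
    }
    optional = {"detail": detail, "instance": instance, "hint": hint, "meta": meta}
    problem.update({key: value for key, value in optional.items() if value})
    return problem
```

Keeping `type`, `code`, and `trace_id` mandatory in the signature makes it hard to emit an error that clients cannot route or correlate.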
4) Status code map (minimum set)
Authentication/Authorization
400 Bad Request - structural/schema validation failed.
401 Unauthorized - missing or invalid token. Add `WWW-Authenticate`.
403 Forbidden - authenticated, but lacks rights or is denied by policy.
404 Not Found - also masks the existence of a resource the caller has no rights to.
409 Conflict - version/state conflict (optimistic lock, idempotency).
451 Unavailable For Legal Reasons - compliance/jurisdiction block.
Limits and protection
408 Request Timeout - the client is sending the body too slowly.
409/425 Too Early - refusal to process a request that could be replayed via TLS 1.3 0-RTT early data.
429 Too Many Requests - with `Retry-After` and the limit policy.
499 Client Closed Request - (at the perimeter/NGINX) the client closed the connection.
Data and Business Rules
422 Unprocessable Content - business validation: the request passed the schema, but its meaning is invalid.
423 Locked - resource locked (KYC review, AML freeze).
409 Conflict - double submission, race, state restriction (for example, "already in process").
410 Gone - endpoint/resource removed (deprecation complete).
Server
500 Internal Server Error - unknown error; do not disclose details.
502 Bad Gateway - a dependency returned an error, or proxying failed.
503 Service Unavailable - degradation/planned maintenance; add `Retry-After`.
504 Gateway Timeout.
5) Retry and idempotency semantics
Do not retry: 400/401/403/404/422 (unless the client has changed the request).
Retryable: 408/429/5xx/425/499/504 (with backoff + jitter).
Idempotency: for `POST`, support an `Idempotency-Key` header (UUIDv4).
For a retry conflict, return 409 with `hint: "Use same Idempotency-Key or GET status"`.
Add `Idempotency-Replay: true` when returning a saved result.
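The idempotency rules can be illustrated with a small in-memory store. This is a sketch under stated assumptions: the class `IdempotencyStore`, the request `fingerprint` argument, and the code `IDEMPOTENCY_KEY_REUSED` are hypothetical, and the sketch interprets "retry conflict" as the same key arriving with a different payload. A real service would use a shared store with a TTL.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IdempotencyStore:
    """Maps Idempotency-Key -> (request fingerprint, saved response body)."""

    _saved: dict[str, tuple[str, dict]] = field(default_factory=dict)

    def execute(
        self, key: str, fingerprint: str, handler: Callable[[], dict]
    ) -> tuple[int, dict, dict]:
        """Return (status, headers, body) for a POST carrying Idempotency-Key."""
        if key in self._saved:
            saved_fingerprint, saved_body = self._saved[key]
            if saved_fingerprint != fingerprint:
                # Assumption: same key with a different payload is the conflict case.
                body = {
                    "status": 409,
                    "code": "IDEMPOTENCY_KEY_REUSED",  # hypothetical code
                    "hint": "Use same Idempotency-Key or GET status",
                }
                return 409, {}, body
            # Same key, same payload: replay the saved result without re-executing.
            return 200, {"Idempotency-Replay": "true"}, saved_body
        body = handler()  # first time: run the write operation once
        self._saved[key] = (fingerprint, body)
        return 201, {}, body
```

The handler runs at most once per key, which is the property that makes client retries safe for write operations.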
```
HTTP/1.1 429 Too Many Requests
Retry-After: 3
RateLimit-Limit: 50
RateLimit-Remaining: 0
RateLimit-Reset: 1730641030
```
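On the client side, the retry rules of this section reduce to two decisions: whether a status is retryable, and how long to wait. The sketch below assumes a "full jitter" exponential backoff and that a server-provided `Retry-After` value always takes precedence; the function names and default parameters are hypothetical.

```python
import random
from typing import Optional

# Retryable statuses as listed in this section; every 5xx (including 504) also qualifies.
RETRYABLE = {408, 425, 429, 499}


def is_retryable(status: int) -> bool:
    return status in RETRYABLE or 500 <= status <= 599


def next_delay(
    status: int,
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,
    cap: float = 30.0,
) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None to give up."""
    if not is_retryable(status):
        return None  # 400/401/403/404/422: retrying the same request is pointless
    if retry_after is not None:
        return max(retry_after, 0.0)  # the server's hint wins over local backoff
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

For the 429 response shown above, `next_delay(429, 0, retry_after=3)` yields a 3 second wait, while a 422 yields `None`.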
6) Input validation: field error structure
For 400/422, use an array of field errors:
```json
{
  "type": "https://errors.example.com/validation",
  "title": "Validation failed",
  "status": 422,
  "code": "VALIDATION_ERROR",
  "trace_id": "01HX4...XYZ",
  "errors": [
    {"field": "amount", "rule": "min", "message": "Must be >= 10"},
    {"field": "currency", "rule": "enum", "message": "Unsupported currency"}
  ]
}
```
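A validator that collects every field error before responding (rather than failing on the first one) produces the structure above. This sketch assumes schema validation has already passed, so `amount` is a number and `currency` is a string; the currency set, the minimum, and the function name are hypothetical.

```python
from typing import Any, Optional

SUPPORTED_CURRENCIES = {"EUR", "USD"}  # hypothetical list
MIN_AMOUNT = 10  # matches the "Must be >= 10" message above


def validate_payment(payload: dict[str, Any], trace_id: str) -> Optional[dict[str, Any]]:
    """Return a 422 problem body listing all field errors, or None if valid."""
    errors = []
    if payload["amount"] < MIN_AMOUNT:
        errors.append(
            {"field": "amount", "rule": "min", "message": f"Must be >= {MIN_AMOUNT}"}
        )
    if payload["currency"] not in SUPPORTED_CURRENCIES:
        errors.append(
            {"field": "currency", "rule": "enum", "message": "Unsupported currency"}
        )
    if not errors:
        return None
    return {
        "type": "https://errors.example.com/validation",
        "title": "Validation failed",
        "status": 422,
        "code": "VALIDATION_ERROR",
        "trace_id": trace_id,
        "errors": errors,
    }
```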
7) Partial failures (batch)
In batch endpoints, do not hide errors inside an unstructured 200. Return 207 Multi-Status, or 200 with an array of results where each operation has its own status:
```json
{
  "status": "partial",
  "succeeded": 8,
  "failed": 2,
  "results": [
    {"id": "op1", "status": 201},
    {"id": "op2", "status": 422, "error": {"code": "VALIDATION_ERROR", "detail": "..."}}
  ]
}
```
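The batch envelope can be derived mechanically from the per-operation results. In this sketch only the `"partial"` value comes from the example above; the overall values `"succeeded"` and `"failed"` and the function name are assumptions.

```python
def summarize_batch(results: list[dict]) -> dict:
    """Wrap per-operation results in a batch envelope with counters."""
    succeeded = sum(1 for result in results if 200 <= result["status"] < 300)
    failed = len(results) - succeeded
    if failed == 0:
        overall = "succeeded"  # assumed label for a fully successful batch
    elif succeeded == 0:
        overall = "failed"  # assumed label for a fully failed batch
    else:
        overall = "partial"
    return {
        "status": overall,
        "succeeded": succeeded,
        "failed": failed,
        "results": results,
    }
```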
8) Pagination and "empty" responses
Empty collection - 200 with `items: []`, not 404.
Last page - `next_page_token` is absent.
Invalid token - 400 with `code: PAGINATION_CURSOR_INVALID`.
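The three pagination rules can be sketched with an opaque offset cursor. The cursor encoding (base64 over JSON), the function names, and the page size are assumptions of this sketch; a production cursor would typically also be signed or encrypted.

```python
import base64
import binascii
import json
from typing import Optional


class InvalidCursor(ValueError):
    """Raised when a page token cannot be decoded."""


def encode_cursor(offset: int) -> str:
    raw = json.dumps({"offset": offset}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(token: str) -> int:
    try:
        data = json.loads(base64.urlsafe_b64decode(token.encode()))
        offset = data["offset"]
    except (binascii.Error, ValueError, KeyError, TypeError) as exc:
        raise InvalidCursor(token) from exc
    if not isinstance(offset, int) or offset < 0:
        raise InvalidCursor(token)
    return offset


def list_page(items: list, token: Optional[str] = None, page_size: int = 2):
    """Return (status, body) for one page of `items`."""
    try:
        offset = decode_cursor(token) if token else 0
    except InvalidCursor:
        return 400, {"status": 400, "code": "PAGINATION_CURSOR_INVALID"}
    body = {"items": items[offset:offset + page_size]}  # empty list, never 404
    if offset + page_size < len(items):
        body["next_page_token"] = encode_cursor(offset + page_size)
    return 200, body  # no next_page_token on the last page
```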
9) Webhooks: Reliable Delivery
Sign events (HMAC) and verify the signature before processing.
Respond to successful processing with 2xx (preferably 204).
Temporary receiver failures - 5xx; the sender retries (exponential backoff, jitter).
Deduplicate by `event_id` and save the result (idempotent consumer).
Invalid payload - 400/422, no retries.
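A receiver following these rules might look like the sketch below. It assumes HMAC-SHA256 with a hex-encoded signature, a JSON payload carrying `event_id`, and an in-memory `processed` dict standing in for durable storage; returning 401 for a bad signature is also an assumption, since the text does not name a status for that case.

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def handle_webhook(
    secret: bytes, body: bytes, signature_hex: str, processed: dict[str, int]
) -> int:
    """Return the HTTP status the receiver should answer with."""
    if not verify_signature(secret, body, signature_hex):
        return 401  # assumption: reject unsigned or tampered events
    try:
        event_id = json.loads(body)["event_id"]
    except (ValueError, KeyError, TypeError):
        return 400  # invalid payload: the sender should not retry
    if not isinstance(event_id, str):
        return 400
    if event_id in processed:
        return processed[event_id]  # duplicate delivery: replay the saved result
    # ... process the event here; an unexpected failure would map to 5xx ...
    processed[event_id] = 204
    return 204
```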
10) Protocol mapping (gRPC/GraphQL)
gRPC → HTTP:
- `INVALID_ARGUMENT` → 400
- `UNAUTHENTICATED` → 401
- `PERMISSION_DENIED` → 403
- `NOT_FOUND` → 404
- `ALREADY_EXISTS` → 409
- `FAILED_PRECONDITION` → 412/422
- `RESOURCE_EXHAUSTED` → 429
- `ABORTED` → 409
- `UNAVAILABLE` → 503
- `DEADLINE_EXCEEDED` → 504
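Keeping the mapping above in one table, shared by every gateway, avoids drift between services. The sketch uses plain status names to stay free of a gRPC dependency; choosing 422 for `FAILED_PRECONDITION` (the list allows 412 or 422) and falling back to 500 for unlisted codes are assumptions.

```python
GRPC_TO_HTTP = {
    "INVALID_ARGUMENT": 400,
    "UNAUTHENTICATED": 401,
    "PERMISSION_DENIED": 403,
    "NOT_FOUND": 404,
    "ALREADY_EXISTS": 409,
    "FAILED_PRECONDITION": 422,  # assumption: 412 is the documented alternative
    "RESOURCE_EXHAUSTED": 429,
    "ABORTED": 409,
    "UNAVAILABLE": 503,
    "DEADLINE_EXCEEDED": 504,
}


def http_status_for(grpc_code: str) -> int:
    """Translate a gRPC status name to HTTP; unlisted codes become 500 (assumption)."""
    return GRPC_TO_HTTP.get(grpc_code, 500)
```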
GraphQL:
```json
{
  "data": { "createPayment": null },
  "errors": [{
    "message": "Forbidden",
    "extensions": { "code": "FORBIDDEN", "status": 403, "trace_id": "..." },
    "path": ["createPayment"]
  }]
}
```
It is recommended to use the corresponding HTTP code instead of 200 for critical errors.
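To keep GraphQL consistent with the REST error contract, the GraphQL error can be derived from the same problem body. This is a sketch with hypothetical function names; it assumes the problem dict carries `title`, `code`, `status`, and `trace_id`.

```python
def to_graphql_error(problem: dict, path: list[str]) -> dict:
    """Project a Problem Details body onto a GraphQL error object."""
    return {
        "message": problem["title"],
        "extensions": {
            "code": problem["code"],
            "status": problem["status"],
            "trace_id": problem["trace_id"],
        },
        "path": path,
    }


def graphql_response(field_name: str, problem: dict) -> dict:
    """Build a response where the failed field is null and the error is attached."""
    return {
        "data": {field_name: None},
        "errors": [to_graphql_error(problem, [field_name])],
    }
```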
11) Headers and client hints
`Retry-After` - seconds or an HTTP date (429/503/425/408).
`Warning` - soft degradation or deprecation ("199 - Feature X is deprecated").
`Deprecation`, `Sunset`, `Link: <...>; rel="deprecation"` - for a controlled shutdown.
`Problem-Type` (custom) - fast error routing on the client.
`X-Trace-Id`/`Correlation-Id` - links logs and traces.
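Because `Retry-After` may carry either a number of seconds or an HTTP date, clients need to handle both forms. A minimal sketch using only the standard library follows; the function name is hypothetical, and returning `None` for unparseable values is a design choice of this sketch.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After header value to a delay in seconds, or None if invalid."""
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)  # delta-seconds form, e.g. "3"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max((when - now).total_seconds(), 0.0)  # dates in the past mean "retry now"
```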
12) Message security
Do not repeat input secrets (tokens/signatures) in the response body.
Mask PAN/PII (for example, show only the last four digits, `1234`).
For 401/403 - do not disclose which attribute failed.
Instead of answering "resource exists but is not yours", return a plain 404.
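Masking is easiest to enforce at the point where `detail` and `meta` are assembled. The sketch below is illustrative only: the PAN pattern (13 to 19 consecutive digits), the list of secret key names, and both function names are assumptions, and real PII detection needs more than one regular expression.

```python
import re

PAN_RE = re.compile(r"\b\d{13,19}\b")  # assumption: PANs appear as bare digit runs
SECRET_KEYS = {"token", "signature", "password", "authorization"}  # hypothetical


def mask_pan(text: str) -> str:
    """Replace all but the last four digits of anything that looks like a PAN."""
    return PAN_RE.sub(
        lambda m: "*" * (len(m.group()) - 4) + m.group()[-4:], text
    )


def sanitize_meta(meta: dict) -> dict:
    """Drop secret-bearing keys and mask PANs in the remaining string values."""
    clean = {}
    for key, value in meta.items():
        if key.lower() in SECRET_KEYS:
            continue  # never echo input secrets back to the client
        clean[key] = mask_pan(value) if isinstance(value, str) else value
    return clean
```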
13) Observability of errors
Metrics:
- `http_errors_total{status, route, tenant}`
- `error_classes_total{code}` (by `code` from the body)
- share of 429 and 5xx; `p95`/`p99` latency for error responses, tracked separately
- `retry_after_seconds_bucket` - histogram of retry hints
Logs/traces:
- associate the response with `trace_id`; store `code`, `type`, `status`, `route`, `tenant`, no PII.
Alerts:
- spike of `5xx_rate > X%` at RPS > N;
- growth of 429 on critical routes;
- `timeout/504` from dependencies;
- frequent 409/idempotency conflicts → a sign of races.
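The 5xx-rate alert condition above can be expressed in a few lines. This sketch uses in-process counters purely for illustration; the class name, the thresholds, and the use of a request-count floor in place of RPS are assumptions, and a real deployment would rely on its metrics system.

```python
from collections import Counter


class ErrorStats:
    """Counts responses by status and by error `code` from the body."""

    def __init__(self) -> None:
        self.total = 0
        self.by_status: Counter = Counter()
        self.by_code: Counter = Counter()

    def record(self, status: int, code: str = "") -> None:
        self.total += 1
        if status >= 400:
            self.by_status[status] += 1
            if code:
                self.by_code[code] += 1

    def rate_5xx(self) -> float:
        if self.total == 0:
            return 0.0
        errors = sum(n for status, n in self.by_status.items() if status >= 500)
        return errors / self.total

    def should_alert(self, max_rate: float, min_requests: int) -> bool:
        """Fire only when traffic is high enough for the rate to be meaningful."""
        return self.total >= min_requests and self.rate_5xx() > max_rate
```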
14) Examples
14.1 422 (business validation)
```json
{
  "type": "https://errors.example.com/payments/limit-exceeded",
  "title": "Limit exceeded",
  "status": 422,
  "code": "PAYMENT_LIMIT_EXCEEDED",
  "detail": "Daily withdrawal limit reached for KYC1.",
  "hint": "Increase limits after KYC2 or try tomorrow.",
  "trace_id": "01J5...XYZ"
}
```
14.2 409 (idempotency)
```
HTTP/1.1 409 Conflict
Idempotency-Replay: true
```
```json
{
  "type": "https://errors.example.com/idempotency/replay",
  "title": "Duplicate request",
  "status": 409,
  "code": "IDEMPOTENT_REPLAY",
  "detail": "A request with the same Idempotency-Key was already processed.",
  "hint": "Reuse the same Idempotency-Key and GET the operation status."
}
```
14.3 429 (limits)
```json
{
  "type": "https://errors.example.com/rate/too-many-requests",
  "title": "Too many requests",
  "status": 429,
  "code": "RATE_LIMITED",
  "detail": "Per-key rate limit exceeded.",
  "hint": "Retry after the time specified in Retry-After header."
}
```
15) Antipatterns
Return 200 with error text in the body.
Mix different error formats between services.
Expose stack traces/SQL/table names/internal URLs in `detail`.
Use `message` instead of a stable `code`/`type`.
Return 500 for an expected business error (for example, "insufficient balance").
Inconsistent semantics between REST/GraphQL/gRPC.
16) Specifics of iGaming/Finance
Clear codes for KYC/AML/sanctions: `KYC_REQUIRED`, `KYC_REVIEW`, `AML_LOCK`, `SANCTION_BLOCKED`.
Jurisdictional restrictions: 451 with safe wording, without enumerating the restrictions.
Monetary write operations: 409/423 for contention and locks, `hint` with a retry window.
Player limit invariants: use 422 for responsible gaming violations.
Audit: immutable decision logs (code, time, actor, `trace_id`).
17) Prod Readiness Checklist
- Single JSON error schema, stable `type`/`code`.
- HTTP ↔ gRPC/GraphQL mapping is consistent and documented.
- Retry semantics + `Retry-After`; idempotency for writes.
- PII/secret masking; 404 to hide resources.
- Error metrics and alerts; correlation with `trace_id`.
- Deprecation policy: `Deprecation`, `Sunset`, `Link`.
- Tests: negative/fuzz, version conflict, dependency failure, double-submit.
- Client guide: backoff examples and handling of 409/422/429/5xx.
18) TL;DR
Standardize a single JSON error format with `type`/`code`/`trace_id`, use the correct HTTP codes, and distinguish between validation (400/422), access rights (401/403/404), conflicts/idempotency (409), and limits (429). Give clear `Retry-After` and `hint` values, mask sensitive data, log errors with `trace_id`, and build alerts on 5xx/429/p99.