Backward compatibility
What is backward compatibility
Backward compatibility is the property of a system to keep accepting and correctly serving old clients/consumers after the system is updated. Put simply: you release a new version of a service or its events, and existing integrations continue to work unchanged.
The key rule: don't break agreements. All evolution happens by adding, never by reshaping what has already been released.
Basic principles
1. Additive-first
New fields/methods/events are added as optional. Nothing existing is removed or changes its meaning.
2. Minimum Guaranteed Contract (MGC)
Define a core: the set of fields/operations without which the scenario loses its meaning. The core is stable; everything else is extensions.
3. Tolerant reader
Clients ignore unknown fields and handle new enum values gracefully (with a fallback).
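A minimal sketch of a tolerant reader in Python (the field names and status values are illustrative): unknown fields are simply not read, and an unknown enum value falls back to a safe default instead of crashing the client.

```python
import json

KNOWN_STATUSES = {"authorized", "captured", "refunded"}

def parse_payment(raw: str) -> dict:
    """Tolerant reader: keep only the fields we know, fall back on unknown enums."""
    data = json.loads(raw)
    status = data.get("status")
    if status not in KNOWN_STATUSES:
        status = "unknown"  # fallback for enum values added in newer versions
    # Unknown fields (e.g. risk_score added later) are simply ignored.
    return {"id": data["id"], "status": status}

# A newer producer added risk_score and a new status; the old reader still works.
payment = parse_payment('{"id": "p1", "status": "disputed", "risk_score": 0.12}')
```

The same pattern applies to Avro/Protobuf consumers: decode what you know, never fail on extras.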
4. Versioning policy
Breaking changes go only through a new major line (`/v2`, `payments.v2`, `event.v2`). Minor changes are additive.
5. Observability is part of the contract
Client version, format, and capability flags are visible in logs/traces and metrics. This lets you manage migrations.
Safe vs dangerous changes
Generally safe (BC-OK)
Adding optional fields (JSON/Avro/Protobuf `optional`/`nullable`).
Adding new endpoints/methods/events.
Extending enums with additional values (given tolerant readers).
Relaxing validation (raising limits, accepting alternative formats).
Adding non-semantic headers/metadata.
Dangerous (Breaking)
Deleting/renaming fields, changing the type or requiredness of existing fields.
Changing the semantics of status/error codes.
Reusing protobuf tags for other fields.
Changing the event partitioning key (breaks ordering per aggregate).
Tightening SLAs/timeouts so that old clients start failing.
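A tiny illustration of why renames are breaking while additions are not (payload and field names are illustrative): an old client written against the v1 contract fails on a renamed field, but an added field is harmless.

```python
def old_client_render(payment: dict) -> str:
    # Old client written against the v1 contract: assumes "status" exists.
    return f'{payment["id"]}: {payment["status"]}'

# Additive change: extra field is ignored, old client keeps working.
ok = old_client_render({"id": "p1", "status": "authorized", "risk_score": 0.12})

# Breaking change: "status" renamed to "state" -> old client crashes.
try:
    old_client_render({"id": "p1", "state": "authorized"})
    broke = False
except KeyError:
    broke = True
```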
By Interaction Styles
REST/HTTP + JSON
Additivity: new fields are `optional`; the server does not require them from old clients.
Versioning: major via the path (`/v2`) or media type; minor via extensions and `?include=`/`?fields=`.
Errors: a uniform format; don't change codes/semantics without a major version.
ETag/If-Match: for safe updates without race conditions.
Idempotency: `Idempotency-Key` for POST so old clients don't double the effect on retries.
gRPC / Protobuf
Tags are immutable. Deleted tags must not be reused (mark them `reserved`).
New fields are `optional`/`repeated`; old code handles default values correctly.
Streaming: do not change the order or requiredness of messages within a minor version.
Errors: a stable set of statuses; new semantics → a new method/service (`.v2`).
Event-driven (Kafka/NATS/Pulsar) + Avro/JSON/Proto
Naming: `domain.action.v{major}`.
Core vs enriched: the core is stable; enrichment lives in separate types/topics (`.enriched`).
Schema compatibility mode: usually BACKWARD; CI blocks incompatible changes.
Partitioning: the key (e.g. `payment_id`) is part of the contract; changing it is breaking.
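Why the partitioning key is part of the contract can be shown with a simplified hash partitioner (purely illustrative; Kafka's default partitioner actually uses murmur2): all events with the same key land on the same partition, which is what guarantees per-aggregate ordering. Switch the key and an aggregate's events may scatter across partitions.

```python
import hashlib

def partition_for(key: str, num_partitions: int = 8) -> int:
    """Simplified deterministic partitioner (illustrative; Kafka uses murmur2)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

event = {"payment_id": "p1", "order_id": "o42"}
p_old = partition_for(event["payment_id"])  # contract: key = payment_id
p_new = partition_for(event["order_id"])    # switching the key may move the aggregate
```

Because the mapping is deterministic per key, consumers can rely on ordering within `payment_id` only as long as the key never changes.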
GraphQL
Adding fields/types is OK; delete/rename only via `@deprecated` and a migration window.
Do not change nullable → non-nullable without a major version.
Watch complexity/depth limits: changing a limit is a contract change.
Patterns to help preserve BC
Reverse pyramid model: stabilize the core, extend optionally.
Capability negotiation: the client reports supported capabilities (`X-Capabilities`/handshake), the server adapts.
Dual-run/dual-emit: keep `v1` and `v2` running simultaneously during migration.
Adapters: proxies/gateways translate `v1↔v2` requests for "heavy" clients.
Expand-and-contract (for DBs): first add the new, start writing/reading it, and only then delete the old.
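Capability negotiation can be sketched like this (the `X-Capabilities` header and the `risk-scoring` capability name are assumptions, not a standard): the server serves the stable core to everyone and adds extensions only for clients that declared support.

```python
def handle_get_payment(headers: dict) -> dict:
    caps = {c.strip() for c in headers.get("X-Capabilities", "").split(",") if c.strip()}
    response = {"id": "p1", "status": "authorized"}  # stable core (MGC) for everyone
    if "risk-scoring" in caps:
        response["risk_score"] = 0.12  # extension only for clients that opted in
    return response

old_client = handle_get_payment({})  # no capabilities header: core only
new_client = handle_get_payment({"X-Capabilities": "risk-scoring"})
```

Old clients never see data they cannot parse, so the extension cannot break them.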
Governance and Process
1. Contract catalog (Schema Registry): a single source of truth with compatibility policies.
2. Linters and diff checks in CI/CD: openapi-diff, buf breaking, Avro/JSON Schema compatibility checks.
3. CDC (Consumer-Driven Contracts): the provider is tested against real consumer contracts.
4. Golden samples: reference requests/responses/events for regression testing.
5. Change management: RFC/ADR for breaking changes, sunset plans, communication.
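A minimal sketch of what such a CI diff check does (a toy stand-in for tools like openapi-diff or `buf breaking`, with schemas reduced to field→type maps): removals and type changes fail the build, additions pass.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """Compare two field->type maps; additions are fine, removals/retypes are not."""
    problems = []
    for field, ftype in old.items():
        if field not in new:
            problems.append(f"removed field: {field}")
        elif new[field] != ftype:
            problems.append(f"type change: {field} {ftype} -> {new[field]}")
    return problems

v1 = {"id": "string", "status": "string"}
v2_ok = {"id": "string", "status": "string", "risk_score": "double"}  # additive: passes
v2_bad = {"id": "string"}  # "status" removed: CI should fail the build
```

In CI this becomes a gate: a non-empty result blocks the merge.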
Deprecation and removal of old versions
Mark obsolete items (`@deprecated`, descriptions, the `Deprecation`/`Sunset` headers).
Migration window: a pre-announced date, a test environment, code examples.
Usage telemetry: who is still on `v1`? Segment metrics/logs by version.
Dual-run down to zero traffic, then delete.
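Announcing deprecation over HTTP can look like this (the `/v2/payments` successor path is an assumption; `Sunset` is an HTTP-date per RFC 8594, while the exact `Deprecation` value format has varied between drafts and the final spec, so treat the values as illustrative):

```python
def with_deprecation_headers(response_headers: dict, sunset_date: str) -> dict:
    """Decorate a v1 response with deprecation signals for well-behaved clients."""
    headers = dict(response_headers)
    headers["Deprecation"] = "true"  # value format is illustrative
    headers["Sunset"] = sunset_date  # HTTP-date, per RFC 8594
    headers["Link"] = '</v2/payments>; rel="successor-version"'  # assumed path
    return headers

h = with_deprecation_headers(
    {"Content-Type": "application/json"},
    "Sat, 01 Nov 2025 00:00:00 GMT",
)
```

Clients and monitoring can then alert on any traffic still hitting endpoints that carry a `Sunset` date.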
Observability and operational metrics
Share of requests/messages by version.
Share of errors/timeouts for old clients after a release.
Share of incompatible payloads (schema validation at the gateway/stream filters).
Consumer migration lag (how many still listen to `v1`).
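The first and last metrics reduce to the same computation, sketched here over an in-memory list of per-request version labels (in practice these come from log/metric dimensions):

```python
from collections import Counter

# Version label extracted from each request's logs/metrics (illustrative data).
requests = ["v1", "v2", "v2", "v2", "v1"]

counts = Counter(requests)
total = sum(counts.values())
share = {version: counts[version] / total for version in counts}
migration_lag = share.get("v1", 0.0)  # fraction of traffic still on v1
```

When `migration_lag` reaches zero and stays there, the dual-run can end and `v1` can be deleted.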
Backward compatibility testing
Schema-diff: fail on remove/rename/type-change.
Contract tests: old SDKs/clients run against the new implementation.
E2E canary: route part of the old traffic to the new version; compare p95/p99, status codes, retries.
Event replay: projections rebuilt by the new logic from the old log show no discrepancies.
Fault injection: delays/partial responses; old clients must not fail.
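An event-replay regression test can be sketched like this (event shapes are illustrative): the old log is fed through both the old and the new projection logic, and the results must agree on the core fields even though the new logic adds enrichment.

```python
old_log = [
    {"type": "payment.authorized.v1", "payment_id": "p1", "amount": 100},
    {"type": "payment.captured.v1", "payment_id": "p1"},
]

def project_v1(events):
    state = {}
    for e in events:
        state[e["payment_id"]] = e["type"].split(".")[1]  # last status wins
    return state

def project_v2(events):
    """New logic: also counts events per aggregate; the core status field is unchanged."""
    state = {}
    for e in events:
        pid = e["payment_id"]
        prev = state.get(pid, {"events": 0})
        state[pid] = {"status": e["type"].split(".")[1], "events": prev["events"] + 1}
    return state

v1_state = project_v1(old_log)
v2_state = project_v2(old_log)
# Core invariant: the final status must not diverge between implementations.
assert all(v2_state[k]["status"] == v1_state[k] for k in v1_state)
```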
Examples
REST (additive)
Before:
```json
{ "id": "p1", "status": "authorized" }
```
After:
```json
{ "id": "p1", "status": "authorized", "risk_score": 0.12 }
```
Old clients ignore `risk_score` and continue to work.
Protobuf (tags)
```proto
message Payment {
  string id = 1;
  string status = 2;
  optional double risk_score = 3; // new field, safe
}
// Tags 1 and 2 must not be changed or deleted without a v2
```
Events (core + enrichment)
`payment.authorized.v1`: the core (minimal facts).
`payment.enriched.v1`: details; core consumers do not depend on enrichment.
Antipatterns
Swagger-wash: the schema was updated but the service still behaves the old way (or vice versa).
Hidden breaks: the meaning of a field/status changed without a version bump.
Reusing protobuf tags: "silent" data corruption.
Rigid clients: crash on unfamiliar fields/enum values; no tolerant reader.
Mega-endpoint: one all-in-one endpoint where any change becomes a potential break.
Pre-release checklist
- Changes are additive; the core (MGC) is untouched.
- Linters/diff checks passed; no breaking flags.
- Client SDKs updated (or not required for an additive extension).
- Tolerant reader enabled for clients; enum fallback verified.
- Metrics/logs contain version and capability flags.
- For a potential break there is a `/v2`, a dual-run, and a sunset plan.
- Documentation/examples updated; golden sets exist.
FAQ
Backward vs forward - what's the difference?
Backward: new servers work with old clients. Forward: old servers work correctly with payloads from new clients (thanks to tolerant readers and careful defaults). Both together give full compatibility.
Do I always need a `/v2` for big changes?
Yes, if invariants/types/keys/semantics break. Otherwise stay on the same line and evolve additively.
What about enums?
Add new values without changing the meaning of old ones. Clients must have a fallback for unknown values.
What if you have already broken something?
Roll back, hot-fix with an adapter, release `v2` with a dual-run, communicate and publish a migration guide.
Summary
Backward compatibility is a discipline of evolution: stabilize the core, extend additively, implement tolerant readers, automate checks, and run deliberate deprecation. That way you can evolve the platform quickly without burying clients under the rubble of "invisible" changes.