Schema registry and data evolution

Why do I need a schema registry?

The Schema Registry is a centralized source of truth for data contracts (APIs, events, streams, messages, storage) that provides:
  • Predictable evolution: compatibility rules and automatic breakage checking.
  • Repeatability and transparency: the history of versions, who/when/why changed.
  • Standardization: uniform names, error formats, trace fields, PII labels.
  • Integration with CI/CD: blocking breaking changes before production.

The registry ties the protocol-first approach to contract compatibility, making changes fast and safe.


Formats and applications

JSON Schema: REST/HTTP payloads, documents, configurations.
Avro: event buses (Kafka/Pulsar); compact binary encoding, evolution via reader/writer schema resolution and defaults.
Protobuf: gRPC/RPC; efficient binary encoding, strict field tags.
GraphQL SDL: type and directive schema, evolution via `@deprecated`.
SQL DDL as an artifact: pin down agreed-on views (for example, external data marts), with caution.

💡 A single registry can store multiple types of artifacts at once, but with separate compatibility policies.

Compatibility modes

BACKWARD: a new schema can read old data/messages. Suits a producer that extends the payload additively.
FORWARD: old consumers read new data correctly (requires a tolerant reader).
FULL: combines both (stricter; convenient for public contracts).
NONE: no checks; for sandboxes only.

Recommendations:
  • Events: usually BACKWARD (the producer extends the payload with optional fields).
  • Public APIs: FULL, or BACKWARD plus a strict tolerant reader on clients (sketched below).
  • Internal prototypes: temporarily NONE, but never on trunk.
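
A minimal Python sketch of the tolerant-reader side of these recommendations: an old consumer, written against the v1 payload, reads a newer payload that carries fields it has never seen. The `OrderV1` type and the sample payload are illustrative, not part of any real contract.

```python
import json
from dataclasses import dataclass

@dataclass
class OrderV1:
    """The old consumer's view of the contract: only the fields it knows about."""
    id: str
    status: str

def read_order(raw: str) -> OrderV1:
    doc = json.loads(raw)
    # Tolerant reader: pick the known fields, ignore everything else.
    return OrderV1(id=doc["id"], status=doc["status"])

# A newer producer has added an optional field the old consumer does not know.
new_payload = '{"id": "42", "status": "paid", "risk_score": 0.2}'
print(read_order(new_payload))  # OrderV1(id='42', status='paid') - the unknown field is ignored
```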

Safe (additive) vs. dangerous changes

Additive (OK):
  • Add an optional field/type.
  • Extend an enum with new values (with a tolerant reader; see the sketch after this list).
  • Add an alternative projection/event (`.enriched`).
  • Relax constraints (e.g., raise `maximum` or lower `minLength`, but do not tighten them).
Dangerous (breaking):
  • Delete/rename fields or change their type or required status.
  • Change the semantics of statuses, codecs, or ordering in streams.
  • Reuse protobuf tags.
  • Change the partitioning key of events.
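
To make the enum point from the additive list concrete, here is a hedged Python sketch of a tolerant reader that maps unknown enum values to a catch-all member instead of failing; the `Status` enum and its `UNKNOWN` member are illustrative, not taken from a real contract.

```python
from enum import Enum

class Status(Enum):
    CREATED = "created"
    PAID = "paid"
    SHIPPED = "shipped"
    UNKNOWN = "unknown"  # catch-all for values this client does not know yet

def parse_status(raw: str) -> Status:
    try:
        return Status(raw)
    except ValueError:
        # The producer added a new enum value; degrade gracefully instead of breaking.
        return Status.UNKNOWN

print(parse_status("paid"))      # Status.PAID
print(parse_status("refunded"))  # Status.UNKNOWN - a value added by a newer producer
```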

Registry organization

Naming and addressing

Groups/spaces: `payments`, `kyc`, `audit`.
Names: `payment.authorized.v1` (events), `payments.v1.CaptureRequest` (gRPC), `orders.v1.Order` (JSON Schema); a simple naming linter is sketched below.
Major version in the name, minor versions in metadata/the schema version.
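
A hedged sketch of the naming linter mentioned above: a plain regular expression that enforces the `domain.action.v{major}` convention for event names. The exact character rules are an assumption; adjust them to your own naming policy.

```python
import re

# Convention from this section: <domain>.<action>.v<major>, e.g. "payment.authorized.v1".
EVENT_NAME = re.compile(r"^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*\.v[1-9][0-9]*$")

def check_event_name(name: str) -> bool:
    """Return True if the name follows the domain.action.v{major} convention."""
    return bool(EVENT_NAME.fullmatch(name))

assert check_event_name("payment.authorized.v1")
assert not check_event_name("PaymentAuthorized")   # no domain/action/major structure
assert not check_event_name("payment.authorized")  # major version missing
```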

Metadata

`owner` (team), `domain`, `slas` (SLO/SLA), `security.tier` (PII/PCI), `retention`, `compatibility_mode`, `sunset`, `changelog`.
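
A hedged sketch of a CI guard over this metadata block: it only checks that the required keys are present. The key set mirrors the list above, and the example values are made up rather than a prescribed format.

```python
REQUIRED_METADATA = {"owner", "domain", "security.tier", "compatibility_mode"}

def missing_metadata(meta: dict) -> set:
    """Return the required metadata keys that are absent from a schema's metadata block."""
    return REQUIRED_METADATA - meta.keys()

example = {
    "owner": "payments-team",          # illustrative values
    "domain": "payments",
    "security.tier": "pci",
    "compatibility_mode": "BACKWARD",
    "sunset": None,
}
assert not missing_metadata(example)   # an empty set means the block passes the gate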

Lifecycle Management

Draft → Review → Approved → Released → Deprecated → Sunset.
Automatic validators/linters, manual design-review (API Guild), release notes.


Integration into CI/CD

1. Pre-commit: local linters (Spectral/Buf/Avro tools).
2. PR pipeline: schema diff → compatibility-mode check; block breaking changes (see the registry-check sketch below).
3. Artifact publish: push the approved schema to the registry + generate SDKs/models.
4. Runtime guard (optional): the gateway/producer validates payloads against the current schema.

Example of steps in PR:
  • `openapi-diff --fail-on-breaking`
  • `buf breaking --against`
  • `avro-compat --mode BACKWARD`
  • generating golden samples and running CDC tests.
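
As an illustration of the registry check in the PR pipeline, here is a hedged Python sketch that asks a Confluent-compatible Schema Registry (Confluent, Karapace, and Redpanda expose the same REST endpoint) whether a candidate Avro schema is compatible with the latest registered version. The registry URL and subject name are placeholders, and the `requests` package is assumed to be available in the CI image.

```python
import json

import requests  # assumed to be installed in the CI image

REGISTRY_URL = "http://schema-registry:8081"  # placeholder
SUBJECT = "payment.authorized.v1-value"       # placeholder subject name

def is_compatible(candidate_schema: dict) -> bool:
    """Ask the registry whether the candidate schema is compatible with the latest version."""
    resp = requests.post(
        f"{REGISTRY_URL}/compatibility/subjects/{SUBJECT}/versions/latest",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        data=json.dumps({"schema": json.dumps(candidate_schema)}),
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["is_compatible"]

# In the PR pipeline: fail the build on an incompatible change, e.g.
# if not is_compatible(new_schema): sys.exit(1)
```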

Schema evolution: practices

Additive-first: new fields are `optional/nullable` (JSON), `optional` (proto3), and carry a `default` in Avro.
Inverted pyramid model: a stable core, with enrichment alongside it as optional data.
Dual-emit/dual-write for major versions: publish `v1` and `v2` in parallel (see the sketch after this list).
Sunset plan: dates, usage tracking, warnings, adapters.
Tolerant reader: clients ignore unknown fields and handle new enum values gracefully.
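
A hedged sketch of the dual-emit practice from the list above: during a major migration the producer publishes both `v1` and `v2` of the same business event until the `v1` consumers have moved over. `publish` is a placeholder for whatever Kafka/Pulsar producer call you actually use.

```python
def publish(topic: str, payload: dict) -> None:
    """Placeholder for the real Kafka/Pulsar producer call."""
    print(f"-> {topic}: {payload}")

def emit_payment_authorized(payment_id: str, amount: int, currency: str, risk_score: float) -> None:
    # v1: the old shape, kept until its sunset date.
    publish("payment.authorized.v1", {
        "payment_id": payment_id,
        "amount": amount,
        "currency": currency,
    })
    # v2: the new shape, published in parallel (dual-emit).
    publish("payment.authorized.v2", {
        "payment_id": payment_id,
        "amount": amount,
        "currency": currency,
        "risk_score": risk_score,
    })

emit_payment_authorized("p-123", 1000, "EUR", 0.2)
```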


Example schemas and checks

JSON Schema (fragment, additive field)

```json
{
  "$id": "orders.v1.Order",
  "type": "object",
  "required": ["id", "status"],
  "properties": {
    "id": { "type": "string", "format": "uuid" },
    "status": { "type": "string", "enum": ["created", "paid", "shipped"] },
    "risk_score": { "type": "number", "minimum": 0, "maximum": 1 }
  },
  "additionalProperties": true
}
```
💡 `risk_score` is added as an optional field → the change is BACKWARD-compatible.
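
A hedged check of that note using the `jsonschema` package (assumed to be installed): a payload produced before `risk_score` existed still validates against the new schema, which is exactly what BACKWARD compatibility promises here.

```python
from jsonschema import validate  # pip install jsonschema (assumed available)

new_schema = {
    "type": "object",
    "required": ["id", "status"],
    "properties": {
        "id": {"type": "string", "format": "uuid"},
        "status": {"type": "string", "enum": ["created", "paid", "shipped"]},
        "risk_score": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "additionalProperties": True,
}

# An old payload, written before risk_score was added - still valid under the new schema.
old_payload = {"id": "7f9c2c0a-5b2e-4f47-9b1a-3a4d1f2e6c55", "status": "paid"}
validate(instance=old_payload, schema=new_schema)  # raises ValidationError only on a mismatch
print("old payload accepted by the new schema")
```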

Avro (default for compatibility)

```json
{
  "type": "record",
  "name": "PaymentAuthorized",
  "namespace": "payment.v1",
  "fields": [
    { "name": "payment_id", "type": "string" },
    { "name": "amount", "type": "long" },
    { "name": "currency", "type": "string" },
    { "name": "risk_score", "type": ["null", "double"], "default": null }
  ]
}
```
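
A hedged sketch with `fastavro` (assumed to be installed) showing why the `default` matters: a record written with the old schema (no `risk_score`) is read with the new schema, and the missing field is filled from the default during reader/writer schema resolution.

```python
import io
from fastavro import parse_schema, schemaless_writer, schemaless_reader  # pip install fastavro

old_schema = parse_schema({
    "type": "record", "name": "PaymentAuthorized", "namespace": "payment.v1",
    "fields": [
        {"name": "payment_id", "type": "string"},
        {"name": "amount", "type": "long"},
        {"name": "currency", "type": "string"},
    ],
})
new_schema = parse_schema({
    "type": "record", "name": "PaymentAuthorized", "namespace": "payment.v1",
    "fields": [
        {"name": "payment_id", "type": "string"},
        {"name": "amount", "type": "long"},
        {"name": "currency", "type": "string"},
        {"name": "risk_score", "type": ["null", "double"], "default": None},
    ],
})

# Write a record with the old (writer) schema...
buf = io.BytesIO()
schemaless_writer(buf, old_schema, {"payment_id": "p-1", "amount": 1000, "currency": "EUR"})
buf.seek(0)

# ...and read it back with the new (reader) schema: the missing field gets its default.
record = schemaless_reader(buf, old_schema, new_schema)
print(record["risk_score"])  # None
```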

Protobuf (do not reuse tags)

```proto
syntax = "proto3";

package payments.v1;

message CaptureRequest {
  string payment_id = 1;
  int64 amount = 2;
  string currency = 3;
  optional double risk_score = 4; // additive
}

// tag 4 is reserved for risk_score; it must not be changed or removed without a v2
```

Event registry and partitioning

Event naming: `domain.action.v{major}` (e.g., `payment.captured.v1`).
The partitioning key is part of the contract (`payment_id`, `user_id`); see the sketch after this list.
Core vs Enriched: `.v1` (core) and `.enriched.v1` (details).
Registry compatibility: modes at the topic/type level; CI rejects incompatible changes.
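
A hedged illustration of why the partitioning key is part of the contract: the partition is a pure function of the key, so changing the key silently moves an entity's events to a different partition and breaks per-key ordering for existing consumers. Real Kafka clients use murmur2 hashing; the simple CRC32 below is only a stand-in.

```python
import zlib

NUM_PARTITIONS = 12  # illustrative topic configuration

def partition_for(key: str) -> int:
    # Stand-in for the client library's partitioner (Kafka uses murmur2 in practice).
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

payment_id, user_id = "p-123", "u-42"
print(partition_for(payment_id))  # events keyed by payment_id land here, in order
print(partition_for(user_id))     # re-keying by user_id moves them elsewhere,
                                  # breaking per-payment ordering for existing consumers
```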


Migration Management

Expand → Migrate → Contract (REST/gRPC):

1) Add new fields/tables; 2) start writing/reading the new fields; 3) delete the old ones after the sunset date.

  • Dual-emit (events): publish `v1`/`v2` in parallel, migrate consumers/projections, then remove `v1`.
  • Replay: rebuild projections from the log onto the new schema (only with compatibility checks and migrators).
  • Adapters: gateways/proxies that translate `v1↔v2` for clients that are hard to migrate (see the sketch after this list).
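
A hedged sketch of such an adapter: it translates a v2 payload down to the v1 shape for legacy clients and upgrades v1 requests to v2 with safe defaults. The field names follow the `orders.v1.Order` example above; the mapping itself is illustrative.

```python
def v2_to_v1(order_v2: dict) -> dict:
    """Downgrade a v2 payload to the v1 shape expected by legacy clients."""
    # Fields introduced in v2 (e.g. risk_score) are simply dropped for v1 clients.
    return {"id": order_v2["id"], "status": order_v2["status"]}

def v1_to_v2(order_v1: dict) -> dict:
    """Upgrade a v1 request to v2, filling the new fields with safe defaults."""
    return {**order_v1, "risk_score": None}

print(v2_to_v1({"id": "o-1", "status": "paid", "risk_score": 0.2}))  # {'id': 'o-1', 'status': 'paid'}
print(v1_to_v2({"id": "o-2", "status": "created"}))                  # {'id': 'o-2', 'status': 'created', 'risk_score': None}
```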

Security and compliance

PII/PCI labels in the schema: `x-pii: true`, `x-sensitivity: high`.
Access policies: who may publish/modify schemas (RBAC), signed releases.
Cryptography: signing of schema versions, immutable audit logs (WORM).
Right to be forgotten: mark fields that require encryption/crypto-erasure and keep the guidance in the registry; a check over these labels is sketched below.
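
A hedged sketch of the label check mentioned above: it walks a JSON Schema and reports properties marked `x-pii: true` that lack an encryption flag. The `x-encrypted` annotation is an assumption for illustration; substitute whatever marker your registry policy actually uses.

```python
def unprotected_pii_fields(schema: dict, path: str = "") -> list:
    """Return the paths of properties labelled x-pii: true that have no x-encrypted flag."""
    offenders = []
    for name, prop in schema.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        if prop.get("x-pii") and not prop.get("x-encrypted"):  # x-encrypted is a hypothetical marker
            offenders.append(field_path)
        offenders.extend(unprotected_pii_fields(prop, field_path))  # recurse into nested objects
    return offenders

schema = {
    "type": "object",
    "properties": {
        "email": {"type": "string", "x-pii": True},
        "card_token": {"type": "string", "x-pii": True, "x-encrypted": True},
    },
}
print(unprotected_pii_fields(schema))  # ['email']
```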


Observability and audit

Dashboards: number of changes, types (minor/major), share of rejected PRs, version usage.
Audit trail: who changed the schema, links to PRs/ADRs, the related release.
Runtime metrics: share of messages that fail validation; compatibility incidents.


Tools (sample stack)

OpenAPI/JSON Schema: Spectral, OpenAPI Diff, Schemathesis.
Protobuf/gRPC: Buf, buf-breaking, protoc linters.
Avro/Events: Confluent/Redpanda Schema Registry, Avro-tools, Karapace.
GraphQL: GraphQL Inspector, GraphQL Codegen.
Registries/catalogs: Artifact Registry, Git-based registry, Backstage Catalog, custom UI.
Documentation: Redocly/Stoplight, Swagger-UI, GraphiQL.


Anti-patterns

Swagger-wash: the schema does not reflect the reality of the service (or vice versa).
Disabled compatibility checks: an "urgent" change goes through → production breaks.
Reusing protobuf tags: silent data corruption.
Single compatibility mode "for everything": different domains require different modes.
Raw CDC streams as public schemas: the DB model leaks out and evolution becomes impossible.


Implementation checklist

  • Artifact formats and compatibility modes are defined per domain.
  • Linters and schema diff are configured in CI; PRs are blocked on breaking changes.
  • Tolerant reader is enabled on clients; `additionalProperties: true` (where applicable).
  • Major changes go through RFC/ADR, with a sunset plan and dual-emit/dual-write.
  • Schemas are labeled with PII/PCI and access levels; auditing is enabled.
  • Dashboards exist for version usage and compatibility failures.
  • Generating SDK/models from the registry is part of the pipeline.
  • Documentation and golden samples updated automatically.

FAQ

Can I skip the registry and just store schemas in Git?
Yes, but a registry adds compatibility APIs, search, metadata, centralized policies, and on-the-fly validation. The best option is Git as storage plus a UI and policies on top.

How do I choose a compatibility mode?
Look at the direction of change: if the producer extends the payload, use BACKWARD. For public APIs/SDKs, use FULL. For quick prototypes, temporarily NONE (never on trunk).

What if a breaking change is unavoidable?
Prepare a v2: dual-emit/dual-write, sunset dates, adapters, usage telemetry, migration guides.

Do I need to validate payloads at runtime?
For critical domains, yes: it prevents junk messages and speeds up diagnostics.


Result

The schema registry turns data evolution from risky improvisation into a manageable process: uniform compatibility rules, automated checks, understandable versions, and a transparent history. Add the discipline of additive-first, tolerant reader, dual-emit, and sunset plans, and your contracts will evolve quickly, without breakage or late-night incidents.
