GH GambleHub

Schema registry and data evolution

Why do I need a schema registry?

The schema registry is a centralized source of truth for data contracts (APIs, events, streams, messages, data stores) that provides:
  • Predictable evolution: compatibility rules and automatic breaking-change checks.
  • Repeatability and transparency: the history of versions, who/when/why changed.
  • Standardization: uniform names, error formats, trace fields, PII labels.
  • Integration with CI/CD: blocking breaking changes before production.

The registry ties Protocol-first development to contract compatibility, making changes fast and safe.

Formats and applications

JSON Schema: REST/HTTP payloads, documents, configurations.
Avro: event buses (Kafka/Pulsar); compact binary encoding, evolution via defaults and aliases.
Protobuf: gRPC/RPC; efficient binary encoding, strict field tags.
GraphQL SDL: type and directive schema; evolution via `@deprecated`.
SQL DDL as an artifact: pins down agreed-upon views (for example, external storefronts), with caution.

💡 A single registry can store multiple types of artifacts at once, but with separate compatibility policies.

Compatibility modes

BACKWARD: New schemas read old data/messages. Suitable for a producer who extends payload additively.
FORWARD: old consumers read new data correctly (requires a tolerant reader).
FULL: combines both (stricter, more convenient for public contracts).
NONE: no checks - for sandboxes only.

Recommendations:
  • Events: usually BACKWARD (the producer extends the payload with optional fields).
  • Public APIs: FULL or BACKWARD + strict tolerant reader on clients.
  • Internal prototypes: temporarily NONE, but not on trunk.
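As a rough illustration, the modes above can be modeled as subset checks over lists of required fields. This is a deliberately minimal sketch; real registries compare full schemas, not just `required` lists, and the function names here are hypothetical:

```python
# Toy illustration of compatibility modes using only required-field lists.

def backward_compatible(old_required, new_required):
    """New schema can read old data: it must not require fields old data lacks."""
    return set(new_required) <= set(old_required)

def forward_compatible(old_required, new_required):
    """Old consumers can read new data: new data still carries every old required field."""
    return set(old_required) <= set(new_required)

def check(mode, old_required, new_required):
    if mode == "BACKWARD":
        return backward_compatible(old_required, new_required)
    if mode == "FORWARD":
        return forward_compatible(old_required, new_required)
    if mode == "FULL":
        return (backward_compatible(old_required, new_required)
                and forward_compatible(old_required, new_required))
    return True  # NONE: no checks

# Adding an optional field leaves the required list unchanged, so all modes pass.
print(check("FULL", ["id", "status"], ["id", "status"]))
# Making the new field required breaks BACKWARD: the new schema rejects old data.
print(check("BACKWARD", ["id", "status"], ["id", "status", "risk_score"]))
```

The same shape generalizes to types, enums, and constraints; the subset logic is what any registry's compatibility checker ultimately computes.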

Safe (additive) vs. dangerous changes

Additive (OK):
  • Add an optional field/type.
  • Enum extension with new values (requires a tolerant reader).
  • Adding an alternate projection/event (`.enriched`).
  • Relaxing constraints (`maximum` may increase, `minLength` may decrease, never the reverse).
Dangerous (break):
  • Deleting/renaming fields or changing their type or requiredness.
  • Changing the semantics of statuses/codes/ordering in streams.
  • Reusing protobuf tags.
  • Changing the partitioning key of events.
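A minimal sketch of how a schema-diff tool flags the dangerous cases above, comparing two field-to-type maps. Field names and types are illustrative; real diff tools also inspect nesting, enums, and constraints:

```python
# Sketch: flag breaking changes between two {field: type} maps.
def breaking_changes(old_fields, new_fields):
    issues = []
    for name, ftype in old_fields.items():
        if name not in new_fields:
            issues.append(f"removed field: {name}")
        elif new_fields[name] != ftype:
            issues.append(f"type change: {name} {ftype} -> {new_fields[name]}")
    return issues

old = {"id": "string", "status": "string", "amount": "long"}
new = {"id": "string", "status": "int", "risk_score": "double"}
# 'amount' was removed and 'status' changed type: both are flagged.
# The added 'risk_score' is additive and is not flagged.
print(breaking_changes(old, new))
```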

Register organization

Naming and addressing

Groups/spaces: `payments`, `kyc`, `audit`.
Names: `payment.authorized.v1` (events), `payments.v1.CaptureRequest` (gRPC), `orders.v1.Order` (JSON Schema).
Major version in the name; minor versions in metadata/schema version.

Metadata

`owner` (team), `domain`, `slas` (SLO/SLA), `security.tier` (PII/PCI), `retention`, `compatibility_mode`, `sunset`, `changelog`.
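A hypothetical registry metadata entry combining these keys might look like the following (all values are illustrative):

```json
{
  "name": "payment.authorized.v1",
  "owner": "payments-team",
  "domain": "payments",
  "slas": { "availability": "99.9%" },
  "security": { "tier": "PCI", "pii": true },
  "compatibility_mode": "BACKWARD",
  "retention": "30d",
  "sunset": null,
  "changelog": "…"
}
```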

Lifecycle Management

Draft → Review → Approved → Released → Deprecated → Sunset.
Automatic validators/linters, manual design-review (API Guild), release notes.

Integration in CI/CD

1. Pre-commit: local linters (Spectral/Buf/Avro tools).
2. PR pipeline: schema diff → compatibility-mode check; breaking changes are blocked.
3. Artifact publish: push the approved schema to the registry + generate SDKs/models.
4. Runtime guard (optional): the gateway/producer validates payloads against the current schema.

Example of steps in PR:
  • `openapi-diff --fail-on-breaking`
  • `buf breaking --against <baseline>`
  • `avro-compat --mode BACKWARD`
  • generating golden samples and running CDC tests.
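One possible shape of such a PR gate, sketched as a hypothetical GitHub Actions job. Tool invocations follow the PR-step examples above; file paths, versions, and job names are placeholders, not a prescribed setup:

```yaml
# Hypothetical CI job: block merge on breaking schema changes.
jobs:
  schema-checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0                      # buf needs git history for --against
      - run: npx @stoplight/spectral-cli lint api/openapi.yaml
      - run: buf breaking --against '.git#branch=main'
      - run: openapi-diff --fail-on-breaking api/openapi.base.yaml api/openapi.yaml
```

Any nonzero exit fails the job, which blocks the PR until the contract change is made additive or escalated to a major version.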

Evolution of schemes: practices

Additive-first: new fields are `optional/nullable` (JSON Schema), `optional` (proto3), with a `default` in Avro.
Reverse pyramid model: the core is stable, enrichment is nearby and optional.
Dual-emit/dual-write for major versions: publish `v1` and `v2` in parallel.
Sunset plan: dates, usage tracking, warnings, adapters.
Tolerant reader: clients ignore unknown fields and handle new enum values correctly.
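The tolerant-reader rule can be sketched in a few lines of Python. The payload shape and status names are hypothetical:

```python
# Tolerant reader sketch: ignore unknown fields, degrade unknown enum values.
KNOWN_STATUSES = {"created", "paid", "shipped"}

def read_order(payload: dict) -> dict:
    status = payload.get("status")
    return {
        "id": payload["id"],
        # An enum value added in a newer schema (e.g. "refunded") degrades
        # gracefully instead of crashing the consumer.
        "status": status if status in KNOWN_STATUSES else "unknown",
        # Any new optional fields in the payload are simply ignored, not rejected.
    }

print(read_order({"id": "42", "status": "refunded", "risk_score": 0.9}))
```

The key property: a producer can ship additive changes without coordinating a simultaneous consumer release.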

Examples of schemes and checks

JSON Schema (fragment, additive field)

```json
{
  "$id": "orders.v1.Order",
  "type": "object",
  "required": ["id", "status"],
  "properties": {
    "id": { "type": "string", "format": "uuid" },
    "status": { "type": "string", "enum": ["created", "paid", "shipped"] },
    "risk_score": { "type": "number", "minimum": 0, "maximum": 1 }
  },
  "additionalProperties": true
}
```
💡 `risk_score` is added as optional → BACKWARD compatible.

Avro (default for compatibility)

```json
{
  "type": "record",
  "name": "PaymentAuthorized",
  "namespace": "payment.v1",
  "fields": [
    { "name": "payment_id", "type": "string" },
    { "name": "amount", "type": "long" },
    { "name": "currency", "type": "string" },
    { "name": "risk_score", "type": ["null", "double"], "default": null }
  ]
}
```
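Under Avro's schema-resolution rules, a reader using the new schema fills `risk_score` from its declared default when an old record lacks the field. A plain-Python sketch of that one resolution step (an illustration, not avro-tools):

```python
# Sketch of Avro-style resolution: missing fields are filled from the
# reader schema's declared defaults. The defaults map mirrors the record above.
NEW_SCHEMA_DEFAULTS = {"risk_score": None}

def resolve(old_record: dict) -> dict:
    resolved = dict(old_record)
    for field, default in NEW_SCHEMA_DEFAULTS.items():
        resolved.setdefault(field, default)
    return resolved

# An event written before risk_score existed still deserializes cleanly.
old_event = {"payment_id": "p-1", "amount": 1000, "currency": "EUR"}
print(resolve(old_event))  # risk_score appears with the declared default, None
```

This is why a `default` is mandatory for BACKWARD-compatible additions in Avro: without it, the reader has nothing to fill in.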

Protobuf (never reuse tags)

```proto
syntax = "proto3";
package payments.v1;

message CaptureRequest {
  string payment_id = 1;
  int64 amount = 2;
  string currency = 3;
  optional double risk_score = 4; // additive
}
// tag 4 now belongs to risk_score and must not be changed or deleted without a v2
```
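If the field is later removed in a new major version, Protobuf's `reserved` statement prevents silent tag reuse. A hypothetical `v2` sketch:

```proto
syntax = "proto3";
package payments.v2;

message CaptureRequest {
  string payment_id = 1;
  int64 amount = 2;
  string currency = 3;
  // risk_score was dropped; reserve its tag and name so neither can be
  // reassigned to a new field with different semantics.
  reserved 4;
  reserved "risk_score";
}
```

With the reservation in place, `protoc` rejects any attempt to declare a new field with tag 4, which is exactly the silent-corruption scenario the antipatterns section warns about.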

Event register and partitioning

Event naming: `domain.action.v{major}` (e.g. `payment.captured.v1`).
The partitioning key is part of the contract (`payment_id`, `user_id`).
Core vs Enriched: `.v1` (core) and `.enriched.v1` (details).
Registry compatibility: modes at the topic/type level; CI rejects incompatible changes.
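Why the key is contractual: the broker routes each event by a hash of its key, so every event with the same `payment_id` lands on the same partition and stays ordered. A Python sketch of that routing (MD5 is only illustrative here; Kafka's default partitioner uses murmur2):

```python
# Sketch: key-based partition routing. Changing the key silently re-routes
# events to different partitions and breaks per-entity ordering guarantees.
import hashlib

def partition_for(key: str, partitions: int = 12) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partitions

# The same key always maps to the same partition, so events for one payment
# are consumed in the order they were produced.
assert partition_for("payment-123") == partition_for("payment-123")
```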

Migration Management

Expand → Migrate → Contract (REST/gRPC):

1. Add new fields/tables.
2. Start writing/reading the new fields.
3. Remove the old ones after the sunset date.

  • Dual-emit (events): publish `v1`/`v2` in parallel, migrate consumers/projections, then remove `v1`.
  • Replay: rebuild projections from the log onto a new schema (only with compatibility guarantees and migrators).
  • Adapters: gateways/proxies that translate `v1` ↔ `v2` for clients that are hard to migrate.
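An adapter of this kind can be sketched as a pure translation function. The v1/v2 field layouts below are assumptions made up for illustration, not a real contract:

```python
# Gateway adapter sketch: translate a hypothetical v1 event into v2 shape,
# so lagging consumers can migrate at their own pace.
def v1_to_v2(event_v1: dict) -> dict:
    return {
        "payment_id": event_v1["payment_id"],
        # Assumed v2 change: amount and currency merged into one object.
        "amount": {"value": event_v1["amount"], "currency": event_v1["currency"]},
        # Optional in both versions, so absence maps cleanly to None.
        "risk_score": event_v1.get("risk_score"),
    }

v1 = {"payment_id": "p-1", "amount": 1000, "currency": "EUR"}
print(v1_to_v2(v1))
```

Keeping the adapter pure (no I/O, no state) makes it trivial to unit-test against golden samples from both schema versions.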

Safety and compliance

PII/PCI labels in the schema: `x-pii: true`, `x-sensitivity: high`.
Access policies: who may publish/modify schemas (RBAC), signed releases.
Cryptography: signed schema versions, immutable audit logs (WORM).
Right to be forgotten: mark fields that require encryption/crypto-erasure; guidance lives in the registry.

Observability and audit

Dashboards: number of changes, types (minor/major), share of rejected PRs, version usage.
Audit trail: who changed the schema, links to PR/ADR, related release.
Runtime metrics: percentage of messages that failed validation; compatibility incidents.

Tools (sample stack)

OpenAPI/JSON Schema: Spectral, OpenAPI Diff, Schemathesis.
Protobuf/gRPC: Buf, buf-breaking, protoc linters.
Avro/Events: Confluent/Redpanda Schema Registry, Avro-tools, Karapace.
GraphQL: GraphQL Inspector, GraphQL Codegen.
Registries/catalogs: Artifact Registry, Git-based registry, Backstage Catalog, custom UI.
Documentation: Redocly/Stoplight, Swagger-UI, GraphiQL.

Antipatterns

Swagger-wash: the schema does not reflect the reality of the service (or vice versa).
Disabled compatibility checks: "it's urgent" → production breaks.
Reusing protobuf tags: silent data corruption.
A single compatibility mode "for everything": different domains require different modes.
Raw CDC streams as public schemas: the DB model leaks out and evolution becomes impossible.

Implementation checklist

  • Defined artifact format and compatibility mode by domain.
  • Linters and schema-diff are configured in CI, PR is blocked when breaking.
  • Tolerant reader enabled on clients; `additionalProperties: true` (where applicable).
  • Major changes go through RFC/ADR, there is a sunset plan and dual-emit/dual-write.
  • Schemas are labeled with PII/PCI and access levels; auditing is enabled.
  • Version usage and compatibility failures dashboards.
  • Generating SDK/models from the registry is part of the pipeline.
  • Documentation and golden samples updated automatically.

FAQ

Can I skip the registry and just store schemas in Git?
Yes, but a registry adds compatibility APIs, search, metadata, centralized policy, and on-the-fly validation. The best option is Git as storage plus UI/policies on top.

How do I choose compatibility mode?
Look at the direction of change: if the producer extends the payload, use BACKWARD. For public APIs/SDKs, use FULL. For quick prototypes, temporarily NONE (but never on trunk).

What if a breaking change is unavoidable?
Prepare a v2: dual-emit/dual-write, sunset dates, adapters, usage telemetry, migration guides.

Do I need to validate payload in runtime?
For critical domains, yes: this prevents junk messages and speeds up diagnostics.

Summary

The schema registry turns data evolution from risky improvisation into a managed process: uniform compatibility rules, automated validation, understandable versions, and a transparent history. Add the disciplines of additive-first, tolerant reader, dual-emit, and sunset plans, and your contracts will evolve quickly, without breakage or late-night incidents.
