Multi-tenant core
A multi-tenant core is the base layer of a platform/product that serves many independent clients (tenants) on shared resources with guaranteed isolation, managed limits and secure customization. A well-designed core reduces TCO, speeds onboarding, simplifies releases and delivers predictable quality for every tenant.
1) Tenant model and isolation boundaries
Definitions
Tenant/Org/Account: a logical organization with its own users, data, policies, and limits.
Isolation: The ability to prevent one tenant from affecting the data, performance and security of another.
Isolation levels
1. Data: separate databases/schemas/tables, encryption with a per-tenant key, filters on 'tenant_id'.
2. Compute: CPU/RAM/IO quotas, per-tenant worker pools or weighted queues.
3. Network: segmentation, private endpoints/VPN, per-tenant allowlists.
4. Operations: migrations, backups, DR and incidents with a per-tenant blast radius.
Multi-tenancy patterns
Silo (hard isolation): separate clusters/databases per tenant. Maximum security, high cost.
Pool: shared infrastructure with logical isolation; better efficiency, higher risk of "noisy neighbor."
Bridge/Hybrid: a shared control plane plus selective silos for VIP/regulated customers.
2) Identification and routing of tenant requests
Tenant Resolution
By domain: 'https://{tenant}.example.com'
By path: '/t/{tenant}/...'
By header: 'X-Tenant-Id', 'X-Org' (with signature verification)
By token: claims 'tenant_id', 'org_id', 'plan', 'scopes'
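As an illustration of the resolution options above, here is a minimal sketch that tries the subdomain, then the path, then the header. The precedence order, the reserved subdomains, the slug format and the function name 'resolve_tenant' are assumptions made for the example; signature verification of the header and token parsing are out of scope here.

```python
import re
from typing import Mapping, Optional
from urllib.parse import urlsplit

BASE_DOMAIN = "example.com"            # taken from the URL pattern in the text
RESERVED_SUBDOMAINS = {"www", "api"}   # assumption: hosts that are not tenants
TENANT_RE = re.compile(r"^[a-z0-9][a-z0-9-]{0,62}$")  # assumed slug format


def resolve_tenant(url: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return a validated tenant slug, or None if nothing usable was found."""
    parts = urlsplit(url)
    host = parts.hostname or ""        # urlsplit already lowercases the host
    candidate: Optional[str] = None

    # 1) By domain: {tenant}.example.com
    suffix = "." + BASE_DOMAIN
    if host.endswith(suffix):
        sub = host[: -len(suffix)]
        if sub not in RESERVED_SUBDOMAINS:
            candidate = sub

    # 2) By path: /t/{tenant}/...
    if candidate is None:
        match = re.match(r"^/t/([^/]+)/", parts.path)
        if match:
            candidate = match.group(1)

    # 3) By header: X-Tenant-Id (header names are case-insensitive).
    #    A real gateway must verify the header signature before trusting it.
    if candidate is None:
        lowered = {k.lower(): v for k, v in headers.items()}
        candidate = lowered.get("x-tenant-id")

    # Reject anything that does not look like a tenant slug.
    if candidate is not None and TENANT_RE.match(candidate):
        return candidate
    return None
```

Validating the slug at the edge keeps malformed identifiers out of cache keys, log tags and SQL parameters further down the stack.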
Routing
The L7 gateway (API gateway/Ingress) extracts 'tenant_id', enriches the context ('plan', limits, region), and writes it to traces/logs.
Downstream services accept a read-only context; routing and limit decisions are made by the gateway/policy engine.
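One way to make the context read-only for services is an immutable value object built once at the gateway. The sketch below assumes a hypothetical in-memory catalog ('_CATALOG') and hypothetical field names; a real gateway would load this from the control plane.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping

# Hypothetical tenant catalog; in practice this lives in the control plane.
_CATALOG = {
    "acme": {"plan": "pro", "region": "eu-west", "rps_limit": 200},
}


@dataclass(frozen=True)
class TenantContext:
    """Immutable request context handed to downstream services."""
    tenant_id: str
    plan: str
    region: str
    limits: Mapping[str, int]


def enrich(tenant_id: str) -> TenantContext:
    """Gateway-side enrichment: raises KeyError for an unknown tenant."""
    record = _CATALOG[tenant_id]
    # MappingProxyType gives a read-only view, so limits cannot be edited either.
    limits = MappingProxyType({"rps": record["rps_limit"]})
    return TenantContext(tenant_id, record["plan"], record["region"], limits)
```

Attempting 'ctx.plan = "free"' in a service raises 'dataclasses.FrozenInstanceError', and item assignment on 'ctx.limits' raises 'TypeError', so accidental mutation fails loudly.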
3) Data and schemas: strategies
Storage options
Shared-schema, row-level: one set of tables, a 'tenant_id' column, strict RLS (Row-Level Security).
Shared-DB, per-schema: one DBMS, a separate schema per tenant; a balance between manageability and isolation.
Per-DB/cluster: a separate database/cluster per tenant; more expensive, but easier for data-sovereignty requirements.
Key Practices
Pass 'tenant_id' explicitly everywhere and include it in composite keys/indexes.
RLS/DBMS-level access policies plus validation in the service (the "double lock").
Encryption: key hierarchy (root KMS key → envelope key per tenant → DEK per object).
Archive/retention and "right to be forgotten" are managed by tenant-level policies.
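The first two practices can be sketched on the service side with a repository that is bound to one tenant and adds 'tenant_id' to every statement. This is only the service half of the "double lock"; SQLite is used purely to keep the example runnable and has no RLS, so the database-level policy is not shown. The table and class names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        tenant_id TEXT    NOT NULL,
        order_id  INTEGER NOT NULL,
        total     REAL    NOT NULL,
        PRIMARY KEY (tenant_id, order_id)  -- tenant_id leads the composite key
    )
    """
)


class OrderRepo:
    """Repository bound to one tenant; callers cannot omit the tenant filter."""

    def __init__(self, connection: sqlite3.Connection, tenant_id: str):
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._conn = connection
        self._tenant_id = tenant_id

    def add(self, order_id: int, total: float) -> None:
        self._conn.execute(
            "INSERT INTO orders (tenant_id, order_id, total) VALUES (?, ?, ?)",
            (self._tenant_id, order_id, total),
        )

    def get(self, order_id: int):
        row = self._conn.execute(
            "SELECT total FROM orders WHERE tenant_id = ? AND order_id = ?",
            (self._tenant_id, order_id),
        ).fetchone()
        return row[0] if row else None
```

Because the primary key starts with 'tenant_id', two tenants can both own order 1 without colliding, and lookups for one tenant never scan another tenant's rows.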
4) Settings, features and versions
Tenant Configurations
Table/storage 'tenant_config' (plan, quotas, feature flags, localization, SLA).
Config precedence: default → plan → tenant → environment → user.
Config caching with a short TTL and event-driven invalidation.
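A minimal sketch of layered configs and a TTL cache follows. It assumes that layers later in the chain override earlier ones, which the text does not state explicitly; the names 'merge_config' and 'ConfigCache' and the 30-second TTL are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Assumption: later layers override earlier ones.
LAYER_ORDER = ("default", "plan", "tenant", "environment", "user")


def merge_config(layers: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Flatten the layers into one dict, applying them in LAYER_ORDER."""
    merged: Dict[str, Any] = {}
    for name in LAYER_ORDER:
        merged.update(layers.get(name, {}))
    return merged


class ConfigCache:
    """Per-tenant cache with a short TTL and explicit invalidation."""

    def __init__(self, loader: Callable[[str], Dict[str, Any]],
                 ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def get(self, tenant_id: str) -> Dict[str, Any]:
        now = self._clock()
        hit = self._entries.get(tenant_id)
        if hit is not None and hit[0] > now:   # entry still fresh
            return hit[1]
        value = self._loader(tenant_id)
        self._entries[tenant_id] = (now + self._ttl, value)
        return value

    def invalidate(self, tenant_id: str) -> None:
        """Call this from the 'config changed' event handler."""
        self._entries.pop(tenant_id, None)
```

The TTL bounds staleness if an invalidation event is lost, while the event keeps the common case fresh without waiting for expiry.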
Feature flags and compatibility
Targeted feature enablement (per tenant/per cohort), canary rollouts.
API versioning: stable contract + adapters at the boundary (backward/forward-compatible formats).
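Per-tenant flags with a canary percentage are often implemented with deterministic hashing, so that a tenant stays in the same cohort between requests. The sketch below is one such approach under stated assumptions: the function names, the explicit allowlist and the 100-bucket scheme are hypothetical, not something the text prescribes.

```python
import hashlib
from typing import AbstractSet


def rollout_bucket(tenant_id: str, flag: str) -> int:
    """Map (flag, tenant) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{tenant_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(tenant_id: str, flag: str,
               allowlist: AbstractSet[str], percent: int) -> bool:
    """Enabled if the tenant is explicitly listed or falls in the canary share."""
    if tenant_id in allowlist:
        return True
    return rollout_bucket(tenant_id, flag) < percent
```

Since the bucket depends only on the flag name and the tenant, raising 'percent' from 10 to 50 only adds tenants to the cohort; nobody who already had the feature loses it.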
5) Limits, quotas and billing
Consumption policies
Rate limiting: 'requests/sec' per tenant/route, token bucket with plan-based priorities.
Quotas: storage capacity, number of objects, messages/min, jobs/hour.
Fairness: weighted scheduling of queues plus dedicated workers for VIP.
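The rate-limiting policy above can be sketched as one token bucket per (tenant, route), sized by plan. The plan names and rates in 'PLAN_RATES' are made-up numbers for illustration, and the in-process dictionary stands in for whatever shared store a real gateway would use.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical plans: (refill rate in tokens/sec, burst capacity).
PLAN_RATES: Dict[str, Tuple[float, int]] = {"free": (5.0, 10), "pro": (50.0, 100)}


class TokenBucket:
    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)   # start full
        self._updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill proportionally to elapsed time, capped at the burst capacity.
        elapsed = now - self._updated
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._updated = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class TenantLimiter:
    """Keeps an independent bucket per (tenant_id, route)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def allow(self, tenant_id: str, route: str, plan: str) -> bool:
        key = (tenant_id, route)
        if key not in self._buckets:
            rate, burst = PLAN_RATES[plan]
            self._buckets[key] = TokenBucket(rate, burst, self._clock)
        return self._buckets[key].allow()
```

Because buckets are keyed by tenant, one tenant exhausting its budget has no effect on the tokens available to another.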
Billing
Counters by 'tenant_id' (usage metrics) → invoice aggregators.
Usage snapshots at the boundary (idempotency and protection against event loss).
Models: fixed plan + usage-based, post-pay/pre-pay, tiered discounts.
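Idempotent usage counting can be sketched as a meter that remembers event IDs, so that a retried delivery does not bill a tenant twice. The class and parameter names are hypothetical, and the in-memory sets stand in for durable storage.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class UsageMeter:
    """Per-tenant usage counters with de-duplication by event ID."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()
        self._totals: Dict[Tuple[str, str], int] = defaultdict(int)

    def record(self, tenant_id: str, metric: str,
               quantity: int, event_id: str) -> bool:
        """Return True if the event was counted, False if it was a duplicate."""
        key = (tenant_id, event_id)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._totals[(tenant_id, metric)] += quantity
        return True

    def total(self, tenant_id: str, metric: str) -> int:
        return self._totals.get((tenant_id, metric), 0)
```

With this shape, producers can safely retry on timeouts: at-least-once delivery plus de-duplication gives effectively-once counting for the invoice aggregator.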
6) Security and access
Authentication/Authorization
OIDC/SAML with the claims 'tenant_id', 'roles', 'scopes'.
RBAC/ABAC: tenant-level roles (Owner/Admin/Reader), project/department attributes.
Service-to-service with mTLS and narrowly scoped tokens.
Trust boundaries
Request acceptance policies: header signature verification, nonce/timestamp, source binding.
Secrets and keys: per-tenant rotation, separate KMS contexts, audit of key operations.
Multi-region & data residency: pinning a tenant to a region, controlled cross-regional flows.
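The request acceptance policy (signature, nonce/timestamp) can be illustrated with an HMAC check. The message layout, the 300-second skew window and the names 'sign' and 'RequestVerifier' are assumptions for this sketch; the nonce set here grows without bound, whereas a real service would expire nonces once they fall outside the skew window.

```python
import hashlib
import hmac
import time
from typing import Callable, Set, Tuple

MAX_SKEW_SECONDS = 300  # assumed tolerance for clock drift


def sign(secret: bytes, tenant_id: str, timestamp: int, nonce: str) -> str:
    """HMAC-SHA256 over the fields the gateway wants to protect."""
    message = f"{tenant_id}\n{timestamp}\n{nonce}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


class RequestVerifier:
    def __init__(self, secret: bytes, clock: Callable[[], float] = time.time):
        self._secret = secret
        self._clock = clock
        self._seen_nonces: Set[Tuple[str, str]] = set()

    def verify(self, tenant_id: str, timestamp: int,
               nonce: str, signature: str) -> bool:
        # 1) Reject stale or future-dated requests.
        if abs(self._clock() - timestamp) > MAX_SKEW_SECONDS:
            return False
        # 2) Constant-time signature comparison.
        expected = sign(self._secret, tenant_id, timestamp, nonce)
        if not hmac.compare_digest(expected, signature):
            return False
        # 3) Reject replays of a previously accepted nonce.
        if (tenant_id, nonce) in self._seen_nonces:
            return False
        self._seen_nonces.add((tenant_id, nonce))
        return True
```

Including 'tenant_id' in the signed message means a valid signature for one tenant cannot be replayed with another tenant's identifier.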
7) Per-tenant observability
Traces and Metrics
Required tags are 'tenant_id', 'plan', 'region', 'endpoint', 'status'.
SLI/SLO per tenant: `availability`, `p95 latency`, `error budget`.
Dashboards and alerts by segment (VIP/regulated/new).
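To make the per-tenant SLIs concrete, the sketch below computes availability, p95 latency (nearest-rank method) and the remaining error budget from raw request records. The record shape '(tenant_id, latency_ms, ok)', the function name and expressing the budget as a number of requests are assumptions for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Request = Tuple[str, float, bool]  # (tenant_id, latency_ms, ok)


def tenant_slis(requests: Iterable[Request],
                slo_target: float = 0.999) -> Dict[str, Dict[str, float]]:
    by_tenant: Dict[str, List[Tuple[float, bool]]] = defaultdict(list)
    for tenant_id, latency_ms, ok in requests:
        by_tenant[tenant_id].append((latency_ms, ok))

    report: Dict[str, Dict[str, float]] = {}
    for tenant_id, rows in by_tenant.items():
        total = len(rows)
        good = sum(1 for _, ok in rows if ok)
        latencies = sorted(latency for latency, _ in rows)
        # Nearest-rank percentile: rank = ceil(0.95 * total), in integer math.
        rank = (95 * total + 99) // 100
        allowed_errors = (1.0 - slo_target) * total
        report[tenant_id] = {
            "availability": good / total,
            "p95_latency_ms": latencies[rank - 1],
            # Positive: budget left; negative: SLO breached for this window.
            "error_budget_remaining": allowed_errors - (total - good),
        }
    return report
```

Grouping before aggregating is the important part: a global p95 can look healthy while a single tenant is well outside its SLO.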
Logs and Audits
Activity logs (who/what/when/where) with immutable storage and retention according to tenant policy.
Pre-aggregation of events for cheap storage, with drill-down to full detail on demand.
8) Performance and "noisy neighbor"
Anti-noise measures
Limits at the queue/worker level, CPU shares and IO share per tenant.
Cache separation: key prefixes 'tenant:{id}:...', TTL by plan, protection against cache stampede.
Indexes and query plans based on 'tenant_id' selectivity.
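Cache separation and stampede protection can be sketched together: keys are built through one helper that always adds the tenant prefix, and a per-key lock ensures only one caller loads a missing value. The class name is hypothetical, TTL handling is omitted for brevity, and an in-process dict stands in for Redis or another key-value store.

```python
import threading
from typing import Any, Callable, Dict


class TenantCache:
    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()   # protects the lock table itself

    @staticmethod
    def key(tenant_id: str, name: str) -> str:
        # A ':' inside tenant_id could make two tenants' keys collide.
        if not tenant_id or ":" in tenant_id:
            raise ValueError("invalid tenant_id")
        return f"tenant:{tenant_id}:{name}"

    def get_or_load(self, tenant_id: str, name: str,
                    loader: Callable[[], Any]) -> Any:
        k = self.key(tenant_id, name)
        if k in self._data:
            return self._data[k]
        with self._guard:
            lock = self._locks.setdefault(k, threading.Lock())
        with lock:
            # Re-check under the lock: another thread may have loaded it already.
            if k not in self._data:
                self._data[k] = loader()
            return self._data[k]
```

Routing every access through 'key' removes the chance of a call site forgetting the prefix, which is how cross-tenant cache leaks usually start.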
Cold starts and "warm" pools
Pre-warming for VIP/peak windows.
Elastic worker pools driven by metric signals (backpressure/autoscaling).
9) Upgrades and migrations without downtime
Strategy
Backward-compatible migrations (expand → migrate → contract).
Per-tenant migrations: batches with progress control, pause/rollback for a specific 'tenant_id'.
Sampling and canary migrations on a subset of tenants.
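A per-tenant migration runner might look like the sketch below: tenants are processed in batches, a failing tenant is rolled back on its own, and the run pauses once failures exceed a threshold. The function name, its parameters and the pause rule are assumptions; 'migrate' and 'rollback' are caller-supplied callbacks.

```python
from typing import Callable, Dict, List, Sequence


def migrate_in_batches(tenant_ids: Sequence[str],
                       migrate: Callable[[str], None],
                       rollback: Callable[[str], None],
                       batch_size: int = 2,
                       max_failures: int = 0) -> Dict[str, List[str]]:
    """Migrate tenants batch by batch; pause when failures exceed max_failures."""
    done: List[str] = []
    failed: List[str] = []
    for start in range(0, len(tenant_ids), batch_size):
        for tenant_id in tenant_ids[start:start + batch_size]:
            try:
                migrate(tenant_id)
                done.append(tenant_id)
            except Exception:
                # Roll back only the tenant that failed; others are untouched.
                rollback(tenant_id)
                failed.append(tenant_id)
        if len(failed) > max_failures:
            break  # pause: remaining tenants stay on the old schema
    processed = len(done) + len(failed)
    return {
        "done": done,
        "failed": failed,
        "pending": list(tenant_ids[processed:]),
    }
```

A canary run is the same call with a short list of tenants; the returned 'pending' list can be fed back in once the failure has been investigated.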
DR and Incidents
DR plan with RTO/RPO per tenant; partial "read-only mode" without global downtime.
Incident isolation: circuit breaking by 'tenant_id'; shutting off a "hot" tenant does not affect the rest.
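Incident isolation by tenant can be sketched as a circuit breaker whose state is kept per 'tenant_id'. The threshold, the cooldown and the class name are hypothetical, and the half-open probing found in fuller implementations is left out to keep the example short.

```python
import time
from typing import Callable, Dict


class TenantBreaker:
    """Opens the circuit for one tenant after consecutive failures."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._threshold = threshold
        self._cooldown = cooldown
        self._clock = clock
        self._failures: Dict[str, int] = {}
        self._open_until: Dict[str, float] = {}

    def allow(self, tenant_id: str) -> bool:
        """False while this tenant's circuit is open; other tenants unaffected."""
        return self._clock() >= self._open_until.get(tenant_id, float("-inf"))

    def record(self, tenant_id: str, ok: bool) -> None:
        if ok:
            self._failures[tenant_id] = 0
            return
        count = self._failures.get(tenant_id, 0) + 1
        self._failures[tenant_id] = count
        if count >= self._threshold:
            # Trip the breaker for this tenant only and restart the count.
            self._open_until[tenant_id] = self._clock() + self._cooldown
            self._failures[tenant_id] = 0
```

The same structure works for deliberately shutting off a "hot" tenant: an operator action can set that tenant's open-until time without touching anyone else.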
10) APIs and protocols
REST/gRPC with mandatory tenant context (in claims/headers).
Events (event-driven): topics named 'tenant.{id}.event', filters and ACLs for subscriptions.
Global entry points: the L7 gateway validates the context, applies limits, and encrypts PII according to the tenant's policy.
11) Tenant life cycle
1. Provisioning: creating a tenant record, generating keys/configs, binding to a region.
2. Activation: issuing the OIDC/SAML client, creating roles/policies, setting initial quotas.
3. Operation: monitoring, billing, flag/plan updates.
4. Suspension/throttling: freeze with data retention/export.
5. Deletion/export: retention, archived backups, crypto-shredding.
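The life cycle above can be modelled as a small state machine so that invalid jumps (for example, reactivating a deleted tenant) are rejected in one place. The state names and the allowed transitions below are one plausible reading of the list, not something the text specifies.

```python
from enum import Enum
from typing import Dict, Set


class TenantState(Enum):
    PROVISIONED = "provisioned"
    ACTIVE = "active"
    SUSPENDED = "suspended"
    DELETED = "deleted"


# Assumed transition table; adjust to the platform's actual policy.
ALLOWED: Dict[TenantState, Set[TenantState]] = {
    TenantState.PROVISIONED: {TenantState.ACTIVE, TenantState.DELETED},
    TenantState.ACTIVE: {TenantState.SUSPENDED, TenantState.DELETED},
    TenantState.SUSPENDED: {TenantState.ACTIVE, TenantState.DELETED},
    TenantState.DELETED: set(),  # terminal: keys are shredded, no way back
}


def transition(current: TenantState, target: TenantState) -> TenantState:
    """Return the new state or raise ValueError for a forbidden move."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move tenant from {current.value} to {target.value}")
    return target
```

Side effects such as key generation, export or crypto-shredding would hang off these transitions, which keeps them auditable per tenant.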
12) Mini-reference architecture (verbal scheme)
Edge (API gateway): TLS/mTLS, extraction of 'tenant_id', limits, auditing.
Control Plane: catalog of tenants, configs, feature flags, billing, policies.
Data Plane (services): stateless services, queues, workers with quotas; Redis/KV keys prefixed by tenant.
Storage: RLS database/separate schemas/separate databases; KMS with per-tenant keys; object storage with envelope encryption.
Observability: tracing/metrics/logs tagged with 'tenant_id', alerts per plan.
Admin: isolated operations (migrations/backups) on a subset of tenants.
13) Pre-production checklist
- A single way to resolve 'tenant_id' at the boundary and in services.
- RLS/ACL policies are covered by tests, including negative scenarios.
- Quotas/limits/billing are validated under real load; there is protection against lost billing events.
- Observability and SLO per tenant; alerts for VIP/regulated.
- Migrations are backward-compatible; partial rollback and trial batches are available.
- DR scenarios with RTO/RPO per tenant and regular exercises.
- Per-tenant key encryption, key rotation, and key audit.
- Documentation of API contracts/events and versioning policies.
14) Typical errors
Global migrations "in one fell swoop" without the ability to stop on a problem tenant.
Implicit dependency on 'tenant_id' in caches/queues → data leakage/cross-tenant message mixing.
Context mixing (admin operations accidentally run without 'tenant_id').
Absence of "double lock": only service check without RLS in the database.
A single limiter for the entire cluster → "noisy neighbor" and SLO violation.
Non-transparent billing without idempotency and audit trail.
15) Quick strategy selection
Strict isolation/regulation: Silo (separate databases/clusters), region-lock.
Balanced efficiency: Shared-DB per schema + RLS, keys per tenant.
High real-time traffic: common queues with "weighted" quotas and dedicated workers for VIP.
Many customizations: feature flags + API adapters, storing configs by priority.
Conclusion
The multi-tenant core is the discipline of engineering boundaries: explicit resolution of 'tenant_id', strict isolation at every layer, managed quotas and transparent billing, plus observability and release compatibility. Following the described patterns lets you scale the product without sacrificing security, quality or speed of change for each tenant.