Tenant isolation and limits
Tenant isolation and limits are the foundation of multi-tenant architecture. Their purpose: the actions of one tenant must never affect the data, security, or SLOs of another, and resources must be distributed fairly and predictably. Below is a practical map of solutions, from the data layer through compute scheduling to incident management.
1) Threat model and targets
Threats
Data leakage between tenants (logical, via caches, via logs).
"Noisy neighbor": performance degradation caused by load spikes from one client.
Privilege escalation (access-policy errors).
Billing drift (mismatch between usage and charges).
Cascading failures (one tenant's incident causes downtime for many).
Objectives
Strict isolation of data and secrets.
Hard limits/quotas and fair scheduling.
Transparent auditing, observability and billing.
Incident localization and rapid recovery per tenant.
2) Isolation levels (end-to-end model)
1. Data
'tenant_id' in keys and indexes, Row-Level Security (RLS).
Encryption: KMS hierarchy → tenant key (KEK) → data keys (DEK).
Separate schemas/DBs for tenants with high requirements (Silo); a shared cluster with RLS for efficiency (Pool).
Retention policies and the "right to be forgotten" per tenant; crypto-shredding of keys.
2. Compute
CPU/RAM/IO quotas, worker pools per tenant, weighted queues.
GC/heap isolation (JVM/runtime settings, containers), parallelism limits.
Per-tenant autoscaling + backpressure.
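The compute-side caps above can be sketched with per-tenant semaphores; a minimal illustration in which tenant names and cap values are hypothetical, and excess work is rejected (backpressure) rather than queued:

```python
import threading
from contextlib import contextmanager

class TenantConcurrencyLimiter:
    """Caps the number of in-flight jobs per tenant; excess requests fail fast."""

    def __init__(self, caps):
        # caps: {tenant_id: max_concurrent_jobs}
        self._sems = {t: threading.BoundedSemaphore(n) for t, n in caps.items()}

    @contextmanager
    def acquire(self, tenant_id):
        sem = self._sems[tenant_id]
        if not sem.acquire(blocking=False):  # backpressure: reject instead of queueing
            raise RuntimeError(f"tenant {tenant_id}: concurrency cap reached")
        try:
            yield
        finally:
            sem.release()

limiter = TenantConcurrencyLimiter({"acme": 2, "globex": 5})
with limiter.acquire("acme"):
    pass  # the tenant's work runs here
```

Rejecting at the cap keeps one tenant's backlog from consuming shared worker capacity; a queued variant would instead trade latency for throughput.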
3. Network
Segmentation: private endpoints/VPC, ACLs by 'tenant_id'.
Rate limiting and per-tenant connection caps at the border.
DDoS/bot protection that takes plan/priority into account.
4. Operations and Processes
Tenant migrations, backups, DR, feature-flags.
Incidents with a "micro blast radius": circuit breaking by 'tenant_id'.
3) Access control and tenant context
AuthN: OIDC/SAML; tokens carry 'tenant_id', 'org_id', 'plan', 'scopes'.
AuthZ: RBAC/ABAC (roles + attributes of project, department, region).
Context at the border: the API gateway extracts and validates the tenant context, enriches it with limits/quotas, and writes it to audit trails.
The "double lock" principle: a check in the service plus an RLS policy in the database.
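A minimal sketch of the border-side half of this "double lock", assuming token claims have already been cryptographically verified upstream; plan names and limit values are illustrative:

```python
# Hypothetical sketch: the gateway builds the tenant context from verified
# token claims and fails closed when the context is missing or invalid.

PLAN_LIMITS = {"starter": {"req_per_sec": 50}, "business": {"req_per_sec": 200}}

class TenantContextError(Exception):
    pass

def build_tenant_context(claims: dict) -> dict:
    """Validate claims and return the context every downstream service receives."""
    tenant_id = claims.get("tenant_id")
    plan = claims.get("plan")
    if not tenant_id or plan not in PLAN_LIMITS:
        raise TenantContextError("missing or invalid tenant context")
    return {
        "tenant_id": tenant_id,
        "org_id": claims.get("org_id"),
        "plan": plan,
        "limits": PLAN_LIMITS[plan],  # attached here so services need not re-derive them
    }

ctx = build_tenant_context({"tenant_id": "acme", "org_id": "o1", "plan": "starter"})
```

The second lock, an RLS policy in the database, still applies even if this check is bypassed; neither layer alone is sufficient.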
4) Data: schemas, caches, logs
Schemas:
- Shared-schema (row-level): maximum efficiency; strict RLS required.
- Per-schema: an isolation/operability tradeoff.
- Per-DB/cluster (Silo): for VIP/regulated tenants.
Cache: key prefixes 'tenant:{id}:...', TTL by plan, cache-stampede protection (locking/early refresh).
Logs/metadata: full pseudonymization of PII, filters by 'tenant_id', no mixing of logs from different tenants.
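The cache conventions above can be illustrated with a toy in-process cache; the TTL values and tenant names here are made up:

```python
import time

class TenantCache:
    """In-process cache with tenant-prefixed keys and per-plan TTLs (toy sketch)."""

    def __init__(self, ttl_by_plan):
        self._store = {}
        self._ttl = ttl_by_plan  # e.g. {"starter": 30, "enterprise": 300} seconds

    def _key(self, tenant_id, key):
        return f"tenant:{tenant_id}:{key}"  # prefix makes cross-tenant hits impossible

    def set(self, tenant_id, plan, key, value):
        expires = time.monotonic() + self._ttl[plan]
        self._store[self._key(tenant_id, key)] = (value, expires)

    def get(self, tenant_id, key):
        entry = self._store.get(self._key(tenant_id, key))
        if entry is None or entry[1] < time.monotonic():
            return None  # miss or expired
        return entry[0]

cache = TenantCache({"starter": 30})
cache.set("acme", "starter", "profile", {"name": "ACME"})
```

Because the tenant id is part of the key itself, a lookup for another tenant can never return this entry, regardless of bugs in the calling code.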
5) Limiting traffic and operations
Basic mechanics
Token Bucket: smooths bursts; parameterized by 'rate'/'burst'.
Leaky Bucket: stabilizes throughput.
Fixed Window/Sliding Window: simple/precise quotas over a time window.
Concurrency limits: caps on simultaneous requests/jobs.
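The Token Bucket mechanic can be sketched in a few lines; the optional `now` parameter exists only to make the behavior deterministic and testable:

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec up to `burst`; a request costs one token."""

    def __init__(self, rate, burst, now=None):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=50, burst=100)  # e.g. a "Starter"-style profile
```

A request that finds the bucket empty is rejected immediately, which is exactly the "quick failure" at the border described below.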
Where to apply
At the border (L7/API gateway): basic protection and "fast failure."
In the core (services/queues): the second line of defense and "fair share."
Policies
By tenant/plan/endpoint/type of operation (public APIs, heavy exports, admin actions).
Priority-aware: VIP gets a larger 'burst' and more weight in arbitration.
Idempotency keys for safe retries.
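A minimal illustration of idempotency keys protecting against double execution on retries; the key and operation names are hypothetical:

```python
class IdempotentHandler:
    """Stores the first result per idempotency key; retries return it instead of re-running."""

    def __init__(self):
        self._results = {}

    def handle(self, idempotency_key, operation):
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # safe retry: no double execution
        result = operation()
        self._results[idempotency_key] = result
        return result

calls = []
handler = IdempotentHandler()

def charge():
    calls.append(1)
    return "charged"

handler.handle("req-123", charge)
handler.handle("req-123", charge)  # client retry after a timeout: runs only once
```

In a real system the result store would be shared and durable (e.g. a database keyed by tenant and idempotency key), not an in-process dict.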
Sample profiles (concepts)
Starter: 50 req/s, burst 100, 2 parallel exports.
Business: 200 req/s, burst 400, 5 exports.
Enterprise/VIP: 1000 req/s, burst 2000, dedicated workers.
6) Quotas and fair planning (fairness)
Resource quotas: storage, objects, messages/min, jobs/hour, queue size.
Weighted Fair Queuing/Deficit Round Robin: "Weighted" access to shared workers.
Per-tenant worker pools: rigid isolation for noisy/critical customers.
Admission control: reject or degrade before execution when quotas are exhausted.
Backoff + jitter: exponential delays to keep bursts out of sync.
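Deficit Round Robin from the list above, in toy form: each backlogged tenant's deficit grows by its quantum every round, and heavier-weighted tenants are served proportionally more. Job names, weights, and costs here are invented:

```python
from collections import deque

def deficit_round_robin(queues, quanta, budget):
    """Serve per-tenant queues in DRR order until `budget` jobs have been run."""
    deficit = {t: 0 for t in queues}
    served = []
    while budget > 0 and any(queues.values()):
        for tenant, q in queues.items():
            if not q:
                deficit[tenant] = 0  # no backlog: deficit must not accumulate
                continue
            deficit[tenant] += quanta[tenant]
            # Serve jobs whose cost fits within the accumulated deficit.
            while q and q[0][1] <= deficit[tenant] and budget > 0:
                job, cost = q.popleft()
                deficit[tenant] -= cost
                served.append(job)
                budget -= 1
    return served

queues = {
    "vip":     deque([("v1", 1), ("v2", 1), ("v3", 1), ("v4", 1)]),
    "starter": deque([("s1", 1), ("s2", 1), ("s3", 1), ("s4", 1)]),
}
order = deficit_round_robin(queues, quanta={"vip": 2, "starter": 1}, budget=6)
```

With quanta 2:1, the VIP tenant gets roughly twice the throughput of the starter tenant while the starter is never starved, which is the "weighted access to shared workers" described above.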
7) Observability and billing per tenant
Required tags: 'tenant_id', 'plan', 'region', 'endpoint', 'status'.
SLI/SLO per tenant: p95/p99 latency, error rate, availability, utilization, saturation.
Usage metrics: counters of operations/bytes/CPU-seconds → aggregator → invoices.
Billing idempotence: snapshots at the border, protection against double charges and lost events.
Dashboards in segments: VIP/regulated/new tenants.
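A toy sketch of idempotent usage collection with per-tenant tags, combining the usage-metrics and billing-idempotence points above; event ids and metric names are invented:

```python
from collections import defaultdict

class UsageMeter:
    """Aggregates billable usage per (tenant, metric); duplicate events
    (same event_id) are ignored, so collector retries cannot double-bill."""

    def __init__(self):
        self._seen = set()
        self.totals = defaultdict(int)

    def record(self, event_id, tenant_id, metric, amount):
        if event_id in self._seen:
            return False  # replayed delivery: already counted
        self._seen.add(event_id)
        self.totals[(tenant_id, metric)] += amount
        return True

meter = UsageMeter()
meter.record("e1", "acme", "api_requests", 10)
meter.record("e1", "acme", "api_requests", 10)  # duplicate delivery
meter.record("e2", "acme", "bytes_out", 4096)
```

Deduplicating by event id at ingestion is what lets the billing pipeline tolerate at-least-once delivery from the collectors.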
8) Incidents, degradation and DR "by tenant"
Circuit breaking by 'tenant_id': emergency shutdown or throttling of a specific tenant without affecting the rest.
Graceful Degradation: read-only mode, sandbox queues, deferred tasks.
RTO/RPO per tenant: recovery and loss targets for each plan.
Drills: regular "game days" that cut off a noisy tenant and verify DR.
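The per-tenant "fuse" can be sketched as a trivial consecutive-failure circuit breaker; the threshold and tenant names are illustrative, and a production breaker would also add timeouts and half-open probing:

```python
class TenantBreaker:
    """Per-tenant circuit breaker: after `threshold` consecutive failures the
    tenant is tripped and its requests are rejected; other tenants are unaffected."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = {}
        self.open = set()

    def allow(self, tenant_id):
        return tenant_id not in self.open

    def report(self, tenant_id, ok):
        if ok:
            self.failures[tenant_id] = 0
            self.open.discard(tenant_id)  # a success closes the breaker again
            return
        self.failures[tenant_id] = self.failures.get(tenant_id, 0) + 1
        if self.failures[tenant_id] >= self.threshold:
            self.open.add(tenant_id)

breaker = TenantBreaker(threshold=2)
breaker.report("noisy", ok=False)
breaker.report("noisy", ok=False)  # trips only this tenant
```

Because the state is keyed by tenant, tripping "noisy" has zero effect on any other tenant's traffic, which is exactly the micro-blast-radius property.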
9) Compliance (residency, privacy)
Pinning each tenant to a region; clear rules for cross-regional flows.
Key/data access audit, admin logging.
Manage retention and data export per tenant.
10) Mini reference: how to put it together
Request flow
1. Edge (API gateway): TLS → extract 'tenant_id' → validate token → apply rate limits/quotas → write audit trails.
2. Policy engine: 'tenant_id'/'plan'/'features' context → decision on routing and limits.
3. Service: permission checks + 'tenant_id' labels → database access under RLS → cache with tenant prefix.
4. Usage collection: counters of operations/bytes → aggregator → billing.
Data
Schema/DB by strategy (row-level/per-schema/per-DB).
KMS: tenant keys, rotation, crypto-shredding on deletion.
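Crypto-shredding in miniature: deleting a tenant's data key makes its ciphertext permanently unrecoverable. The XOR "cipher" below is a deliberately toy construction for illustration only; a real system would use a KMS with AES-GCM envelope encryption:

```python
import hashlib
import secrets

def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy XOR cipher (SHA-256 counter keystream). Illustration only, not secure."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)

class TenantKeyring:
    """Per-tenant data keys; deleting a key crypto-shreds everything encrypted with it."""

    def __init__(self):
        self._deks = {}

    def encrypt(self, tenant_id, plaintext: bytes) -> bytes:
        dek = self._deks.setdefault(tenant_id, secrets.token_bytes(32))
        return _keystream_xor(dek, plaintext)

    def decrypt(self, tenant_id, ciphertext: bytes) -> bytes:
        if tenant_id not in self._deks:
            raise KeyError("key shredded: data is unrecoverable")
        return _keystream_xor(self._deks[tenant_id], ciphertext)

    def shred(self, tenant_id):
        self._deks.pop(tenant_id, None)  # "right to be forgotten": drop the key, not the blobs

ring = TenantKeyring()
blob = ring.encrypt("acme", b"customer record")
```

The point is operational: honoring a deletion request means destroying one small key, not hunting down every backup that contains the tenant's ciphertext.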
Computing
Queues with weights, per-tenant worker pools, caps on concurrency.
Autoscaling by per-tenant metrics.
11) Pseudo-politics (for orientation)
yaml limits:
starter:
req_per_sec: 50 burst: 100 concurrency: 20 exports_parallel: 2 business:
req_per_sec: 200 burst: 400 concurrency: 100 exports_parallel: 5 enterprise:
req_per_sec: 1000 burst: 2000 concurrency: 500 exports_parallel: 20
quotas:
objects_max: { starter: 1_000_000, business: 20_000_000, enterprise: 100_000_000 }
storage_gb: { starter: 100, business: 1000, enterprise: 10000 }
12) Pre-production checklist
- Single source of truth for 'tenant_id'; propagated and logged everywhere.
- RLS/ACL enabled at DB level + service check (double lock).
- Encryption keys per tenant, crypto-shredding documented.
- Limits/quotas at the border and inside; bursts and spikes tested.
- Fair queuing and/or dedicated VIP workers; caps on concurrency.
- Per-tenant SLOs and alerts; dashboards by segment.
- Usage-collection is idempotent; billing rollup verified.
- DR/incidents are localized to the tenant; circuit breaking by 'tenant_id' works.
- Caches/logs are separated by tenant; PII masked.
- Migration/backup/export procedures are tenant-based.
13) Typical errors
RLS disabled or bypassed by a "service" user → risk of leakage.
Single global limiter → "noisy neighbor" and SLO violation.
Shared caches/queues without prefixes → data intersection.
Billing based on logs that get lost at peak load.
No per-tenant circuit breaking → cascading failures.
Big-bang migrations with no way to stop a problematic 'tenant_id'.
14) Quick strategy selection
Regulated/VIP: Silo data (per-DB), dedicated workers, strict quotas and residency.
Mass SaaS: Shared-schema + RLS, strong limits at the border, fair-queuing inside.
"Noisy/bursty" load: a large 'burst' plus hard concurrency caps, backpressure, and plan-based priorities.
Conclusion
Tenant isolation and limits are about boundaries and fairness. A consistent 'tenant_id' through the stack, RLS and encryption on the data, rate limiting and quotas at the border and in the core, a fair scheduler, observability, and incident localization: together these give security, predictable quality, and transparent billing for every tenant, even under aggressive platform growth.