MongoDB and flexible data schemas
Brief summary
MongoDB is a document-oriented store with flexible schemas (BSON), fast inserts, horizontal scaling, and a powerful Aggregation Pipeline. In iGaming it is a great fit for player profiles, flexible CRM cards, event logs, telemetry, materialized stream projections, game catalogs, and cached front-end views. Monetary invariants (wallets/ledger) are usually kept in an SQL/CP layer; MongoDB is appropriate there as a read model and high-performance document store.
Where MongoDB delivers the most value in iGaming
Player profiles and settings: variable structure (locale, preferences, KYC metadata).
Content/game/provider catalogs: fast card reads, filtering, tagging, full-text search.
Events/telemetry/logs: high TPS, time windows, TTL-based retention.
Materialized views (CQRS): fast screens (leaderboards, recent actions, aggregates).
Personalization/online ML features: key-value patterns in collections, short TTLs.
Principles of a flexible schema: discipline instead of chaos
MongoDB is not "schemaless": the schema lives in code and validation.
Recommended:
1. Schema as contract: JSON Schema Validation on collections.
2. Document versioning with a 'schemaVersion' field.
3. Strictly mandatory fields (id, search keys); the "tail" of rare attributes stays optional.
4. Constrain array sizes and nesting depth (for the sake of indexes and RAM).
5. Migrations in the background: updates keyed by 'schemaVersion', schedulers, backfills (see the backfill sketch after the validation example).
Example: JSON Schema Validation
```js
db.createCollection("player_profiles", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["playerId", "createdAt", "schemaVersion"],
      properties: {
        playerId: { bsonType: "string" },
        createdAt: { bsonType: "date" },
        schemaVersion: { bsonType: "int", minimum: 1 },
        locale: { bsonType: "string" },
        kyc: {
          bsonType: "object",
          properties: {
            status: { enum: ["pending", "verified", "rejected"] },
            doc: { bsonType: "object" }
          }
        }
      }
    }
  }
});
```
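As an illustration of point 5, a minimal backfill sketch, assuming batches of 500 and a hypothetical 'marketing.optIn' default added in v2; real migrations would also throttle and log progress:
```js
// Backfill sketch: promote v1 documents to v2 in bounded batches so the
// migration stays resumable and cheap. Batch size and the new
// "marketing.optIn" default are illustrative assumptions.
let n;
do {
  const ids = db.player_profiles
    .find({ schemaVersion: 1 }, { _id: 1 })
    .limit(500)
    .toArray()
    .map(d => d._id);
  n = ids.length;
  if (n > 0) {
    db.player_profiles.updateMany(
      { _id: { $in: ids } },
      // NumberInt keeps schemaVersion a BSON int, matching the validator.
      { $set: { schemaVersion: NumberInt(2), "marketing.optIn": false } }
    );
  }
} while (n > 0);
```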
Data Model and Document Design
Design "on demand": 1 screen/endpoint = 1 document or a small set of documents.
Denormalization: Include small subdocuments attached (for example, mini-cards of game providers).
- Embedding - for closely related and infrequently updated fragments.
- References ('ref') - for large size/frequent updates/reuse.
- Size limit: document ≤ 16 MB; large binaries - GridFS/object storages.
- Audit/metadata: 'createdAt', 'updatedAt', 'traceId', 'tenantId', 'idempotencyKey'.
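To make the embed-vs-reference trade-off concrete, an illustrative profile document (the values, and names like 'favoriteProviders' and 'kycDocRef', are assumptions):
```js
// Illustrative player profile: small, stable fragments are embedded;
// large or frequently updated data (KYC documents) is referenced.
db.player_profiles.insertOne({
  playerId: "p123",
  tenantId: "eu-1",
  schemaVersion: NumberInt(2),
  locale: "de",
  // Embedded: tiny, read with every profile screen.
  preferences: { oddsFormat: "decimal", theme: "dark" },
  // Embedded mini-cards in a bounded array.
  favoriteProviders: [{ providerId: "prov-7", name: "StudioX" }],
  // Reference: the large binary itself lives in GridFS / object storage.
  kycDocRef: ObjectId("65f0c0ffee0000000000a001"),
  createdAt: new Date(),
  updatedAt: new Date()
});
```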
Indexes: read performance and latency stability
Index types and practices:
- B-Tree (the default).
- Compound: field order should match the frequent predicates and sorts.
- Prefix rule: index prefixes are usable on their own; '(tenantId, playerId, createdAt)' also serves queries on 'tenantId' and '(tenantId, playerId)'.
- Sort: put the sort field at the end of the index (for example, 'createdAt: -1').
```js
db.bets.createIndex(
  { tenantId: 1, playerId: 1, createdAt: -1 },
  { name: "idx_bets_tenant_player_created_desc" }
);
```
Partial / Sparse
Speed up frequent subsets (e.g. 'status: "pending"') and shrink index size.
```js
db.withdrawals.createIndex(
  { playerId: 1, createdAt: -1 },
  { partialFilterExpression: { status: "pending" } }
);
```
TTL
For telemetry/logs/temporary features: automatic expiration.
```js
// With expireAfterSeconds: 0, each document expires at the moment stored in
// its own expireAt field.
db.events.createIndex({ expireAt: 1 }, { expireAfterSeconds: 0 });
```
Text/autocomplete
'text' indexes for full-text search (with language limitations); for autocomplete: n-gram/trigram fields with anchored-regex queries, or Atlas Search. A minimal prefix-regex sketch follows.
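A sketch of the anchored-regex approach, assuming a lowercased copy of the title is maintained on write ('titleLower' and the collection name are illustrative):
```js
// Autocomplete sketch: a left-anchored, case-sensitive regex on a
// lowercased field can walk a regular B-Tree index instead of scanning.
db.games.createIndex({ titleLower: 1 });
db.games
  .find({ titleLower: { $regex: /^book of / } })
  .limit(10);
```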
Index antipatterns
- Indexing "everything" → write throughput drops.
- Low-cardinality fields without a partial filter → poor selectivity.
- Duplicate compound indexes (one index is a prefix of another).
- Indexing fields inside huge, unbounded arrays.
Aggregation Pipeline: Quick Screens and Reports
Use '$match' → '$sort' → '$limit' as early stages; design indexes to back '$match'/'$sort'.
'$lookup' for controlled joins (soft, in reasonable volumes).
'$facet' for multiple metrics in one pass; '$unionWith' to merge collections.
'$merge'/'$out' materialize results into a collection (read models); see the '$merge' sketch after the pipeline example.
```js
db.bets.aggregate([
  { $match: { tenantId: "eu-1", playerId: "p123" } },
  { $sort: { createdAt: -1 } },
  { $limit: 100 },
  { $group: {
      _id: "$playerId",
      lastBets: { $push: { amount: "$amount", ts: "$createdAt", game: "$gameId" } },
      totalAmount: { $sum: "$amount" }
  } }
]);
```
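A sketch of materializing a read model with '$merge', assuming a hypothetical 'leaderboard_daily' collection and a daily cut-off:
```js
// Materialization sketch: roll up wagering per player and upsert the result
// into a read-model collection that the leaderboard screen queries directly.
db.bets.aggregate([
  { $match: { createdAt: { $gte: ISODate("2024-01-01T00:00:00Z") } } },
  { $group: { _id: "$playerId", wagered: { $sum: "$amount" } } },
  { $merge: {
      into: "leaderboard_daily",
      on: "_id",                 // _id = playerId, backed by its unique index
      whenMatched: "replace",
      whenNotMatched: "insert"
  } }
]);
```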
Transactions, consistency and idempotency
Single-document updates are atomic for free; for complex invariants, think about how to partition state across documents.
Multi-document transactions (ACID) are available on replica sets but cost latency; apply them pointwise.
- 'w: "majority"' for critical writes (at a latency cost);
- 'readConcern: "majority"' for consistent reads;
- Idempotency: unique keys on 'idempotencyKey'/'pspTx', upsert operations ('$setOnInsert', '$inc'); see the sketch after the wallet example.
```js
db.wallet.updateOne(
  { playerId: "p123" },
  { $inc: { balanceCents: -5000 }, $set: { updatedAt: new Date() } },
  { upsert: true, writeConcern: { w: "majority" } }
);
```
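A sketch of the idempotency pattern, assuming a hypothetical 'wallet_ops' operations collection; in production the two writes would sit in one transaction (or one document) rather than two separate statements:
```js
// Idempotency sketch: a unique index on idempotencyKey makes PSP webhook
// retries safe; the duplicate insert fails instead of double-crediting.
db.wallet_ops.createIndex({ idempotencyKey: 1 }, { unique: true });
try {
  db.wallet_ops.insertOne({
    idempotencyKey: "psp-tx-42",   // hypothetical PSP transaction id
    playerId: "p123",
    deltaCents: -5000,
    createdAt: new Date()
  });
  // Apply the balance change only if the operation was recorded first time.
  db.wallet.updateOne(
    { playerId: "p123" },
    { $inc: { balanceCents: -5000 }, $set: { updatedAt: new Date() } }
  );
} catch (e) {
  // Duplicate key (code 11000) → retry of an already-applied operation; skip.
  if (e.code !== 11000) throw e;
}
```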
Sharding and key selection
MongoDB shards by shard key, and the choice is critical:
- Load distribution: a high-cardinality key with uniform distribution (for example, '(tenantId, playerId)').
- Avoid monotonically increasing keys: 'createdAt' as the only key → one "hot" shard.
- Hashed keys distribute writes more evenly.
- Ranged keys are better for range queries, but watch for hot tails.
- Tag/zone ranges for regulation/localization (EU/LatAm/TR); see the zone sketch after the example.
```js
sh.enableSharding("igaming");
db.bets.createIndex({ tenantId: 1, playerId: 1, _id: "hashed" });
sh.shardCollection("igaming.bets", { tenantId: 1, playerId: 1, _id: "hashed" });
```
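A zone-sharding sketch for the regulation/localization case, with assumed shard and zone names:
```js
// Data-residency sketch: pin the "eu-1" tenant's chunks to an EU shard.
// MinKey/MaxKey span the remaining shard-key fields.
sh.addShardToZone("shardEU1", "EU");
sh.updateZoneKeyRange(
  "igaming.bets",
  { tenantId: "eu-1", playerId: MinKey, _id: MinKey },
  { tenantId: "eu-1", playerId: MaxKey, _id: MaxKey },
  "EU"
);
```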
Antipatterns:
- A low-cardinality shard key ('status') → skewed shards.
- Frequent '$lookup' between sharded collections without co-sharding on the same key.
- A shard key you expect to change (changing it later is difficult and expensive).
Replica sets, reads, and read-after-write policy
A replica set is the foundation for HA and transactions.
Read Preference:
- 'primary' for critical read-after-write paths;
- 'primaryPreferred'/'secondary' for analytics and non-critical reads;
- align Read/Write Concern with your SLOs and latency budget. A read-after-write sketch follows.
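A read-after-write sketch using a causally consistent session (mongosh API; database and collection names are reused from earlier examples for illustration):
```js
const session = db.getMongo().startSession({ causalConsistency: true });
const profiles = session.getDatabase("igaming").player_profiles;

// Write with majority acknowledgement...
profiles.updateOne(
  { playerId: "p123" },
  { $set: { locale: "de" } },
  { writeConcern: { w: "majority" } }
);

// ...then read in the same session: with readConcern "majority" this is
// guaranteed to observe the update above, even if served by a secondary.
const doc = profiles.find({ playerId: "p123" }).readConcern("majority").next();
printjson(doc);

session.endSession();
```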
Change Streams, CDC and Integrations
Change Streams: a subscription to inserts/updates/deletes, convenient for:
- cache-layer synchronization (Redis),
- CRM triggers/notifications,
- loads into OLAP (ClickHouse/Pinot),
- reactive screens.
Outbox pattern: for critical domains, publish events to a separate collection that a connector then reads and relays to the bus (Kafka). This makes integrations more predictable; a watch sketch follows.
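A minimal Change Streams sketch over a hypothetical 'outbox' collection; the relay step stands in for a Kafka connector or worker process:
```js
// Tail inserts into the outbox and forward them downstream.
const stream = db.outbox.watch([
  { $match: { operationType: "insert" } }
]);
while (stream.hasNext()) {
  const event = stream.next();
  // event.fullDocument carries the inserted outbox record; printing here is
  // a placeholder for publishing to the bus.
  print(`relay ${event.fullDocument.eventType} -> bus`);
}
```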
Observability and SLO
SLO: p99 for card reads ≤ 10-20 ms; event inserts ≤ 20-40 ms; latency spread between shards within X%; availability ≥ 99.9%.
Metrics: op latency, queue depth, replication lag per secondary, WiredTiger cache statistics, page faults, lock waits, number of open cursors/connections.
Profiling: 'system.profile', 'explain("executionStats")', collection/index locks.
Alerts: rising WiredTiger cache pressure, slow operations, growth of queries not covered by an index, secondary replication backlog, chunk migrations/balancer activity.
Performance and tuning
WiredTiger cache: by default roughly 50% of RAM; validate against your workload profile.
Compression: snappy/zstd for collections, zstd for logs; a CPU/IO trade-off.
Batch inserts and 'bulkWrite' for telemetry.
Projection ('{ field: 1 }') to avoid dragging "fat" documents over the wire.
Limit/skip: avoid large 'skip' values → use cursor/keyset pagination on ('createdAt', '_id'); see the sketch below.
Capped collections for ring-buffer logs.
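A keyset-pagination sketch matching the compound index from the index section; the 'last' cursor values are illustrative:
```js
// Instead of skip(), carry the last seen (createdAt, _id) pair as a cursor;
// each page then becomes a bounded index range scan.
const last = {
  createdAt: ISODate("2024-05-01T12:00:00Z"),
  _id: ObjectId("663233a0c0ffee0000000001")
};
db.bets.find(
  {
    tenantId: "eu-1",
    playerId: "p123",
    $or: [
      { createdAt: { $lt: last.createdAt } },
      { createdAt: last.createdAt, _id: { $lt: last._id } }  // tie-breaker
    ]
  },
  { amount: 1, gameId: 1, createdAt: 1 }  // project only what the screen needs
).sort({ createdAt: -1, _id: -1 }).limit(50);
```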
Security and compliance
Auth/RBAC: roles per collection/database, least privilege.
TLS in transit; encryption at rest; field-level encryption (FLE) for sensitive fields.
PII policies: masking/pseudonymization, separate collections for sensitive fields.
Multi-tenancy: prefixes/dedicated databases or collections, mandatory 'tenantId' filters, optionally RLS-like layers in the application.
Audit: enable auditing of operations on critical collections.
Backups, PITR and DR
Volume snapshots + oplog backups for Point-in-Time Recovery.
A replica set in another region for DR; regular recovery drills.
Watch oplog growth during insert peaks (PSP webhooks/tournaments).
In sharded clusters: consistent backups that include the config servers.
Integration with the rest of the architecture
CQRS: commands hit SQL (money); events → materialized views in MongoDB.
Event streaming: Kafka/Pulsar as the bus; Mongo as sink/source via connectors and Change Streams.
Redis: alongside as an ultra-low-latency layer (caches/counters).
OLAP: offload to ClickHouse/Pinot for long scans and BI.
Implementation checklist
1. Fix the domains: what goes into Mongo (flexible/high-TPS/projections) and what stays in SQL.
2. Define schema contracts: JSON Schema Validation, 'schemaVersion'.
3. Design indexes for real queries; add TTL for noisy data.
4. Choose a shard key (high cardinality, uniformity); add zone sharding if needed.
5. Set up the replica set and Read/Write Concern for your SLOs; a read-after-write policy.
6. Enable observability and profiling; alerts on indexes/WiredTiger cache/oplog.
7. Organize backups + PITR, a DR cluster, and regular drills.
8. Wire up Change Streams/Outbox to synchronize caches and buses.
9. Limit document size and nesting; implement cursor pagination.
10. Separate PII/tenant policies, encryption, audit.
Anti-patterns
"No scheme" in the product: lack of validation and versions → chaos.
Shard key in time/monotonous - hot shard and unstable p99.
Joynes' $ lookup'on huge sets without indexes/pagination.
Use transactions everywhere - lost productivity.
Lack of TTL/retentions for logs → growth in volume and cost.
Store critical monetary invariants only in Mongo without strict idempotency.
Results
MongoDB is a powerful tool for the flexible iGaming domains: profiles, catalogs, telemetry, projections, and personalization. The keys to success are a schema treated as a contract (with validation), deliberate indexing, a well-chosen shard key, conscious Read/Write Concern, Change Streams for integrations, and strict operational discipline (observability, backups, DR). Combined with an SQL core and a streaming bus, this gives the platform fast interfaces and stability under tournament peaks.