Storage and NAS
Brief Summary
Storage is a combination of media (SSD/NVMe/HDD), networks (Ethernet/IB/FC), protocols (NFS/SMB/iSCSI/S3) and software (ZFS, Ceph, NetApp, TrueNAS, etc.) that together determine performance, reliability, security and cost. The right choice depends on the I/O profile (random/sequential, block/file/object), RPO/RTO targets and latency/throughput SLOs.
Storage taxonomy
DAS (Direct Attached Storage) - disks "near the server." Minimal latency, no network bottlenecks, but harder to share resources.
SAN (Storage Area Network) - block storage via FC/iSCSI/NVMe-oF. High performance, shared LUNs, centralized management.
NAS (Network Attached Storage) - file shares over NFS/SMB. Convenient for shared directories, logs, artifacts, media content.
Object storage - S3-compatible APIs (Ceph RGW/MinIO/clouds). For backups, logs, archives, media, model artifacts.
Hyperconverged solutions (HCI) - combine compute and storage (Ceph, vSAN, StarWind, etc.) for horizontal scaling.
Access protocols
File:
- NFSv3/v4 - Unix/POSIX environments; stateful locking (NFSv4), Kerberos (krb5/krb5i/krb5p).
- SMB 3.x - Windows/AD domains, encryption/signing, multichannel, DFS.
- iSCSI - block LUNs over Ethernet, multipathing (MPIO), convenient for virtualization/databases.
- FC/NVMe-oF - lowest latency; requires dedicated fabrics/adapters.
- S3 API - object versions, lifecycle, WORM/Compliance mode, multipart upload.
Typical mapping:
- DBs/virtual machines → block (iSCSI/NVMe-oF).
- Shared Folders/CI Artifacts → NFS/SMB.
- Logs/backups/media/models → S3-compatible object.
Data and coding: RAID, ZFS, Erasure Coding
RAID
RAID1/10 - low latency and high IOPS for random reads/writes.
RAID5/6 - capacity savings, but a write penalty (parity updates).
ZFS - copy-on-write (CoW), pools and vdevs, ARC/L2ARC caches, ZIL/SLOG for synchronous writes, snapshots/replication and built-in integrity (checksums).
Erasure Coding (EC) in distributed systems (Ceph/MinIO): Reed-Solomon 'k + m' coding - capacity savings over 3x replication with an acceptable write-performance penalty (a worked example follows this list).
- Hot random loads (metadata, small files) → RAID10/ZFS mirrors on NVMe.
- Cold/archive data → EC, large HDDs, aggressive caching.
- For synchronous writes (NFS exports) - a dedicated SLOG on reliable, low-latency NVMe with power-loss protection (PLP).
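A quick worked example of the 'k + m' trade-off (numbers are illustrative): EC 8+3 writes 11 chunks for every 8 data chunks, so the raw/usable overhead is 11/8 ≈ 1.38x while tolerating the loss of any 3 chunks; 3x replication costs 3.0x raw for the same usable capacity and tolerates the loss of only 2 copies.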
Performance: IOPS, throughput, latency
IOPS are important for small random operations (DB/metadata).
Bandwidth - for large files (videos, backups).
Latency p95/p99 - critical for databases, queues, cache APIs.
Queues and concurrency: multithreading on the client, 'rsize/wsize' for NFS, queue depth for iSCSI (see the sketch below).
Network: 25/40/100 GbE (or IB) + RSS/RPS, jumbo MTU inside the data center.
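A minimal client-tuning sketch; the server name, export path and values here are illustrative, and the optimal rsize/wsize and queue depth depend on the workload and array:

# NFS: large rsize/wsize for streaming I/O, hard mount for integrity
mount -t nfs -o vers=4.2,rsize=1048576,wsize=1048576,hard,proto=tcp nas01:/pool/projects /mnt/projects
# iSCSI (open-iscsi): raise the per-session queue depth in /etc/iscsi/iscsid.conf
# node.session.queue_depth = 128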
Caching and tiering
ARC/L2ARC (ZFS) - RAM and NVMe read caches; SLOG - a separate log device for synchronous writes (ZIL).
Write-back/write-through controller caches - use with care, only with battery/supercapacitor protection (BBU/PLP).
Tiering: NVMe (hot) → SSD (warm) → HDD (cold) → object (archive). Migration policies and lifecycle.
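In ZFS terms the hot tiers map onto cache and log vdevs; a minimal sketch, with illustrative device names:

zpool add tank log mirror nvme2n1 nvme3n1   # SLOG: mirrored, PLP-capable NVMe
zpool add tank cache nvme4n1                # L2ARC read cache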
Snapshots, clones, replication, DR
Snapshots (CoW): instant points for rollback/backup; take them "inside" the storage layer, not only in the hypervisor.
Replication: synchronous (RPO ≈ 0, higher latency) or asynchronous (RPO = N minutes).
Clones: economical dev/test environments.
DR schemes: 3-2-1 (three copies, two media types, one off-site), regular DR exercises; defined RPO/RTO objectives.
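A minimal ZFS snapshot-and-replicate sketch (host and dataset names are illustrative); incremental sends keep the async replica's RPO at the snapshot interval:

zfs snapshot tank/projects@2024-06-01
zfs send tank/projects@2024-06-01 | ssh drsite zfs recv backup/projects
# later, incrementally:
zfs snapshot tank/projects@2024-06-02
zfs send -i @2024-06-01 tank/projects@2024-06-02 | ssh drsite zfs recv -F backup/projects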
Safety, compliance and multi-tenancy
Authentication/authorization: LDAP/AD, Kerberos/NTLMv2 for SMB, AUTH_SYS/KRB for NFSv4.
Isolation: VLAN/VRF, export policies, per-tenant quotas/limits.
Encryption: at rest (LUKS/ZFS native/SED) and in flight (NFS-krb5p/SMB encryption/TLS for S3).
WORM/Compliance for legally significant data (S3 Object Lock, SnapLock analogues).
Audit: immutable access logs, integration with SIEM.
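For S3-compatible WORM, Object Lock must be enabled when the bucket is created; a sketch with the AWS CLI (bucket name and retention are illustrative; MinIO exposes the same API):

aws s3api create-bucket --bucket audit-logs --object-lock-enabled-for-bucket
aws s3api put-object-lock-configuration --bucket audit-logs --object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":365}}}'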
Monitoring and SLO
Metrics:
- By pools/volumes: usage, p95/p99 latency, IOPS, throughput, queue depth, cache hit rate, write amplification.
- By network: drops, retransmits, PPS, MTU mismatches.
- By media: media errors, wear-level, temperature, SMART.
- By replication/snapshots: lag/age, task success, duration.
Example SLOs:
- NFS share for CI: p95 latency ≤ 3 ms, availability ≥ 99.95%.
- Database LUN: p99 write ≤ 1.5 ms, synchronous replica within the region; RPO 0, RTO ≤ 5 min.
- Object: p95 PUT ≤ 50 ms, p95 GET ≤ 30 ms, 11 nines durability (via EC/replication).
Alerts: pool utilization > 80/90/95%, cache-hit drop, write-amplification growth, disk degradation, network degradation, replication lag > threshold.
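A minimal cron-able sketch for the utilization thresholds above (pool name and the alert action are illustrative):

#!/bin/sh
# Warn when ZFS pool utilization crosses 80%
CAP=$(zpool list -H -o capacity tank | tr -d '%')
if [ "$CAP" -ge 80 ]; then
    logger -p user.warn "zpool tank at ${CAP}% capacity"
fi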
Backup and Archive
Snapshots + remote replica + separate backup to object/tape.
Retention policies: daily/weekly/monthly.
Immutability: S3 Object Lock (Governance/Compliance), "air gap" (tape/offline accounts).
Recovery tests - run them regularly.
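A sketch of snapshot retention enforcement, assuming a @daily-YYYY-MM-DD naming scheme and GNU date (tools like zfs-auto-snapshot or sanoid do this more robustly):

#!/bin/bash
# Destroy daily snapshots older than 14 days
CUTOFF=$(date -d '14 days ago' +%Y-%m-%d)
zfs list -H -t snapshot -o name tank/projects | grep '@daily-' | while read -r SNAP; do
    DATE=${SNAP#*@daily-}
    [[ "$DATE" < "$CUTOFF" ]] && zfs destroy "$SNAP"   # ISO dates compare lexicographically
done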
Practical templates (minimal)
Exporting NFS (example)
/pool/projects 10.0.0.0/16(rw,async,no_root_squash,sec=krb5p)
SMB share (smb.conf fragment)
[media]
path = /pool/media
read only = no
vfs objects = acl_xattr recycle
ea support = yes
kernel oplocks = no
smb encrypt = required
ZFS: creating pool and dataset
zpool create tank mirror nvme0n1 nvme1n1
zfs set atime=off compression=lz4 tank
zfs create tank/projects
zfs set recordsize=1M tank/projects   # large files
zfs create tank/db
zfs set recordsize=16K tank/db        # DB/small I/O
iSCSI (ideas)
Enable ALUA/MPIO, correct timeouts, and queue depth on clients.
Separate iSCSI traffic from client traffic; use jumbo MTU inside the storage fabric.
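A minimal open-iscsi + multipath sketch (portal IPs and the IQN are illustrative):

iscsiadm -m discovery -t sendtargets -p 10.0.1.10
iscsiadm -m node -T iqn.2024-01.example:lun1 -p 10.0.1.10 --login
iscsiadm -m node -T iqn.2024-01.example:lun1 -p 10.0.2.10 --login   # second path
multipath -ll   # verify both paths are grouped under one device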
Capacity and performance planning
Working set and growth rate.
Keep a 30-50% margin in IOPS and throughput for peaks and rebalancing.
Consider write amplification (RAID/EC/CoW) and metadata.
For object storage - the cost of requests and egress traffic, storage classes (standard/IA/glacier-like).
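A quick sizing sketch with illustrative numbers: a 40 TB working set growing 3% per month reaches 40 × 1.03^12 ≈ 57 TB in a year; with EC 8+3 overhead (11/8) that is ≈ 78 TB raw, before the 30-50% performance headroom.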
Operations and Updates
Rolling updates of controllers/OS/firmware.
Scrub/resilver windows and priorities.
Rebuild throttling: limit recovery I/O so as not to "kill" production.
Runbooks on degradation/loss of nodes/networks.
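A minimal sketch for scheduled scrubs (pool name and window are illustrative; throttle resilver during peak hours via your platform's tunables):

# /etc/cron.d/zfs-scrub: scrub on the first Sunday of each month at 03:00
0 3 1-7 * * root [ "$(date +\%u)" = 7 ] && /usr/sbin/zpool scrub tank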
Implementation checklist
- Access profile selected: file/block/object plus RPO/RTO/SLO targets.
- Coding scheme (RAID/ZFS/EC) matched to the load and the budget.
- Networks: dedicated VLANs/VRFs, intra-fabric MTU 9000, MPIO/ALUA.
- Cache/tiering: ARC/L2ARC/SLOG or similar mechanisms.
- Snapshots/replication/backup: schedules, immutability, DR exercises.
- Monitoring: pools/media/network/replication metrics, alerts.
- Access/security: ACL, Kerberos/AD, encryption, auditing.
- Quotas/limits for tenants and documented SLAs/SLOs.
- Documentation and runbooks, test recovery.
Common errors
Pool utilization > 80% in ZFS/EC systems → a sharp increase in latency.
One controller/one network without MPIO/protection.
Hot and cold working sets are mixed in one class of carriers.
No SLOG for NFS sync loads → unpredictable latency.
Backups only "inside" the same array/account → loss in case of an accident/compromise.
Lack of regular scrub and SMART monitoring.
Ignoring small-I/O patterns: e.g., a large 'recordsize' for DB datasets.
iGaming/fintech specific
Transactional databases and wallets: dedicated NVMe pools, RAID10/ZFS mirrors, synchronous replica to zone B, independent failure domains.
Logs/raw events and anti-fraud features: object storage + lifecycle + cheap classes; indexes/data marts on SSD.
Content and media (providers): NAS + CDN, aggressive cache, deduplication.
Reporting and PII: WORM/immutability, encryption, access auditing, geo-localization of data.
Peak events: warm caches in advance, set I/O limits, watch p99 latency on pools.
Total
Reliable storage is the correct class partitioning (file/block/object), an adequate coding scheme (RAID/ZFS/EC), a fast network, caching/tiering, snapshots + replication + backup, strict SLOs and automated operations. By following these principles you get predictable performance, high resilience and transparent storage economics, with security and regulatory requirements covered.