Time synchronization
Why do you need it?
A single and accurate time is the basis for organizing events, correctly correlating logs/trails, signing transactions and reproducibility of reporting. For platforms with cash flows, it is a matter of compliance and trust: "who was first," "when the result was recorded," "which seed was used."
Basic concepts
UTC vs TAI: UTC contains leap second inserts; TAI - without them. MOST systems operate at UTC.
Leap second: second insert/delete. Support/mitigation (smear) is critical for seamless operation.
Stratum (NTP): level of distance from the standard (0 - atom/GNSS, 1 - servers, 2 + - clients).
PTP роли: Grandmaster (GM) → Boundary Clock (BC) / Transparent Clock (TC) → Slave.
PPS: pulse-per-second for precise alignment from GNSS/generator.
Servo: algorithm that corrects the frequency/phase of local clocks (chrony/ptp4l/phc2sys).
When NTP when PTP
NTP (Chrony): millisecond/hundredth millisecond accuracy; WAN/Internet; simple and reliable.
PTP (IEEE 1588): sub-millisecond and up to microseconds with hardware mark; requires network discipline (L2/multicast/QoS).
Hybrid: NTP/Chrony feeds reference to PTP-GM; further into the data center - PTP with HW-timestamp.
Time sources and resilience
GNSS (GPS/GLONASS/Galileo/BeiDou) + PPS as primary reference.
OCXO/TCXO (generators) for holdover when satellites are lost.
Backup references: two independent GNSS receivers, different antennas/cables, jamming barriers.
Secondary NTP pools: external trusted providers and private servers (via VPN).
Grandmaster x2 with BMC (Best Master Clock) and manual failover plan.
PTP network architecture
Profiles: Default, Telecom (G.8275. x), Power. For data centers, Default or vendor profiles are more common.
Transparent Clock (TC) - the switch adds a correction field - improves accuracy.
Boundary Clock (BC): switch/router - client to the highest and master to the lower segment.
QoS: PTP multicast/unicast prioritization, queue minimization.
Isolation: dedicated VLAN/VRF for time; no L3-NAT on the PTP path.
Security: NTS for NTP, PTP protection
NTP: use NTS (Network Time Security, RFC 8915) - TLS authentication of time servers. Symmetric keys (classic auth) are allowed inside the perimeter. Autokey is obsolete.
PTP: native MAC/authentication is hardly used; compensate with network isolation, ACL, MACsec/IPsec on the L2/L3.
GNSS: jamming/spoofing protection - signal quality monitor, DOP surveillance, geo-filters, anomaly detection.
Leap second treatment and lubrication
Leap-announce: NTP/Chrony announces the upcoming insert of the second.
Smear: day stretch on ± 0. 5 s (or other window), avoiding the step. Google-like smear is convenient for abandoning the "jump," but all services must follow a single policy (or isolate contours).
SLO for time (examples)
Offset p95 client ↔ reference ≤ 1. 0 ms (data center NTP circuit), p99 ≤ 5 ms.
PTP with HW-timestamp: offset p95 ≤ 20 μ s, p99 ≤ 100 μ s inside the domain.
Jitter (stddev) ≤ 0. 2 ms (NTP) / ≤ 5 μs (PTP-HW).
Clock step events = 0; only slew (smooth correction) in the production class.
Drift at holdover OCXO: ≤ 1 ppm (control and alert).
Engineering Practices (NTP/Chrony)
Why Chrony: converges better on a "noisy" network, resistant to packet loss/asymmetry, flexible NTS.
Minimal'chrony. conf '(server):conf
Sources (top-level servers)
server ntp1. example iburst nts server ntp2. example iburst nts
Local GNSS with PPS (if any)
refclock SHM 0 poll 4 refid GNSS refclock PPS /dev/pps0 poll 4 refid PPS lock GNSS
Access restrictions allow 10. 0. 0. 0/8 deny all
makestep adjustment policy 0. 1 3 rtcsync log tracking measurements statistics
Verification and monitoring:
bash chronyc tracking chronyc sources -v chronyc sourcestats -v
Clients: specify at least two servers; include 'makestep' for an early start and 'maxslewrate' as needed.
Engineering Practices (PTP/linuxptp)
Hardware timestamp (HW-TS): Requires NIC/drivers with PHC (PHC = PTP Hardware Clock).
Check:bash ethtool -T eth0 grep timestamp phc2sys -l
ptp4l (slave/GM/BC) - an example of a config:
conf
[global]
twoStepFlag 1 time_stamping hardware tx_timestamp_timeout 30 logging_level 6 clock_class 248 clock_accuracy 0x20 priority1 128 priority2 128 delay_mechanism E2E network_transport L2 dsptp_domain 0
[eth0]
delay_filter moving_average delay_filter_length 10 announceReceiptTimeout 3 syncReceiptTimeout 3
PHC bundle → system clock:
bash
PHC NIC -> system clock (slew)
phc2sys -s /dev/ptp0 -c CLOCK_REALTIME -O 0 -E ntpshm -w
For Boundary/Transparent clocks: use firmware/images of BC/TC-enabled switches and enable their profiles; monitor correction field in pmc:
bash pmc -u -b 0 "GET TIME_STATUS_NP"
Kubernetes, Virtualization and Containers
Nodes are K8s synchronized like regular hosts. Containers use host time.
For PTP: PTP Operator/DaemonSet (for example, 'linuxptp-daemonset') on dedicated nodes with HW-TS; 'NodeFeatureDiscovery' for marking NIC with PHC.
Workload isolation with time sensitivity (RNG/game events): tains/tolerations → nodes with better synchronization.
In virtualization, disable the aggressive "virtual" drift proofreaders of the hypervisor, use one discipline of time (either guest NTP/PTP or from the hypervisor).
Network and QoS
Separate time-VLAN/VRF, keep delays and jitter minimal.
For PTP E2E - avoid pathway asymmetries; for P2P - use link-local delay.
Enable jumbo MTU end-to-end only if agreed everywhere; otherwise, a standard MTU, but a stable queue.
Route NTP over UDP/123, allow NTS-TLS ports; for PTP, the correct multicast ACLs (224. 0. 1. 129/130).
Monitoring and alerts
What to measure:- Offset, jitter, frequency drift, corrections/sec
- Для PTP: `offsetFromMaster`, `meanPathDelay`, `grandmasterIdentity`, `stepsRemoved`.
- For GNSS: SNR, DOP, visible satellites, PPS jitter.
- 'chrony'export to Prometheus (chrony-exporter), text logs → Loki.
- 'linuxptp'statistics (' ptp4l -m '), metrics via node-exporter textfile.
- Network counters: drops/retransmit/queue-len on time-VLAN.
- NTP offset p95> 1 ms for 5 min.
- PTP offsetFromMaster > 25 μs (p95) 5 мин.
- Loss of GNSS/PPS> 1 min (switch to holdover).
- Grandmaster change (BMC) outside the planned window.
- RTC ↔ system clock> boot threshold difference.
Operations and Updates
Start/Stop - first restore the network/GNSS/PPS → GM → BC/TC → clients.
Leap-second: announce in advance, check smear policy and compatibility.
Updates: firmware NIC/switches, 'linuxptp/chrony' - staged with offset control.
Runbooks: loss of GNSS, GM replacement, PTP domain relocation, cluster misalignment, VLAN crashes.
Implementation checklist
- SLOs (offset/jitter) for services and logs are defined.
- Two Independent Time Sources (GNSS + NTP), two GM, IUD/Manual Feilover Plan.
- Dedicated time-VLAN/VRF, QoS, ACL/MACsec; BC/TC PTPs are enabled.
- Everywhere a single leap policy (smear/step is prohibited in the sale).
- Chrony с NTS; ptp4l/phc2sys - on nodes with PHC, settings servv.
- Monitoring of offset/jitter/GM/GNSS losses, alerts and dashboards.
- Runbooks: loss of GNSS, GM failover, leap-second, drift-hunt.
- Audit documentation - sources, configs, SLO reports, GM shift log.
Common errors
One unprotected time server; mixing public pools and private pools without control.
PTP via "noisy" L3 routes/asymmetry, no BC/TC.
No NTS/Isolation - NTP spoofing/PTP spoofing capability.
Different leap policies in subsystems → a "crack" in time between services.
Ignore monitoring drift/holdover, sudden step corrections.
Dual discipline virtual machines (host + guest) → discrepancies.
iGaming/fintech specific
Legally significant time stamps: store offsets and synchronization statuses in transaction/event logs (to prove validity).
Event order: The cross-service correlator uses monotonic logical clocks + UTC labels, not just "walls."
Tournaments/matches: fix start/stop via single source of time (PTP-domain/NTP-server), TTL-cache on the fronts, offset check before the "whistle."
RNG/seed initialization: initialize from crypto sources, and use time only as a component, checking offset within SLO.
Reporting/regulators: periodic time SLO reports and GM/source shift log.
Mini playbooks
1) Fast cluster time audit
1. 'chronyc tracking' on each node → collect offset/jitter.
2. 'ptp4l -m '/' pmc' on PTP nodes → check GM, delay, stepsRemoved.
3. Verify leap policy, make sure of uniformity.
2) Loss of GNSS
1. Go to holdover (OCXO) alert.
2. Connect an external NTP over VPN as a temporary reference.
3. Check antenna/cable/receiver; replacement plan.
3) Grandmaster change
1. Check priority BMC; manually raising the second GM.
2. Offset control at aircraft/clients; if necessary, restart phc2sys.
3. Time series offset incident report.
Result
A reliable time loop is a stable reference (GNSS + PPS + OCXO), a correct PTP network architecture (BC/TC/QoS/isolation), secure NTP with NTS, consistent leap policy, slew correction discipline, and SLO observability (offset/jitter/holdover). Record everything in runbooks, regularly check offsets and learn from exercises - and your time will remain accurate even when everything else "trembles."