Time synchronization

Why do you need it?

A single and accurate time is the basis for organizing events, correctly correlating logs/trails, signing transactions and reproducibility of reporting. For platforms with cash flows, it is a matter of compliance and trust: "who was first," "when the result was recorded," "which seed was used."

Basic concepts

UTC vs TAI: UTC contains leap second inserts; TAI - without them. MOST systems operate at UTC.
Leap second: second insert/delete. Support/mitigation (smear) is critical for seamless operation.
Stratum (NTP): level of distance from the standard (0 - atom/GNSS, 1 - servers, 2 + - clients).
PTP роли: Grandmaster (GM) → Boundary Clock (BC) / Transparent Clock (TC) → Slave.
PPS: pulse-per-second for precise alignment from GNSS/generator.
Servo: algorithm that corrects the frequency/phase of local clocks (chrony/ptp4l/phc2sys).

When NTP when PTP

NTP (Chrony): millisecond/hundredth millisecond accuracy; WAN/Internet; simple and reliable.
PTP (IEEE 1588): sub-millisecond and up to microseconds with hardware mark; requires network discipline (L2/multicast/QoS).
Hybrid: NTP/Chrony feeds reference to PTP-GM; further into the data center - PTP with HW-timestamp.

Time sources and resilience

GNSS (GPS/GLONASS/Galileo/BeiDou) + PPS as primary reference.
OCXO/TCXO (generators) for holdover when satellites are lost.
Backup references: two independent GNSS receivers, different antennas/cables, jamming barriers.
Secondary NTP pools: external trusted providers and private servers (via VPN).
Grandmaster x2 with BMC (Best Master Clock) and manual failover plan.

PTP network architecture

Profiles: Default, Telecom (G.8275. x), Power. For data centers, Default or vendor profiles are more common.
Transparent Clock (TC) - the switch adds a correction field - improves accuracy.
Boundary Clock (BC): switch/router - client to the highest and master to the lower segment.
QoS: PTP multicast/unicast prioritization, queue minimization.
Isolation: dedicated VLAN/VRF for time; no L3-NAT on the PTP path.

Security: NTS for NTP, PTP protection

NTP: use NTS (Network Time Security, RFC 8915) - TLS authentication of time servers. Symmetric keys (classic auth) are allowed inside the perimeter. Autokey is obsolete.
PTP: native MAC/authentication is hardly used; compensate with network isolation, ACL, MACsec/IPsec on the L2/L3.
GNSS: jamming/spoofing protection - signal quality monitor, DOP surveillance, geo-filters, anomaly detection.

Leap second treatment and lubrication

Leap-announce: NTP/Chrony announces the upcoming insert of the second.
Smear: day stretch on ± 0. 5 s (or other window), avoiding the step. Google-like smear is convenient for abandoning the "jump," but all services must follow a single policy (or isolate contours).

SLO for time (examples)

Offset p95 client ↔ reference ≤ 1. 0 ms (data center NTP circuit), p99 ≤ 5 ms.
PTP with HW-timestamp: offset p95 ≤ 20 μ s, p99 ≤ 100 μ s inside the domain.
Jitter (stddev) ≤ 0. 2 ms (NTP) / ≤ 5 μs (PTP-HW).
Clock step events = 0; only slew (smooth correction) in the production class.
Drift at holdover OCXO: ≤ 1 ppm (control and alert).

Engineering Practices (NTP/Chrony)

Why Chrony: converges better on a "noisy" network, resistant to packet loss/asymmetry, flexible NTS.

Minimal'chrony. conf '(server):

conf
Sources (top-level servers)
server ntp1. example iburst nts server ntp2. example iburst nts
Local GNSS with PPS (if any)
refclock SHM 0 poll 4 refid GNSS refclock PPS /dev/pps0 poll 4 refid PPS lock GNSS
Access restrictions allow 10. 0. 0. 0/8 deny all
makestep adjustment policy 0. 1 3 rtcsync log tracking measurements statistics

Verification and monitoring:

bash chronyc tracking chronyc sources -v chronyc sourcestats -v

Clients: specify at least two servers; include 'makestep' for an early start and 'maxslewrate' as needed.

Engineering Practices (PTP/linuxptp)

Hardware timestamp (HW-TS): Requires NIC/drivers with PHC (PHC = PTP Hardware Clock).

Check:

bash ethtool -T eth0      grep timestamp phc2sys -l

ptp4l (slave/GM/BC) - an example of a config:

conf
[global]
twoStepFlag      1 time_stamping     hardware tx_timestamp_timeout 30 logging_level     6 clock_class      248 clock_accuracy    0x20 priority1       128 priority2       128 delay_mechanism    E2E network_transport   L2 dsptp_domain     0

[eth0]
delay_filter     moving_average delay_filter_length  10 announceReceiptTimeout 3 syncReceiptTimeout   3

PHC bundle → system clock:

bash
PHC NIC -> system clock (slew)
phc2sys -s /dev/ptp0 -c CLOCK_REALTIME -O 0 -E ntpshm -w

For Boundary/Transparent clocks: use firmware/images of BC/TC-enabled switches and enable their profiles; monitor correction field in pmc:

bash pmc -u -b 0 "GET TIME_STATUS_NP"

Kubernetes, Virtualization and Containers

Nodes are K8s synchronized like regular hosts. Containers use host time.
For PTP: PTP Operator/DaemonSet (for example, 'linuxptp-daemonset') on dedicated nodes with HW-TS; 'NodeFeatureDiscovery' for marking NIC with PHC.
Workload isolation with time sensitivity (RNG/game events): tains/tolerations → nodes with better synchronization.
In virtualization, disable the aggressive "virtual" drift proofreaders of the hypervisor, use one discipline of time (either guest NTP/PTP or from the hypervisor).

Network and QoS

Separate time-VLAN/VRF, keep delays and jitter minimal.
For PTP E2E - avoid pathway asymmetries; for P2P - use link-local delay.
Enable jumbo MTU end-to-end only if agreed everywhere; otherwise, a standard MTU, but a stable queue.
Route NTP over UDP/123, allow NTS-TLS ports; for PTP, the correct multicast ACLs (224. 0. 1. 129/130).

Monitoring and alerts

What to measure:

Offset, jitter, frequency drift, corrections/sec
Для PTP: `offsetFromMaster`, `meanPathDelay`, `grandmasterIdentity`, `stepsRemoved`.
For GNSS: SNR, DOP, visible satellites, PPS jitter.

Toolbox:

'chrony'export to Prometheus (chrony-exporter), text logs → Loki.
'linuxptp'statistics (' ptp4l -m '), metrics via node-exporter textfile.
Network counters: drops/retransmit/queue-len on time-VLAN.

Alerts (ideas):

NTP offset p95> 1 ms for 5 min.
PTP offsetFromMaster > 25 μs (p95) 5 мин.
Loss of GNSS/PPS> 1 min (switch to holdover).
Grandmaster change (BMC) outside the planned window.
RTC ↔ system clock> boot threshold difference.

Operations and Updates

Start/Stop - first restore the network/GNSS/PPS → GM → BC/TC → clients.
Leap-second: announce in advance, check smear policy and compatibility.
Updates: firmware NIC/switches, 'linuxptp/chrony' - staged with offset control.
Runbooks: loss of GNSS, GM replacement, PTP domain relocation, cluster misalignment, VLAN crashes.

Implementation checklist

SLOs (offset/jitter) for services and logs are defined.
Two Independent Time Sources (GNSS + NTP), two GM, IUD/Manual Feilover Plan.
Dedicated time-VLAN/VRF, QoS, ACL/MACsec; BC/TC PTPs are enabled.
Everywhere a single leap policy (smear/step is prohibited in the sale).
Chrony с NTS; ptp4l/phc2sys - on nodes with PHC, settings servv.
Monitoring of offset/jitter/GM/GNSS losses, alerts and dashboards.
Runbooks: loss of GNSS, GM failover, leap-second, drift-hunt.
Audit documentation - sources, configs, SLO reports, GM shift log.

Common errors

One unprotected time server; mixing public pools and private pools without control.
PTP via "noisy" L3 routes/asymmetry, no BC/TC.
No NTS/Isolation - NTP spoofing/PTP spoofing capability.
Different leap policies in subsystems → a "crack" in time between services.
Ignore monitoring drift/holdover, sudden step corrections.
Dual discipline virtual machines (host + guest) → discrepancies.

iGaming/fintech specific

Legally significant time stamps: store offsets and synchronization statuses in transaction/event logs (to prove validity).

Event order: The cross-service correlator uses monotonic logical clocks + UTC labels, not just "walls."

Tournaments/matches: fix start/stop via single source of time (PTP-domain/NTP-server), TTL-cache on the fronts, offset check before the "whistle."

RNG/seed initialization: initialize from crypto sources, and use time only as a component, checking offset within SLO.
Reporting/regulators: periodic time SLO reports and GM/source shift log.

Mini playbooks

1) Fast cluster time audit

1. 'chronyc tracking' on each node → collect offset/jitter.
2. 'ptp4l -m '/' pmc' on PTP nodes → check GM, delay, stepsRemoved.
3. Verify leap policy, make sure of uniformity.

2) Loss of GNSS

1. Go to holdover (OCXO) alert.
2. Connect an external NTP over VPN as a temporary reference.
3. Check antenna/cable/receiver; replacement plan.

3) Grandmaster change

1. Check priority BMC; manually raising the second GM.
2. Offset control at aircraft/clients; if necessary, restart phc2sys.
3. Time series offset incident report.

Result

A reliable time loop is a stable reference (GNSS + PPS + OCXO), a correct PTP network architecture (BC/TC/QoS/isolation), secure NTP with NTS, consistent leap policy, slew correction discipline, and SLO observability (offset/jitter/holdover). Record everything in runbooks, regularly check offsets and learn from exercises - and your time will remain accurate even when everything else "trembles."