Channel: Weberblog.net

Azure PTP Accuracy


The Network Time Protocol (NTP) is widely used to synchronize computer clocks. The Precision Time Protocol (PTP) can also serve as a time source and is expected to be accurate to within microseconds. On Microsoft Azure VMs, however, PTP-derived time-of-day errors can exceed 50,000 microseconds, which may be inadequate for some applications. Let’s go into some details:

(To improve readability and reduce distractions, this post shows generalisations, simplifications and omissions.)

Problem Description

PTP is supported on Azure Linux VMs, potentially providing time-of-day accuracy within a few microseconds. Before continuing, I should first describe how many modern computers maintain time. It is not practical to read the PTP clock or to query an NTP reference whenever a program needs the time of day. Instead, a local oscillator is regularly calibrated against the reference time source. Applications can query the local oscillator efficiently using the clock_gettime(CLOCK_REALTIME) API call. A PTP query may be fifty times slower.
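To get a feel for how cheap the local clock query is, here is a minimal sketch that times repeated clock_gettime(CLOCK_REALTIME) calls from Python. The function name mean_call_cost_ns is my own, and the result includes Python interpreter overhead, so it is only an upper bound on the cost of the underlying call.

```python
import time


def mean_call_cost_ns(calls: int = 100_000) -> float:
    """Average cost of one clock_gettime(CLOCK_REALTIME) call, in nanoseconds.

    The figure includes Python loop and call overhead, so treat it as an
    upper bound rather than the cost of the underlying call itself.
    """
    start = time.perf_counter_ns()
    for _ in range(calls):
        time.clock_gettime_ns(time.CLOCK_REALTIME)
    elapsed = time.perf_counter_ns() - start
    return elapsed / calls
```

Calling mean_call_cost_ns() on a Linux host prints nothing by itself; wrap it in print() to see the per-call figure for your machine.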

x86-based computers typically use the Time Stamp Counter (TSC), which increments at a roughly fixed rate, as the basis for the system clock. The boot process determines an approximate clock frequency, and then time-keeping software such as ntpd, ntpsec or chrony is used to 1) track the changing clock frequency and 2) keep the time aligned with external sources such as NTP or PTP. The clock_gettime(CLOCK_REALTIME) system call uses the TSC value plus the current computed frequency and offset to return the time of day in Coordinated Universal Time (UTC). clock_gettime(CLOCK_REALTIME) is computationally cheaper than using NTP or PTP.
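The conversion from a TSC reading to a time of day can be sketched roughly as follows. This is a simplified illustration, not the kernel's actual implementation; the ClockState fields and the tsc_to_utc_ns function are hypothetical names, and the frequency correction is assumed to express how far the real TSC rate deviates from the boot-time estimate.

```python
from dataclasses import dataclass


@dataclass
class ClockState:
    base_tsc: int          # TSC reading at the last clock update
    base_utc_ns: int       # UTC time (ns since the epoch) at that reading
    nominal_hz: float      # TSC frequency estimated at boot
    freq_corr_ppm: float   # deviation from nominal, tracked by the time daemon


def tsc_to_utc_ns(state: ClockState, tsc: int) -> int:
    """Convert a raw TSC reading to UTC nanoseconds (simplified model)."""
    ticks = tsc - state.base_tsc
    # Apply the frequency correction to the boot-time estimate.
    effective_hz = state.nominal_hz * (1 + state.freq_corr_ppm * 1e-6)
    return state.base_utc_ns + round(ticks * 1e9 / effective_hz)
```

In this model the time daemon only has to update the four state values occasionally; every time-of-day query in between is a subtraction, a multiplication and a division.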

I recently began using commercial Microsoft Azure virtual machines for my studies. A test comparing NTP synchronization versus PTP synchronization was performed on 2025-03-30 using an Azure VM located in the Microsoft West US 2 region. Initially (side A below), the VM was synchronized via NTP to four Google NTP stratum 1 servers located nearby (RTT = 5 msec). At 16:43, the synchronization source was changed (side B below) to the Azure PTP device. The difference is striking:

Graph 1 shows the offset (difference in system clocks) between the Azure VM and a NIST NTP server located in Fort Collins, Colorado. The VM closely tracks NIST time when NTP synchronization is used (1A). When the VM uses PTP synchronization (1B) the offset from NIST is 20,000 to 35,000 microseconds. The offset and offset variance are unexpected from a PTP device. The periodic structure is unexplained.

Graph 2 shows the chrony-reported frequency adjustment. In 2A, chrony readily compensates for TSC frequency variance with an under-one-PPM range. 2B, on the other hand, shows an over 200 PPM range for the frequency variance. The same periodic structure mentioned above is also seen in 2B.
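For scale, one PPM of frequency error corresponds to one microsecond of drift per second. The sketch below does that arithmetic; drift_us is a hypothetical helper, and the inputs in the usage note are illustrative values, not measurements from the graphs.

```python
def drift_us(freq_error_ppm: float, interval_s: float) -> float:
    """Time error in microseconds accumulated at a constant frequency error.

    1 PPM is one part per million, i.e. 1 microsecond of drift per second.
    """
    return freq_error_ppm * interval_s
```

For example, drift_us(1, 900) gives 900 microseconds over fifteen minutes at 1 PPM, while drift_us(200, 5) gives 1000 microseconds in only five seconds at 200 PPM.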

Graph 3 shows the PTP offset. In 3A the NTP-synchronized VM can still query the PTP clock. The offset shows the same structure seen in 1B; this is the PTP error. The PTP offset is naturally smaller when it is used to synchronize the local clock.

The above graphs show that the Azure PTP is not functioning as expected. For comparison, the PTP clocks available on AWS EC2 are stable within a few microseconds. The five-second and fifteen-minute patterns persist even if chrony is not running.

A larger data set is used in the three plots below. The VM is synchronized to Google NTP servers. Taking a one-month view (top), we see a roughly one-week structure in the offset. Using the same data, we zoom in (middle) and see a fifteen-minute structure. The final graph (bottom) shows a five-second structure in the PTP offset.

I created VMs in six Azure regions for PTP testing: West US 2, East US, Central US, North Central US, South Central US and Germany West Central. Similar patterns were seen in all regions. Most testing was done with small VMs (1 CPU, 1 GB).

Does it matter?

Ubuntu Linux was used for most of my testing. I installed a Red Hat Linux distribution on another VM at the Microsoft West US 2 location and saw similar timing offset patterns. The frequency of the five-second and fifteen-minute offsets was the same in both operating systems. The five-second offsets were out of phase by about 500 milliseconds. Whether the PTP problem affects non-Linux operating systems is unknown.

Exploring the hypervisor requires privileges beyond those available to VM administrators. PTP typically involves system software (e.g., chrony, ntpsec, ntpd) communicating with an Ethernet device supporting hardware time-stamping. The Azure PTP feature allows the VM to query the hypervisor clock. Poor hypervisor time synchronization could be an underlying problem.
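On Linux, a PTP hardware clock appears as a character device that can be read with clock_gettime() through a dynamic clock ID derived from an open file descriptor. The sketch below shows one way to compare such a clock with the system clock. The device path /dev/ptp_hyperv is an assumption (adjust it to whatever your VM exposes, for example /dev/ptp0), the function names are my own, and reading the device usually requires elevated privileges.

```python
import os
import time

CLOCKFD = 3  # Linux convention for turning a file descriptor into a clock ID


def fd_to_clockid(fd: int) -> int:
    """Mirror of the Linux FD_TO_CLOCKID macro: ((~fd) << 3) | CLOCKFD."""
    return (~fd << 3) | CLOCKFD


def phc_minus_system_ns(device: str = "/dev/ptp_hyperv"):
    """Return (PTP clock - system clock) in ns, or None if the device is absent.

    The device path is an assumption; it is not taken from the measurements
    described in this post.
    """
    if not os.path.exists(device):
        return None
    fd = os.open(device, os.O_RDONLY)
    try:
        clk = fd_to_clockid(fd)
        # Bracket the PTP read with two system clock reads and use the midpoint.
        before = time.clock_gettime_ns(time.CLOCK_REALTIME)
        phc = time.clock_gettime_ns(clk)
        after = time.clock_gettime_ns(time.CLOCK_REALTIME)
    finally:
        os.close(fd)
    return phc - (before + after) // 2
```

Bracketing the PTP read between two system clock reads reduces, but does not remove, the error introduced by the time the PTP read itself takes.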

These Azure VM inaccuracies aren’t subtle. Paul Gear reports “suspiciously regular spikes at 15-minute intervals.” This may be related to graph 3B above.

Microsoft lists timing requirements, including “Government Regulations like: 50 msec accuracy for FINRA in the US, 1 msec ESMA (MiFID II) in the EU.” In my tests, PTP sometimes failed the 50-msec limit. The 1-msec limit was seldom met.
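The two limits quoted above are easy to check against a measured offset. This is a minimal sketch; the dictionary labels and the limits_met function are my own names, and only the 50 msec and 1 msec thresholds come from the quoted text.

```python
# Accuracy limits from the quoted requirements, in microseconds.
LIMITS_US = {
    "FINRA (50 msec)": 50_000,
    "ESMA MiFID II (1 msec)": 1_000,
}


def limits_met(offset_us: float) -> dict:
    """Report which limits a clock offset (in microseconds) satisfies."""
    return {name: abs(offset_us) <= limit for name, limit in LIMITS_US.items()}
```

An offset of 20,000 microseconds passes the FINRA limit but fails the ESMA one; an offset of 70,000 microseconds fails both.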

My testing typically used NTP-synchronized clients, limiting the absolute measurement accuracy to a couple of milliseconds. I’ve also used some GPS-synchronized clients, but accuracy is limited by network delay to a few milliseconds. The typical maximum VM PTP time error was 5-20 msec, though one VM regularly exceeded a 70 msec error. Better testing could be done by utilizing on-site external time references. Traceability to national standards could be realized by the use of NIST TMAS.
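The reason network delay caps the measurement accuracy follows from the standard NTP offset and delay calculation, sketched below. The four timestamps are the client send time (t1), server receive time (t2), server send time (t3) and client receive time (t4); the function name and the sample values are illustrative only.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation from four timestamps (same unit).

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2   # server clock minus client clock
    delay = (t4 - t1) - (t3 - t2)          # round-trip network delay
    return offset, delay


# Illustrative timestamps in microseconds: 5 msec round trip, 10 msec offset.
offset_us, delay_us = ntp_offset_delay(0, 12_500, 12_600, 5_100)
max_error_us = delay_us / 2                # worst case if the path is fully asymmetric
```

With these sample values the computed offset is 10,000 microseconds and the round-trip delay is 5,000 microseconds, so path asymmetry alone could hide up to 2,500 microseconds of error.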

Next Steps

These observations were reported to Microsoft in late February 2025. As far as I can tell, the Azure VMs use a PTP-like mechanism to query the hypervisor clock; there is no indication that IEEE 1588 is being used. If this is correct, the inaccuracies reported here are likely due to poor synchronization of the Azure hypervisor. Microsoft stated that there is no SLA for time synchronization. At a minimum, the related Azure documentation should indicate that NTP is often superior to the Microsoft Azure “PTP” service.

I have no access to the Azure hypervisors and cannot explore them further. 

Footnote: Relationship to time.windows.com

The Microsoft Windows time service is provided by several sites; in the United States, this is Des Moines, Iowa. Multiple hosts/VMs at this site are involved with this service. Monitoring from several remote NTP test clients shows some similarity in offset patterns.

Photo by Pawel Czerwinski on Unsplash.

