The Wonders Of IP Clocking

How The Mechanics Of Audio-Over-IP Allay An Audio Engineer’s Worst Fear: Digital Glitches. 

All digital audio systems require a clock to synchronise connected devices to a common, continuous, time-aligned sampling rate. But networked audio relies on entirely different connection and communication infrastructures, so we have to embrace new ways of distributing sync in the quest for glitch-free audio.

In conventional digital audio systems, the de facto method of syncing digital gear is to use a distributed word clock. Focusrite's Will Hoult explains further. “The simplest implementation of a distributed word clock is to take a BNC cable and connect it from the word clock output of one device to the word clock input of another. Digital audio would be running via a different cable, such as ADAT, AES3 or S/PDIF, with the clock cable used simply as a means of driving sync. The signal transmitted down the BNC cable is a square wave with one leading edge (pulse) for every single sample that is taken. If your sample rate is 96kHz, that's 96,000 pulses per second. The clock signal tells the receiving device when each sample starts, essentially by sending a constant stream of 'sample start' pulses down the clock cable."
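To put numbers on that description, here is a minimal Python sketch, assuming an ideal, jitter-free square wave: it lists the time of each rising edge (one per sample) and shows how closely spaced they are at 96kHz. The function name `word_clock_edges` is purely illustrative.

```python
def word_clock_edges(sample_rate_hz: int, duration_s: float) -> list[float]:
    """Return the times (in seconds) of each rising edge of an ideal word clock.

    One rising edge marks the start of each sample, so the spacing between
    consecutive edges is exactly one sample period (1 / sample_rate_hz).
    """
    period_s = 1.0 / sample_rate_hz
    n_edges = round(sample_rate_hz * duration_s)
    return [i * period_s for i in range(n_edges)]


edges = word_clock_edges(96_000, 1.0)
print(len(edges))                                # 96000 pulses in one second
print(f"{(edges[1] - edges[0]) * 1e6:.2f} us")   # 10.42 us between pulses
```

Every one of those edges has to arrive cleanly and on time, which is why sending them continuously over a shared network is so wasteful.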

When you add a device to the chain by connecting an additional BNC cable, the clock signal gets passed to the next device and sync is maintained. This method is fine for systems where all the synced devices are confined to a relatively small area, such as a recording studio or mastering suite. “But if you were to apply that concept to a shared interface like an IP network," says Hoult, “you're actually using a huge amount of data and bandwidth just to send clock, and it's not a very efficient way of working."

With so many network connections, it's impractical to send distributed word clock on an IP-based system.

IP Clocking: A World Apart

It's not just a matter of data bandwidth that makes the traditional word clock method impractical for audio-over-IP systems such as Dante. Maintaining a noise-free, uninterrupted stream of data becomes technically difficult, not to mention unpredictable, when you send it over long distances. And because Dante can be used to build geographically large networks, with connected devices kilometres apart, the distributed word clock method becomes untenable.

Thankfully, the networking world found a solution to this long before the audio industry joined their LAN party. Dante-connected gear uses an entirely different method of clocking, officially known as IEEE 1588 v1 Precision Time Protocol (or 'PTP' for short), a standard published by the IEEE for synchronising clocks across a network. Will Hoult explains. “In a PTP-clocked system, each device on the network has its own high-quality crystal oscillator, which is known to resonate at a very high, predictable rate, albeit one subject to environmental conditions such as pressure and temperature. This provides each device with a stable, independent internal clock that ensures it can process its own data. The interfaces basically count for themselves."

Of course, the missing link here is still synchronisation. For example, two devices in different locations, at different temperatures and pressures, may each count at a slightly different rate.
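A rough sketch of how quickly free-running clocks part company. The ±20 ppm (parts per million) tolerances below are hypothetical figures chosen for illustration, not specifications of any particular device, and the model holds each error constant, whereas real oscillators wander with temperature and age.

```python
NOMINAL_RATE_HZ = 96_000


def samples_counted(ppm_error: float, seconds: float,
                    nominal_hz: int = NOMINAL_RATE_HZ) -> float:
    """Samples counted by a free-running oscillator whose rate is off by ppm_error."""
    actual_hz = nominal_hz * (1 + ppm_error * 1e-6)
    return actual_hz * seconds


# Hypothetical tolerances: device A runs 20 ppm fast, device B runs 20 ppm slow.
a = samples_counted(+20, 60)
b = samples_counted(-20, 60)
print(f"{a - b:.1f} samples apart after one minute")   # 230.4 samples apart
```

Even a tiny rate difference accumulates steadily, which is why each device's count has to be pulled back into line at regular intervals.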

Dante devices may be equipped with several distributed word clock connections, but these are not necessary when interfacing only with Dante equipment.

“So what PTP does," continues Hoult, “is send frequent time updates around the network, so that all the connected devices re-calibrate themselves at regular intervals: once every quarter-second (250ms), to be precise. Instead of distributing a rapid and uninterrupted stream of clock pulses, as with a word clock system, what we're now distributing is the actual time*. In theory, if we're doing that frequently enough, given what we know about the likely drift of the oscillators on the network, we can guarantee clean and stable audio across all interfaces."
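Hoult's point about update frequency versus drift can be put into rough numbers. This simplified sketch assumes a hypothetical oscillator with a fixed 20 ppm error and assumes each time update resets the accumulated error to zero; real PTP followers also steer their oscillator's rate, so their residual error is typically smaller than this model suggests.

```python
from typing import Optional


def worst_case_error_us(ppm_error: float, total_s: float,
                        resync_interval_s: Optional[float]) -> float:
    """Largest timing error (in microseconds) a local clock builds up.

    With no time reference (resync_interval_s=None) the error grows for the
    whole run. With periodic updates it can only grow until the next update,
    because in this simplified model each update resets it to zero.
    """
    drift_s_per_s = ppm_error * 1e-6
    window_s = total_s if resync_interval_s is None else min(resync_interval_s, total_s)
    return drift_s_per_s * window_s * 1e6


# Hypothetical 20 ppm oscillator, observed over one minute.
print(worst_case_error_us(20, 60, None))   # about 1200 us with no reference
print(worst_case_error_us(20, 60, 0.25))   # about 5 us with quarter-second updates
```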

*Actual time is usually irrelevant to an audio network, except in broadcast scenarios, where a special GPS master clock may be used to lock the network to actual time. Focusrite RedNet and Red devices use UNIX time (also known as POSIX or Epoch time): the number of seconds that have elapsed since midnight (UTC) on 1 January 1970. At the time of writing, it was 1487473910 UNIX time (1487473910151 when counted in milliseconds).
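A 13-digit value such as 1487473910151 is a millisecond count rather than a count of seconds. As a small sketch using only Python's standard `datetime` module, here is how such a value maps back to a calendar date; the helper name `epoch_ms_to_utc` is illustrative.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(ms_since_epoch: int) -> datetime:
    """Convert a millisecond UNIX timestamp to a timezone-aware UTC datetime."""
    seconds, millis = divmod(ms_since_epoch, 1000)  # split off the milliseconds exactly
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=millis * 1000)


stamp = epoch_ms_to_utc(1487473910151)
print(stamp.isoformat())   # 2017-02-19T03:11:50.151000+00:00
```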

Counting Out Loud

Hoult has a musical analogy, which he uses to explain PTP in non-technical terms. “It's like if you got three people to count together to five, then count silently in their heads from six to ten, and when they get to eleven, say 'eleven'. You can almost guarantee that the first five will be perfectly in time when everyone is counting together. When you get to eleven, you'll hear the 'elevens' coming at different points in time." Adding a conductor to this scenario would fix the problem. Hoult continues, “In an orchestra, the conductor generally isn't counting every single beat for you; maybe they just mark the quarter notes. So the players are only referencing occasional points in time, rather than following the conductor's every move. The drummer might be playing sixteenths, for example, but they're only getting their time reference on the downbeat, while making up what falls in between." In the case of PTP, the player's musical timing is the crystal oscillator; the time reference is the stroke of the conductor's baton.
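Under the hood, the "stroke of the baton" is a timestamped message. The sketch below shows the standard IEEE 1588 calculation a follower can use to work out how far its own clock sits from the reference clock, using four timestamps from a Sync (and Follow_Up) plus Delay_Req/Delay_Resp exchange. It assumes the network delay is the same in both directions, which is the usual PTP assumption; the class and function names are illustrative, and the microsecond values are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class PtpExchange:
    t1: float  # reference clock sends Sync (reference clock's time)
    t2: float  # follower receives Sync (follower's time)
    t3: float  # follower sends Delay_Req (follower's time)
    t4: float  # reference clock receives Delay_Req (reference clock's time)


def offset_and_delay(x: PtpExchange) -> tuple[float, float]:
    """IEEE 1588 offset/delay estimate, assuming symmetric network delay."""
    forward = x.t2 - x.t1    # one-way delay + follower's offset
    backward = x.t4 - x.t3   # one-way delay - follower's offset
    offset = (forward - backward) / 2
    delay = (forward + backward) / 2
    return offset, delay


# Invented example: follower runs 5 us ahead, one-way delay is 100 us.
offset, delay = offset_and_delay(PtpExchange(t1=0.0, t2=105.0, t3=200.0, t4=295.0))
print(offset, delay)          # 5.0 100.0
corrected = 200.0 - offset    # follower subtracts the offset from its own reading
```

The follower only needs these occasional exchanges; between them its own oscillator fills in the beats, just like the drummer playing sixteenths between downbeats.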

A conductor provides a regular time reference for all players in the orchestra to follow. A similar process takes place within PTP, the clocking protocol used by Dante devices.

But where does the reference come from? In a distributed word clock scenario, the operator would choose which device is the master and then set all other devices in the sync chain to slave from it. With audio-over-IP, however, a clock election process within PTP compares various properties of each device's internal clock and chooses the best one to be the 'grandmaster'. Basically, the system does all the work for you. (Dante Controller does have an option to give devices 'preferred master' status, should you wish to choose your master clock source yourself.)
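To make the idea of an election concrete, here is a deliberately simplified sketch. It is not the exact IEEE 1588 best master clock algorithm, which compares a longer list of standardised clock attributes; the fields below (`preferred`, `quality_rank`, `identity`) are hypothetical stand-ins, ranked so that a 'preferred master' wins, otherwise the best-quality clock, with a unique ID as the final tie-break.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockCandidate:
    name: str
    preferred: bool     # hypothetical: 'preferred master' set by the operator
    quality_rank: int   # hypothetical: lower means a better clock
    identity: str       # hypothetical unique ID, used only to break ties


def elect_grandmaster(candidates: list[ClockCandidate]) -> ClockCandidate:
    """Pick one clock: preferred first, then best quality, then lowest ID."""
    # False sorts before True, so 'not preferred' puts preferred devices first.
    return min(candidates, key=lambda c: (not c.preferred, c.quality_rank, c.identity))


stagebox = ClockCandidate("stagebox", False, 2, "id-02")
interface = ClockCandidate("interface", False, 1, "id-05")
console = ClockCandidate("console", True, 3, "id-09")

print(elect_grandmaster([stagebox, interface]).name)           # interface
print(elect_grandmaster([stagebox, interface, console]).name)  # console
```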

The Benefits

Though the quarter-second period of PTP's time recalibration may sound like an eternity in digital audio, it's all that's needed to keep devices on track. The benefit compared to distributed word clock is that, instead of sending 96,000 clock pulses per second, we're sending four short packets of data per second, which consumes very little bandwidth.
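The arithmetic behind that comparison, as a small sketch. The 96kHz rate and quarter-second interval come from the text; the 200-byte message size is a hypothetical round figure rather than a measured PTP frame size, and only the four time updates per second are counted (PTP exchanges a few other message types too).

```python
def timing_traffic(sample_rate_hz: int, update_interval_s: float,
                   assumed_msg_bytes: int) -> dict:
    """Compare word clock pulses per second with PTP time updates per second."""
    ptp_updates_per_s = 1 / update_interval_s
    return {
        "word_clock_pulses_per_s": sample_rate_hz,
        "ptp_updates_per_s": ptp_updates_per_s,
        "ratio": sample_rate_hz / ptp_updates_per_s,
        "ptp_bits_per_s": ptp_updates_per_s * assumed_msg_bytes * 8,
    }


print(timing_traffic(96_000, 0.25, assumed_msg_bytes=200))
# {'word_clock_pulses_per_s': 96000, 'ptp_updates_per_s': 4.0, 'ratio': 24000.0, 'ptp_bits_per_s': 6400.0}
```

Even with a generous message size, the clock traffic stays in the low kilobits per second.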

“The last thing you want is to lose your clock," says Hoult, “because it's the most important signal that goes round the network. And so we prioritise it above the audio data using Quality of Service (QoS) configuration on the IP network. This sounds counter-intuitive, but accurate timing matters more than the audio itself: without it, the audio would fall out of sync."
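A toy model of why QoS prioritisation protects the clock. Dante networks are commonly configured to mark PTP clock traffic with DSCP 56 (CS7) and audio with DSCP 46 (EF), following Audinate's published QoS recommendations (worth checking against current documentation for your own setup). The sketch models a switch port that always forwards the highest-priority packet waiting; real switches map DSCP values onto a handful of hardware queues rather than sorting by raw value, so treat this as an illustration only.

```python
import heapq
from itertools import count

# DSCP markings assumed here (verify against Audinate's QoS guidance).
DSCP_PTP = 56    # CS7: clock traffic, highest priority
DSCP_AUDIO = 46  # EF: audio
DSCP_OTHER = 0   # best effort: everything else


class StrictPriorityPort:
    """Toy switch egress port: always sends the highest-DSCP packet first."""

    def __init__(self) -> None:
        self._queue: list[tuple[int, int, str]] = []
        self._arrival = count()  # preserves FIFO order within one priority

    def enqueue(self, dscp: int, payload: str) -> None:
        # heapq is a min-heap, so negate the DSCP value to pop the highest first.
        heapq.heappush(self._queue, (-dscp, next(self._arrival), payload))

    def dequeue(self) -> str:
        return heapq.heappop(self._queue)[2]


port = StrictPriorityPort()
port.enqueue(DSCP_AUDIO, "audio 1")
port.enqueue(DSCP_OTHER, "web page")
port.enqueue(DSCP_PTP, "PTP sync")
port.enqueue(DSCP_AUDIO, "audio 2")
print([port.dequeue() for _ in range(4)])
# ['PTP sync', 'audio 1', 'audio 2', 'web page']
```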

Now, no system is perfect, so it is technically possible for a networked device to miss a packet of clock data, even if it's QoS-prioritised on the network. “If your network is badly configured, you could get a disrupted clock signal. If this happens, Dante devices (such as Focusrite's RedNet and Red interfaces) are programmed to mute audio rapidly, ramping down when the clock is lost and back up when it is re-established."
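The mute-and-ramp behaviour can be sketched as a per-sample gain that glides towards zero while the clock is missing and back towards unity once it returns, so the level never jumps abruptly. This is an illustrative model, not RedNet firmware; the linear ramp shape and the `ramp_len` value are assumptions.

```python
def mute_on_clock_loss(samples: list[float], clock_ok: list[bool],
                       ramp_len: int) -> list[float]:
    """Fade audio out while the clock is lost and back in once it returns.

    The gain moves towards 0.0 (clock lost) or 1.0 (clock present) by
    1 / ramp_len per sample, so there is never an abrupt step in level.
    """
    step = 1.0 / ramp_len
    gain = 1.0
    out = []
    for x, ok in zip(samples, clock_ok):
        target = 1.0 if ok else 0.0
        if gain < target:
            gain = min(target, gain + step)
        elif gain > target:
            gain = max(target, gain - step)
        out.append(x * gain)
    return out


# Clock drops out for five samples; hypothetical 4-sample ramp.
clock = [True] * 2 + [False] * 5 + [True] * 3
print(mute_on_clock_loss([1.0] * 10, clock, ramp_len=4))
# [1.0, 1.0, 0.75, 0.5, 0.25, 0.0, 0.0, 0.25, 0.5, 0.75]
```

In practice the ramp would span many more samples; four keeps the printed output readable.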

Though this is the worst-case scenario, it's a far better outcome than losing sync in a distributed word clock system. There, you would most likely have to re-sync every device on your entire system using front-panel controls and cumbersome master/slave setup procedures, not to mention enduring the hallmark glitching of a digital audio device that has lost its sync.