Thoughts on Redundancy and Failover

Apr 12

We will begin with the simple premise that, when a sound system is being used, it is usually undesirable for said sound system to stop working unexpectedly. A major and common vulnerability here is not the sound system itself, in the strictest sense, but rather the signals that are driving the system. Put very simply, how can we drive our systems in such a way that a single bad cable doesn’t interrupt the show?

If the digital mixing console is connected to the stage box by a single network cable, for example, that cable becomes a “show stopper” if it fails. It could fail for a variety of reasons - wear and tear, a poor mechanical connection, or something like a wheelchair or a road case being rolled over the cable and damaging it. Whatever the reason, this single cable run is a major vulnerability for our system.

Many digital console systems incorporate a redundant snake connection between the surface and the stage box - two lines are run, so if one fails, the other one takes over and the show continues. Some while back, my friend who owns a small PA rental company expressed concern that the model of console he owned did not offer the option for a redundant snake connection. The approach I offered was for him to run a second network line along with the first during load-in, even though the console could only use one at a time. If the first line failed, he could simply swap over to the backup line at both ends and be back up and running in a few seconds rather than pausing the show for the time it took to run a second cable through the audience. A major component of a robust system is anticipating the ways it might fail and having a plan for when it does.

In terms of PA system drive, most modern amplifiers offer a choice of analog or digital inputs, which makes the failover approach quite simple - if the system can accept either digital or analog input, we can configure it so that if the digital fails, the analog takes over. In general terms there are two main strategies here:

1. We send both the digital and analog signals to the amplifiers at all times (I like to think of this as the Eddie Murphy failover strategy). The amplifier is configured with an internal priority setting that tells it to use the digital inputs by default, and if the digital signal fails, it will start using the analog inputs instead.
2. We configure the amplifiers to listen to both the digital and analog inputs, and choose to unmute one or the other in our front-end drive.

This raises an important point: our failover strategy must be robust to a network failure. If the failover strategy is “go into the control software and switch all the amps over to analog,” that strategy doesn’t help us when the network goes down. So either the amplifiers must manage the failover themselves without any prompting from the operator (Case 1 above), or the device switching between the digital and analog drive signals must be operable directly via local control during a network failure (Case 2). This is a major reason that I always opt to have a front-end processor in my drive rack even when driving sound systems whose amplifiers have robust built-in processing ability.

The advantage to the first approach is clear - the failover is automatic and requires no intervention from the system operator. However, since every amplifier is left to fend for itself, this can lead to a state where some amplifiers fail over to analog while others don’t, depending on exactly how the failure occurred. For example, maybe an AES jumper between amps 4 and 5 was the failure point in an 8-amplifier rack, so now you have half the main array running on digital and the other half on analog. (And yes, there could be a latency and / or level difference between those two signals - we’ll look at that below.)
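To make that example concrete, here is a minimal Python sketch of the 8-amplifier rack. It assumes the AES signal is daisy-chained from amp 1 down the rack, so a failed jumper cuts the digital feed to every amp after the break. The function name and the wiring order are illustrative assumptions, not any manufacturer's behavior.

```python
def amp_inputs_after_break(num_amps, break_after):
    """Return the input each amp ends up on under a per-amp priority rule.

    Assumption: AES is daisy-chained from amp 1 to amp `num_amps`, so a failed
    jumper after amp `break_after` removes digital from every later amp.
    Each amp decides for itself: digital if present, otherwise analog.
    """
    states = {}
    for amp in range(1, num_amps + 1):
        digital_present = amp <= break_after
        states[amp] = "digital" if digital_present else "analog"
    return states


# Jumper between amps 4 and 5 fails in an 8-amp rack.
rack = amp_inputs_after_break(num_amps=8, break_after=4)
for amp, source in rack.items():
    print(f"amp {amp}: {source}")
```

Amps 1 through 4 stay on digital while amps 5 through 8 drop to analog, which is exactly the split state described above.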

We also want to understand what would happen in the case that the digital signal connection is restored to a healthy state - do the amplifiers hop back over to digital input, or stay on analog? If the digital signal is intermittent, will the amplifiers be hopping back and forth between analog and digital the entire show? If we have no functioning network, how can we monitor this state, or change the settings?
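The answers depend on the specific amplifier, so they are worth checking in the documentation and on the bench. As a purely hypothetical illustration of why the question matters, the sketch below compares an amp that returns to digital the moment the signal reappears with one that waits for several consecutive healthy checks first. The `hold_off` parameter and the sampling model are assumptions made up for this example, not a description of any real product.

```python
def track_input(digital_ok_samples, hold_off):
    """Return the selected input for each digital-status sample.

    Hypothetical rule: drop to analog as soon as digital is lost, and return
    to digital only after `hold_off` consecutive healthy samples.
    """
    selected = "digital"
    healthy_run = 0
    history = []
    for ok in digital_ok_samples:
        healthy_run = healthy_run + 1 if ok else 0
        if not ok:
            selected = "analog"
        elif selected == "analog" and healthy_run >= hold_off:
            selected = "digital"
        history.append(selected)
    return history


def count_switches(history):
    """Count how many times the selected input changed."""
    return sum(1 for a, b in zip(history, history[1:]) if a != b)


# An intermittent digital feed that eventually becomes stable.
flaky = [True, False, True, False, True, False, True, True, True, True]

immediate = track_input(flaky, hold_off=1)
held = track_input(flaky, hold_off=3)

print("immediate return:", count_switches(immediate), "switches")
print("with hold-off:   ", count_switches(held), "switches")
```

With the same intermittent feed, the amp that returns immediately changes inputs far more often than the one that waits, and each change is an opportunity for an audible glitch.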

The second strategy sidesteps these issues because the amplifiers always have both their analog and digital inputs open, and that state never changes. We are never at the mercy of the amplifier’s internal logic to decide whether or not to fail over, and we can’t end up in a state where some of the amplifiers have failed to analog and some haven’t. The entire rig rolls to one or the other at once via the operator unmuting analog or digital at the front end, and we can do this even in the case of complete network failure. Simply put: if your “analog backup” strategy relies upon you having a functioning network to activate it, it’s not an analog backup strategy.

One of the drawbacks here is that this requires leaving the analog drive lines to the amps open all the time, so if any noise or buzz is picked up by the drive system, it will be reproduced by the loudspeakers. I can say that in practice this doesn’t tend to be a major concern as long as care is taken to properly maintain the analog connections (often a multi-pin connector that carries all the necessary analog feeds along a single cable) but if the environment is particularly hostile in terms of EMI, this may be a consideration.

The other common criticism of this approach is to point out that the system doesn’t fail over automatically - in the case of a digital signal failure, audio is interrupted until the operator affirmatively decides to unmute the analog backup. This is perfectly true in the case that the digital signal is AES. In the modern era, however, most larger systems are driven via networked audio, either Milan or Dante, both of which are inherently redundant.

So a Milan or Dante system with analog backup actually offers us three levels of redundancy: the Primary and Secondary legs of the Dante or Milan distribution, which fail over seamlessly and automatically if configured properly, plus an analog backup that can be manually engaged by the operator at will. This also provides us a handy tool for troubleshooting a network issue during a show: simply roll the system over to analog, and we can restart or troubleshoot network switches as needed without any fear of interrupting the analog audio. When the issue has been resolved, we can seamlessly roll back over to digital at a convenient point in the performance (say, between songs).
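The three levels can be summarized in a small Python sketch. It is a simplification for illustration only: the function name and the labels are hypothetical, and real Dante or Milan devices handle the Primary and Secondary legs internally rather than through logic like this.

```python
def audible_drive(primary_ok, secondary_ok, front_end_source):
    """Return which path is carrying audio to the amplifiers.

    `front_end_source` is "digital" or "analog": whichever drive the operator
    has unmuted at the front end (hypothetical labels for this sketch).
    """
    if front_end_source == "analog":
        # Analog is independent of the network, healthy or not.
        return "analog backup"
    if primary_ok:
        return "network primary"
    if secondary_ok:
        return "network secondary"
    return "silence until the operator unmutes analog"


# Primary leg lost: the secondary leg carries the show automatically.
print(audible_drive(primary_ok=False, secondary_ok=True, front_end_source="digital"))

# Whole network down: the operator rolls to analog at the front end.
print(audible_drive(primary_ok=False, secondary_ok=False, front_end_source="analog"))
```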

The two major backup / failover approaches described above can be varied and customized based on the specifics of the situation, the equipment being used, and the needs and priorities of the event. A proper system drive approach considers the points at which the system is vulnerable to a failure and what we would like to happen in the event of such a failure.

Regardless of the approach you settle on, never assume that the latency differences and potential level offsets between digital and analog drives have been sorted out for you, as this is often not the case. This can be dealt with rather simply by placing your measurement mic in proximity to one of the loudspeakers and capturing a measurement of the system driven by the digital drive. Swap over to analog and take the measurement again. The timing difference - usually on the order of 1 to 3 milliseconds - will be readily revealed by the phase trace, while level offsets will show up in the magnitude trace. Adjust the delay and gain either per-input in the amplifier settings, or on the digital and analog outputs in the front-end processor, until the two traces overlay in both phase and magnitude.
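The arithmetic behind that adjustment is simple, and a short Python sketch may help. The measured values below are made up for illustration, and the function names are hypothetical. The idea is to delay whichever path arrives earlier by the difference and trim the analog level to match the digital. The second helper uses the standard relationship that a time offset of t seconds shifts phase by 360 × f × t degrees at frequency f, which is why even a small offset is obvious on the phase trace.

```python
def alignment_offsets(digital_ms, analog_ms, digital_db, analog_db):
    """Return (path_to_delay, delay_ms, analog_trim_db) to overlay the traces.

    Delay is added to the earlier-arriving path; the gain trim is applied to
    the analog path so its level matches the digital path.
    """
    delta_ms = round(digital_ms - analog_ms, 3)
    if delta_ms >= 0:
        path_to_delay, delay_ms = "analog", delta_ms
    else:
        path_to_delay, delay_ms = "digital", -delta_ms
    analog_trim_db = round(digital_db - analog_db, 2)
    return path_to_delay, delay_ms, analog_trim_db


def phase_shift_deg(delay_ms, freq_hz):
    """Phase shift in degrees caused by a pure time offset at one frequency."""
    return 360.0 * freq_hz * (delay_ms / 1000.0)


# Hypothetical measurements: arrival time (ms) and level (dB) for each drive.
path, delay_ms, trim_db = alignment_offsets(
    digital_ms=12.5, analog_ms=11.0, digital_db=-20.0, analog_db=-21.5
)
print(f"Add {delay_ms} ms of delay to the {path} path")
print(f"Trim the analog path by {trim_db:+} dB")
print(f"Uncorrected, that offset is {phase_shift_deg(delay_ms, 1000):.0f} degrees at 1 kHz")
```

After entering the values, repeat both measurements to confirm that the traces actually overlay rather than trusting the arithmetic alone.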

Finally, be sure to test your failover / backup system as part of your system verification each time the rig is deployed, or periodically for installed systems. When using redundant Milan or Dante, pull the primary network line and ensure audio is not interrupted. Then reconnect it and pull the secondary line, again making sure audio is not interrupted. Roll over to analog drive and listen for any hum, buzz, or noise, and confirm that the signal level is comparable to the digital.

Michael Lawrence
