Airline Outages: An Availability Wake Up Call for All Industries

Passengers wait in long lines at LAX due to a recent data center outage.

Delta’s debilitating data center outage last week caused hundreds of flight cancellations and thousands of lengthy delays. Coming on the heels of similar problems at Southwest Airlines, it was just the latest such damaging incident in the airline industry.

Last July, United Airlines was forced to ground hundreds of flights due to a router configuration issue. This January, an outage at a Verizon data center used by JetBlue shut down all of the airline’s systems, paralyzing the company’s operations. All four airlines have had to ground their operations company-wide due to computer issues. Why aren’t backup systems kicking in? That’s what many observers are asking.

Large airlines run intricate systems and usually depend heavily on complex legacy applications. But from what we know, the common thread in these outages appears to be backup systems that didn’t work.

The two most recent outages, at Southwest and Delta, were responsible for over $100 million in revenue loss, and neither should have happened. In each case redundant systems didn’t activate. For Southwest, a router suffered only a “partial failure”, and because the failure was not complete, the backup systems were never properly signaled to kick in. For Delta, a data center fire uncovered a situation where “300 of 7,000 data center components were discovered not to have been configured appropriately to avail backup power.” The resulting electrical failure grounded the airline’s entire fleet.
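To make the two failure modes above concrete, here is a minimal sketch in Python. It does not describe how either airline's systems actually work. Every name in it (`HealthSample`, `should_fail_over`, `Component`, `missing_backup_power`) and both thresholds are hypothetical, chosen only for illustration. The first function treats a degraded device the same as a dead one, so a "partial failure" still triggers failover. The second lists components that have no backup power configured, so the gap can be found before an outage rather than during one.

```python
from dataclasses import dataclass


@dataclass
class HealthSample:
    reachable: bool   # did the device answer the probe at all?
    loss_pct: float   # packet loss seen during the probe window (assumed metric)


def should_fail_over(samples, max_loss_pct=5.0, min_bad=3):
    """Return True when the last `min_bad` samples are all unhealthy.

    A sample counts as unhealthy if the device is unreachable OR degraded,
    so a partially failed device is not mistaken for a healthy one.
    Thresholds are illustrative assumptions, not recommended values.
    """
    if len(samples) < min_bad:
        return False
    recent = samples[-min_bad:]
    return all((not s.reachable) or s.loss_pct > max_loss_pct for s in recent)


@dataclass
class Component:
    name: str
    has_backup_power: bool


def missing_backup_power(components):
    """Return the names of components with no backup power configured."""
    return [c.name for c in components if not c.has_backup_power]
```

A check that only asked "is the device reachable?" would report the partially failed router in the first case as healthy, which is the kind of gap the Southwest incident exposed.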

A Matter of Trust: When it comes down to equipment quality, facility design, and numerous other issues regarding availability, there is an enormous amount of trust involved.

  • Southwest trusted that a router failure would trigger backup systems.
  • Delta trusted that they had a properly architected and implemented data center redundancy solution.
  • JetBlue trusted that their data center provider had the necessary resiliency to avoid downtime.
  • United trusted the configuration of its router wouldn’t cause service interruption.

Trust is at the very heart of the data center ecosystem. That trust is usually placed in a piece of gear, a network, a data center service provider, a cloud provider, a contract engineer, or some other aspect of the IT infrastructure ecosystem.

The trust is placed not only in operations people, but also in salespeople, whose honesty is often assumed during a purchasing process.

Misplaced trust in the judgment, expertise, or honesty of one person can shut down an entire airline. What could it do to you?

Image from Al Seib / Los Angeles Times: http://www.latimes.com/business/la-fi-united-flights-grounded-20150708-story.html