Hawaii’s alert was not the fault of a designer or operator

OK, enough with the unsolicited design updates to the Hawaii civil defense notification system, please! While the UI design of the operator’s interface could be improved considerably, that’s not the root cause of the problem.

For those who have been in a cave for the past week: a false alarm sounded across Hawaii last weekend, attracting no shortage of design critique and righteous condemnation of an unquestionably atrocious user interface. The fires of that outrage only grew stronger when a screenshot purporting to show the UI in use was published (it later turned out that this was NOT the actual UI, but a mockup).

Jared Spool describes how such an interface came into being. Prof. Robertson of the University of Hawaii also offered this detailed outline of what likely happened. Since this post was first published, Vox has posted an interview with a subject matter expert explaining how the system works in the national context.

The error was initially attributed to “Human (operator) error,” to which the UI design community (as represented on Twitter) responded that the system designers bear some (if not most) of the responsibility, having designed (or allowed) an interface that made such an error so easy to commit.

The article linked in the NNgroup tweet observes that “People should not be blamed for errors caused by poorly designed systems.” I hope the appropriate authorities consider this when they determine the fate and future career of the operator who made the fateful click.

While NNgroup’s article is titled “What the Erroneous Hawaiian Missile Alert Can Teach Us About Error Prevention,” I would argue that it is preaching to the choir and not teaching its readers anything they don’t already know. The article describes a list of design patterns that can make errors like the one that occurred in Hawaii less probable, usability 101 stuff, yet it misses the elephant in the room: the overall system in which the UI exists. If context is important, and it is, then the context of the whole system, not just that one interaction, must be considered before identifying any errors to fix or placing any blame.
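
To make “usability 101” concrete, here is a minimal, hypothetical sketch (in TypeScript) of one error-prevention pattern in that spirit: a forcing function that requires the operator to explicitly confirm a live alert before it is sent, so a single misplaced click cannot send one. It is not taken from, and does not describe, the actual alert software; every name and type in it is an illustrative assumption.

    // Hypothetical illustration only; not the actual Hawaii EMA software.
    // Shows one generic error-prevention pattern: a forcing function that
    // requires typed confirmation before a live (non-drill) alert is sent.

    type AlertMode = "DRILL" | "LIVE";

    interface AlertRequest {
      mode: AlertMode;
      message: string;
    }

    // The operator must re-type the mode to confirm it, so a drill cannot be
    // promoted to a live alert by a single misplaced click.
    function confirmAndSend(
      request: AlertRequest,
      typedConfirmation: string,
      send: (r: AlertRequest) => void
    ): boolean {
      if (request.mode === "LIVE" && typedConfirmation.trim() !== "LIVE") {
        console.warn("Live alert NOT sent: confirmation text did not match.");
        return false;
      }
      send(request);
      return true;
    }

    // Example: a mistyped confirmation blocks the live alert.
    confirmAndSend(
      { mode: "LIVE", message: "This is a test of the confirmation step." },
      "DRILL",
      (r) => console.log(`Sending ${r.mode} alert: ${r.message}`)
    );

The point is not that this particular safeguard is the right fix; as argued below, the interaction is only one link in a longer chain.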

Here’s this armchair quarterback’s view.

Whose fault is it?

We like to (must?) blame someone, ideally someone other than ourselves, when something goes wrong. Someone must be responsible for this! Here are the current suspects.

The operator

Yes, he or she made the fateful click or series of clicks, but I’d argue that person just happened to be at the scene of an accident that had been waiting to happen for years. The operator is an easy target for blame, but I don’t think the operator is the root cause of the problem.

The user interface

The next-lowest-hanging fruit on the tree of suspects is the user interface. Could the user interface be better? Sure. I feel confident saying that even without having seen it; the historical evidence speaks for itself. The user interface, however, didn’t just happen. It is the result of how the system was built and procured. It is more of a symptom than a cause. But let’s at least consider the possibility…

It’s possible the system designers were woefully incompetent, as many seem to imply. However, there are many other possibilities that are much more likely. Here are a few:

  • There were no visual designers involved in the project.
  • Maybe there were some visual designers, but they didn’t work on this screen because it wasn’t seen as a high priority due to its simplicity, its frequency of use, its distance from the main functions, or some other prioritization scheme.
  • There were designers who suggested better designs but who were ultimately vetoed by development or project management for reasons of budget or schedule.

The list of possibilities is long, and there are many more probable reasons for the resulting UI than incompetence or negligence on the part of designers. I’m not going to give them a pass, but I’m not going to stop looking, either.

The customer

How was the system procured? Who selected it? Did they verify that it would work and do what they needed without the possibility of serious errors?

A tweet mentioned in Jared Spool’s article shows an example of the kind of system they might have used here.

Yikes!

It’s possible (and not uncommon in enterprise software procurement) that the operators had no influence on the selection or acceptance of the system.

The taxpayer/citizen

Something to remember about government systems is that they have a much longer procurement cycle and life cycle than, say, a smartphone app. The system could be many years old; newer revisions or newer systems may be available, but perhaps they weren’t procured due to budgetary constraints. Those budgets are based on what taxpaying citizens are willing to pay. So, if the system hasn’t had any problems so far, why spend money on changing it? After all, there are so many other programs that need funding.

The system itself

Is the system defective? Is the ability to add custom scripts that are identified by the script’s filename, as Jared Spool’s article describes, a feature or a bug? Maybe it’s a feature that was improved in a later version of the software (but the government didn’t buy the updated version). That the system operated without a problem until now shows that it had been working fine…until recently, of course.

Takeaways?

As with many accidents, this incident was the result of a chain of events, not just a single, fateful moment. The chain could have been broken anywhere along the way and the incident would have been averted. It wasn’t, and it wasn’t. What can we take from this to move forward?

Starting with the incident itself, the first solution is not always the best or most constructive one. A few key changes have already been put into place to make it less likely that this particular error will happen again, but if the system itself is flawed, fixing only this problem leaves the door open for another (perhaps more serious) error to occur. Let’s not let our impatience to put this behind us get in the way of a clear and reasoned solution that addresses the systemic errors.

As with many accident investigations, there is rarely only one point of failure, hence the need to review, identify, and correct the systemic errors. It was not the fault of the operator, the UI, the system, the Emergency Operations Center, or any other single entity. Rather, it was how they all came together last weekend. The solution is likely to require more than just “fixing” one of these elements.

For those of us who design, document, and build things for other people, this is a reminder that designers and implementers must be on the constant lookout for latent problems. You don’t want to design for every possible scenario you can imagine or you’ll never build anything. That’s not what I’m suggesting. Rather, everyone involved on a product has the (moral, if not organizational) responsibility to step back and consider how things might go wrong, assess the likelihood of those events, and effectively advocate for changes where necessary.

Last weekend’s event was a sobering reminder that, nowadays, failing to consider possible error cases can have consequences that are all but unimaginable.
