
Defining what it means to create an "AI for Alert and Warning" Benchmark

  • Writer: Jeannette Sutton
  • May 2
  • 3 min read


AI-generated Alerts and Warnings? Part 3



In a previous article, I introduced the idea of establishing benchmarks for AI Alerts and Warnings. I explained that AI A&W benchmarks can help set standards that increase trustworthy use of AI while also ensuring that AI-generated messages align with the state of the art in research and practice.


I also introduced three constructs: acceptability, accuracy, and actionability. Today, I want to discuss why these were the first three constructs that I identified as key to framing an AI A&W benchmark and define them along with their multiple dimensions for measurement.


Let’s start with the “why.” Each construct points to a different aspect of AI A&W. First, ACCEPTABILITY points to the EM user and the dimensions that must be satisfied to establish trust in, and willingness to use, a new technology. Second, ACCURACY represents the message and its contents. Without an accurate message that follows the state-of-the-art research record, we can assume that an AI-generated A&W will not be effective and therefore will not be acceptable. Third, ACTIONABILITY points to the message receiver, the member of the public who is at risk and needs to take action. When a message is not actionable, people search for additional information before deciding how to act, and the resulting delays can put them at greater risk. When a message is actionable, people are more likely to comply quickly and effectively. An actionable message is also an accurately written one, and together accuracy and actionability reinforce the acceptability of the technology.


Now let’s move on to the “what” by identifying some of the measurable dimensions that are part of each construct.


ACCEPTABILITY

Acceptability studies focus on willingness to use new technology. If we consider Rogers’ Diffusion of Innovations, his well-known and long-studied theory of innovation and adoption, we know that early adopters are willing to take greater risks, to test boundaries, and to be on the cutting edge. This does not accurately characterize most emergency managers, whose job is to reduce risk by managing it. Risk aversion means adopting later, once a technology has been refined and proven useful without increasing dangers to the organization, its operations, and the public it serves. When it comes to AI for Alerts and Warnings, acceptability is directly linked to the next two constructs, both of which can be directly measured: message accuracy and message actionability.


ACCURACY

Message accuracy has two prongs. The FIRST prong is demonstrating that a message’s contents and style conform to the best available evidence, including message structure and the use of plain language that meets the needs of the most vulnerable people in our communities. We have decades of research on how to construct a message well, so conformance can be measured by assessing how closely each message follows state-of-the-art science and practice.


The SECOND prong is accuracy of the content itself – characterizing the hazard, its impacts, and the actions people should take to protect themselves. An AI for Alerts and Warnings will be able to accurately parse information about an unfolding event and identify what belongs in the alert, reducing the cognitive load placed on the message writer. Both types of message accuracy can be assessed against established benchmarks.
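To make the first prong concrete, here is a minimal sketch of how structural conformance might be checked automatically. The five message elements are commonly discussed in warning research, but the keyword lists, the jargon terms, and the scoring function are illustrative placeholders of my own, not a validated instrument:

```python
# Hypothetical structural-conformance check for a draft warning message.
# Element names follow warning research; keyword lists are invented examples.

WARNING_ELEMENTS = {
    "source": ["national weather service", "sheriff", "emergency management"],
    "hazard": ["tornado", "flood", "wildfire"],
    "location": ["county", "city of", "downtown"],
    "time": ["until", "p.m.", "a.m.", "immediately"],
    "guidance": ["take shelter", "evacuate", "avoid"],
}

# Plain-language red flags: technical terms a public message should avoid.
JARGON = ["synoptic", "convective", "egress"]

def score_structure(message: str) -> dict:
    """Report which elements appear in the message and count jargon hits."""
    text = message.lower()
    elements = {name: any(k in text for k in keys)
                for name, keys in WARNING_ELEMENTS.items()}
    return {
        "elements_present": sum(elements.values()),
        "elements": elements,
        "jargon_hits": sum(text.count(j) for j in JARGON),
    }

msg = ("National Weather Service: Tornado warning for Albany County "
       "until 5 p.m. Take shelter in a basement now.")
result = score_structure(msg)
```

A real benchmark would replace the keyword matching with a validated coding scheme, but the shape of the measurement – per-element presence plus a plain-language check – is the point.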


ACTIONABILITY

Message actionability refers to a message’s capacity to enable the receiver to act. The Warning Response Model and the Protective Action Decision Model, two theories we draw on in disaster research, have identified how people decide to act. An actionable warning message will be accurate (see above), but it will also reduce action delayed by information seeking and milling. It will provide content that helps people feel they have the self-efficacy (internal capability) to act. Experimental researchers have found that specific message contents increase actionability, such as eliminating jargon, including instructions, and providing details about hazard impacts. This means that actionability, too, can be measured against established benchmarks.


So, there you have it – acceptability, accuracy, and actionability. Each dimension should be measurable. Each construct should have minimal thresholds for demonstrating that it aligns with a standard. Each construct can be measured using benchmarks.
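As a toy illustration of what “minimal thresholds” could look like in a benchmark harness: the construct names below come from this article, but the numeric cutoffs and the idea of a single 0–1 score per construct are assumptions of mine, not proposed standards.

```python
# Hypothetical pass/fail gate over the three constructs.
# Cutoff values are placeholders; a real standard would set them empirically.
THRESHOLDS = {"acceptability": 0.7, "accuracy": 0.9, "actionability": 0.8}

def passes_benchmark(scores: dict) -> bool:
    """A message passes only if every construct meets its minimal threshold."""
    return all(scores.get(construct, 0.0) >= cutoff
               for construct, cutoff in THRESHOLDS.items())

passes_benchmark({"acceptability": 0.8, "accuracy": 0.95, "actionability": 0.85})  # True
passes_benchmark({"acceptability": 0.8, "accuracy": 0.95, "actionability": 0.5})   # False
```

Requiring every construct to clear its own threshold, rather than averaging them, reflects the article’s logic: a message that is accurate but not actionable still fails.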


What constructs would you add/take away? And what standards would you include in a benchmark?


If you’d like to learn more about effective alerting, be sure to check out thewarnroom.com
