
Transparency builds trust: creating benchmarks out in the open.

  • Writer: Jeannette Sutton
  • May 2
  • 3 min read


AI-generated Alerts and Warnings? Part 6



I recently had a conversation with a team of researchers who commented on how transparent I’ve been in my articles about building a benchmark for AI-generated alerts and warnings. Being so open about the planning, the process, and the steps we are taking is unexpected, it seems. And yet when EM1 asked me to come on board as a subject matter advisor on alerts and warnings, it was always with the understanding that they would be open about how they were developing an AI toolkit that would serve the emergency management community.


Transparency builds trust.

AI has rolled out into our technological systems like a tsunami wave. In some ways, it feels like a near-shore event… “you are in danger! Get to higher ground now!” It seems to have been generated by some deep, underground event that couldn’t be seen until the world began to shake and everything rapidly transformed. We read every day about jobs at risk and about new graduates wondering if they will even find a place in this changed world. It can feel like a very frightening time, filled with uncertainty.


Truthfully, as a faculty member at the University at Albany, I observed the creation of the campus AI Institute, backed by millions of dollars to hire new faculty and build new campus computing systems, and wondered: what will happen to my social science colleagues? With attention directed to this shiny new object, will resources be redirected, giving us even less to fight over? And yes, my concerns have been validated; but the experience has also made me re-think my relationship to AI.


Researchers studying the diffusion of innovation and technology adoption have found that willingness to use new technology is related to three things: relative advantage, complexity, and trialability. Unwillingness to use new technology stems from four related ideas: resistance to change, fear of the unknown, fear of losing control, and fear of failure. New research conducted by a team at BYU found two key sources of resistance to AI use that are relevant to emergency managers: in high-risk environments, output quality and ethics play a significant role. They found that errors, such as inaccurate messages, can have significant consequences. And, interestingly, they found that assigning generative AI to high-risk tasks may be perceived as unethical. And yet the wave keeps coming.


I’ve had time to observe this coming wave as we conducted our FEMA-funded research on alerts and warnings. We developed the FEMA-IPAWS Message Design Dashboard, ensuring that it was based on state-of-the-art research methods: content analyses, scoping reviews, assessments of current tools, experiments, subject matter expert interviews, and user interaction design. And we created and published the data backbone, the Warning Lexicon and Post Alert Lexicon, to ensure that the analog versions would be available offline for all who want access to them.


As we engaged in this 4-year research effort (built on 7 decades of social science research), we could see the distant tsunami approaching. And for us, it became an experience of “prepare now, the wave is coming, it’s time to get moving.” In fact, I watched emergency managers open up ChatGPT in the middle of a training session I was leading, a bold move given that the professor was standing right there sharing why it is important to understand the social science of alerts and warnings. Knowing that the tsunami had reached the community I was dedicated to serving, I agreed to dip my toe in the water. Perhaps you can relate? Perhaps you’ve been watching the tsunami approach too.


You can’t outswim a tsunami, and you can’t surf one; few people survive the debris that gets pushed along with the wave without hoisting themselves up onto a high point and hanging on to something stable. So, perhaps this metaphor has limited value.


But I will offer this: we have the research and the data to set benchmarks for effective alerts and warnings. We know the contents needed to reduce delayed action. We know the information people will search for as they make decisions to act. We know the failures that come when the words we use result in confusion and fear. And because we human beings know how to do this well, we can also expect the LLMs to prove that they meet our benchmarks before we rely on them for critical, high-risk activities like creating a warning message.
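
To make that benchmark idea concrete, here is a minimal sketch in Python of what one automated check might look like: does a generated message contain the core elements of a complete warning (source, hazard, location, time, and protective action guidance)? Those elements reflect long-standing warning research; the keyword lists, function name, and scoring below are illustrative assumptions for demonstration, not the EM1 benchmark itself.

# A minimal, illustrative check: does a generated warning message
# mention each core element of a complete warning? The keyword lists
# and scoring are simplified placeholders, not the EM1 benchmark.

REQUIRED_ELEMENTS = {
    "source": ["national weather service", "emergency management", "sheriff"],
    "hazard": ["tornado", "flash flood", "wildfire", "tsunami"],
    "location": ["county", "downtown", "river valley"],
    "time": ["until", "p.m.", "a.m.", "immediately"],
    "guidance": ["take shelter", "evacuate", "move to higher ground"],
}

def completeness_check(message: str) -> dict:
    """Report which required warning elements appear in the message."""
    text = message.lower()
    found = {element: any(keyword in text for keyword in keywords)
             for element, keywords in REQUIRED_ELEMENTS.items()}
    # Fraction of the five elements present in the draft.
    found["score"] = sum(found.values()) / len(REQUIRED_ELEMENTS)
    return found

# Example: score an LLM-generated draft before trusting it.
draft = ("Tornado warning issued by the National Weather Service for "
         "Albany County until 5:30 p.m. Take shelter in a basement or "
         "an interior room now.")
print(completeness_check(draft))

A real benchmark would go further, drawing on the Warning Lexicon, measuring the potential for confusion and fear, and comparing LLM output against expert-written messages. But even a simple completeness check illustrates the principle: the model has to prove it meets the standard before it earns a high-risk task.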


And that is the effort we are currently working on at EM1: establishing those benchmarks, based in research and with the input of additional subject matter experts, so that LLMs can be evaluated against them. And we’re doing this out in the open. Because transparency matters.
