Large Language Models for disaster risk reduction

LLMs are AI tools capable of understanding and generating natural language text, with applications in various sectors. But what about disaster risk management? CIMA Research Foundation, in collaboration with IE University, explored the integration of LLMs into operational workflows, highlighting both the potential and current limitations of these models.

“Large Language Models (LLMs) are a class of AI models designed to understand and generate natural language text. Based on deep neural network architectures, these models are trained on massive amounts of textual data, which can include books, articles, websites, and other online text resources.” This definition of LLMs is provided by none other than an LLM, ChatGPT. It continues, “I have been trained on a vast range of textual data to understand and generate natural language text. This allows me to answer questions, write texts, translate languages, and much more. I can assist you with many tasks that require language understanding and generation.”

It is well known that these tools play an increasingly important role in various areas of our society: they can be integrated into online services such as chatbots for assistance and support, provide translations, and generate articles, images, and various other forms of content for communication and training. It is evident that LLMs also have a significant place in the context of scientific research. However, disaster risk reduction is a peculiar field because it combines aspects of research with operational activities, such as emergency management. How can LLMs fit into this context?

Researchers at CIMA Research Foundation, in collaboration with IE University in Madrid, have begun to explore this topic.

LLMs and Disaster Risk Management

“LLMs are based on the same principle that drives machine learning techniques: they can learn and apply the information given to them during the training phase. However, classical machine learning tools are primarily used for regression or classification applications. One example is the work done by CIMA Research Foundation on wildfires, where the machine learning model used the information provided to return risk information. LLMs, on the other hand, use the information at their disposal to generate new information,” explain Mirko D’Andrea, a researcher at CIMA Research Foundation, and Jean Baptiste Bove, a PhD student at the University of Genoa conducting his studies at CIMA. “In practice, we can say that LLMs are systems capable of predicting the next word based on a sequence of words.”
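To make the idea of "predicting the next word based on a sequence of words" concrete, here is a deliberately tiny sketch. It is not how the models in the study work: real LLMs use deep neural networks over very long contexts, while this toy only counts which word most often follows the previous one. The corpus and the function names (`train_bigram`, `predict_next`) are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Invented toy corpus (an assumption, not data from the study).
corpus = (
    "heavy rain is expected tonight . "
    "heavy rain may cause flooding . "
    "heavy winds may cause damage ."
)

model = train_bigram(corpus)
print(predict_next(model, "heavy"))
print(predict_next(model, "earthquake"))
```

The second call shows the limitation in miniature: a word that never appeared in the training text yields no prediction at all, since the model can only reproduce patterns it has seen.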

The potential of these tools is becoming clearer as they develop, but it is still not fully understood, and neither are the risks that may arise as they advance. The two scientists set out to understand how LLMs could be integrated into the workflow of organizations dedicated to disaster risk management.

“There are many tools available: we selected four, which we tested on three case studies to see how well they could perform some of the tasks required in risk management,” explain D’Andrea and Bove. “In particular, we asked each of the four selected LLMs to try to write warning messages (like those sent to the population by the IT-Alert system), to describe scenarios usable in exercises, and finally, the most complex task, to assess the risk based on meteorological and climatic parameters provided to them.”

The initial hypothesis was that the models could provide effective outputs when it came to giving generic information, but would perform less well when the task was very specific and required contextualized information (and responses). According to this hypothesis, the results of the LLMs would be inferior when the information available for contextualization activities was scarce.

A Benchmark for LLMs

To verify this hypothesis, the researchers tested the models by asking them to perform the three tasks for three different countries: Mozambique, the Philippines, and Spain. “This provided us with countries with different languages and risk profiles, but also for which the available information is not homogeneous: there is a lot of online data for Spain, an intermediate amount for the Philippines, and little for Mozambique,” explain D’Andrea and Bove. “We then created a grid for evaluating the responses of the LLMs, which we asked both human experts in disaster risk management and another AI, ChatGPT, to fill out. ChatGPT has already proven to be very effective in performing evaluations based on clear guidelines.”

The results are summarized in a score, indicating the methodology that works best depending on the country. “It’s as if we created a benchmark that allows us to make an initial assessment of what can be done with different LLMs in various countries in the context of activities commonly carried out in risk management,” explain the researchers. This work shows that LLMs find important applications in the field of risk management, for example in the rapid retrieval of data from large datasets, thereby improving the efficiency of text-based workflows. However, it also highlights the main limitation of these tools: widely available LLMs do not have access to the necessary data to provide outputs of human-equivalent quality at the required level of contextual specificity. This becomes particularly evident when the LLM is faced with, for example, non-Western regions, widely used local dialects, less common types of disasters, regions with less robust data, or the need for comprehensive risk assessments.
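As a rough illustration of how an evaluation grid filled out by human and AI evaluators might be condensed into one score per country and model, consider the sketch below. Everything in it is an assumption: the article does not describe the grid's criteria, the scoring scale, or the aggregation rule, so the 1-to-5 scale, the simple average, the model labels, and all the numbers are invented placeholders, not results from the study.

```python
from collections import defaultdict
from statistics import mean

# Invented placeholder ratings: (country, model, evaluator, score on an assumed 1-5 scale).
# None of these values come from the study.
ratings = [
    ("Spain", "model_a", "human", 4),
    ("Spain", "model_a", "ai", 5),
    ("Spain", "model_b", "human", 3),
    ("Spain", "model_b", "ai", 3),
    ("Mozambique", "model_a", "human", 2),
    ("Mozambique", "model_a", "ai", 3),
    ("Mozambique", "model_b", "human", 3),
    ("Mozambique", "model_b", "ai", 4),
]


def mean_scores(rows):
    """Average all evaluator scores for each (country, model) pair."""
    grouped = defaultdict(list)
    for country, model, _evaluator, score in rows:
        grouped[(country, model)].append(score)
    return {key: mean(values) for key, values in grouped.items()}


def best_per_country(scores):
    """Pick, for each country, the model with the highest average score."""
    best = {}
    for (country, model), score in scores.items():
        if country not in best or score > best[country][1]:
            best[country] = (model, score)
    return best


scores = mean_scores(ratings)
best = best_per_country(scores)
for country, (model, score) in best.items():
    print(country, model, score)
```

A simple average treats human and AI evaluators as equally reliable. A real benchmark might weight them differently or report them separately to check how well they agree.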

“In general, we cannot ignore the incredible speed at which AI is developing: undoubtedly, we will also see rapid advancements in this context in the near future. Understanding how AI will develop and how we can leverage that development will certainly be an important task in the coming years,” conclude D’Andrea and Bove. “In the meantime, we will expand this initial investigation by further exploring how far LLMs can contribute to various aspects of disaster management: our goal is to develop a benchmark that allows us to start using them operationally in daily risk reduction activities.”
