Possibilities and limitations of using large language models (LLMs) for alert classification and prioritisation in security operations centers (SOCs)

Rieger, Matthias, Shah, Atif, Alam, Abu (ORCID: https://orcid.org/0000-0002-5958-7905) and Hossain, Md Jakir (2026) Possibilities and limitations of using large language models (LLMs) for alert classification and prioritisation in security operations centers (SOCs). Expert Systems with Applications, 331 (C). art:133194. doi:10.1016/j.eswa.2026.133194

Text (Published version, PDF): 16399 Alam (2026) Possibilities and limitations of using large language models.pdf
Available under License Creative Commons Attribution Non-commercial No Derivatives 4.0.

Abstract

As cyber threats have become more sophisticated, security operations centers (SOCs) have faced ever larger volumes of security alerts to investigate, leaving security analysts overwhelmed and producing symptoms such as alert fatigue. While traditional automation has helped streamline parts of the incident response workflow, it remains limited, especially for context-dependent tasks such as triage and prioritisation. Against this background, this research investigates the potential of large language models (LLMs) to augment SOC workflows through natural language understanding. Using a dataset of 178 manually labeled alerts, eight general-purpose LLMs from OpenAI, DeepSeek and Ai2 were tasked with independently classifying the alerts as true or false positives and prioritising them as low, medium, high or critical. In addition, traditional supervised machine learning baselines, including Logistic Regression, Random Forest and a Linear Support Vector Machine (SVM), were implemented for comparative evaluation on the binary classification task. Model performance was assessed using standard evaluation metrics (accuracy, precision, recall, F1-score and false positive rate) as well as operational factors such as runtime and cost per alert. Results show that while several LLMs achieved strong classification recall, lightweight machine learning models achieved competitive and, in some cases, superior binary classification performance, with the Linear SVM baseline achieving the highest overall F1-score. Alert prioritisation, however, proved substantially more challenging for all evaluated LLMs. While some models captured high-severity alerts with strong recall, precision remained consistently low, contributing to significant alert noise and elevated false positive rates.
These findings suggest that while LLMs can support SOC analysts with initial triage and contextual reasoning, their reliability for accurate prioritisation remains limited, and lightweight machine learning approaches continue to provide strong practical value for structured SOC alert classification tasks.
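The evaluation metrics named in the abstract follow the standard confusion-matrix definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 is their harmonic mean, and false positive rate = FP / (FP + TN). The sketch below is illustrative only and is not the authors' evaluation code. It assumes alerts are represented as plain string labels; the names `true_positive`/`false_positive`, the helper functions `binary_metrics` and `per_class_metrics`, and the toy data are hypothetical. For the four-level prioritisation task it scores each level one-vs-rest, which is one common way (assumed here) to obtain per-class precision and recall.

```python
"""Hypothetical sketch: confusion-matrix metrics for SOC alert triage.

Not the authors' evaluation code. Labels are illustrative strings:
binary task   -> "true_positive" / "false_positive" (is the alert real?)
priority task -> "low", "medium", "high", "critical"
"""

PRIORITY_LEVELS = ("low", "medium", "high", "critical")


def binary_metrics(y_true, y_pred, positive="true_positive"):
    """Return accuracy, precision, recall, F1 and false positive rate.

    `positive` names the class being scored; every other label counts
    as negative (i.e. one-vs-rest when more than two labels exist).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Confusion-matrix cells relative to the chosen positive class.
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0  # TP / (TP + FP)
    recall = tp / (tp + fn) if tp + fn else 0.0     # TP / (TP + FN)
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0        # FP / (FP + TN)
    accuracy = (tp + tn) / n if n else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "fpr": fpr}


def per_class_metrics(y_true, y_pred, labels=PRIORITY_LEVELS):
    """Score each priority level one-vs-rest with binary_metrics."""
    return {label: binary_metrics(y_true, y_pred, positive=label)
            for label in labels}


if __name__ == "__main__":
    # Toy labels for illustration only, not data from the study.
    truth = ["low", "low", "medium", "high", "critical", "low"]
    pred = ["high", "high", "medium", "high", "critical", "high"]
    for level, m in per_class_metrics(truth, pred).items():
        print(level, {k: round(v, 2) for k, v in m.items()})
```

With these toy labels, a model that over-predicts `high` gets recall 1.0 but precision 0.25 for that level (one correct `high` among four `high` predictions), which is the kind of high-recall, low-precision pattern the abstract associates with alert noise.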

Item Type: Article
Article Type: Article
Uncontrolled Keywords: Security operations center; SOC; Large language models; LLM; Alert triage; Alert prioritisation; Alert classification; Incident response; LLM aided triage; Threat detection
Subjects: Q Science > Q Science (General) > Q336 Artificial intelligence
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Schools and Research Institutes > School of Business, Computing and Social Sciences
Depositing User: Kamila Niekoraniec
Date Deposited: 30 Jun 2026 09:19
Last Modified: 30 Jun 2026 09:30
URI: https://eprints.glos.ac.uk/id/eprint/16399

University of Gloucestershire, The Park, Cheltenham, Gloucestershire, GL50 2RH. Telephone +44 (0)844 8010001.