Rieger, Matthias, Shah, Atif, Alam, Abu ORCID: https://orcid.org/0000-0002-5958-7905 and Hossain, Md Jakir
(2026)
Possibilities and limitations of using large language models (LLMs) for alert classification and prioritisation in security operations centers (SOCs).
Expert Systems with Applications, 331 (C).
art:133194.
doi:10.1016/j.eswa.2026.133194
Text (Published version): 16399 Alam (2026) Possibilities and limitations of using large language models.pdf. Available under License Creative Commons Attribution Non-commercial No Derivatives 4.0.
Abstract
As cyber threats have become more sophisticated over time, security operations centers (SOCs) have increasingly faced vast amounts of security alerts to be investigated, ultimately leading to overwhelmed security analysts and symptoms such as alert fatigue. While traditional automation has helped with streamlining parts of the incident response workflow, it remains limited, especially in regard to context-dependent tasks such as triage and prioritisation. Against this background, this research investigates the potential of large language models (LLMs) to augment SOC workflows through natural language understanding. Using a dataset of 178 manually labeled alerts, eight general-purpose LLMs from OpenAI, DeepSeek and Ai2 were tasked with independently classifying the alerts into true and false positives as well as prioritising them as low, medium, high or critical. In addition, traditional supervised machine learning baselines, including Logistic Regression, Random Forest and Linear Support Vector Machine (SVM), were implemented for comparative evaluation on the binary classification task. The performance of the models was assessed using standard evaluation metrics such as accuracy, precision, recall, F1-score and false positive rates as well as operational factors like runtime and cost per alert. Results show that while several LLMs achieved strong classification recall, lightweight machine learning models achieved competitive and, in some cases, superior binary classification performance, with the Linear SVM baseline achieving the highest overall F1-score. However, alert prioritisation proved substantially more challenging across all evaluated LLMs. While some models captured high-severity alerts with strong recall, precision remained consistently low, contributing to significant alert noise and elevated false positive rates. These findings suggest that while LLMs are able to support SOC analysts with initial triage and contextual reasoning, their reliability for accurate prioritisation remains limited, and lightweight machine learning approaches continue to provide strong practical value for structured SOC alert classification tasks.
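The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that label 1 marks a true-positive alert (a real threat) and label 0 a false-positive alert (benign), and that a zero denominator yields 0.0. The names `BinaryMetrics` and `binary_metrics` and the toy labels are hypothetical and do not come from the paper's dataset.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float
    false_positive_rate: float


def _ratio(numerator: float, denominator: float) -> float:
    # Assumption: report 0.0 when a metric is undefined (zero denominator).
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Compute standard binary classification metrics from 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # real threat, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, dismissed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # real threat, missed

    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, len(pairs)),
        precision=precision,
        recall=recall,
        f1=_ratio(2 * precision * recall, precision + recall),
        false_positive_rate=_ratio(fp, fp + tn),
    )


if __name__ == "__main__":
    # Toy labels for illustration only; not taken from the study.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 1, 0, 0, 0, 1]
    print(binary_metrics(truth, predicted))
```

The pattern the abstract reports for prioritisation, strong recall with low precision, corresponds here to few missed threats (`fn`) but many benign alerts flagged (`fp`), which also raises the false positive rate.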
| Item Type: | Article |
|---|---|
| Article Type: | Article |
| Uncontrolled Keywords: | Security operations center; SOC; Large language models; LLM; Alert triage; Alert prioritisation; Alert classification; Incident response; LLM aided triage; Threat detection |
| Subjects: | Q Science > Q Science (General) > Q336 Artificial intelligence; Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| Divisions: | Schools and Research Institutes > School of Business, Computing and Social Sciences |
| Depositing User: | Kamila Niekoraniec |
| Date Deposited: | 30 Jun 2026 09:19 |
| Last Modified: | 30 Jun 2026 09:30 |
| URI: | https://eprints.glos.ac.uk/id/eprint/16399 |