Phishing URL detection and interpretability with machine learning: a cross‐dataset approach

Yi, Liyan, Omotosho, Adebayo (ORCID: https://orcid.org/0000-0002-1642-7610) and Balogun, Hamed (2026) Phishing URL detection and interpretability with machine learning: a cross‐dataset approach. Security and Privacy, 9 (1). e70175. doi:10.1002/spy2.70175

Text (Published version)
15717 Omotosho (2026) Phishing URL Detection and Interpretability.pdf - Published Version
Available under License Creative Commons Attribution 4.0.

Abstract

Phishing attacks pose a significant security threat, particularly through deceptive emails designed to trick users into clicking on malicious links, with phishing URLs often serving as the primary indicator of such attacks. This paper presents a machine learning approach for detecting phishing email attacks by analyzing the URLs embedded within these emails, using Random Forest, eXtreme Gradient Boosting, and Light Gradient Boosting Machine models. Secondary datasets are used to evaluate model behavior and examine the applicability of model features across different samples. The models are assessed using metrics such as accuracy, precision, and recall to demonstrate their effectiveness in distinguishing between benign and malicious email URLs. The SHapley Additive exPlanations (SHAP) framework is employed to interpret the models' decision‐making processes and reinforce the relevance and reliability of key features. Our results show that across four test sets, the three models achieve an average classification error of 4.03% and an average accuracy of 94%, indicating strong generalization capability across diverse datasets using the same set of features.
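
For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, classification error, precision, and recall are conventionally computed for a binary phishing/benign classifier. It is not the authors' code: the `ConfusionCounts` class, the `evaluate` function, and the counts in the usage note are hypothetical and chosen only for illustration, and the paper may aggregate its averages differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary confusion-matrix counts."""
    tp: int  # phishing URLs correctly flagged as phishing
    fp: int  # benign URLs wrongly flagged as phishing
    tn: int  # benign URLs correctly passed as benign
    fn: int  # phishing URLs wrongly passed as benign


def evaluate(c: ConfusionCounts) -> dict:
    """Return standard binary classification metrics as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    if total == 0:
        raise ValueError("at least one sample is required")
    accuracy = (c.tp + c.tn) / total
    # Guard against empty denominators (no positive predictions / no positives).
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    return {
        "accuracy": accuracy,
        "error": 1.0 - accuracy,  # classification error on the same samples
        "precision": precision,
        "recall": recall,
    }
```

As a usage example with made-up counts, `evaluate(ConfusionCounts(tp=45, fp=5, tn=47, fn=3))` computes each metric over 100 samples. Precision penalizes benign URLs flagged as phishing, while recall penalizes phishing URLs that slip through, which is why the two are usually reported alongside accuracy.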

Item Type: Article
Article Type: Article
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software > QA76.758 Software engineering
Divisions: Schools and Research Institutes > School of Business, Computing and Social Sciences
Depositing User: Rhiannon Goodland
Date Deposited: 05 Jan 2026 09:23
Last Modified: 07 Jan 2026 16:15
URI: https://eprints.glos.ac.uk/id/eprint/15717
