Computational Analysis of Quran Text Using Machine Learning and Large Language Models

Shahid, Usama (ORCID: https://orcid.org/0009-0005-6360-333X), Hussain, Muhammad Zunnurain and Sayers, William (ORCID: https://orcid.org/0000-0003-1677-4409) (2025) Computational Analysis of Quran Text Using Machine Learning and Large Language Models. In: 2025 8th International Conference on Data Science and Machine Learning Applications (CDMA), 16-17 February 2025, Riyadh, Saudi Arabia. ISBN 979-8-3315-3969-6

Text (Peer-reviewed version)
14964 Shahid (2025) Computational analysis of Quran text (accepted version).pdf - Accepted Version
Available under License Creative Commons Attribution 4.0.

Text (Published version)
14964 Shahid, U., et al (2025) Computational Analysis of Quran Text Using Machine Learning and Large Language Models.pdf - Published Version
Restricted to Repository staff only
Available under License All Rights Reserved.

Abstract

The verses of the Quran are foundational for Muslims worldwide. Significant research has been dedicated to information retrieval (IR) from the Quran; however, multiple studies have focused on descriptive analysis and topic modelling of the Quran in Arabic and translated versions. This study presents a comprehensive framework for analysing large textual data using an English translation of the Quran. Initially, it conducts a descriptive analysis of the verses to uncover various features, including readability, word clouds, significant n-grams, and network graphs illustrating word associations. The framework then applies machine learning techniques, specifically clustering models based on numerical vectors from text-embedding-3-large, to identify effective groupings of verses. Additionally, GPT-4-turbo is used for topic modelling within each cluster through prompt engineering, aiming to enhance the understanding of these clusters. The results include statistical information graphs and concise knowledge summaries that are beneficial to both domain experts and the wider populace.
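
The two computational steps named in the abstract, n-gram extraction and clustering of embedding vectors, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `top_ngrams` and `kmeans` are hypothetical, k-means stands in for the unspecified "clustering models", the sample sentences are placeholders rather than verses, and the toy 2-D vectors stand in for the much larger vectors that text-embedding-3-large would return.

```python
from collections import Counter
from math import dist


def top_ngrams(texts, n=2, k=3):
    """Count word n-grams across all texts and return the k most common."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        # zip over n shifted copies of the word list yields consecutive n-grams.
        counts.update(zip(*(words[i:] for i in range(n))))
    return counts.most_common(k)


def kmeans(vectors, k, iterations=10):
    """Plain k-means. Centroids start at the first k vectors, which keeps the
    sketch deterministic; practical implementations use better seeding."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Placeholder sentences (assumption): not text from the translation in the paper.
sample_texts = ["the quick brown fox", "the quick red fox", "a slow brown dog"]

# Toy 2-D vectors (assumption) standing in for real embedding vectors.
toy_vectors = [(0.0, 0.1), (5.0, 5.1), (0.1, 0.0), (5.1, 5.0)]

bigrams = top_ngrams(sample_texts, n=2, k=1)
labels = kmeans(toy_vectors, k=2)
print(bigrams)
print(labels)
```

In a fuller pipeline, the texts grouped under each cluster label would then be passed to a language model prompt for topic labelling, as the abstract describes for GPT-4-turbo; that step is omitted here because it depends on an external API.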

Item Type: Conference or Workshop Item (Paper)
Subjects: Q Science > QA Mathematics > QA76 Computer software
Divisions: Schools and Research Institutes > School of Business, Computing and Social Sciences
Depositing User: Kamila Niekoraniec
Date Deposited: 11 Apr 2025 13:35
Last Modified: 24 Apr 2025 09:30
URI: https://eprints.glos.ac.uk/id/eprint/14964
