Digital Quran Computing Algorithms and Applications

Authors

  • Amro Badawy, Department of Computer Science, Faculty of Computer and Informatics, Zagazig University, Zagazig 44519, Egypt
  • Ahmed Salah, Department of Computer Science, Faculty of Computer and Informatics, Zagazig University, Zagazig 44519, Egypt
  • Mahmoud Mahdy, Department of Computer Science, Faculty of Computer and Informatics, Zagazig University, Zagazig 44519, Egypt

Keywords

Quran, Authentication, Classification, Topic Analysis, Quranic Text Mining, LDA Topic Modeling, TF-IDF, K-means Clustering, Thematic Analysis, Digital Humanities

Abstract

The computational analysis of the Quran presents unique challenges due to its linguistic complexity, thematic richness, and cultural significance. This paper explores advanced text mining techniques to uncover thematic structures in three Surahs: Al-Kahf, An-Naml, and Al-Baqarah. Two primary methodologies are employed: (1) Latent Dirichlet Allocation (LDA) for topic modeling, applied to Surahs Al-Kahf and An-Naml to extract latent themes such as faith, morality, and divine guidance, validated by Quranic scholars; and (2) a TF-IDF/UMAP/K-means clustering pipeline for Surah Al-Baqarah, which identifies semantically coherent thematic groups (e.g., "Divine Law and Guidance," "Faith and Belief") through dimensionality reduction and unsupervised learning. Key findings demonstrate the efficacy of these methods in bridging traditional exegesis (tafsir) with data-driven approaches, revealing nuanced thematic interconnections and quantitatively validating known Quranic structures (e.g., a 10-cluster solution for Al-Baqarah with a silhouette score of 0.21). The paper contributes to digital Quranic studies by providing reproducible frameworks for thematic analysis, addressing challenges in Arabic text preprocessing, and highlighting the potential of hybrid computational-hermeneutic methodologies. This work advances the interdisciplinary dialogue between Islamic scholarship and modern NLP, offering tools for scalable, objective analysis while preserving theological nuance. For Surahs Al-Kahf and An-Naml, the LDA analysis identifies five key themes per Surah, such as "Stories of Prophets" and "Divine Parables," which align with classical exegetical classifications. Preprocessing steps, including diacritic removal, stop-word filtering, and tokenization, were tailored to Quranic Arabic, ensuring linguistic fidelity. Expert validation confirmed the theological coherence of extracted topics, demonstrating LDA’s utility in augmenting manual tafsir with scalable, data-driven insights.
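
The preprocessing steps named in the abstract (diacritic removal, tokenization, stop-word filtering) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the abstract does not detail. The function names, the token pattern, and the very short stop-word list are hypothetical choices made for illustration, and only the Python standard library is used.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stop-word list (undiacritized forms).
# A real study would use a curated list suited to Quranic Arabic.
STOP_WORDS = {"من", "في", "على", "إلى", "عن", "ما", "لا"}

TATWEEL = "\u0640"  # elongation character; carries no lexical meaning

# Assumption: a token is a run of Arabic letters (U+0621..U+064A) or
# alef wasla (U+0671), which is common in Uthmani-script editions.
TOKEN_RE = re.compile(r"[\u0621-\u064A\u0671]+")


def strip_diacritics(text: str) -> str:
    """Drop nonspacing marks (harakat, Quranic annotation signs) and tatweel."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Mn" and ch != TATWEEL
    )


def tokenize(text: str) -> list[str]:
    """Split text into Arabic-letter tokens, ignoring digits and punctuation."""
    return TOKEN_RE.findall(text)


def preprocess_verse(verse: str) -> list[str]:
    """Apply diacritic removal, tokenization, then stop-word filtering."""
    tokens = tokenize(strip_diacritics(verse))
    return [tok for tok in tokens if tok not in STOP_WORDS]


if __name__ == "__main__":
    # Illustrative phrase only; prints the tokens that survive filtering.
    print(preprocess_verse("مِنَ الْكِتَابِ"))
```

One simplification to keep in mind: dropping every nonspacing mark also removes the combining hamza (U+0654) and the superscript alef (U+0670), which carry letter-level information in Uthmani orthography. A pipeline that needs them would have to handle those marks separately.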

Published

2025-06-23

How to Cite

Badawy, A., Salah, A., & Mahdy, M. (2025). Digital Quran Computing Algorithms and Applications. International Journal of Computers and Informatics (Zagazig University), 7, 79–90. Retrieved from http://www.ijci.zu.edu.eg/index.php/ijci/article/view/106