Sample Reduction for Physiological Data Analysis Using Principal Component Analysis in Artificial Neural Network

Adolfo, Cid Mathew, Chizari, Hassan ORCID: 0000-0002-6253-1822, Win, Thu Yein ORCID: 0000-0002-4977-0511 and Al-Majeed, Salah ORCID: 0000-0002-5932-9658 (2021) Sample Reduction for Physiological Data Analysis Using Principal Component Analysis in Artificial Neural Network. Applied Sciences, 11 (17). Art 8240. doi:10.3390/app11178240

Text (Peer Reviewed Version)
10132 Adolfo, Chizari, Win and Al-Majeed (2021) Sample-reduction-for-physiological-data-analysis-using-principal-component-analysis-in-artificial-neural-network.pdf - Accepted Version
Available under License Creative Commons Attribution 4.0.


Abstract

Extensive data analysis is a vital part of biomedical applications and of medical practitioners' interpretations, as it helps ensure the integrity of multidimensional datasets and improves classification accuracy. In machine learning, however, the integrity of the sources is compromised when the acquired data contain noisy and biased samples, which pose a significant threat to the diagnosis and analysis of such information. Removing noisy samples from dirty datasets is crucial in biomedical applications such as classification and prediction with artificial neural networks (ANNs) in the analysis of the body's physiological signals. In this study, we developed a methodology that identifies and removes noisy data from a dataset before it is used in an ANN classification problem, proposing the principal component analysis–sample reduction process (PCA–SRP) as a data-cleaning agent to improve ANN performance. We first discuss the theoretical background of this data-cleaning methodology for ANN classification. We then show how PCA is used for data cleaning through the SRP, using several publicly available biomedical datasets with different sample and feature sizes. Finally, the cleaned datasets were evaluated with four tests: PCA–SRP accuracy comparison testing in the ANN, sensitivity vs. specificity testing, receiver operating characteristic (ROC) curve testing, and accuracy vs. additional random sample testing. The results show a significant improvement in ANN classification with the developed methodology and suggest a recommended range of selectivity (Sc) factors for typical cleaning and ANN applications. Our approach, implemented in Python, successfully cleaned the noisy multidimensional biomedical datasets and yielded an increase in accuracy of up to 8%.
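The abstract does not spell out the PCA–SRP algorithm or how the selectivity factor Sc is defined, so the following is only a rough illustration of the general idea of PCA-based sample reduction, not the authors' method. It assumes that a sample counts as "noisy" when its standardized distance in the space of the leading principal components exceeds the mean distance plus `sc` standard deviations. The function name `pca_sample_reduction` and its parameters `n_components` and `sc` are hypothetical, and `sc` only stands in for the paper's Sc factor.

```python
import numpy as np


def pca_sample_reduction(X, n_components=2, sc=2.0):
    """Hypothetical PCA-based sample reduction (not the authors' exact PCA-SRP).

    Projects samples onto the leading principal components, measures each
    sample's standardized distance from the centre of the score space, and
    drops samples whose distance exceeds mean + sc * std of all distances.
    Here `sc` is an assumed stand-in for a selectivity factor: larger values
    remove fewer samples.
    """
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    # Centre each feature so the principal components pass through the data mean.
    Xc = X - X.mean(axis=0)
    # Thin SVD: rows of Vt are the principal axes, ordered by explained variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    scores = Xc @ Vt[:k].T
    # Standard deviation of each component's scores; guard against zero variance.
    comp_sd = S[:k] / np.sqrt(n_samples - 1)
    comp_sd = np.where(comp_sd > 0, comp_sd, 1.0)
    # Mahalanobis distance restricted to the first k principal components.
    dist = np.sqrt(np.sum((scores / comp_sd) ** 2, axis=1))
    threshold = dist.mean() + sc * dist.std()
    keep = dist <= threshold
    return X[keep], keep


# Usage: synthetic data with five grossly shifted samples at the start.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:5] += 50.0  # inject five obviously noisy samples
X_clean, keep = pca_sample_reduction(X, n_components=2, sc=2.0)
print(X_clean.shape, keep[:5])
```

In a pipeline like the one the abstract describes, `X_clean` (with the matching labels selected by `keep`) would then be passed to the ANN classifier; sweeping `sc` trades off how aggressively samples are removed against how much training data remains.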

Item Type: Article
Article Type: Article
Uncontrolled Keywords: Principal Component Analysis (PCA); Artificial Neural Network (ANN); Multidimensional Dataset; Dimension Reduction Process; Sample Reduction Process (SRP); Receiver Operating Characteristic (ROC) Curve; Selectivity (Sc); Sensitivity; Specificity
Subjects: H Social Sciences > HF Commerce > HF5001 Business
R Medicine > R Medicine (General)
Divisions: Schools and Research Institutes > Gloucestershire Business School
Research Priority Areas: Applied Business & Technology
Depositing User: Kate Greenaway
Date Deposited: 09 Sep 2021 16:10
Last Modified: 09 Sep 2021 16:15
URI: http://eprints.glos.ac.uk/id/eprint/10132

University Of Gloucestershire

University of Gloucestershire, The Park, Cheltenham, Gloucestershire, GL50 2RH. Telephone +44 (0)844 8010001.