The Classification of Heart Murmurs: The Identification of Significant Time Domain Features
Keywords: PCG, Heart Murmur, Machine Learning, Classification, Time Domain, Feature Selection
A phonocardiogram (PCG) is an acoustic recording of heart sounds. PCG signals are stored as wave files, each capturing a particular type of heart sound over a fixed period. The difficulty of binary classification in supervised machine learning lies in the step-by-step workflow: the choices made at every stage must be stated explicitly so that misclassification is avoided. For heart murmur classification, data extraction requires particular care because recordings may be corrupted by external signals. Before murmurs can be classified into two classes, multiple candidate features must be evaluated to select those that classify heart murmurs best. Moreover, since classification performance is central to the conclusions, several models are needed to compare results. The main objective of this study is to classify murmurs using time-domain features of PCG signals. Significant time-domain features were identified using different feature selection methods, and different models were then compared to find the highest murmur classification accuracy. The dataset, the Michigan Heart Sound and Murmur database (MHSDB), was provided by the University of Michigan Health System; the chosen signals were recorded with the bell of the stethoscope at the apex area, with subjects in the left decubitus posture. The PCG signals were pre-processed by segmentation into three-second windows, downsampling to 8,000 Hz, and normalization to the range [-1, +1]. Ten features were extracted: Root Mean Square, Variance, Standard Deviation, Maximum, Minimum, Median, Skewness, Shape Factor, Kurtosis, and Mean. Two validation schemes, hold-out data splitting and k-fold cross-validation, were used to evaluate the models. Chi-square and ANOVA techniques were applied to identify the significant features and improve classification performance.
The classifiers compared were AdaBoost, Random Forest (RF), and Support Vector Machine (SVM). The dataset was split into training and testing data at ratios of 70:30 and 80:20. Chi-square was the best feature selection method, and the 80:20 split performed better than the 70:30 split. SVM gave the best classification accuracy, reaching 100% in every category except the 70:30 split on test data, where it reached 97.2%.
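The pre-processing and feature extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`normalize`, `segment`, `time_domain_features`) are hypothetical, the abstract does not give the feature formulas, and the sketch assumes population (divide-by-n) moments, non-excess kurtosis, and shape factor defined as RMS divided by mean absolute value. Downsampling is not shown; the input is assumed to be sampled at 8,000 Hz already.

```python
import math
import statistics


def normalize(signal):
    """Scale a signal so its largest absolute sample is 1 (range [-1, +1])."""
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal] if peak > 0 else list(signal)


def segment(signal, fs=8000, seconds=3):
    """Split into non-overlapping windows; an incomplete tail is dropped."""
    n = fs * seconds
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def time_domain_features(x):
    """Compute the ten time-domain features listed in the abstract.

    Assumption: population moments (divide by n), not sample moments.
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # variance
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    mean_abs = sum(abs(v) for v in x) / n
    nan = float("nan")  # returned when a ratio is undefined
    return {
        "rms": rms,
        "variance": m2,
        "std": math.sqrt(m2),
        "max": max(x),
        "min": min(x),
        "median": statistics.median(x),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else nan,
        "shape_factor": rms / mean_abs if mean_abs > 0 else nan,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else nan,
        "mean": mean,
    }
```

In use, each three-second window returned by `segment(normalize(signal))` would be passed to `time_domain_features`, yielding one ten-value feature row per window for the later feature selection and classification stages.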
Copyright (c) 2020 Wai Kit Cheng, Ismail Mohd Khairuddin, Anwar P.P. Abdul Majeed, Mohd Azraai Mohd Razman
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.