AN INTEGRATED HYBRID SOFT VOTING ENSEMBLE AI MODEL OF MACHINE LEARNING AND DEEP LEARNING MODELS FOR DIABETES PREDICTION
DOI:
https://doi.org/10.15282//ijsecs.11.2.2025.13.0145
Keywords:
Diabetes prediction, Machine learning, Deep learning, Soft voting classifier, AI-based medical diagnosis, Hybrid ensemble learning
Abstract
This study develops a hybrid prediction model that combines machine learning (ML) and deep learning (DL) methods to make diabetes prediction more accurate, generalizable, and robust. The approach integrates ML and DL models, addresses class imbalance in medical datasets, and evaluates performance on several datasets, including the Pima Indians Diabetes Dataset and the LMCH dataset, to assess how well it works in real-world healthcare settings. Two stacked ensembles were used: an ML ensemble comprising RF, LR, and XGBoost, and a DL ensemble comprising CNN, FNN, and ENN. Their outputs were aggregated by soft voting to improve prediction accuracy. The structured medical data were prepared with feature preprocessing techniques and the Synthetic Minority Over-sampling Technique (SMOTE). Cross-validation was used to check the reliability of the results and to guard against overfitting. Performance was compared against the individual models and standard methods. The hybrid ensemble model outperformed the traditional ML and DL models, with best reported metrics of 98.89% accuracy, 98.99% precision, 87.07% recall, 92.05% F1-score, 92.48% ROC-AUC, and 84.96% Cohen's Kappa, indicating better generalization and better handling of imbalanced datasets. These results suggest that stacking ensembles and combining them by soft voting improves diabetes prediction performance while mitigating class imbalance in medical datasets. The model's real-world applicability and generalizability suggest that it could support early and accurate diabetes detection, which could in turn inform preventive healthcare strategies.
License
Copyright (c) 2025 The Author(s)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



