Imputation of new COVID-19 cases missing data using basic statistical methods

Authors

  • Nor Zila binti Abd Hamid, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
  • Aswani Ahmad Zambri, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
  • Nurul Bahiyah binti Abd Wahid, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
  • Nur Hamiza binti Adenan, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
  • Nor Hafizah Binti Md Husin, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
  • Noor Wahida binti Md. Junus, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
  • Nor Suriya binti Abd Karim, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia
  • Rawdah Adawiyah binti Tarmizi, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900, Tanjong Malim, Perak, Malaysia

DOI:

https://doi.org/10.15282/daam.v5i2.9705

Keywords:

Imputation method, Missing data, Basic statistical method, COVID-19 cases

Abstract

This study imputes missing values in the daily number of new COVID-19 cases in the Malaysian states of Kedah and Selangor using basic statistical methods, and compares those methods using performance indices. Four methods are applied: Linear Interpolation, Top Bottom Average, 7-Day Average, and 14-Day Average. The time series consists of the daily number of new COVID-19 cases recorded in each state over the 365 days of 2021. To simulate missing data, 10% and 20% of the observations were removed at random, and the removed values were then imputed with each of the four methods. Performance was compared using mean absolute error (MAE), root mean square error (RMSE), and the correlation coefficient (CC). Overall, Linear Interpolation and 14-Day Average were found to be suitable basic statistical methods for imputing the missing data. These findings suggest that basic statistical techniques can help the Malaysian Ministry of Health (MOH) fill gaps in data on new COVID-19 cases in future initiatives.
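
The workflow summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract names the four methods but does not define them, so the definitions here are assumptions: Top Bottom Average is taken as the mean of the nearest observed values before and after the gap, and the 7-Day and 14-Day Averages as the mean of the observed values in the preceding 7 or 14 days. The function names, the synthetic series, and the choice to score only the removed points are likewise hypothetical.

```python
import math
import random


def nearest_observed(series, i, step):
    """Return (index, value) of the closest observed point from i in direction step, or None."""
    j = i + step
    while 0 <= j < len(series):
        if series[j] is not None:
            return j, series[j]
        j += step
    return None


def linear_interpolation(series, i):
    """Value on the straight line joining the nearest observed neighbours of i."""
    left, right = nearest_observed(series, i, -1), nearest_observed(series, i, 1)
    if left is None or right is None:
        return (left or right)[1]  # at the edges, reuse the only neighbour
    (j0, y0), (j1, y1) = left, right
    return y0 + (y1 - y0) * (i - j0) / (j1 - j0)


def top_bottom_average(series, i):
    """Assumed definition: mean of the nearest observed values before and after i."""
    pair = (nearest_observed(series, i, -1), nearest_observed(series, i, 1))
    values = [p[1] for p in pair if p is not None]
    return sum(values) / len(values)


def window_average(series, i, days):
    """Assumed definition: mean of the observed values in the `days` days before i."""
    window = [v for v in series[max(0, i - days):i] if v is not None]
    if not window:  # nothing observed in the window, fall back
        return top_bottom_average(series, i)
    return sum(window) / len(window)


METHODS = {
    "Linear Interpolation": linear_interpolation,
    "Top Bottom Average": top_bottom_average,
    "7-Day Average": lambda s, i: window_average(s, i, 7),
    "14-Day Average": lambda s, i: window_average(s, i, 14),
}


def mae(actual, estimate):
    return sum(abs(a - e) for a, e in zip(actual, estimate)) / len(actual)


def rmse(actual, estimate):
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, estimate)) / len(actual))


def correlation(actual, estimate):
    """Pearson correlation coefficient; undefined if either input is constant."""
    n = len(actual)
    ma, me = sum(actual) / n, sum(estimate) / n
    cov = sum((a - ma) * (e - me) for a, e in zip(actual, estimate))
    va = sum((a - ma) ** 2 for a in actual)
    ve = sum((e - me) ** 2 for e in estimate)
    return cov / math.sqrt(va * ve)


def evaluate(full, fraction, seed=0):
    """Remove `fraction` of the points at random, impute them, score each method."""
    rng = random.Random(seed)
    removed = sorted(rng.sample(range(len(full)), int(fraction * len(full))))
    hidden = set(removed)
    masked = [None if i in hidden else v for i, v in enumerate(full)]
    actual = [full[i] for i in removed]
    scores = {}
    for name, method in METHODS.items():
        estimate = [method(masked, i) for i in removed]
        scores[name] = (mae(actual, estimate), rmse(actual, estimate),
                        correlation(actual, estimate))
    return scores


# Synthetic 365-day series for illustration only (NOT the study's data).
synthetic = [100 + 50 * math.sin(t / 20) + 3 * (t % 7) for t in range(365)]
for fraction in (0.10, 0.20):
    for name, (m, r, c) in evaluate(synthetic, fraction).items():
        print(f"{int(fraction * 100)}% removed | {name}: MAE={m:.2f} RMSE={r:.2f} CC={c:.3f}")
```

Lower MAE and RMSE and a CC closer to 1 indicate a better imputation. The numbers this sketch prints depend entirely on the synthetic series and the random seed, so they say nothing about the study's reported results.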

Published

2024-09-30

Section

Research Articles

How to Cite

[1]
N. Z. binti A. Hamid, “Imputation of new COVID-19 cases missing data using basic statistical methods”, Data Anal. Appl. Math., vol. 5, no. 2, pp. 23–27, Sep. 2024, doi: 10.15282/daam.v5i2.9705.
