Covid-19 Comparative Data Analysis Using Various Machine Learning Regression Algorithms
Abstract
In this study, the authors investigated whether machine learning (ML) algorithms can forecast the spread of COVID-19, using three distinct datasets: confirmed, deceased, and recovered cases. Eight ML techniques were used to build regression models, and the Root Mean Squared Error (RMSE) was used to assess their accuracy. Some algorithms performed better than others on specific datasets: linear regression, ridge regression, Bayesian ridge regression, and least angle regression were the most accurate for predicting confirmed and recovered cases, while decision tree regression and random forest regression were the best for deceased cases. These results show that ML can identify patterns and trends in virus transmission, which supports more precise predictions about its future course. The data used to train the models must, however, be trustworthy and must accurately reflect the actual situation. Data quality and accessibility significantly affect model accuracy, particularly for COVID-19, where accurate data collection and exchange have been challenging. Overall, the findings contribute to the growing body of knowledge on applying ML to healthcare and disease forecasting, and may provide useful information for policy-makers and healthcare professionals.
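To make the evaluation criterion concrete, the following is a minimal sketch of how candidate regression models can be ranked by RMSE on held-out data. It is not the authors' pipeline: the case counts are synthetic, the chronological train/test split is an assumption, and the helper names (`rmse`, `fit_line`) are hypothetical. Only two illustrative models are compared, an ordinary least squares line and a constant mean baseline.

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic cumulative counts (illustrative only, not the study's data).
days = list(range(10))
cases = [100, 118, 141, 159, 182, 200, 221, 238, 262, 279]

# Assumed chronological split: train on the first 7 days, test on the last 3.
train_x, test_x = days[:7], days[7:]
train_y, test_y = cases[:7], cases[7:]

slope, intercept = fit_line(train_x, train_y)
linear_pred = [slope * x + intercept for x in test_x]
baseline_pred = [sum(train_y) / len(train_y)] * len(test_x)

# Lower RMSE means the predictions are closer to the observed counts.
scores = {
    "linear": rmse(test_y, linear_pred),
    "mean_baseline": rmse(test_y, baseline_pred),
}
best = min(scores, key=scores.get)
print(scores, best)
```

The same comparison extends to any number of models: each one adds an entry to `scores`, and the model with the lowest RMSE on the held-out data is preferred for that dataset.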