Abstract:
Cardiovascular disease prediction aids practitioners in making more accurate health decisions for their patients. Early detection
can aid people in making lifestyle changes and, if necessary, ensuring effective medical care. Machine learning (ML) is a
plausible approach for understanding and reducing the symptoms of heart disease. The chi-square statistical test is performed to
select specific attributes from the Cleveland heart disease (HD) dataset. Support vector machine (SVM), Gaussian Naive Bayes,
logistic regression, LightGBM, XGBoost, and random forest algorithms were employed to develop a heart disease risk
prediction model, obtaining accuracies of 80.32%, 78.68%, 80.32%, 77.04%, 73.77%, and 88.5%, respectively. Data
visualizations were generated to illustrate the relationships between the features. According to the findings of the
experiments, the random forest algorithm achieves the highest accuracy, 88.5%, during validation on 303 data instances with 13 selected
features of the Cleveland HD dataset.
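
As a rough illustration of the chi-square feature scoring mentioned above, the sketch below computes the classical Pearson chi-square statistic from a contingency table of a categorical feature against the class label, then ranks features by that score. It is only a minimal sketch: the abstract does not state which chi-square implementation was used, the helper names (chi_square_score, rank_features) are hypothetical, and the toy columns are made-up values, not records from the Cleveland HD dataset.

```python
from collections import Counter


def chi_square_score(feature, target):
    """Pearson chi-square statistic for independence of two categorical sequences."""
    n = len(feature)
    observed = Counter(zip(feature, target))  # joint counts per (feature value, class)
    feature_totals = Counter(feature)         # marginal counts per feature value
    target_totals = Counter(target)           # marginal counts per class
    stat = 0.0
    for f_val, f_count in feature_totals.items():
        for t_val, t_count in target_totals.items():
            expected = f_count * t_count / n  # count expected under independence
            stat += (observed[(f_val, t_val)] - expected) ** 2 / expected
    return stat


def rank_features(columns, target):
    """Return (name, score) pairs sorted from most to least associated with the target."""
    scores = {name: chi_square_score(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy data (assumption): 8 patients, binary disease label.
target = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "chest_pain": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly associated with the label
    "sex": [1, 0, 1, 0, 1, 0, 1, 0],         # independent of the label
}

ranking = rank_features(columns, target)
for name, score in ranking:
    print(f"{name}: chi-square = {score:.2f}")
```

A higher score indicates a stronger association with the label, so a selection step would keep the top-ranked features. Continuous attributes would need to be binned before this statistic applies, and library routines may define the score differently (for example, by treating feature values as counts), so the numbers here are not comparable to the reported results.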