Machine learning model for prediction of malaria in low and high endemic areas of Tanzania
Abstract
Presumptive treatment and self-medication with anti-malaria drugs are common practices in
most resource-limited settings and hinder proper management of malaria. These approaches
are considered unreliable because they lead to unnecessary use of malaria medication and
leave other malaria-related diseases untreated. This study aimed to develop a machine-learning
model for malaria diagnosis using patients’ symptoms and non-symptomatic features in high
and low endemic areas of Tanzania. A malaria diagnosis dataset of 2556 patient records
and 36 features was collected in two regions of Tanzania, Morogoro and Kilimanjaro, from
2015 to 2019. Machine learning classifiers with k-fold cross-validation were used
to train and validate the model. To improve the performance of the diagnostic model, important
features for malaria diagnosis were selected, and it was observed that the ranking of features
differed between the two regions and in the combined dataset. The significant features selected
were residence area, fever, age, general body malaise, visit date, and headache. Random Forest and Decision
Tree algorithms were the best-performing classifiers in modelling the malaria diagnosis datasets
and attained 96%, 99% and 98% prediction accuracy for the Kilimanjaro, combined and Morogoro
datasets, respectively. These best-performing classifiers were evaluated on an unseen malaria
diagnosis dataset and performed well in distinguishing malaria patients from other sick patients. The
final developed model showed that only a specific combination of features can predict malaria
accurately. The results of this study revealed that malaria diagnosis using patients’ symptoms
and demographic features is possible. The results also offer additional knowledge and
shed light on the state of malaria diagnosis in the country. The developed machine learning
model enables prediction of a patient’s malaria status from observed symptoms and
non-symptomatic features before prescription of anti-malaria drugs. In addition, the output of
this study will be a necessary step in designing a malaria diagnosis decision support system
based on the developed model. Furthermore, to help reduce drug resistance, policymakers
and the Ministry of Health can use the results of this study to improve the management
of malaria in health facilities and drug dispensing outlets and to discourage self-medication and
presumptive treatment.
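
The k-fold cross-validation step mentioned above can be illustrated with a minimal sketch. It is
not the study's pipeline: the data are synthetic, the three binary features (fever, headache,
malaise) and every function name are assumptions made for illustration, and a one-feature
decision stump stands in for the Random Forest and Decision Tree classifiers so that the example
needs only the Python standard library.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_stump(rows, labels):
    """Pick the single binary feature (and polarity) that best matches the labels."""
    best = (-1, 0, False)  # (correct count, feature index, inverted?)
    for j in range(len(rows[0])):
        hits = sum(1 for r, y in zip(rows, labels) if r[j] == y)
        for correct, inverted in ((hits, False), (len(rows) - hits, True)):
            if correct > best[0]:
                best = (correct, j, inverted)
    _, feature, inverted = best
    return feature, inverted


def predict(model, row):
    """Predict 0/1 from the chosen feature, flipping it if the stump is inverted."""
    feature, inverted = model
    return 1 - row[feature] if inverted else row[feature]


def cross_validate(rows, labels, k=5, seed=0):
    """Train on k-1 folds, score accuracy on the held-out fold, repeat for each fold."""
    scores = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [i for i in range(len(rows)) if i not in held]
        model = fit_stump([rows[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


def make_synthetic(n=200, seed=1):
    """Synthetic records (an assumption): the label follows 'fever' with 10% noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        fever, headache, malaise = (rng.randint(0, 1) for _ in range(3))
        label = fever if rng.random() > 0.1 else 1 - fever
        rows.append([fever, headache, malaise])
        labels.append(label)
    return rows, labels


rows, labels = make_synthetic()
scores = cross_validate(rows, labels, k=5)
mean_accuracy = sum(scores) / len(scores)
print("per-fold accuracy:", [round(s, 2) for s in scores])
print("mean accuracy:", round(mean_accuracy, 2))
```

Each record is held out exactly once, so the mean of the per-fold accuracies estimates how the
classifier would perform on patients it has not seen. A final check on a separately held-out
dataset, as the study describes, remains a distinct step.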