4/2/2022

Introduction

  • Diabetes is a chronic metabolic disease characterized by elevated blood glucose levels, which over time cause serious damage to the heart, blood vessels, eyes, kidneys and nerves.

  • The most common form is type 2 diabetes, usually seen in adults, which occurs when the body becomes resistant to insulin or doesn’t make enough insulin.

  • About 422 million people worldwide have diabetes, the majority living in low- and middle-income countries, and 1.5 million deaths are directly attributed to diabetes each year.

  • Both the number of cases and the prevalence of diabetes have been steadily increasing over the past few decades.

  • Better screening tools are needed to detect diabetes at an early stage and reduce morbidity and mortality.

Our Predictor Tool

  • A Random Forest model was trained on the Early Stage Diabetes Risk Prediction Dataset from Kaggle.

  • The Web App has a sidebar panel, in which the user inputs the corresponding patient data. The model takes this input and displays a prediction in the main panel, along with the probability of being classified as Positive or Negative.

  • Instructions for the use of the App can be found in the [documentation].

  • The script that generates the training and testing data sets, the model and its corresponding confusion matrix can be found in the GitHub repository.
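
The workflow described above (train a Random Forest, then return a predicted class and its class probabilities for one patient) can be sketched roughly as follows. This is a minimal illustration only, not the app's actual script: it assumes the `randomForest` package is installed, uses synthetic data, and the column names `age`, `symptomA` and `symptomB` are hypothetical stand-ins rather than the real dataset's schema.

```r
# Minimal sketch: random forest classification with class probabilities.
# Assumption: the randomForest package; the app's real script may use other tooling.
library(randomForest)

set.seed(42)

# Synthetic stand-in data. Column names are hypothetical, not the real schema.
n <- 200
toy <- data.frame(
  age      = sample(20:80, n, replace = TRUE),
  symptomA = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  symptomB = factor(sample(c("No", "Yes"), n, replace = TRUE))
)

# Make the label depend on the symptoms so the model has something to learn.
score <- (toy$symptomA == "Yes") + (toy$symptomB == "Yes") + rnorm(n, sd = 0.5)
toy$class <- factor(ifelse(score > 1, "Positive", "Negative"),
                    levels = c("Negative", "Positive"))

# Hold out 20% of the rows as a testing set.
test_idx <- sample(n, size = 0.2 * n)
train <- toy[-test_idx, ]
test  <- toy[test_idx, ]

model <- randomForest(class ~ ., data = train, ntree = 200)

# One "patient" row, standing in for the values entered in the sidebar.
patient <- test[1, c("age", "symptomA", "symptomB")]

pred_class <- predict(model, newdata = patient, type = "response")  # predicted label
pred_prob  <- predict(model, newdata = patient, type = "prob")      # per-class probabilities

print(pred_class)
print(pred_prob)
```

With `type = "prob"`, `predict` returns one column per class, which is the kind of output the main panel would display next to the predicted label.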

Implementing machine learning models as decision support systems for health care providers could help with screening and early identification of patients who are at risk of diabetes or in its early stages, before complications occur.

DISCLAIMER: it is important to note that this is not a diagnostic tool, and should not be used to establish or exclude a diagnosis.

Confusion Matrix

These are the confusion matrix results for our model's predictions on the testing set.

confusionMat$table
##           Reference
## Prediction Negative Positive
##   Negative       39        1
##   Positive        1       63
as.matrix(confusionMat$byClass, dimnames = TRUE)[c(1:4, 8), ]
##    Sensitivity    Specificity Pos Pred Value Neg Pred Value     Prevalence 
##      0.9843750      0.9750000      0.9843750      0.9750000      0.6153846
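
As a cross-check, the reported statistics can be recomputed by hand from the four cell counts. The sketch below uses base R only and assumes "Positive" is the positive class, which is consistent with the values printed above; the variable names are illustrative.

```r
# Confusion matrix from the testing set (rows = prediction, columns = reference).
# matrix() fills column by column, so the first two values are the "Negative" reference column.
cm <- matrix(c(39, 1, 1, 63), nrow = 2,
             dimnames = list(Prediction = c("Negative", "Positive"),
                             Reference  = c("Negative", "Positive")))

tp <- cm["Positive", "Positive"]  # predicted Positive, truly Positive
tn <- cm["Negative", "Negative"]  # predicted Negative, truly Negative
fp <- cm["Positive", "Negative"]  # predicted Positive, truly Negative
fn <- cm["Negative", "Positive"]  # predicted Negative, truly Positive

metrics <- c(
  Sensitivity    = tp / (tp + fn),         # share of true positives detected
  Specificity    = tn / (tn + fp),         # share of true negatives detected
  PosPredValue   = tp / (tp + fp),         # share of positive predictions that are correct
  NegPredValue   = tn / (tn + fn),         # share of negative predictions that are correct
  Prevalence     = (tp + fn) / sum(cm),    # share of positives in the testing set
  Accuracy       = (tp + tn) / sum(cm)     # share of all predictions that are correct
)

print(round(metrics, 7))
```

The testing set holds 104 patients, 64 of them positive. Sensitivity and positive predictive value both equal 63/64, specificity and negative predictive value both equal 39/40, and overall accuracy is 102/104, or about 0.981.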

References



  1. Islam, M. M. Faniqul, et al. "Likelihood prediction of diabetes at early stage using data mining techniques." Computer Vision and Machine Intelligence in Medical Image Analysis. Springer, Singapore, 2020, pp. 113-125.
  2. World Health Organization. "Diabetes."