A Quick and Easy Diabetes Test for Women

Frankie Ragnet
18/11/2015

Motivation

The goal of this project was to create a simple diabetes test for women that can be used in remote areas. The test had to meet the following conditions:

  • Easy to run
  • Reliable, based on state-of-the-art machine learning techniques
  • Usable even in remote areas, with limited medical equipment

The result is a simple machine-learning prediction algorithm, available online, that relies only on easy-to-collect parameters.

Approach (1/2)

We used the diabetes data collected on Pima Indian women (Pima.tr data set, MASS library):

Variable  Description
npreg     number of pregnancies
glu       plasma glucose concentration in an oral glucose tolerance test
bp        diastolic blood pressure (mm Hg)
skin      triceps skin fold thickness (mm)
bmi       body mass index (weight in kg / (height in m)^2)
ped       diabetes pedigree function
age       age of the patient (years)
type      diabetes diagnosis according to WHO criteria (Yes or No)

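We trained a boosted tree model on all variables and plotted the variable importance:

library(caret)  # train(), varImp()
library(MASS)   # Pima.tr data set
# Without a method argument, train() defaults to a random forest ("rf");
# method = "gbm" requests the boosted tree model described above.
modF <- train(type ~ ., data = Pima.tr, method = "gbm", verbose = FALSE)
plot(varImp(modF))  # importance of each predictor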

[Figure: variable importance plot for modF]

Approach (2/2)

We weighed two criteria for each parameter: (1) variable importance (see the graph on the previous slide) and (2) ease of collection. This gave the following shortlist for our final prediction application:

  • glu (easy oral test, very high impact on prediction)
  • age and bmi, both easy to collect (bmi is not measured directly, but is easily computed from the patient's height and weight)
  • npreg (fairly low importance, but very easy to collect)

We did not include ped or skin: they had some influence on the model, but they are difficult to measure and added little information to the final prediction.

Likewise, bp is easy to collect but had no impact on the model, so we did not include it either.
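
As a rough illustration of the shortlist above, the reduced model could be fitted and checked as follows. This is a minimal sketch, not the application's actual code: it assumes that the boosted tree model corresponds to caret's "gbm" method, the name modS and the seed are arbitrary, and the held-out check on Pima.te (the test set shipped with MASS) is an addition for illustration.

```r
library(caret)  # train(), predict()
library(MASS)   # Pima.tr (training) and Pima.te (test) data sets

set.seed(2015)  # arbitrary seed, only for reproducibility

# Assumption: "gbm" is the caret method behind the boosted tree model.
# Only the four shortlisted predictors are used.
modS <- train(type ~ glu + age + bmi + npreg,
              data = Pima.tr, method = "gbm", verbose = FALSE)

# Share of correct predictions on the separate MASS test set
predS <- predict(modS, newdata = Pima.te)
accS <- mean(predS == Pima.te$type)
print(accS)
```

Comparing accS with the same figure for the full model gives an idea of how much information is lost by dropping ped, skin and bp.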

Application

We therefore designed our application as a simple, easy-to-use prediction engine, hosted on shinyapps.io for everyone to use.

Entering the selected parameters gives medical staff or the patient an instant prediction for diabetes, together with its probability.
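
The prediction step might look like the sketch below. The helper predict_diabetes() and the patient values are hypothetical, and the "gbm" method is again an assumption; the deployed application may be organised differently.

```r
library(caret)  # train(), predict()
library(MASS)   # Pima.tr data set

set.seed(2015)  # arbitrary seed, only for reproducibility

# Assumption: boosted tree model fitted with caret's "gbm" method
modS <- train(type ~ glu + age + bmi + npreg,
              data = Pima.tr, method = "gbm", verbose = FALSE)

# Hypothetical helper: predicted class and probability for one patient
predict_diabetes <- function(glu, age, bmi, npreg) {
  patient <- data.frame(glu = glu, age = age, bmi = bmi, npreg = npreg)
  list(
    class    = as.character(predict(modS, newdata = patient)),
    prob_yes = predict(modS, newdata = patient, type = "prob")[["Yes"]]
  )
}

# Illustrative input values, not taken from the data set
res <- predict_diabetes(glu = 140, age = 35, bmi = 31, npreg = 2)
print(res)
```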

As a “bonus”, it also computes the patient's BMI from height and weight and plots it against the training population. This is useful because BMI is usually a good health indicator but is not measured directly.
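
The BMI computation and comparison can be sketched as follows, assuming weight in kilograms and height in metres. The function name bmi_from() and the example patient are hypothetical, and the histogram is only one possible way to show the comparison.

```r
library(MASS)  # Pima.tr data set

# BMI = weight (kg) divided by the square of height (m)
bmi_from <- function(weight_kg, height_m) weight_kg / height_m^2

# Illustrative patient: 70 kg, 1.75 m
patient_bmi <- bmi_from(weight_kg = 70, height_m = 1.75)

# Share of the training population with a BMI at or below the patient's
pct <- ecdf(Pima.tr$bmi)(patient_bmi)

# Histogram of the training population with the patient marked in red
hist(Pima.tr$bmi, main = "BMI in the training population", xlab = "BMI")
abline(v = patient_bmi, col = "red", lwd = 2)
```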

The application can be tested at: https://frankieragnet.shinyapps.io/myShinyPage