A Quick and Easy Diabetes Test for Women

Frankie Ragnet
18/11/2015

Motivation

The goal of this project was to create a simple diabetes test for women that could be used in remote areas and populations. The test had to meet the following conditions:

Easy to run
Reliable, based on state-of-the-art machine learning techniques
Usable even in remote areas, with limited medical equipment

This led to a simple machine learning prediction algorithm, available online, that relies only on easy-to-collect parameters.

Approach (1/2)

We used diabetes data collected on Pima Indian women (the Pima.tr data set from the MASS package):

Variable	Description
npreg	number of pregnancies
glu	plasma glucose concentration in an oral glucose tolerance test
bp	diastolic blood pressure
skin	triceps skin fold thickness
bmi	body mass index
ped	diabetes pedigree function
age	age of the patient
type	diabetes diagnosis (Yes/No) according to WHO criteria

We trained a Boosted Tree model and plotted variable importance:

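library(caret)
library(MASS)   # provides the Pima.tr training set
set.seed(2015)  # arbitrary seed, only to make resampling reproducible
# method = "gbm" requests boosted trees; without it train() defaults to "rf"
modF <- train(type ~ ., data = Pima.tr,
              method = "gbm", verbose = FALSE)
# Rank the predictors by their contribution to the fitted model
plot(varImp(modF))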

[Figure: variable importance plot for the trained model]

Approach (2/2)

For each parameter we weighed two criteria: (1) variable importance (see the graph on the previous slide) and (2) ease of collection. This gave the following shortlist for our final prediction application:

glu (a simple oral test, with very high impact on the prediction)
age and bmi, both easy to collect (BMI is not measured directly, but it is easily computed from the patient's height and weight)
npreg (fairly low importance, but very easy to collect)

Although they had some influence on the model, we did not include ped or skin: both are difficult to measure and added little information to the final prediction model.

Likewise, bp is easy to collect but had no impact on the model, so we did not include it either.
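
The selection above can be sketched in code by refitting the model on the shortlisted variables only and checking it against held-out data. This is a minimal sketch, not the code behind the application: it assumes the caret and gbm packages are installed, that boosted trees are requested with method = "gbm", and that the Pima.te data set from MASS is an acceptable test set. The name mod_short and the seed are illustrative.

```r
library(caret)
library(MASS)  # Pima.tr (training) and Pima.te (test) data sets

set.seed(2015)  # arbitrary seed, only for reproducibility

# Refit using only the shortlisted predictors.
# Assumption: boosted trees via method = "gbm".
mod_short <- train(type ~ glu + age + bmi + npreg,
                   data = Pima.tr,
                   method = "gbm",
                   verbose = FALSE)

# Evaluate on the held-out Pima.te set shipped with MASS.
pred <- predict(mod_short, newdata = Pima.te)
acc  <- mean(pred == Pima.te$type)
print(acc)
```

Comparing this accuracy with that of the full model (type ~ .) is one way to check how much the dropped variables actually contributed.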

Application

We therefore designed our application as a simple, easy-to-use prediction engine, hosted on shinyapps.io for everyone to use.

Entering the selected parameters gives medical staff or patients an instant diabetes prediction, together with its probability.
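
To illustrate what "prediction and probability" could look like for a single patient, here is a small sketch that needs nothing beyond R and MASS. The application's server code is not shown in these slides, so a plain logistic regression (glm) stands in for the actual model; the patient values and the 0.5 cut-off are also assumptions made for illustration.

```r
library(MASS)  # Pima.tr training data

# Stand-in model: logistic regression on the four shortlisted inputs.
fit <- glm(type ~ glu + age + bmi + npreg,
           data = Pima.tr, family = binomial)

# Hypothetical patient; in the app these values would come from the input form.
patient <- data.frame(glu = 140, age = 45, bmi = 31.5, npreg = 2)

# Probability of the "Yes" class (the second factor level of type).
p_yes <- unname(predict(fit, newdata = patient, type = "response"))

# Turn the probability into a label with a 0.5 cut-off (an assumed threshold).
label <- if (p_yes >= 0.5) "Yes" else "No"
cat(sprintf("Predicted diabetes: %s (probability %.2f)\n", label, p_yes))
```

Reporting the probability alongside the label lets the user see how close a case is to the decision threshold.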

As a “bonus”, it also computes the patient's BMI from height and weight and plots it against the training population. This is useful because BMI is usually a good health indicator but is not obvious to compute by hand.
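
The BMI step can be sketched as follows, using the standard definition (weight in kilograms divided by the square of height in metres). The function name compute_bmi, the example patient and the histogram layout are illustrative assumptions, not the application's actual code.

```r
library(MASS)  # Pima.tr supplies the training population's bmi values

# BMI = weight in kilograms divided by the square of height in metres.
compute_bmi <- function(weight_kg, height_m) {
  stopifnot(weight_kg > 0, height_m > 0)
  weight_kg / height_m^2
}

# Hypothetical patient: 70 kg, 1.65 m.
patient_bmi <- compute_bmi(70, 1.65)

# Share of the training population with a lower BMI than the patient.
pct_below <- mean(Pima.tr$bmi < patient_bmi) * 100

# Histogram of the training population with the patient marked in red.
hist(Pima.tr$bmi, breaks = 20,
     main = "Patient BMI vs. training population",
     xlab = "BMI (kg/m^2)")
abline(v = patient_bmi, col = "red", lwd = 2)
```

The percentage in pct_below gives a single number that places the patient within the training population, which can be shown next to the plot.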

The application can be tested at: https://frankieragnet.shinyapps.io/myShinyPage