Frankie Ragnet
18/11/2015
The goal of this project was to create a simple diabetes test for women that could be used in remote areas and with hard-to-reach populations. It had to meet the following conditions:
This led us to a simple machine-learning prediction algorithm, available online, that relies on easy-to-collect parameters.
We used diabetes data collected on Pima Indian Women (MASS library):
| Variable | Description |
|---|---|
| npreg | number of pregnancies |
| glu | plasma glucose concentration in an oral glucose tolerance test |
| bp | diastolic blood pressure |
| skin | triceps skin fold thickness |
| bmi | body mass index |
| ped | diabetes pedigree function |
| age | age of the patient |
| type | diabetes diagnosis (Yes/No) according to WHO criteria |
We trained a Boosted Tree model and plotted variable importance:
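library(caret)  # train(), varImp()
library(MASS)   # Pima.tr dataset
set.seed(1)     # resampling inside train() is random
modF <- train(type ~ ., data = Pima.tr,
              method = "gbm",    # boosted trees; without it train() fits a random forest
              verbose = FALSE)   # silence gbm's iteration log
plot(varImp(modF))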
For each parameter we weighed (1) its variable importance (see graph on previous slide) and (2) its ease of collection, and chose the following shortlist for our final prediction application:
Although they did have some influence on the model, we did not include ped or skin, as those were difficult to measure and brought minimal information to the final prediction model.
Likewise, although it is easy to collect, bp did not have any impact on the model, so we did not include it either.
We therefore designed our application as a simple, easy-to-use prediction engine, hosted on shinyapps.io for everyone to use.
Entering the selected parameters gives medical staff or patients an instant prediction (and probability) for diabetes.
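As a rough illustration of this prediction step, the sketch below fits a model on the training data and returns both a class and a probability for one new patient. Several things here are assumptions rather than the application's actual code: a base-R logistic regression (`glm`) is swapped in for the boosted tree so the example runs with only MASS installed, the predictors (`npreg`, `glu`, `bmi`, `age`) are simply the variables left after excluding `ped`, `skin` and `bp`, the 0.5 cut-off is arbitrary, and the patient values are made up.

```r
library(MASS)  # Pima.tr training data

# Assumption: logistic regression stands in for the boosted tree model,
# and the predictors are the variables that were not excluded above.
fit <- glm(type ~ npreg + glu + bmi + age, data = Pima.tr, family = binomial)

# Hypothetical patient (illustrative values only)
new_patient <- data.frame(npreg = 2, glu = 130, bmi = 32, age = 35)

# Probability of the "Yes" class (the second factor level of `type`)
prob_yes <- unname(predict(fit, newdata = new_patient, type = "response"))

# Assumed 0.5 decision threshold
prediction <- if (prob_yes >= 0.5) "Yes" else "No"

cat(sprintf("Predicted type: %s (probability of diabetes: %.2f)\n",
            prediction, prob_yes))
```

With a caret model, the analogous call would be `predict(model, newdata = new_patient, type = "prob")`, which returns one probability column per class, provided `new_patient` contains every predictor the model was trained on.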
As a “bonus”, it also computes the patient's BMI from height and weight and plots it against the training population. This is useful because BMI is usually a good health indicator but is not obvious to compute by hand.
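The BMI computation can be sketched as follows: weight in kilograms divided by the square of height in metres, then placed against the BMI distribution of the training data. The function name `compute_bmi`, the example height and weight, and the choice of a histogram are assumptions for illustration; the application's actual plot may differ.

```r
library(MASS)  # Pima.tr training data

# BMI = weight (kg) / height (m)^2
compute_bmi <- function(weight_kg, height_m) {
  stopifnot(is.numeric(weight_kg), is.numeric(height_m), all(height_m > 0))
  weight_kg / height_m^2
}

# Hypothetical patient (illustrative values only)
patient_bmi <- compute_bmi(weight_kg = 70, height_m = 1.65)

# Share of women in the training set with a BMI below the patient's
pct_below <- mean(Pima.tr$bmi < patient_bmi)

# Training-set BMI distribution with the patient marked by a vertical line
hist(Pima.tr$bmi, breaks = 20,
     main = "Patient BMI vs. training population", xlab = "BMI (kg/m^2)")
abline(v = patient_bmi, col = "red", lwd = 2)
```

Reporting `pct_below` alongside the plot gives a single number that can be read without interpreting the histogram.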
The application can be tested at: https://frankieragnet.shinyapps.io/myShinyPage