1. Import data
setwd('D:/Rml')                # working directory that contains diabetes.csv
dat = read.csv('diabetes.csv')
attach(dat)                    # make the columns (preg, plas, ..., class) directly accessible
head(dat)
## X preg plas pres skin test mass pedi age class
## 1 1 1 85 66 29 0 26.6 0.351 31 0
## 2 2 8 183 64 0 0 23.3 0.672 32 1
## 3 3 1 89 66 23 94 28.1 0.167 21 0
## 4 4 0 137 40 35 168 43.1 2.288 33 1
## 5 5 5 116 74 0 0 25.6 0.201 30 0
## 6 6 3 78 50 32 88 31.0 0.248 26 1
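Before building the model, a quick sanity check of the imported data (dimensions, variable types, class balance) can be run; a minimal sketch:
dim(dat)          # number of rows and columns
str(dat)          # variable type of every column
table(dat$class)  # how many negative (0) and positive (1) cases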
2. Create a BMA logistic regression model of the dataset dat
library('BMA')
## Loading required package: survival
## Loading required package: leaps
## Loading required package: robustbase
##
## Attaching package: 'robustbase'
## The following object is masked from 'package:survival':
##
## heart
## Loading required package: inline
## Loading required package: rrcov
## Scalable Robust Estimators with High Breakdown Point (version 1.5-2)
a = dat[, 2:9]  # drop the first column ('X', just a row index) and the last ('class', the outcome)
b = class       # outcome vector, available directly thanks to attach(dat)
M = bic.glm(a, b, strict = F, OR = 20, glm.family = 'binomial')
summary(M)
##
## Call:
## bic.glm.data.frame(x = a, y = b, glm.family = "binomial", strict = F, OR = 20)
##
##
## 9 models were selected
## Best 5 models (cumulative posterior probability = 0.8631 ):
##
## p!=0 EV SD model 1 model 2 model 3
## Intercept 100 -8.2244498 0.7158212 -8.402e+00 -7.942e+00 -8.112e+00
## preg 100.0 0.1427687 0.0289066 1.414e-01 1.530e-01 1.366e-01
## plas 100.0 0.0341824 0.0034509 3.374e-02 3.457e-02 3.407e-02
## pres 38.6 -0.0046798 0.0066866 . -1.199e-02 .
## skin 2.8 -0.0001952 0.0015267 . . .
## test 8.9 -0.0001159 0.0004418 . . .
## mass 100.0 0.0817407 0.0144130 7.807e-02 8.479e-02 8.151e-02
## pedi 85.9 0.7780202 0.4167900 8.964e-01 9.056e-01 .
## age 7.6 0.0010814 0.0045725 . . .
##
## nVar 4 5 3
## BIC -4.328e+03 -4.327e+03 -4.325e+03
## post prob 0.407 0.258 0.087
## model 4 model 5
## Intercept -8.618e+00 -7.646e+00
## preg 1.367e-01 1.480e-01
## plas 3.552e-02 3.488e-02
## pres . -1.190e-02
## skin . .
## test -1.319e-03 .
## mass 8.107e-02 8.811e-02
## pedi 9.534e-01 .
## age . .
##
## nVar 5 4
## BIC -4.324e+03 -4.324e+03
## post prob 0.057 0.054
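Before summarizing, the selected models and the posterior inclusion probabilities can also be inspected directly from the bic.glm object; a brief sketch using the BMA package's helpers:
imageplot.bma(M)   # shaded columns mark the variables included in each selected model
M$probne0          # posterior probability (in %) that each coefficient is non-zero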
SUMMARY:
From the output above, model 1 (posterior probability 0.407) is selected.
=> Model 1: P1 = 1/(1 + exp(-(-8.402 + 0.1414×preg + 0.03374×plas + 0.07807×mass + 0.8964×pedi)))
Odds ratios of the variables retained in model 1 (the exponentiated coefficients):
=> ‘pedi’ has the largest per-unit effect on ‘class’.
=> If ‘pedi’ increases by 1 unit, the odds of being diabetes-positive increase ~2.45-fold (exp(0.8964) ≈ 2.45).
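The odds ratios referred to above are simply the exponentiated model-1 coefficients; a minimal sketch using the values printed by bic.glm:
# Odds ratios for model 1 (coefficients copied from the bic.glm output above)
or_model1 = exp(c(preg = 0.1414, plas = 0.03374, mass = 0.07807, pedi = 0.8964))
round(or_model1, 2)   # pedi ≈ 2.45, i.e. the ~2.45-fold change in odds noted above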
Model 2: P2 = 1/(1 + exp(-(-7.94 + 0.153×preg + 0.0346×plas - 0.01199×pres + 0.0848×mass + 0.906×pedi)))
Odds ratios of the variables retained in model 2 are obtained the same way (e.g. exp(0.906) ≈ 2.47 for ‘pedi’).
EXAMPLE:
P2 = 0.0446316 => predicted ‘class’ ≈ 0 (negative for diabetes).
P2 = 0.8073857 => predicted ‘class’ ≈ 1 (positive for diabetes).
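For reference, P2 values like the two above can be computed by plugging a patient's measurements into model 2's equation; a minimal sketch with made-up predictor values (not taken from the dataset):
preg_x = 2; plas_x = 120; pres_x = 70; mass_x = 30; pedi_x = 0.5   # hypothetical patient
eta2 = -7.94 + 0.153*preg_x + 0.0346*plas_x - 0.01199*pres_x +
       0.0848*mass_x + 0.906*pedi_x
P2 = plogis(eta2)        # identical to 1/(1 + exp(-eta2))
ifelse(P2 > 0.5, 1, 0)   # predicted 'class' at the usual 0.5 cut-off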
3. Create an ordinary logistic regression model (glm) for comparison
N = glm(class~preg+plas+pres+skin+test+mass+pedi+age, family='binomial')  # full model with all eight predictors
summary(N)
##
## Call:
## glm(formula = class ~ preg + plas + pres + skin + test + mass +
## pedi + age, family = "binomial")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.5529 -0.7281 -0.4195 0.7271 2.9275
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -8.3862464 0.7163838 -11.706 < 2e-16 ***
## preg 0.1232043 0.0320586 3.843 0.000121 ***
## plas 0.0350533 0.0037073 9.455 < 2e-16 ***
## pres -0.0132215 0.0052316 -2.527 0.011497 *
## skin 0.0003301 0.0069091 0.048 0.961898
## test -0.0011554 0.0009025 -1.280 0.200478
## mass 0.0897505 0.0150822 5.951 2.67e-09 ***
## pedi 0.9414265 0.2989965 3.149 0.001640 **
## age 0.0146080 0.0093408 1.564 0.117844
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 991.38 on 766 degrees of freedom
## Residual deviance: 722.79 on 758 degrees of freedom
## AIC: 740.79
##
## Number of Fisher Scoring iterations: 5
SUMMARY
From the result above, we can see:
Only ‘preg’, ‘plas’, ‘pres’, ‘mass’ and ‘pedi’ have a statistically significant effect on the outcome ‘class’ (each Pr(>|z|) < 0.05).
=> Select those variables to build the reduced model.
=> Model: P = 1/(1 + exp(-(-8.386 + 0.123×preg + 0.035×plas - 0.013×pres + 0.089×mass + 0.94×pedi)))
Odds ratios of the selected variables (the exponentiated coefficients):
=> ‘pedi’ still has the largest per-unit effect on ‘class’.
=> If ‘pedi’ increases by 1 unit, the odds of being diabetes-positive increase ~2.56-fold (exp(0.9414) ≈ 2.56).
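These odds ratios can be read directly off the glm fit by exponentiating its coefficients:
exp(coef(N))      # e.g. exp(0.9414) ≈ 2.56 for pedi, matching the figure above
exp(confint(N))   # 95% profile-likelihood intervals for the odds ratios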
EXAMPLE:
P = 0.0308088 => predicted ‘class’ ≈ 0 (negative for diabetes).
P = 0.7060715 => predicted ‘class’ ≈ 1 (positive for diabetes).
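Predicted probabilities such as the example values above can be produced for every observation and turned into class predictions; a short sketch (0.5 cut-off assumed):
p_hat = predict(N, type = 'response')    # fitted probability of class = 1 for each row
pred_class = ifelse(p_hat > 0.5, 1, 0)   # threshold at 0.5
table(pred_class, class)                 # compare predictions with the observed 'class'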
=> The BMA-LRM gives predictions that agree closely with the full glm while using fewer predictors, so it is the more accurate and parsimonious choice.