Comparison of BMA-LRM with Standard LRM on the Diabetes Dataset

BMA-LRM

1. Import data

setwd('D:/Rml')
dat = read.csv('diabetes.csv')
attach(dat)
head(dat)
##   X preg plas pres skin test mass  pedi age class
## 1 1    1   85   66   29    0 26.6 0.351  31     0
## 2 2    8  183   64    0    0 23.3 0.672  32     1
## 3 3    1   89   66   23   94 28.1 0.167  21     0
## 4 4    0  137   40   35  168 43.1 2.288  33     1
## 5 5    5  116   74    0    0 25.6 0.201  30     0
## 6 6    3   78   50   32   88 31.0 0.248  26     1
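
Before modelling, it is worth confirming the variable types and that the outcome is coded 0/1. A quick check in base R:

str(dat)          # variable types and dimensions
table(dat$class)  # outcome balance: counts of negatives (0) and positives (1)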

2. Create the BMA Logistic Regression Model on dat

library('BMA')
## Loading required package: survival
## Loading required package: leaps
## Loading required package: robustbase
## 
## Attaching package: 'robustbase'
## The following object is masked from 'package:survival':
## 
##     heart
## Loading required package: inline
## Loading required package: rrcov
## Scalable Robust Estimators with High Breakdown Point (version 1.5-2)
a = dat[,2:9] # keep the 8 predictors; drop the row-index column 'X' (no predictive value) and the outcome 'class'
b = class
M = bic.glm(a,b, strict=F, OR=20, glm.family='binomial')
summary(M)
## 
## Call:
## bic.glm.data.frame(x = a, y = b, glm.family = "binomial", strict = F,     OR = 20)
## 
## 
##   9  models were selected
##  Best  5  models (cumulative posterior probability =  0.8631 ): 
## 
##            p!=0    EV         SD         model 1     model 2     model 3   
## Intercept  100    -8.2244498  0.7158212  -8.402e+00  -7.942e+00  -8.112e+00
## preg       100.0   0.1427687  0.0289066   1.414e-01   1.530e-01   1.366e-01
## plas       100.0   0.0341824  0.0034509   3.374e-02   3.457e-02   3.407e-02
## pres        38.6  -0.0046798  0.0066866       .      -1.199e-02       .    
## skin         2.8  -0.0001952  0.0015267       .           .           .    
## test         8.9  -0.0001159  0.0004418       .           .           .    
## mass       100.0   0.0817407  0.0144130   7.807e-02   8.479e-02   8.151e-02
## pedi        85.9   0.7780202  0.4167900   8.964e-01   9.056e-01       .    
## age          7.6   0.0010814  0.0045725       .           .           .    
##                                                                            
## nVar                                        4           5           3      
## BIC                                      -4.328e+03  -4.327e+03  -4.325e+03
## post prob                                 0.407       0.258       0.087    
##            model 4     model 5   
## Intercept  -8.618e+00  -7.646e+00
## preg        1.367e-01   1.480e-01
## plas        3.552e-02   3.488e-02
## pres            .      -1.190e-02
## skin            .           .    
## test       -1.319e-03       .    
## mass        8.107e-02   8.811e-02
## pedi        9.534e-01       .    
## age             .           .    
##                                  
## nVar          5           4      
## BIC        -4.324e+03  -4.324e+03
## post prob   0.057       0.054
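
Beyond summary(M), the averaged quantities can be pulled straight from the fitted object. A minimal sketch, assuming the components documented for bic.glm in the BMA package (probne0, postmean, postprob) and its imageplot.bma() helper:

M$probne0          # posterior inclusion probability (%) of each predictor
M$postmean         # model-averaged posterior means of the coefficients
M$postprob         # posterior probabilities of the selected models
imageplot.bma(M)   # visual overview of which predictors enter which models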

SUMMARY:

From the results above, we select model 1, which has the highest posterior probability (~0.41).

=> Model: P = 1/(1 + exp(-(-8.402 + 0.1414·preg + 0.03374·plas + 0.07807·mass + 0.8964·pedi)))

Odds ratios of the variables in model 1 (exp of each coefficient):

  • OR(preg) = exp(0.1414) = 1.1518853
  • OR(plas) = exp(0.03374) = 1.0343156
  • OR(mass) = exp(0.07807) = 1.0811983
  • OR(pedi) = exp(0.8964) = 2.4507645

=> ‘Pedi’ has the largest odds ratio and thus the greatest per-unit impact on ‘class’.

=> If ‘Pedi’ increases by 1 unit, the odds of testing positive for diabetes increase by a factor of ~2.45, holding the other variables constant.
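
The odds ratios above, and a predicted probability, can be reproduced directly from the model 1 coefficients reported by bic.glm. A minimal sketch in base R; the patient values are hypothetical and only for illustration:

# model 1 coefficients copied from the summary above
b0 = -8.402; b.preg = 0.1414; b.plas = 0.03374; b.mass = 0.07807; b.pedi = 0.8964

# odds ratios = exp(coefficient)
exp(c(preg = b.preg, plas = b.plas, mass = b.mass, pedi = b.pedi))

# predicted probability for a hypothetical patient (preg = 3, plas = 120, mass = 30, pedi = 0.5)
eta = b0 + b.preg*3 + b.plas*120 + b.mass*30 + b.pedi*0.5
plogis(eta)   # same as 1/(1 + exp(-eta))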

Standard LRM

N = glm(class~preg+plas+pres+skin+test+mass+pedi+age, family='binomial')
summary(N)
## 
## Call:
## glm(formula = class ~ preg + plas + pres + skin + test + mass + 
##     pedi + age, family = "binomial")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.5529  -0.7281  -0.4195   0.7271   2.9275  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -8.3862464  0.7163838 -11.706  < 2e-16 ***
## preg         0.1232043  0.0320586   3.843 0.000121 ***
## plas         0.0350533  0.0037073   9.455  < 2e-16 ***
## pres        -0.0132215  0.0052316  -2.527 0.011497 *  
## skin         0.0003301  0.0069091   0.048 0.961898    
## test        -0.0011554  0.0009025  -1.280 0.200478    
## mass         0.0897505  0.0150822   5.951 2.67e-09 ***
## pedi         0.9414265  0.2989965   3.149 0.001640 ** 
## age          0.0146080  0.0093408   1.564 0.117844    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 991.38  on 766  degrees of freedom
## Residual deviance: 722.79  on 758  degrees of freedom
## AIC: 740.79
## 
## Number of Fisher Scoring iterations: 5

SUMMARY

From the results above, we can see:

Only ‘preg’, ‘plas’, ‘pres’, ‘mass’ & ‘pedi’ have a statistically significant effect on the outcome (‘class’), since their p-values (Pr(>|z|)) are all below 0.05.

=> Select those variables to build the reduced model.

=> Model: logit(P) = log(P/(1-P)) = -8.386 + 0.123·preg + 0.035·plas - 0.013·pres + 0.089·mass + 0.941·pedi

Odds ratios of the chosen variables (exp of each coefficient):

  • OR(preg) = 1.1311106
  • OR(plas) = 1.0356715
  • OR(pres) = 0.986867
  • OR(mass) = 1.0938461
  • OR(pedi) = 2.5640807

=> ‘Pedi’ still has the largest odds ratio and the greatest per-unit impact on ‘class’.

=> If ‘Pedi’ increases by 1 unit, the odds of testing positive for diabetes increase by a factor of ~2.56, holding the other variables constant.
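
The same odds ratios, together with confidence intervals, can be read off the fitted glm object. A minimal sketch using Wald intervals via confint.default() (profile-likelihood intervals via confint() would also work):

exp(coef(N))             # odds ratios for all predictors in the full model
exp(confint.default(N))  # Wald 95% confidence intervals on the odds-ratio scale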

CONCLUSION

  • In both cases (BMA-LRM & Standard LRM), ‘Pedi’ has the greatest impact on the outcome ‘class’.
  • The odds ratio of ‘Pedi’ is ~2.56 in the Standard Logistic Regression Model, versus ~2.45 in the BMA Logistic Regression Model.

=> The BMA-LRM gives the more parsimonious model and arguably the more accurate estimate.