Bayesian Logistic model

Tran Binh Thang
9 March 2017

Content

  • Read data and coding
  • packages
  • Code
  • results

Read data and code

library(foreign)
r=read.dta("C:/Users/BINH THANG/Dropbox/Korea/STudy/Thesis/data management/DataR/dataR.dta")
attach(r)
newdata <- r[ which(r$ct==1),]

Coding

newdata$cost1=as.factor(newdata$cost1)
newdata$c1=as.factor(newdata$c1)
newdata$h1=as.factor(newdata$h1)
newdata$label1=as.factor(newdata$label1)
newdata$policy=as.factor(newdata$policy)
newdata$ter_fa1=as.factor(newdata$ter_fa1)
newdata$educ2=as.factor(newdata$educ2)
newdata$age_group=as.factor(newdata$age_group)
newdata$d1a=as.factor(newdata$d1a)
newdata$selfhealth=as.factor(newdata$selfhealth)
newdata$b18a=as.factor(newdata$b18a)
newdata$b16a=as.factor(newdata$b16a)
newdata$b6a=as.factor(newdata$b6a)
newdata$smostt=as.factor(newdata$smostt)

Deviding data (train and test)

newdata1=newdata[sample(1:nrow(newdata), 461, replace=F),]

train=newdata1[1:415, ]
View(train)

test=newdata1[415:461, ]
View(test)

Deviding data (train and test)

library("brms")
library("caret")
library("coda")
library("rstan")
options(mc.cores = parallel::detectCores())

Building model

prior=get_prior(formula=cost1~age_group+c1+d1a+educ2+ter_fa1+selfhealth+smostt+b16a+b6a+b18a+policy, family="bernoulli", data=train)
baybin=brm(formula=cost1~age_group+ d1a+educ2+ter_fa1+selfhealth+smostt+b16a+b6a+b18a+policy, family="bernoulli", data=train, chains=5, iter=2000, warmup=1000)

Product the results

summary(baybin)
 Family: bernoulli (logit) 
Formula: cost1 ~ age_group + d1a + educ2 + ter_fa1 + selfhealth + smostt + b16a + b6a + b18a + policy 
   Data: train (Number of observations: 392) 
Samples: 5 chains, each with iter = 2000; warmup = 1000; thin = 1; 
         total post-warmup samples = 5000
   WAIC: Not computed

Population-Level Effects: 
                Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept          -0.82      0.51    -1.82     0.19       5000    1
age_groupgr3039    -0.21      0.39    -0.97     0.56       3614    1
age_groupgr4049    -0.57      0.39    -1.35     0.19       3062    1
age_groupgr5059    -0.48      0.42    -1.32     0.33       3199    1
age_group60plus    -0.25      0.58    -1.38     0.88       5000    1
d1amarried          0.56      0.33    -0.09     1.22       3094    1
educ22              0.07      0.31    -0.54     0.67       3996    1
educ23              0.84      0.38     0.10     1.59       3763    1
ter_fa12           -0.26      0.27    -0.81     0.29       5000    1
ter_fa13            0.21      0.30    -0.36     0.80       5000    1
selfhealthgood      0.80      0.24     0.34     1.27       5000    1
smosttmedium        0.33      0.30    -0.23     0.91       5000    1
smosttheavy         0.52      0.31    -0.09     1.14       4586    1
b16a1              -0.44      0.33    -1.10     0.20       5000    1
b6ayes              0.44      0.31    -0.16     1.06       5000    1
b18abad             0.14      0.25    -0.33     0.62       5000    1
policy1            -0.08      0.26    -0.59     0.43       5000    1
policy2            -0.18      0.33    -0.83     0.48       5000    1
policy3            -1.32      1.41    -4.50     1.03       5000    1

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

test model

prob=as.data.frame(predict(baybin,test,type="response"))
library(e1071)
pred=ifelse(prob$Estimate> 0.5,1,0)

Results: accuracy of model

confusionMatrix(data=pred,reference=test$cost1)
Confusion Matrix and Statistics

          Reference
Prediction  0  1
         0 14 12
         1  8 13

               Accuracy : 0.5745          
                 95% CI : (0.4218, 0.7174)
    No Information Rate : 0.5319          
    P-Value [Acc > NIR] : 0.3317          

                  Kappa : 0.1547          
 Mcnemar's Test P-Value : 0.5023          

            Sensitivity : 0.6364          
            Specificity : 0.5200          
         Pos Pred Value : 0.5385          
         Neg Pred Value : 0.6190          
             Prevalence : 0.4681          
         Detection Rate : 0.2979          
   Detection Prevalence : 0.5532          
      Balanced Accuracy : 0.5782          

       'Positive' Class : 0               

Slide With Plot

plot of chunk unnamed-chunk-9

Slide coppied from BS Nam (Namlundidong)

Slide coppied from BS Lê Đông Nhật Nam