Tran Binh Thang
9 March 2017
library(foreign)
r=read.dta("C:/Users/BINH THANG/Dropbox/Korea/STudy/Thesis/data management/DataR/dataR.dta")
attach(r)
newdata <- r[ which(r$ct==1),]
# convert the outcome and the categorical predictors to factors
factor_vars=c("cost1","c1","h1","label1","policy","ter_fa1","educ2","age_group",
              "d1a","selfhealth","b18a","b16a","b6a","smostt")
newdata[factor_vars]=lapply(newdata[factor_vars], as.factor)
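# note: the sampling below is random, so the train/test split changes between runs;
# a minimal sketch is to set a seed first (the value 2017 is an arbitrary choice for
# illustration and is not part of the original analysis)
set.seed(2017)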
newdata1=newdata[sample(1:nrow(newdata), 461, replace=F),]
train=newdata1[1:414, ]
View(train)
test=newdata1[415:461, ]
View(test)
library("brms")
library("caret")
library("coda")
library("rstan")
options(mc.cores = parallel::detectCores())
prior=get_prior(formula=cost1~age_group+c1+d1a+educ2+ter_fa1+selfhealth+smostt+b16a+b6a+b18a+policy, family="bernoulli", data=train)
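# note: the prior table returned by get_prior() above is only inspected here; the brm()
# call below keeps the brms default priors. A minimal sketch of how weakly informative
# coefficient priors could be supplied instead (the normal(0, 2.5) scale is an assumption,
# not taken from the original analysis):
myprior=set_prior("normal(0, 2.5)", class="b")
# to use it, add prior=myprior to the brm() call below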
baybin=brm(formula=cost1~age_group+ d1a+educ2+ter_fa1+selfhealth+smostt+b16a+b6a+b18a+policy, family="bernoulli", data=train, chains=5, iter=2000, warmup=1000)
summary(baybin)
Family: bernoulli (logit)
Formula: cost1 ~ age_group + d1a + educ2 + ter_fa1 + selfhealth + smostt + b16a + b6a + b18a + policy
Data: train (Number of observations: 392)
Samples: 5 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 5000
WAIC: Not computed
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept -0.82 0.51 -1.82 0.19 5000 1
age_groupgr3039 -0.21 0.39 -0.97 0.56 3614 1
age_groupgr4049 -0.57 0.39 -1.35 0.19 3062 1
age_groupgr5059 -0.48 0.42 -1.32 0.33 3199 1
age_group60plus -0.25 0.58 -1.38 0.88 5000 1
d1amarried 0.56 0.33 -0.09 1.22 3094 1
educ22 0.07 0.31 -0.54 0.67 3996 1
educ23 0.84 0.38 0.10 1.59 3763 1
ter_fa12 -0.26 0.27 -0.81 0.29 5000 1
ter_fa13 0.21 0.30 -0.36 0.80 5000 1
selfhealthgood 0.80 0.24 0.34 1.27 5000 1
smosttmedium 0.33 0.30 -0.23 0.91 5000 1
smosttheavy 0.52 0.31 -0.09 1.14 4586 1
b16a1 -0.44 0.33 -1.10 0.20 5000 1
b6ayes 0.44 0.31 -0.16 1.06 5000 1
b18abad 0.14 0.25 -0.33 0.62 5000 1
policy1 -0.08 0.26 -0.59 0.43 5000 1
policy2 -0.18 0.33 -0.83 0.48 5000 1
policy3 -1.32 1.41 -4.50 1.03 5000 1
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
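# the estimates above are on the log-odds scale. A sketch (not part of the original
# output) that checks convergence visually and converts the posterior means to odds
# ratios; fixef() column names may differ slightly across brms versions:
plot(baybin)                  # trace and density plots of the population-level effects
fe=fixef(baybin)              # posterior summary of the population-level effects
exp(fe[, "Estimate"])         # posterior mean log-odds -> odds ratios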
# posterior predictive probabilities; for a bernoulli model the Estimate column is
# already on the probability scale (predict.brmsfit has no type argument)
prob=as.data.frame(predict(baybin, newdata=test))
library(e1071)
pred=factor(ifelse(prob$Estimate > 0.5, 1, 0), levels=levels(test$cost1))
confusionMatrix(data=pred, reference=test$cost1)
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 14 12
1 8 13
Accuracy : 0.5745
95% CI : (0.4218, 0.7174)
No Information Rate : 0.5319
P-Value [Acc > NIR] : 0.3317
Kappa : 0.1547
Mcnemar's Test P-Value : 0.5023
Sensitivity : 0.6364
Specificity : 0.5200
Pos Pred Value : 0.5385
Neg Pred Value : 0.6190
Prevalence : 0.4681
Detection Rate : 0.2979
Detection Prevalence : 0.5532
Balanced Accuracy : 0.5782
'Positive' Class : 0
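# optional: beyond the 0.5 cutoff used above, the ranking performance of the predicted
# probabilities can be summarised with an ROC curve. A sketch assuming the pROC package
# is installed (it is not used in the original analysis):
library(pROC)
roc_obj=roc(response=test$cost1, predictor=prob$Estimate)
auc(roc_obj)                  # area under the ROC curve
plot(roc_obj)                 # ROC curve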
Slide copied from Dr. Lê Đông Nhật Nam