I’ve been working on a Bayesian replication of some frequentist analyses (Early et al. 2007) this term. I’m trying to wrap my head around how the approaches are similar and different, and what you do and don’t get from the two types of results. In particular, I’ve been thinking about null results and how estimation procedures like Bayesian modeling can facilitate interpretation there.
The replication I’m working on is of one part of a really neat meta-analysis done by Early et al. (2007) on teacher training and classroom outcomes for preschoolers across the country. The short version of the results is that teacher education variables appear not to be related to classroom quality or child learning outcomes (I’m way over-simplifying this, obviously). Check out the article if you’d like the complete story; it’s a good piece of work.
Since null results like these can be irritatingly uninformative within a frequentist framework, I want to take the opportunity to compare what the frequentist analyses tell us with what the Bayesian analyses yield. Unfortunately, I’m still struggling to get the full model to run on the real dataset (which is large and messy, as real datasets are). In the meantime, here’s a demonstration with a toy dataset. I did try to include some genuine mess in the toy dataset as well (skewed data, missingness), so it’s still real-ish…
Make up the data
# Meth Lab Presentation Nov 2014
# null results in Bayes vs. frequentist stats
rm(list = ls())
# -------------------------------------
# make up some data, large N
# -------------------------------------
n <- 1000
x1 <- rnorm(n)
x2 <- rf(n, df1=1, df2=5) # skewed
cond <- c(rep(1, n/2), rep(0, n/2))
# set true coefficients to use (more or less null results)
b0 <- 0
b1 <- 0
b2 <- .05
b3 <- .1
b4 <- .1
b5 <- 0
b6 <- .1
b7 <- 0
y <- b0 + b1*x1 + b2*x2 + b3*cond + b4*x1*x2 + b5*x1*cond + b6*x2*cond + b7*x1*x2*cond + rnorm(n, sd=5)
df <- data.frame(y=y, cond=as.factor(cond), x1=x1, x2=x2)
# introduce missing data: x1 is MAR (its missingness depends on x2), x2 is MCAR
df$rand <- runif(n, min=0, max=1) # random numbers for generating missingness
df$x1 <- ifelse(df$rand>.5 & df$x2<median(df$x2), NA, df$x1) # x1 missing more often when x2 is low
df$x2 <- ifelse(df$rand<.25, NA, df$x2) # x2 missing completely at random
df <- subset(df, select=-rand) # drop rand column
summary(df)
## y cond x1 x2
## Min. :-15.946 0:500 Min. :-4.04 Min. : 0.00
## 1st Qu.: -3.351 1:500 1st Qu.:-0.69 1st Qu.: 0.13
## Median : 0.064 Median :-0.01 Median : 0.56
## Mean : 0.146 Mean :-0.02 Mean : 1.59
## 3rd Qu.: 3.595 3rd Qu.: 0.66 3rd Qu.: 1.61
## Max. : 16.391 Max. : 2.87 Max. :36.94
## NA's :249 NA's :236
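A quick sanity check on the missingness in x1 (just an illustration): by construction it should show up much more often in rows where x2 is low.
# proportion of missing x1 among rows with high vs. low x2
with(df, tapply(is.na(x1), x2 < median(x2, na.rm=TRUE), mean))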
Frequentist Analysis
# -------------------------------------
# frequentist analysis
# -------------------------------------
# listwise deletion
mod.ld <- lm(y ~ x1 * x2 * cond, data=df)
summary(mod.ld)
##
## Call:
## lm(formula = y ~ x1 * x2 * cond, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.221 -3.521 -0.118 3.527 14.998
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.2807 0.3782 -0.74 0.458
## x1 0.1572 0.3716 0.42 0.672
## x2 0.0853 0.0800 1.07 0.287
## cond1 -0.2049 0.5507 -0.37 0.710
## x1:x2 0.0209 0.0744 0.28 0.779
## x1:cond1 0.5139 0.5619 0.91 0.361
## x2:cond1 0.3107 0.1391 2.23 0.026 *
## x1:x2:cond1 -0.0268 0.1598 -0.17 0.867
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.16 on 507 degrees of freedom
## (485 observations deleted due to missingness)
## Multiple R-squared: 0.0378, Adjusted R-squared: 0.0245
## F-statistic: 2.85 on 7 and 507 DF, p-value: 0.0064
# multiple imputation
library(Amelia)
m <- 5
amelia.out <- amelia(x = df, noms= "cond", m = m)
## -- Imputation 1 --
##
## 1 2 3 4 5 6 7
##
## -- Imputation 2 --
##
## 1 2 3 4 5 6 7 8
##
## -- Imputation 3 --
##
## 1 2 3 4 5 6 7
##
## -- Imputation 4 --
##
## 1 2 3 4 5 6 7
##
## -- Imputation 5 --
##
## 1 2 3 4 5 6 7
library(Zelig)
mod.mi <- zelig(y ~ x1 * x2 * cond, data = amelia.out, model = "ls")
##
##
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Olivia Lau. 2014.
## "ls: Least Squares Regression for Continuous Dependent Variables"
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://gking.harvard.edu/zelig
##
round(summary(mod.mi)$coefficients,3)
## Value Std. Error t-stat p-value
## (Intercept) -0.253 0.271 -0.932 0.352
## x1 0.486 0.319 1.525 0.135
## x2 0.103 0.078 1.316 0.193
## cond1 0.232 0.396 0.585 0.559
## x1:x2 -0.008 0.069 -0.108 0.914
## x1:cond1 -0.088 0.442 -0.199 0.843
## x2:cond1 0.169 0.141 1.200 0.241
## x1:x2:cond1 0.000 0.142 0.001 1.000
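zelig() takes care of combining the estimates across the five imputed datasets. For intuition, here’s a minimal by-hand sketch of that kind of pooling via Rubin’s rules, re-fitting the model to the data frames stored in amelia.out$imputations (my own illustration rather than Zelig’s internals; it skips the degrees-of-freedom adjustment for the t-tests).
fits <- lapply(amelia.out$imputations, function(d) lm(y ~ x1 * x2 * cond, data = d))
est  <- sapply(fits, coef)                       # one column of coefficients per imputation
wvar <- sapply(fits, function(f) diag(vcov(f)))  # within-imputation variances
qbar <- rowMeans(est)                            # pooled estimates: average across imputations
ubar <- rowMeans(wvar)                           # average within-imputation variance
bvar <- apply(est, 1, var)                       # between-imputation variance
pooled.se <- sqrt(ubar + (1 + 1/m) * bvar)       # Rubin's total variance -> pooled SE
round(cbind(Estimate = qbar, SE = pooled.se), 3)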
Bayesian Analysis
# -------------------------------------
# bayesian analysis
# -------------------------------------
# prep data file for jags
jags.dat <- with(df, list("y"=y, "cond"=cond, "x1"=x1, "x2"=x2, "N"=length(y)))
jags.dat$cond <- as.numeric(jags.dat$cond) - 1
# save parameters for multiple imputation in jags
jags.dat$x1.mean <- mean(jags.dat$x1, na.rm=TRUE)
jags.dat$x1.prec <- 1/var(jags.dat$x1, na.rm=TRUE)
# moment-match a gamma distribution for x2: dgamma(shape = r, rate = lambda)
library(psych) # for skew()
# skew = 2/sqrt(r)  =>  r = (2/skew)^2
# mean = r/lambda   =>  lambda = r/mean
skew.x2 <- skew(jags.dat$x2, na.rm=TRUE)
mean.x2 <- mean(jags.dat$x2, na.rm=TRUE)
jags.dat$x2.r <- (2/skew.x2)^2
jags.dat$x2.lambda <- ((2/skew.x2)^2)/mean.x2
str(jags.dat)
## List of 9
## $ y : num [1:1000] -10.952 -0.988 2.607 1.309 0.688 ...
## $ cond : num [1:1000] 1 1 1 1 1 1 1 1 1 1 ...
## $ x1 : num [1:1000] -0.3426 -1.8495 0.2944 0.0731 -1.6893 ...
## $ x2 : num [1:1000] NA 0.373 1.265 1.731 2.643 ...
## $ N : int 1000
## $ x1.mean : num -0.0181
## $ x1.prec : num 1.05
## $ x2.r : num 0.141
## $ x2.lambda: num 0.0886
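As a quick sanity check on the moment matching (purely illustrative), draws from a gamma with this shape and rate should roughly reproduce the observed mean and skew of x2:
check <- rgamma(1e5, shape = jags.dat$x2.r, rate = jags.dat$x2.lambda)
round(c(mean = mean(check), skew = skew(check)), 2)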
# write the jags model
jags.mod <- function() {
  # likelihood
  for(i in 1:N){
    y[i] ~ dnorm(mu[i], tau)
    mu[i] <- b0 + b1*x1[i] + b2*x2[i] + b3*cond[i] + b4*x1[i]*x2[i] + b5*x1[i]*cond[i] + b6*x2[i]*cond[i] + b7*x1[i]*x2[i]*cond[i]
  }
  # diffuse normal priors on the regression coefficients (precision .1 = variance 10)
  b0 ~ dnorm(0, 0.1)
  b1 ~ dnorm(0, 0.1)
  b2 ~ dnorm(0, 0.1)
  b3 ~ dnorm(0, 0.1)
  b4 ~ dnorm(0, 0.1)
  b5 ~ dnorm(0, 0.1)
  b6 ~ dnorm(0, 0.1)
  b7 ~ dnorm(0, 0.1)
  # priors on predictors, for multiple imputation of missing x1 and x2
  for(i in 1:N){
    x1[i] ~ dnorm(x1.mean, x1.prec)
    x2[i] ~ dgamma(x2.r, x2.lambda)
  }
  # prior on residual precision
  tau ~ dgamma(0.01, 0.01)
  sigma <- 1/sqrt(tau)
}
# specify parameters to track
params <- c("b0", "b1", "b2", "b3", "b4", "b5", "b6", "b7", "sigma")
# specify inits for the missing values only: the missing x1/x2 entries are
# stochastic nodes that need starting values, while the observed entries are
# data and must be left as NA in the inits vectors
x1.inits <- jags.dat$x1
x2.inits <- jags.dat$x2
x1.inits[is.na(jags.dat$x1)] <- median(jags.dat$x1, na.rm=TRUE)
x2.inits[is.na(jags.dat$x2)] <- median(jags.dat$x2, na.rm=TRUE)
x1.inits[!is.na(jags.dat$x1)] <- NA
x2.inits[!is.na(jags.dat$x2)] <- NA
inits1 <-list(tau= 1, b0= 0, b1= 0, b2= 0, b3=0, b4=0, b5=0, b6=0, b7=0,
x1=x1.inits, x2=x2.inits)
inits2 <-list(tau= 1, b0= 0, b1= 0, b2= 0, b3=0, b4=0, b5=0, b6=0, b7=0,
x1=x1.inits, x2=x2.inits)
inits = list(inits1, inits2)
# run the model
library(R2jags)
fit <- jags(data = jags.dat, inits = inits, parameters.to.save = params,
n.chains = 2, n.iter = 10000, n.burnin = 5000, n.thin = 1, model.file = jags.mod)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph Size: 11022
##
## Initializing model
round(fit$BUGSoutput$summary[, c(1, 2, 3, 7)],3)
## mean sd 2.5% 97.5%
## b0 -0.237 0.260 -0.732 0.280
## b1 0.385 0.321 -0.250 1.008
## b2 0.097 0.076 -0.051 0.246
## b3 0.213 0.371 -0.514 0.945
## b4 0.016 0.076 -0.136 0.160
## b5 -0.047 0.480 -0.967 0.905
## b6 0.196 0.119 -0.035 0.432
## b7 0.010 0.148 -0.275 0.292
## deviance 10525.133 6.817 10511.434 10538.596
## sigma 5.211 0.118 4.986 5.446
library(mcmcplots)
fit.mcmc <- as.mcmc(fit)
# mcmcplot(fit.mcmc)
library(ggmcmc)
fit.gg <- ggs(fit.mcmc)
fit.gg$Chain <- 1 # collapse both chains together, since convergence appears to be good
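# quick convergence check behind that decision: Rhat values near 1 for all
# tracked parameters indicate the two chains are mixing well
round(fit$BUGSoutput$summary[, "Rhat"], 2)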
# ggs_density(fit.gg, family="b")
library(ggplot2)
betas <- subset(fit.gg, Parameter %in% paste0("b", 1:7))
ggplot(betas, aes(x=value)) +
  facet_wrap(~Parameter, scales="fixed", ncol=2) +
  geom_density(color="lightblue", fill="lightblue", alpha=.7) +
  geom_vline(xintercept=0, linetype="dashed") +
  ggtitle(paste("N =", n)) +
  theme_bw()

With smaller N
Frequentist Analysis
Bayesian Analysis
Examining Results
true.betas <- c(b0,b1,b2,b3,b4,b5,b6,b7)
freq.res <- as.data.frame(round(summary(mod.mi)$coefficients,3))
colnames(freq.res) <- c("Estimate", "SE", "t", "p")
freq.res <- cbind(true.betas, freq.res) # to ease comparison
library(dplyr)
freq.res <- mutate(freq.res,
ci.low = Estimate-SE*1.96,
ci.high = Estimate+SE*1.96)
bayes.res <- as.data.frame(round(fit$BUGSoutput$summary[, c(1, 2, 3, 7)],3))[1:8,]
colnames(bayes.res) <- c("Estimate", "SD", "ci.low", "ci.high")
bayes.res <- cbind(true.betas, bayes.res) # to ease comparison
library(knitr)
kable(freq.res, format="html", caption="Frequentist Results Summary")
Frequentist Results Summary
| true.betas | Estimate |    SE |      t |     p |  ci.low | ci.high |
|-----------:|---------:|------:|-------:|------:|--------:|--------:|
|       0.00 |   -0.253 | 0.271 | -0.932 | 0.352 | -0.7842 |  0.2782 |
|       0.00 |    0.486 | 0.319 |  1.525 | 0.135 | -0.1392 |  1.1112 |
|       0.05 |    0.103 | 0.078 |  1.316 | 0.193 | -0.0499 |  0.2559 |
|       0.10 |    0.232 | 0.396 |  0.585 | 0.559 | -0.5442 |  1.0082 |
|       0.10 |   -0.008 | 0.069 | -0.108 | 0.914 | -0.1432 |  0.1272 |
|       0.00 |   -0.088 | 0.442 | -0.199 | 0.843 | -0.9543 |  0.7783 |
|       0.10 |    0.169 | 0.141 |  1.200 | 0.241 | -0.1074 |  0.4454 |
|       0.00 |    0.000 | 0.142 |  0.001 | 1.000 | -0.2783 |  0.2783 |
kable(bayes.res, format="html", caption="Bayesian Results Summary")
Bayesian Results Summary
|    | true.betas | Estimate |    SD | ci.low | ci.high |
|----|-----------:|---------:|------:|-------:|--------:|
| b0 |       0.00 |   -0.237 | 0.260 | -0.732 |   0.280 |
| b1 |       0.00 |    0.385 | 0.321 | -0.250 |   1.008 |
| b2 |       0.05 |    0.097 | 0.076 | -0.051 |   0.246 |
| b3 |       0.10 |    0.213 | 0.371 | -0.514 |   0.945 |
| b4 |       0.10 |    0.016 | 0.076 | -0.136 |   0.160 |
| b5 |       0.00 |   -0.047 | 0.480 | -0.967 |   0.905 |
| b6 |       0.10 |    0.196 | 0.119 | -0.035 |   0.432 |
| b7 |       0.00 |    0.010 | 0.148 | -0.275 |   0.292 |
# -------------------------------------
# examining b1, b2, b3 in particular (main effects)
# -------------------------------------
beta1 <- subset(fit.gg, Parameter=="b1")
p <- ggplot(beta1, aes(x=value)) +
  geom_density(fill="lightblue", color="lightblue", alpha=.7) +
  ggtitle(paste("Coefficient for x1, true effect =", b1)) +
  theme_bw()
#p # bayesian posterior
p <- p + geom_vline(xintercept=bayes.res$Estimate[2], linetype=1, color="blue") +
  geom_vline(xintercept=c(bayes.res$ci.low[2], bayes.res$ci.high[2]), linetype="dashed", color="blue")
#p # with bayesian ci and estimate added
p <- p + geom_vline(xintercept=freq.res$Estimate[2], linetype=1, color="red") +
  geom_vline(xintercept=c(freq.res$ci.low[2], freq.res$ci.high[2]), linetype="dashed", color="red")
p # with frequentist ci and estimate added

beta2 <- subset(fit.gg, Parameter=="b2")
p <- ggplot(beta2, aes(x=value)) +
  geom_density(fill="lightblue", color="lightblue", alpha=.7) +
  ggtitle(paste("Coefficient for x2, true effect =", b2)) +
  theme_bw()
#p # bayesian posterior
p <- p + geom_vline(xintercept=bayes.res$Estimate[3], linetype=1, color="blue") +
  geom_vline(xintercept=c(bayes.res$ci.low[3], bayes.res$ci.high[3]), linetype="dashed", color="blue")
#p # with bayesian ci and estimate added
p <- p + geom_vline(xintercept=freq.res$Estimate[3], linetype=1, color="red") +
  geom_vline(xintercept=c(freq.res$ci.low[3], freq.res$ci.high[3]), linetype="dashed", color="red")
p # with frequentist ci and estimate added

beta3 <- subset(fit.gg, Parameter=="b3")
p <- ggplot(beta3, aes(x=value)) +
  geom_density(fill="lightblue", color="lightblue", alpha=.7) +
  ggtitle(paste("Coefficient for condition, true effect =", b3)) +
  theme_bw()
#p # bayesian posterior
p <- p + geom_vline(xintercept=bayes.res$Estimate[4], linetype=1, color="blue") +
  geom_vline(xintercept=c(bayes.res$ci.low[4], bayes.res$ci.high[4]), linetype="dashed", color="blue")
#p # with bayesian ci and estimate added
p <- p + geom_vline(xintercept=freq.res$Estimate[4], linetype=1, color="red") +
  geom_vline(xintercept=c(freq.res$ci.low[4], freq.res$ci.high[4]), linetype="dashed", color="red")
p # with frequentist ci and estimate added

Key difference: with the Bayesian results we’re looking at probability density functions.
- The total area under the curve is always 1.
- The y-axis is not interpretable as the probability of individual values of theta.
- The probability that theta falls in a range of values is given by the area under the curve over that range.
- Point probabilities (densities) are only interpretable relative to one another (i.e. as a ratio — see the sketch below).
# what is the probability that b2 falls within .1 of 0?
length(which(beta2$value>-.1 & beta2$value<.1))/length(beta2$value)
## [1] 0.5138
# what is the probability that b2 falls within .2 of 0?
length(which(beta2$value>-.2 & beta2$value<.2))/length(beta2$value)
## [1] 0.9104
# what is the probability that b2 is positive?
length(which(beta2$value>0))/length(beta2$value)
## [1] 0.897
# how to calculate a density point for the posterior?
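One way to get there (a sketch of one option, not the only one): fit a kernel density estimate to the posterior draws with density(), then interpolate it at the values of interest with approx(). The ratio of two such density points gives the kind of relative interpretation mentioned above.
dens.b2 <- density(beta2$value)                            # kernel density estimate of the posterior for b2
d.at <- approx(dens.b2$x, dens.b2$y, xout = c(0, .05, .2)) # density at b2 = 0, .05, and .2
round(d.at$y, 3)
# relative plausibility of b2 = .05 (the true value) vs. b2 = .2
d.at$y[2] / d.at$y[3]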