I’ve been working on a Bayesian replication of some frequentist analyses (Early et al. 2007) this term. I’m trying to wrap my head around how the approaches are similar and different, and what you do and don’t get from the two types of results. In particular, I’ve been thinking about null results and how estimation procedures like Bayesian modeling can facilitate interpretation there.
The replication I’m working on is of one part of a really neat meta-analysis done by Early et al. (2007) on teacher training and classroom outcomes for preschoolers across the country. The short version of the results is that teacher education variables appear not to be related to classroom quality or child learning outcomes (I’m way over-simplifying this, obviously). Check out the article if you’d like the complete story; it’s a good piece of work.
Since null results like these can be irritatingly uninformative within a frequentist framework, I want to take the opportunity to compare what the frequentist analyses tell us with what the Bayesian analyses yield. Unfortunately, I’m still struggling to get the full model to run on the real dataset (which is large and messy, as real datasets are). In the meantime, here’s a demonstration with a toy dataset. I did try to include some genuine mess in the toy dataset as well (skewed data, missingness), so it’s still real-ish…
Make up the data
# Meth Lab Presentation Nov 2014
# null results in Bayes vs. frequentist stats
rm(list = ls())
# -------------------------------------
# make up some data, large N
# -------------------------------------
n <- 1000
x1 <- rnorm(n)
x2 <- rf(n, df1=1, df2=5) # skewed
cond <- c(rep(1, n/2), rep(0, n/2))
# set true coefficients to use (more or less null results)
b0 <- 0
b1 <- 0
b2 <- .05
b3 <- .1
b4 <- .1
b5 <- 0
b6 <- .1
b7 <- 0
y <- b0 + b1*x1 + b2*x2 + b3*cond + b4*x1*x2 + b5*x1*cond + b6*x2*cond + b7*x1*x2*cond + rnorm(n, sd=5)
df <- data.frame(y=y, cond=as.factor(cond), x1=x1, x2=x2)
# introduce missing data: x1 is MAR (its missingness depends on x2), x2 is MCAR
df$rand <- runif(n, min=0, max=1) # random numbers for generating missingness
df$x1 <- ifelse(df$rand>.5 & df$x2<median(df$x2), NA, df$x1) # x1 missing more often when x2 is low
df$x2 <- ifelse(df$rand<.25, NA, df$x2) # x2 missing completely at random
df <- subset(df, select=-rand) # drop rand column
summary(df)
## y cond x1 x2
## Min. :-15.946 0:500 Min. :-4.04 Min. : 0.00
## 1st Qu.: -3.351 1:500 1st Qu.:-0.69 1st Qu.: 0.13
## Median : 0.064 Median :-0.01 Median : 0.56
## Mean : 0.146 Mean :-0.02 Mean : 1.59
## 3rd Qu.: 3.595 3rd Qu.: 0.66 3rd Qu.: 1.61
## Max. : 16.391 Max. : 2.87 Max. :36.94
## NA's :249 NA's :236
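A quick sanity check on the missingness in x1 (just an illustration): by construction it should show up much more often in rows where x2 is low.
# proportion of missing x1 among rows with high vs. low x2
with(df, tapply(is.na(x1), x2 < median(x2, na.rm=TRUE), mean))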
Frequentist Analysis
# -------------------------------------
# frequentist analysis
# -------------------------------------
# listwise deletion
mod.ld <- lm(y ~ x1 * x2 * cond, data=df)
summary(mod.ld)
##
## Call:
## lm(formula = y ~ x1 * x2 * cond, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.221 -3.521 -0.118 3.527 14.998
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.2807 0.3782 -0.74 0.458
## x1 0.1572 0.3716 0.42 0.672
## x2 0.0853 0.0800 1.07 0.287
## cond1 -0.2049 0.5507 -0.37 0.710
## x1:x2 0.0209 0.0744 0.28 0.779
## x1:cond1 0.5139 0.5619 0.91 0.361
## x2:cond1 0.3107 0.1391 2.23 0.026 *
## x1:x2:cond1 -0.0268 0.1598 -0.17 0.867
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.16 on 507 degrees of freedom
## (485 observations deleted due to missingness)
## Multiple R-squared: 0.0378, Adjusted R-squared: 0.0245
## F-statistic: 2.85 on 7 and 507 DF, p-value: 0.0064
# multiple imputation
library(Amelia)
m <- 5
amelia.out <- amelia(x = df, noms= "cond", m = m)
## -- Imputation 1 --
##
## 1 2 3 4 5 6 7
##
## -- Imputation 2 --
##
## 1 2 3 4 5 6 7 8
##
## -- Imputation 3 --
##
## 1 2 3 4 5 6 7
##
## -- Imputation 4 --
##
## 1 2 3 4 5 6 7
##
## -- Imputation 5 --
##
## 1 2 3 4 5 6 7
library(Zelig)
mod.mi <- zelig(y ~ x1 * x2 * cond, data = amelia.out, model = "ls")
##
##
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Olivia Lau. 2014.
## "ls: Least Squares Regression for Continuous Dependent Variables"
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://gking.harvard.edu/zelig
##
round(summary(mod.mi)$coefficients,3)
## Value Std. Error t-stat p-value
## (Intercept) -0.253 0.271 -0.932 0.352
## x1 0.486 0.319 1.525 0.135
## x2 0.103 0.078 1.316 0.193
## cond1 0.232 0.396 0.585 0.559
## x1:x2 -0.008 0.069 -0.108 0.914
## x1:cond1 -0.088 0.442 -0.199 0.843
## x2:cond1 0.169 0.141 1.200 0.241
## x1:x2:cond1 0.000 0.142 0.001 1.000
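zelig() takes care of combining the estimates across the five imputed datasets. For intuition, here’s a minimal by-hand sketch of that kind of pooling via Rubin’s rules, re-fitting the model to the data frames stored in amelia.out$imputations (my own illustration rather than Zelig’s internals; it skips the degrees-of-freedom adjustment for the t-tests).
fits <- lapply(amelia.out$imputations, function(d) lm(y ~ x1 * x2 * cond, data = d))
est  <- sapply(fits, coef)                       # one column of coefficients per imputation
wvar <- sapply(fits, function(f) diag(vcov(f)))  # within-imputation variances
qbar <- rowMeans(est)                            # pooled estimates: average across imputations
ubar <- rowMeans(wvar)                           # average within-imputation variance
bvar <- apply(est, 1, var)                       # between-imputation variance
pooled.se <- sqrt(ubar + (1 + 1/m) * bvar)       # Rubin's total variance -> pooled SE
round(cbind(Estimate = qbar, SE = pooled.se), 3)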
Bayesian Analysis
# -------------------------------------
# bayesian analysis
# -------------------------------------
# prep data file for jags
jags.dat <- with(df, list("y"=y, "cond"=cond, "x1"=x1, "x2"=x2, "N"=length(y)))
jags.dat$cond <- as.numeric(jags.dat$cond) - 1
# save parameters for multiple imputation in jags
jags.dat$x1.mean <- mean(jags.dat$x1, na.rm=TRUE)
jags.dat$x1.prec <- 1/var(jags.dat$x1, na.rm=TRUE)
# moment-match a gamma distribution for x2: dgamma(shape = r, rate = lambda)
library(psych) # for skew()
# skew = 2/sqrt(r)  =>  r = (2/skew)^2
# mean = r/lambda   =>  lambda = r/mean
skew.x2 <- skew(jags.dat$x2, na.rm=TRUE)
mean.x2 <- mean(jags.dat$x2, na.rm=TRUE)
jags.dat$x2.r <- (2/skew.x2)^2
jags.dat$x2.lambda <- ((2/skew.x2)^2)/mean.x2
str(jags.dat)
## List of 9
## $ y : num [1:1000] -10.952 -0.988 2.607 1.309 0.688 ...
## $ cond : num [1:1000] 1 1 1 1 1 1 1 1 1 1 ...
## $ x1 : num [1:1000] -0.3426 -1.8495 0.2944 0.0731 -1.6893 ...
## $ x2 : num [1:1000] NA 0.373 1.265 1.731 2.643 ...
## $ N : int 1000
## $ x1.mean : num -0.0181
## $ x1.prec : num 1.05
## $ x2.r : num 0.141
## $ x2.lambda: num 0.0886
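As a quick sanity check on the moment matching (purely illustrative), draws from a gamma with this shape and rate should roughly reproduce the observed mean and skew of x2:
check <- rgamma(1e5, shape = jags.dat$x2.r, rate = jags.dat$x2.lambda)
round(c(mean = mean(check), skew = skew(check)), 2)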
# write the jags model
jags.mod <- function() {
  # likelihood
  for(i in 1:N){
    y[i] ~ dnorm(mu[i], tau)
    mu[i] <- b0 + b1*x1[i] + b2*x2[i] + b3*cond[i] + b4*x1[i]*x2[i] + b5*x1[i]*cond[i] + b6*x2[i]*cond[i] + b7*x1[i]*x2[i]*cond[i]
  }
  # diffuse normal priors on the regression coefficients (precision .1 = variance 10)
  b0 ~ dnorm(0, 0.1)
  b1 ~ dnorm(0, 0.1)
  b2 ~ dnorm(0, 0.1)
  b3 ~ dnorm(0, 0.1)
  b4 ~ dnorm(0, 0.1)
  b5 ~ dnorm(0, 0.1)
  b6 ~ dnorm(0, 0.1)
  b7 ~ dnorm(0, 0.1)
  # priors on predictors, for multiple imputation of missing x1 and x2
  for(i in 1:N){
    x1[i] ~ dnorm(x1.mean, x1.prec)
    x2[i] ~ dgamma(x2.r, x2.lambda)
  }
  # prior on residual precision
  tau ~ dgamma(0.01, 0.01)
  sigma <- 1/sqrt(tau)
}
# specify parameters to track
params <- c("b0", "b1", "b2", "b3", "b4", "b5", "b6", "b7", "sigma")
# specify inits for the missing values only: the missing x1/x2 entries are
# stochastic nodes that need starting values, while the observed entries are
# data and must be left as NA in the inits vectors
x1.inits <- jags.dat$x1
x2.inits <- jags.dat$x2
x1.inits[is.na(jags.dat$x1)] <- median(jags.dat$x1, na.rm=TRUE)
x2.inits[is.na(jags.dat$x2)] <- median(jags.dat$x2, na.rm=TRUE)
x1.inits[!is.na(jags.dat$x1)] <- NA
x2.inits[!is.na(jags.dat$x2)] <- NA
inits1 <-list(tau= 1, b0= 0, b1= 0, b2= 0, b3=0, b4=0, b5=0, b6=0, b7=0,
x1=x1.inits, x2=x2.inits)
inits2 <-list(tau= 1, b0= 0, b1= 0, b2= 0, b3=0, b4=0, b5=0, b6=0, b7=0,
x1=x1.inits, x2=x2.inits)
inits = list(inits1, inits2)
# run the model
library(R2jags)
fit <- jags(data = jags.dat, inits = inits, parameters.to.save = params,
n.chains = 2, n.iter = 10000, n.burnin = 5000, n.thin = 1, model.file = jags.mod)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph Size: 11022
##
## Initializing model
round(fit$BUGSoutput$summary[, c(1, 2, 3, 7)],3)
## mean sd 2.5% 97.5%
## b0 -0.237 0.260 -0.732 0.280
## b1 0.385 0.321 -0.250 1.008
## b2 0.097 0.076 -0.051 0.246
## b3 0.213 0.371 -0.514 0.945
## b4 0.016 0.076 -0.136 0.160
## b5 -0.047 0.480 -0.967 0.905
## b6 0.196 0.119 -0.035 0.432
## b7 0.010 0.148 -0.275 0.292
## deviance 10525.133 6.817 10511.434 10538.596
## sigma 5.211 0.118 4.986 5.446
library(mcmcplots)
fit.mcmc <- as.mcmc(fit)
# mcmcplot(fit.mcmc)
library(ggmcmc)
fit.gg <- ggs(fit.mcmc)
fit.gg$Chain <- 1 # collapse both chains together, since convergence appears to be good
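# quick convergence check behind that decision: Rhat values near 1 for all
# tracked parameters indicate the two chains are mixing well
round(fit$BUGSoutput$summary[, "Rhat"], 2)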
# ggs_density(fit.gg, family="b")
library(ggplot2)
betas <- subset(fit.gg, Parameter %in% paste0("b", 1:7))
ggplot(betas, aes(x=value)) +
  facet_wrap(~Parameter, scales="fixed", ncol=2) +
  geom_density(color="lightblue", fill="lightblue", alpha=.7) +
  geom_vline(xintercept=0, linetype="dashed") +
  ggtitle(paste("N =", n)) +
  theme_bw()

With smaller N
Frequentist Analysis
Bayesian Analysis
Examining Results
true.betas <- c(b0,b1,b2,b3,b4,b5,b6,b7)
freq.res <- as.data.frame(round(summary(mod.mi)$coefficients,3))
colnames(freq.res) <- c("Estimate", "SE", "t", "p")
freq.res <- cbind(true.betas, freq.res) # to ease comparison
library(dplyr)
freq.res <- mutate(freq.res,
ci.low = Estimate-SE*1.96,
ci.high = Estimate+SE*1.96)
bayes.res <- as.data.frame(round(fit$BUGSoutput$summary[, c(1, 2, 3, 7)],3))[1:8,]
colnames(bayes.res) <- c("Estimate", "SD", "ci.low", "ci.high")
bayes.res <- cbind(true.betas, bayes.res) # to ease comparison
library(knitr)
kable(freq.res, format="html", caption="Frequentist Results Summary")
Frequentist Results Summary
| true.betas | Estimate |    SE |      t |     p |  ci.low | ci.high |
|-----------:|---------:|------:|-------:|------:|--------:|--------:|
|       0.00 |   -0.253 | 0.271 | -0.932 | 0.352 | -0.7842 |  0.2782 |
|       0.00 |    0.486 | 0.319 |  1.525 | 0.135 | -0.1392 |  1.1112 |
|       0.05 |    0.103 | 0.078 |  1.316 | 0.193 | -0.0499 |  0.2559 |
|       0.10 |    0.232 | 0.396 |  0.585 | 0.559 | -0.5442 |  1.0082 |
|       0.10 |   -0.008 | 0.069 | -0.108 | 0.914 | -0.1432 |  0.1272 |
|       0.00 |   -0.088 | 0.442 | -0.199 | 0.843 | -0.9543 |  0.7783 |
|       0.10 |    0.169 | 0.141 |  1.200 | 0.241 | -0.1074 |  0.4454 |
|       0.00 |    0.000 | 0.142 |  0.001 | 1.000 | -0.2783 |  0.2783 |
kable(bayes.res, format="html", caption="Bayesian Results Summary")
Bayesian Results Summary
|    | true.betas | Estimate |    SD | ci.low | ci.high |
|----|-----------:|---------:|------:|-------:|--------:|
| b0 |       0.00 |   -0.237 | 0.260 | -0.732 |   0.280 |
| b1 |       0.00 |    0.385 | 0.321 | -0.250 |   1.008 |
| b2 |       0.05 |    0.097 | 0.076 | -0.051 |   0.246 |
| b3 |       0.10 |    0.213 | 0.371 | -0.514 |   0.945 |
| b4 |       0.10 |    0.016 | 0.076 | -0.136 |   0.160 |
| b5 |       0.00 |   -0.047 | 0.480 | -0.967 |   0.905 |
| b6 |       0.10 |    0.196 | 0.119 | -0.035 |   0.432 |
| b7 |       0.00 |    0.010 | 0.148 | -0.275 |   0.292 |
# -------------------------------------
# examining b1, b2, b3 in particular (main effects)
# -------------------------------------
beta1 <- subset(fit.gg, Parameter=="b1")
p <- ggplot(beta1, aes(x=value)) +
  geom_density(fill="lightblue", color="lightblue", alpha=.7) +
  ggtitle(paste("Coefficient for x1, true effect =", b1)) +
  theme_bw()
#p # bayesian posterior
p <- p + geom_vline(xintercept=bayes.res$Estimate[2], linetype=1, color="blue") +
  geom_vline(xintercept=c(bayes.res$ci.low[2], bayes.res$ci.high[2]), linetype="dashed", color="blue")
#p # with bayesian ci and estimate added
p <- p + geom_vline(xintercept=freq.res$Estimate[2], linetype=1, color="red") +
  geom_vline(xintercept=c(freq.res$ci.low[2], freq.res$ci.high[2]), linetype="dashed", color="red")
p # with frequentist ci and estimate added

beta2 <- subset(fit.gg, Parameter=="b2")
p <- ggplot(beta2, aes(x=value)) +
  geom_density(fill="lightblue", color="lightblue", alpha=.7) +
  ggtitle(paste("Coefficient for x2, true effect =", b2)) +
  theme_bw()
#p # bayesian posterior
p <- p + geom_vline(xintercept=bayes.res$Estimate[3], linetype=1, color="blue") +
  geom_vline(xintercept=c(bayes.res$ci.low[3], bayes.res$ci.high[3]), linetype="dashed", color="blue")
#p # with bayesian ci and estimate added
p <- p + geom_vline(xintercept=freq.res$Estimate[3], linetype=1, color="red") +
  geom_vline(xintercept=c(freq.res$ci.low[3], freq.res$ci.high[3]), linetype="dashed", color="red")
p # with frequentist ci and estimate added

beta3 <- subset(fit.gg, Parameter=="b3")
p <- ggplot(beta3, aes(x=value)) +
  geom_density(fill="lightblue", color="lightblue", alpha=.7) +
  ggtitle(paste("Coefficient for condition, true effect =", b3)) +
  theme_bw()
#p # bayesian posterior
p <- p + geom_vline(xintercept=bayes.res$Estimate[4], linetype=1, color="blue") +
  geom_vline(xintercept=c(bayes.res$ci.low[4], bayes.res$ci.high[4]), linetype="dashed", color="blue")
#p # with bayesian ci and estimate added
p <- p + geom_vline(xintercept=freq.res$Estimate[4], linetype=1, color="red") +
  geom_vline(xintercept=c(freq.res$ci.low[4], freq.res$ci.high[4]), linetype="dashed", color="red")
p # with frequentist ci and estimate added

Key difference: with the Bayesian results we’re looking at probability density functions.
- The total area under the curve is always 1.
- The y-axis is not interpretable as the probability of individual values of theta.
- The probability that theta falls in a range of values is given by the area under the curve over that range.
- Point probabilities (densities) are only interpretable relative to one another (i.e. as a ratio — see the sketch below).
# what is the probability that b2 falls within .1 of 0?
length(which(beta2$value>-.1 & beta2$value<.1))/length(beta2$value)
## [1] 0.5138
# what is the probability that b2 falls within .2 of 0?
length(which(beta2$value>-.2 & beta2$value<.2))/length(beta2$value)
## [1] 0.9104
# what is the probability that b2 is positive?
length(which(beta2$value>0))/length(beta2$value)
## [1] 0.897
# how to calculate a density point for the posterior?
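One way to get there (a sketch of one option, not the only one): fit a kernel density estimate to the posterior draws with density(), then interpolate it at the values of interest with approx(). The ratio of two such density points gives the kind of relative interpretation mentioned above.
dens.b2 <- density(beta2$value)                            # kernel density estimate of the posterior for b2
d.at <- approx(dens.b2$x, dens.b2$y, xout = c(0, .05, .2)) # density at b2 = 0, .05, and .2
round(d.at$y, 3)
# relative plausibility of b2 = .05 (the true value) vs. b2 = .2
d.at$y[2] / d.at$y[3]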