Introduction

ANOVA (analysis of variance) is used when we have a categorical explanatory variable, so that the observations belong to groups. In ANOVA, we compare the variability of responses between groups to the variability of responses within groups. If the variability between groups is large relative to the variability within groups, we conclude that there is a ‘grouping effect’.
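
Concretely, with \(k\) groups and \(n\) total observations, the classical one-way ANOVA statistic compares the mean square between groups to the mean square within groups,

\[ F = \frac{\mathrm{SSG}/(k-1)}{\mathrm{SSE}/(n-k)}, \]

where SSG is the between-group sum of squares and SSE is the within-group (error) sum of squares. Large values of \(F\) favor a grouping effect.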

Data

As an example of a one-way ANOVA, we’ll look at the Plant Growth data in R.

data("PlantGrowth")
?PlantGrowth
head(PlantGrowth)
##   weight group
## 1   4.17  ctrl
## 2   5.58  ctrl
## 3   5.18  ctrl
## 4   6.11  ctrl
## 5   4.50  ctrl
## 6   4.61  ctrl

Because the explanatory variable group is a factor and not continuous, we choose to visualize the data with box plots rather than scatter plots.

boxplot(weight ~ group, data=PlantGrowth)

The box plots summarize the distribution of the data for each of the three groups. It appears that treatment 2 has the highest mean yield. It might be questionable whether each group has the same variance, but we’ll assume that is the case.
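
As a quick numerical check on the equal-variance assumption (a small addition to the original analysis), we can compare the sample standard deviations of the three groups:

aggregate(weight ~ group, data=PlantGrowth, FUN=sd)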

Modeling

Again, we can start with the reference analysis (under a noninformative prior) by fitting a linear model in R.

lmod = lm(weight ~ group, data=PlantGrowth)
summary(lmod)
## 
## Call:
## lm(formula = weight ~ group, data = PlantGrowth)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0710 -0.4180 -0.0060  0.2627  1.3690 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.0320     0.1971  25.527   <2e-16 ***
## grouptrt1    -0.3710     0.2788  -1.331   0.1944    
## grouptrt2     0.4940     0.2788   1.772   0.0877 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6234 on 27 degrees of freedom
## Multiple R-squared:  0.2641, Adjusted R-squared:  0.2096 
## F-statistic: 4.846 on 2 and 27 DF,  p-value: 0.01591
plot(lmod) # residual diagnostic plots

The default model structure in R is the linear model with dummy indicator variables. Hence, the “intercept” in this model is the mean yield for the control group, and the other two parameters are the estimated effects of treatments 1 and 2 relative to the control. To recover the mean yield in treatment group 1, you would add the intercept term and the treatment 1 effect. To see how R sets the model up, use model.matrix(lmod) to extract the design matrix \(X\).
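
For example, the following lines (an illustration of the point above, not part of the original analysis) extract the design matrix and recover the estimated mean yield for treatment group 1 from the fitted coefficients:

head(model.matrix(lmod)) # intercept column plus dummy indicators grouptrt1, grouptrt2
coef(lmod)["(Intercept)"] + coef(lmod)["grouptrt1"] # 5.032 - 0.371 = 4.661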

The anova() function in R compares the variability of observations between the treatment groups to the variability within the treatment groups, testing whether all of the means are equal or whether at least one differs. The small p-value (0.016, reported alongside the F statistic at the bottom of the summary above) suggests that the means are not all equal.
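
To run this F test directly:

anova(lmod)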

Let’s fit the cell means model in JAGS.
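
In the cell means parameterization, each group gets its own mean parameter. The model we are about to code is

\[ y_i \mid \mathrm{grp}_i, \boldsymbol{\mu}, \sigma^2 \overset{\text{ind}}{\sim} \mathrm{N}\!\left( \mu_{\mathrm{grp}_i}, \sigma^2 \right), \quad i = 1, \ldots, 30, \]

with independent priors \(\mu_j \sim \mathrm{N}(0, 10^6)\) for \(j = 1, 2, 3\) and \(\sigma^2 \sim \mathrm{IG}(5/2, 5/2)\), the latter coded in JAGS as a gamma prior on the precision \(1/\sigma^2\).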

library("rjags")
## Loading required package: coda
## Linked to JAGS 4.3.2
## Loaded modules: basemod,bugs
mod_string = " model {
    for (i in 1:length(y)) {
        # likelihood: each observation is normal about its group's mean
        y[i] ~ dnorm(mu[grp[i]], prec)
    }
    
    for (j in 1:3) {
        # vague normal prior on each group mean
        mu[j] ~ dnorm(0.0, 1.0/1.0e6)
    }
    
    # gamma prior on the precision, equivalent to an inverse-gamma prior
    # on the variance with prior effective sample size 5.0 and prior
    # guess 1.0 for the variance
    prec ~ dgamma(5/2.0, 5*1.0/2.0)
    sig = sqrt( 1.0 / prec )
} "

set.seed(82)
str(PlantGrowth)
## 'data.frame':    30 obs. of  2 variables:
##  $ weight: num  4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
##  $ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 1 1 1 ...
data_jags = list(y=PlantGrowth$weight, 
              grp=as.numeric(PlantGrowth$group))

params = c("mu", "sig")

inits = function() {
    list("mu"=rnorm(3,0.0,100.0), "prec"=rgamma(1,1.0,1.0))
}

mod = jags.model(textConnection(mod_string), data=data_jags, inits=inits, n.chains=3)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 30
##    Unobserved stochastic nodes: 4
##    Total graph size: 74
## 
## Initializing model
update(mod, 1e3) # burn-in

mod_sim = coda.samples(model=mod,
                        variable.names=params,
                        n.iter=5e3)
mod_csim = as.mcmc(do.call(rbind, mod_sim)) # combined chains

Model checking

As usual, we check for convergence of our MCMC.

plot(mod_sim) # trace and density plots for each parameter

gelman.diag(mod_sim)
## Potential scale reduction factors:
## 
##       Point est. Upper C.I.
## mu[1]          1          1
## mu[2]          1          1
## mu[3]          1          1
## sig            1          1
## 
## Multivariate psrf
## 
## 1
autocorr.diag(mod_sim)
##               mu[1]        mu[2]         mu[3]           sig
## Lag 0   1.000000000  1.000000000  1.0000000000  1.000000e+00
## Lag 1  -0.001132242 -0.029398307 -0.0018640883  9.776646e-02
## Lag 5  -0.001613350 -0.009012361  0.0141201008 -3.917936e-05
## Lag 10 -0.019304177 -0.015381565  0.0008784647 -9.218947e-03
## Lag 50  0.013621258 -0.005693181  0.0065324962  9.162095e-05
effectiveSize(mod_sim)
##    mu[1]    mu[2]    mu[3]      sig 
## 15000.00 15860.21 14579.54 12222.85

We can also look at the residuals to see if there are any obvious problems with our model choice.

(pm_params = colMeans(mod_csim))
##     mu[1]     mu[2]     mu[3]       sig 
## 5.0320393 4.6620621 5.5288421 0.7110024
yhat = pm_params[1:3][data_jags$grp] # posterior mean of mu for each observation's group
resid = data_jags$y - yhat
plot(resid)

plot(yhat, resid) # residuals versus predicted values

Again, it might be appropriate to have a separate variance for each group. We will have you do that as an exercise.
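
As a starting point for that exercise, here is a minimal sketch of how the model string might change (mod_string2 is a hypothetical name; the initial values and monitored parameters would also need updating for the vector-valued prec and sig):

mod_string2 = " model {
    for (i in 1:length(y)) {
        y[i] ~ dnorm(mu[grp[i]], prec[grp[i]]) # group-specific precision
    }
    
    for (j in 1:3) {
        mu[j] ~ dnorm(0.0, 1.0/1.0e6)
        prec[j] ~ dgamma(5/2.0, 5*1.0/2.0)
        sig[j] = sqrt( 1.0 / prec[j] )
    }
} "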

Results

Let’s look at the posterior summary of the parameters.

summary(mod_sim)
## 
## Iterations = 1001:6000
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 5000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##        Mean      SD  Naive SE Time-series SE
## mu[1] 5.032 0.22845 0.0018653      0.0018650
## mu[2] 4.662 0.22409 0.0018297      0.0017815
## mu[3] 5.529 0.22756 0.0018580      0.0018866
## sig   0.711 0.09252 0.0007554      0.0008373
## 
## 2. Quantiles for each variable:
## 
##         2.5%   25%    50%   75%  97.5%
## mu[1] 4.5811 4.882 5.0304 5.182 5.4879
## mu[2] 4.2261 4.513 4.6621 4.811 5.1083
## mu[3] 5.0800 5.379 5.5269 5.678 5.9783
## sig   0.5584 0.645 0.7015 0.766 0.9193
HPDinterval(mod_csim)
##           lower     upper
## mu[1] 4.5933894 5.4964453
## mu[2] 4.2237876 5.1041522
## mu[3] 5.0769071 5.9734837
## sig   0.5443715 0.8978277
## attr(,"Probability")
## [1] 0.95

The HPDinterval() function in the coda package calculates intervals of highest posterior density (HPD) for each parameter; by default these are 95% intervals.
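
Other probability levels can be requested through the prob argument:

HPDinterval(mod_csim, prob=0.9) # 90% HPD intervals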

We are interested in whether one of the treatments increases mean yield over the control. It is clear from the posterior summaries that treatment 1 does not. What about treatment 2?

mean(mod_csim[,3] > mod_csim[,1]) # posterior probability that mu[3] (trt2) exceeds mu[1] (ctrl)
## [1] 0.9388667

There is a high posterior probability that the mean yield for treatment 2 is greater than the mean yield for the control group.

It may be the case that treatment 2 would be costly to put into production. Suppose that to be worthwhile, this treatment must increase mean yield by 10%. What is the posterior probability that the increase is at least that?

mean(mod_csim[,3] > 1.1*mod_csim[,1]) # posterior probability of at least a 10% increase over ctrl
## [1] 0.4921333

We have about 50/50 odds that adopting treatment 2 would increase mean yield by at least 10%.