0.1 Data

As an example of a one-way ANOVA, we’ll look at the Plant Growth data in R.

data("PlantGrowth")
?PlantGrowth
head(PlantGrowth)
##   weight group
## 1   4.17  ctrl
## 2   5.58  ctrl
## 3   5.18  ctrl
## 4   6.11  ctrl
## 5   4.50  ctrl
## 6   4.61  ctrl
dim(PlantGrowth)
## [1] 30  2

Because the explanatory variable group is a factor and not continuous, we choose to visualize the data with box plots rather than scatter plots.

boxplot(weight ~ group, data=PlantGrowth)

The box plots summarize the distribution of the data for each of the three groups. It appears that treatment 2 has the highest mean yield. It might be questionable whether each group has the same variance, but we’ll assume that is the case.
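
As a quick informal check (an aside, not part of the original analysis), we can compute the sample variance of the weights within each group; roughly similar values would support the equal-variance assumption.

# within-group sample variances of yield
aggregate(weight ~ group, data=PlantGrowth, FUN=var)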

0.2 Modeling

Again, we can start with a reference analysis (with a noninformative prior) using a linear model in R.

lmod = lm(weight ~ group, data=PlantGrowth)
summary(lmod)
## 
## Call:
## lm(formula = weight ~ group, data = PlantGrowth)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0710 -0.4180 -0.0060  0.2627  1.3690 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.0320     0.1971  25.527   <2e-16 ***
## grouptrt1    -0.3710     0.2788  -1.331   0.1944    
## grouptrt2     0.4940     0.2788   1.772   0.0877 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6234 on 27 degrees of freedom
## Multiple R-squared:  0.2641, Adjusted R-squared:  0.2096 
## F-statistic: 4.846 on 2 and 27 DF,  p-value: 0.01591
anova(lmod)
plot(lmod) # for graphical residual analysis

The default model structure in R is the linear model with dummy indicator variables. Hence, the “intercept” in this model is the mean yield for the control group, and the two other parameters are the estimated effects of treatments 1 and 2. To recover the mean yield in treatment group 1, you would add the intercept term and the treatment 1 effect. To see how R sets up the model, use model.matrix(lmod) to extract the design matrix X.
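
As a small illustration of this parameterization (not part of the original analysis), we can inspect the design matrix and recover each group's estimated mean yield from the fitted coefficients:

head(model.matrix(lmod)) # dummy-variable design matrix

# intercept is the control mean; add each treatment effect to the
# intercept to recover that treatment group's mean yield
coefs = coef(lmod)
c(ctrl=unname(coefs[1]), trt1=unname(coefs[1]+coefs[2]), trt2=unname(coefs[1]+coefs[3]))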

The anova() function in R compares variability of observations between the treatment groups to variability within the treatment groups to test whether all means are equal or whether at least one is different. The small p-value here suggests that the means are not all equal.

Let’s fit the cell means model in JAGS.

  • Specify the model:
    • \(y_i\) follows a normal distribution whose mean is the cell mean for observation \(i\)'s group, indexed by extracting the group number (1, 2, or 3) for each observation:
      \[ y_i \mid \mu_{g_i}, \sigma^2 \sim \text{N}(\mu_{g_i}, \sigma^2) \]
    • for the 3 cell means, we use a noninformative normal prior:
      \[ \mu_j \sim \text{N}(0.0, 1.0 \times 10^6) \]
    • for the prior on the variance, we use an inverse-gamma with a prior effective sample size of 5 observations and a prior guess of 1.0 for the variance (a quick simulation from this prior follows below):
      \[ \sigma^2 \sim \text{IG}\!\left( \frac{5}{2}, \frac{5 \times 1.0}{2} \right) \]
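
Before fitting, here is a minimal sketch (an aside) of what this prior implies: the gamma prior on the precision is equivalent to an inverse-gamma prior on \(\sigma^2\), so we can simulate prior draws of \(\sigma\) directly.

# prec ~ Gamma(5/2, 5*1.0/2) implies sigma^2 ~ Inverse-Gamma(5/2, 5*1.0/2)
set.seed(42)
prec_prior = rgamma(1e5, shape=5/2.0, rate=5*1.0/2.0)
summary(sqrt(1.0/prec_prior)) # prior draws of sigma
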
library("rjags")
mod_string = " model {
    for (i in 1:length(y)) {
        # likelihood: each observation is normal about its group's mean
        y[i] ~ dnorm(mu[grp[i]], prec)
    }
    
    for (j in 1:3) {
        # noninformative prior for each cell mean
        mu[j] ~ dnorm(0.0, 1.0/1.0e6)
    }
    
    # gamma prior on the precision is equivalent to an
    # inverse-gamma prior on the variance
    prec ~ dgamma(5/2.0, 5*1.0/2.0)
    sig = sqrt( 1.0 / prec )
} "

set.seed(82)
str(PlantGrowth)
## 'data.frame':    30 obs. of  2 variables:
##  $ weight: num  4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
##  $ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 1 1 1 ...
data_jags = list(y=PlantGrowth$weight, 
                 grp=as.numeric(PlantGrowth$group))

params = c("mu", "sig")

inits = function() {
    # over-dispersed starting values for each chain
    list("mu"=rnorm(3,0.0,100.0), "prec"=rgamma(1,1.0,1.0))
}

mod = jags.model(textConnection(mod_string), data=data_jags, inits=inits, n.chains=3)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 30
##    Unobserved stochastic nodes: 4
##    Total graph size: 74
## 
## Initializing model
update(mod, 1e3)

mod_sim = coda.samples(model=mod,
                        variable.names=params,
                        n.iter=5e3)
mod_csim = as.mcmc(do.call(rbind, mod_sim)) # combined chains

0.3 Model checking

As usual, we check for convergence of our MCMC.

# look at the trace plots (they look good)
plot(mod_sim)

# check the Gelman-Rubin statistics (all are 1)
gelman.diag(mod_sim)
## Potential scale reduction factors:
## 
##       Point est. Upper C.I.
## mu[1]          1          1
## mu[2]          1          1
## mu[3]          1          1
## sig            1          1
## 
## Multivariate psrf
## 
## 1
# autocorrelation is low, so the effective sample size should be close
# to the actual number of samples drawn (it is)
autocorr.diag(mod_sim)
##               mu[1]        mu[2]        mu[3]         sig
## Lag 0   1.000000000  1.000000000  1.000000000 1.000000000
## Lag 1  -0.013764261  0.008390137  0.012406490 0.082050480
## Lag 5  -0.015457773 -0.002427229 -0.011734123 0.001164453
## Lag 10 -0.007449202 -0.002891236 -0.007206202 0.011222271
## Lag 50  0.003677028  0.003321709 -0.001653606 0.008690881
effectiveSize(mod_sim)
##    mu[1]    mu[2]    mu[3]      sig 
## 15284.01 15000.00 14672.41 12881.61

We can also look at the residuals to see if there are any obvious problems with our model choice.

(pm_params = colMeans(mod_csim))
##     mu[1]     mu[2]     mu[3]       sig 
## 5.0327673 4.6610177 5.5285226 0.7112867

Note: these posterior means compare well with the coefficients from the lm() fit above (5.032, -0.371, 0.494): the intercept corresponds to \(\mu_1\), adding the treatment 1 effect to the intercept gives \(\mu_2\), and adding the treatment 2 effect gives \(\mu_3\).
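
For a direct comparison (a small sketch using the posterior means computed above), we can convert the posterior means to the lm() parameterization:

# posterior means as intercept (ctrl mean) and treatment effects
c(pm_params[1], pm_params[2]-pm_params[1], pm_params[3]-pm_params[1])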

yhat = pm_params[1:3][data_jags$grp]
resid = data_jags$y - yhat
plot(resid)

plot(yhat, resid)

Again, it might be appropriate to have a separate variance for each group. We could implement this by giving \(\sigma\) a group index in the model, just as we did for \(\mu\).
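
A minimal sketch of how the model string might be modified for this (a hypothetical variant, not fit in this document):

# hypothetical variant: a separate precision (hence sigma) per group
mod_string_hetero = " model {
    for (i in 1:length(y)) {
        y[i] ~ dnorm(mu[grp[i]], prec[grp[i]])
    }
    for (j in 1:3) {
        mu[j] ~ dnorm(0.0, 1.0/1.0e6)
        prec[j] ~ dgamma(5/2.0, 5*1.0/2.0)
        sig[j] = sqrt(1.0 / prec[j])
    }
} "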

0.4 Results

Let’s look at the posterior summary of the parameters.

summary(mod_sim)
## 
## Iterations = 1001:6000
## Thinning interval = 1 
## Number of chains = 3 
## Sample size per chain = 5000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##         Mean     SD Naive SE Time-series SE
## mu[1] 5.0328 0.2263 0.001848      0.0018314
## mu[2] 4.6610 0.2290 0.001870      0.0018701
## mu[3] 5.5285 0.2244 0.001832      0.0018540
## sig   0.7113 0.0916 0.000748      0.0008083
## 
## 2. Quantiles for each variable:
## 
##        2.5%    25%   50%    75%  97.5%
## mu[1] 4.583 4.8848 5.031 5.1809 5.4747
## mu[2] 4.202 4.5124 4.662 4.8126 5.1144
## mu[3] 5.091 5.3822 5.528 5.6746 5.9782
## sig   0.560 0.6468 0.701 0.7644 0.9206
HPDinterval(mod_csim) #0.95 
##           lower     upper
## mu[1] 4.5979754 5.4862179
## mu[2] 4.2212390 5.1290745
## mu[3] 5.0754068 5.9576636
## sig   0.5397441 0.8905307
## attr(,"Probability")
## [1] 0.95
HPDinterval(mod_csim, 0.9) #0.90
##           lower    upper
## mu[1] 4.6719303 5.407051
## mu[2] 4.2856647 5.038490
## mu[3] 5.1637216 5.902674
## sig   0.5581455 0.849172
## attr(,"Probability")
## [1] 0.9

The HPDinterval() function in the coda package calculates intervals of highest posterior density for each parameter.
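
For comparison (an aside), equal-tailed credible intervals can be read off the posterior quantiles; for roughly symmetric posteriors like these, they are close to the HPD intervals.

# equal-tailed 95% intervals for comparison with the HPD intervals
apply(mod_csim, 2, quantile, probs=c(0.025, 0.975))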

We are interested in whether either of the treatments increases mean yield. It is clear that treatment 1 does not. What about treatment 2?

head(mod_csim) # make sure to get the correct column
## Markov Chain Monte Carlo (MCMC) output:
## Start = 1 
## End = 7 
## Thinning interval = 1 
##         mu[1]    mu[2]    mu[3]       sig
## [1,] 4.786112 4.535094 5.608035 0.6841112
## [2,] 5.126522 4.639489 5.823189 0.6386737
## [3,] 4.870069 4.643521 5.570249 0.6339086
## [4,] 4.903034 4.778146 5.313771 0.6882587
## [5,] 4.952361 4.404168 6.006823 0.7599991
## [6,] 5.132476 4.597472 5.902216 0.6920421
## [7,] 4.537526 4.709347 5.653255 0.7137848
mean(mod_csim[,3] > mod_csim[,1])
## [1] 0.9396

There is a high posterior probability that the mean yield for treatment 2 is greater than the mean yield for the control group.

It may be the case that treatment 2 would be costly to put into production. Suppose that to be worthwhile, this treatment must increase mean yield by 10%. What is the posterior probability that the increase is at least that?

mean(mod_csim[,3] > 1.1*mod_csim[,1])
## [1] 0.4908667

We have about 50/50 odds that adopting treatment 2 would increase mean yield by at least 10%.