Chapter 6. Blocking and Two-Way ANOVA

Blocking and Two-Way ANOVA are statistical techniques used to analyze experimental data, particularly when there are multiple factors affecting the outcome. Let’s introduce each of them:

Blocking

Purpose: To reduce variability and increase precision in experimental studies.

Implementation: Experimental units are grouped into blocks based on a factor external to the main experimental treatment.

Advantages: Reduces the influence of nuisance variables, increases the accuracy of treatment comparisons, and enhances the ability to detect treatment effects.

6.1 The Randomized Complete Block (RCB) Design

The Randomized Complete Block (RCB) Design is a common experimental design used in various research fields to reduce variability and increase the precision of treatment comparisons. It is particularly useful when there are known sources of variability that may confound the treatment effects.

The RCB Design can be implemented in various ways depending on the experimental setup and research goals. Here are three common strategies for implementing RCB designs:

  1. Subdivision: Subdivide a large unit (the block) into smaller units (the experimental units) so that each unit receives one of several treatments randomly. There can be multiple blocks.
  2. Sorting/Matching: Sort or match individuals with similar baseline characteristics into blocks. Each block should have the same number of individuals as the number of treatments. Randomly assign one treatment to each individual within a block.
  3. Reuse: Reuse each subject (the block) multiple times, assigning each treatment in a random order.

Each strategy results in a data table with n rows and k columns, where n is the number of blocks and k is the number of treatments. The total number of observations equals n * k.

Here’s an example data table for a scenario with n=4 blocks and k=2 treatments:

n <- 4  # Number of blocks
k <- 2  # Number of treatments
set.seed(123)  # Set a seed for reproducibility
data <- matrix(paste0("y", round(runif(n*k, 10, 20), 2)), nrow = n, ncol = k)
colnames(data) <- paste0("Treatment", 1:k)
rownames(data) <- paste0("Block", 1:n)

# Create the table using kableExtra
library(kableExtra)
kable(data, caption = "Block design")
Block design
Treatment1 Treatment2
Block1 y12.88 y19.4
Block2 y17.88 y10.46
Block3 y14.09 y15.28
Block4 y18.83 y18.92

6.2 Exploring Data from Block Designs

What graphical insights can we glean from the data before model fitting? Constructing a plot for Randomized Complete Block Design (RCBD) data involves creating side-by-side dot plots for each treatment group and connecting observations from the same block.

Key Points Revealed by the Plot:

  • Variability due to block within a treatment group.
  • Variability due to treatment within a block.
  • Parallelism of lines corresponding to blocks .

Our aim is to discern differences in treatment effects. To achieve this, it’s crucial to eliminate block effects and derive adjusted responses before comparing treatments.

Steps to Calculate Adjusted Values:

  1. Determine the grand mean.
  2. Compute the mean of each block (subject in this example).
  3. Determine the effect of each block: block mean minus grand mean.
  4. Calculate adjusted values: subtract the corresponding block effect from each observation.

Plotting the adjusted values against treatments offers a clearer picture of each treatment’s effect, as the influence of blocks has been mitigated.

Example 2: Plant Growth Experiment

Let’s consider an experiment where we want to investigate the effects of three different fertilizers (Fertilizer A, B, and C) on plant growth. We have four pots available, and each pot serves as a block. Each fertilizer is randomly assigned to a pot, and the plant height is measured after a specific period. The data is presented in a wide format below:

library(ggplot2)
# Simulate some data 
set.seed(123)
D <- data.frame(
  Pot = factor(rep(1:4)),
  Treatment = rep(c("Fertilizer A", "Fertilizer B", "Fertilizer C"), 4),
  Y = c(rnorm(12, mean = 18, sd = 2))
)

D$Treatment <- factor(D$Treatment, levels = c("Fertilizer A", "Fertilizer B", "Fertilizer C"))

# Calculate block means (pot means in this case)
block_means <- aggregate(Y ~ Pot, data = D, FUN = mean)

# Calculate grand mean
grand_mean <- mean(D$Y)

# Calculate block effects
block_effects <- block_means$Y - grand_mean

# Adjusted data
D$Adjusted_Y <- D$Y - block_effects[match(D$Pot, block_means$Pot)]

# Original data plot
with(D, interaction.plot(Treatment, Pot, Y, xlab = "Treatment", ylab = "Height", col = 1:4))

# Block effect data plot
ggplot(D, aes(x = Treatment, y = Adjusted_Y)) +
  geom_point() +
  geom_rect(aes(xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf), fill = "transparent", color = "black") +
  labs(title = "Block Effect Data",
       x = "Fertilizer",
       y = "Plant Height") +
  theme(panel.background = element_rect(fill = "white"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

# Generate table
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:kableExtra':
## 
##     group_rows
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
plant_growth_summary <- D %>%
  group_by(Treatment) %>%
  summarize(Mean_Height = mean(Y), .sd = sd(Y))

# Create the table using kableExtra library(kableExtra)
kable(plant_growth_summary, caption = "Plant Growth Summary by Fertilizer")
Plant Growth Summary by Fertilizer
Treatment Mean_Height .sd
Fertilizer A 17.76264 0.947857
Fertilizer B 17.92907 2.053752
Fertilizer C 19.47337 2.251353

6.3 The Two-Way ANOVA model for RCBD design

The two-way ANOVA model is formulated as:

Y = μ + τ_i + β_j + ϵ

where:

  • μ is the grand or population mean.
  • τ_i is the i^th treatment effect (e.g., effect of fertilizer A, B, or C).
  • β_j is the j^th block effect (e.g., effect of pot 1, 2, 3, or 4).
  • ϵ is the random error term.

Assumptions of the Model:

  1. The observations are all independent.
  2. The effects are fixed and additive.
  3. The error term has a normal distribution with mean 0 and constant variance σ^2.

Similar to one-way ANOVA, an observation can be decomposed as:

Observed Value = Grand Mean + Block Effect + Treatment Effect + Residual

Example: Decomposing Plant Growth Data

Imagine we have a dataset where:

  • Y represents plant height after treatment.
  • Block refers to the pot where the plant was grown (e.g., Pot 1, Pot 2, etc.).
  • Treatment represents the type of fertilizer applied (e.g., Fertilizer A, B, or C).

We can calculate the effects:

  • Block effects: difference between block means (average plant height per pot) and the grand mean (average plant height across all pots).
  • Treatment effects: difference between treatment means (average plant height for each fertilizer) and the grand mean.

Sums of Squares and Degrees of Freedom

An important concept in ANOVA is the sum of squares (SS) for different components. We can show that:

∑(y - grand mean)^2 = ∑(block effect)^2 + ∑(treatment effect)^2 + ∑(residual)^2

where the summations are taken over all plants in the experiment.

The terms in the equation correspond to:

  • Total sum of squares (SST)
  • Block sum of squares (SSB)
  • Treatment sum of squares (SSTr)
  • Residual or error sum of squares (SSE)

Their degrees of freedom (df) are:

  • SST: n - 1 (total plants - 1)
  • SSB: b - 1 (number of blocks - 1)
  • SSTr: k - 1 (number of treatments - 1)
  • SSE: n - b - k + 1 (or (b - 1) * (k - 1) since n = b * k)

Key Point:

The total degrees of freedom equals the sum of the individual degrees of freedom:

(n - 1) = (b - 1) + (k - 1) + (n - b - k + 1)

R code for Two-Way ANOVA

The following R code demonstrates how to perform a two-way ANOVA on plant growth data:

D <- data.frame(
  Y = c(18, 20, 16, 19, 22, 25, 21, 23, 15, 19, 17, 18),  # Plant height
  Block = rep(c("Pot 1", "Pot 2", "Pot 3", "Pot 4"), 3),
  Treatment = rep(c("Fertilizer A", "Fertilizer B", "Fertilizer C"), each = 4)
)

model <- aov(Y ~ Block + Treatment, data = D)
summary(model)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Block        3  21.58    7.19    9.25 0.011440 *  
## Treatment    2  68.67   34.33   44.14 0.000258 ***
## Residuals    6   4.67    0.78                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Block Effects:

Block sum of squares: 21.58. This indicates the total variance in the response variable attributable to differences between blocks. Degrees of freedom (Df) for blocks: 3. There are 3 blocks in the experiment. F-value: 9.25. This compares the variance between blocks to the variance within blocks. Pr(>F): 0.01144. This is a statistically significant p-value (less than 0.05), signifying a significant effect of blocks on the response variable. Treatment Effects:

Treatment sum of squares: 68.67. This represents the total variance due to the different treatments. Df for treatments: 2. There are 2 treatments being compared. F-value: 44.14. This compares the variance between treatments to the unexplained variance within blocks. Pr(>F): 0.000258. This is a highly significant p-value (much less than 0.05), indicating a strong influence of treatments on the response variable. In conclusion:

The blocking design was effective, as evidenced by the significant block effect. This suggests that block-specific factors influenced the response variable. More importantly, there’s a statistically significant treatment effect. This implies that the different treatments have a substantial impact on the response variable, warranting further investigation into how each treatment affects the outcome.

6.4 Checking Assumptions for Two-Way ANOVA with Residual Plots

The two-way ANOVA model for RCBD designs relies on several assumptions, including normality of residuals, constant variance, and additivity of effects. We can use residual plots to visually assess these assumptions.

  1. Q-Q plot to check for normality
  2. Plot of residuals versus fitted values to check constant variances.
  3. Tukey plot for non-additivity. It shows whether a change of scale can lead to a better fit. The plot is constructed by first calculating the comparison values (Block effect times Treatment effect divided by Grand mean), plotting residuals versus comparison values, and adding a fitted line with 0 intercepts. If the estimated slope is not 0, do a power transformation over the response with power equal to 1 minus the slope.

Example 3: We will use a dataset with four subjects and three treatments, named “Low Dose”, “High Dose”, and “Control”. We will then generate a Q-Q plot, a plot of residuals versus fitted values, a tukey plot and a two-way anova model:

# Generate synthetic data for the two-way ANOVA model
set.seed(123)  # for reproducibility
Subject <- factor(rep(1:4, each = 3))
Treatment <- rep(c("Low Dose", "High Dose", "Control"), 4)
Y <- c(12, 28, 23, 58, 82, 69, 16, 35, 42, 8, 15, 30)  
D <- data.frame(Subject = Subject, Treatment = Treatment, Y = Y)

# Fit the two-way ANOVA model
model <- aov(Y ~ Subject + Treatment, data = D)
summary(model)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Subject      3   5142  1714.1  38.859 0.000252 ***
## Treatment    2    773   386.3   8.758 0.016609 *  
## Residuals    6    265    44.1                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Q-Q plot to check for normality
plot(model, 2)

# Residuals versus fitted values
plot(model, 1)

# Tukey additivity plot
block_effects <- tapply(D$Y, D$Subject, mean)
treatment_effects <- tapply(D$Y, D$Treatment, mean)
grand_mean <- mean(D$Y)
comparison_values <- outer(block_effects, treatment_effects, FUN = "*") / grand_mean
comparison_values <- as.vector(comparison_values)
fit <- lm(D$Y ~ comparison_values + 0)
residuals <- resid(fit)
slope <- coef(fit)[1]
plot(comparison_values, residuals, xlab = "Comparison Values", ylab = "Residuals",
     main = "Tukey Additivity Plot", pch = 16)
abline(a = 0, b = slope, col = "red")
legend("topright", legend = paste("Slope:", round(slope, 4)), col = "red", lwd = 1, bty = "n")

# Calculate suggested power transformation
power <- 1 - slope
power
## comparison_values 
##         0.2742876
  1. Normality Check: The Q-Q plot indicates that the normality assumption is satisfied as the points tend to follow the line.

  2. Constant Variance Check: The residuals versus fitted values plot shows a fairly even spread of points along the horizontal axis, suggesting that the variance of the residuals is approximately constant across all levels of the predictor variables.

  3. Tukey Additivity Plot: The slope of the fitted line in the Tukey additivity plot is approximately 0.7257. Therefore, the suggested power transformation is 1−0.7257 = 0.2743 which is not very close to 0.5, indicating that a different transformation may be appropriate.

# Fit a two-way ANOVA model
model <- aov(Y ~ Subject + Treatment, data = D)
summary(model)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Subject      3   5142  1714.1  38.859 0.000252 ***
## Treatment    2    773   386.3   8.758 0.016609 *  
## Residuals    6    265    44.1                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  1. Normality Check: There is a significant indication from the ANOVA table that the normality condition may be violated.
  2. Constant Variance Check: There is a significant indication from the ANOVA table that the variation might not be constant.

6.5 Using the Model for RCBD

We can test hypotheses

  • whether there is a significant difference among treatments (major interest)
  • whether there is a significant difference among blocks (less interest)
  • whether there is a significant difference between two treatments

We can also find a confidence interval

  • for a treatment mean
  • for the difference between two treatment means
library(lsmeans)
## Loading required package: emmeans
## The 'lsmeans' package is now basically a front end for 'emmeans'.
## Users are encouraged to switch the rest of the way.
## See help('transition') for more information, including how to
## convert old 'lsmeans' objects and scripts to work with 'emmeans'.
model <- aov(Y ~ Treatment + Subject, data = D)

# Compute least square means and confidence intervals
lsmeans_model <- lsmeans(model, "Treatment")
lsmeans_model
##  Treatment lsmean   SE df lower.CL upper.CL
##  Control     41.0 3.32  6     32.9     49.1
##  High Dose   40.0 3.32  6     31.9     48.1
##  Low Dose    23.5 3.32  6     15.4     31.6
## 
## Results are averaged over the levels of: Subject 
## Confidence level used: 0.95
# Calculate confidence intervals for pairwise comparisons
pairwise_CI <- pairs(lsmeans_model, adjust = "none", infer = TRUE)
pairwise_CI
##  contrast             estimate  SE df lower.CL upper.CL t.ratio p.value
##  Control - High Dose       1.0 4.7  6   -10.49     12.5   0.213  0.8384
##  Control - Low Dose       17.5 4.7  6     6.01     29.0   3.726  0.0098
##  High Dose - Low Dose     16.5 4.7  6     5.01     28.0   3.513  0.0126
## 
## Results are averaged over the levels of: Subject 
## Confidence level used: 0.95

The result shows that

  • The 95% confidence interval for Control group is 32.9 to 49.1.
  • The 95% confidence interval for the mean difference between Control group and High dosage group is -10.49 to 12.5.
  • The p-value for testing whether there is mean difference between Control group and High dosage group is 0.8384, indicating no significant difference at level 5%.
  • The p-value for testing whether there is mean difference between Control group and low dosage group is 0.0098, indicating a significant difference at any commonly used significance level.

6.6 Observational Relatives of the RCBD

Fitting a two way anova model for the data from RiverIron gives us:

# Fit a two-way ANOVA model
riveriron <- read.csv("RiverIron.csv")
model <- aov(Iron ~ River + Site, data = riveriron)
summary(model)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## River        3 542458  180819   7.041 0.0216 *
## Site         2 442395  221198   8.613 0.0172 *
## Residuals    6 154083   25680                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1