Blocking and Two-Way ANOVA are statistical techniques used to analyze experimental data, particularly when there are multiple factors affecting the outcome. Let’s introduce each of them:
Purpose: To reduce variability and increase precision in experimental studies.
Implementation: Experimental units are grouped into blocks based on a factor external to the main experimental treatment.
Advantages: Reduces the influence of nuisance variables, increases the accuracy of treatment comparisons, and enhances the ability to detect treatment effects.
The Randomized Complete Block (RCB) Design is a common experimental design used in various research fields to reduce variability and increase the precision of treatment comparisons. It is particularly useful when there are known sources of variability that may confound the treatment effects.
The RCB Design can be implemented in various ways depending on the experimental setup and research goals. Here are three common strategies for implementing RCB designs:
Each strategy results in a data table with n rows and
k columns, where n is the number of blocks and
k is the number of treatments. The total number of
observations equals n * k.
Here’s an example data table for a scenario with n=4
blocks and k=2 treatments:
n <- 4 # Number of blocks
k <- 2 # Number of treatments
set.seed(123) # Set a seed for reproducibility
data <- matrix(paste0("y", round(runif(n*k, 10, 20), 2)), nrow = n, ncol = k)
colnames(data) <- paste0("Treatment", 1:k)
rownames(data) <- paste0("Block", 1:n)
# Create the table using kableExtra
library(kableExtra)
kable(data, caption = "Block design")
| Treatment1 | Treatment2 | |
|---|---|---|
| Block1 | y12.88 | y19.4 |
| Block2 | y17.88 | y10.46 |
| Block3 | y14.09 | y15.28 |
| Block4 | y18.83 | y18.92 |
What graphical insights can we glean from the data before model fitting? Constructing a plot for Randomized Complete Block Design (RCBD) data involves creating side-by-side dot plots for each treatment group and connecting observations from the same block.
Key Points Revealed by the Plot:
Our aim is to discern differences in treatment effects. To achieve this, it’s crucial to eliminate block effects and derive adjusted responses before comparing treatments.
Steps to Calculate Adjusted Values:
Plotting the adjusted values against treatments offers a clearer picture of each treatment’s effect, as the influence of blocks has been mitigated.
Example 2: Plant Growth Experiment
Let’s consider an experiment where we want to investigate the effects of three different fertilizers (Fertilizer A, B, and C) on plant growth. We have four pots available, and each pot serves as a block. Each fertilizer is randomly assigned to a pot, and the plant height is measured after a specific period. The data is presented in a wide format below:
library(ggplot2)
# Simulate some data
set.seed(123)
D <- data.frame(
Pot = factor(rep(1:4)),
Treatment = rep(c("Fertilizer A", "Fertilizer B", "Fertilizer C"), 4),
Y = c(rnorm(12, mean = 18, sd = 2))
)
D$Treatment <- factor(D$Treatment, levels = c("Fertilizer A", "Fertilizer B", "Fertilizer C"))
# Calculate block means (pot means in this case)
block_means <- aggregate(Y ~ Pot, data = D, FUN = mean)
# Calculate grand mean
grand_mean <- mean(D$Y)
# Calculate block effects
block_effects <- block_means$Y - grand_mean
# Adjusted data
D$Adjusted_Y <- D$Y - block_effects[match(D$Pot, block_means$Pot)]
# Original data plot
with(D, interaction.plot(Treatment, Pot, Y, xlab = "Treatment", ylab = "Height", col = 1:4))
# Block effect data plot
ggplot(D, aes(x = Treatment, y = Adjusted_Y)) +
geom_point() +
geom_rect(aes(xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf), fill = "transparent", color = "black") +
labs(title = "Block Effect Data",
x = "Fertilizer",
y = "Plant Height") +
theme(panel.background = element_rect(fill = "white"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
# Generate table
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:kableExtra':
##
## group_rows
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
plant_growth_summary <- D %>%
group_by(Treatment) %>%
summarize(Mean_Height = mean(Y), .sd = sd(Y))
# Create the table using kableExtra library(kableExtra)
kable(plant_growth_summary, caption = "Plant Growth Summary by Fertilizer")
| Treatment | Mean_Height | .sd |
|---|---|---|
| Fertilizer A | 17.76264 | 0.947857 |
| Fertilizer B | 17.92907 | 2.053752 |
| Fertilizer C | 19.47337 | 2.251353 |
The two-way ANOVA model is formulated as:
Y = μ + τ_i + β_j + ϵ
where:
Assumptions of the Model:
Similar to one-way ANOVA, an observation can be decomposed as:
Observed Value = Grand Mean + Block Effect + Treatment Effect + Residual
Example: Decomposing Plant Growth Data
Imagine we have a dataset where:
We can calculate the effects:
Sums of Squares and Degrees of Freedom
An important concept in ANOVA is the sum of squares (SS) for different components. We can show that:
∑(y - grand mean)^2 = ∑(block effect)^2 + ∑(treatment effect)^2 + ∑(residual)^2
where the summations are taken over all plants in the experiment.
The terms in the equation correspond to:
Their degrees of freedom (df) are:
Key Point:
The total degrees of freedom equals the sum of the individual degrees of freedom:
(n - 1) = (b - 1) + (k - 1) + (n - b - k + 1)
R code for Two-Way ANOVA
The following R code demonstrates how to perform a two-way ANOVA on plant growth data:
D <- data.frame(
Y = c(18, 20, 16, 19, 22, 25, 21, 23, 15, 19, 17, 18), # Plant height
Block = rep(c("Pot 1", "Pot 2", "Pot 3", "Pot 4"), 3),
Treatment = rep(c("Fertilizer A", "Fertilizer B", "Fertilizer C"), each = 4)
)
model <- aov(Y ~ Block + Treatment, data = D)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## Block 3 21.58 7.19 9.25 0.011440 *
## Treatment 2 68.67 34.33 44.14 0.000258 ***
## Residuals 6 4.67 0.78
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Block Effects:
Block sum of squares: 21.58. This indicates the total variance in the response variable attributable to differences between blocks. Degrees of freedom (Df) for blocks: 3. There are 3 blocks in the experiment. F-value: 9.25. This compares the variance between blocks to the variance within blocks. Pr(>F): 0.01144. This is a statistically significant p-value (less than 0.05), signifying a significant effect of blocks on the response variable. Treatment Effects:
Treatment sum of squares: 68.67. This represents the total variance due to the different treatments. Df for treatments: 2. There are 2 treatments being compared. F-value: 44.14. This compares the variance between treatments to the unexplained variance within blocks. Pr(>F): 0.000258. This is a highly significant p-value (much less than 0.05), indicating a strong influence of treatments on the response variable. In conclusion:
The blocking design was effective, as evidenced by the significant block effect. This suggests that block-specific factors influenced the response variable. More importantly, there’s a statistically significant treatment effect. This implies that the different treatments have a substantial impact on the response variable, warranting further investigation into how each treatment affects the outcome.
The two-way ANOVA model for RCBD designs relies on several assumptions, including normality of residuals, constant variance, and additivity of effects. We can use residual plots to visually assess these assumptions.
Example 3: We will use a dataset with four subjects and three treatments, named “Low Dose”, “High Dose”, and “Control”. We will then generate a Q-Q plot, a plot of residuals versus fitted values, a tukey plot and a two-way anova model:
# Generate synthetic data for the two-way ANOVA model
set.seed(123) # for reproducibility
Subject <- factor(rep(1:4, each = 3))
Treatment <- rep(c("Low Dose", "High Dose", "Control"), 4)
Y <- c(12, 28, 23, 58, 82, 69, 16, 35, 42, 8, 15, 30)
D <- data.frame(Subject = Subject, Treatment = Treatment, Y = Y)
# Fit the two-way ANOVA model
model <- aov(Y ~ Subject + Treatment, data = D)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## Subject 3 5142 1714.1 38.859 0.000252 ***
## Treatment 2 773 386.3 8.758 0.016609 *
## Residuals 6 265 44.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Q-Q plot to check for normality
plot(model, 2)
# Residuals versus fitted values
plot(model, 1)
# Tukey additivity plot
block_effects <- tapply(D$Y, D$Subject, mean)
treatment_effects <- tapply(D$Y, D$Treatment, mean)
grand_mean <- mean(D$Y)
comparison_values <- outer(block_effects, treatment_effects, FUN = "*") / grand_mean
comparison_values <- as.vector(comparison_values)
fit <- lm(D$Y ~ comparison_values + 0)
residuals <- resid(fit)
slope <- coef(fit)[1]
plot(comparison_values, residuals, xlab = "Comparison Values", ylab = "Residuals",
main = "Tukey Additivity Plot", pch = 16)
abline(a = 0, b = slope, col = "red")
legend("topright", legend = paste("Slope:", round(slope, 4)), col = "red", lwd = 1, bty = "n")
# Calculate suggested power transformation
power <- 1 - slope
power
## comparison_values
## 0.2742876
Normality Check: The Q-Q plot indicates that the normality assumption is satisfied as the points tend to follow the line.
Constant Variance Check: The residuals versus fitted values plot shows a fairly even spread of points along the horizontal axis, suggesting that the variance of the residuals is approximately constant across all levels of the predictor variables.
Tukey Additivity Plot: The slope of the fitted line in the Tukey additivity plot is approximately 0.7257. Therefore, the suggested power transformation is 1−0.7257 = 0.2743 which is not very close to 0.5, indicating that a different transformation may be appropriate.
# Fit a two-way ANOVA model
model <- aov(Y ~ Subject + Treatment, data = D)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## Subject 3 5142 1714.1 38.859 0.000252 ***
## Treatment 2 773 386.3 8.758 0.016609 *
## Residuals 6 265 44.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can test hypotheses
We can also find a confidence interval
library(lsmeans)
## Loading required package: emmeans
## The 'lsmeans' package is now basically a front end for 'emmeans'.
## Users are encouraged to switch the rest of the way.
## See help('transition') for more information, including how to
## convert old 'lsmeans' objects and scripts to work with 'emmeans'.
model <- aov(Y ~ Treatment + Subject, data = D)
# Compute least square means and confidence intervals
lsmeans_model <- lsmeans(model, "Treatment")
lsmeans_model
## Treatment lsmean SE df lower.CL upper.CL
## Control 41.0 3.32 6 32.9 49.1
## High Dose 40.0 3.32 6 31.9 48.1
## Low Dose 23.5 3.32 6 15.4 31.6
##
## Results are averaged over the levels of: Subject
## Confidence level used: 0.95
# Calculate confidence intervals for pairwise comparisons
pairwise_CI <- pairs(lsmeans_model, adjust = "none", infer = TRUE)
pairwise_CI
## contrast estimate SE df lower.CL upper.CL t.ratio p.value
## Control - High Dose 1.0 4.7 6 -10.49 12.5 0.213 0.8384
## Control - Low Dose 17.5 4.7 6 6.01 29.0 3.726 0.0098
## High Dose - Low Dose 16.5 4.7 6 5.01 28.0 3.513 0.0126
##
## Results are averaged over the levels of: Subject
## Confidence level used: 0.95
The result shows that
Fitting a two way anova model for the data from RiverIron gives us:
# Fit a two-way ANOVA model
riveriron <- read.csv("RiverIron.csv")
model <- aov(Iron ~ River + Site, data = riveriron)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## River 3 542458 180819 7.041 0.0216 *
## Site 2 442395 221198 8.613 0.0172 *
## Residuals 6 154083 25680
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1