Sarah Pennington
Imagine that you want to test how a warming climate affects primary
production in grasslands. You will perform an in situ warming
experiment, in which you use ceramic heaters to warm 2 x 2 meter
experimental plots by 4ºC, continuously for two years. Control plots are
unwarmed, and the treatment and control plots will be randomly assigned.
Biomass production is measured at the plot scale using appropriate
methods.
(1) You have limited time and money, which means that the number of
replicate treatment and control plots is limited. Before designing and
performing the study, let’s consider how the number of replicates will
affect our statistical power: our ability to detect an effect of
warming.
Assume that the true treatment mean is 460, and the true control
mean is 415 (the numbers are aboveground net primary production (ANPP) –
grams per square meter per year). Assume that the standard deviation of
the variation between plots is 110.
#Start by assuming three replicates each of the treatment and control plots. Perform 1000 simulations, where each time you draw three treatment ANPP values and three control ANPP values, using the appropriate means and standard deviation, from a normal distribution.For each random draw, fit a linear model testing for a treatment effect. Save the p-value from an F-test for the treatment effect (you can extract this with Anova(model)$P[1]). Also save the model coefficient that quantifies the difference between the treatment and control groups. At the end you should have 1000 p-values and 1000 coefficient values.
treat_m <- 460
control_m <- 415
sd <- 110
rep <- 3
sim <- 1000
p_values <- numeric(sim)
coefficients <- numeric(sim)
for (i in 1:sim) {
treatment_data <- rnorm(rep, mean = treat_m, sd = sd)
control_data <- rnorm(rep, mean = control_m, sd = sd)
data <- data.frame(
ANPP = c(treatment_data, control_data),
group = factor(rep(c("treat", "control"), each = rep))
)
model <- lm(ANPP ~ group, data = data)
p_values[i] <- Anova(model)$P[1]
coefficients[i] <- coef(model)[2]
}
p.coef <- data.frame(p_value = p_values, coefficient = coefficients)
str(p.coef)
## 'data.frame': 1000 obs. of 2 variables:
## $ p_value : num 0.366 0.459 0.73 0.295 0.354 ...
## $ coefficient: num -106 102.1 34.3 68.9 41.9 ...
#What proportion of the p-values are less than 0.05? This is your statistical power, under this design, effect size, and residual noise.
count <- sum(p_values < 0.05)
prop <- count/ 1000
print(prop)
## [1] 0.07
Now repeat this whole process using different numbers of replicates:
5, 10, 20, 50, 100. Plot the statistical power vs. the number of
replicates. How much replication do you need to achieve a power of 0.25?
This means that when there is a real treatment effect, you will detect
it (only) 25% of the time.
reps <- c(3,5, 10, 20, 50, 100)
power_data <- data.frame(replicates = integer(), power = numeric())
for (rep in reps) {
p_values <- numeric(sim)
coefficients <- numeric(sim)
for (i in 1:sim) {
treatment_data <- rnorm(rep, mean = treat_m, sd = sd)
control_data <- rnorm(rep, mean = control_m, sd = sd)
data <- data.frame(
ANPP = c(treatment_data, control_data),
group = factor(rep(c("treat", "control"), each = rep))
)
model <- lm(ANPP ~ group, data = data)
p_values[i] <- Anova(model)$P[1]
coefficients[i] <- coef(model)[2]
}
power <- mean(p_values < 0.05)
power_data <- rbind(power_data, data.frame(replicates = rep, power = power))
}
str(power_data)
## 'data.frame': 6 obs. of 2 variables:
## $ replicates: num 3 5 10 20 50 100
## $ power : num 0.084 0.093 0.116 0.239 0.529 0.819
#Make a plot
ggplot(power_data, aes(x = replicates, y = power)) +
geom_line() +
geom_point() + scale_x_continuous(limits= c(0,100), breaks= seq(0, 100, by = 10)) +
theme_clean()

# How much replication do you need to achieve a power of 0.25? This means that when there is a real treatment effect, you will detect it (only) 25% of the time.
###You would need about 20 reps to detect a p-value < 0.5, 25% of the time.
(2) Now let’s use the simulation results to answer a different
question. For situations where statistical power is low, a treatment
effect can only be ‘significant’ if it is quite large, and this may
cause the treatment effect to be exaggerated by chance. This has been
termed Type M error, where the M stands for magnitude, because you are
making an error about the magnitude of the effect (see the attached
paper).
#For each of the simulations you performed above, you saved the model coefficients. So you should have 1000 coefficients for each level of replication (3, 5, 10, 20, 50, 100). Take those coefficients, and for each level of replication calculate the mean of the coefficients using only models where p < 0.05. This is simulating the following process: if you perform the experiment, and p > 0.05, you report 'no effect’, but if p < 0.05 you report ‘significant effect’. We want to know if the reported significant effects are biased.
mean_coefficients <- data.frame(replicates = integer(), mean_coef = numeric())
for (rep in reps) {
p_values <- numeric(sim)
coefficients <- numeric(sim)
for (i in 1:sim) {
treatment_data <- rnorm(rep, mean = treat_m, sd = sd)
control_data <- rnorm(rep, mean = control_m, sd = sd)
data <- data.frame(
ANPP = c(treatment_data, control_data),
group = factor(rep(c("treat", "control"), each = rep))
)
model <- lm(ANPP ~ group, data = data)
p_values[i] <- Anova(model)$P[1]
coefficients[i] <- coef(model)[2]
}
significant_coefficients <- coefficients[p_values < 0.05]
mean_coef <- ifelse(length(significant_coefficients) > 0, mean(significant_coefficients), NA)
mean_coefficients <- rbind(mean_coefficients, data.frame(replicates = rep, mean_coef = mean_coef))
}
print(mean_coefficients)
## replicates mean_coef
## 1 3 124.88468
## 2 5 144.01812
## 3 10 112.83963
## 4 20 87.96205
## 5 50 61.70067
## 6 100 49.82677
#How does the mean of the significant coefficients change as the number of replicates increases? Recall that because this is a simulation, we know the true value of the treatment difference: it’s 461 - 415 = 46. How much larger is the simulated value from the significant experiments, compared to the true value? This is the type M error. What are the potential implications for our understanding of climate change, if most warming experiments have low power?
###The Coefficient mean decreases as the number of replicates increases. The simulated values are much higher than the actual value in the lower replicate sizes. The effect sizes are too long for the lower reps, and the ability to detect the actual effect (power) is lower. For climate change studies, that means we could be drawing the wrong conclusions with greater perceived certainty.