library(here)
library(tidyverse)
library(glmmTMB)
library(ggeffects)
library(arm)
library(car)
library(ggplot2)
library(tidyverse)
Imagine that you want to test how a warming climate affects primary production in grasslands. You will perform an in situ warming experiment, in which you use ceramic heaters to warm 2 x 2 meter experimental plots by 4oC, continuously for two years. Control plots are unwarmed, and the treatment and control plots will be randomly assigned. Biomass production is measured at the plot scale using appropriate methods.
(1) You have limited time and funds, which means that the number of replicate treatment and control plots is limited. Before designing and performing the study, let’s consider how the number of replicates will affect our statistical power: our ability to detect an effect of warming, if such an effect exists.
Based on previous studies in similar environments, we can get a sense for what the effect size might be: in other words, the true magnitude of the warming effect. We can also get a sense for how variable productivity is between plots in the same treatment, for reasons outside of our control (i.e., the residual noise). Assume that the true treatment mean is 460, and the true control mean is 415 (the numbers are aboveground net primary production (ANPP) – grams per square meter per year). Assume that the standard deviation of the variation between plots is 110. These numbers are based on real data synthesized in the attached paper by Lemoine et al.
Start by assuming three replicates each of the treatment and control plots. Perform 1000 simulations, where each time you draw three treatment ANPP values and three control ANPP values, using the appropriate means and standard deviation, from a normal distribution. For each random draw, fit a linear model testing for a treatment effect. Save the p-value from an F-test for the treatment effect (you can extract this with Anova(model)$P[1]). Also, save the model coefficient that quantifies the difference between the treatment and control groups. At the end you should have 1000 p-values and 1000 coefficient values. What proportion of the p-values are less than 0.05? This is your statistical power — the probability that you will detect a ‘significant’ effect — under this design, effect size, and residual noise.
#parameters
treatmentmean = 460
controlmean = 415
standarddev = 110
nsims = 1000
rep = 3
#vectors to store coefficients
p.value.saved = numeric(nsims)
coefficients = numeric(nsims)
#fit a model
for (i in 1:nsims) {
#generate a predictor and response variable
ANPPcontrol = rnorm(rep, mean = controlmean, sd = standarddev) #control plot
treatment = rnorm(rep, mean = treatmentmean, sd = standarddev) #treatment (warm)
#dataframe for predictor and response variable
df = data.frame(
ANPP = c(ANPPcontrol, treatment), #values for ANPP control, then treatment
treatment = factor(rep(c("control", "treated"), each = rep)) #descriptor of treatment, repeat treated and control 3 times
)
#fit a model with normal distribution
model = lm(ANPP ~ treatment, data = df) #treatment (warmed plots) is a predictor for control ANPP
#save values
p.value.saved[i] = Anova(model)$P[1]
coefficients[i] = coef(model)[2] #treatment coefficients, the slope
}
savedcoef <- data.frame(p_value = p.value.saved, coefficient = coefficients)
str(savedcoef)
## 'data.frame': 1000 obs. of 2 variables:
## $ p_value : num 0.0665 0.47 0.5503 0.3142 0.2812 ...
## $ coefficient: num -87.1 46.8 -64.8 141.4 62.7 ...
#What proportion of the p-values are less than 0.05? This is your statistical power, under this design, effect size, and residual noise.
pval.prop <- sum(p.value.saved < 0.05)/nsims
print(paste("Proportion of times p-val less than .05:", pval.prop))
## [1] "Proportion of times p-val less than .05: 0.065"
Now repeat this whole process using different numbers of replicates: 5, 10, 20, 50, 100. Plot the statistical power vs. the number of replicates. How much replication do you need to achieve a power of 0.25? This means that when there is a real treatment effect, you will detect it (only) 25% of the time.
#new number of replicates
reps = c(3, 5, 10, 20, 50, 100)
#data frame to store power
power_data = data.frame(replicates = integer(), power = numeric())
for (rep in reps) {
#vectors to store
p.value.saved = numeric(nsims)
coefficients = numeric(nsims)
for (i in 1:nsims) {
#generate a predictor and response variable
ANPPcontrol = rnorm(rep, mean = controlmean, sd = standarddev) #control plot
treatment = rnorm(rep, mean = treatmentmean, sd = standarddev) #treatment (warm)
#dataframe for predictor and response variable
df = data.frame(
ANPP = c(ANPPcontrol, treatment), #values for ANPP control, then treatment
treatment = factor(rep(c("control", "treated"), each = rep)) #descriptor of treatment, repeat treated and control 3 times
)
#fit a model with normal distribution
model = lm(ANPP ~ treatment, data = df) #treatment (warmed plots) is a predictor for control ANPP
#save values
p.value.saved[i] = Anova(model)$P[1]
coefficients[i] = coef(model)[2] #treatment coefficients, the slope
}
power = mean(p.value.saved < 0.05)
power_data = rbind(power_data, data.frame(replicates = rep, power = power))
}
str(power_data)
## 'data.frame': 6 obs. of 2 variables:
## $ replicates: num 3 5 10 20 50 100
## $ power : num 0.069 0.091 0.15 0.274 0.511 0.829
To achieve a power of 0.25, the simulation requires slightly above 20 replicates (given that the power is 0.228).
ggplot(power_data, aes(x = replicates, y = power)) +
geom_line() +
geom_point() +
geom_hline(yintercept = 0.25, color = "red", linetype = "dashed", size = 0.5) +
scale_x_continuous(limits= c(0,100), breaks= seq(0, 100, by = 10)) +
labs(title = "Statistical power vs number of replicates",
x = "Number of Replicates",
y = "Power")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
(2) Now let’s use the simulation results to answer a different question. For situations where statistical power is low, a treatment effect can only be ‘significant’ if it is quite large, and this may cause the treatment effect to be exaggerated by chance. This has been termed Type M error, where the M stands for magnitude, because you are making an error about the magnitude of the effect (see the attached paper).
For each of the simulations you performed above, you saved the model coefficients. So you should have 1000 coefficients for each level of replication (3, 5, 10, 20, 50, 100). Take those coefficients, and for each level of replication calculate the mean of the coefficients using only models where p < 0.05. This is simulating the following process: if you perform the experiment, and p > 0.05, you report ‘no effect’, but if p < 0.05 you report ‘significant effect’. We want to know if the reported significant effects are biased.
How does the mean of the significant coefficients change as the number of replicates increases? Recall that because this is a simulation, we know the true value of the treatment difference: it’s 461 - 415 = 46. How much larger is the simulated value from the significant experiments, compared to the true value? This is the type M error. What are the potential implications for our understanding of climate change, if most warming experiments have low power?
#
mean_coefficients <- data.frame(replicates = integer(), mean_coef = numeric())
for (rep in reps) {
#vectors to store
p.value.saved = numeric(nsims)
coefficients = numeric(nsims)
for (i in 1:nsims) {
#generate a predictor and response variable
ANPPcontrol = rnorm(rep, mean = controlmean, sd = standarddev) #control plot
treatment = rnorm(rep, mean = treatmentmean, sd = standarddev) #treatment (warm)
#dataframe for predictor and response variable
df = data.frame(
ANPP = c(ANPPcontrol, treatment), #values for ANPP control, then treatment
treatment = factor(rep(c("control", "treated"), each = rep)) #descriptor of treatment, repeat treated and control 3 times
)
#fit a model with normal distribution
model = lm(ANPP ~ treatment, data = df) #treatment (warmed plots) is a predictor for control ANPP
#save values
p.value.saved[i] = Anova(model)$P[1]
coefficients[i] = coef(model)[2] #treatment coefficients, the slope
}
significant_coefficients = coefficients[p.value.saved < 0.05]
mean_coef = ifelse(length(significant_coefficients) > 0, mean(significant_coefficients), NA)
mean_coefficients = rbind(mean_coefficients, data.frame(Replicates = rep,
Mean_Coefficients = mean_coef))
}
print(mean_coefficients)
## Replicates Mean_Coefficients
## 1 3 143.69524
## 2 5 138.06749
## 3 10 109.56193
## 4 20 88.92773
## 5 50 60.89845
## 6 100 50.39317
The mean of the signifcant coefficients decreases as the number of replicates increases. The simulated value is higher than the true value, which is a type M error. The simulated values are much higher than the actual value in the lower replicate sizes.The potential implications for understanding climate change if warming experiments have lower power means that these could be incorrect conclusions.