OCN 683 - Homework 8

Sarah Pennington

Imagine that you want to test how a warming climate affects primary production in grasslands. You will perform an in situ warming experiment, in which you use ceramic heaters to warm 2 x 2 meter experimental plots by 4ºC, continuously for two years. Control plots are unwarmed, and the treatment and control plots will be randomly assigned. Biomass production is measured at the plot scale using appropriate methods.
(1) You have limited time and money, which means that the number of replicate treatment and control plots is limited. Before designing and performing the study, let’s consider how the number of replicates will affect our statistical power: our ability to detect an effect of warming.
Assume that the true treatment mean is 460, and the true control mean is 415 (the numbers are aboveground net primary production (ANPP) – grams per square meter per year). Assume that the standard deviation of the variation between plots is 110.
#Start by assuming three replicates each of the treatment and control plots. Perform 1000 simulations, where each time you draw three treatment ANPP values and three control ANPP values, using the appropriate means and standard deviation, from a normal distribution.For each random draw, fit a linear model testing for a treatment effect. Save the p-value from an F-test for the treatment effect (you can extract this with Anova(model)$P[1]). Also save the model coefficient that quantifies the difference between the treatment and control groups. At the end you should have 1000 p-values and 1000 coefficient values. 

treat_m <- 460 
control_m <- 415 
sd <- 110 
rep <- 3 
sim <- 1000 

p_values <- numeric(sim)
coefficients <- numeric(sim)

for (i in 1:sim) {
  treatment_data <- rnorm(rep, mean = treat_m, sd = sd)
  control_data <- rnorm(rep, mean = control_m, sd = sd)
  data <- data.frame(
    ANPP = c(treatment_data, control_data),
    group = factor(rep(c("treat", "control"), each = rep))
  )
  model <- lm(ANPP ~ group, data = data)
  p_values[i] <- Anova(model)$P[1]  
  coefficients[i] <- coef(model)[2] 
}

p.coef <- data.frame(p_value = p_values, coefficient = coefficients)
str(p.coef)
## 'data.frame':    1000 obs. of  2 variables:
##  $ p_value    : num  0.366 0.459 0.73 0.295 0.354 ...
##  $ coefficient: num  -106 102.1 34.3 68.9 41.9 ...
#What proportion of the p-values are less than 0.05? This is your statistical power, under this design, effect size, and residual noise. 

count <- sum(p_values < 0.05)
prop <- count/ 1000 

print(prop)
## [1] 0.07
Now repeat this whole process using different numbers of replicates: 5, 10, 20, 50, 100. Plot the statistical power vs. the number of replicates. How much replication do you need to achieve a power of 0.25? This means that when there is a real treatment effect, you will detect it (only) 25% of the time.
reps <- c(3,5, 10, 20, 50, 100)
power_data <- data.frame(replicates = integer(), power = numeric())

for (rep in reps) {
  p_values <- numeric(sim)
  coefficients <- numeric(sim)
  for (i in 1:sim) {
    treatment_data <- rnorm(rep, mean = treat_m, sd = sd)
    control_data <- rnorm(rep, mean = control_m, sd = sd)
    data <- data.frame(
      ANPP = c(treatment_data, control_data),
      group = factor(rep(c("treat", "control"), each = rep))
    )
    model <- lm(ANPP ~ group, data = data)
    p_values[i] <- Anova(model)$P[1]  
    coefficients[i] <- coef(model)[2]
  }
  power <- mean(p_values < 0.05)
  power_data <- rbind(power_data, data.frame(replicates = rep, power = power))
  
}

str(power_data)
## 'data.frame':    6 obs. of  2 variables:
##  $ replicates: num  3 5 10 20 50 100
##  $ power     : num  0.084 0.093 0.116 0.239 0.529 0.819
#Make a plot 

ggplot(power_data, aes(x = replicates, y = power)) +
  geom_line() + 
  geom_point() + scale_x_continuous(limits= c(0,100), breaks= seq(0, 100, by = 10)) + 
  theme_clean() 

# How much replication do you need to achieve a power of 0.25? This means that when there is a real treatment effect, you will detect it (only) 25% of the time.  
  ###You would need about 20 reps to detect a p-value < 0.5, 25% of the time. 
(2) Now let’s use the simulation results to answer a different question. For situations where statistical power is low, a treatment effect can only be ‘significant’ if it is quite large, and this may cause the treatment effect to be exaggerated by chance. This has been termed Type M error, where the M stands for magnitude, because you are making an error about the magnitude of the effect (see the attached paper).
#For each of the simulations you performed above, you saved the model coefficients. So you should have 1000 coefficients for each level of replication (3, 5, 10, 20, 50, 100). Take those coefficients, and for each level of replication calculate the mean of the coefficients using only models where p < 0.05. This is simulating the following process: if you perform the experiment, and p > 0.05, you report 'no effect’, but if p < 0.05 you report ‘significant effect’. We want to know if the reported significant effects are biased. 

mean_coefficients <- data.frame(replicates = integer(), mean_coef = numeric())
for (rep in reps) {
  p_values <- numeric(sim)
  coefficients <- numeric(sim)
  for (i in 1:sim) {
    treatment_data <- rnorm(rep, mean = treat_m, sd = sd)
    control_data <- rnorm(rep, mean = control_m, sd = sd)
    data <- data.frame(
      ANPP = c(treatment_data, control_data),
      group = factor(rep(c("treat", "control"), each = rep))
    )
    model <- lm(ANPP ~ group, data = data)
    p_values[i] <- Anova(model)$P[1]  
    coefficients[i] <- coef(model)[2] 
  }
  significant_coefficients <- coefficients[p_values < 0.05]
  mean_coef <- ifelse(length(significant_coefficients) > 0, mean(significant_coefficients), NA)
  mean_coefficients <- rbind(mean_coefficients, data.frame(replicates = rep, mean_coef = mean_coef))
}

print(mean_coefficients)
##   replicates mean_coef
## 1          3 124.88468
## 2          5 144.01812
## 3         10 112.83963
## 4         20  87.96205
## 5         50  61.70067
## 6        100  49.82677
#How does the mean of the significant coefficients change as the number of replicates increases? Recall that because this is a simulation, we know the true value of the treatment difference: it’s 461 - 415 = 46. How much larger is the simulated value from the significant experiments, compared to the true value? This is the type M error. What are the potential implications for our understanding of climate change, if most warming experiments have low power?

  ###The Coefficient mean decreases as the number of replicates increases. The simulated values are much higher than the actual value in the lower replicate sizes. The effect sizes are too long for the lower reps, and the ability to detect the actual effect (power) is lower. For climate change studies, that means we could be drawing the wrong conclusions with greater perceived certainty.