In the article entitled “Power failure: why small sample size undermines the reliability of neuroscience,” Button et al. explain their concerns with low statistical power. Statistical power is the probability that a test detects a true effect, so it tells investigators how likely a comparison between two groups is to yield a false negative. Calculating it involves taking a summary measure for each group (such as the mean) and quantitatively comparing those measures to obtain the effect size. When the rate of false negatives is high, statistical power is low; this can result from a small sample size, a small effect size, or a combination of the two. Studies with low statistical power have been shown to produce more false negatives than those with high power, which lowers the discovery rate because true effects often fail to meet the threshold for statistical significance (usually p < 0.05). Low-powered studies are also adequate only when there is a large difference between the groups, so the effects they do report tend to be inflated. The authors document instances of these problems in the field of neuroscience and show how publication bias can compound them. Using meta-analysis, which is becoming increasingly popular, they highlight that neuroimaging studies usually rely on small sample sizes, and that these sample sizes yield egregiously low statistical power (usually less than 50%). In light of this, there is a high chance of inflated results, which raises the question of whether these studies are really ethical: why perform experiments on animals, or even humans, if the power is not strong enough to detect more subtle changes?
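To make this dependence concrete, power for a two-sample t-test can be computed in closed form with R’s built-in power.t.test(); the standardized effect size of 0.5 below is an arbitrary value chosen for illustration, not a figure from the article.

# Illustration (not from the article): power rises with sample size
# for a fixed, assumed standardized effect of delta/sd = 0.5
power.t.test(n = 15,  delta = 0.5, sd = 1, sig.level = 0.05)$power
power.t.test(n = 100, delta = 0.5, sd = 1, sig.level = 0.05)$power
# Per-group n needed to reach the conventional 80% power
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)$n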
Figure 1 shows the power curve for a 90% reduction in relative risk (from a 25% risk to a 2.5% risk) with an \(\alpha\) of 0.05. By examining the figure, one can see that a sample size of 13 is required in both the control and the experimental birth control group to achieve 80% power; therefore, 26 subjects are required in total. To achieve 95% power, 26 subjects are required in each group, which totals 52 subjects.
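As a rough analytic cross-check (an addition, not part of the original analysis), R’s power.prop.test() can solve for the per-group n directly; a one-sided alternative matches the lower-tail critical value used in the simulation code below, and the result should land near the simulated n = 13 per group, though the two approaches need not agree exactly.

# Closed-form check of the simulated curve: per-group n for a drop
# from 25% to 2.5% risk (one-sided, alpha = 0.05); expect a value
# in the neighborhood of the simulated n = 13
power.prop.test(p1 = 0.25, p2 = 0.025, sig.level = 0.05,
                power = 0.80, alternative = "one.sided")
power.prop.test(p1 = 0.25, p2 = 0.025, sig.level = 0.05,
                power = 0.95, alternative = "one.sided")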
The power curve generated for an effect size of 50 and an \(\alpha\) of 0.05 is depicted above. According to Figure 2, achieving 80% power requires a control and an experimental group of 10 raptors each, totalling 20 animals; achieving 95% power requires 18 animals in each group, totalling 36 subjects.
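For comparison only, a parametric approximation to Figure 2 can be obtained with power.t.test(), plugging in the standard deviation of the raptor measurements; because the bootstrap below resamples the empirical distribution with a one-sided 5% cut-off, its sample sizes need not match this approximation closely.

# Parametric cross-check (assumed, not the original method): one-sided
# two-sample t-test with the sample SD of the raptor data standing in
# for the resampling scheme
X <- c(250, 276, 277, 278, 301, 311, 331, 336, 338, 340)  # same data as below
power.t.test(delta = 50, sd = sd(X), sig.level = 0.05,
             power = 0.80, alternative = "one.sided")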
# Simulate the power of a one-sided relative-risk comparison for a grid
# of per-group sample sizes, using Monte Carlo draws from Bernoulli trials
power.n.fun <- function(n.sizes, risk.old, risk.new, deals = 10000) {
  RR.b  <- rep(NA, deals)
  power <- rep(NA, length(n.sizes))
  alpha <- 0.05
  for (j in 1:length(n.sizes)) {
    n.treat <- c(n.sizes[j], n.sizes[j])
    # Null distribution: both groups share the baseline risk
    for (i in 1:deals) {
      new.b <- sample(c("bad", "good"), n.treat[1], replace = TRUE,
                      prob = c(risk.old, 1 - risk.old))
      old.b <- sample(c("bad", "good"), n.treat[2], replace = TRUE,
                      prob = c(risk.old, 1 - risk.old))
      n.bad.b <- c(sum(new.b == "bad"), sum(old.b == "bad"))
      risk.b  <- n.bad.b / n.treat
      RR.b[i] <- risk.b[1] / risk.b[2]
    }
    RR.null <- RR.b
    # One-sided critical value: the alpha quantile of the null RRs
    # (quantile with na.rm avoids the index shift that sort() causes
    # when 0/0 draws produce NaN)
    RR.crit <- quantile(RR.null, alpha, na.rm = TRUE)
    # Effect distribution: the treatment group has the new (lower) risk
    for (i in 1:deals) {
      new.b <- sample(c("bad", "good"), n.treat[1], replace = TRUE,
                      prob = c(risk.new, 1 - risk.new))
      old.b <- sample(c("bad", "good"), n.treat[2], replace = TRUE,
                      prob = c(risk.old, 1 - risk.old))
      n.bad.b <- c(sum(new.b == "bad"), sum(old.b == "bad"))
      risk.b  <- n.bad.b / n.treat
      RR.b[i] <- risk.b[1] / risk.b[2]
    }
    RR.effect <- RR.b
    # Power: fraction of effect-distribution RRs below the critical value
    power[j] <- sum(RR.effect <= RR.crit, na.rm = TRUE) / deals
  }
  return(power)
}
n.sizes <- seq(1, 100, 1)
bern.bc <- power.n.fun(n.sizes = n.sizes, risk.old = 0.25, risk.new = 0.025)

# Figure 1: simulated power curve with reference lines at 80% and 95% power
matplot(n.sizes, bern.bc, type = "l", lwd = 2, col = "navy",
        xlab = "Sample Sizes", ylab = "Power", main = "Figure 1",
        axes = FALSE)
axis(side = 2, at = seq(0, 1, by = 0.1))
axis(side = 1, at = seq(0, 100, by = 5))
abline(h = 0.80, lty = 3)
abline(h = 0.95, lty = 3)
text(x = 100, y = 0.80, labels = "80%", pos = 1)
text(x = 100, y = 0.95, labels = "95%", pos = 1)
# Per-group sample sizes needed for 80% and 95% power
abline(v = 13)
text(x = 13, y = 0.5, labels = "n=13", pos = 4)
abline(v = 26)
text(x = 26, y = 0.5, labels = "n=26", pos = 4)
effectsizes <- 50
samplesizes <- seq(1, 100, by = 1)  # total sample size, split evenly between groups
xaxis <- samplesizes[seq(5, length(samplesizes), 5)]  # tick positions for Figure 2

# Raptor measurements for the two groups
X <- c(250, 276, 277, 278, 301, 311, 331, 336, 338, 340)
Y <- c(250, 276, 277, 278, 301, 311, 331, 336, 338, 340)
nboot <- 10000

# Test statistic: difference in group centers (means by default)
dstat <- function(x1, x2, center = mean) { center(x1) - center(x2) }
d0 <- dstat(X, Y)

# Null distribution: resample both groups after removing the observed
# difference; odd totals are floored to equal groups, and a total of 1
# yields empty samples (NaN), which na.rm handles below
dnull <- array(NA, c(length(samplesizes), nboot))
for (j in 1:length(samplesizes)) {
  for (k in 1:nboot) {
    x <- sample(X - d0, samplesizes[j] / 2, replace = TRUE)
    y <- sample(Y, samplesizes[j] / 2, replace = TRUE)
    dnull[j, k] <- dstat(x, y)
  }
}
# One-sided 95% critical value for each sample size
crit <- apply(dnull, 1, quantile, 0.95, na.rm = TRUE)
# Effect distribution: shift group 1 by each candidate effect size
deffect <- array(NA, c(length(effectsizes), length(samplesizes), nboot))
for (i in 1:length(effectsizes)) {
  for (j in 1:length(samplesizes)) {
    for (k in 1:nboot) {
      x <- sample(X - d0 + effectsizes[i], samplesizes[j] / 2, replace = TRUE)
      y <- sample(Y, samplesizes[j] / 2, replace = TRUE)
      deffect[i, j, k] <- dstat(x, y)
    }
  }
}
# Power: fraction of effect-distribution statistics exceeding the null
# critical value for each (effect size, sample size) pair; na.rm guards
# against the NaN draws at the smallest sample size
power.array <- array(NA, c(length(effectsizes), length(samplesizes)))
for (i in 1:length(effectsizes)) {
  for (j in 1:length(samplesizes)) {
    power.array[i, j] <- sum(deffect[i, j, ] >= crit[j], na.rm = TRUE) / nboot
  }
}
# Power curve for the single effect size of 50 (first row of power.array)
power.rep <- power.array[1, ]
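The plotting code for Figure 2 is not included in this section; the sketch below reconstructs it in the style of the Figure 1 plot, with reference lines at the totals read off the curve (20 animals for 80% power, 36 for 95%). Treat it as an assumed reconstruction rather than the original code.

# Reconstructed sketch of Figure 2 (assumed, mirroring the Figure 1 code):
# power versus total sample size for an effect size of 50
matplot(samplesizes, power.rep, type = "l", lwd = 2, col = "navy",
        xlab = "Sample Sizes", ylab = "Power", main = "Figure 2",
        axes = FALSE)
axis(side = 2, at = seq(0, 1, by = 0.1))
axis(side = 1, at = xaxis)
abline(h = 0.80, lty = 3)
abline(h = 0.95, lty = 3)
text(x = 100, y = 0.80, labels = "80%", pos = 1)
text(x = 100, y = 0.95, labels = "95%", pos = 1)
abline(v = 20)  # 20 animals total (10 per group) for 80% power
text(x = 20, y = 0.5, labels = "n=20", pos = 4)
abline(v = 36)  # 36 animals total (18 per group) for 95% power
text(x = 36, y = 0.5, labels = "n=36", pos = 4)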