Say I weigh 20 people who eat burgers and 20 who don’t. My results are not significant. So I weigh all these people a second time to get a better measurement.
I’ve just doubled my dataset! 40 measurements from burger eaters. 40 from sad people. My results show that burger eaters are significantly lighter.
Are they?
# Draw n independent values, then duplicate the whole set resample_n times
get_data <- function(n = 20, resample_n = 5) {
  samples <- rnorm(n)
  resamples <- rep(samples, resample_n)
  resamples
}
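To be explicit about what get_data() does: in this simulation the repeat "measurements" are exact copies of the originals, because rep() just concatenates the same draws again, so the extra data points carry no new information. A toy illustration (made-up values, not part of the simulation):

samples <- c(1.2, -0.4, 0.7)
rep(samples, 2)
# 1.2 -0.4 0.7 1.2 -0.4 0.7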
# Simulate one "experiment": two groups drawn from the same distribution,
# each padded out with duplicated measurements, then compared with a t-test
is_significant <- function(...) {
  a <- get_data(...)
  b <- get_data(...)
  t.test(a, b)$p.value < 0.05
}
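Both groups come from the same distribution, so any TRUE returned here is a false positive. A single simulated experiment (the result will vary from run to run):

is_significant(n = 20, resample_n = 10)
# TRUE or FALSE; with truly independent samples this should be TRUE only ~5% of the time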
# Run 10,000 simulated experiments and return the fraction flagged as "significant"
positive_rate <- function(...) {
  total <- 0
  for (i in 1:10000) {
    total <- total + is_significant(...)
  }
  rate <- total / 10000
  rate
}
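If you prefer to avoid the explicit loop, a roughly equivalent version just takes the mean of a logical vector (the reps argument is added here purely for illustration):

positive_rate <- function(n = 20, resample_n = 5, reps = 10000) {
  # mean() of a logical vector gives the proportion of TRUEs
  mean(vapply(seq_len(reps),
              function(i) is_significant(n = n, resample_n = resample_n),
              logical(1)))
}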
# Conditions to test: groups of 20, with each sample duplicated 1 to 100 times
to_test <- expand.grid(
  n = 20,
  resample_n = c(1, 2, 5, 10, 20, 100)
)

# Estimate the false-positive rate for each condition
for (i in 1:nrow(to_test)) {
  row <- to_test[i, ]
  rate <- with(row, positive_rate(n, resample_n))
  to_test[i, 'rate'] <- rate
}
library(ggplot2)
library(scales)

# False-positive rate vs. amount of duplication; the black line marks the nominal 5% level
ggplot(to_test, aes(resample_n, rate)) +
  geom_point() +
  geom_hline(yintercept = 0.05, color = 'black') +
  scale_x_log10(breaks = c(1, 2, 5, 10, 20, 100)) +
  scale_y_continuous(labels = percent, limits = c(0, 1)) +
  geom_path(color = "red") +
  ylab("Rate of 'significant' results") +
  xlab("Number of times each sample is duplicated")
No!
Here we see the rate of ‘significant’ results (y-axis) across the simulated conditions (red line). Remember, both groups are drawn from the same distribution, so 0.05 (black line) is what we’d expect by chance; everything above that line is excess false positives. False positives become common when we include repeated measurements of the same samples (x-axis). If, for instance, we duplicate each sample 10 times, we get ‘significant’ results over 50% of the time!
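Why does this happen? Duplicating each observation k times leaves the group means and standard deviations essentially unchanged, but the t-test thinks it has k times as many independent data points, so the standard error it divides by shrinks by roughly the square root of k and the p-value collapses. A minimal sketch of the mechanism (made-up data, separate from the simulation above):

set.seed(1)
a <- rnorm(20)
b <- rnorm(20)
t.test(a, b)$p.value                    # p-value from the honest comparison
t.test(rep(a, 10), rep(b, 10))$p.value  # same data duplicated: a smaller, spurious p-value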
Make sure your samples are independent!