A Type I error (false alarm) occurs when a statistical test signals a significant difference where none exists.
If we take two samples from the same population, we'd expect the difference between the means of samples 1 and 2 to be very small (and hence not statistically significant). However, once in a while our statistical tests will mess up. How often? It depends on your significance threshold \( \alpha \) (alpha).
The following code takes two random samples from the same population and uses a t-test to see if their means are equal. Later, we will repeat this process until a Type I error is committed; a counter tracks the number of tests before the Type I error occurs.
# An illustration of Type I error
# Create a population with mean = 100 and sd = 20
pop <- rnorm(1e+06, mean = 100, sd = 20)
# Take two random samples from the population
samp1 <- sample(pop, 100, repl = F)
samp2 <- sample(pop, 100, repl = F)
# the samples should be similar but not the same
summary(samp1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 57.6 87.7 98.7 99.0 107.0 144.0
summary(samp2)
# test the sample means
test.result <- t.test(samp1, samp2)
The code above illustrates the use of the sample(), t.test(), rnorm(), and summary() functions. These should all be in your R journals. See the lecture notes for help interpreting the t.test() results.
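If you want to pull individual pieces out of the test result rather than printing the whole object, note that t.test() returns a list; a quick sketch of the components used later in this exercise:
test.result$p.value   # the p-value of the test
test.result$conf.int  # 95 percent confidence interval for the difference in means
test.result$estimate  # the two sample means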
counter <- 1 #count repetitions
repeat {
    samp1 <- sample(pop, 100, repl = F)
    samp2 <- sample(pop, 100, repl = F)
    test.result <- t.test(samp1, samp2)
    print(counter)
    if (test.result$p.value < alpha)
        break
    counter <- counter + 1
}
## [1] 1
## [1] 2
## [1] 3
test.result #prints the result of the test containing the Type I error
##
## Welch Two Sample t-test
##
## data: samp1 and samp2
## t = 2.379, df = 197.2, p-value = 0.01833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.13 12.10
## sample estimates:
## mean of x mean of y
## 105.66 99.04
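As a side note, each test has probability alpha of producing a false alarm, so the number of tests until the first Type I error should follow a geometric distribution with mean roughly 1/alpha (about 20 tests at alpha = 0.05). A minimal sketch to check this by simulation; the tests.until.error() helper is my own, not part of the assignment:
# Assumed sketch: estimate the average number of t-tests until a Type I error.
# 'pop' and 'alpha' are the objects defined above.
tests.until.error <- function() {
    n <- 1
    repeat {
        s1 <- sample(pop, 100, repl = F)
        s2 <- sample(pop, 100, repl = F)
        if (t.test(s1, s2)$p.value < alpha)
            return(n)
        n <- n + 1
    }
}
# Average over many runs; should be close to 1/alpha = 20 when alpha = 0.05
mean(replicate(500, tests.until.error()))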
An illustration of how alpha affects the prevalence of Type I errors.
numTests <- 1000 #number of t-tests per alpha value
alphaSet <- c(0.001, 0.01, 0.05, 0.1, 0.2) #set of alpha values to test
sigTests <- matrix(nrow = length(alphaSet) * numTests, ncol = 3)
counter <- 1
for (i in 1:numTests) {
    for (alpha in alphaSet) {
        # take two samples from the same population
        samp1 <- sample(pop, 100, repl = F)
        samp2 <- sample(pop, 100, repl = F)
        # test sample means
        test.result <- t.test(samp1, samp2)
        # record results of test
        if (test.result$p.value < alpha) {
            sigTests[counter, 1] <- 1
            sigTests[counter, 2] <- test.result$p.value
            sigTests[counter, 3] <- alpha
        } else {
            sigTests[counter, 1] <- 0
            sigTests[counter, 2] <- test.result$p.value
            sigTests[counter, 3] <- alpha
        }
        counter <- counter + 1
    }
}
sigTests <- as.data.frame(sigTests) #convert to a data.frame object (easier to manipulate)
names(sigTests) <- c("type_1_errors", "p-value", "alpha") #assign column names
The results below show how alpha relates to Type I errors. For example, with an alpha of 0.2, roughly 20% of the tests should produce a false alarm.
aggregate(sigTests$type_1_errors ~ sigTests$alpha, FUN = sum) #produce results of experiment
## sigTests$alpha sigTests$type_1_errors
## 1 0.001 1
## 2 0.010 10
## 3 0.050 43
## 4 0.100 110
## 5 0.200 222
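Dividing each count by numTests gives the observed false-alarm rate, which should sit close to its alpha. Assuming the sigTests data frame built above, one way to see this directly:
# Observed Type I error rate per alpha (proportion of significant tests)
aggregate(sigTests$type_1_errors ~ sigTests$alpha, FUN = mean)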
An illustration of how alpha affects the prevalence of Type II errors.
numTests <- 1000 #number of t-tests per alpha value
difference <- 5 #difference in group means
alphaSet <- c(0.001, 0.01, 0.05, 0.1, 0.2) #set of alpha values to test
sigTests <- matrix(nrow = length(alphaSet) * numTests, ncol = 3)
counter <- 1
for (i in 1:numTests) {
    for (alpha in alphaSet) {
        # take two samples from DIFFERENT populations
        samp1 <- rnorm(100, mean = 100, sd = 10)
        samp2 <- rnorm(100, mean = 100 + difference, sd = 10)
        test.result <- t.test(samp1, samp2)
        # record a Type II error when the test fails to detect the real difference
        if (test.result$p.value > alpha) {
            sigTests[counter, 1] <- 1
            sigTests[counter, 2] <- test.result$p.value
            sigTests[counter, 3] <- alpha
        } else {
            sigTests[counter, 1] <- 0
            sigTests[counter, 2] <- test.result$p.value
            sigTests[counter, 3] <- alpha
        }
        counter <- counter + 1
    }
}
sigTests <- as.data.frame(sigTests) #convert to a data.frame object (easier to manipulate)
names(sigTests) <- c("type_2_errors", "p-value", "alpha") #assign column names
The results below show that when alpha is very low, Type II errors become more common: lowering alpha makes false alarms rarer, but at the cost of missing more real differences.
aggregate(sigTests$type_2_errors ~ sigTests$alpha, FUN = sum) #produce results of experiment
## sigTests$alpha sigTests$type_2_errors
## 1 0.001 374
## 2 0.010 162
## 3 0.050 56
## 4 0.100 42
## 5 0.200 9
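One last connection: the Type II error rate is 1 minus the test's statistical power. For the scenario simulated above (n = 100 per group, a true difference of 5, sd = 10), base R's power.t.test() gives the theoretical power at a chosen alpha, which you can compare against the simulated miss counts:
# Theoretical power for the simulation above; 1 - power is the expected
# Type II error rate at the given sig.level (alpha)
power.t.test(n = 100, delta = 5, sd = 10, sig.level = 0.05)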