TOST vs T-test for Equivalence Testing

An undesirable property of using failure to reject the null hypothesis to declare equivalence: as precision decreases, we become more likely to declare equivalence. It should be the other way around: as our standard error decreases, we should be able to make more precise statements.
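To make the contrast concrete, here is a minimal base-R sketch of the two decision rules applied to small made-up samples. The helper name `tost_by_hand`, the sample values, and the margin `epsilon = 20` are assumptions for illustration only. The TOST p-value is taken as the larger of two one-sided p-values, which is the usual construction, but this sketch has not been checked against the output of any particular package.

```r
# Hypothetical helper (not from any package): TOST via two one-sided
# t-tests from base R. epsilon is the equivalence margin.
tost_by_hand <- function(x, y, epsilon) {
  # H0: mean(x) - mean(y) <= -epsilon  vs  H1: difference > -epsilon
  p_lower <- t.test(x, y, mu = -epsilon, alternative = "greater",
                    var.equal = TRUE)$p.value
  # H0: mean(x) - mean(y) >= epsilon   vs  H1: difference < epsilon
  p_upper <- t.test(x, y, mu = epsilon, alternative = "less",
                    var.equal = TRUE)$p.value
  # Equivalence is declared only if both one-sided tests reject,
  # so the overall p-value is the larger of the two
  max(p_lower, p_upper)
}

# Made-up samples with the same means (100 vs 105) but different spread
precise_old <- c(99, 100, 101, 100)
precise_new <- c(104, 105, 106, 105)
noisy_old <- c(40, 160, 70, 130)
noisy_new <- c(45, 165, 75, 135)

p_precise <- tost_by_hand(precise_old, precise_new, epsilon = 20)
p_noisy <- tost_by_hand(noisy_old, noisy_new, epsilon = 20)

# Ordinary two-sided t-test on the noisy data, for comparison
p_ttest_noisy <- t.test(noisy_new, noisy_old, var.equal = TRUE)$p.value
```

On these samples the TOST p-value falls below 0.05 for the precise data and above it for the noisy data, while the two-sided t-test on the noisy data does not reject. Treating that non-rejection as evidence of equivalence would reward the noisier sample.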

Simulation of 5000 trials at differing standard deviations. There are two groups of 12 patients each, with equal variances; we assume the new treatment has a true mean of 105 and the old treatment a true mean of 100.

set.seed(123)

# Num trials
n = 5000
# Number of patients in each group
n1 = 12
n2 = 12

# Function returns the proportion of trials in which the t-test rejects the null
myTtest <- function(sd) {
  pvals = numeric(n)
  for (i in 1:n) {
    # Assume the true means are 100 (old) and 105 (new), simulate
    t1 = rnorm(n = n1, mean = 100, sd = sd)
    t2 = rnorm(n = n2, mean = 105, sd = sd)
    pvals[i] = t.test(t2, t1, var.equal = TRUE)$p.value
  }

  # Proportion of simulated trials with p < 0.05
  percent_rejected = mean(pvals < 0.05)
  return(list(percent_rejected = percent_rejected))
}

# Vary standard deviation from 10 to 30
mySD = seq(10, 30, 0.5)

results <- lapply(mySD, myTtest)

perc_rej = numeric(length(mySD))
for (i in seq_along(mySD)) {
  perc_rej[i] = results[[i]]$percent_rejected
}

# Not rejecting the null is treated as "declaring equivalence"
plot(mySD,
     1 - perc_rej,
     type = 'l',
     ylab = 'Percent Declared Equivalent',
     xlab = 'SD')

library(equivalence)
# Num trials
n = 5000
# Number of patients in each group (12, matching the setup described above)
n1 = 12
n2 = 12

# Function returns the proportion of trials in which TOST declares equivalence
myTostTest <- function(sd) {
  pvals2 = numeric(n)
  for (i in 1:n) {
    # Assume the true means are 100 (old) and 105 (new), simulate
    t1 = rnorm(n = n1, mean = 100, sd = sd)
    t2 = rnorm(n = n2, mean = 105, sd = sd)
    # TOST p-value with equivalence margin epsilon = 20
    pvals2[i] <- tost(t1, t2, epsilon = 20, var.equal = TRUE)$tost.p.value
  }
  # Proportion of simulated trials with TOST p < 0.05
  percent_rejected2 = mean(pvals2 < 0.05)
  return(list(percent_rejected2 = percent_rejected2))
}

# Vary standard deviation from 10 to 30
mySD = seq(10, 30, 0.5)

results2 <- lapply(mySD, myTostTest)

perc_rej2 = numeric(length(mySD))
for (i in seq_along(mySD)) {
  perc_rej2[i] = results2[[i]]$percent_rejected2
}

# Rejecting both one-sided nulls is what declares equivalence under TOST
plot(mySD,
     perc_rej2,
     type = 'l',
     ylab = 'Percent Declared Equivalent',
     xlab = 'SD')

With the TOST, our ability to detect equivalence decreases as our precision decreases. The opposite is true for the two-sided t-test: the less precise the data, the more often we fail to reject and so "declare equivalence".
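The t-test half of this result can also be cross-checked without simulation. The sketch below uses `power.t.test` from base R's `stats` package to compute the probability that the two-sided test fails to reject, assuming 12 patients per group and a true difference of 5. The helper name `prob_not_rejected` and the coarser SD grid are assumptions for illustration, not part of the simulation above.

```r
# Hypothetical helper: probability that a two-sided, two-sample t-test
# fails to reject H0 when the true difference in means is `delta`
prob_not_rejected <- function(sd, n_per_group = 12, delta = 5, alpha = 0.05) {
  pw <- power.t.test(n = n_per_group, delta = delta, sd = sd,
                     sig.level = alpha, type = "two.sample",
                     alternative = "two.sided")$power
  1 - pw
}

# Coarser grid than the simulation, same range of standard deviations
sd_grid <- seq(10, 30, 5)
non_rejection <- sapply(sd_grid, prob_not_rejected)

# Non-rejection probability at each SD
print(data.frame(sd = sd_grid, non_rejection = non_rejection))
```

The non-rejection probability rises with the standard deviation, which is the same direction the simulated t-test curve moves.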