Suppose \(x_1,x_2,...,x_n\) are independently sampled from a multivariate normal distribution with mean vector \(\bf{\mu}\) and covariance matrix \(\bf{\Sigma}\).

Following our usual hypothesis testing logic, we would like to find out if the population mean vector \(\bf{\mu}\) is equal to some specific mean vector, say \(\bf{\mu}_0\).

Next, consider our two hypotheses:

\[H_0: \bf{\mu}= \bf{\mu}_0\] \[H_1: \bf{\mu}\neq \bf{\mu}_0\]

Note that we are testing mean vectors, which is equivalent to testing multiple means simultaneously, i.e. \(H_0: \mu_1 = \mu_1^0,\dots,\mu_p = \mu_p^0\) and \(H_1: \mu_j \neq \mu_j^0\) for at least one \(j\).

If we follow the logic from univariate testing, we could compute a separate test statistic for each univariate mean, \(T_j = \frac{\bar{x}_j-\mu_j^0}{s_j/\sqrt{n}} \sim t_{n-1}\). However, this scheme has a serious reliability issue: it does not control the family-wise error rate (FWER), the probability of rejecting at least one null hypothesis when all of them are true.
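A minimal sketch of this naive per-variable approach (the simulated data matrix X, the sample size, and the hypothesized mean vector mu0 below are made up for illustration):

set.seed(1)
n <- 50; p <- 5                      # sample size and number of variables
X <- matrix(rnorm(n * p), nrow = n)  # simulated data: n observations of p variables
mu0 <- rep(0, p)                     # hypothesized mean vector
xbar <- colMeans(X)                  # per-variable sample means
s <- apply(X, 2, sd)                 # per-variable sample standard deviations
T_j <- (xbar - mu0) / (s / sqrt(n))  # one t statistic per variable
2 * pt(abs(T_j), df = n - 1, lower.tail = FALSE)  # two-sided p-values

Each \(T_j\) is referred to the \(t_{n-1}\) distribution, producing one p-value per variable; the FWER problem arises when we threshold all \(p\) of these p-values at \(\alpha\).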

When the \(p\) test statistics are independent, rejecting each null hypothesis at significance level \(\alpha\) results in \(FWER = 1-(1-\alpha)^p \geq \alpha\). For example, with \(p = 10\) tests at \(\alpha = 0.05\), the FWER is \(1 - 0.95^{10} \approx 0.40\).

We can see that \(\lim_{p\to\infty} 1-(1-\alpha)^p = 1\) since \(0 < \alpha \leq 1\). Hence, the probability of rejecting at least one true null hypothesis approaches 1 as the number of variables grows.

We can visualize this below (with \(\alpha = 0.05\)):

# Plot the FWER as a function of the number of independent tests p
p <- seq(0, 100)
a <- 0.05                        # per-test significance level
FWER <- 1 - (1 - a)^p
plot(p, FWER, pch = 20, col = "blue")

There are two common methods for controlling the FWER, namely the Bonferroni Correction and Holm’s Method.

To control the FWER at level \(\alpha\) across \(m\) hypotheses, the Bonferroni Correction tests each hypothesis at significance level \(\frac{\alpha}{m}\). For \(m\) independent tests this yields \(FWER = 1-(1-\frac{\alpha}{m})^m \approx 1-e^{-\alpha} \approx \alpha\), so the FWER stays close to \(\alpha\) even when all \(m\) tests are performed. We can visualize the effect of the Bonferroni Correction below:

# Replot the uncorrected FWER, then overlay the Bonferroni-corrected rate
plot(p, FWER, pch = 20, col = "blue")
m <- 100                         # total number of hypotheses
b <- a/m                         # Bonferroni per-test level alpha/m
BC <- 1 - (1 - b)^p
points(p, BC, pch = 20, col = "red")
legend(-2, 1.01, legend = c("No Correction", "Bonferroni Correction"),
       col = c("blue", "red"), pch = 20, cex = 0.8)

While the Bonferroni Correction gives strict control of the Type I error rate, it is conservative: testing every hypothesis at level \(\frac{\alpha}{m}\) sacrifices power, and the Type II error rate grows with \(m\). To mitigate this, we can use Holm’s Method, which corrects the FWER sequentially. Holm’s Method works as follows.

Suppose we have \(m\) null hypotheses, and denote the corresponding p-values in ascending order: \(p_1 \le p_2 \le ... \le p_m\). To control the FWER at level \(\alpha\), we compare the \(j\)’th smallest p-value against the adjusted significance level \(\alpha^* = \frac{\alpha}{m-j+1}\), rejecting hypotheses in this order until we reach the first \(j\) with \(p_j > \frac{\alpha}{m-j+1}\); that hypothesis and all remaining ones are not rejected.
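A short worked example using base R’s p.adjust() (with a made-up set of p-values) shows the difference between the two corrections:

pvals <- c(0.005, 0.011, 0.02, 0.04, 0.10)       # hypothetical p-values, sorted
alpha <- 0.05
p.adjust(pvals, method = "bonferroni") <= alpha  # TRUE FALSE FALSE FALSE FALSE
p.adjust(pvals, method = "holm") <= alpha        # TRUE  TRUE FALSE FALSE FALSE

Here Holm’s Method also rejects the second hypothesis, since \(0.011 \le \frac{0.05}{5-2+1} = 0.0125\), while the Bonferroni Correction does not, since \(0.011 > \frac{0.05}{5} = 0.01\).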

Holm’s Method downscales the significance level \(\alpha\) (or, equivalently, upscales the p-values) by a factor that depends on the rank of each p-value, giving the overall test uniformly higher power. The Bonferroni Correction, by contrast, rejects null hypotheses only at the fixed level \(\frac{\alpha}{m}\), at the cost of an increased risk of Type II errors.

Neither method guarantees strong protection against Type II errors, although Holm’s Method offers more than the Bonferroni Correction. Overall, FWER control is most valuable when the risk of a Type I error far outweighs the risk of a Type II error: both methods buy strict control of the Type I error rate at the cost of a higher Type II error rate, with Holm’s Method paying less of that cost.