library(ggplot2)
library(MASS)
library(mvtnorm)
Bartlett’s test is used to examine whether \(k\) samples (groups) from a population have equal variances, a property known as “homogeneity of variances”. It’s defined as:
\(H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2\)
\(H_A: \sigma_i^2 \neq \sigma_j^2\) for at least one pair \((i, j)\)
\(T = \frac{(N-k)\ln(s_p^2) - \sum_{i=1}^k (N_i - 1)\ln(s_i^2)}{1 + \frac{1}{3(k-1)}\left(\sum_{i=1}^k \frac{1}{N_i - 1} - \frac{1}{N-k}\right)}\)
where…
\(s_i^2\) is the variance of the \(i^{th}\) group
\(N\) is the total sample size
\(N_i\) is the sample size of the \(i^{th}\) group
\(k\) is the number of groups
and \(s_p^2\) is the pooled variance, which is defined as…
\(s_p^2 = \frac{1}{N-k}\sum_{i=1}^k (N_i - 1)\, s_i^2\)
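To make the formula concrete, here is a minimal sketch that computes \(T\) by hand in base R and compares it with stats::bartlett.test(). It assumes R’s built-in Orange data (circumference grouped by Tree) purely for illustration, and the variable names (n_i, s2_i, s2_p, T_stat) are chosen here to mirror the symbols above.

```r
# Hand computation of Bartlett's T for circumference grouped by Tree.
# Orange ships with base R; using it here is an illustrative assumption.
x <- Orange$circumference
g <- Orange$Tree

n_i  <- tapply(x, g, length)  # N_i: sample size of each group
s2_i <- tapply(x, g, var)     # s_i^2: sample variance of each group
k    <- length(n_i)           # k: number of groups
N    <- sum(n_i)              # N: total sample size

# Pooled variance s_p^2
s2_p <- sum((n_i - 1) * s2_i) / (N - k)

# Numerator and denominator of T, term by term
numerator   <- (N - k) * log(s2_p) - sum((n_i - 1) * log(s2_i))
denominator <- 1 + (sum(1 / (n_i - 1)) - 1 / (N - k)) / (3 * (k - 1))
T_stat      <- numerator / denominator

# Under H_0, T is approximately chi-squared with k - 1 degrees of freedom
p_value <- pchisq(T_stat, df = k - 1, lower.tail = FALSE)

# Compare with the built-in implementation
builtin <- bartlett.test(circumference ~ Tree, data = Orange)
all.equal(T_stat, unname(builtin$statistic))
```

If the hand computation follows the formula correctly, the final comparison should return TRUE. The name T_stat is used instead of T because T is a shorthand for TRUE in R.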
Why would anybody need to perform this analysis? Typically, to verify the assumption that the variances are equal across samples. That need arises when one is considering a statistical test such as ANOVA, which assumes equal variances across the groups being compared.
As shown above, Bartlett’s test sets a null hypothesis that the variances among the groups are equal against an alternative hypothesis that at least two of them differ. Under the null hypothesis, \(T\) approximately follows a chi-squared distribution with \(k - 1\) degrees of freedom, which is where the p-value comes from. To illustrate this simply, an example in R is performed using the built-in Orange dataset. Here, we wish to test whether the variance of circumference is the same for each tree. The null hypothesis states that all trees have the same variance; the alternative states that at least two trees differ. The test is run via stats::bartlett.test(). We use a 5% significance level (a 95% level of confidence): we fail to reject the null hypothesis if the p-value is greater than 0.05.
data <- Orange
head(data, 3)
##   Tree age circumference
## 1    1 118            30
## 2    1 484            58
## 3    1 664            87
bartlett.test(circumference ~ Tree, data = data)
##
## Bartlett test of homogeneity of variances
##
## data: circumference by Tree
## Bartlett's K-squared = 2.4607, df = 4, p-value = 0.6517
# How far the p-value sits above the 0.05 threshold
margin <- 0.6517 - 0.05
margin
## [1] 0.6017
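Rather than retyping the p-value from the printed output, the same decision can be read from the object that bartlett.test() returns. The sketch below assumes the standard components of an htest result (statistic, parameter, p.value); the names result, alpha, p_check and reject_null are illustrative choices, not part of the original analysis.

```r
# Run the test and keep the result instead of only printing it
result <- bartlett.test(circumference ~ Tree, data = Orange)
alpha  <- 0.05  # significance level used in this example

result$statistic  # Bartlett's K-squared
result$parameter  # degrees of freedom (k - 1)
result$p.value    # p-value

# The p-value is the upper tail of a chi-squared distribution
p_check <- pchisq(unname(result$statistic),
                  df = unname(result$parameter),
                  lower.tail = FALSE)

# Decision rule: reject H_0 only when the p-value falls below alpha
reject_null <- result$p.value < alpha
margin      <- result$p.value - alpha

reject_null
margin
```

Since the p-value is well above 0.05, we fail to reject the null hypothesis: the data give no evidence that the circumference variances differ between trees.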