One Sample Variance Comparison Chi-Square Test

Use the \(\chi^2\) statistic with \(n-1\) degrees of freedom to test a hypothesized value of \(\sigma^2\) or to build a confidence interval for \(\sigma^2\) if

  1. the population is normally distributed, which in practice is judged from the sample data.

\(s^2\) is an unbiased estimator of \(\sigma^2\). Repeated samples of size \(n\) from a normal population would yield a statistic \(\chi^2 = \frac{(n-1)s^2}{\sigma^2}\) that follows a chi-square distribution with \(n-1\) degrees of freedom. Because the chi-square distribution is not symmetric, the confidence interval for \(\sigma^2\) uses a separate critical value for each tail.
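The sampling-distribution claim can be checked with a small simulation. The sketch below is illustrative only: the sample size, population standard deviation, number of replicates, and seed are all assumed values, not taken from the example that follows. It compares the simulated mean and variance of \((n-1)s^2/\sigma^2\) with the theoretical values \(n-1\) and \(2(n-1)\).

```r
# Simulate the sampling distribution of (n - 1) * s^2 / sigma^2.
# n, sigma, n_reps, and the seed are illustrative assumptions.
set.seed(42)
n <- 10
sigma <- 2
n_reps <- 10000

chi_sq_stat <- replicate(n_reps, {
  smpl <- rnorm(n, mean = 0, sd = sigma)   # one sample from a normal population
  (n - 1) * var(smpl) / sigma^2            # the chi-square statistic for that sample
})

# A chi-square variable with df = n - 1 has mean df and variance 2 * df.
sim_mean <- mean(chi_sq_stat)
sim_var <- var(chi_sq_stat)
c(sim_mean = sim_mean, theory_mean = n - 1)
c(sim_var = sim_var, theory_var = 2 * (n - 1))
```

Replacing `rnorm()` with a skewed generator such as `rexp()` is a quick way to see how the agreement degrades when the normality condition fails.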

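\(\frac{(n-1)s^2}{\chi_H^2} < \sigma^2 < \frac{(n-1)s^2}{\chi_L^2}\)

where \(\chi_L^2\) and \(\chi_H^2\) are the \(\alpha/2\) and \(1-\alpha/2\) quantiles of the chi-square distribution with \(n-1\) degrees of freedom. Dividing by the larger critical value \(\chi_H^2\) gives the lower limit.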
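A short derivation shows why the larger critical value ends up in the lower limit. As notation assumed for this sketch only, let \(\chi^2_{p}\) denote the \(p\)-th quantile of the chi-square distribution with \(n-1\) degrees of freedom. The statistic falls between the two tail quantiles with probability \(1-\alpha\):

\[ P\left(\chi^2_{\alpha/2} < \frac{(n-1)s^2}{\sigma^2} < \chi^2_{1-\alpha/2}\right) = 1 - \alpha \]

Taking reciprocals reverses both inequalities, and multiplying through by \((n-1)s^2\) isolates \(\sigma^2\):

\[ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{\alpha/2}} \]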

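Do not use chi-square inference procedures if the sample data show substantial skewness or a substantial number of outliers. These procedures are sensitive to departures from normality, and a larger sample does not remove that sensitivity. If the sample data are non-normal, use a bootstrapping technique instead.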
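One possible bootstrapping technique is the percentile bootstrap, sketched below in base R. This is a minimal illustration, not necessarily the method the text has in mind: `boot_var_ci()` is a hypothetical helper, and the skewed data, seed, and number of resamples are assumptions made for the example.

```r
# Percentile bootstrap interval for a variance (illustrative sketch).
# boot_var_ci() is a hypothetical helper, not a function from any package.
boot_var_ci <- function(x, conf_level = 0.95, n_boot = 5000) {
  # Resample the data with replacement and record the variance each time.
  boot_vars <- replicate(n_boot, var(sample(x, size = length(x), replace = TRUE)))
  alpha <- 1 - conf_level
  # The interval is the middle conf_level share of the bootstrap variances.
  quantile(boot_vars, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(123)
skewed <- rexp(40, rate = 0.5)  # simulated right-skewed data (assumption)
ci <- boot_var_ci(skewed)
ci
```

The percentile interval is the simplest bootstrap interval; with small samples it can be too narrow, so other bootstrap variants may be worth considering.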

Example

The size of prey (millimeters) of a species of net-casting spider, deinopis (X), is sampled for \(n_X = 10\) spiders. Does the variance of prey size differ from a hypothesized value of \(\sigma^2 = 1\)?

library(dplyr)
library(ggplot2)
library(EnvStats)

x <- c(12.43, 11.71, 14.41, 11.05, 9.53, 
       11.66, 9.33, 11.71, 14.35, 13.81)
(x_bar <- mean(x))
## [1] 11.999
(s <- sd(x))
## [1] 1.801471
(n <- length(x))
## [1] 10
df <- n - 1
alpha <- 0.05
sigma <- 1.00  # hypothesized population standard deviation (H0: sigma^2 = 1)

# Check the single condition for the one-sample variance test.
# The sample is approximately normal (see the Q-Q plot), so assume a normal population.
qqnorm(x)
qqline(x)

# Apply the Anderson-Darling normality test.  The p-value is > 0.05, so do not reject H0 that the data are normally distributed.
library(nortest)
ad.test(x)
## 
## Results of Hypothesis Test
## --------------------------
## 
## Alternative Hypothesis:          
## 
## Test Name:                       Anderson-Darling normality test
## 
## Data:                            x
## 
## Test Statistic:                  A = 0.3383163
## 
## P-value:                         0.4225886
# Conduct a chi-square test on the variance.  The p-value = 0.001196, so reject H0 that sigma^2 = 1.  The 95% confidence interval for sigma^2 is (1.535, 10.816).
varTest(x = x, alternative = "two.sided", sigma.squared =  sigma^2, conf.level = (1 - alpha))
## 
## Results of Hypothesis Test
## --------------------------
## 
## Null Hypothesis:                 variance = 1
## 
## Alternative Hypothesis:          True variance is not equal to 1
## 
## Test Name:                       Chi-Squared Test on Variance
## 
## Estimated Parameter(s):          variance = 3.245299
## 
## Data:                            x
## 
## Test Statistic:                  Chi-Squared = 29.20769
## 
## Test Statistic Parameter:        df = 9
## 
## P-value:                         0.001195555
## 
## 95% Confidence Interval:         LCL =  1.535407
##                                  UCL = 10.816103
# (1 - alpha) confidence interval graph
(lcl = (n - 1) * s^2 / qchisq(p = alpha / 2, df = df, lower.tail = FALSE))
## [1] 1.535407
(ucl = (n - 1) * s^2 / qchisq(p = alpha / 2, df = df, lower.tail = TRUE))
## [1] 10.8161
s_rnd = round(s, 2)
dat <- data.frame(chi_sq = 100:3000 / 100) %>%
  mutate(sigma_sq = (n - 1) * s^2 / chi_sq) %>%
  mutate(prob = dchisq(x = chi_sq, df = df)) %>%
  mutate(rr = ifelse(sigma_sq < lcl | sigma_sq > ucl, prob, 0))
ggplot(dat) +
  geom_line(aes(x = chi_sq, y = prob)) +
  geom_area(aes(x = chi_sq, y = rr), alpha = 0.3) +
  geom_vline(aes(xintercept = (n - 1)), color = "blue") +
  labs(title = bquote('95% Interval Estimate'),
       subtitle = bquote('s^2 = '~.(s_rnd^2)~' LCL'~.(lcl)~' UCL'~.(ucl)~' using chisq dist with'~.(df)~'df.'),
       x = "chi^2",
       y = "Probability") +
  # Label the chi^2 breaks with the sigma^2 values they correspond to.
  scale_x_continuous(breaks = c(1, (n - 1), 30), labels = round((n - 1) * s^2 / c(1, (n - 1), 30), 2))

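# Hypothesis test graph.  The shaded tails are the rejection region under H0, the blue line
# marks df (the mean of the chi-square distribution), and the red line marks the observed statistic.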
lcl = (n - 1) * sigma^2 / qchisq(p = alpha / 2, df = df, lower.tail = FALSE)
ucl = (n - 1) * sigma^2 / qchisq(p = alpha / 2, df = df, lower.tail = TRUE)
data.frame(chi_sq = 100:3000 / 100) %>%
  mutate(sigma_sq = (n - 1) * sigma^2 / chi_sq) %>%
  mutate(prob = dchisq(x = chi_sq, df = df)) %>%
  mutate(rr = ifelse(sigma_sq < lcl | sigma_sq > ucl, prob, 0)) %>%
ggplot() +
  geom_line(aes(x = chi_sq, y = prob)) +
  geom_area(aes(x = chi_sq, y = rr), alpha = 0.3) +
  geom_vline(aes(xintercept = (n - 1)), color = "blue") +
  geom_vline(aes(xintercept = (n - 1) * s^2 / sigma^2), color = "red") +
  labs(title = bquote('Hypothesis Test of H0: sigma^2 = 1'),
       subtitle = bquote('s^2 = '~.(s_rnd^2)~' LCL'~.(lcl)~' UCL'~.(ucl)~' using chisq dist with'~.(df)~'df.'),
       x = "chi^2",
       y = "Probability") +
  theme(legend.position="none")
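The test statistic and p-value reported by varTest() can be cross-checked by hand with base R. The sketch below copies the summary values from the output above (n = 10, s^2 = 3.245299, hypothesized variance 1). It assumes the two-sided p-value is formed by doubling the smaller tail probability, a common convention that gives a value close to the one reported.

```r
# Summary values copied from the example output.
n_obs <- 10
s_sq <- 3.245299
sigma_sq_0 <- 1   # hypothesized variance under H0

# Test statistic: (n - 1) * s^2 / sigma_0^2
chi_sq_obs <- (n_obs - 1) * s_sq / sigma_sq_0

# Tail probabilities under a chi-square distribution with n - 1 df.
lower_tail <- pchisq(chi_sq_obs, df = n_obs - 1, lower.tail = TRUE)
upper_tail <- pchisq(chi_sq_obs, df = n_obs - 1, lower.tail = FALSE)

# Two-sided p-value: double the smaller tail (assumed convention).
p_two_sided <- 2 * min(lower_tail, upper_tail)
c(chi_sq = chi_sq_obs, p_value = p_two_sided)
```

Here the observed statistic lies far in the upper tail, so the upper tail probability is the one that gets doubled.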