Inferences Concerning Variances

Specific Variance

We have the following data:

set.seed(2021)
x <- rnorm(50, mean = 21, sd = sqrt(3))
x
 [1] 20.78789 21.95688 21.60388 21.62290 22.55547 17.67001 21.45335 22.58581
 [9] 21.02385 23.99638 19.12557 20.52745 21.31523 23.61287 23.77902 17.81047
[17] 23.81166 21.22757 23.56538 23.62114 19.36764 20.67838 19.09280 23.09252
[25] 18.18552 21.18252 18.47910 20.38683 20.83771 22.90641 17.59856 18.49209
[33] 22.76573 18.53803 19.95292 18.25734 18.77270 18.48041 20.84919 21.87423
[41] 21.20159 24.04878 20.40224 24.67195 20.94046 19.62795 23.55567 19.74330
[49] 21.54106 22.19852

Hypothesis to be tested: \(H_0: \sigma_x^2 = 4\)
\(H_1: \sigma_x^2 \neq 4\)

The appropriate test statistic is \(\chi^2\) with df = \(n-1\).
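Written out, with \(s^2\) the sample variance of x and \(\sigma_0^2 = 4\) the hypothesized value (this is exactly what the manual code further below computes),

\[
\chi^2 = \frac{(n-1)\,s^2}{\sigma_0^2} \sim \chi^2_{n-1} \quad \text{under } H_0.
\]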

In the package EnvStats there is a function called varTest that uses the chi-squared test to test the null hypothesis that the variance is equal to a specified value, and also creates a confidence interval for the variance.

So, install the package if it is not already installed -

install.packages("EnvStats")

Load the package and run the function -

require(EnvStats)
varTest(x, sigma.squared = 4)
$statistic
Chi-Squared 
   47.15924 

$parameters
df 
49 

$p.value
[1] 0.9039247

$estimate
variance 
3.849734 

$null.value
variance 
       4 

$alternative
[1] "two.sided"

$method
[1] "Chi-Squared Test on Variance"

$data.name
[1] "x"

$conf.int
     LCL      UCL 
2.686278 5.978053 
attr(,"conf.level")
[1] 0.95

attr(,"class")
[1] "htestEnvStats"

The result shows a p-value of 0.9039247 > \(\alpha = 0.05\). Hence we fail to reject the null hypothesis at the 5% significance level: the data are consistent with a true population variance of 4.
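Since varTest() returns its results as a list (as the printout above shows), individual components can also be extracted directly; a small sketch (res is just an illustrative name):

res <- varTest(x, sigma.squared = 4)  # store the result object
res$p.value   # the p-value shown above
res$conf.int  # the confidence interval shown above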

Manual code:

set.seed(2021)
x <- rnorm(50, mean = 21, sd = sqrt(3))  # data
VAR <- 4  # specified variance
(n <- length(x))  # sample size of the data
[1] 50
(chisq_stat <- (n-1)*var(x)/VAR)  # value of the calculated test statistic
[1] 47.15924
pchisq(chisq_stat, df = n-1, lower.tail = T)*2  # two-sided p-value: twice the lower-tail probability (the smaller of the two tails here)
[1] 0.9039247
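More generally, the two-sided p-value is twice the smaller of the two tail probabilities, so the following form works whether the statistic falls below or above its expected value; a small sketch with the objects already defined:

# twice the smaller tail probability; identical to the value above here
2*min(pchisq(chisq_stat, df = n-1, lower.tail = TRUE),
      pchisq(chisq_stat, df = n-1, lower.tail = FALSE))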

Formula: the \((1-\alpha)\) confidence interval for the variance is \(\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right)\), where \(\chi^2_{p,\,n-1}\) denotes the upper-tail critical value with tail probability \(p\).

alpha <- 0.05  # specifying significance level
confint <- c((n-1)*var(x)/qchisq(alpha/2, n-1, lower.tail = F),  # lower limit
             (n-1)*var(x)/qchisq(alpha/2, n-1, lower.tail = T))  # upper limit
cat("Lower limit:",confint[1],"& Upper limit:",confint[2])
Lower limit: 2.686278 & Upper limit: 5.978053
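The same formula works at any confidence level; a quick sketch for a 90% interval (alpha90 is just an illustrative name, output not shown):

alpha90 <- 0.10  # corresponds to a 90% confidence interval
c((n-1)*var(x)/qchisq(alpha90/2, n-1, lower.tail = F),  # lower limit
  (n-1)*var(x)/qchisq(alpha90/2, n-1, lower.tail = T))  # upper limit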

Two Variances

Let’s say we have the population data:

set.seed(2021)
X_pop <- rnorm(100, mean = 120, sd = 18)
Y_pop <- rnorm(100, mean = 120, sd = 12)

Taking samples of sizes 25 and 18 without replacement from the populations X_pop and Y_pop:

(x <- sample(X_pop, 25, replace = F))
 [1] 149.10153 102.71600 146.55927 100.99419 104.37855 133.23332 110.99477
 [8] 136.16497 117.79572  82.60348 113.73183 102.16264 124.71140 121.89681
[15] 115.08915 158.16000 115.73499 120.24789 129.61127 105.74123  90.75111
[22] 128.69869 117.39155  79.39435 100.52031
(y <- sample(Y_pop, 18, replace = F))
 [1] 113.26209 137.72075 121.90967  87.80444 137.82216 118.28542 125.64948
 [8] 115.20241 126.98182 116.02115 105.64453 144.76055 119.04420  98.37236
[15] 124.04258 126.15100 124.17379 115.92429
nx <- length(x) # sample size in x
ny <- length(y) # sample size in y
(x_var <- var(x)) # sample variance in x
[1] 382.7156
(y_var <- var(y)) # sample variance in y
[1] 189.1567
# Boxplot of x and y
boxplot(x, y, names = c("X", "Y"), main = "Boxplot of X and Y")

To test:
\(H_0: \frac{\sigma_x^2}{\sigma_y^2} = 1\)
\(H_1: \frac{\sigma_x^2}{\sigma_y^2} \neq 1\)
or, equivalently,
\(H_0: \sigma_x=\sigma_y\)
\(H_1: \sigma_x\ne\sigma_y\)

The appropriate test statistic is \(F\) with df = \((n_x-1, n_y-1)\).
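Written out, with \(s_x^2\) and \(s_y^2\) the two sample variances (this is what the manual code below computes),

\[
F = \frac{s_x^2}{s_y^2} \sim F_{n_x-1,\,n_y-1} \quad \text{under } H_0.
\]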

The built-in package stats contains a function called var.test() that performs an F test to compare the variances of two samples from normal populations.

var.test(x,y)

    F test to compare two variances

data:  x and y
F = 2.0233, num df = 24, denom df = 17, p-value = 0.1382
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.7903953 4.8285011
sample estimates:
ratio of variances 
          2.023273 
library(webr)
plot(var.test(x,y))

Since the p-value is greater than 0.05, we fail to reject the null hypothesis; there is no evidence that the true variances of the two populations differ.

Manual Code:

(F_stat <- x_var/y_var) # F statistic; by convention the larger sample variance goes in the numerator
[1] 2.023273
DF1 <- nx - 1  # df for larger variance
DF2 <- ny - 1  # df for smaller variance
pf(F_stat, DF1, DF2, lower.tail = F)*2 # two-sided p-value: twice the upper-tail probability (the smaller tail here)
[1] 0.1381551

P-value > \(\alpha = 0.05\), hence do not reject the null hypothesis.
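This matches how var.test() reports the two-sided p-value: twice the smaller tail probability of the observed ratio, which works regardless of which sample variance is larger; a small sketch with the objects defined above:

# twice the smaller of the two tail probabilities of the ratio
2*min(pf(x_var/y_var, nx - 1, ny - 1, lower.tail = TRUE),
      pf(x_var/y_var, nx - 1, ny - 1, lower.tail = FALSE))
# same value as reported above (0.1381551)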

Critical value approach:

qf(0.05/2, DF1, DF2, lower.tail = T)  # lower critical value
[1] 0.4190272
qf(0.05/2, DF1, DF2, lower.tail = F)  # upper critical value
[1] 2.559824
F_stat
[1] 2.023273

Since the calculated F statistic lies in the acceptance region (between the two critical values), we may not reject the null hypothesis.
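The same critical-value decision can be expressed as a single logical check; a small sketch reusing F_stat, DF1 and DF2 from above (TRUE would mean reject \(H_0\)):

# is the statistic outside the acceptance region?
(F_stat < qf(0.05/2, DF1, DF2, lower.tail = T)) |
  (F_stat > qf(0.05/2, DF1, DF2, lower.tail = F))
# FALSE here, so H0 is not rejected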

To test: \(H_0: \sigma_x=\sigma_y\)
\(H_1: \sigma_x>\sigma_y\)

In the function, set the argument alternative to "greater" -

var.test(x,y, alternative = "greater")

    F test to compare two variances

data:  x and y
F = 2.0233, num df = 24, denom df = 17, p-value = 0.06908
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
 0.9239676       Inf
sample estimates:
ratio of variances 
          2.023273 
library(webr)
plot(var.test(x,y, alternative = "greater"))

Manual Code:

(F_stat <- x_var/y_var) # the larger sample variance goes in the numerator
[1] 2.023273
DF1 <- nx - 1  # df for larger variance
DF2 <- ny - 1  # df for smaller variance
pf(F_stat, DF1, DF2, lower.tail = F)  # one-sided (right-tailed) p-value
[1] 0.06907757

Since p-value is greater than 0.05, do not reject the null hypothesis.

# upper critical value since it is a right tailed test
qf(0.05, DF1, DF2, lower.tail = F)  
[1] 2.189766
F_stat
[1] 2.023273

Since the F statistic lies in the acceptance region (below the upper critical value), do not reject the null hypothesis.
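As a cross-check on the one-sided var.test() output above, its lower confidence bound for the variance ratio can be reproduced from the same quantities; a small sketch:

# lower 95% confidence bound for the ratio of variances (upper bound is Inf)
F_stat / qf(0.05, DF1, DF2, lower.tail = F)
# matches the 0.9239676 shown in the var.test() output above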

… to be continued