We have the following data:
set.seed(2021)
x <- rnorm(50, mean = 21, sd = sqrt(3))
x
 [1] 20.78789 21.95688 21.60388 21.62290 22.55547 17.67001 21.45335 22.58581
[9] 21.02385 23.99638 19.12557 20.52745 21.31523 23.61287 23.77902 17.81047
[17] 23.81166 21.22757 23.56538 23.62114 19.36764 20.67838 19.09280 23.09252
[25] 18.18552 21.18252 18.47910 20.38683 20.83771 22.90641 17.59856 18.49209
[33] 22.76573 18.53803 19.95292 18.25734 18.77270 18.48041 20.84919 21.87423
[41] 21.20159 24.04878 20.40224 24.67195 20.94046 19.62795 23.55567 19.74330
[49] 21.54106 22.19852
Hypothesis to be tested: \(H_0: \sigma_x^2 = 4\)
\(H_1: \sigma_x^2 \neq 4\)
The appropriate test statistic is \(\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}\), which follows a chi-squared distribution with df = n-1 under \(H_0\) (here \(\sigma_0^2 = 4\) is the hypothesised value).
The package EnvStats provides a function called varTest() that tests, via the chi-squared test, the null hypothesis that the variance equals a specified value, and also computes a confidence interval for the variance.
So, install the package if it is not already installed -
install.packages("EnvStats")
Load the package and run the function -
require(EnvStats)
varTest(x, sigma.squared = 4)
$statistic
Chi-Squared
47.15924
$parameters
df
49
$p.value
[1] 0.9039247
$estimate
variance
3.849734
$null.value
variance
4
$alternative
[1] "two.sided"
$method
[1] "Chi-Squared Test on Variance"
$data.name
[1] "x"
$conf.int
LCL UCL
2.686278 5.978053
attr(,"conf.level")
[1] 0.95
attr(,"class")
[1] "htestEnvStats"
The result shows a p-value of 0.9039247 > \(\alpha = 0.05\). Hence we fail to reject the null hypothesis at the 5% significance level: the data give no evidence that the true variance of the population from which the sample was drawn differs from 4.
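Since varTest() returns a list like the one printed above, its components can also be pulled out programmatically and compared with the chosen significance level. A minimal sketch, assuming EnvStats is already loaded:
res <- varTest(x, sigma.squared = 4) # store the full test object
res$statistic # chi-squared statistic
res$p.value # p-value
res$conf.int # 95% confidence interval for the variance
res$p.value < 0.05 # FALSE here, so H0 is not rejected at the 5% level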
Manual code:
set.seed(2021)
x <- rnorm(50, mean = 21, sd = sqrt(3)) # data
VAR <- 4 # specified variance
(n <- length(x)) # sample size of the data
[1] 50
(chisq_stat <- (n-1)*var(x)/VAR) # value of the calculated test statistic
[1] 47.15924
pchisq(chisq_stat, df = n-1, lower.tail = T)*2 # p-value
[1] 0.9039247
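Doubling the lower-tail probability gives the two-sided p-value here because the observed statistic (47.16) lies below its expected value of \(n-1 = 49\); in general the two-sided p-value is twice the smaller of the two tail probabilities. A minimal sketch of the general form:
2*min(pchisq(chisq_stat, df = n-1), pchisq(chisq_stat, df = n-1, lower.tail = F)) # 0.9039247, same as above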
Formula: the \(100(1-\alpha)\%\) confidence interval for the variance is \(\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right)\), where \(\chi^2_{p,\,n-1}\) denotes the upper-tail \(p\) quantile.
alpha <- 0.05 # specifying significance level
confint <- c((n-1)*var(x)/qchisq(alpha/2, n-1, lower.tail = F),
             (n-1)*var(x)/qchisq(alpha/2, n-1, lower.tail = T))
cat("Lower limit:",confint[1],"& Upper limit:",confint[2])
Lower limit: 2.686278 & Upper limit: 5.978053
Let’s say we have the population data:
set.seed(2021)
X_pop <- rnorm(100, mean = 120, sd = 18)
Y_pop <- rnorm(100, mean = 120, sd = 12)
Taking samples of sizes 25 and 18 without replacement from the populations X and Y:
(x <- sample(X_pop, 25, replace = F))
 [1] 149.10153 102.71600 146.55927 100.99419 104.37855 133.23332 110.99477
[8] 136.16497 117.79572 82.60348 113.73183 102.16264 124.71140 121.89681
[15] 115.08915 158.16000 115.73499 120.24789 129.61127 105.74123 90.75111
[22] 128.69869 117.39155 79.39435 100.52031
(y <- sample(Y_pop, 18, replace = F))
 [1] 113.26209 137.72075 121.90967 87.80444 137.82216 118.28542 125.64948
[8] 115.20241 126.98182 116.02115 105.64453 144.76055 119.04420 98.37236
[15] 124.04258 126.15100 124.17379 115.92429
nx <- length(x) # sample size in x
ny <- length(y) # sample size in y
(x_var <- var(x)) # sample variance in x
[1] 382.7156
(y_var <- var(y)) # sample variance in y
[1] 189.1567
# Boxplot of x and y
boxplot(x, y, names = c("X", "Y"), main = "Boxplot of X and Y")
To test:
\(H_0: \frac{\sigma_x}{\sigma_y} = 1\)
\(H_1: \frac{\sigma_x}{\sigma_y} \neq 1\)
or,
\(H_0: \sigma_x=\sigma_y\)
\(H_1: \sigma_x\ne\sigma_y\)
The appropriate test statistic is \(F = s_x^2/s_y^2\), which follows an F distribution with df = \((n_x-1,\, n_y-1)\) under \(H_0\).
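Plugging in the sample variances computed above gives \(F = 382.7156/189.1567 \approx 2.0233\), the value reported by the test below.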
The built-in package stats contains a function called var.test() that performs an F test to compare the variances of two samples from normal populations.
var.test(x,y)
F test to compare two variances
data: x and y
F = 2.0233, num df = 24, denom df = 17, p-value = 0.1382
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.7903953 4.8285011
sample estimates:
ratio of variances
2.023273
library(webr)
plot(var.test(x,y))
Since the p-value is greater than 0.05, we fail to reject the null hypothesis: there is no evidence that the true variances of the two populations differ.
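The same conclusion can be read off the confidence interval: the 95% interval for the ratio of variances, (0.79, 4.83), contains 1. A minimal sketch of checking this programmatically from the htest object:
ft <- var.test(x, y) # store the test result
ft$conf.int # 95% CI for the ratio of variances
ft$conf.int[1] <= 1 && 1 <= ft$conf.int[2] # TRUE: 1 lies inside the interval, so H0 is not rejected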
Manual Code:
(F_stat <- x_var/y_var) # F statistic; the larger sample variance goes in the numerator
[1] 2.023273
DF1 <- nx - 1 # df for larger variance
DF2 <- ny - 1 # df for smaller variance
pf(F_stat, DF1, DF2, lower.tail = F)*2 # p-value
[1] 0.1381551
P-value > \(\alpha = 0.05\), hence do not reject the null hypothesis.
Critical value approach:
qf(0.05/2, DF1, DF2, lower.tail = T) # lower critical value
[1] 0.4190272
qf(0.05/2, DF1, DF2, lower.tail = F) # upper critical value
[1] 2.559824
F_stat
[1] 2.023273
Since the calculated F statistic lies in the acceptance region, we may not reject the null hypothesis.
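The acceptance-region check can also be written directly in R, using the critical values computed above; a small sketch:
lower_crit <- qf(0.05/2, DF1, DF2, lower.tail = T) # 0.4190272
upper_crit <- qf(0.05/2, DF1, DF2, lower.tail = F) # 2.559824
lower_crit < F_stat && F_stat < upper_crit # TRUE: F_stat falls inside the acceptance region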
To test: \(H_0: \sigma_x=\sigma_y\)
\(H_1: \sigma_x>\sigma_y\)
In the function, set the argument alternative to "greater" -
var.test(x,y, alternative = "greater")
F test to compare two variances
data: x and y
F = 2.0233, num df = 24, denom df = 17, p-value = 0.06908
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
0.9239676 Inf
sample estimates:
ratio of variances
2.023273
library(webr)
plot(var.test(x,y, alternative = "greater"))
Manual Code:
(F_stat <- x_var/y_var) # the larger sample variance goes in the numerator
[1] 2.023273
DF1 <- nx - 1 # df for larger variance
DF2 <- ny - 1 # df for smaller variance
pf(F_stat, DF1, DF2, lower.tail = F) # p-value
[1] 0.06907757
Since the p-value is greater than 0.05, do not reject the null hypothesis.
# upper critical value since it is a right tailed test
qf(0.05, DF1, DF2, lower.tail = F)
[1] 2.189766
F_stat
[1] 2.023273
Since the F statistic lies in the acceptance region, do not reject the null hypothesis.
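Finally, the manual steps above can be bundled into a small helper so the two-sided and one-sided cases share the same code. This is only a sketch of the logic used in this post; the function f_var_test() below is hypothetical and not part of any package:
f_var_test <- function(x, y, alternative = c("two.sided", "greater"), alpha = 0.05) {
  alternative <- match.arg(alternative)
  f_stat <- var(x)/var(y)                       # ratio of the two sample variances
  df1 <- length(x) - 1                          # numerator df
  df2 <- length(y) - 1                          # denominator df
  p_upper <- pf(f_stat, df1, df2, lower.tail = FALSE)
  p_value <- if (alternative == "two.sided") 2*min(p_upper, 1 - p_upper) else p_upper
  list(statistic = f_stat, df = c(df1, df2), p.value = p_value,
       reject_H0 = p_value < alpha)             # hypothetical helper, sketch only
}
f_var_test(x, y)                                # two-sided: p ~ 0.138, do not reject
f_var_test(x, y, alternative = "greater")       # one-sided: p ~ 0.069, do not reject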
… to be continued