Consider two sample variances calculated from independent random samples drawn from two normal populations.
Suppose we need to perform a significance test to determine whether the underlying population variances are in fact equal. That is, we want to test \(H_0: \sigma_1^2 = \sigma_2^2\) versus \(H_1: \sigma_1^2 \neq \sigma_2^2\).

The test is based on the relative magnitudes of the sample variances \(s_1^2\) and \(s_2^2\). It is preferable to use their ratio \(s_1^2/s_2^2\) rather than their difference \(s_1^2 - s_2^2\). When \(H_0\) holds, the distribution of the ratio does not depend on the unknown common variance, whereas the distribution of the difference scales with it.
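The advantage of the ratio can be seen with a small simulation. This is a sketch under assumed settings: the sample sizes, seed and the helper `stats_for_sigma()` are hypothetical and chosen only for illustration.

The same standard normal draws are rescaled by a common population standard deviation \(\sigma\), so \(H_0\) holds throughout. The ratio of sample variances is unchanged by \(\sigma\), whereas their difference is multiplied by \(\sigma^2\).

```r
# Hypothetical illustration: ratio vs difference of sample variances under H0
set.seed(1)
n_a <- 5; n_b <- 8                              # hypothetical sample sizes
reps <- 1000
z_a <- matrix(rnorm(reps * n_a), ncol = n_a)    # each row is one sample of size n_a
z_b <- matrix(rnorm(reps * n_b), ncol = n_b)    # each row is one sample of size n_b

# Row-wise sample variances when both populations share the same SD, sigma
stats_for_sigma <- function(sigma) {
  v_a <- apply(sigma * z_a, 1, var)
  v_b <- apply(sigma * z_b, 1, var)
  list(ratio = v_a / v_b, difference = v_a - v_b)
}

small <- stats_for_sigma(1)
large <- stats_for_sigma(10)

all.equal(small$ratio, large$ratio)                  # TRUE: ratio is free of sigma
all.equal(large$difference, 100 * small$difference)  # TRUE: difference scales with sigma^2
```

Because the ratio's null distribution involves no unknown parameter, a single reference distribution (the F distribution) can be tabulated for it.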
The ratio of two such variances is called an F ratio. Under the null hypothesis \(\sigma_1^2 = \sigma_2^2\), it follows a standard distribution called the F distribution.

The shape of this distribution depends on the sample sizes of the two groups or, more generally, on the degrees of freedom of the two variance estimates. It is indexed by two parameters, termed the numerator and denominator degrees of freedom, respectively. If the sizes of the first and second samples are \(n_1\) and \(n_2\), the variance ratio follows an F distribution with \(n_1 - 1\) numerator and \(n_2 - 1\) denominator degrees of freedom, written \(F_{n_1-1,\,n_2-1}\).

If the two normal populations have different variances, the variance ratio instead follows an F distribution scaled by the ratio of the population variances, \(\sigma_1^2/\sigma_2^2\). However, if the two groups really have the same population standard deviation, the distribution does not involve any unknown parameters.
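In symbols, these are the standard results for the F test. The notation \(F_{q;\,a,b}\), introduced here for convenience, denotes the \(q\)-quantile of an \(F_{a,b}\) distribution. The pivot, the two-sided p-value and the \(100(1-\alpha)\%\) confidence interval for \(\sigma_1^2/\sigma_2^2\) are the quantities reported by `var.test()` and `var.rat()`:

\[
\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1},
\qquad\text{so under } H_0:\quad f = \frac{s_1^2}{s_2^2} \sim F_{n_1-1,\,n_2-1},
\]
\[
p = 2\min\left\{P\left(F_{n_1-1,\,n_2-1} \le f\right),\; P\left(F_{n_1-1,\,n_2-1} \ge f\right)\right\},
\]
\[
\left[\frac{f}{F_{1-\alpha/2;\,n_1-1,\,n_2-1}},\; \frac{f}{F_{\alpha/2;\,n_1-1,\,n_2-1}}\right].
\]

The interval follows from \(P\left(F_{\alpha/2;\,n_1-1,\,n_2-1} \le \frac{f}{\sigma_1^2/\sigma_2^2} \le F_{1-\alpha/2;\,n_1-1,\,n_2-1}\right) = 1 - \alpha\). For the example data (\(s_1^2 = 1\), \(s_2^2 = 5/3\), \(n_1 = 3\), \(n_2 = 4\)), \(f = 0.6\) on 2 and 3 degrees of freedom.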
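```r
# Enter each sample variance and its degrees of freedom (n - 1).
# F test for the ratio of two variances from summary statistics only;
# reproduces the two-sided test and confidence interval of var.test().
var.rat <- function(v1, df1, v2, df2, ratio = 1, conf.level = 0.95) {
  ESTIMATE  <- v1 / v2                  # observed variance ratio s1^2 / s2^2
  STATISTIC <- ESTIMATE / ratio         # F statistic under H0: sigma1^2 / sigma2^2 = ratio
  PVAL <- pf(STATISTIC, df1, df2)       # lower-tail probability P(F <= STATISTIC)
  PVAL <- 2 * min(PVAL, 1 - PVAL)       # two-sided p-value
  BETA <- (1 - conf.level) / 2
  CINT <- c(ESTIMATE / qf(1 - BETA, df1, df2),  # lower confidence limit
            ESTIMATE / qf(BETA, df1, df2))      # upper confidence limit
  c(ESTIMATE, CINT, PVAL)               # estimate, lower, upper, p-value
}
```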
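`var.rat()` only performs the two-sided test. If a one-sided alternative is wanted, the same summary-statistic approach extends naturally. The sketch below is a hypothetical variant, `var_ratio_test()`, which is not part of the original post or of base R. It adds an `alternative` argument with the same three choices as `var.test()`, where `ratio` is the hypothesised value of \(\sigma_1^2/\sigma_2^2\):

- for "less", the p-value is the lower tail \(P(F \le f)\);
- for "greater", the p-value is the upper tail;
- in both one-sided cases, the confidence interval becomes one-sided accordingly.

```r
# Hypothetical variant of var.rat() with a one- or two-sided alternative.
# Inputs are summary statistics only: sample variances and their df (n - 1).
var_ratio_test <- function(v1, df1, v2, df2, ratio = 1, conf.level = 0.95,
                           alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  estimate  <- v1 / v2                   # observed s1^2 / s2^2
  statistic <- estimate / ratio          # ~ F(df1, df2) under H0
  lower_p   <- pf(statistic, df1, df2)   # P(F <= observed statistic)
  if (alternative == "two.sided") {
    p_value <- 2 * min(lower_p, 1 - lower_p)
    beta    <- (1 - conf.level) / 2
    ci      <- c(estimate / qf(1 - beta, df1, df2), estimate / qf(beta, df1, df2))
  } else if (alternative == "less") {    # H1: sigma1^2 / sigma2^2 < ratio
    p_value <- lower_p
    ci      <- c(0, estimate / qf(1 - conf.level, df1, df2))
  } else {                               # H1: sigma1^2 / sigma2^2 > ratio
    p_value <- 1 - lower_p
    ci      <- c(estimate / qf(conf.level, df1, df2), Inf)
  }
  c(estimate = estimate, lower = ci[1], upper = ci[2], p.value = p_value)
}

# Usage with the example data from this post
s1 <- 10:12; s2 <- 13:16
res_two  <- var_ratio_test(var(s1), length(s1) - 1, var(s2), length(s2) - 1)
res_less <- var_ratio_test(var(s1), length(s1) - 1, var(s2), length(s2) - 1,
                           alternative = "less")
res_less
```

The named output is easier to read than the unnamed vector returned by `var.rat()`. Comparing it against `var.test(s1, s2, alternative = "less")` is a convenient way to check the sketch.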
```r
s1 <- 10:12 ; s2 <- 13:16
n1 <- length(s1) ; n2 <- length(s2)
(vr <- var(s1)/var(s2)) # ratio of variances
## [1] 0.6
vr*qf(0.025, n2-1, n1-1) # lower
## [1] 0.03739691
vr*qf(0.975, n2-1, n1-1) # upper
## [1] 23.4993
vr/qf(0.975, n1-1, n2-1) # lower
## [1] 0.03739691
vr/qf(0.025, n1-1, n2-1) # upper
## [1] 23.4993
```
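The two pairs of limits agree because of the reciprocal property of F quantiles. If \(X \sim F_{a,b}\) then \(1/X \sim F_{b,a}\), so the \(p\)-quantile of \(F_{a,b}\) equals the reciprocal of the \((1-p)\)-quantile of \(F_{b,a}\). Hence `vr*qf(0.025, n2-1, n1-1)` and `vr/qf(0.975, n1-1, n2-1)` are the same number. A short numerical check, with the sample sizes restated so it runs on its own:

```r
# Reciprocal property of F quantiles: qf(p, a, b) equals 1 / qf(1 - p, b, a)
n1 <- 3; n2 <- 4                      # sizes of s1 = 10:12 and s2 = 13:16
p   <- c(0.025, 0.975)
lhs <- qf(p, n2 - 1, n1 - 1)          # quantiles of F(n2 - 1, n1 - 1)
rhs <- 1 / qf(1 - p, n1 - 1, n2 - 1)  # reciprocals of quantiles of F(n1 - 1, n2 - 1)
all.equal(lhs, rhs)                   # TRUE
```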
```r
# base R function; it requires the raw samples (s1 and s2 in our example)
var.test(s1, s2)
## 
##  F test to compare two variances
## 
## data:  s1 and s2
## F = 0.6, num df = 2, denom df = 3, p-value = 0.7926
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##   0.03739691 23.49929674
## sample estimates:
## ratio of variances 
##                0.6
```
```r
# function defined earlier; needs only the variances and degrees of freedom
var.rat(var(s1), n1-1, var(s2), n2-1)
## [1]  0.60000000  0.03739691 23.49929674  0.79263678
```
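When the data are normal and the variances are equal, the F ratio follows an exact \(F_{n_1-1,\,n_2-1}\) distribution. The two-sided test at the 5% level should therefore reject \(H_0\) in about 5% of samples, even with sample sizes as small as 3 and 4. The following Monte Carlo sketch checks that claim; the seed, number of replicates, means and standard deviation are arbitrary choices for illustration.

```r
# Monte Carlo check of the test's size under H0 (normal data, equal variances)
set.seed(2015)
n1 <- 3; n2 <- 4                       # same sample sizes as s1 and s2
reps <- 4000
p_values <- replicate(reps, {
  x <- rnorm(n1, mean = 0, sd = 2)     # both groups share sd = 2 ...
  y <- rnorm(n2, mean = 5, sd = 2)     # ... the different means do not matter
  var.test(x, y)$p.value
})
mean(p_values < 0.05)                  # empirical rejection rate, close to 0.05
```

With 4000 replicates, the simulation standard error of the rejection rate is roughly 0.0035. The estimate should typically land within about 0.01 of 0.05.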
```
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] bootstrap_2015.2 boot_1.3-17 knitr_1.12.3

loaded via a namespace (and not attached):
[1] magrittr_1.5 formatR_1.3 tools_3.2.2 htmltools_0.3.5 yaml_2.1.13 Rcpp_0.12.4 stringi_1.0-1
[8] rmarkdown_0.9.6 stringr_1.0.0 digest_0.6.9 evaluate_0.9
```
This took 0.89 seconds to execute.