Note that the F-test requires the two samples to be normally distributed. The test statistic is simply the ratio of the two sample variances, F = s1^2 / s2^2, which under the null hypothesis of equal variances follows an F distribution with n1 - 1 and n2 - 1 degrees of freedom.

Compute F Test in R

The R function var.test() can be used to compare two variances as follows:

Method 1
var.test(values ~ groups, data, alternative = "two.sided")
or Method 2
var.test(x, y, alternative = "two.sided")

x, y: numeric vectors
alternative: the alternative hypothesis. Allowed values are "two.sided" (default), "greater" and "less".

Here, we’ll use the built-in R data set named ToothGrowth:

my_data <- ToothGrowth

To get an idea of what the data look like, we start by displaying a random sample of 10 rows using the function sample_n() [in the dplyr package]:

library("dplyr")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
sample_n(my_data, 10)
##     len supp dose
## 1  11.5   VC  0.5
## 2   5.2   VC  0.5
## 3  14.5   OJ  1.0
## 4  26.4   OJ  1.0
## 5  30.9   OJ  2.0
## 6  10.0   VC  0.5
## 7  25.8   OJ  1.0
## 8  11.2   VC  0.5
## 9  19.7   OJ  1.0
## 10 17.6   OJ  0.5

We want to test the equality of variances between the two groups OJ and VC in the column “supp”.

Check F-test Assumptions

The F-test is very sensitive to departures from the normality assumption. You need to check whether the data are normally distributed before using the F-test.

The Shapiro-Wilk test can be used to check whether the normality assumption holds. It's also possible to use a Q-Q plot (quantile-quantile plot) to graphically evaluate the normality of a variable. A Q-Q plot draws the correlation between a given sample and the normal distribution.
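
As a quick sketch, both checks can be run on the ToothGrowth data with the base R functions shapiro.test(), qqnorm() and qqline():

# Shapiro-Wilk normality test for each group separately
with(my_data, shapiro.test(len[supp == "OJ"]))
with(my_data, shapiro.test(len[supp == "VC"]))
# Q-Q plot of the OJ group against the normal distribution
qqnorm(my_data$len[my_data$supp == "OJ"])
qqline(my_data$len[my_data$supp == "OJ"])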

If there is doubt about normality, the better choice is to use Levene's test or the Fligner-Killeen test, which are less sensitive to departures from the normality assumption.
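
For instance (a sketch: fligner.test() ships with base R's stats package, while leveneTest() requires the car package to be installed separately):

# Fligner-Killeen test: robust alternative to the F-test
fligner.test(len ~ supp, data = my_data)
# Levene's test from the car package
library(car)
leveneTest(len ~ supp, data = my_data)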

Compute F test

# F-test
res.ftest <- var.test(len ~ supp, data = my_data)
res.ftest
## 
##  F test to compare two variances
## 
## data:  len by supp
## F = 0.6386, num df = 29, denom df = 29, p-value = 0.2331
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.3039488 1.3416857
## sample estimates:
## ratio of variances 
##          0.6385951

Interpretation of the result

The p-value of the F-test is p = 0.2331433, which is greater than the significance level 0.05. In conclusion, there is no significant difference between the two variances.

Access to the values returned by the var.test() function

The function var.test() returns a list containing the following components:

statistic: the value of the F test statistic.
parameter: the degrees of freedom of the F distribution of the test statistic.
p.value: the p-value of the test.
conf.int: a confidence interval for the ratio of the population variances.
estimate: the ratio of the sample variances.

# ratio of variances
res.ftest$estimate
## ratio of variances 
##          0.6385951
# p-value of the test
res.ftest$p.value
## [1] 0.2331433
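
The other components can be accessed in the same way; for instance, the confidence interval for the ratio of variances (matching the interval printed in the test output above):

# 95 percent confidence interval for the ratio of variances
res.ftest$conf.int
## [1] 0.3039488 1.3416857
## attr(,"conf.level")
## [1] 0.95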

The following example is adapted from page 163 of the Analytics book by Preetha Dallal.

# Generate two samples with the desired sizes, means and variances
# (no seed is set, so the numbers vary between runs)
a <- rnorm(80, mean = 0, sd = sqrt(42))
b <- rnorm(52, mean = 100, sd = sqrt(36))
name <- as.factor(c(rep("Manu", length(a)), rep("Info", length(b))))
cons <- c(a, b)
str(cons)
##  num [1:132] -1.39 -1.14 -1.59 1.24 2.67 ...
str(name)
##  Factor w/ 2 levels "Info","Manu": 2 2 2 2 2 2 2 2 2 2 ...
# combine into a data frame (cbind() would coerce the factor 'name' to
# numeric and return a matrix, which is not what var.test() expects)
both <- data.frame(name, cons)
str(both)
## 'data.frame':    132 obs. of  2 variables:
##  $ name: Factor w/ 2 levels "Info","Manu": 2 2 2 2 2 2 2 2 2 2 ...
##  $ cons: num  -1.39 -1.14 -1.59 1.24 2.67 ...
res.ftest <- var.test(cons ~ name, data = both)
res.ftest
## 
##  F test to compare two variances
## 
## data:  cons by name
## F = 0.952, num df = 51, denom df = 79, p-value = 0.8617
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.5838527 1.5955349
## sample estimates:
## ratio of variances 
##          0.9519954

The p-value (p = 0.8617) is greater than the significance level 0.05, so we retain the null hypothesis: there is no significant difference between the two variances.
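
Equivalently, since the two samples already exist as separate vectors, Method 2 can be applied directly (a sketch; here the reported ratio is var(a)/var(b), the inverse of the ratio above, because the formula interface orders the groups by factor level, Info before Manu):

# Method 2: pass the two numeric vectors directly
var.test(a, b)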