Start Using the neg.indvars() Function

From the negligible R package


Introduction

What is the purpose/goal of neg.indvars()?

The purpose of the neg.indvars() function is to evaluate whether multiple population variances can be considered negligibly different (i.e., practically equivalent). For example, imagine that you are running a two independent samples t test and you want to test the assumption that the population variances are equal. You could use the neg.indvars() function to evaluate that assumption.

library(negligible)
library(psych)

What is the theory behind neg.indvars()?

Mara and Cribbie (2017) proposed a homogeneity of variance (HOV) test based on negligible effect testing, derived from Wellek’s (2010) one-way test of population mean equivalence and Levene’s HOV test. With this test, the research hypothesis (a negligible difference in population variances) is aligned with the alternative hypothesis, not the null hypothesis. More specifically, the null hypothesis specifies that the difference in the variances falls at or outside the bounds of an a priori determined interval (based on the smallest practically significant difference in population variances), whereas the alternative hypothesis declares that the difference among the variances of the groups falls within this interval (i.e., a negligible difference in population variances). The test statistic quantifies the standardized squared Euclidean distance, and thus the interval is one-sided.

Wellek (2010) suggests liberal and conservative interval bound values of epsilon (eps) = .50 and eps = .25, respectively. See Wellek, 2010, pp. 16, 17, 22, for details.

Kim, Y. J. & Cribbie, R. A. (2018). The variance homogeneity assumption and the traditional ANOVA: Exploring a better gatekeeper. British Journal of Mathematical and Statistical Psychology, 71, 1-12. DOI: 10.1111/bmsp.12103

Mara, C. & Cribbie, R. A. (2017). Equivalence of population variances: Synchronizing the objective and analysis. Journal of Experimental Education, 86, 442-457. DOI: 10.1080/00220973.2017.1301356

Wellek, S. (2010). Testing statistical hypotheses of equivalence and noninferiority (2nd Ed.). Boca Raton, FL: CRC Press.

Null and Alternate Hypotheses of the Procedure

The null hypothesis specifies that the difference in the variances is non-negligible, while the alternate hypothesis states that the difference in the variances is negligible.

\(H_{0}: \psi^{*2} \ge \epsilon^2\)

\(H_{1}: \psi^{*2} < \epsilon^2\)

Using neg.indvars()

Now let’s use the function. To do this, we need to set a negligible effect interval through the eps argument. This is the smallest Euclidean distance that we would consider important. As discussed above, Wellek (2010) suggests liberal and conservative interval bound values of eps = .50 and eps = .25, respectively. Note that the default eps is .50.

The basic set-up of the function looks like this:

neg.indvars(dv, iv, eps = 0.5, alpha = 0.05, na.rm = TRUE, data = NULL)

Required arguments (no default)

dv - dependent/outcome variable (numeric)

iv - independent/predictor variable (factor)

Optional arguments (has a default)

eps - Wellek (2010) suggests conservative (eps = .25) and liberal (eps = .50) bounds for the test of negligible difference in independent population variances. The default is eps = .50, but any value could be used. See Mara & Cribbie (2017) or Wellek (2010).

alpha - nominal Type I error rate (\(\alpha\)). The default is .05, but any value can be used (e.g., .01, .10, .06)

na.rm - should cases with missing values be deleted? The default is TRUE. Currently, if the data contain missing values, the function only works with na.rm = TRUE.

data - name of the dataset where the dv and iv reside

Examples

Example 1

Let’s look at an example using the mtcars dataset from R. Our outcome variable is miles/gallon (mpg) and our predictor is number of cylinders in the vehicle (cyl). In this example, we want to know if the difference in the population variances of mpg across cyl (4, 6, 8) is negligible (i.e., are the population variances of mpg across cyl equivalent). We will use the default \(\alpha\) = .05.

d <- mtcars # open the dataset and store it in the object d
names(d) # look at the variable names
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
[11] "carb"
d$cyl <- factor(d$cyl) #make the variable cyl a factor
tapply(d$mpg, d$cyl, var) # explore the variances of mpg at each level of cyl
        4         6         8 
20.338545  2.112857  6.553846 

Now, let’s apply the neg.indvars() function to see if we can reject the null hypothesis that the population variances are non-negligibly different. We will use the liberal cut-off for eps (.5).

library(negligible)
neg.indvars(dv = mpg, iv = cyl, eps = .5, alpha = .05,
            na.rm = TRUE, data = d)
-- Equivalence of Population Variances --
-- Independent Groups --

Group Variances:  
 4 6 8 
 20.33855 2.112857 6.553846 

Group Standard Deviations:  
 4.509828 1.453567 2.560048 

Group Median Absolute Deviations:  
 6.52344 1.92738 1.55673 

**********************

Ratio of Largest to Smallest Variances:  
 9.626086 

**********************

Epsilon Value (establish the Equivalence Interval):  
 0.5 

Levene-Wellek-Welch (LWW) Statistic:  
 1.147718 

Critical Value for LWW:  
 0.03458143 

NHST Decision:  
 The null hypothesis that the differences between the population variances falls outside the equivalence interval cannot be rejected. A negligible difference among the population variances cannot be concluded. Be sure to interpret the magnitude (and precision) of the effect size. 

**********************

The LWW (Levene-Wellek-Welch) statistic must be less than the critical value to reject the null hypothesis. In this case, LWW = 1.148 is greater than the critical value of .035, so we cannot reject H0. There is a lack of support for the contention that the population variances are negligibly different.

Example 2

In this example, we will enter the data directly instead of calling the data from a dataset (i.e., no data argument). We will also use the conservative, instead of liberal, bound for eps (i.e., .25). Here we are checking whether the difference between the variances of the incomes (in thousands of dollars) of males and females can be considered negligible. We will use \(\alpha\) = .10.

group <- rep(c("male", "female"),each=10)
group <- factor(group)
income <- c(52,43,75,63,45,102,142,65,79,44,54,121,102,54,62,44,29,55,74,81)
library(negligible)
neg.indvars(dv = income, iv = group, eps = .25, alpha = .10)
-- Equivalence of Population Variances --
-- Independent Groups --

Group Variances:  
 female male 
 762.4889 970.2222 

Group Standard Deviations:  
 27.6132 31.14839 

Group Median Absolute Deviations:  
 22.239 25.2042 

**********************

Ratio of Largest to Smallest Variances:  
 1.272441 

**********************

Epsilon Value (establish the Equivalence Interval):  
 0.25 

Levene-Wellek-Welch (LWW) Statistic:  
 0.001602909 

Critical Value for LWW:  
 0.003030741 

NHST Decision:  
 The null hypothesis that the differences between the population variances falls outside the equivalence interval can be rejected. A negligible difference among the population variances can be concluded. Be sure to interpret the magnitude (and precision) of the effect size. 

**********************

In this example, the LWW statistic (0.0016) is less than the critical LWW value (0.0030), and thus we can reject the null hypothesis that the difference in the variances is non-negligible (and conclude that the difference in the population variances is negligible).

Extractable Elements

A number of elements of the output can be extracted, including:

vars: Sample variances

sds: Sample standard deviations

mads: Sample median absolute deviations

ratio: Ratio of the largest to smallest variance

eps: Epsilon (e) can be described as the minimum difference in the variances that one would consider non-negligible.

LWW_md: Levene-Wellek-Welch statistic based on the median.

crit_LWW_md: Critical value for the Levene-Wellek-Welch statistic based on the median.

alpha: Nominal Type I error rate

---
title: "Start Using the `neg.indvars()` Function"
subtitle: | 
    From the [`negligible`](https://cran.r-project.org/web/packages/negligible/index.html) R package![](G:/My Drive/Research/Cribbie Lab/negligible Vignettes/Template/neg.logo.png){width=10%}  
author: "[Rob Cribbie](https://cribbie.info.yorku.ca/)"
date: "`r format(Sys.time())`"
output:
  rmdformats::robobook:
    code_download: yes
    highlight: tango
---

```{r setup, echo=FALSE, cache=FALSE, message=FALSE, warning=FALSE}
#install.packages("rmdformats")
suppressPackageStartupMessages(library(rmdformats, warn.conflicts=FALSE))
suppressPackageStartupMessages(library(knitr, warn.conflicts=FALSE))
suppressPackageStartupMessages(library(tidyverse, warn.conflicts=FALSE))
suppressPackageStartupMessages(library(plotly, warn.conflicts=FALSE))
suppressPackageStartupMessages(library(readxl, warn.conflicts=FALSE))
suppressPackageStartupMessages(library(MetBrewer, warn.conflicts=FALSE))
suppressPackageStartupMessages(library(gganimate, warn.conflicts=FALSE))
suppressPackageStartupMessages(library(dplyr, warn.conflicts=FALSE))

## Global options
options(max.print="75")
opts_chunk$set(echo=TRUE,
	             cache=TRUE,
               prompt=FALSE,
               comment=NA,
               message=FALSE,
               warning=FALSE)
opts_knit$set(width=75)
```

<br/>

## **Introduction**



### **What is the purpose/goal of `neg.indvars()`?**


The purpose of the `neg.indvars()` function is to evaluate whether multiple population variances can be considered *negligibly* different (i.e., practically equivalent). For example, imagine that you are running a two independent samples *t* test and you want to test the assumption that the population variances are equal. You could use the `neg.indvars()` function to evaluate that assumption.

```{r}

library(negligible)
library(psych)

```


### **What is the theory behind `neg.indvars()`?**

Mara and Cribbie (2017) proposed a homogeneity of variance (HOV) test based on negligible effect testing, derived from Wellek’s (2010) one-way test of population mean equivalence and Levene’s HOV test. With this test, the research hypothesis (a negligible difference in population variances) is aligned with the alternative hypothesis, not the null hypothesis. More specifically, the null hypothesis specifies that the difference in the variances falls at or outside the bounds of an a priori determined interval (based on the smallest practically significant difference in population variances), whereas the alternative hypothesis declares that the difference among the variances of the groups falls within this interval (i.e., a negligible difference in population variances). The test statistic quantifies the standardized squared Euclidean distance, and thus the interval is one-sided.

Wellek (2010) suggests liberal and conservative interval bound values of epsilon (eps) = .50 and eps = .25, respectively. See Wellek, 2010, pp. 16, 17, 22, for details.
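
The Levene-type ingredient of the test can be illustrated with base R. The sketch below is only an illustration of that ingredient, under the assumption that a median-based Levene transformation is used (as the name of the *LWW_md* element suggests): each score is replaced by its absolute deviation from its group median, and a Welch-type one-way test is then run on the transformed scores. It does not reproduce how the package rescales this into the LWW statistic or how the critical value is obtained, and the object names (`grp_median`, `abs_dev`, `welch_on_abs_dev`) are made up for this example.

```r
# Levene-type transformation on the mtcars data (illustration only)
d <- mtcars
d$cyl <- factor(d$cyl)

# ave() returns, for every row, the median of mpg within that row's cyl group
grp_median <- ave(d$mpg, d$cyl, FUN = median)

# Absolute deviation of each score from its own group median
d$abs_dev <- abs(d$mpg - grp_median)

# Mean absolute deviation per group: larger values indicate more spread
mean_abs_dev <- tapply(d$abs_dev, d$cyl, mean)
mean_abs_dev

# Welch-type one-way test on the transformed scores. This is a traditional,
# difference-based HOV test, NOT the negligible-effect LWW test itself.
welch_on_abs_dev <- oneway.test(abs_dev ~ cyl, data = d, var.equal = FALSE)
welch_on_abs_dev$statistic
```

The traditional test asks whether the spreads differ; the negligible effect version described above instead asks whether any difference is small enough to fall inside the interval defined by *eps*.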


Kim, Y. J. & Cribbie, R. A. (2018). The variance homogeneity assumption and the traditional ANOVA: Exploring a better gatekeeper. British Journal of Mathematical and Statistical Psychology, 71, 1-12. DOI: 10.1111/bmsp.12103

Mara, C. & Cribbie, R. A. (2017). Equivalence of population variances: Synchronizing the objective and analysis. Journal of Experimental Education, 86, 442-457. DOI:
10.1080/00220973.2017.1301356  

Wellek, S. (2010). Testing statistical hypotheses of equivalence and noninferiority (2nd Ed.). Boca Raton, FL: CRC Press. 


#### *Null and Alternate Hypotheses of the Procedure*

The null hypothesis specifies that the difference in the variances is non-negligible, while the alternate hypothesis states that the difference in the variances is negligible. 

$H_{0}: \psi^{*2} \ge \epsilon^2$ 

$H_{1}: \psi^{*2} < \epsilon^2$ 


### **Using `neg.indvars()`**

Now let's use the function. To do this, we need to set a negligible effect interval through the *eps* argument. This is the smallest Euclidean distance that we would consider important. As discussed above, Wellek (2010) suggests liberal and conservative interval bound values of *eps* = .50 and *eps* = .25, respectively. Note that the default *eps* is .50.


#### The basic set-up of the function looks like this:

`neg.indvars(dv, iv, eps = 0.5, alpha = 0.05, na.rm = TRUE, data = NULL)`

#### *Required arguments (no default)*

*dv* - dependent/outcome variable (numeric)

*iv* - independent/predictor variable (factor)


#### *Optional arguments (has a default)*

*eps* - Wellek (2010) suggests conservative (*eps* = .25) and liberal (*eps* = .50) bounds for the test of negligible difference in independent population variances. The default is *eps* = .50, but any value could be used. See Mara & Cribbie (2017) or Wellek (2010).

*alpha* - nominal Type I error rate ($\alpha$). The default is .05, but any value can be used (e.g., .01, .10, .06)

*na.rm* - should cases with missing values be deleted? The default is TRUE. Currently, if the data contain missing values, the function only works with `na.rm = TRUE`.

*data* - name of the dataset where the *dv* and *iv* reside
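
Before calling the function, it can help to check that the inputs match the argument descriptions above. The following is a minimal sketch using base R only; the data frame `raw` and its columns `score` and `condition` are hypothetical and exist only for this illustration.

```r
# Hypothetical raw data: a numeric outcome, a character grouping variable,
# and one missing outcome value
raw <- data.frame(
  score = c(10.2, 11.5, NA, 9.8, 12.1, 10.9),
  condition = c("a", "a", "a", "b", "b", "b"),
  stringsAsFactors = FALSE
)

# Keep only complete cases so that no missing values reach the function
clean <- raw[complete.cases(raw[, c("score", "condition")]), ]

# iv is documented as a factor and dv as numeric
clean$condition <- factor(clean$condition)
stopifnot(is.numeric(clean$score), is.factor(clean$condition))

# Group sizes after removing incomplete cases
table(clean$condition)

# The call would then look like this (not run here):
# neg.indvars(dv = score, iv = condition, eps = 0.5, alpha = 0.05, data = clean)
```

Removing incomplete cases by hand is not required when `na.rm = TRUE`, but it makes the group sizes that enter the test explicit.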




## **Examples**

### **Example 1**

Let's look at an example using the **mtcars** dataset from R. Our outcome variable is miles/gallon (mpg) and our predictor is number of cylinders in the vehicle (cyl). In this example, we want to know if the difference in the population variances of mpg across cyl (4, 6, 8) is negligible (i.e., are the population variances of *mpg* across *cyl* equivalent). We will use the default $\alpha$ = .05. 

```{r}
d <- mtcars # open the dataset and store it in the object d
names(d) # look at the variable names
d$cyl <- factor(d$cyl) #make the variable cyl a factor
tapply(d$mpg, d$cyl, var) # explore the variances of mpg at each level of cyl
```
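
As a further descriptive check, the group sizes and the ratio of the largest to the smallest sample variance can be computed with base R. This sketch only restates quantities that also appear in the function output; the object names (`group_n`, `group_var`, `var_ratio`) are chosen for this example.

```r
d <- mtcars
d$cyl <- factor(d$cyl)

group_n <- table(d$cyl)                  # number of cars per cylinder group
group_var <- tapply(d$mpg, d$cyl, var)   # sample variance of mpg per group

# Ratio of the largest to the smallest sample variance
var_ratio <- max(group_var) / min(group_var)

group_n
round(var_ratio, 3)
```

A ratio this far from 1 already hints that a negligible difference among the variances will be hard to support.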

Now, let's apply the `neg.indvars()` function to see if we can reject the null hypothesis that the population variances are non-negligibly different. We will use the liberal cut-off for *eps* (.5).

```{r}
library(negligible)
neg.indvars(dv = mpg, iv = cyl, eps = .5, alpha = .05,
            na.rm = TRUE, data = d)
```

The LWW (Levene-Wellek-Welch) statistic must be less than the critical value to reject the null hypothesis. In this case, LWW = 1.148 is greater than the critical value of .035, so we cannot reject $H_{0}$. There is a lack of support for the contention that the population variances are negligibly different.


### **Example 2**

In this example, we will enter the data directly instead of calling the data from a dataset (i.e., no *data* argument). We will also use the conservative, instead of liberal, bound for *eps* (i.e., .25). Here we are checking whether the difference between the variances of the incomes (in thousands of dollars) of males and females can be considered negligible. We will use $\alpha$ = .10. 

```{r}
group <- rep(c("male", "female"),each=10)
group <- factor(group)
income <- c(52,43,75,63,45,102,142,65,79,44,54,121,102,54,62,44,29,55,74,81)
library(negligible)
neg.indvars(dv = income, iv = group, eps = .25, alpha = .10)
```

In this example, the LWW statistic (0.0016) is less than the critical LWW value (0.0030), and thus we can reject the null hypothesis that the difference in the variances is non-negligible (and conclude that the difference in the population variances is negligible).
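
The descriptive part of this result can be checked by hand with base R. The sketch below recomputes the sample variances, standard deviations, and the variance ratio for the two groups; the object names are chosen for this example.

```r
group <- factor(rep(c("male", "female"), each = 10))
income <- c(52, 43, 75, 63, 45, 102, 142, 65, 79, 44,
            54, 121, 102, 54, 62, 44, 29, 55, 74, 81)

group_var <- tapply(income, group, var)   # sample variance per group
group_sd <- sqrt(group_var)               # sample standard deviation per group

# Ratio of the largest to the smallest sample variance
var_ratio <- max(group_var) / min(group_var)

round(group_var, 2)
round(group_sd, 2)
round(var_ratio, 3)
```

The two sample variances are close relative to their size, which is consistent with the decision to conclude a negligible difference.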


## **Extractable Elements**

A number of elements of the output can be extracted, including:


*vars*: Sample variances

*sds*: Sample standard deviations

*mads*: Sample median absolute deviations

*ratio*: Ratio of the largest to smallest variance

*eps*: Epsilon (e) can be described as the minimum difference in the variances that one would consider non-negligible.

*LWW_md*: Levene-Wellek-Welch statistic based on the median.

*crit_LWW_md*: Critical value for the Levene-Wellek-Welch statistic based on the median.

*alpha*: Nominal Type I error rate
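
The sketch below shows how these elements might be used once extracted. It assumes that the object returned by `neg.indvars()` behaves like a named list, so that elements can be reached with `$`; that is an assumption, not something stated above. To keep the example runnable without the package, `res` is a stand-in list filled with the values printed in Example 2, and the real call is left as a comment.

```r
# With the package loaded, the result would be stored like this (not run here):
# res <- neg.indvars(dv = income, iv = group, eps = .25, alpha = .10)

# Stand-in object holding the values printed in Example 2 (assumption: the
# real object exposes the same element names through `$`)
res <- list(
  LWW_md = 0.001602909,
  crit_LWW_md = 0.003030741,
  eps = 0.25,
  alpha = 0.10
)

# The null hypothesis is rejected when the statistic is below the critical value
reject_h0 <- res$LWW_md < res$crit_LWW_md
reject_h0
```

Storing the decision as a logical value makes it easy to collect results when the test is repeated over several outcome variables.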