The neg.cat() function tests for a negligible relationship between two
categorical variables.
What is the theory behind neg.cat()?
The test is based upon the popular Cramer’s V statistic, which was
found by Shiskina, Farmus, & Cribbie (http://dx.doi.org/10.20982/tqmp.14.3.p167) to perform
well in this situation. A negligible association can be concluded if the
upper bound of the 100(1-2alpha)% CI for Cramer’s V falls below eiU, the
upper bound of the negligible effect (equivalence) interval. For example,
with alpha = .05 a 90% CI is used. In that case we can reject H0: the
relationship is non-negligible (V >= eiU). eiU is set to .2 by default,
but it should be chosen based on the context of the research. Because
Cramer’s V is in a correlation metric, setting eiU is a matter of
deciding what correlation represents the minimally meaningful effect
size (MMES) in that context.
Shiskina, T., Farmus, L., & Cribbie, R. A. (2018). Testing for a
lack of relationship among categorical variables. The Quantitative
Methods for Psychology, 14, 167-179. http://dx.doi.org/10.20982/tqmp.14.3.p167
Null and Alternate Hypotheses of the Procedure
The null hypothesis specifies that the association between the two
categorical variables is non-negligible. \[H_{0}: V_{pop} \ge eiU\]
The alternate hypothesis specifies that the association between the two
categorical variables is negligible. \[H_{1}: V_{pop} < eiU\]
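The hypotheses above, together with the CI rule described earlier, can be sketched in a few lines of base R. This is only an illustration of the decision logic, not the package's own code; the helper names ci_level() and is_negligible() are hypothetical, and the CI bounds passed in are made-up values.

```r
# Hypothetical helpers (not part of the negligible package).

# Confidence level, in percent, of the interval used for the test:
# 100 * (1 - 2 * alpha).
ci_level <- function(alpha = 0.05) {
  100 * (1 - 2 * alpha)
}

# Reject H0: V_pop >= eiU when the upper CI bound falls below eiU.
is_negligible <- function(ciu, eiU = 0.2) {
  ciu < eiU
}

ci_level(0.05)              # 90
ci_level(0.10)              # 80
is_negligible(ciu = 0.15)   # TRUE: upper bound lies below eiU
is_negligible(ciu = 0.25)   # FALSE: H0 cannot be rejected
```

Because the test is one-sided, only the upper bound of the interval enters the decision.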
Using neg.cat()
Now let’s use the function. Users must specify either v1/v2 (the two
categorical variables) or tab (a table of frequencies across the two
categorical variables).
Required arguments (no default)
v1: first categorical variable
v2: second categorical variable
OR
tab: contingency table of frequencies for the two categorical variables
Optional arguments (have a default)
eiU: the upper bound of the negligible effect (equivalence)
interval. Since Cramer’s V has a lower bound of 0, only the upper bound
needs to be specified and tested. The default is .2, but an appropriate
value depends on the context of the research.
data: an optional data frame containing the two categorical
variables; the default is NULL
plot: should a plot showing the effect of interest and the
proportional distance be produced; default is TRUE
save: should the plot be saved to ‘jpg’ or ‘png’; default is
FALSE
nbootpd: number of bootstrap samples for calculating the CI
for the proportional distance; default is 1000 bootstrap samples
alpha: nominal Type I error rate. The default is .05, but
any value can be used (e.g., .01, .10, .06)
Examples
Example 1
Let’s look at whether the class of the passengers on the Titanic was
negligibly related to their sex.
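The code that creates the tab object is not shown in this example. One way to build it, sketched here under the assumption that a plain two-way table of counts is acceptable input for the tab argument, is to enter the class-by-sex frequencies from this example by hand:

```r
# Class-by-sex counts for the three passenger classes,
# typed in from the frequencies reported in this example.
tab <- as.table(matrix(
  c(180, 145,
    179, 106,
    510, 196),
  nrow = 3, byrow = TRUE,
  dimnames = list(Class = c("1st", "2nd", "3rd"),
                  Sex = c("Male", "Female"))
))
tab
```

Printing tab should show the contingency table used in the rest of the example.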
       Sex
Class   Male Female
  1st    180    145
  2nd    179    106
  3rd    510    196
neg.cat(tab = tab)
********************
** Negligible Effect Test of the Relationship **
** Between Two Categorical Variables **
********************
Nominal Type I error rate (alpha): 0.05
********************
Cramer's V: 0.151
90% CI for Cramer's V: (0.102, 0.194)
*******************
Proportion of Shared Variability: 0.023
*******************
Upper Bound of the Equivalence Interval (Correlation Metric): 0.2
Upper Bound of the 90% CI for Cramer's V: 0.194
NHST Decision:
The null hypothesis that the relationship between the categorical variables is substantial can be rejected. A negligible relationship among the variables is concluded. Be sure to interpret the magnitude (and precision) of the effect size.
*******************
Proportional Distance
Proportional Distance: 0.755
Confidence Interval for the Proportional Distance: (0.502,1.034)
Note: Confidence Interval for the Proportional Distance may not be precise with small N
*******************
Since the upper bound of the 100(1-2alpha)% CI for Cramer’s V (here the
90% CI, .194) falls below the upper bound of the negligible effect
(equivalence) interval (eiU = .2), we can reject \(H_{0}: V_{pop} \ge eiU\)
and conclude that class and sex are negligibly related. Based on the
proportional distance, Cramer’s V is about 75% of the distance from 0 to
eiU, although the CI on the proportional distance is quite wide.
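As a cross-check on the point estimates in Example 1, Cramer’s V can be computed by hand from the chi-square statistic using base R. This is a sketch of the standard formula, not the code neg.cat() runs internally; the confidence interval and the bootstrap CI for the proportional distance are not reproduced here, and treating the proportional distance as V divided by eiU is an assumption based on the interpretation given above.

```r
# Counts from Example 1 (class by sex).
tab <- matrix(
  c(180, 145,
    179, 106,
    510, 196),
  nrow = 3, byrow = TRUE,
  dimnames = list(Class = c("1st", "2nd", "3rd"),
                  Sex = c("Male", "Female"))
)

# Pearson chi-square statistic (no continuity correction).
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)

n <- sum(tab)        # total sample size
k <- min(dim(tab))   # smaller of the number of rows and columns

# Cramer's V = sqrt(chi2 / (n * (k - 1)))
cramv <- sqrt(chi2 / (n * (k - 1)))
propvar <- cramv^2   # proportion of shared variability

# Assumed definition: distance of V from 0 relative to eiU.
eiU <- 0.2
pd <- cramv / eiU

round(cramv, 3)    # 0.151
round(propvar, 3)  # 0.023
pd                 # roughly 0.75
```

The first two values agree with the Cramer’s V and Proportion of Shared Variability lines in the printed output.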
Example 2
Let’s explore whether there is a negligible association between
education level (1 = high school to 5 = graduate work) and sex (1 =
male; 2 = female) using the sat.act dataset available in the
psych package. We will also change the alpha level (nominal
Type I error rate) to .10.
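The call that produced the output for this example is not shown. A plausible version, offered as an assumption rather than the author's exact code, passes the gender and education columns of sat.act as v1 and v2 and sets alpha to .10. The sketch runs only if both the negligible and psych packages are installed.

```r
# Assumed call for Example 2; requires the negligible and psych packages.
have_pkgs <- requireNamespace("negligible", quietly = TRUE) &&
  requireNamespace("psych", quietly = TRUE)

if (have_pkgs) {
  # Load sat.act, whose columns include gender and education.
  data("sat.act", package = "psych")
  negligible::neg.cat(v1 = sat.act$gender,
                      v2 = sat.act$education,
                      alpha = 0.10)
}
```

Passing the columns directly avoids relying on how the data argument resolves variable names.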
********************
** Negligible Effect Test of the Relationship **
** Between Two Categorical Variables **
********************
Nominal Type I error rate (alpha): 0.1
********************
Cramer's V: 0.152
80% CI for Cramer's V: (0.077, 0.183)
*******************
Proportion of Shared Variability: 0.023
*******************
Upper Bound of the Equivalence Interval (Correlation Metric): 0.2
Upper Bound of the 80% CI for Cramer's V: 0.183
NHST Decision:
The null hypothesis that the relationship between the categorical variables is substantial can be rejected. A negligible relationship among the variables is concluded. Be sure to interpret the magnitude (and precision) of the effect size.
*******************
Proportional Distance
Proportional Distance: 0.76
Confidence Interval for the Proportional Distance: (0.494,1.199)
Note: Confidence Interval for the Proportional Distance may not be precise with small N
*******************
Since the upper bound of the 100(1-2alpha)% CI for Cramer’s V (here the
80% CI, .183) falls below the upper bound of the negligible effect
(equivalence) interval (eiU = .2), we can reject \(H_{0}: V_{pop} \ge eiU\)
and conclude that sex and education level are negligibly related. Based
on the proportional distance, Cramer’s V is 76% of the distance from 0 to
eiU, although the CI on the proportional distance is quite wide.
Extractable Elements
A number of elements of the output can be extracted, including:
cramv: Cramer’s V statistic
propvar: Proportion of variance explained (V^2)
cil: Lower bound of the confidence interval for Cramer’s V
ciu: Upper bound of the confidence interval for Cramer’s V
eiU: Upper bound of the negligible effect (equivalence) interval
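To illustrate how these elements might be used, the sketch below works with a stand-in list filled with the values printed in Example 1. It assumes the object returned by neg.cat() supports `$` extraction in the same way as a named list, which is not stated explicitly here; the object name res is hypothetical.

```r
# Stand-in for a neg.cat() result, using the values printed in Example 1.
res <- list(cramv = 0.151, propvar = 0.023,
            cil = 0.102, ciu = 0.194, eiU = 0.2)

res$cramv                            # point estimate of Cramer's V
c(lower = res$cil, upper = res$ciu)  # CI bounds as a named vector
res$ciu < res$eiU                    # TRUE: upper bound is below eiU
```

With a real result, replacing the stand-in by the saved output of neg.cat() should allow the same comparisons.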