Start Using the neg.normal() Function From the negligible R Package
Introduction
What is the purpose/goal of neg.normal()?
The purpose of the neg.normal() function is to test whether a distribution of scores is negligibly different from (i.e., practically equivalent to) a theoretical normal distribution.
What is the theory behind neg.normal()?
neg.normal() tests whether a distribution has a Shapiro-Wilk W statistic that is negligibly different from 1. That is, it tests the null hypothesis that W is less than or equal to a prespecified lower bound, where that lower bound is the least extreme value of W (the value closest to 1) that is still considered non-negligibly different from 1. We recommend .95 and .975 as liberal and conservative bounds for the W statistic, respectively.
Null and Alternate Hypotheses of the Procedure
\(H_{0}: \rho_{W} \leqslant \zeta\) (the population W is less than or equal to the equivalence bound; i.e., the distribution is non-negligibly different from a normal distribution)
\(H_{1}: \rho_{W} > \zeta\) (the population W is greater than the equivalence bound; i.e., the difference between the distribution and a normal distribution is negligible)
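The logic of these hypotheses can be sketched in base R. The code below is a minimal illustration, not the package's source: the helper name boot_w_lower() is hypothetical, and it assumes a simple percentile bootstrap in which W is recomputed on resamples drawn with replacement from the observed data.

```r
# Sketch only: percentile-bootstrap lower bound for the Shapiro-Wilk W.
# boot_w_lower() is a hypothetical helper, not a function from negligible.
boot_w_lower <- function(x, nboot = 1000, alpha = .05) {
  x <- x[!is.na(x)]
  # Recompute W on each bootstrap resample of the observed data
  w_boot <- replicate(nboot, {
    unname(shapiro.test(sample(x, replace = TRUE))$statistic)
  })
  # The alpha quantile is the lower limit of the 100(1 - 2*alpha)% interval
  unname(quantile(w_boot, probs = alpha))
}

set.seed(1234)
x   <- rnorm(200)   # illustrative data
eiL <- .95          # liberal equivalence bound
lb  <- boot_w_lower(x, nboot = 500)

# H0 (W <= eiL) is rejected only when the lower bound exceeds eiL
reject_h0 <- lb > eiL
```

Because only the lower limit matters (W cannot exceed 1), the test is one-sided, which is why a 100(1-2*alpha)% interval corresponds to a nominal Type I error rate of alpha.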
Using neg.normal()
Function Set-up:
neg.normal <- function(x, eiL = .95, nboot = 1000, plot = TRUE, alpha = .05, data = NULL)
Required arguments (no default)
There is one argument that is required (i.e., does not have a default):
x: empirical data for univariate distribution, either in vector format or a column from a dataframe
Optional arguments (have a default)
eiL: the lower bound of the equivalence interval (since the SW statistic has an upper bound of 1, only the lower bound needs to be specified and tested). The default is a liberal lower bound of .95, but any value can be used. The suggested conservative bound is .975.
nboot: the number of bootstrap samples used to compute the confidence interval for W. The default is 1000.
plot: whether a plot is produced along with the printed results. The default is TRUE.
alpha: nominal Type I error rate. The default is .05, but any value can be used (e.g., .01, .10, .06)
data: an optional data frame containing the variable supplied as x. The default is NULL.
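As an illustration of overriding the defaults, the following sketch passes a data frame column as x and changes eiL, nboot, plot, and alpha. The data frame dat and its column score are made up for this example, and the call is guarded so that it only runs if the negligible package is installed.

```r
# Hypothetical data; the column name "score" is invented for illustration
set.seed(1234)
dat <- data.frame(score = rnorm(n = 300))

if (requireNamespace("negligible", quietly = TRUE)) {
  # Conservative bound, fewer bootstrap samples, no plot, alpha = .01
  negligible::neg.normal(x = dat$score, eiL = .975, nboot = 500,
                         plot = FALSE, alpha = .01)
}
```

Lowering nboot shortens the run time at the cost of a noisier bootstrap lower bound.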
Examples
Example 1
Using simulated normal data, we test whether the difference between an empirical distribution (object xx) and a theoretical normal distribution is negligible.
library(negligible)
set.seed(1234) # ensures values generated using rnorm() will be the same each time
xx <- rnorm(n=1000) # store a random normal distribution with N = 1000 in an object xx
neg.normal(x=xx)
*******************************
Descriptive Measures Regarding the Distribution
Sample Skewness (Type 2): -0.005209845
Sample Kurtosis (Type 2): 0.2492001
Sample Measures of Central Tendency:
Sample Mean: -0.0265972
Sample Median: -0.03979419
Standardized Difference between the Sample Mean and Median ((Median - Mean)/SD): -0.01323221
Sample Trimmed Mean: -0.03943285
Standardized Difference between the Sample Mean and Trimmed Mean ((Trimmed Mean - Mean)/SD): -0.01286991
Sample Measures of Variability:
Sample Standard Deviation: 0.9973377
Sample Min: -3.396064
Sample Max: 3.195901
*******************************
Traditional Shapiro Wilk Test Statistic:
0.9973713
Traditional Shapiro Wilk Test p-Value:
0.1053028
The (traditional NHST) null hypothesis that the distribution is normal in form cannot be rejected at alpha = 0.05
*******************************
Statistical Test of a Negligible Difference Between the Target/Population Distribution and a Theoretical Normal Distribution
Shapiro-Wilk Statistic: 0.9973713
Lower Bound of the 100(1-2*0.05)% CI: 0.9929602
Lower Bound of the Negligible Effect (Equivalence) Interval: 0.95
Negligible Effect (Equivalence) Testing Decision: The null hypothesis that the degree of nonnormality is extreme (i.e., W <= eiL) can be rejected
*******************************
Example 2
Here, the eiL is modified from the default of .95 to .975 (conservative bound).
set.seed(1234)
xx <- rnorm(n=1000) # store a random normal distribution with N = 1000 in an object xx
neg.normal(x=xx, eiL = .975)
*******************************
Descriptive Measures Regarding the Distribution
Sample Skewness (Type 2): -0.005209845
Sample Kurtosis (Type 2): 0.2492001
Sample Measures of Central Tendency:
Sample Mean: -0.0265972
Sample Median: -0.03979419
Standardized Difference between the Sample Mean and Median ((Median - Mean)/SD): -0.01323221
Sample Trimmed Mean: -0.03943285
Standardized Difference between the Sample Mean and Trimmed Mean ((Trimmed Mean - Mean)/SD): -0.01286991
Sample Measures of Variability:
Sample Standard Deviation: 0.9973377
Sample Min: -3.396064
Sample Max: 3.195901
*******************************
Traditional Shapiro Wilk Test Statistic:
0.9973713
Traditional Shapiro Wilk Test p-Value:
0.1053028
The (traditional NHST) null hypothesis that the distribution is normal in form cannot be rejected at alpha = 0.05
*******************************
Statistical Test of a Negligible Difference Between the Target/Population Distribution and a Theoretical Normal Distribution
Shapiro-Wilk Statistic: 0.9973713
Lower Bound of the 100(1-2*0.05)% CI: 0.9929602
Lower Bound of the Negligible Effect (Equivalence) Interval: 0.975
Negligible Effect (Equivalence) Testing Decision: The null hypothesis that the degree of nonnormality is extreme (i.e., W <= eiL) can be rejected
*******************************
Example 3
This example uses a positively skewed distribution (chi-square).
*******************************
Descriptive Measures Regarding the Distribution
Sample Skewness (Type 2): 1.721087
Sample Kurtosis (Type 2): 5.296784
Sample Measures of Central Tendency:
Sample Mean: 3.089235
Sample Median: 2.407753
Standardized Difference between the Sample Mean and Median ((Median - Mean)/SD): -0.2696034
Sample Trimmed Mean: 2.567392
Standardized Difference between the Sample Mean and Trimmed Mean ((Trimmed Mean - Mean)/SD): -0.2064482
Sample Measures of Variability:
Sample Standard Deviation: 2.527718
Sample Min: 0.03106691
Sample Max: 23.0513
*******************************
Traditional Shapiro Wilk Test Statistic:
0.8630149
Traditional Shapiro Wilk Test p-Value:
1.195838e-28
The (traditional NHST) null hypothesis that the distribution is normal in form can be rejected at alpha = 0.05
*******************************
Statistical Test of a Negligible Difference Between the Target/Population Distribution and a Theoretical Normal Distribution
Shapiro-Wilk Statistic: 0.8630149
Lower Bound of the 100(1-2*0.05)% CI: 0.8266155
Lower Bound of the Negligible Effect (Equivalence) Interval: 0.95
Negligible Effect (Equivalence) Testing Decision: The null hypothesis that the degree of nonnormality is extreme (i.e., W <= eiL) cannot be rejected
*******************************
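For readers who want to try a positively skewed case themselves, a minimal base R sketch follows. The degrees of freedom (3), the sample size (1000), and the seed are assumptions chosen for illustration, so the numbers will not match the output above.

```r
set.seed(2024)                   # arbitrary seed, an assumption
yy <- rchisq(n = 1000, df = 3)   # df = 3 and n = 1000 are assumptions

# Sample Shapiro-Wilk W for the skewed data; it falls well below .95
w_yy <- unname(shapiro.test(yy)$statistic)
w_yy
```

With the negligible package loaded, the same vector could then be passed to neg.normal(x = yy) to obtain the full three-part output.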
Output
The output of neg.normal() is divided into three sections:
Descriptive measures of the distribution: shape (skewness and kurtosis indices), central tendency (mean, median, trimmed mean, standardized difference between the mean and median, and standardized difference between the mean and trimmed mean), and variability (sample standard deviation, minimum and maximum values).
Traditional (difference-based) results: the SW test statistic, the p value, and the statistical conclusion based on the null hypothesis that the distribution is normal in form.
Equivalence-based results: the SW test statistic (the same value reported in the traditional results), the lower bound of the 100(1-2*alpha)% percentile bootstrap confidence interval for the test statistic (the value that is compared to the lower bound of the equivalence interval), the lower bound of the equivalence interval (.95 is the default), and the decision regarding the null hypothesis that the degree of nonnormality is extreme. If the lower bound of the bootstrap CI is less than or equal to the lower bound of the equivalence interval, the null hypothesis that the degree of nonnormality is extreme cannot be rejected (i.e., there is insufficient evidence that the distribution is negligibly different from normal). If the lower bound of the bootstrap CI is greater than the lower bound of the equivalence interval, the null hypothesis that the degree of nonnormality is extreme can be rejected (i.e., we conclude that the distribution is negligibly different from normal).
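The decision rule in the third section can be checked by hand against the lower bounds printed in the examples. The helper decide() below is hypothetical and exists only to make the comparison explicit.

```r
# Hypothetical helper: compare the bootstrap lower bound with eiL
decide <- function(lb, eiL) {
  if (lb > eiL) "reject H0" else "cannot reject H0"
}

decide(lb = 0.9929602, eiL = .95)   # Example 1: "reject H0"
decide(lb = 0.9929602, eiL = .975)  # Example 2: "reject H0"
decide(lb = 0.8266155, eiL = .95)   # Example 3: "cannot reject H0"
```

Moving from the liberal to the conservative bound does not change the conclusion for the simulated normal data, because the lower bound of .993 exceeds both .95 and .975.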
Extractable Elements
A number of elements of the output can be extracted, including:
sw: Sample Shapiro-Wilk W statistic
sskew: Sample skewness
skurt: Sample kurtosis
sddiff_mn_mdn: Standardized difference between the sample mean and median
sddiff_mn_trmn: Standardized difference between the sample mean and trimmed mean
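lb: Lower bound of the one-sided 100(1-alpha)% CI for W (equivalently, the lower limit of the two-sided 100(1-2*alpha)% CI reported in the output)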
eiL: Maximum W for which the degree of nonnormality is considered extreme
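A short sketch of how these elements might be pulled out of a saved result follows. It assumes that neg.normal() returns a list-like object whose elements can be accessed with $, which is not shown in the text above, and it only runs if the negligible package is installed.

```r
# Element names taken from the list above
wanted <- c("sw", "lb", "eiL")

if (requireNamespace("negligible", quietly = TRUE)) {
  set.seed(1234)
  xx  <- rnorm(n = 1000)
  res <- negligible::neg.normal(x = xx, plot = FALSE)

  # Assumption: the result supports $-style extraction
  for (nm in wanted) print(res[[nm]])

  # Re-derive the equivalence decision from the extracted pieces
  print(res$lb > res$eiL)
}
```

Storing the result this way makes it straightforward to collect W and its lower bound across many variables without parsing printed output.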