EPCS test

This is a function to run the Egon Pearson N-1 corrected version of the Pearson Chi-square test. It is generally suitable when expected values are > 1.

For expected values between 1 and 5 the usual Pearson test of independence is a little liberal (though generally not overly so when N is reasonable) and most popular alternatives tend to be conservative (including Yates’ correction which makes the test behave more like the Fisher exact test). The Fisher exact test also makes the (usually unreasonable) assumption that marginals are fixed. For very small expected values an exact test might approriate (but note that in these cases statistical power is likely to be low for any test because you have one or more cells with very little information).

In addition, the function returns various residuals and residual analyses that may be informative for some users. As standardized residuals tend to be conservative I report these without any correction for multiple testing. However, for asjusted standardized residuals I include a p value adjustment (Hommel) by default. These can be changed with the z.adjust.method which defaults to c(‘none’, ‘hommel’). See ?p.adjust for alternative options. Also note that residual analysis isn’t always appropriate (e.g., it is less useful for a 2x2 table than say a 4x6 table). These recommendations are based on MacDonald and Gardner (2000). Also if you have specific hypotheses to test you might be better off with a loglinear model or Poisson regression.

MacDonald, P.L., & Gardner R. C. (2000). Type I Error Rate Comparisons of Post Hoc Procedures for I x j Chi-Square Tables. Educational and Psychological Measurement. 60(5), 735–754.

epcs.test <- function(data.cells, z.adjust.method=c('none','hommel')){
  ucs.test <- suppressWarnings(stats::chisq.test(data.cells, simulate.p.value=FALSE, correct = FALSE))
  N <- sum(data.cells) ; nrows <- dim(data.cells)[1] ; ncols <- dim(data.cells)[2]
  corrected.stat <- ucs.test$stat[[1]] * (N-1)/N
  pval <- pchisq(corrected.stat, ucs.test$par, lower.tail = FALSE)
  p.resids <- ucs.test$resid ; as.resids <- ucs.test$stdres
  pr.pv <- pchisq(ucs.test$resid^2,1, lower.tail=F)
  pr.apv <- matrix(as.matrix(p.adjust(pchisq(p.resids^2, 1, lower.tail=F), z.adjust.method[1])), nrows, ncols, byrow=F)
  asr.apv <- matrix(as.matrix(p.adjust(pchisq(as.resids^2,1, lower.tail=F), z.adjust.method[2])), nrows, ncols, byrow=F)
  output.list <- list(
    'Uncorrected Pearson Chi-Square'= ucs.test$stat,
    'Egon Pearson Chi-Square' = corrected.stat,
    'df'=prod(dim(data.cells)-1), 'p'=pval,
    'Smallest expected value (should be greater than 1)' = min(ucs.test$expected),
    'Raw data' = data.cells,
    'Pearson (standardized) residuals (z)' = p.resids,
    'two-sided p (for z)' = pr.pv, 'adjusted p (for z)' = pr.apv,
    'Chi-square contribution per cell' = p.resids^2,
    'Adjusted standardized residuals' = as.resids,
    'adjusted p (for ASRs)' = asr.apv,
    'Pearson / ASR p adjustment' =z.adjust.method)
  return(output.list)
}

# enter cell counts as matrix - here it is 2x2 but need not be
data.cells <- matrix(c( 5,15,
                        15,5 ), 2,2, byrow=TRUE)
data.cells

##      [,1] [,2]
## [1,]    5   15
## [2,]   15    5

epcs.test(data.cells)

## $`Uncorrected Pearson Chi-Square`
## X-squared 
##        10 
## 
## $`Egon Pearson Chi-Square`
## [1] 9.75
## 
## $df
## [1] 1
## 
## $p
## [1] 0.001793227
## 
## $`Smallest expected value (should be greater than 1)`
## [1] 10
## 
## $`Raw data`
##      [,1] [,2]
## [1,]    5   15
## [2,]   15    5
## 
## $`Pearson (standardized) residuals (z)`
##           [,1]      [,2]
## [1,] -1.581139  1.581139
## [2,]  1.581139 -1.581139
## 
## $`two-sided p (for z)`
##           [,1]      [,2]
## [1,] 0.1138463 0.1138463
## [2,] 0.1138463 0.1138463
## 
## $`adjusted p (for z)`
##           [,1]      [,2]
## [1,] 0.1138463 0.1138463
## [2,] 0.1138463 0.1138463
## 
## $`Chi-square contribution per cell`
##      [,1] [,2]
## [1,]  2.5  2.5
## [2,]  2.5  2.5
## 
## $`Adjusted standardized residuals`
##           [,1]      [,2]
## [1,] -3.162278  3.162278
## [2,]  3.162278 -3.162278
## 
## $`adjusted p (for ASRs)`
##             [,1]        [,2]
## [1,] 0.001565402 0.001565402
## [2,] 0.001565402 0.001565402
## 
## $`Pearson / ASR p adjustment`
## [1] "none"   "hommel"