Chi-Squared Distribution and Chi-Squared Tests

These commands help visualize the \(X^2\) distribution, make \(X^2\) probability calculations, e.g., to calculate p-values, and run \(X^2\) tests for goodness-of-fit and contingency table analysis. Some commands require mosaic (noted in comments) so you should install that package.

library(mosaic)

The \(X^2\) distribution and probability calculations

Visualizing the \(X^2\) distribution using plotDist():

# plotDist() is in the mosaic package
plotDist(dist='chisq', df=8, col="blue")

plotDist(dist='chisq', df=2, col="red", add=TRUE)

Making \(X^2\) cummulative probability calculations using pchisq():

# Note that we are usually interested in upper tail probabilities with $X^2$
# so use lower.tail=FALSE
Chisq.prob <- pchisq(q=5.82, df=2, lower.tail=FALSE)
Chisq.prob

## [1] 0.05447573

Making and visualizing \(X^2\) cummulative probability calculations using xpchisq():

# xpchisq() is in the mosaic package
xpchisq(q=5.82, df=2, lower.tail=FALSE)

## [1] 0.05447573

The \(X^2\) test of goodness-of-fit

The goodness-of-fit test requires a vector of observed counts, and a vector of expected probabilities.

Make a vector using c()

# let's say we have a categorical variable with 300 observations of 3 levels: A, B, and C
ObsCounts <- c(185,15,100)
ObsCounts

## [1] 185  15 100

# let's say our expectation is equal probabilities of 1/3 each

ExpectProbs <- c(1/3,1/3,1/3)
ExpectProbs

## [1] 0.3333333 0.3333333 0.3333333

Run a \(X^2\) goodness-of-fit test using chisq.test()

chisq.test(x=ObsCounts, p=ExpectProbs)

## 
##  Chi-squared test for given probabilities
## 
## data:  ObsCounts
## X-squared = 144.5, df = 2, p-value < 2.2e-16

Run a \(X^2\) goodness-of-fit test using xchisq.test()

# xchisq.test() is in the mosaic package
xchisq.test(x=ObsCounts, p=ExpectProbs)

## 
##  Chi-squared test for given probabilities
## 
## data:  x
## X-squared = 144.5, df = 2, p-value < 2.2e-16
## 
##  185.00    15.00   100.00 
## (100.00) (100.00) (100.00)
## [72.25]  [72.25]  [ 0.00] 
## < 8.50>  <-8.50>  < 0.00> 
##      
## key:
##  observed
##  (expected)
##  [contribution to X-squared]
##  <Pearson residual>

Make a vector from data observations using table()

# we will look at the HELPrct data, which comes with the mosaic package
# this data set is about linkage to primary care for individuals undergoing drug treatment
data(HELPrct)

# "substance" is a variable that lists the drug of abuse for each individual
ObsSubstance <- table(HELPrct$substance)
ObsSubstance

## 
## alcohol cocaine  heroin 
##     177     152     124

If we again wanted to test this using equal probabilities as the expected distribution, we can use xchisq.test() again.

xchisq.test(x=ObsSubstance, p=ExpectProbs)

## 
##  Chi-squared test for given probabilities
## 
## data:  x
## X-squared = 9.3113, df = 2, p-value = 0.009508
## 
##   177      152      124   
## (151.00) (151.00) (151.00)
## [4.4768] [0.0066] [4.8278]
## < 2.116> < 0.081> <-2.197>
##      
## key:
##  observed
##  (expected)
##  [contribution to X-squared]
##  <Pearson residual>

The \(X^2\) test of independence

The \(X^2\) test of independence (or contingency table analysis), requires a two-way table as input.

Make a two-way table of counts from data observations using table()

# Again we will use the HELPrct data set, using the variables "substance" and "homeless"
SubsHomeTwoWay <- table(HELPrct$substance, HELPrct$homeless)
SubsHomeTwoWay

##          
##           homeless housed
##   alcohol      103     74
##   cocaine       59     93
##   heroin        47     77

Run a \(X^2\) test of independence using chisq.test()

chisq.test(x=SubsHomeTwoWay)

## 
##  Pearson's Chi-squared test
## 
## data:  SubsHomeTwoWay
## X-squared = 17.012, df = 2, p-value = 0.0002022

Run a \(X^2\) test of independence using xchisq.test()

# xchisq.test() is in the mosaic pacakage
xchisq.test(x=SubsHomeTwoWay)

## 
##  Pearson's Chi-squared test
## 
## data:  x
## X-squared = 17.012, df = 2, p-value = 0.0002022
## 
##   103       74   
## (81.66)  (95.34) 
##  [5.58]   [4.78] 
## < 2.36>  <-2.19> 
##    
##    59       93   
## (70.13)  (81.87) 
##  [1.77]   [1.51] 
## <-1.33>  < 1.23> 
##    
##    47       77   
## (57.21)  (66.79) 
##  [1.82]   [1.56] 
## <-1.35>  < 1.25> 
##    
## key:
##  observed
##  (expected)
##  [contribution to X-squared]
##  <Pearson residual>