These commands help visualize the \(X^2\) distribution, make \(X^2\) probability calculations, e.g., to calculate p-values, and run \(X^2\) tests for goodness-of-fit and contingency table analysis. Some commands require mosaic (noted in comments) so you should install that package.
library(mosaic)
Visualizing the \(X^2\) distribution using plotDist()
:
# plotDist() is in the mosaic package
plotDist(dist='chisq', df=8, col="blue")
plotDist(dist='chisq', df=2, col="red", add=TRUE)
Making \(X^2\) cummulative probability calculations using pchisq()
:
# Note that we are usually interested in upper tail probabilities with $X^2$
# so use lower.tail=FALSE
Chisq.prob <- pchisq(q=5.82, df=2, lower.tail=FALSE)
Chisq.prob
## [1] 0.05447573
Making and visualizing \(X^2\) cummulative probability calculations using xpchisq()
:
# xpchisq() is in the mosaic package
xpchisq(q=5.82, df=2, lower.tail=FALSE)
## [1] 0.05447573
The goodness-of-fit test requires a vector of observed counts, and a vector of expected probabilities.
Make a vector using c()
# let's say we have a categorical variable with 300 observations of 3 levels: A, B, and C
ObsCounts <- c(185,15,100)
ObsCounts
## [1] 185 15 100
# let's say our expectation is equal probabilities of 1/3 each
ExpectProbs <- c(1/3,1/3,1/3)
ExpectProbs
## [1] 0.3333333 0.3333333 0.3333333
Run a \(X^2\) goodness-of-fit test using chisq.test()
chisq.test(x=ObsCounts, p=ExpectProbs)
##
## Chi-squared test for given probabilities
##
## data: ObsCounts
## X-squared = 144.5, df = 2, p-value < 2.2e-16
Run a \(X^2\) goodness-of-fit test using xchisq.test()
# xchisq.test() is in the mosaic package
xchisq.test(x=ObsCounts, p=ExpectProbs)
##
## Chi-squared test for given probabilities
##
## data: x
## X-squared = 144.5, df = 2, p-value < 2.2e-16
##
## 185.00 15.00 100.00
## (100.00) (100.00) (100.00)
## [72.25] [72.25] [ 0.00]
## < 8.50> <-8.50> < 0.00>
##
## key:
## observed
## (expected)
## [contribution to X-squared]
## <Pearson residual>
Make a vector from data observations using table()
# we will look at the HELPrct data, which comes with the mosaic package
# this data set is about linkage to primary care for individuals undergoing drug treatment
data(HELPrct)
# "substance" is a variable that lists the drug of abuse for each individual
ObsSubstance <- table(HELPrct$substance)
ObsSubstance
##
## alcohol cocaine heroin
## 177 152 124
If we again wanted to test this using equal probabilities as the expected distribution, we can use xchisq.test()
again.
xchisq.test(x=ObsSubstance, p=ExpectProbs)
##
## Chi-squared test for given probabilities
##
## data: x
## X-squared = 9.3113, df = 2, p-value = 0.009508
##
## 177 152 124
## (151.00) (151.00) (151.00)
## [4.4768] [0.0066] [4.8278]
## < 2.116> < 0.081> <-2.197>
##
## key:
## observed
## (expected)
## [contribution to X-squared]
## <Pearson residual>
The \(X^2\) test of independence (or contingency table analysis), requires a two-way table as input.
Make a two-way table of counts from data observations using table()
# Again we will use the HELPrct data set, using the variables "substance" and "homeless"
SubsHomeTwoWay <- table(HELPrct$substance, HELPrct$homeless)
SubsHomeTwoWay
##
## homeless housed
## alcohol 103 74
## cocaine 59 93
## heroin 47 77
Run a \(X^2\) test of independence using chisq.test()
chisq.test(x=SubsHomeTwoWay)
##
## Pearson's Chi-squared test
##
## data: SubsHomeTwoWay
## X-squared = 17.012, df = 2, p-value = 0.0002022
Run a \(X^2\) test of independence using xchisq.test()
# xchisq.test() is in the mosaic pacakage
xchisq.test(x=SubsHomeTwoWay)
##
## Pearson's Chi-squared test
##
## data: x
## X-squared = 17.012, df = 2, p-value = 0.0002022
##
## 103 74
## (81.66) (95.34)
## [5.58] [4.78]
## < 2.36> <-2.19>
##
## 59 93
## (70.13) (81.87)
## [1.77] [1.51]
## <-1.33> < 1.23>
##
## 47 77
## (57.21) (66.79)
## [1.82] [1.56]
## <-1.35> < 1.25>
##
## key:
## observed
## (expected)
## [contribution to X-squared]
## <Pearson residual>