The \(X^2\) or Chi-squared test is used to assess observed distributions of counts of a categorical variable. It can be used in a variety of different contexts, but we will focus on two scenarios: a contingency table analysis and a goodness of fit test. In both cases, we can use the xchisq.test() function from the mosaic package.
Contingency Table Analysis
First, let’s consider the contingency table analysis. Here we are making a test of association between two categorical variable. That is, we want to know whether the value of one variable is contingent upon the value of the other variable. Hence the name. In our case, we are going to use data collected in 1986 at the Baystate Medical Center, in Springfield, Massachusettes. The focus of the study was on risk factors associated with low infant birth weight. In particular, we will ask whether smoking during pregancy is associated with low birth weight (here identified as a birth weight less than 2.5 kg).
require(mosaic) # be sure to install and load the usual packages if you have not already
require(lattice)
require(datasets)
require(MASS) # this package contains the dataset we will use
data(birthwt)
You can find out more about the data using help() and once the data are loaded you can click on the data frame to view the actual dataset in the data window.
help(birthwt)
It is always helpful to look at a plot and summary of the data first. In this case we can use a bargraph to see the distribution of outcomes comparing smoking and non-smoking mothers. We can also use tally() to construct our contingency table.
bargraph(~low | smoke, data=birthwt, xlab="Birth weight outcome", ylab="Frequency", scales=list(x=list(labels=c("Normal","Low"))), strip=strip.custom(factor.levels=c("Non-smokers","Smokers")))
tally(~low | smoke, data=birthwt, format='count')
## smoke
## low 0 1
## 0 86 44
## 1 29 30
From the contingency table and the barchart, we can see that non-smoking mothers had more Normal weight births (low = 0) than the smoking mothers. We can now perform the \(X^2\)-test to make a more quantitative inference:
xchisq.test(tally(~low | smoke, data=birthwt, format='count')) # note that the xchisq.test function takes a contingency table from tally() as input. You can "nest" functions in R this way, and it always works from the inside out.
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: tally(~low | smoke, data = birthwt, format = "count")
## X-squared = 4.236, df = 1, p-value = 0.03958
##
## 86 44
## (79.10) (50.90)
## [0.60] [0.94]
## < 0.78> <-0.97>
##
## 29 30
## (35.90) (23.10)
## [1.33] [2.06]
## <-1.15> < 1.44>
##
## key:
## observed
## (expected)
## [contribution to X-squared]
## <residual>
Like most statistical tests, the \(X^2\)-test is based on a test statistic, \(X^2\), with a known probability distribution. The larger the \(X^2\) value, the less likely it is that the observed difference has simply occurred by chance. The \(X^2\) distribution is positive, so the value observed here X-squared = 4.24 is moderately big. The p-value, of the probability of observing a \(X^2\) value that is at least this large if the two categorical variables were completely independent of one another (the so-called “null hypothesis”). Since our p-value = 0.040 is quite small, there is very little chance that the observed observed outcomes simply occurred “by chance”. Scientists generally use p-value = 0.05 as a (somewhat arbitrary) cutoff for making confident inferences.
The \(X^2\) statistic is based on the difference between the observed and expected counts. The output has a nice table at the end which allows you to compare the observed values, at the top, to the corresponding expected values, in parentheses. Here we can see that non-smokers have more normal weight births and fewer low weight births than expected, while smokers have fewer normal weight births and more low weight births than expected. The contribution to X-squared entries, in square brackets, tell us where the big differences are - the larger the value, the bigger the difference from the expected. In our example, the smokers represent the big difference.
So, we can comfortably conclude that when mothers smoke during pregnancy, they are more likely to have low birth weight infants.
Goodness of Fit Test
For our second example, using the \(X^2\) test to assess goodness of fit, in this case, we will look at the fit of genetic data at a single locus to the expectations based on Hardy-Weinberg equilibrium. Heithaus et al. examined the genotype of 394 rainbow darters, Etheostoma caeruleum at the PGM locus, which has two alleles that we will call A and a. They found the following distribution:
darters = c(342, 47, 5) # the c() operator strings together numbers or characters into a vector
names(darters) = c("AA", "Aa", "aa") # the names() function attaches names to each element of its input
darters
## AA Aa aa
## 342 47 5
Under Hardy-Weinberg equilibrium we expect the relative frequency of the three genotypes to be \(p^2\), \(2pq\), and \(q^2\), where p is the frequency of the A allele, and q is the frequency of the a allele. In our case we can estimate p and q as:
p = (342+47/2)/389
q = 1-p
p; q
## [1] 0.9396
## [1] 0.06041
HWprobs = c(p^2, 2*p*q, q^2) # here we make a vector of the Hardy-Weinberg expectations
The \(X^2\) goodness of fit test compares the observed counts to the expected proportions, which in this case are drawn from the Hardy-Weinberg equilibirum:
xchisq.test(darters, p = HWprobs) # Here the xchisq.test function takes the darters table as input. The "p" parameter is the vector of expected probabilities, based on Hardy-Weinberg.
## Warning: Chi-squared approximation may be incorrect
##
## Chi-squared test for given probabilities
##
## data: darters
## X-squared = 9.037, df = 2, p-value = 0.0109
##
## 342.00 47.00 5.00
## (347.83) ( 44.73) ( 1.44)
## [0.098] [0.115] [8.824]
## <-0.31> < 0.34> < 2.97>
##
## key:
## observed
## (expected)
## [contribution to X-squared]
## <residual>
The results are in the same format as our first example, but note that the nature of the test has changed. Instead of asking whether two categorical variables are contingent on one another, we are testing whether the observed distribution of counts matches a particular (in this case theoretical) expectation. Since X-squared = 9.04 and p-value = 0.01, we would not expect to see such a large \(X^2\) value if the observed darters were drawn from a population in Hardy-Weinberg equilibrium. Thus, we can confidently conclude that the population does not meet the assumptions of the Hardy-Weinberg theory. In particular, from the table of observed and expected values, we can see that even though the AA genotype is dominant in the population, we actually observe substantially more aa individuals than we would expect under Hardy-Weinberg.
In summary, the \(X^2\) test provides a versatile method for comparing observed counts of categorical variables to theoretical expectations and to discover whether multiple categorical variables are related to one another.