The chi-square test1 See PSU STAT 500, Wikipedia, and Disha M’s notes. compares the \(R\) levels of \(C\) observed categorical variable frequency counts \(np_{rc}\) with their expected values \(n\pi_{rc}\). The test statistic \(\chi^2 = \sum_{r,c}{\frac{(np_{rc} - n\pi_{rc})^2}{n\pi_{rc}}}\) is distributed \(\chi_{(R-1)(C-1)}^2\). \(H_0: \forall r,c \in (R,C), n p_{rc} = n \pi_{rc}\) and \(H_a\) is at least one \(n p_{rc} \ne n \pi_{rc}\).2 The chi-square goodness of fit test is a nonparametric test, meaning it does not estimate a single population parameter. The chi-square test relies on the central limit theorem, so it is valid for independent, normally distributed samples, typically affirmed with at least 5 successes and failures3 From the PSU material, “Some statisticians hesitate to use the chi-square test if more than 20% of the cells have expected frequencies below five, especially if the p-value is small and these cells give a large contribution to the total chi-square value.”. The chi-squared test has a different name for its various applications.
The chi-square goodness-of-fit test4 Also known as Pearson’s chi-square test. evaluates whether the frequency counts of the \(R\) levels of a categorical variable follow a hypothesized distribution.
The chi-square test for independence5 More notes at PSU STAT 414/5, and Selva Prabhakaran at R-bloggers evaluates whether the frequency counts of the \(R\) levels of a categorical variable differs among the \(C\) levels of another categorical variable. If not, the two variables are independent. The expected values at each \((r,j)\) are the joint densities \(E_{i,j} = \frac{n_i n_j}{n}\).
The chi-square test for homogeneity6 More notes at PSU STAT 414/5, and Stat Trek evaluates whether frequency counts of the \(R\) levels of a categorical variable are distributed identically across \(C\) different populations.
The chi-square goodness-of-fit test evaluates whether the frequency counts of the \(R\) levels of a categorical variable follow a hypothesized distribution. Here \(C=1\), so the test statistic \(\chi^2 = \sum_{r}{\frac{(np_{r} - n\pi_{r})^2}{n\pi_{r}}}\) is distributed \(\chi_{(R-1)}^2\).
The chi-square goodness-of-fit test is derived from the z statistic and returns the same results as the one-sample proportion test’s normal approximation method when there are \(R=2\) levels.7 While the two tests are statistically similar, the two-proportion test also calculates a confidence interval estimating how large the difference in proportions are.
The test generalizes to all hypothesized distributions. However, with hypothesized distributions, we usually have to estimate one or more parameters. E.g., we may hypothesize \(X \sim b(n,p)\), but have to estimate \(p\), usually with its maximum likelihood estimator, \(p = n_r / n\). When making an estimate, reduce the degrees of freedom by the number of estimated parameters, \(d\), so \(df = (R-1) - d\).
Prior studies have shown a four possible responses to a therapy occur with frequency \(\pi_1 = .50\), \(\pi_2=.25\), \(\pi_3=.10\), and \(\pi_4=.15\). A random sample of \(n=200\) yields \(n_1=120\), \(n_2=60\), \(n_3=10\), and \(n_4=10\). At an \(\alpha=.05\) level of significance, does the random sample confirm the expected frequencies?
observed <- c(120, 60, 10, 10)
n <- 200
expected <- c(.50, .25, .10, .15) * n
alpha <- .05
library(ggplot2)
library(dplyr)
library(tidyr)
library(ggthemes)
r <- c(1, 2, 3, 4)
data.frame(r, observed, expected) %>%
gather(key = "response", value = "freq", c(-r)) %>%
ggplot() +
geom_col(aes(x = r, y = freq, fill = response), position = "dodge") +
labs(title = bquote("Frequency Counts"),
x = "Response to therapy",
y = "Frequency") +
theme_tufte()
This is a goodness-of-fit question because we have a single sample of categorical variable frequencies that we want to compare to a set of hypothesized probabilities.
There are \(R = 4\) levels to the categorical variable. \(H_0: \sum_{r=1}^5 {(\pi_r - p_r)} = 0\) with \(H_a: \sum_{r=1}^5 {(\pi_r - p_r)} \ne 0\).
The test statistic is \(\chi^2 = \sum_{r}{\frac{(np_{r} - n\pi_{r})^2}{n\pi_{r}}} = \frac{(120 - .50 \bullet 200)^2}{.50 \bullet 200} + \frac{(60 - .25 \bullet 200)^2}{.25 \bullet 200} + \frac{(10-.10 \bullet 200)^2}{.10 \bullet 200} + \frac{(10-.15 \bullet 200)^2}{.15 \bullet 200} = 24.33\) with \(df = R - 1 = 3\).
df <- 4 - 1
(chisq <- sum((observed - expected)^2 / expected))
## [1] 24.33333
(p_value <- pchisq(q = chisq, df = df, lower.tail = FALSE))
## [1] 2.128057e-05
\(P(\chi^2 > 24.33) < .001\), so reject \(H_0\) that the distributions are identical.
lrr = -Inf
urr = qchisq(p = alpha, df = df, lower.tail = FALSE)
data.frame(chi2 = 0:2500 / 100) %>%
mutate(density = dchisq(x = chi2, df = df)) %>%
mutate(rr = ifelse(chi2 < lrr | chi2 > urr, density, 0)) %>%
ggplot() +
geom_line(aes(x = chi2, y = density)) +
geom_area(aes(x = chi2, y = rr), fill = "red", alpha = 0.3) +
# geom_vline(aes(xintercept = pi_0), color = "black") +
geom_vline(aes(xintercept = chisq), color = "red") +
labs(title = bquote("Chi-Squared Goodness-of-Fit Test"),
subtitle = bquote("Chisq ="~.(round(chisq,2))~", n ="~.(n)~", alpha ="~.(alpha)~", chisq_crit ="~.(round(urr,2))~", p-value ="~.(round(p_value,3))),
x = "chisq",
y = "Density") +
theme(legend.position="none")
R function chisq.test
performs these same calculations.
(chisq.test.result <- chisq.test(x = observed,
p = expected / n))
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 24.333, df = 3, p-value = 2.128e-05
The chi-square goodness-of-fit test extends to all hypothesized distributions, not just hypothesized proportions. However, we typically do not know the probabilities of the hypothesized distributions. For example, we may hypothesize \(X~b(n,p)\), but we do not know \(p\). In such cases, estimate \(p\) with a maximum likelihood estimator. In the case of the binomial, the maximum likelihood estimator is \(p = \frac{\sum_{i=1}^k k n_k}{n}\). When making an estimate, reduce the degrees of freedom by one: \(df=k-1-1\).
A student population is hypothesized to be 60 percent female \(\pi_F=0.60\). A random sample of \(n=100\) students yields 53 percent females \(p_F=0.53\). Is the sample representative of the population at an \(\alpha=0.05\) level of significance?
observed <- c(53, 47)
n <- 100
expected <- c(.60, .40) * n
alpha <- .05
r <- c("female", "male")
data.frame(r, observed, expected) %>%
gather(key = "response", value = "freq", c(-r)) %>%
ggplot() +
geom_col(aes(x = r, y = freq, fill = response), position = "dodge") +
labs(title = bquote("Frequency Counts"),
x = "Gender",
y = "Frequency") +
theme_tufte()
This is a chi-square goodness of fit test with \(R=2\) levels. \(H_0: \sum_{r=1}^2{(\pi_{0_r} - p_r)} = 0\) with \(H_a: \sum_{r=1}^2{(\pi_{0_r} - p_r)} \ne 0\). The test statistic is \(\chi^2 = \sum_{r=1}^R{\frac{(np_r - n\pi_r)^2}{np_r - n\pi_r}} = \frac{(53 - 60)^2}{60} + \frac{(47 - 40)^2}{40} = 2.04\) with \(df = 2 - 1\). \(P(\chi_1^2 > 2.04) = 0.153\) so do not reject \(H_0\).
r <- 2
(chisq <- sum((observed - expected)^2 / expected))
## [1] 2.041667
(p.value <- pchisq(q = chisq, df = r - 1, lower.tail = FALSE))
## [1] 0.1530419
Or, using chisq.test
:
(chisq.test.result <- chisq.test(x = observed,
p = expected / n))
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 2.0417, df = 1, p-value = 0.153
In this instance, \(r = 2\), so we can also use the one-sample proportion test with test statistic \(Z = \frac{p_F - \pi_F}{se} = \frac{53 - 60}{\sqrt{.6 (1 - .6) / 100}} = -1.423\). \(P(Z>|-1.43|) = 0.153\). The advantage of the one-sample proportion test is that it also calculates a \(1 - \alpha)\%\) CI.
alpha <- .05
prop.test(x = observed[1],
n = n,
p = expected[1] / n,
alternative = "two.sided",
conf.level = 1 - alpha,
correct = FALSE)
##
## 1-sample proportions test without continuity correction
##
## data: observed[1] out of n, null probability expected[1]/n
## X-squared = 2.0417, df = 1, p-value = 0.153
## alternative hypothesis: true p is not equal to 0.6
## 95 percent confidence interval:
## 0.4328886 0.6248918
## sample estimates:
## p
## 0.53
When \(R = 2\), especially when entries are “small”, the \(\chi^2\) distribution is not continuous. Adjust for this by applying the Yates continuity correction. The Yates continuity correction adds/subtracts 0.5 to the observed counts to bring their fractional limit closer to the expected values.
observed2 <- c(53+.5, 47-.5)
(chisq <- sum((observed2 - expected)^2 / expected))
## [1] 1.760417
(p.value <- pchisq(q = chisq, df = r - 1, lower.tail = FALSE))
## [1] 0.1845726
In prop.test
and chisq.test
, specify correct = TRUE
to apply the Yates continuity correction.
prop.test(x = observed[1],
n = n,
p = expected[1] / n,
alternative = "two.sided",
conf.level = 1 - alpha,
correct = TRUE)
##
## 1-sample proportions test with continuity correction
##
## data: observed[1] out of n, null probability expected[1]/n
## X-squared = 1.7604, df = 1, p-value = 0.1846
## alternative hypothesis: true p is not equal to 0.6
## 95 percent confidence interval:
## 0.4280225 0.6296465
## sample estimates:
## p
## 0.53
chisq.test(x = observed,
p = expected / n,
correct = TRUE)
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 2.0417, df = 1, p-value = 0.153
library(ggplot2)
library(dplyr)
lrr = -Inf
urr = qchisq(p = alpha, df = r - 1, lower.tail = FALSE)
p_value <- chisq.test.result$p.value
data.frame(chi2 = 100:1000 / 100) %>%
mutate(density = dchisq(x = chi2, df = r - 1)) %>%
mutate(rr = ifelse(chi2 < lrr | chi2 > urr, density, 0)) %>%
ggplot() +
geom_line(aes(x = chi2, y = density)) +
geom_area(aes(x = chi2, y = rr, fill = "red"), alpha = 0.3) +
# geom_vline(aes(xintercept = pi_0), color = "black") +
geom_vline(aes(xintercept = chisq), color = "red") +
labs(title = bquote("Chi-Squared Goodness-of-Fit Test"),
subtitle = bquote("Chisq ="~.(round(chisq,2))~", n ="~.(n)~", alpha ="~.(alpha)~", chisq_crit ="~.(round(urr,2))~", p-value ="~.(round(p_value,3))),
x = "chisq",
y = "Density") +
theme(legend.position="none")
A sample has the following frequency counts. Is the random from a \(Poisson(\lambda)\) distribution?
x <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
obs <- c(0, 1, 4, 13, 19, 16, 15, 9, 12, 7, 2, 1, 1)
knitr::kable(data.frame(x, obs), caption = "Sample Frequence Counts")
Sample Frequence Counts
x | obs |
---|---|
0 | 0 |
1 | 1 |
2 | 4 |
3 | 13 |
4 | 19 |
5 | 16 |
6 | 15 |
7 | 9 |
8 | 12 |
9 | 7 |
10 | 2 |
11 | 1 |
12 | 1 |
We have the observed values, but need to calculate the expected values. The expected value in a Poisson distribution is \(E(r) = e^{-\lambda} \frac{\lambda^r}{r!}\). The parameter \(\lambda\) (the expected value) is not specified, but we can estimate it with the maximum likelihood estimator, the sample mean weighted average, \(\lambda \sim \bar{x} = 5.6\).
(x_bar <- sum(x * obs) / sum(obs))
## [1] 5.59
den <- dpois(x = x, lambda = x_bar) # expected densities
The counts for \(X<=2\) and \(X>=10\) are small \((<5)\), so collapse them to save degrees of freedom, \(df = 9-1\).
x2 <- c("0-2", 3, 4, 5, 6, 7, 8, 9, "10-12")
obs2 <- c(sum(obs[1:3]), obs[4:10], sum(obs[11:13]))
den2 <- c(sum(den[1:3]), den[4:10], sum(den[11:13]))
exp2 <- den2 * sum(obs2)
knitr::kable(data.frame(x2, obs2, den2, exp2), caption = "Sample Frequence Counts")
Sample Frequence Counts
x2 | obs2 | den2 | exp2 |
---|---|---|---|
0-2 | 5 | 0.0829701 | 8.297010 |
3 | 13 | 0.1087372 | 10.873717 |
4 | 19 | 0.1519602 | 15.196019 |
5 | 16 | 0.1698915 | 16.989150 |
6 | 15 | 0.1582822 | 15.828224 |
7 | 9 | 0.1263997 | 12.639968 |
8 | 12 | 0.0883218 | 8.832177 |
9 | 7 | 0.0548576 | 5.485764 |
10-12 | 4 | 0.0535084 | 5.350839 |
There are \(R = 9\) levels, so the degrees of freedom is \(df = 9 - 1 - 1 = 7\). \(H_0: \sum_{r = 1}^R{O_r - E_r} = 0\). The test statistic is \(\chi^2 = \sum_{i=1}^k{\frac{(O_r - E_r)^2)}{E_r}} = 5.7157\). The p-value is \(P(\chi_7^2 > 5.7157) = 0.5432\), so do not reject \(H_0\).8 chisq.test
uses df = R - 1 instead of (R - 1) - 1 adjusted for the estimated population parameter. Need to learn how to override df or work around.
(chisq <- sum((obs2 - exp2)^2 / exp2))
## [1] 5.722502
(p_value <- pchisq(q = chisq, df = length(obs2) - 1 , lower.tail = FALSE))
## [1] 0.6782835
chisq.test(x = obs2, p = den2, rescale.p = TRUE)
##
## Chi-squared test for given probabilities
##
## data: obs2
## X-squared = 5.6909, df = 8, p-value = 0.6818
library(ggplot2)
library(dplyr)
# Graph of Hypothesis Test
lrr = -Inf
urr = qchisq(p = alpha, df = (length(obs2) - 1) - 1, lower.tail = FALSE)
data.frame(chi2 = 0:2000 / 100) %>%
mutate(density = dchisq(x = chi2, df = (length(obs2) - 1) - 1)) %>%
mutate(rr = ifelse(chi2 < lrr | chi2 > urr, density, 0)) %>%
ggplot() +
geom_line(aes(x = chi2, y = density)) +
geom_area(aes(x = chi2, y = rr, fill = "red"), alpha = 0.3) +
# geom_vline(aes(xintercept = pi_0), color = "black") +
geom_vline(aes(xintercept = chisq), color = "red") +
labs(title = bquote("One-Sample Chi-Squared Goodness-of-Fit Test"),
subtitle = bquote("n ="~.(sum(obs2))~", r ="~.(length(obs2))~", alpha ="~.(alpha)~", Chisq_crit ="~.(round(urr,2))~", Chisq ="~.(round(chisq,2))~", p-value ="~.(round(p_value,4))),
x = "chisq",
y = "Density") +
theme(legend.position="none")
A project studied whether age is related to desire to ride a bicycle. The categorical variable age group has 4 levels: 18-24, 25-34, 35-49, and 50-64. The levels of categorical variable desire are yes and no.
age <- NULL
age[c(1:60, 202:241)] <- 1
age[c(61:114, 242:285)] <- 2
age[c(115:160, 286:338)] <- 3
age[c(161:201, 339:395)] <- 4
bike <- NULL
bike[1:201] <- 1
bike[202:395] <- 2
dat <- data.frame(age, bike)
dat$age <- factor(dat$age, levels = c(1, 2, 3, 4),
labels = c("18-24", "25-34", "35-49", "50-64"))
dat$bike <- factor(dat$bike, levels = c(1, 2),
labels = c("yes", "no"))
This is a test for independence because we have a single population from which we are comparing the levels of a factor variable .
df <- (4-1)*(2-1)
alpha <- 0.05
(test <- chisq.test(dat$age, dat$bike))
##
## Pearson's Chi-squared test
##
## data: dat$age and dat$bike
## X-squared = 8.0061, df = 3, p-value = 0.04589
# Graph of hypothesis test
lrr = -Inf
urr = qchisq(p = alpha, df = df, lower.tail = FALSE)
data.frame(xi = 0:2000 / 100) %>%
mutate(density = dchisq(x = xi, df = df)) %>%
mutate(rr = ifelse(xi < lrr | xi > urr, density, 0)) %>%
ggplot() +
geom_line(aes(x = xi, y = density), color = "black") +
geom_area(aes(x = xi, y = rr), fill = "red", alpha = 0.3) +
geom_vline(aes(xintercept = test$statistic), color = "blue") +
labs(title = bquote("Chi-Square Test for Homogeneity"),
subtitle = bquote("P-value ="~.(test$p.value)),
x = "xi^2",
y = "Density") +
theme(legend.position="none")
A project studied whether attending physicians order more unnecessary blood transfusions than residents. The categorical variable frequency of orders has 4 levels: frequently, occasionally, rarely, and never.
library(dplyr)
library(ggplot2)
library(stats)
pop <- NULL
pop[1:49] <- 1
pop[50:120] <- 2
lev <- NULL
lev[c(1:2, 50:64)] <- 1
lev[c(3:5, 65:92)] <- 2
lev[c(6:36, 93:115)] <- 3
lev[c(37:49, 116:120)] <- 4
dat <- data.frame(pop, lev)
dat$pop <- factor(dat$pop, levels = c(1, 2),
labels = c("attending", "resident"))
dat$lev <- factor(dat$lev, levels = c(1, 2, 3, 4),
labels = c("frequently", "occasionally", "rarely", "never"))
df <- (2-1)*(4-1)
alpha <- 0.05
(test <- chisq.test(dat$lev, dat$pop))
##
## Pearson's Chi-squared test
##
## data: dat$lev and dat$pop
## X-squared = 31.881, df = 3, p-value = 5.543e-07
# Graph of hypothesis test
lrr = -Inf
urr = qchisq(p = alpha, df = df, lower.tail = FALSE)
data.frame(xi = 0:400 / 10) %>%
mutate(density = dchisq(x = xi, df = df)) %>%
mutate(rr = ifelse(xi < lrr | xi > urr, density, 0)) %>%
ggplot() +
geom_line(aes(x = xi, y = density), color = "black") +
geom_area(aes(x = xi, y = rr), fill = "red", alpha = 0.3) +
geom_vline(aes(xintercept = test$statistic), color = "blue") +
labs(title = bquote("Chi-Square Test for Homogeneity"),
subtitle = bquote("P-value ="~.(test$p.value)),
x = "xi^2",
y = "Density") +
theme(legend.position="none")