1. Introduction

A goodness-of-fit test is used to test whether sample data fit a distribution from a certain population (e.g., a population with a normal distribution or one with a Weibull distribution). In other words, it tells you whether your sample data represent the data you would expect to find in the actual population. Goodness-of-fit tests commonly used in statistics include the chi-square, Kolmogorov-Smirnov, and Anderson-Darling tests.

In this book, we will focus on the chi-square test. It is the most common of the goodness-of-fit tests and the one you’ll come across in Advanced Placement Statistics or elementary statistics.



2. The \(χ^2\) Test

The Chi-square (\(χ^2\)) test is a hypothesis test for the proportions of one or more multinomial categorical variables. The test statistic

\[χ^2 = ∑_{r,c} \frac {(np_{rc}-nπ_{rc})^2} {nπ_{rc}} \]

is distributed \(χ^2_{(R−1)(C−1)}\). The hypotheses are \(H_0: p_{rc} = π_{rc}\) for all cells and \(H_a: p_{rc} ≠ π_{rc}\) for at least one cell.

Annotation:

The Chi-square goodness of fit test is also called a non-parametric test, meaning it does not estimate a single population parameter. The chi-square test relies on the central limit theorem, so it is valid for independent samples whose cell counts are approximately normal; a common rule of thumb is at least five expected counts in every category. The chi-squared test goes by a different name in each of its various applications.
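For instance, a quick way to check that rule of thumb in R is to inspect the expected component returned by chisq.test. This is a minimal sketch; the counts and proportions here are illustrative, not from the examples below:

counts <- c(18, 55, 7, 20)                       # illustrative counts (assumed data)
p0 <- c(0.20, 0.50, 0.10, 0.20)                  # hypothesized proportions
fit <- chisq.test(x = counts, p = p0)
fit$expected                                     # expected counts n * p0
## [1] 20 50 10 20
all(fit$expected >= 5)                           # rule of thumb satisfied
## [1] TRUE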

2.1 Advantages

The chi square test can be used for discrete distributions like the binomial distribution and the Poisson distribution, while the Kolmogorov-Smirnov and Anderson-Darling goodness of fit tests can only be used for continuous distributions.
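A minimal sketch of this distinction, assuming simulated data with arbitrary parameters:

set.seed(1)
# Discrete case: chi-square GOF on binomial counts
x <- rbinom(200, size = 10, prob = 0.3)           # simulated binomial sample
obs <- table(factor(x, levels = 0:10))            # observed frequency of each value
p0 <- dbinom(0:10, size = 10, prob = 0.3)         # hypothesized probabilities
chisq.test(obs, p = p0, rescale.p = TRUE)         # may warn: some bins are sparse
# Continuous case: Kolmogorov-Smirnov GOF on a normal sample
y <- rnorm(200)
ks.test(y, "pnorm", mean = 0, sd = 1)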

2.2 Disadvantages

Two potential disadvantages of chi square are:

  • The chi square test can only be used for data put into classes (bins). If you have non-binned data you’ll need to make a frequency table or histogram before performing the test; see the sketch after this list.
  • It requires a sufficient sample size in order for the chi-square approximation to be valid.
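
One way to bin a continuous sample before the test (a minimal sketch with simulated data and arbitrary class boundaries):

set.seed(42)
z <- rnorm(100)                                   # non-binned continuous sample
breaks <- c(-Inf, -1, 0, 1, Inf)                  # arbitrary class boundaries
obs <- table(cut(z, breaks = breaks))             # frequency table of binned data
p0 <- diff(pnorm(breaks))                         # N(0,1) probability of each bin
chisq.test(obs, p = p0)                           # GOF test on the binned counts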

3 Pearson’s \(χ^2\) test

The Chi-square goodness-of-fit test is also known as Pearson’s chi-square test. It evaluates whether the frequency counts of the \(R\) levels of a categorical variable follow a hypothesized distribution.

3.1 Example 1

Prior studies have shown that four possible responses to a therapy occur with frequencies \(π_1\)=.50, \(π_2\)=.25, \(π_3\)=.10, and \(π_4\)=.15. A random sample of \(n\)=200 yields \(n_1\)=120, \(n_2\)=60, \(n_3\)=10, and \(n_4\)=10. At an \(α\)=.05 level of significance, does the random sample confirm the expected frequencies?

library(ggplot2)
library(dplyr)
library(tidyr)
library(ggthemes)
observed <- c(120, 60, 10, 10)            # observed counts n_r
n <- 200
expected <- c(.50, .25, .10, .15) * n     # expected counts n * pi_r
alpha <- .05
r <- c(1, 2, 3, 4)                        # response levels
data.frame(r, observed, expected) %>%
  gather(key = "response", value = "freq", c(-r)) %>%
  ggplot() +
  geom_col(aes(x = r, y = freq, fill = response), position = "dodge") +
  labs(title = bquote("Frequency Counts"),
       x = "Response to therapy",
       y = "Frequency") +
  theme_tufte()

This is a goodness-of-fit question because we have a single sample of categorical-variable frequencies that we want to compare to a set of hypothesized probabilities. There are \(R=4\) levels of the categorical variable.

\(H_0: π_r = p_r\) for all \(r = 1, \dots, 4\) \(\qquad\) \(H_a: π_r ≠ p_r\) for at least one \(r\)

The test statistic is

\[χ^2 = ∑^R_{r=1} \frac {(np_r-nπ_r)^2} {nπ_r} \] \[ χ^2 = \frac {(120-0.50 \cdot 200)^2} {0.50 \cdot 200} + \frac{(60-0.25 \cdot 200)^2} {0.25 \cdot 200} + \frac{(10-0.10 \cdot 200)^2} {0.10 \cdot 200}+ \frac {(10-0.15 \cdot 200)^2} {0.15 \cdot 200}\]

\[ χ^2 = 24.33 \]

where \(df = 4-1 = 3\). In R, we can calculate \(χ^2\) as follows:

df <- 4 - 1
(chisq <- sum((observed - expected)^2 / expected))
## [1] 24.33333
(p_value <- pchisq(q = chisq, df = df, lower.tail = F))
## [1] 2.128057e-05
(chisq.test.result <- chisq.test(x = observed, p = expected / n))
## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 24.333, df = 3, p-value = 2.128e-05

The result shows \(P(χ^2>24.33)<.001\), so we reject \(H_0\) that the distributions are identical. We can also visualize the result, as shown below:

lrr = -Inf                                            # no lower rejection region
urr = qchisq(p = alpha, df = df, lower.tail = FALSE)  # upper critical value
data.frame(chi2 = 0:2500 / 100) %>%
  mutate(density = dchisq(x = chi2, df = df)) %>%
  mutate(rr = ifelse(chi2 < lrr | chi2 > urr, density, 0)) %>%
ggplot() +
  geom_line(aes(x = chi2, y = density)) +
  geom_area(aes(x = chi2, y = rr), fill = "red", alpha = 0.3) +
#  geom_vline(aes(xintercept = pi_0), color = "black") +
  geom_vline(aes(xintercept = chisq), color = "red") +
  labs(title = bquote("Chi-Squared Goodness-of-Fit Test"),
       subtitle = bquote("Chisq ="~.(round(chisq,2))~", n ="~.(n)~", alpha ="~.(alpha)~", chisq_crit ="~.(round(urr,2))~", p-value ="~.(round(p_value,3))),
       x = "chisq",
       y = "Density") +
  theme(legend.position="none")

3.2 Example 2

A student population is hypothesized to be 60 percent female, \(π_F = 0.60\). A random sample of \(n\)=100 students yields 53 percent females, \(p_F = 0.53\). Is the sample representative of the population at an \(α = 0.05\) level of significance?

observed <- c(53, 47)
n <- 100  
expected <- c(.60, .40) * n
alpha <- .05
r <- c("female", "male")
data.frame(r, observed, expected) %>%
  gather(key = "response", value = "freq", c(-r)) %>%
  ggplot() +
  geom_col(aes(x = r, y = freq, fill = response), position = "dodge") +
  labs(title = bquote("Frequency Counts"), 
       x = "Gender",
       y = "Frequency") +
  theme_tufte()

This is a chi-square goodness of fit test with \(R=2\) levels.

  • \(H_0: π_r = p_r\) for \(r = 1, 2\)
  • \(H_a: π_r ≠ p_r\) for at least one \(r\)

The test statistic is

\[χ^2 = ∑^R_{r=1} \frac {(np_r-nπ_r)^2} {nπ_r} \]

\[ χ^2 = \frac {(53-60)^2} {60} + \frac {(47-40)^2} {40}\]

\[ χ^2 = 2.04 \]

where \(df = 2-1 = 1\). In R, we can calculate \(χ^2\) as follows:

r <- 2
(chisq <- sum((observed - expected)^2 / expected))
## [1] 2.041667
(p.value <- pchisq(q = chisq, df = r - 1, lower.tail = FALSE))  
## [1] 0.1530419
(chisq.test.result <- chisq.test(x = observed, p = expected / n)) 
## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 2.0417, df = 1, p-value = 0.153

In this example, \(R=2\), so we can also use the one-sample proportion test with test statistic:

\[ Z = \frac {p_F-π_F } { se}\] \[ Z = \frac {p_F-π_F} {\sqrt {π_F(1−π_F)/n}}\] \[ Z = \frac {0.53 - 0.60} {\sqrt {0.6(1-0.6)/100}} \] \[ Z = -1.43 \]

The result shows \(P(|Z|>1.43)=0.153\); note that \(Z^2 = (−1.43)^2 ≈ 2.04 = χ^2\), so the two tests agree. The advantage of the one-sample proportion test is that it also calculates a \((1−α)\%\) CI.

alpha <- .05
prop.test(x = observed[1], 
          n = n, 
          p = expected[1] / n, 
          alternative = "two.sided", 
          conf.level = 1 - alpha, 
          correct = FALSE)
## 
##  1-sample proportions test without continuity correction
## 
## data:  observed[1] out of n, null probability expected[1]/n
## X-squared = 2.0417, df = 1, p-value = 0.153
## alternative hypothesis: true p is not equal to 0.6
## 95 percent confidence interval:
##  0.4328886 0.6248918
## sample estimates:
##    p 
## 0.53
chisq.test(x = observed, p = expected / n, correct = TRUE)  # 'correct' has no effect here (see below)
## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 2.0417, df = 1, p-value = 0.153

When \(R=2\), especially when counts are small, the discrete sampling distribution of the test statistic is only roughly approximated by the continuous \(χ^2\) distribution. Adjust for this by applying the Yates continuity correction, which moves each observed count 0.5 closer to its expected value: \[χ^2 = ∑^R_{r=1} \frac {(|O_r - E_r| - 0.5)^2} {E_r}\]

observed2 <- c(53+.5, 47-.5)
(chisq <- sum((observed2 - expected)^2 / expected))
## [1] 1.760417
(p.value <- pchisq(q = chisq, df = r - 1, lower.tail = FALSE))
## [1] 0.1845726

In prop.test, specify correct = TRUE to apply the Yates continuity correction. (In chisq.test, the correct argument applies only to 2×2 contingency tables, which is why the call above returned the uncorrected statistic.)

prop.test(x = observed[1], 
          n = n, 
          p = expected[1] / n, 
          alternative = "two.sided", 
          conf.level = 1 - alpha, 
          correct = TRUE)
## 
##  1-sample proportions test with continuity correction
## 
## data:  observed[1] out of n, null probability expected[1]/n
## X-squared = 1.7604, df = 1, p-value = 0.1846
## alternative hypothesis: true p is not equal to 0.6
## 95 percent confidence interval:
##  0.4280225 0.6296465
## sample estimates:
##    p 
## 0.53

We can also visualize the result, as shown below:

library(ggplot2)
library(dplyr)

lrr = -Inf
urr = qchisq(p = alpha, df = r - 1, lower.tail = FALSE)
p_value <- chisq.test.result$p.value
data.frame(chi2 = 100:1000 / 100) %>%
  mutate(density = dchisq(x = chi2, df = r - 1)) %>%
  mutate(rr = ifelse(chi2 < lrr | chi2 > urr, density, 0)) %>%
ggplot() +
  geom_line(aes(x = chi2, y = density)) +
  geom_area(aes(x = chi2, y = rr), fill = "red", alpha = 0.3) +
#  geom_vline(aes(xintercept = pi_0), color = "black") +
  geom_vline(aes(xintercept = chisq), color = "red") +
  labs(title = bquote("Chi-Squared Goodness-of-Fit Test"),
       subtitle = bquote("Chisq ="~.(round(chisq,2))~", n ="~.(n)~", alpha ="~.(alpha)~", chisq_crit ="~.(round(urr,2))~", p-value ="~.(round(p_value,3))),
       x = "chisq",
       y = "Density") +
  theme(legend.position="none")

3.3 Example 3

Here, we apply the Chi-square goodness-of-fit test with unspecified probabilities. Let’s check whether the frequencies of the following random sample come from a Poisson(\(λ\)) distribution.

x<- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
obs <- c(0, 1, 4, 13, 19, 16, 15, 9, 12, 7, 2, 1, 1)
knitr::kable(data.frame(x, obs), caption = "Sample Frequency Counts")
Sample Frequency Counts
x obs
0 0
1 1
2 4
3 13
4 19
5 16
6 15
7 9
8 12
9 7
10 2
11 1
12 1

We have the observed values but need to calculate the expected values. Under a Poisson distribution, the probability of count \(r\) is \(P(X=r)=e^{−λ} \frac {λ^r} {r!}\). The parameter \(λ\) (the expected value) is not specified, but we can estimate it with its maximum likelihood estimator, the frequency-weighted sample mean, \(\hat λ = \bar x = 5.59\).

(x_bar <- sum(x * obs) / sum(obs))
## [1] 5.59
den <- dpois(x = x, lambda = x_bar)   # Poisson probability of each count

The counts for \(X≤2\) and \(X≥10\) are small \((<5)\), so collapse them into single bins to keep the expected counts large enough for the chi-square approximation.

x2 <- c("0-2", 3, 4, 5, 6, 7, 8, 9, "10-12")
obs2 <- c(sum(obs[1:3]), obs[4:10], sum(obs[11:13]))
den2 <- c(sum(den[1:3]), den[4:10], sum(den[11:13]))
exp2 <- den2 * sum(obs2)
knitr::kable(data.frame(x2, obs2, den2, exp2), caption = "Sample Frequency Counts")
Sample Frequency Counts
x2 obs2 den2 exp2
0-2 5 0.0829701 8.297010
3 13 0.1087372 10.873717
4 19 0.1519602 15.196019
5 16 0.1698915 16.989150
6 15 0.1582822 15.828224
7 9 0.1263997 12.639968
8 12 0.0883218 8.832177
9 7 0.0548576 5.485764
10-12 4 0.0535084 5.350839

There are \(R=9\) levels and one estimated parameter (\(λ\)), so the degrees of freedom are \(df=9−1−1=7\).

\(H_0: O_r = E_r\) for all \(r\). The test statistic is \(χ^2=∑^R_{r=1} \frac {(O_r−E_r)^2} {E_r}=5.72\). The p-value \(P(χ^2_7>5.72)\) is well above \(α=.05\), so do not reject \(H_0\). (The code below uses \(df=8\) because it does not account for the estimated \(λ\); the conclusion is the same either way.)

(chisq <- sum((obs2 - exp2)^2 / exp2))
## [1] 5.722502
(p_value <- pchisq(q = chisq, df = length(obs2) - 1 , lower.tail = FALSE))
## [1] 0.6782835
chisq.test(x = obs2, p = den2, rescale.p = TRUE)
## 
##  Chi-squared test for given probabilities
## 
## data:  obs2
## X-squared = 5.6909, df = 8, p-value = 0.6818

We can visualize it as follows:

library(ggplot2)
library(dplyr)

# Graph of Hypothesis Test
lrr = -Inf
urr = qchisq(p = alpha, df = (length(obs2) - 1) - 1, lower.tail = FALSE)
data.frame(chi2 = 0:2000 / 100) %>%
  mutate(density = dchisq(x = chi2, df = (length(obs2) - 1) - 1)) %>%
  mutate(rr = ifelse(chi2 < lrr | chi2 > urr, density, 0)) %>%
ggplot() +
  geom_line(aes(x = chi2, y = density)) +
  geom_area(aes(x = chi2, y = rr), fill = "red", alpha = 0.3) +
#  geom_vline(aes(xintercept = pi_0), color = "black") +
  geom_vline(aes(xintercept = chisq), color = "red") +
  labs(title = bquote("One-Sample Chi-Squared Goodness-of-Fit Test"),
       subtitle = bquote("n ="~.(sum(obs2))~", r ="~.(length(obs2))~", alpha ="~.(alpha)~", Chisq_crit ="~.(round(urr,2))~", Chisq ="~.(round(chisq,2))~", p-value ="~.(round(p_value,4))),
       x = "chisq",
       y = "Density") +
  theme(legend.position="none")

4 Independence \(χ^2\) test

The Chi-square test for Independence evaluates whether the frequency counts of the \(R\) levels of one categorical variable differ among the \(C\) levels of another categorical variable. If not, the two variables are independent. The expected count in each cell \((i,j)\) comes from the product of the estimated marginal proportions: \(E_{ij} = n\hat p_{i·}\hat p_{·j} = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{n}\).

4.1 Example 4

Here, we apply Chi-Square test of independence. A project studied whether age is related to desire to ride a bicycle. The categorical variable age group has 4 levels: 18-24, 25-34, 35-49, and 50-64. The levels of categorical variable desire are yes and no.

age <- NULL                       # age group (1-4) of each of 395 respondents
age[c(1:60, 202:241)] <- 1
age[c(61:114, 242:285)] <- 2
age[c(115:160, 286:338)] <- 3
age[c(161:201, 339:395)] <- 4
bike <- NULL                      # desire to ride: 1 = yes (201), 2 = no (194)
bike[1:201] <- 1
bike[202:395] <- 2
data <- data.frame(age, bike)
data$age <- factor(data$age, levels = c(1, 2, 3, 4), 
                  labels = c("18-24", "25-34", "35-49", "50-64"))
data$bike <- factor(data$bike, levels = c(1, 2), 
                  labels = c("yes", "no"))
table(data$age, data$bike)        # the "Independence Sample" as a contingency table
##        
##         yes no
##   18-24  60 40
##   25-34  54 44
##   35-49  46 53
##   50-64  41 57

This is a test for independence because we have a single sample from one population, cross-classified by two factor variables.

df <- (4-1)*(2-1)
alpha <- 0.05
(test <- chisq.test(data$age, data$bike))
## 
##  Pearson's Chi-squared test
## 
## data:  data$age and data$bike
## X-squared = 8.0061, df = 3, p-value = 0.04589

Let’s visualize the hypothesis test:

library(tidyverse)
lrr = -Inf
urr = qchisq(p = alpha, df = df, lower.tail = FALSE)
data.frame(xi = 0:2000 / 100) %>%
  mutate(density = dchisq(x = xi, df = df)) %>%
  mutate(rr = ifelse(xi < lrr | xi > urr, density, 0)) %>%
ggplot() +
  geom_line(aes(x = xi, y = density), color = "black") +
  geom_area(aes(x = xi, y = rr), fill = "red", alpha = 0.3) +
  geom_vline(aes(xintercept = test$statistic), color = "blue") +
  labs(title = bquote("Chi-Square Test for Independence"),
       subtitle = bquote("P-value ="~.(test$p.value)),
       x = "xi^2",
       y = "Density") +
  theme(legend.position="none")

Two random variables \(x_1\) and \(x_2\) are called independent if the probability distribution of one variable is not affected by the value of the other.

Let \(f_{ij}\) be the observed frequency count of events belonging to both the \(i^{th}\) category of \(x_1\) and the \(j^{th}\) category of \(x_2\), and let \(e_{ij}\) be the corresponding expected count if \(x_1\) and \(x_2\) are independent. The null hypothesis of independence is rejected if the p-value of the following Chi-squared test statistic is less than a given significance level \(α\).

\[ χ^2=∑_{i,j} \frac {(f_{ij}−e_{ij})^2} {e_{ij}}\]

4.2 Example 5

In the built-in data set survey, the Smoke column records the students’ smoking habit, while the Exer column records their exercise level. The allowed values in Smoke are “Heavy”, “Regul” (regularly), “Occas” (occasionally), and “Never”. As for Exer, they are “Freq” (frequently), “Some”, and “None”.

library(MASS)                                     # load the MASS package (data survey)
library(DT)                                       # load the function datatable()
datatable(survey)                                 # to display R data via the DataTables

We apply Chi-squared test:

tbl = table(survey$Smoke, survey$Exer);tbl        # the contingency table
##        
##         Freq None Some
##   Heavy    7    1    3
##   Never   87   18   84
##   Occas   12    3    4
##   Regul    9    1    7
chisq.test(tbl,simulate.p.value = TRUE)           # perform the Chi-Squared test
## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  tbl
## X-squared = 5.4885, df = NA, p-value = 0.4928

As the p-value 0.4928 is greater than the .05 significance level, we do not reject the null hypothesis that the smoking habit is independent of the exercise level of the students.
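As a check, we can compute the expected counts and the statistic by hand, using the formula \(e_{ij}\) = (row total)(column total)/n from above:

e <- outer(rowSums(tbl), colSums(tbl)) / sum(tbl)  # expected counts under independence
sum((tbl - e)^2 / e)                               # reproduces X-squared = 5.4885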

5 Homogeneity \(χ^2\) test

The Chi-square test for Homogeneity evaluates whether the frequency counts of the \(R\) levels of a categorical variable are distributed identically across \(C\) different populations.

5.1 Example 6

Here, we apply the Chi-square test of Homogeneity. A project studied whether attending physicians order more unnecessary blood transfusions than residents do. The categorical variable frequency of orders has 4 levels: frequently, occasionally, rarely, and never.

library(dplyr)
library(ggplot2)
library(stats)

pop <- NULL                       # population: 49 attendings, 71 residents
pop[1:49] <- 1
pop[50:120] <- 2
lev <- NULL                       # frequency-of-orders level of each physician
lev[c(1:2, 50:64)] <- 1
lev[c(3:5, 65:92)] <- 2
lev[c(6:36, 93:115)] <- 3
lev[c(37:49, 116:120)] <- 4
dat <- data.frame(pop, lev)
dat$pop <- factor(dat$pop, levels = c(1, 2), 
                  labels = c("attending", "resident"))
dat$lev <- factor(dat$lev, levels = c(1, 2, 3, 4), 
                  labels = c("frequently", "occasionally", "rarely", "never"))

df <- (2-1)*(4-1)
alpha <- 0.05

(test <- chisq.test(dat$lev, dat$pop))
## 
##  Pearson's Chi-squared test
## 
## data:  dat$lev and dat$pop
## X-squared = 31.881, df = 3, p-value = 5.543e-07

Let’s visualize the hypothesis test:

lrr = -Inf
urr = qchisq(p = alpha, df = df, lower.tail = FALSE)
data.frame(xi = 0:400 / 10) %>%
  mutate(density = dchisq(x = xi, df = df)) %>%
  mutate(rr = ifelse(xi < lrr | xi > urr, density, 0)) %>%
ggplot() +
  geom_line(aes(x = xi, y = density), color = "black") +
  geom_area(aes(x = xi, y = rr), fill = "red", alpha = 0.3) +
  geom_vline(aes(xintercept = test$statistic), color = "blue") +
  labs(title = bquote("Chi-Square Test for Homogeneity"),
       subtitle = bquote("P-value ="~.(test$p.value)),
       x = "xi^2",
       y = "Density") +
  theme(legend.position="none")

6 Multinomial Goodness of Fit

A population is called multinomial if its data are categorical and belong to a collection of discrete non-overlapping classes. The null hypothesis of the goodness-of-fit test for a multinomial distribution is that the observed frequency \(f_i\) equals the expected count \(e_i\) in each category. It is rejected if the p-value of the following Chi-squared test statistic is less than a given significance level \(α\).

\[ χ^2=∑_i \frac {(f_i−e_i)^2} {e_i} \]
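As a brief illustration with simulated data (the probabilities here are arbitrary), a multinomial sample can be tested directly against its generating probabilities:

set.seed(1)                                      # assumption: simulated data
p0 <- c(0.2, 0.3, 0.5)                           # hypothesized category probabilities
f  <- rmultinom(1, size = 300, prob = p0)[, 1]   # one multinomial sample of n = 300
chisq.test(f, p = p0)                            # H0: f_i matches e_i = n * p0_i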

6.1 Example 7

In the built-in data set survey, the Smoke column records the survey response about the student’s smoking habit. As there are exactly four proper responses in the survey (“Heavy”, “Regul” (regularly), “Occas” (occasionally), and “Never”), the Smoke data are multinomial. Suppose the campus smoking statistics are 4.5% Heavy, 79.5% Never, 8.5% Occas, and 7.5% Regul. Determine whether the sample data in survey support these proportions at the .05 significance level.

library(MASS)                                         # load the MASS package 
smoke.freq = table(survey$Smoke)                      # observed frequencies (Heavy, Never, Occas, Regul)
smoke.prob = c(.045, .795, .085, .075)                # hypothesized campus-wide proportions
chisq.test(smoke.freq, p = smoke.prob)                # perform the Chi-Squared test
## 
##  Chi-squared test for given probabilities
## 
## data:  smoke.freq
## X-squared = 0.10744, df = 3, p-value = 0.991

As the p-value 0.991 is greater than the .05 significance level, we do not reject the null hypothesis that the sample data in survey support the campus-wide smoking statistics.

Notes: chisq.test warns that the “Chi-squared approximation may be incorrect” when expected counts are small, because the chi-square approximation to the distribution of the test statistic relies on the counts being large enough to be roughly normally distributed. If many of the expected counts are very small, the approximation may be poor. We can avoid relying on the approximation by adding simulate.p.value = TRUE to the call, which computes the p-value by Monte Carlo simulation instead.
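For example (the argument B sets the number of Monte Carlo replicates; 2000 is the default):

chisq.test(smoke.freq, p = smoke.prob, simulate.p.value = TRUE, B = 5000)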

7 Exercise

7.1 Exercise 1

Work through this in R: perform a chi-squared test on the treatment (X) and improvement (Y) columns in treatment.csv.

Solution

treatment <-read.csv("C:/Users/user/Downloads/dataset-master/treatment.csv")

head(treatment)
##   id   treatment  improvement
## 1  1     treated     improved
## 2  2     treated     improved
## 3  3 not-treated     improved
## 4  4     treated     improved
## 5  5     treated not-improved
## 6  6     treated not-improved
  • \(H_0\) : The treatment variable and improvement variable are not related
  • \(H_a\) : The treatment variable and improvement variable are related
table(treatment$treatment, treatment$improvement)
##              
##               improved not-improved
##   not-treated       26           29
##   treated           35           15
# Chi Square Test
chisq.test(treatment$treatment, treatment$improvement, correct =FALSE)
## 
##  Pearson's Chi-squared test
## 
## data:  treatment$treatment and treatment$improvement
## X-squared = 5.5569, df = 1, p-value = 0.01841

The Chi Square value is 5.5569 and the p-value is 0.01841, which is less than the significance level of 0.05. We therefore reject the null hypothesis and conclude that the treatment variable and the improvement variable are related.

7.2 Exercise 2

Find out if the cyl and carb variables in mtcars dataset are dependent or not.

  • \(H_0\) : The cyl variable and carb variable are independent
  • \(H_a\) : The cyl variable and carb variable are related.

Solution:

#Chi Square Test
chisq.test(mtcars$carb, mtcars$cyl)
## Warning in chisq.test(mtcars$carb, mtcars$cyl): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  mtcars$carb and mtcars$cyl
## X-squared = 24.389, df = 10, p-value = 0.006632

The Chi Square value is 24.389 and the p-value is 0.006632, which is less than the 0.05 significance level. We therefore reject the null hypothesis and conclude that the carb and cyl variables are related, i.e. dependent.
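The warning above arises because many cells of the carb-by-cyl table are sparse; inspecting the table (with margins added) makes this visible:

addmargins(table(mtcars$carb, mtcars$cyl))   # several cells hold 0 or 1 cars, so the
                                             # chi-square approximation is rough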

7.3 Exercise 3

256 visual artists were surveyed to find out their zodiac sign. The results were: Aries (29), Taurus (24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19), Scorpio (20), Sagittarius (23), Capricorn (18), Aquarius (20), Pisces (23). Test the hypothesis that zodiac signs are evenly distributed across visual artists. (Reference)

Solution:

zodiac <- c( "Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces")
obs <- c(29,24,22,19,21,18,19,20,23,18,20,23)

data <-data.frame(zodiac, obs)
knitr::kable(data, caption = "Sample of Zodiac")
Sample of Zodiac
zodiac obs
Aries 29
Taurus 24
Gemini 22
Cancer 19
Leo 21
Virgo 18
Libra 19
Scorpio 20
Sagittarius 23
Capricorn 18
Aquarius 20
Pisces 23
  • \(H_0:\) Zodiac signs are evenly distributed across visual artists.
  • \(H_a:\) Zodiac signs are not evenly distributed across visual artists.
# Chi Square Test
chisq.test(data$obs)
## 
##  Chi-squared test for given probabilities
## 
## data:  data$obs
## X-squared = 5.0938, df = 11, p-value = 0.9265

The Chi Square value is 5.0938 and the p-value is 0.9265, which is greater than the significance level of 0.05, so we fail to reject the null hypothesis: the data are consistent with zodiac signs being evenly distributed across visual artists.
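As a quick manual check, with equal expected counts of 256/12 ≈ 21.33 per sign:

expected <- rep(sum(obs) / 12, 12)           # 256 / 12 = 21.33 artists per sign
sum((obs - expected)^2 / expected)           # matches X-squared = 5.0938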