This report presents chi-square calculations in R across different contexts: a chi-square test for a single variance, Pearson’s Chi-Square test of independence, Chi-Square with Yates’ correction, McNemar’s test for paired data, and Fisher’s Exact test as an exact alternative for small samples. Using various contingency tables, we assess the association between categorical variables and judge statistical significance from test statistics, p-values, and critical values. Each method is applied in a scenario suited to it.
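A manufacturer claims that the variance of his items does not differ from \(\sigma^2 = 1\). We test the hypotheses \(H_0: \sigma^2 = 1\) versus \(H_1: \sigma^2 \neq 1\).
A sample of \(n = 25\) items has a sample variance of \(s^2 = 1.2\). Assuming the measurements are normally distributed, we use the Chi-Square test statistic \(\chi^2 = (n - 1)s^2 / \sigma_0^2\), where \(\sigma_0^2 = 1\) is the claimed variance; under \(H_0\) it follows a chi-square distribution with \(n - 1 = 24\) degrees of freedom.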
Given Data
n <- 25 # Sample size
s2 <- 1.2 # Sample variance
sigma_Claimed <- 1 # Claimed variance
Variance Test
test_stat <- (n - 1) * s2 / sigma_Claimed # Chi-Square test statistic
df <- n - 1 # Degrees of freedom
alpha <- 0.05 # Significance level
Critical value and P-value
# Critical values
critical_low <- qchisq(alpha / 2, df)
critical_high <- qchisq(1 - alpha / 2, df)
# P-value
p_value <- 2 * min(pchisq(test_stat, df), 1 - pchisq(test_stat, df)) # Two-tailed p-value
Extract and display the relevant values: the test statistic, degrees of freedom, p-value and critical values
# Display results (kable() comes from knitr; kable_styling() and the %>% pipe from kableExtra)
library(knitr)
library(kableExtra)
result_table <- data.frame(
Statistic = round(test_stat, 3),
`Degrees of Freedom` = df,
`P-Value` = round(p_value, 4),
`Critical Value (Lower)` = round(critical_low, 3),
`Critical Value (Upper)` = round(critical_high, 3),
check.names = FALSE # keep readable names; the default would produce Degrees.of.Freedom etc.
)
kable(result_table, caption = "Variance Test Results", format = "html") %>%
kable_styling(full_width = FALSE)
| Statistic | Degrees of Freedom | P-Value | Critical Value (Lower) | Critical Value (Upper) |
|---|---|---|---|---|
| 28.8 | 24 | 0.4555 | 12.401 | 39.364 |
Decision: The test statistic 28.8 lies between the lower and upper critical values [12.401, 39.364], i.e. outside the rejection region, and the p-value 0.4555 exceeds 0.05, so we fail to reject \(H_0\).
Conclusion: There is not enough evidence to conclude that the variance significantly differs from 1 cm².
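The steps above can be collected into one reusable function. This is a sketch under the same assumptions as the test itself (normally distributed data, two-sided alternative); the name `var_chisq_test` is hypothetical and is not part of base R.

```r
# Hypothetical helper: two-sided chi-square test for a single variance.
# Assumes the sample comes from a normal population.
var_chisq_test <- function(s2, n, sigma2_0, alpha = 0.05) {
  df <- n - 1
  stat <- df * s2 / sigma2_0                       # (n - 1) * s^2 / sigma_0^2
  lower <- pchisq(stat, df)                        # P(X <= stat) under H0
  p_value <- 2 * min(lower, 1 - lower)             # two-tailed p-value
  crit <- qchisq(c(alpha / 2, 1 - alpha / 2), df)  # lower and upper critical values
  list(statistic = stat, df = df, p_value = p_value, critical = crit,
       reject = stat < crit[1] || stat > crit[2])  # TRUE if stat falls in a rejection tail
}

res <- var_chisq_test(s2 = 1.2, n = 25, sigma2_0 = 1)
str(res)
```

Called with the report's values, it reproduces the statistic 28.8, the critical values 12.401 and 39.364, and `reject = FALSE`.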
A researcher studies the impact of falls on lifestyle changes among 233 polio survivors using the following contingency table:
| | Yes (Change) | No (No Change) |
|---|---|---|
| Faller | 131 | 52 |
| Non-Faller | 14 | 36 |
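We test the hypotheses \(H_0\): lifestyle change is independent of falling status, versus \(H_1\): lifestyle change is associated with falling status.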
Reproduce the 2 by 2 table in R
observed <- matrix(c(131, 52, 14, 36), nrow = 2, byrow = TRUE)
colnames(observed) <- c("Yes", "No")
rownames(observed) <- c("Faller", "Non-Faller")
observed
## Yes No
## Faller 131 52
## Non-Faller 14 36
Chi square test and Critical value
# Chi-Square Test
chi_test <- chisq.test(observed, correct = FALSE)
# Critical value
critical_value <- qchisq(1 - alpha, chi_test$parameter)
# Display results
chi_results <- data.frame(
Statistic = round(chi_test$statistic, 3),
`Degrees of Freedom` = chi_test$parameter,
`P-Value` = round(chi_test$p.value, 4),
`Critical Value` = round(critical_value, 3),
check.names = FALSE # keep the readable column names
)
kable(chi_results, caption = "Chi-Square Test Results", format = "html") %>%
kable_styling(full_width = FALSE)
| | Statistic | Degrees of Freedom | P-Value | Critical Value |
|---|---|---|---|---|
| X-squared | 31.739 | 1 | 0 | 3.841 |
Decision: The test statistic 31.739 is greater than the critical value 3.841 (the p-value, shown as 0 after rounding to four decimals, is below 0.0001), so we reject \(H_0\).
Conclusion: There is sufficient evidence to conclude that lifestyle change is associated with falling.
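As a cross-check, the Pearson statistic reported by `chisq.test()` can be rebuilt from the expected counts under independence. The sketch below recomputes it by hand for the fall/lifestyle table using only base R; the variable names are illustrative.

```r
# Rebuild the 2x2 table so this sketch runs on its own
observed <- matrix(c(131, 52, 14, 36), nrow = 2, byrow = TRUE,
                   dimnames = list(c("Faller", "Non-Faller"), c("Yes", "No")))

# Expected counts under independence: E_ij = row_total_i * col_total_j / n
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic without continuity correction: sum((O - E)^2 / E)
x2_manual <- sum((observed - expected)^2 / expected)
df_pearson <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_manual <- pchisq(x2_manual, df_pearson, lower.tail = FALSE)

# Compare with the built-in test (correct = FALSE, as in the report)
x2_builtin <- unname(chisq.test(observed, correct = FALSE)$statistic)
c(manual = x2_manual, builtin = x2_builtin, p = p_manual)
```

Both values agree at about 31.739, and the p-value is far below 0.0001, which is why it rounds to 0 in the table.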
A study examines whether a drug has an effect on a disease. The diagnosis results before and after treatment are summarized in the following contingency table:
| | After (+) | After (-) |
|---|---|---|
| Before (+) | 7 | 13 |
| Before (-) | 1 | 8 |
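We test the hypotheses \(H_0\): the drug has no effect, i.e. the proportion of positive diagnoses is the same before and after treatment, versus \(H_1\): the proportions differ.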
Reproduce the 2 by 2 table in R
observed <- matrix(c(7, 13, 1, 8), nrow = 2, byrow = TRUE)
colnames(observed) <- c("After (+)", "After (-)")
rownames(observed) <- c("Before (+)", "Before (-)")
observed
## After (+) After (-)
## Before (+) 7 13
## Before (-) 1 8
McNemar’s Test and Critical Value
# Significance level
alpha <- 0.05
# Perform McNemar's Test (mcnemar.test applies a continuity correction by default, correct = TRUE)
mcnemar_test <- mcnemar.test(observed)
# Compute critical value (Chi-square with 1 df at alpha = 0.05)
critical_value <- qchisq(1 - alpha, df = 1)
# Display results
mcnemar_results <- data.frame(
Statistic = round(mcnemar_test$statistic, 3),
`Degrees of Freedom` = mcnemar_test$parameter,
`P-Value` = round(mcnemar_test$p.value, 4),
`Critical Value` = round(critical_value, 3),
check.names = FALSE # keep the readable column names
)
kable(mcnemar_results, caption = "McNemar's Test Results", format = "html") %>%
kable_styling(full_width = FALSE)
| | Statistic | Degrees of Freedom | P-Value | Critical Value |
|---|---|---|---|---|
| McNemar’s chi-squared | 8.643 | 1 | 0.0033 | 3.841 |
Decision: The test statistic 8.643 is greater than the critical value 3.841, so we reject \(H_0\).
Conclusion: There is sufficient evidence to conclude that the drug has an effect on the disease.
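McNemar's statistic uses only the discordant cells (13 patients went from + to -, 1 from - to +); the concordant cells 7 and 8 drop out. The sketch below recomputes the continuity-corrected statistic \((|b - c| - 1)^2 / (b + c) = 121/14\) by hand and compares it with `mcnemar.test()`; the variable names are illustrative.

```r
# Paired before/after table from this section
observed <- matrix(c(7, 13, 1, 8), nrow = 2, byrow = TRUE)

n_pos_neg <- observed[1, 2]  # Before (+), After (-): discordant pairs
n_neg_pos <- observed[2, 1]  # Before (-), After (+): discordant pairs

# Continuity-corrected McNemar statistic: (|b - c| - 1)^2 / (b + c)
mcnemar_manual <- (abs(n_pos_neg - n_neg_pos) - 1)^2 / (n_pos_neg + n_neg_pos)
p_manual <- pchisq(mcnemar_manual, df = 1, lower.tail = FALSE)

mcnemar_builtin <- unname(mcnemar.test(observed)$statistic)
c(manual = mcnemar_manual, builtin = mcnemar_builtin, p = p_manual)
```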
A survey was conducted among 40 voters to determine if gender is associated with political party preference. The observed data is as follows:
| Gender | P1 | P2 | P3 | Total |
|---|---|---|---|---|
| M | 8 | 9 | 4 | 21 |
| F | 11 | 3 | 5 | 19 |
| Total | 19 | 12 | 9 | 40 |
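We test the hypotheses \(H_0\): gender and political party preference are independent, versus \(H_1\): they are associated.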
Reproducing the Contingency Table in R
# Creating the observed data matrix
observed <- matrix(c(8, 9, 4,
11, 3, 5),
nrow = 2, byrow = TRUE)
# Naming rows and columns
colnames(observed) <- c("P1", "P2", "P3")
rownames(observed) <- c("M", "F")
# Display the table
observed
## P1 P2 P3
## M 8 9 4
## F 11 3 5
Perform Chi-Square Test
chi_test <- chisq.test(observed, correct = FALSE) # 'correct' only affects 2x2 tables, so it has no effect on this 2x3 table
# Critical value for alpha = 0.05
alpha <- 0.05
critical_value <- qchisq(1 - alpha, df = chi_test$parameter)
# Results table
chi_results <- data.frame(
Statistic = round(chi_test$statistic, 3),
`Degrees of Freedom` = chi_test$parameter,
`P-Value` = round(chi_test$p.value, 4),
`Critical Value` = round(critical_value, 3),
check.names = FALSE # keep the readable column names
)
# Display results
kable(chi_results, caption = "Chi-Square Test Results", format = "html") %>%
kable_styling(full_width = FALSE)
| | Statistic | Degrees of Freedom | P-Value | Critical Value |
|---|---|---|---|---|
| X-squared | 3.494 | 2 | 0.1743 | 5.991 |
Decision: The test statistic 3.494 is less than the critical value 5.991 (2 degrees of freedom), and the p-value 0.1743 exceeds 0.05, so we fail to reject \(H_0\).
Conclusion: There is not enough evidence to conclude that gender is associated with political party preference.
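The chi-square approximation relies on expected counts that are not too small; a common rule of thumb asks for at least 5 in every cell. The sketch below computes the expected counts for the 2x3 table and shows that two cells (both in column P3) fall below 5, which also makes `chisq.test()` issue a warning. The threshold of 5 is a convention, not a hard requirement.

```r
observed <- matrix(c(8, 9, 4,
                     11, 3, 5),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("M", "F"), c("P1", "P2", "P3")))

# Expected counts under independence: E_ij = row_total_i * col_total_j / n
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 3)

# Cells whose expected count falls below the rule-of-thumb threshold of 5
which(expected < 5, arr.ind = TRUE)

# chisq.test() warns when any expected count is below 5; capture the message
warning_msg <- tryCatch({
  chisq.test(observed)
  "no warning"
}, warning = function(w) conditionMessage(w))
warning_msg
```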
A clinical trial examines whether a new drug is effective in healing a rare disease. The results are as follows:
| Treatment | Recovered | Not Recovered | Total |
|---|---|---|---|
| Drug | 3 | 1 | 4 |
| Placebo | 2 | 3 | 5 |
| Total | 5 | 4 | 9 |
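We test the hypotheses \(H_0\): recovery is independent of treatment (the drug has no effect), versus \(H_1\): recovery depends on treatment.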
Creating the Contingency Table in R
# Creating the matrix for Fisher's Exact Test
data_matrix <- matrix(c(3, 1, 2, 3), nrow = 2, byrow = TRUE)
# Naming rows and columns
rownames(data_matrix) <- c("Drug", "Placebo")
colnames(data_matrix) <- c("Recovered", "Not Recovered")
# Display the table
data_matrix
## Recovered Not Recovered
## Drug 3 1
## Placebo 2 3
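Function to compute manually, with the factorial formula, the probability of the observed table under \(H_0\) (the hypergeometric probability on which Fisher’s Exact Test is built)
# Probability of the observed 2x2 table given its margins: (a+b)!(c+d)!(a+c)!(b+d)! / (n! a! b! c! d!)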
fisher_statistic_manual <- function(a, b, c, d) {
n <- a + b + c + d
prob <- factorial(a + c) * factorial(b + d) * factorial(a + b) * factorial(c + d) /
(factorial(n) * factorial(a) * factorial(b) * factorial(c) * factorial(d))
return(prob)
}
# Extract observed values
a <- data_matrix[1,1] # Drug, Recovered
b <- data_matrix[1,2] # Drug, Not Recovered
c <- data_matrix[2,1] # Placebo, Recovered
d <- data_matrix[2,2] # Placebo, Not Recovered
Performing Test
# Perform Chi-Square Test (R warns here that the approximation may be incorrect, since every expected count is below 5)
chi_test <- chisq.test(data_matrix, correct = FALSE)
# Perform Fisher’s Exact Test
fisher_res <- fisher.test(data_matrix)
# Compute Fisher’s statistic manually
fisher_stat <- fisher_statistic_manual(a, b, c, d)
# Critical value for Chi-Square at α = 0.05
alpha <- 0.05
critical_value <- qchisq(1 - alpha, df = chi_test$parameter)
# Create results table (unname() drops vector names so they do not become row names)
results <- data.frame(
Test = c("Chi-Square", "Fisher Exact"),
Statistic = c(round(unname(chi_test$statistic), 3), round(fisher_stat, 6)), # Fisher: probability of the observed table
`Degrees of Freedom` = c(unname(chi_test$parameter), NA), # No df for Fisher
`P-Value` = c(round(chi_test$p.value, 4), round(fisher_res$p.value, 4)),
`Critical Value` = c(round(critical_value, 3), NA), # A chi-square critical value does not apply to Fisher
`Odds Ratio` = c(NA, round(unname(fisher_res$estimate), 3)), # Conditional MLE, only for Fisher
check.names = FALSE # keep the readable column names
)
# Display results as an HTML table
kable(results, caption = "Comparison of Chi-Square and Fisher's Exact Test", format = "html") %>%
kable_styling(full_width = FALSE)
| Test | Statistic | Degrees of Freedom | P-Value | Critical Value | Odds Ratio |
|---|---|---|---|---|---|
| Chi-Square | 1.10200 | 1 | 0.2937 | 3.841 | NA |
| Fisher Exact | 0.31746 | NA | 0.5238 | NA | 3.764 |
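Decision: The p-value of Fisher’s Exact Test (0.5238) is greater than \(\alpha = 0.05\), so we fail to reject \(H_0\). The value 0.31746 is the probability of the observed table, not a chi-square statistic, so it is not compared with the critical value 3.841. The Pearson test (statistic 1.102 < 3.841, p-value 0.2937) leads to the same decision, although its approximation is unreliable with expected counts this small.
Conclusion: There is not enough evidence to conclude that the drug is effective in healing the rare disease; the data are consistent with recovery being independent of treatment.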
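The two-sided p-value of Fisher's Exact Test can be reproduced with `dhyper()`: with all margins fixed, every possible table is indexed by its top-left cell, and the p-value sums the probabilities of tables that are no more likely than the observed one. The sketch below does this for the drug/placebo table; the small relative tolerance mirrors the one R's `fisher.test()` uses to guard against floating-point ties, and the variable names are illustrative.

```r
data_matrix <- matrix(c(3, 1, 2, 3), nrow = 2, byrow = TRUE,
                      dimnames = list(c("Drug", "Placebo"), c("Recovered", "Not Recovered")))

drug_total <- sum(data_matrix[1, ])       # row total for Drug (k draws)
recovered_total <- sum(data_matrix[, 1])  # column total for Recovered (m)
not_rec_total <- sum(data_matrix[, 2])    # column total for Not Recovered (n)

# All feasible values of the top-left cell with these margins
support <- max(0, drug_total - not_rec_total):min(drug_total, recovered_total)
probs <- dhyper(support, m = recovered_total, n = not_rec_total, k = drug_total)

# Probability of the observed table (same value as fisher_statistic_manual)
p_obs <- dhyper(data_matrix[1, 1], m = recovered_total, n = not_rec_total, k = drug_total)

# Two-sided p-value: sum over tables no more likely than the observed one
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

c(p_observed = p_obs, p_two_sided = p_two_sided,
  fisher_test = fisher.test(data_matrix)$p.value)
```

Here the feasible tables have probabilities 1/126, 20/126, 60/126, 40/126 and 5/126; the observed table has 40/126 (about 0.31746), and the p-value sums everything except 60/126, giving 66/126 (about 0.5238).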
A survey was conducted among 31 voters to determine if gender is associated with political party preference. The results are summarized in the following contingency table:
| Gender | P1 | P2 | Total |
|---|---|---|---|
| M | 8 | 9 | 17 |
| F | 11 | 3 | 14 |
| Total | 19 | 12 | 31 |
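We test the hypotheses \(H_0\): gender and political party preference are independent, versus \(H_1\): they are associated.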
Reproduce the 2 by 2 table in R
# Create the observed contingency table
observed <- matrix(c(8, 9, 11, 3), nrow = 2, byrow = TRUE)
# Set row and column names
colnames(observed) <- c("P1", "P2")
rownames(observed) <- c("Male", "Female")
# Display the table
observed
## P1 P2
## Male 8 9
## Female 11 3
Chi-Square Test with Yates’ Correction and Critical Value
# Define significance level
alpha <- 0.05
# Perform Chi-Square Test with Yates' correction
chi_test <- chisq.test(observed, correct = TRUE)
# Critical value for df = 1 at 5% significance level
critical_value <- qchisq(1 - alpha, df = 1)
# Display results
chi_results <- data.frame(
Statistic = round(chi_test$statistic, 3),
`Degrees of Freedom` = chi_test$parameter,
`P-Value` = round(chi_test$p.value, 4),
`Critical Value` = round(critical_value, 3),
check.names = FALSE # keep the readable column names
)
# Print results in a formatted table
kable(chi_results, caption = "Chi-Square Test Results with Yates' Correction", format = "html") %>%
kable_styling(full_width = FALSE)
| | Statistic | Degrees of Freedom | P-Value | Critical Value |
|---|---|---|---|---|
| X-squared | 2.022 | 1 | 0.155 | 3.841 |
Decision: The test statistic 2.022 is less than the critical value 3.841, so we fail to reject \(H_0\).
Conclusion: There is not enough evidence to conclude that gender is associated with political party preference.
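For a 2x2 table with cells \(a, b, c, d\), row totals \(R_1, R_2\) and column totals \(C_1, C_2\), the Yates-corrected statistic has the closed form \(n(|ad - bc| - n/2)^2 / (R_1 R_2 C_1 C_2)\), valid when \(|ad - bc| \ge n/2\) (here \(75 \ge 15.5\)). The sketch below evaluates it alongside the uncorrected Pearson statistic and compares it with `chisq.test(correct = TRUE)`; the variable names are illustrative.

```r
observed <- matrix(c(8, 9, 11, 3), nrow = 2, byrow = TRUE,
                   dimnames = list(c("Male", "Female"), c("P1", "P2")))

n11 <- observed[1, 1]; n12 <- observed[1, 2]
n21 <- observed[2, 1]; n22 <- observed[2, 2]
n_total <- sum(observed)
margins <- prod(rowSums(observed)) * prod(colSums(observed))  # R1 * R2 * C1 * C2

# Uncorrected Pearson statistic and Yates-corrected statistic (closed forms for 2x2)
x2_plain <- n_total * (n11 * n22 - n12 * n21)^2 / margins
x2_yates <- n_total * (abs(n11 * n22 - n12 * n21) - n_total / 2)^2 / margins

x2_builtin <- unname(chisq.test(observed, correct = TRUE)$statistic)
c(uncorrected = x2_plain, yates = x2_yates, builtin = x2_builtin)
```

The corrected value (about 2.022) matches the table above; the uncorrected value (about 3.213) is also below 3.841, so in this example the decision does not depend on the correction.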