This report demonstrates chi-square-based tests in R across different contexts: a chi-square test for a single variance, Pearson’s chi-square test of independence (with and without Yates’ continuity correction), McNemar’s test for paired data, and Fisher’s exact test for small samples. Using several contingency tables, we assess whether categorical variables are associated, judging statistical significance from test statistics, p-values, and critical values. Each method is applied to a scenario suited to it.




Exercise 1: Chi-Square Test for a Single Variance

A manufacturer claims that the variance of his items should not differ from \(\sigma^2 = 1\). We test the hypothesis:

  • \(H_0: \sigma^2 = 1\)
  • \(H_1: \sigma^2 \neq 1\)

A sample of \(n = 25\) items has a sample variance of \(s^2 = 1.2\). We use the chi-square test for a single variance.
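For reference, the statistic computed in the code below is the standard one for testing a single variance, assuming the measurements are normally distributed; here \(\sigma_0^2\) denotes the claimed variance. With the given data it evaluates to:

\[
\chi^2 = \frac{(n-1)\,s^2}{\sigma_0^2} = \frac{(25-1)(1.2)}{1} = 28.8, \qquad df = n - 1 = 24.
\]

Under \(H_0\) this statistic follows a chi-square distribution with \(n-1\) degrees of freedom, so at level \(\alpha\) we reject \(H_0\) when \(\chi^2 < \chi^2_{\alpha/2,\,n-1}\) or \(\chi^2 > \chi^2_{1-\alpha/2,\,n-1}\), where \(\chi^2_{q,\,n-1}\) is the \(q\) quantile (the value qchisq(q, n - 1) returns in R).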

Given Data

library(knitr)       # kable()
library(kableExtra)  # kable_styling() and the %>% pipe

n <- 25  # Sample size
s2 <- 1.2  # Sample variance
sigma_Claimed <- 1  # Claimed variance

Variance Test

test_stat <- (n - 1) * s2 / sigma_Claimed  # Chi-Square test statistic
df <- n - 1  # Degrees of freedom
alpha <- 0.05  # Significance level

Critical value and P-value

# Critical values
critical_low <- qchisq(alpha / 2, df)
critical_high <- qchisq(1 - alpha / 2, df)

# P-value
p_value <- 2 * min(pchisq(test_stat, df), 1 - pchisq(test_stat, df))  # Two-tailed p-value

Collect the test statistic, degrees of freedom, p-value, and critical values in a results table

# Display results (check.names = FALSE keeps the readable column names)
result_table <- data.frame(
  Statistic = round(test_stat, 3),
  `Degrees of Freedom` = df,
  `P-Value` = round(p_value, 4),
  `Critical Value (Lower)` = round(critical_low, 3),
  `Critical Value (Upper)` = round(critical_high, 3),
  check.names = FALSE
)

kable(result_table, caption = "Variance Test Results", format = "html") %>%
  kable_styling(full_width = FALSE)
Variance Test Results

| Statistic | Degrees of Freedom | P-Value | Critical Value (Lower) | Critical Value (Upper) |
|---|---|---|---|---|
| 28.8 | 24 | 0.4555 | 12.401 | 39.364 |

Decision: The test statistic 28.8 falls within the critical interval [12.401, 39.364], so we fail to reject \(H_0\).

Conclusion: There is not enough evidence to conclude that the variance significantly differs from 1 cm².
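The steps of Exercise 1 can be wrapped in a single reusable function. This is a minimal sketch, not part of the original report: the name var_chisq_test and its arguments are hypothetical, and it assumes normally distributed data and the same two-sided test. It uses only base R (qchisq, pchisq).

```r
# Hypothetical helper (not from the original report): two-sided
# chi-square test for a single variance, assuming normally distributed data
var_chisq_test <- function(s2, n, sigma2_0, alpha = 0.05) {
  df <- n - 1
  stat <- df * s2 / sigma2_0                 # (n - 1) * s^2 / sigma_0^2
  lower <- qchisq(alpha / 2, df)             # lower critical value
  upper <- qchisq(1 - alpha / 2, df)         # upper critical value
  # Two-sided p-value: twice the smaller tail area, capped at 1
  tail_low <- pchisq(stat, df)
  tail_high <- pchisq(stat, df, lower.tail = FALSE)
  p_value <- min(1, 2 * min(tail_low, tail_high))
  list(statistic = stat, df = df, p_value = p_value,
       critical = c(lower = lower, upper = upper),
       reject = stat < lower || stat > upper)
}

# Usage with the Exercise 1 data
res <- var_chisq_test(s2 = 1.2, n = 25, sigma2_0 = 1)
res$statistic  # 28.8
res$reject     # FALSE: 28.8 lies between the two critical values
```

Using lower.tail = FALSE for the upper tail gives the same result as 1 - pchisq() here, but stays accurate when the tail probability is extremely small.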



Exercise 2: Chi-Square Test for Independence (Pearson)

A researcher studies the impact of falls on lifestyle changes among 233 polio survivors using the following contingency table:

| | Yes (Change) | No (No Change) |
|---|---|---|
| Faller | 131 | 52 |
| Non-Faller | 14 | 36 |

We test the hypothesis:

  • \(H_0\): Lifestyle change is independent of falling
  • \(H_1\): Lifestyle change is associated with falling

Reproduce the 2 by 2 table in R

observed <- matrix(c(131, 52, 14, 36), nrow = 2, byrow = TRUE)
colnames(observed) <- c("Yes", "No")
rownames(observed) <- c("Faller", "Non-Faller")
observed
##            Yes No
## Faller     131 52
## Non-Faller  14 36

Chi-Square Test and Critical Value

# Pearson's Chi-Square Test without Yates' continuity correction
chi_test <- chisq.test(observed, correct = FALSE)

# Critical value
critical_value <- qchisq(1 - alpha, chi_test$parameter)

# Display results (check.names = FALSE keeps the readable column names)
chi_results <- data.frame(
  Statistic = round(chi_test$statistic, 3),
  `Degrees of Freedom` = chi_test$parameter,
  `P-Value` = round(chi_test$p.value, 4),
  `Critical Value` = round(critical_value, 3),
  check.names = FALSE
)

kable(chi_results, caption = "Chi-Square Test Results", format = "html") %>%
  kable_styling(full_width = FALSE)
Chi-Square Test Results

| | Statistic | Degrees of Freedom | P-Value | Critical Value |
|---|---|---|---|---|
| X-squared | 31.739 | 1 | 0 | 3.841 |

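Decision: The test statistic 31.739 is greater than the critical value 3.841, so we reject \(H_0\). The p-value is far below 0.05 (p < 0.0001; the table shows 0 only because the value was rounded to four decimals).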

Conclusion: There is sufficient evidence to conclude that lifestyle change is associated with falling.
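To make the Pearson statistic less of a black box, the sketch below recomputes it by hand from the expected counts under independence and compares it with chisq.test(). It is an illustrative check using base R only; the variable names are our own.

```r
# Sketch: Pearson's chi-square statistic computed by hand for the
# Exercise 2 table, to show what chisq.test(correct = FALSE) does
observed <- matrix(c(131, 52, 14, 36), nrow = 2, byrow = TRUE,
                   dimnames = list(c("Faller", "Non-Faller"), c("Yes", "No")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum over cells of (O - E)^2 / E
x2_manual <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df_manual <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Compare with R's built-in test (no continuity correction)
x2_builtin <- unname(chisq.test(observed, correct = FALSE)$statistic)

round(expected, 2)
round(c(manual = x2_manual, builtin = x2_builtin), 3)  # both 31.739
```

All expected counts here are well above 5, so the chi-square approximation is reasonable for this table.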



Exercise 3: McNemar’s Test for Paired Data

A study examines whether a drug has an effect on a disease. The diagnosis results before and after treatment are summarized in the following contingency table:

| | After (+) | After (-) |
|---|---|---|
| Before (+) | 7 | 13 |
| Before (-) | 1 | 8 |

We test the hypothesis:

  • \(H_0\): The drug has no effect on the disease.
  • \(H_1\): The drug has an effect on the disease.

Reproduce the 2 by 2 table in R

observed <- matrix(c(7, 13, 1, 8), nrow = 2, byrow = TRUE)
colnames(observed) <- c("After (+)", "After (-)")
rownames(observed) <- c("Before (+)", "Before (-)")
observed
##            After (+) After (-)
## Before (+)         7        13
## Before (-)         1         8

McNemar’s Test and Critical Value

# Significance level
alpha <- 0.05

# Perform McNemar's Test (mcnemar.test() applies a continuity correction
# by default, correct = TRUE)
mcnemar_test <- mcnemar.test(observed)

# Compute critical value (Chi-square with 1 df at alpha = 0.05)
critical_value <- qchisq(1 - alpha, df = 1)

# Display results (check.names = FALSE keeps the readable column names)
mcnemar_results <- data.frame(
  Statistic = round(mcnemar_test$statistic, 3),
  `Degrees of Freedom` = mcnemar_test$parameter,
  `P-Value` = round(mcnemar_test$p.value, 4),
  `Critical Value` = round(critical_value, 3),
  check.names = FALSE
)

kable(mcnemar_results, caption = "McNemar's Test Results", format = "html") %>%
  kable_styling(full_width = FALSE)
McNemar’s Test Results

| | Statistic | Degrees of Freedom | P-Value | Critical Value |
|---|---|---|---|---|
| McNemar’s chi-squared | 8.643 | 1 | 0.0033 | 3.841 |

Decision: The test statistic 8.643 is greater than the critical value 3.841, so we reject \(H_0\).

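Conclusion: There is sufficient evidence to conclude that the drug has an effect on the disease: of the 14 patients whose diagnosis changed, 13 went from positive to negative and only 1 from negative to positive.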
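McNemar’s statistic depends only on the two discordant cells (patients whose diagnosis changed). The sketch below recomputes it by hand, with and without the continuity correction, and checks both against mcnemar.test(). The variable names b_count and c_count are our own; they avoid overwriting R’s c() function name.

```r
# Sketch: McNemar's statistic uses only the discordant cells
observed <- matrix(c(7, 13, 1, 8), nrow = 2, byrow = TRUE)
b_count <- observed[1, 2]  # Before (+), After (-)
c_count <- observed[2, 1]  # Before (-), After (+)

# With continuity correction (mcnemar.test() default)
stat_corrected <- (abs(b_count - c_count) - 1)^2 / (b_count + c_count)
# Without correction (mcnemar.test(..., correct = FALSE))
stat_uncorrected <- (b_count - c_count)^2 / (b_count + c_count)

builtin_corrected <- unname(mcnemar.test(observed)$statistic)
builtin_uncorrected <- unname(mcnemar.test(observed, correct = FALSE)$statistic)

round(c(stat_corrected, builtin_corrected,
        stat_uncorrected, builtin_uncorrected), 3)  # 8.643 8.643 10.286 10.286
```

The 22 concordant patients (7 and 8) do not enter the statistic at all, which is why McNemar’s test is a test of change within pairs rather than a test of independence.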



Exercise 4: Chi-Square Test for Independence (Pearson, r × c Table)

A survey was conducted among 40 voters to determine if gender is associated with political party preference. The observed data is as follows:

| Gender | P1 | P2 | P3 | Total |
|---|---|---|---|---|
| M | 8 | 9 | 4 | 21 |
| F | 11 | 3 | 5 | 19 |
| Total | 19 | 12 | 9 | 40 |

We test the hypothesis:

  • \(H_0\): Gender and political party preference are independent.
  • \(H_1\): There is an association between gender and political party preference.

Reproducing the Contingency Table in R

# Creating the observed data matrix
observed <- matrix(c(8, 9, 4, 
                     11, 3, 5), 
                   nrow = 2, byrow = TRUE)

# Naming rows and columns
colnames(observed) <- c("P1", "P2", "P3")
rownames(observed) <- c("M", "F")

# Display the table
observed
##   P1 P2 P3
## M  8  9  4
## F 11  3  5

Perform Chi-Square Test

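# Pearson's test. correct = FALSE has no effect here: Yates' correction is only
# applied to 2 x 2 tables. R warns because two expected counts are below 5.
chi_test <- chisq.test(observed, correct = FALSE)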

# Critical value for alpha = 0.05
alpha <- 0.05
critical_value <- qchisq(1 - alpha, df = chi_test$parameter)

# Results table (check.names = FALSE keeps the readable column names)
chi_results <- data.frame(
  Statistic = round(chi_test$statistic, 3),
  `Degrees of Freedom` = chi_test$parameter,
  `P-Value` = round(chi_test$p.value, 4),
  `Critical Value` = round(critical_value, 3),
  check.names = FALSE
)

# Display results
kable(chi_results, caption = "Chi-Square Test Results", format = "html") %>%
  kable_styling(full_width = FALSE)
Chi-Square Test Results

| | Statistic | Degrees of Freedom | P-Value | Critical Value |
|---|---|---|---|---|
| X-squared | 3.494 | 2 | 0.1743 | 5.991 |

Decision: The test statistic 3.494 is less than the critical value 5.991 (p = 0.1743 > 0.05), so we fail to reject \(H_0\).

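Conclusion: There is not enough evidence to conclude that gender is associated with political party preference. Note that two expected counts (in the P3 column) are below 5, so the chi-square approximation is only rough for this table.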
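The sketch below inspects the expected counts behind the Exercise 4 test and captures the warning chisq.test() raises when some are below 5. As one possible alternative (not used in the original report), it also requests a Monte Carlo p-value through chisq.test()’s simulate.p.value and B arguments; the seed and B = 10000 are arbitrary choices, and the simulated p-value varies slightly with them.

```r
# Sketch: check the expected counts behind the Exercise 4 test
observed <- matrix(c(8, 9, 4,
                     11, 3, 5), nrow = 2, byrow = TRUE,
                   dimnames = list(c("M", "F"), c("P1", "P2", "P3")))

# Record (and silence) the warning chisq.test() raises for small expected counts
warned <- FALSE
chi_test <- withCallingHandlers(
  chisq.test(observed),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

round(chi_test$expected, 3)  # the P3 column has expected counts below 5
warned                       # TRUE

# Alternative: a Monte Carlo p-value that does not rely on the chi-square
# approximation (B = number of simulated tables with the same margins)
set.seed(1)
sim_test <- chisq.test(observed, simulate.p.value = TRUE, B = 10000)
sim_test$p.value
```

For a table this small, comparing the simulated p-value with the asymptotic one (0.1743) is a quick way to judge whether the approximation matters for the decision.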



Exercise 5: Fisher’s Exact Test for Independence

A clinical trial examines whether a new drug is effective in healing a rare disease. The results are as follows:

| Treatment | Recovered | Not Recovered | Total |
|---|---|---|---|
| Drug | 3 | 1 | 4 |
| Placebo | 2 | 3 | 5 |
| Total | 5 | 4 | 9 |

We test the hypothesis:

  • \(H_0\): Treatment and recovery are independent (the drug has no effect).
  • \(H_1\): The drug is associated with recovery.

Creating the Contingency Table in R

# Creating the matrix for Fisher's Exact Test
data_matrix <- matrix(c(3, 1, 2, 3), nrow = 2, byrow = TRUE)

# Naming rows and columns
rownames(data_matrix) <- c("Drug", "Placebo")
colnames(data_matrix) <- c("Recovered", "Not Recovered")

# Display the table
data_matrix
##         Recovered Not Recovered
## Drug            3             1
## Placebo         2             3

Function to compute, from the factorial formula, the probability of the observed table under \(H_0\) (the hypergeometric probability on which Fisher’s exact test is built)

# Probability of the observed 2 x 2 table with all margins fixed:
# (a+b)! (c+d)! (a+c)! (b+d)! / (n! a! b! c! d!)
# This is a point probability, not a test statistic like the chi-square value.
fisher_statistic_manual <- function(a, b, c, d) {
  n <- a + b + c + d
  prob <- factorial(a + c) * factorial(b + d) * factorial(a + b) * factorial(c + d) /
          (factorial(n) * factorial(a) * factorial(b) * factorial(c) * factorial(d))
  return(prob)
}

# Extract observed values
a <- data_matrix[1,1]  # Drug, Recovered
b <- data_matrix[1,2]  # Drug, Not Recovered
c <- data_matrix[2,1]  # Placebo, Recovered
d <- data_matrix[2,2]  # Placebo, Not Recovered

Performing the Tests

# Pearson Chi-Square Test, for comparison only: every expected count is below 5,
# so R warns that the chi-square approximation may be incorrect
chi_test <- chisq.test(data_matrix, correct = FALSE)

# Perform Fisher’s Exact Test
fisher_res <- fisher.test(data_matrix)

# Probability of the observed table under H0, from the factorial formula
fisher_stat <- fisher_statistic_manual(a, b, c, d)

# Critical value for Chi-Square at α = 0.05
alpha <- 0.05
critical_value <- qchisq(1 - alpha, df = chi_test$parameter)

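# Create results table. Fisher's test has no degrees of freedom or critical value,
# and its "Statistic" entry is the probability of the observed table.
results <- data.frame(
  Test = c("Chi-Square", "Fisher Exact"),
  Statistic = c(round(chi_test$statistic, 3), round(fisher_stat, 6)),
  `Degrees of Freedom` = c(chi_test$parameter, NA),  # No df for Fisher
  `P-Value` = c(round(chi_test$p.value, 4), round(fisher_res$p.value, 4)),
  `Critical Value` = c(round(critical_value, 3), NA),  # No critical value for Fisher
  `Odds Ratio` = c(NA, round(fisher_res$estimate, 3)),  # Conditional MLE, Fisher only
  check.names = FALSE,
  row.names = NULL
)

# Display results as an HTML table
kable(results, caption = "Comparison of Chi-Square and Fisher's Exact Test", format = "html") %>%
  kable_styling(full_width = FALSE)
Comparison of Chi-Square and Fisher’s Exact Test

| Test | Statistic | Degrees of Freedom | P-Value | Critical Value | Odds Ratio |
|---|---|---|---|---|---|
| Chi-Square | 1.10200 | 1 | 0.2937 | 3.841 | NA |
| Fisher Exact | 0.31746 | NA | 0.5238 | NA | 3.764 |

Decision: Fisher’s exact test gives p = 0.5238 > 0.05, so we fail to reject \(H_0\). The value 0.31746 is the probability of the observed table under \(H_0\), not a chi-square statistic, so it is not compared with a critical value. The Pearson result (1.102 < 3.841, p = 0.2937) points the same way, but it is unreliable here because all four expected counts are below 5, which is why Fisher’s exact test is the appropriate choice for this table.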

Conclusion: There is not enough evidence to conclude that the drug is associated with recovery. With only 9 patients the test has little power, so this result does not demonstrate that the drug is ineffective.
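Fisher’s two-sided p-value can be rebuilt from the hypergeometric distribution: list every table with the same margins, compute each table’s probability with base R’s dhyper(), and add up those no more likely than the observed table. This is a sketch for illustration; the small relative tolerance (1e-7) guards against floating-point ties and is similar in spirit to the one fisher.test() applies internally.

```r
# Sketch: Fisher's exact test for the Exercise 5 table, rebuilt from the
# hypergeometric distribution (all margins fixed, H0: independence)
tab <- matrix(c(3, 1, 2, 3), nrow = 2, byrow = TRUE)

row1 <- sum(tab[1, ])  # Drug total (4)
col1 <- sum(tab[, 1])  # Recovered total (5)
col2 <- sum(tab[, 2])  # Not Recovered total (4)

# Possible values of the top-left cell given the margins
support <- max(0, row1 - col2):min(row1, col1)

# dhyper(x, m, n, k): x recovered among the k = row1 drug patients,
# drawn from m = col1 recovered and n = col2 not recovered
probs <- dhyper(support, col1, col2, row1)
p_obs <- dhyper(tab[1, 1], col1, col2, row1)  # same as the factorial formula

# Two-sided p-value: sum the probabilities of all tables no more likely
# than the observed one
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

round(c(p_obs = p_obs, p_two_sided = p_two_sided,
        fisher = fisher.test(tab)$p.value), 5)  # 0.31746 0.52381 0.52381
```

The five possible tables have probabilities 1, 20, 60, 40 and 5 out of 126; only the most likely one (60/126) is excluded, giving 66/126 ≈ 0.5238.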



Exercise 6: Chi-Square Test for Independence (Yates’ Correction)

A survey was conducted among 31 voters to determine if gender is associated with political party preference. The results are summarized in the following contingency table:

| | P1 | P2 | Total |
|---|---|---|---|
| M | 8 | 9 | 17 |
| F | 11 | 3 | 14 |
| Total | 19 | 12 | 31 |

We test the hypothesis:

  • \(H_0\): Gender is independent of political party preference.
  • \(H_1\): Gender is associated with political party preference.

Reproduce the 2 by 2 table in R

# Create the observed contingency table
observed <- matrix(c(8, 9, 11, 3), nrow = 2, byrow = TRUE)

# Set row and column names
colnames(observed) <- c("P1", "P2")
rownames(observed) <- c("Male", "Female")

# Display the table
observed
##        P1 P2
## Male    8  9
## Female 11  3

Chi-Square Test with Yates’ Correction and Critical Value

# Define significance level
alpha <- 0.05  

# Perform Chi-Square Test with Yates' correction
chi_test <- chisq.test(observed, correct = TRUE)  

# Critical value for df = 1 at 5% significance level
critical_value <- qchisq(1 - alpha, df = 1)

# Display results (check.names = FALSE keeps the readable column names)
chi_results <- data.frame(
  Statistic = round(chi_test$statistic, 3),
  `Degrees of Freedom` = chi_test$parameter,
  `P-Value` = round(chi_test$p.value, 4),
  `Critical Value` = round(critical_value, 3),
  check.names = FALSE
)

# Print results in a formatted table
kable(chi_results, caption = "Chi-Square Test Results with Yates' Correction", format = "html") %>%
  kable_styling(full_width = FALSE)
Chi-Square Test Results with Yates’ Correction

| | Statistic | Degrees of Freedom | P-Value | Critical Value |
|---|---|---|---|---|
| X-squared | 2.022 | 1 | 0.155 | 3.841 |

Decision: The test statistic 2.022 is less than the critical value 3.841, so we fail to reject \(H_0\).

Conclusion: There is not enough evidence to conclude that gender is associated with political party preference.
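Yates’ correction shrinks each |O − E| by 0.5 before squaring. The sketch below recomputes the corrected and uncorrected statistics by hand for the Exercise 6 table and checks both against chisq.test(); it mirrors the fact that chisq.test() uses min(0.5, |O − E|) as the shrinkage, which only differs from 0.5 when some |O − E| is below 0.5.

```r
# Sketch: Yates' continuity correction computed by hand for Exercise 6
observed <- matrix(c(8, 9, 11, 3), nrow = 2, byrow = TRUE,
                   dimnames = list(c("Male", "Female"), c("P1", "P2")))

expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Shrinkage: 0.5, capped at the smallest |O - E| (in a 2 x 2 table all
# |O - E| are equal, here 75/31, so the shrinkage is simply 0.5)
yates <- min(0.5, abs(observed - expected))

x2_yates <- sum((abs(observed - expected) - yates)^2 / expected)
x2_plain <- sum((observed - expected)^2 / expected)

builtin_yates <- unname(chisq.test(observed, correct = TRUE)$statistic)
builtin_plain <- unname(chisq.test(observed, correct = FALSE)$statistic)

round(c(yates = x2_yates, plain = x2_plain), 3)  # 2.022 3.213
```

The correction lowers the statistic from 3.213 to 2.022; here both values stay below the critical value 3.841, so the decision does not depend on whether the correction is applied.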