Chi-Squared Test and Probability – A Step-by-Step Guide

What is a Chi-Squared Test?

A chi-squared (\(\chi^2\)) test is a statistical method used to analyze tabular data with frequencies or counts. It helps determine whether two qualitative variables are related or independent. Common applications include examining:

  • Hair colour vs. eye colour
  • Gender vs. handedness
  • Social status vs. crime rates

Example Dataset: Handedness by Gender

Here’s a contingency table showing gender and handedness:

Gender    Right-Handed  Left-Handed  Total
Male                43            9     52
Female              44            4     48
Total               87           13    100

Hypotheses:

  • Null hypothesis (\(H_0\)): Gender and handedness are independent
  • Alternative hypothesis (\(H_1\)): There is a relationship between gender and handedness

Conducting the Chi-Squared Test

Step 1: State the Hypotheses

  • \(H_0\): The variables are independent
  • \(H_1\): The variables are dependent

Step 2: Compute the Test Statistic

Use this formula for each cell:

\[ \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

Where:

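  • \(O_{ij}\) = Observed frequency in row \(i\), column \(j\)
  • \(E_{ij}\) = Expected frequency if \(H_0\) were true, \(E_{ij} = \frac{R_i \times C_j}{N}\), where \(R_i\) is the row total, \(C_j\) the column total and \(N\) the grand total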
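A minimal Python sketch of the expected-frequency calculation, applied to the handedness table above. It uses the standard formula \(E_{ij} = R_i C_j / N\) (row total times column total over the grand total); the helper name `expected_counts` and the list-of-rows layout are illustrative choices, not part of the original guide.

```python
# Sketch: expected counts under H0 (independence) for an r x c contingency table,
# using E_ij = (row total i * column total j) / grand total.

def expected_counts(observed):
    """Return expected counts assuming the row and column variables are independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Handedness table: rows are Male, Female; columns are Right-Handed, Left-Handed.
handedness = [
    [43, 9],
    [44, 4],
]

expected = expected_counts(handedness)
print(expected)  # [[45.24, 6.76], [41.76, 6.24]]
```

Each expected count keeps the row and column totals of the observed table; only the split inside the table changes.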

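Step 3: Calculate and Sum the Terms

Applying Step 2 to the course-by-gender data in the Student Distribution by Course and Gender table (see Probability Insights), the expected counts under \(H_0\) are 15 for every Maths and Equine Studies cell and 30 for each Chemistry cell, so:

\[ \chi^2 = \frac{(20 - 15)^2}{15} + \frac{(10 - 15)^2}{15} + \frac{(10 - 15)^2}{15} + \frac{(20 - 15)^2}{15} + \frac{(30 - 30)^2}{30} + \frac{(30 - 30)^2}{30} \approx 6.67 \]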
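The sum in Step 3 can be reproduced with a short Python sketch. It assumes the course table is entered with rows Maths, Equine Studies, Chemistry and columns Male, Female; `chi_squared_statistic` and `degrees_of_freedom` are hypothetical helper names, not library functions.

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_squared_statistic(observed):
    """Sum (O - E)^2 / E over every cell of the contingency table."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

def degrees_of_freedom(observed):
    """(rows - 1) * (columns - 1) for an r x c table."""
    return (len(observed) - 1) * (len(observed[0]) - 1)

# Rows: Maths, Equine Studies, Chemistry; columns: Male, Female.
courses = [
    [20, 10],
    [10, 20],
    [30, 30],
]

stat = chi_squared_statistic(courses)
print(round(stat, 2), degrees_of_freedom(courses))  # 6.67 2
```

The four Maths and Equine Studies cells each contribute \(25/15 \approx 1.67\); the Chemistry cells match their expected counts exactly and contribute nothing.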

Step 4: Compare with the Critical Value

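  • Degrees of freedom: \((r-1)(c-1) = (2-1)(3-1) = 2\) for 2 genders and 3 courses
  • Critical value at the 2.5% significance level: 7.378 (the 5% critical value is 5.991)

Decision:
Since 6.67 < 7.378, we fail to reject the null hypothesis at the 2.5% level: there is insufficient evidence of a relationship between gender and course choice. At the 5% level, however, 6.67 > 5.991, so the same data would lead us to reject \(H_0\) (the p-value is about 0.036).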
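With 2 degrees of freedom the chi-squared distribution reduces to an exponential distribution with mean 2, so its upper tail is \(P(\chi^2 > x) = e^{-x/2}\) and the critical value at significance level \(\alpha\) is \(-2\ln\alpha\). The Python sketch below uses this closed form to check critical values and the p-value for the 6.67 statistic. It is valid only for 2 degrees of freedom, and the function names are our own.

```python
import math

def chi2_sf_df2(x):
    """Upper-tail probability P(X > x) for a chi-squared variable with 2 degrees of freedom."""
    # With 2 degrees of freedom the chi-squared distribution is exponential with mean 2.
    return math.exp(-x / 2)

def chi2_critical_df2(alpha):
    """Critical value c with P(X > c) = alpha, for 2 degrees of freedom only."""
    return -2 * math.log(alpha)

stat = 20 / 3  # exact value of the 6.67 statistic from Step 3

for alpha in (0.05, 0.025):
    c = chi2_critical_df2(alpha)
    decision = "reject H0" if stat > c else "fail to reject H0"
    print(f"alpha = {alpha}: critical value {c:.3f}, {decision}")
print(f"p-value = {chi2_sf_df2(stat):.4f}")

# alpha = 0.05: critical value 5.991, reject H0
# alpha = 0.025: critical value 7.378, fail to reject H0
# p-value = 0.0357
```

For other degrees of freedom there is no such simple closed form; a library routine such as SciPy's `scipy.stats.chi2.sf` (upper tail) and `scipy.stats.chi2.ppf` (quantiles) is the usual choice.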


Probability Insights

Student Distribution by Course and Gender

Course          Male  Female  Total
Maths             20      10     30
Equine Studies    10      20     30
Chemistry         30      30     60
Total             60      60    120

Basic Probability

Probability a student is in Equine Studies:

\[ P(\text{Equine}) = \frac{30}{120} = 0.25 \]
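The same calculation can be done for every course at once. This Python sketch uses `fractions.Fraction` to keep the probabilities exact; the dictionary layout is just one convenient choice.

```python
from fractions import Fraction

# Course totals from the Student Distribution table (120 students in all).
course_totals = {"Maths": 30, "Equine Studies": 30, "Chemistry": 60}
grand_total = sum(course_totals.values())

# P(course) = number of students in the course / total number of students.
marginal = {course: Fraction(n, grand_total) for course, n in course_totals.items()}

for course, p in marginal.items():
    print(course, p, float(p))
# Maths 1/4 0.25
# Equine Studies 1/4 0.25
# Chemistry 1/2 0.5
```

Because every student is in exactly one course, these marginal probabilities sum to 1.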

Conditional Probability

Probability a student is in Equine Studies given they are female:

\[ P(\text{Equine} \mid \text{Female}) = \frac{P(\text{Equine} \cap \text{Female})}{P(\text{Female})} = \frac{20/120}{60/120} = \frac{1}{3} \approx 0.33 \]
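A small Python sketch of the conditional-probability calculation, again using exact fractions. The nested-dictionary layout and the helper names `p_course` and `p_course_given_gender` are illustrative assumptions.

```python
from fractions import Fraction

# Counts from the Student Distribution table: course -> gender -> number of students.
counts = {
    "Maths": {"Male": 20, "Female": 10},
    "Equine Studies": {"Male": 10, "Female": 20},
    "Chemistry": {"Male": 30, "Female": 30},
}
grand_total = sum(sum(by_gender.values()) for by_gender in counts.values())  # 120

def p_course(course):
    """Marginal probability P(course)."""
    return Fraction(sum(counts[course].values()), grand_total)

def p_course_given_gender(course, gender):
    """Conditional probability P(course | gender) = P(course and gender) / P(gender)."""
    p_joint = Fraction(counts[course][gender], grand_total)
    p_gender = Fraction(sum(by_gender[gender] for by_gender in counts.values()), grand_total)
    return p_joint / p_gender

print(p_course_given_gender("Equine Studies", "Female"))  # 1/3
print(p_course("Equine Studies"))  # 1/4
```

If course and gender were independent, \(P(\text{Equine} \mid \text{Female})\) would equal \(P(\text{Equine})\). The gap between 1/3 and 1/4 is the kind of departure from independence that the chi-squared test assesses across all cells at once.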


Chi-Squared Test Formula (General)

\[ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \]

Where:

  • \(\chi^2\) = Chi-squared statistic
  • \(O_i\) = Observed frequency in category \(i\)
  • \(E_i\) = Expected frequency in category \(i\)
  • \(n\) = Number of categories (for a contingency table, the number of cells)
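The general formula is a single sum, so it can be sketched in a few lines of Python. Here the four cells of the handedness table are flattened into one list of categories, with expected counts computed as row total × column total ÷ 100 under independence; `chi_squared` is a hypothetical helper name.

```python
def chi_squared(observed, expected):
    """Compute chi^2 = sum over categories of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Handedness cells flattened as Male-Right, Male-Left, Female-Right, Female-Left.
observed = [43, 9, 44, 4]
# Expected counts under independence: row total * column total / grand total (100).
expected = [52 * 87 / 100, 52 * 13 / 100, 48 * 87 / 100, 48 * 13 / 100]

print(round(chi_squared(observed, expected), 3))  # 1.777
```

A 2 × 2 table has \((2-1)(2-1) = 1\) degree of freedom, and the 5% critical value for 1 degree of freedom is 3.841, so by this arithmetic the handedness data give no evidence against independence at the 5% level.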

Summary

The chi-squared test is a valuable tool for testing whether two categorical variables are independent, that is, whether the differences between observed and expected counts are larger than chance alone would explain.