Chi-Squared Test and Probability – A Step-by-Step Guide
What is a Chi-Squared Test?
A chi-squared (\(\chi^2\)) test is a statistical method used to analyze tabular data with frequencies or counts. It helps determine whether two qualitative variables are related or independent. Common applications include examining:
- Hair colour vs. eye colour
- Gender vs. handedness
- Social status vs. crime rates
Example Dataset: Handedness by Gender
Here’s a contingency table showing gender and handedness:
Gender | Right-Handed | Left-Handed | Total |
---|---|---|---|
Male | 43 | 9 | 52 |
Female | 44 | 4 | 48 |
Total | 87 | 13 | 100 |
Hypotheses:
- Null hypothesis (\(H_0\)): Gender and handedness are independent
- Alternative hypothesis (\(H_1\)): There is a relationship between gender and handedness
Conducting the Chi-Squared Test
Step 1: State the Hypotheses
- \(H_0\): The variables are independent
- \(H_1\): The variables are dependent
Step 2: Compute the Test Statistic
Use this formula for each cell:
\[ \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]
Where:
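- \(O_{ij}\) = Observed frequency in row \(i\), column \(j\)
- \(E_{ij}\) = Expected frequency if \(H_0\) were true, computed as \(E_{ij} = \frac{\text{row } i \text{ total} \times \text{column } j \text{ total}}{\text{grand total}}\)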
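The expected frequencies can be sketched in a few lines of Python. This is a minimal illustration, assuming the table is stored as a list of rows without the totals; the function name `expected_counts` is hypothetical, and the data is the course-by-gender table from the Probability Insights section.

```python
def expected_counts(observed):
    """Expected frequencies under independence:
    row total * column total / grand total for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: Maths, Equine Studies, Chemistry; columns: Male, Female
course_by_gender = [[20, 10], [10, 20], [30, 30]]
expected = expected_counts(course_by_gender)
print(expected)
```

Each Maths and Equine Studies cell has an expected count of 15, and each Chemistry cell has an expected count of 30.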
Step 3: Calculate and Sum the Terms
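The numbers below come from the course-by-gender table in the Probability Insights section, where the expected count is 15 for each Maths and Equine Studies cell and 30 for each Chemistry cell:
\[ T = \frac{(20 - 15)^2}{15} + \frac{(10 - 15)^2}{15} + \frac{(10 - 15)^2}{15} + \frac{(20 - 15)^2}{15} + \frac{(30 - 30)^2}{30} + \frac{(30 - 30)^2}{30} \approx 6.67 \]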
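The same sum can be reproduced programmatically. The sketch below assumes the observed counts are given as a list of rows without totals; `chi_squared_statistic` is a hypothetical helper name, not a library function.

```python
def chi_squared_statistic(observed):
    """Sum of (O - E)^2 / E over all cells of a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count for this cell if the variables were independent
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    return statistic

# Rows: Maths, Equine Studies, Chemistry; columns: Male, Female
T = chi_squared_statistic([[20, 10], [10, 20], [30, 30]])
print(round(T, 2))
```

The two Chemistry cells contribute nothing to the sum because their observed and expected counts are equal.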
Step 4: Compare with the Critical Value
- Degrees of freedom: \((2-1)(3-1) = 2\)
- Critical value at 5% significance: 5.991
Decision:
Since 6.67 > 5.991, we reject the null hypothesis at the 5% level. There is evidence of a relationship between gender and course choice. At the stricter 2.5% level the critical value is 7.378, and the null hypothesis would not be rejected.
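As a cross-check of the comparison in Step 4, the critical value and p-value can be computed directly. This sketch relies on a property specific to 2 degrees of freedom: the upper-tail probability of the chi-squared distribution is then \(e^{-x/2}\), so only the standard library is needed. The helper names are hypothetical, and other degrees of freedom would need a statistics library such as SciPy.

```python
import math

def p_value_df2(statistic):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-statistic / 2)

def critical_value_df2(alpha):
    """Value exceeded with probability alpha under chi-squared with 2 df."""
    return -2 * math.log(alpha)

T = 20 / 3      # test statistic from Step 3, about 6.67
alpha = 0.05    # significance level

p_value = p_value_df2(T)
critical = critical_value_df2(alpha)
reject = T > critical

print(f"p-value = {p_value:.3f}, critical value = {critical:.3f}, reject H0: {reject}")
```

Comparing the statistic with the critical value and comparing the p-value with the significance level always lead to the same decision.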
Probability Insights
Student Distribution by Course and Gender
Program | Male | Female | Total |
---|---|---|---|
Maths | 20 | 10 | 30 |
Equine Studies | 10 | 20 | 30 |
Chemistry | 30 | 30 | 60 |
Total | 60 | 60 | 120 |
Basic Probability
Probability a student is in Equine Studies:
\[ P(\text{Equine}) = \frac{30}{120} = 0.25 \]
Conditional Probability
Probability a student is in Equine Studies given they are female:
\[ P(\text{Equine} \mid \text{Female}) = \frac{20/120}{60/120} = \frac{20}{60} \approx 0.33 \]
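Both probabilities can be read off the table with a short script. This is a minimal sketch, assuming the counts are held in a nested dictionary; the variable names are illustrative only.

```python
# Course-by-gender counts from the table above (totals are derived, not stored)
counts = {
    "Maths": {"Male": 20, "Female": 10},
    "Equine Studies": {"Male": 10, "Female": 20},
    "Chemistry": {"Male": 30, "Female": 30},
}

total = sum(sum(by_gender.values()) for by_gender in counts.values())
female_total = sum(by_gender["Female"] for by_gender in counts.values())

# Marginal probability: all Equine Studies students over all students
p_equine = sum(counts["Equine Studies"].values()) / total

# Conditional probability: female Equine Studies students over all female students
p_equine_given_female = counts["Equine Studies"]["Female"] / female_total

print(round(p_equine, 2), round(p_equine_given_female, 2))
```

If course and gender were independent, the two values would be equal. Here 0.33 differs from 0.25, which is the kind of discrepancy the chi-squared test measures.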
Chi-Squared Test Formula (General)
\[ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \]
Where:
- \(\chi^2\) = Chi-squared statistic
- \(O_i\) = Observed frequency
- \(E_i\) = Expected frequency
- \(n\) = Number of categories (for a contingency table, the number of cells)
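The general formula translates almost directly into code. The sketch below applies it to the handedness table from the start of this guide, with the four cells flattened into lists; the function name `chi_squared` is hypothetical, and the expected counts are computed as row total times column total divided by the grand total.

```python
def chi_squared(observed, expected):
    """General chi-squared statistic: sum of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Cell order: male right, male left, female right, female left
observed = [43, 9, 44, 4]
# Row totals 52 and 48, column totals 87 and 13, grand total 100
expected = [52 * 87 / 100, 52 * 13 / 100, 48 * 87 / 100, 48 * 13 / 100]

statistic = chi_squared(observed, expected)
print(round(statistic, 2))
```

For a 2 × 2 table there is \((2-1)(2-1) = 1\) degree of freedom, and the 5% critical value is 3.841. The statistic here is about 1.78, so these data would not lead to rejecting the hypothesis that gender and handedness are independent.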
Summary
The chi-squared test is a valuable tool in analyzing the independence between categorical variables and determining whether observed differences are statistically significant.