Chi-Square Tests

RSM2074 Lecture Week 9

Dr. Jun Ho Chai

The Plan for Today

  • What do “Parametric” and “Non-parametric” mean?
  • Deciding when to use non-parametric tests
  • The Chi-Square test for Goodness of Fit
  • The Chi-Square test for Independence

Parametric vs Non-Parametric

What is Parametric?

Parametric: Tests that assume the data come from a distribution described by parameters (e.g., a normal distribution with a Mean & SD)

Testing Normality

Shapiro-Wilk & Kolmogorov-Smirnov tests

When to Check Normality

Null hypothesis (H₀): Data comes from a normally distributed population

  • We don’t want to reject H₀!
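  • If p < .05, the data depart significantly from normality
    • Consider a non-parametric test
  • Exception: Large samples (>40)
    • Central Limit Theorem: the sampling distribution of the mean approaches normal as n grows, even if the raw data are not normal
    • Can still use parametric tests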
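The rule of thumb above can be written as a small decision function. This is only a sketch of the slide's logic: the function name `choose_test_family` and its parameter names are made up for illustration, and the cut-offs (.05 and 40) are the ones stated on this slide.

```python
def choose_test_family(normality_p: float, n: int,
                       alpha: float = 0.05, large_n: int = 40) -> str:
    """Apply the slide's rule of thumb for picking a family of tests.

    normality_p: p-value from a normality test (e.g. Shapiro-Wilk)
    n:           sample size
    """
    if normality_p >= alpha:
        # H0 (normality) is not rejected, so parametric tests are reasonable
        return "parametric"
    if n > large_n:
        # Normality rejected, but the sample is large enough to lean on the CLT
        return "parametric"
    # Normality rejected and the sample is small
    return "non-parametric"


# Hypothetical p-values and sample sizes
print(choose_test_family(0.20, 25))   # normality not rejected
print(choose_test_family(0.01, 25))   # normality rejected, small sample
print(choose_test_family(0.01, 100))  # normality rejected, large sample
```

Treat this as a starting point for a judgment call, not a replacement for looking at the data (histograms, Q-Q plots).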

Why Non-Parametric?

When parametric assumptions are violated:

  • Non-parametric tests are more robust
  • Don’t assume normal distribution
  • Don’t assume specific parameters
  • Can handle outliers better

Trade-off:

  • Less statistical power when parametric assumptions do hold (harder to detect real effects)
  • But safer when assumptions are violated

Data Types in Statistics

Categorical outcome (counts) → Chi-Square | Continuous outcome → t-test, ANOVA

Chi-Square Tests

What is Chi-Square?

Compares Observed vs Expected frequencies

Hypotheses

Null Hypothesis (H₀): Observed frequencies match Expected frequencies

Alternative Hypothesis (H₁): Observed frequencies differ from Expected
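The gap between observed and expected frequencies is summarized in a single number, the standard Pearson chi-square statistic. The symbols are notation introduced here: O_i is the observed count in category i, E_i the expected count, and k the number of categories (or cells).

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

If H₀ is true, the observed counts stay close to the expected counts and χ² stays small. Large values of χ² count as evidence against H₀.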

Two Types of Chi-Square

Which Chi-Square Test?

Identify the correct test for each scenario:

Chi-Square Goodness of Fit

Testing a Single Variable

Do my observed data match the expected distribution?

Example 1: Fair Coin

Simple example with equal expected frequencies

Flip a Coin

Comparing Observed vs Expected

See how deviations accumulate into χ²:

Calculate χ²

Keep expected constant, adjust observed:
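A minimal sketch of the coin calculation, with expected held at 50/50 while observed changes. The flip counts are hypothetical (100 flips, chosen for easy arithmetic), and the helper name `chi_square_statistic` is made up for this example.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical experiment: 100 flips of a coin assumed to be fair
flips = 100
expected = [flips / 2, flips / 2]  # [heads, tails] = [50, 50]

# Keep expected constant, adjust the observed number of heads
for heads in (50, 55, 60, 70):
    observed = [heads, flips - heads]
    chi2 = chi_square_statistic(observed, expected)
    print(f"heads={heads}, tails={flips - heads}, chi-square={chi2:.1f}")
```

Note that χ² grows with the square of the deviation: moving from 55 to 60 heads doubles the deviation but quadruples χ².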

Practice Problem Part 1

Survey of 120 students - Expected frequencies:

Practice Problem Part 2

Calculate differences for each major:

Practice Problem Part 3

Divide by expected to get final χ² components:
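The three parts of the practice problem (expected, differences, divide by expected) can be traced in a few lines. Only the sample size of 120 comes from the slide. The major names, the observed counts, and the assumption of equal expected frequencies are hypothetical values chosen for illustration.

```python
# Hypothetical majors and observed counts (they sum to 120)
majors = ["Psychology", "Business", "Engineering", "Arts"]
observed = [40, 30, 25, 25]

# Part 1: expected frequencies, assuming equal preference across majors
n = sum(observed)
expected = [n / len(majors)] * len(majors)  # 30 per major

components = []
for major, o, e in zip(majors, observed, expected):
    diff = o - e              # Part 2: difference (O - E)
    squared = diff ** 2       # square it so negatives don't cancel
    component = squared / e   # Part 3: divide by expected
    components.append(component)
    print(f"{major:<12} O={o:>3}  E={e:>5.1f}  O-E={diff:>6.1f}  "
          f"(O-E)^2={squared:>6.1f}  (O-E)^2/E={component:.3f}")

chi_square = sum(components)
df = len(majors) - 1
print(f"chi-square = {chi_square:.3f}, df = {df}")
```

With these made-up counts, χ² comes out at about 5.0 with df = 3.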

Degrees of Freedom: Formula

Try different numbers of categories:
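For the goodness-of-fit test, degrees of freedom depend only on the number of categories. Here k is notation introduced for the number of categories.

```latex
df = k - 1
```

For example, a coin has k = 2 outcomes, so df = 1. A variable with k = 4 categories has df = 3. One category is "used up" because the counts must add up to the fixed total.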

Why df Matters: Critical Values

Same χ² can mean different things with different df:
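A small sketch of why the same χ² leads to different conclusions. The critical values are the usual table values at α = .05, rounded to three decimals. The test statistic of 5.0 is a hypothetical value, and the names `CRITICAL_05` and `is_significant` are made up for this example.

```python
# Upper-tail chi-square critical values at alpha = .05 (standard table values)
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def is_significant(chi_square: float, df: int) -> bool:
    """Reject H0 when the statistic exceeds the critical value for this df."""
    return chi_square > CRITICAL_05[df]


chi_square = 5.0  # hypothetical test statistic
for df in sorted(CRITICAL_05):
    decision = "reject H0" if is_significant(chi_square, df) else "fail to reject H0"
    print(f"df={df}: critical value={CRITICAL_05[df]:.3f} -> {decision}")
```

With df = 1 a value of 5.0 exceeds the cut-off, but from df = 2 upward it does not. More categories give χ² more opportunities to grow by chance, so the bar is higher.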

Chi-Square Test of Independence

Two Variables

Does Variable A’s distribution depend on Variable B?

Building a 2×2 Table

Start with totals, cells to be determined:

Expected Values Logic

Key insight: Each cell depends on row and column totals
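Written as a formula, the expected count for a cell is its row total times its column total, divided by the grand total. The symbols are notation introduced here: R_i is the total of row i, C_j the total of column j, and N the grand total.

```latex
E_{ij} = \frac{R_i \times C_j}{N}
```

This is the count we would expect in each cell if the two variables were unrelated, that is, if H₀ were true.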

Expected Values: Step-by-Step

Breaking down each cell calculation:

Calculate Expected Values

Adjust totals and see how expected changes:
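A minimal sketch of the expected-value calculation from the margins alone. The totals are hypothetical, and the function name `expected_counts` is made up for this example.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence: row total * column total / N."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row totals and column totals must sum to the same N")
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical totals, N = 100
print(expected_counts([60, 40], [50, 50]))  # columns split evenly
print(expected_counts([60, 40], [70, 30]))  # shift the column totals
```

Changing any total changes every expected value in that row or column, because each cell is built from both margins.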

Build Your Own 2×2 Table

Change observed and watch expected update:
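The full test of independence can be sketched in plain Python: totals first, then expected values, then χ². The observed counts are hypothetical, and the function name `chi_square_independence` is made up for this example. It works for any table size, not only 2×2.

```python
def chi_square_independence(observed):
    """Return (chi_square, df, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected count for each cell: row total * column total / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum (O - E)^2 / E over every cell
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, df, expected


# Hypothetical 2x2 table (rows = Variable A, columns = Variable B)
observed = [[30, 20],
            [20, 30]]
chi_square, df, expected = chi_square_independence(observed)
print("expected:", expected)
print(f"chi-square = {chi_square:.1f}, df = {df}")
```

This sketch uses the uncorrected Pearson statistic. Some software applies a continuity correction to 2×2 tables by default, so its output may be slightly smaller.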

Degrees of Freedom: Formula

Try different table sizes:
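For the test of independence, degrees of freedom depend on the size of the table. Here r and c are notation introduced for the number of rows and columns (excluding the totals).

```latex
df = (r - 1)(c - 1)
```

For example, a 2×2 table has df = 1, a 3×2 table has df = 2, and a 3×3 table has df = 4. Once the totals are fixed, only that many cells are free to vary.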

Why df Matters: Critical Values

Same χ² with different df = different conclusions:

Real-World Applications

Example: Coffee & Ethnicity

Malaysia beverage preference study (n=200)

Interpreting Coffee & Ethnicity

Comparing observed vs expected values:

Example: Student Performance

Is final grade associated with attendance? (n=150)

Interpreting Student Performance

Comparing observed vs expected values:

Example: Social Media Platform

Platform preference by age group (n=300)

Interpreting Social Media Platform

Comparing observed vs expected values:

Summary & Practice

Decision Tree

Key Formulas

Key Concepts

Thank You!