2024-02-25

Introduction to the Chi-Squared Goodness-of-Fit (GOF) Test

  • Assumptions
    • random sample
    • categorical variable
    • expected frequencies ≥ 1
    • no more than 20% of the expected frequencies < 5
  • The chi-squared statistic \(X^2\) is calculated as:

\[X^2 = \sum \frac{(Observed - Expected)^2}{Expected}\]
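The two frequency conditions above can be checked mechanically before running the test. The following is a minimal sketch, not part of the original analysis: `check_gof_assumptions` is a hypothetical helper name, the zodiac expected counts assume the 250 births and 12 signs used in this example, and the `sparse` vector is made-up data included only to show a failing case.

```r
# Hypothetical helper: checks the two expected-frequency conditions
# for a chi-squared goodness-of-fit test.
check_gof_assumptions <- function(expected) {
  all_at_least_one <- all(expected >= 1)   # every expected count is >= 1
  share_below_five <- mean(expected < 5)   # fraction of expected counts < 5
  list(
    all_at_least_one = all_at_least_one,
    share_below_five = share_below_five,
    ok = all_at_least_one && share_below_five <= 0.20
  )
}

# Zodiac example: 250 births spread uniformly over 12 signs (assumed from the data table).
zodiac_expected <- rep(250 / 12, 12)
zodiac_check <- check_gof_assumptions(zodiac_expected)
print(zodiac_check$ok)

# Made-up sparse expected counts that violate both conditions.
sparse <- c(0.5, 3, 4, 40, 52.5)
sparse_check <- check_gof_assumptions(sparse)
print(sparse_check$ok)
```

The random-sample and categorical-variable assumptions concern the study design and cannot be verified from the counts alone.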

Research Question

Are zodiac signs uniformly distributed among successful individuals? We test this at a significance level of α = 0.05.

  • Null Hypothesis (\(H_0\)): Births are uniformly distributed over zodiac signs.
  • Alternative Hypothesis (\(H_a\)): Births are not uniformly distributed over zodiac signs.

Data Table

Data are in the table below:
Sign    Aries  Taurus  Gemini  Cancer  Leo  Virgo  Libra  Scorpio  Sagittarius  Capricorn  Aquarius  Pisces
Births  18     19      19      24      23   18     21     22       18           17         23        28

Data through Plotly

GGPLOT Visualization: Observed vs Expected Frequencies

Calculation of Test Statistic

\(H_0\): Births are uniformly distributed over zodiac signs.
\(H_a\): Births are not uniformly distributed over zodiac signs.

Test statistic calculation: \[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \] where \(O_i\) is the observed frequency and \(E_i\) is the expected frequency.

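With 250 births in total across 12 signs, each expected frequency under the null hypothesis is \(E_i = 250/12 \approx 20.833\). Substituting the observed and expected frequencies: \[ \chi^2 = \frac{(18 - 20.833)^2}{20.833} + \frac{(19 - 20.833)^2}{20.833} + \cdots + \frac{(28 - 20.833)^2}{20.833} \] \[ \chi^2 = 5.648 \]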
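As a cross-check on the hand calculation, the same statistic can be obtained from base R's `chisq.test()`, which tests against equal category probabilities when no `p` argument is supplied. This is a sketch under the assumption that the counts are entered in the order of the data table; the variable names are illustrative.

```r
# Observed births per zodiac sign, in table order (Aries through Pisces).
observed <- c(Aries = 18, Taurus = 19, Gemini = 19, Cancer = 24,
              Leo = 23, Virgo = 18, Libra = 21, Scorpio = 22,
              Sagittarius = 18, Capricorn = 17, Aquarius = 23, Pisces = 28)

# Manual computation: expected count is total / number of categories.
expected <- rep(sum(observed) / length(observed), length(observed))
manual_statistic <- sum((observed - expected)^2 / expected)

# Built-in goodness-of-fit test; with no `p`, equal probabilities are assumed.
gof <- chisq.test(observed)
builtin_statistic <- unname(gof$statistic)
degrees_of_freedom <- unname(gof$parameter)

print(c(manual = manual_statistic, builtin = builtin_statistic))
print(degrees_of_freedom)
```

Both routes should agree on the statistic, and the degrees of freedom equal the number of categories minus one.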

Code that computes the test statistic and critical value

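# Observed births per zodiac sign (Aries through Pisces)
observed <- c(18, 19, 19, 24, 23, 18, 21, 22, 18, 17, 23, 28)
# Expected counts under a uniform distribution: 250 / 12 each
expected <- rep(sum(observed) / length(observed), length(observed))
chi_square_statistic <- sum((observed - expected)^2 / expected)
chi_square_statistic
## [1] 5.648
# Critical value at alpha = 0.05 with df = 12 - 1 = 11
critical_value <- qchisq(p = 0.95, df = 11)
c(chi_square_statistic, critical_value)
## [1]  5.64800 19.67514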

GGPLOT Visualization: Chi-squared distribution

Conclusion

  • From the chi-squared distribution with df = 11, the critical value at α = 0.05 is 19.675. The test statistic (5.648) is smaller than the critical value, so it falls outside the rejection region.
  • Fail to reject the null hypothesis. There is not enough evidence to suggest that births are not uniformly distributed over zodiac signs.
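The same decision can also be reached through a p-value rather than a critical value. The sketch below is an illustration, not part of the original slides: it takes the statistic 5.648 and df = 11 from the calculation above and uses `pchisq()` for the upper-tail probability. The variable names are assumptions made for this example.

```r
# Values taken from the worked example above.
chi_square_statistic <- 5.648
degrees_of_freedom <- 11
alpha <- 0.05

# Critical-value approach: reject H0 if the statistic exceeds the critical value.
critical_value <- qchisq(p = 1 - alpha, df = degrees_of_freedom)
reject_by_critical_value <- chi_square_statistic > critical_value

# P-value approach: upper-tail probability of the chi-squared distribution.
p_value <- pchisq(chi_square_statistic, df = degrees_of_freedom, lower.tail = FALSE)
reject_by_p_value <- p_value < alpha

print(c(critical_value = critical_value, p_value = p_value))
print(c(reject_by_critical_value, reject_by_p_value))
```

The two rules are equivalent, so they always lead to the same decision at a given α.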