2023-10-15

Types of Chi-squared Tests

There are 2 types of Chi-squared tests:

  • Goodness of fit
  • Test of independence

Each of these tests uses the same formula to find the chi-squared statistic:

\(\chi^2=\sum_{i=1}^{n}\frac{(O_i-E_i)^2}{E_i}\)

where \(\chi^2\) is the chi-squared statistic, \(O_i\) is the observed value and \(E_i\) is the expected value for category \(i\), and \(n\) is the number of categories.
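The formula translates almost directly into R. The following is a minimal sketch; the helper name `chi_squared` and the die counts are made up for illustration and do not come from the original material.

```r
# Hypothetical helper: Pearson's chi-squared statistic from observed and
# expected counts (one entry per category).
chi_squared <- function(observed, expected) {
  stopifnot(length(observed) == length(expected), all(expected > 0))
  sum((observed - expected)^2 / expected)
}

# Made-up counts for a die rolled 12 times (illustration only):
# each face is expected 12 / 6 = 2 times.
observed <- c(3, 1, 2, 2, 3, 1)
expected <- rep(2, 6)
chi_squared(observed, expected)  # 4 faces are off by 1: 4 * (1^2 / 2) = 2
```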

Goodness of Fit Tests vs. Tests of Independence

Goodness of fit tests are used to test how far the observed data for one variable differ from the expected values. Example: expected vs. actual number of times each face comes up when a die is rolled 12 times.

Tests of independence, on the other hand, test whether two categorical variables are related to each other. Example: whether the number of people in a city who come to an event depends on the invitation method, when one group is given paper flyers and the other group is invited by email.

Both tests have similar methods and results: the observed and expected values are tabulated and used to find the chi-squared value, which tells you how far the observed values are from the expected ones. The larger the value, the worse the agreement.

Method of Chi-Squared Tests:

  1. Create a null hypothesis \((H_0)\) and an alternative hypothesis \((H_a)\).
  2. Create a data table of \(O_i\) and \(E_i\) (chosen by tester) based on the variables chosen.
  3. Calculate \(\chi^2\) using the equation on the previous slide.
  4. Compare \(\chi^2\) to the critical value, which is read from a chi-square table using the degrees of freedom and the significance level of the test. For a goodness of fit test with \(k\) categories, \(df = k-1\). For a test of independence, \(df = (r_d-1)(c_d-1)\), where \(r_d\) is the number of rows and \(c_d\) is the number of columns of observed data. The standard significance level is \(\alpha = 0.05\). If \(\chi^2\) is greater than the critical value, reject \(H_0\).
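Step 4 does not require a printed table: base R's `qchisq` returns the critical value and `pchisq` the corresponding p-value. A minimal sketch, assuming an upper-tail test; the helper name `chi_sq_decision` is hypothetical and not part of base R.

```r
# Hypothetical helper: compare a chi-squared statistic with its critical value.
chi_sq_decision <- function(statistic, df, alpha = 0.05) {
  critical <- qchisq(1 - alpha, df = df)                       # table lookup
  p_value  <- pchisq(statistic, df = df, lower.tail = FALSE)   # upper tail
  list(critical = critical, p_value = p_value,
       reject_h0 = statistic > critical)
}

# Critical values at alpha = 0.05 for 1, 2 and 3 degrees of freedom.
round(qchisq(0.95, df = 1:3), 3)  # 3.841 5.991 7.815
```

Calling `chi_sq_decision(statistic, df)` with a computed statistic returns the critical value, the p-value, and whether \(H_0\) would be rejected.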

Goodness of Fit Test Example:

If we have a bag of 120 candies in 3 different colors and assume the colors are equally common, we expect 40 of each color in the bag. The null hypothesis is that there are 40 of each color, and the alternative hypothesis is that there are not 40 of each color.

Color <- c("Red", "Blue", "Green")
Observed <- c(60, 50, 10)
Expected <- c(40, 40, 40)
Candy <- data.frame(Color, Observed, Expected)
Candy
##   Color Observed Expected
## 1   Red       60       40
## 2  Blue       50       40
## 3 Green       10       40

The chi-squared value can now be computed using this data.

Goodness of Fit Example Cont.:

##   Color Observed Expected ChiSquared
## 1   Red       60       40       10.0
## 2  Blue       50       40        2.5
## 3 Green       10       40       22.5
## [1] 35
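The code behind this output is not shown. One way it could have been produced, as a sketch reusing the variable names from the earlier data frame (the column name `ChiSquared` is taken from the output; the exact original code is an assumption):

```r
Color    <- c("Red", "Blue", "Green")
Observed <- c(60, 50, 10)
Expected <- c(40, 40, 40)
Candy    <- data.frame(Color, Observed, Expected)

# Contribution of each color to the statistic: (O - E)^2 / E
Candy$ChiSquared <- (Candy$Observed - Candy$Expected)^2 / Candy$Expected
Candy

# Overall chi-squared value: the sum of the per-color contributions
sum(Candy$ChiSquared)
```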

Adding up the chi-squared contribution from each color gives an overall chi-squared value of 35. Using a chi-square distribution table with 2 degrees of freedom (3-1) and a significance level of 0.05, the critical value is 5.991. Since our chi-squared value is greater than the critical value, we reject the null hypothesis and conclude that the observed color counts do not follow expectations.
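The same conclusion can be cross-checked with R's built-in `chisq.test`, which performs a goodness of fit test when given a vector of counts and a vector of hypothesized probabilities. This is a sketch for verification only, not necessarily how the original numbers were produced.

```r
Observed <- c(Red = 60, Blue = 50, Green = 10)

# Critical value for df = 3 - 1 = 2 at the 0.05 significance level
critical <- qchisq(0.95, df = length(Observed) - 1)

# Goodness of fit against equal proportions (40 of each out of 120)
fit <- chisq.test(Observed, p = rep(1/3, 3))
fit$statistic   # X-squared = 35
fit$parameter   # df = 2
fit$statistic > critical
```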

Goodness of Fit Example Graph:

Test of Independence Example:

In a city with a population of 1000, 300 people were given flyer invitations to a local party and 700 were given email invitations. Suppose we expect that 600 of the people given emails will come to the party and 100 will stay home, and that 250 of the people given flyers will come to the party and 50 will stay home:

##   Method_of_Invitation Observed_Party_goers Observed_Ditchers
## 1                Flyer                  200               100
## 2                Email                  600               100
##   Expected_Party_goers Expected_Ditchers
## 1                  250                50
## 2                  600               100

Test of Independence Example Chi-Squared:

##   Method_of_Invitation Observed_Party_goers Observed_Ditchers
## 1                Flyer                  200               100
## 2                Email                  600               100
##   Expected_Party_goers Expected_Ditchers ChiSquared_partygoers
## 1                  250                50                    10
## 2                  600               100                     0
##   ChiSquared_Ditchers
## 1                  50
## 2                   0

Adding up the chi-squared values from the party goers and the people who stayed at home gives us a total chi-squared value of 60.
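The code for these tables is not shown. A sketch of how they could be built, with column names copied from the output; the data frame name `Invites` is a hypothetical choice.

```r
Invites <- data.frame(
  Method_of_Invitation = c("Flyer", "Email"),
  Observed_Party_goers = c(200, 600),
  Observed_Ditchers    = c(100, 100),
  Expected_Party_goers = c(250, 600),
  Expected_Ditchers    = c(50, 100)
)

# Per-cell contributions (O - E)^2 / E for each outcome column
Invites$ChiSquared_partygoers <- with(
  Invites, (Observed_Party_goers - Expected_Party_goers)^2 / Expected_Party_goers)
Invites$ChiSquared_Ditchers <- with(
  Invites, (Observed_Ditchers - Expected_Ditchers)^2 / Expected_Ditchers)

# Total chi-squared value: 10 + 0 + 50 + 0
total <- sum(Invites$ChiSquared_partygoers, Invites$ChiSquared_Ditchers)
total
```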

Test of Independence Example Results:

Like the Goodness of Fit example, we can use the chi-squared distribution table with a significance level of 0.05. The observed data form a table with 2 rows (flyer, email) and 2 columns (party-goers, ditchers), so the degrees of freedom are (2-1)*(2-1)=1 and the critical value is 3.841. Since 60>3.841, the observed attendance does not follow expectations. As we can see, this test was very similar to the goodness of fit example, except that the test of independence looks at the relationship between two variables instead of just one.
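For comparison, a conventional test of independence derives the expected counts from the row and column totals of the observed table rather than from values chosen by the tester. A sketch using base R's `chisq.test` on the observed counts; the matrix layout and the name `observed` are assumptions for illustration, and the resulting statistic differs from the 60 computed with the tester-chosen expectations.

```r
observed <- matrix(c(200, 100,
                     600, 100),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Invitation = c("Flyer", "Email"),
                                   Outcome    = c("Party_goer", "Ditcher")))

# correct = FALSE turns off Yates' continuity correction, which R applies
# to 2x2 tables by default, so the result matches the plain formula.
res <- chisq.test(observed, correct = FALSE)

res$expected    # row total * column total / 1000: 240, 60 / 560, 140
res$statistic   # about 47.62
res$parameter   # df = (2 - 1) * (2 - 1) = 1
qchisq(0.95, df = res$parameter)  # critical value, about 3.841
```

Under this version of the test the statistic is still far above the critical value, so the conclusion that attendance depends on the invitation method is unchanged.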

Now we can visualize this data using a bar graph of the observed data vs the expected data for the party-goers and a bar graph showing the observed data vs the expected data for the Ditchers.
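The plotting code is not shown. A minimal sketch of how such bar graphs could be drawn with base R's `barplot`; the object names, titles, and axis labels are assumptions rather than the original figure settings.

```r
# Counts from the invitation table; rows are the series, columns the groups.
party_goers <- rbind(Observed = c(Flyer = 200, Email = 600),
                     Expected = c(Flyer = 250, Email = 600))
ditchers    <- rbind(Observed = c(Flyer = 100, Email = 100),
                     Expected = c(Flyer = 50,  Email = 100))

# beside = TRUE draws the observed and expected bars side by side per group.
barplot(party_goers, beside = TRUE, legend.text = TRUE,
        main = "Party-goers: observed vs. expected", ylab = "Number of people")
barplot(ditchers, beside = TRUE, legend.text = TRUE,
        main = "Ditchers: observed vs. expected", ylab = "Number of people")
```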

Test of Independence Example Bar Plot of Party-goers:

Test of Independence Example Bar Plot of Ditchers:
