Last Time

The chi-squared goodness-of-fit test lets us assess a null model for the distribution of a single categorical variable when it has more than two categories, instead of just “success” or “failure”. For example:

• Does SLCPD disproportionately use force against people of color?

• Is jury selection racially biased in small Southern towns?

• Is there statistically significant evidence that cars manufactured in the 70s are not evenly distributed across the different numbers of cylinders?

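When there are only two categories, one can still run a chi-squared test, and the results match the one-proportion z-test: the \(\chi^{2}\) statistic is the square of the z statistic, and the chi-squared p-value equals the two-sided p-value of the z-test.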
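As a quick check of this two-category equivalence, here is a minimal R sketch. The numbers (60 successes out of 100 trials, null proportion 0.5) are made up purely for illustration and do not come from any dataset in this lesson. It runs chisq.test() on the two category counts and compares the result with a one-proportion z statistic computed by hand.

```r
# Hypothetical data (illustration only): 60 "successes", 40 "failures", null p0 = 0.5
x <- 60
n <- 100
p0 <- 0.5

# Chi-squared goodness-of-fit test on the two category counts
gof <- chisq.test(c(x, n - x), p = c(p0, 1 - p0))

# One-proportion z statistic, using the null proportion in the standard error
p_hat <- x / n
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# The chi-squared statistic is the square of z ...
c(chisq = unname(gof$statistic), z_squared = z^2)

# ... and its p-value equals the two-sided z-test p-value
c(chisq_p = gof$p.value, z_two_sided_p = 2 * pnorm(-abs(z)))
```

With these made-up numbers z = 2, so both statistics equal 4 and both p-values are about 0.0455.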

Exercise 1. Recall the juror example from last time: We considered a group of 275 voters in a small county. We import the dataset (which you should have uploaded to the same folder as this .Rmd file) in the following code chunk:

library(readr)                 # provides read_csv()
jury <- read_csv("jury.txt")   # read the juror data; race is read as a character column
## Parsed with column specification:
## cols(
##   race = col_character()
## )

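The expected counts under the null hypothesis come from the proportions of registered voters in the county (7% black, 12% hispanic, 9% other, and 72% white); for a pool of 275 people, these give expected counts of 19.25, 33, 24.75, and 198. The following table shows the number of voters of each race in the pool (the observed counts):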

library(mosaic)                           # provides tally() and pdist()
race_tally <- tally(~ race, data = jury)  # count the voters in each race category
race_tally
## race
##    black hispanic    other    white 
##       26       25       19      205
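The expected counts used later in this exercise are the pool size (275) times the proportion of registered voters in each group. A minimal sketch: the observed counts are copied from the tally above, and the proportions (7%, 12%, 9%, 72%) are the expected counts in the chisq.test() call below divided by 275.

```r
# Observed counts, copied from the race tally above
observed <- c(black = 26, hispanic = 25, other = 19, white = 205)

# Proportions of registered voters in the county (the null model)
p_null <- c(black = 0.07, hispanic = 0.12, other = 0.09, white = 0.72)

# Expected count in each category = pool size * null proportion
n <- sum(observed)
expected <- n * p_null
expected

# The expected counts add up to the pool size, just like the observed counts
sum(expected)
```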
  1. Verify that the conditions for the chi-square distribution are satisfied in this case.

  2. Compute \(\chi^{2}\) in the context of a hypothesis test to determine whether there is evidence of racial bias in how jurors were sampled. How many degrees of freedom are there in this case? What conclusion would you draw? (The following code chunk computes \(\chi^{2}\) for you.) Note that we pass the expected counts under the null hypothesis to the p argument below; rescale.p = TRUE tells chisq.test() to rescale them into proportions that sum to 1:

jury_test <- chisq.test(race_tally, p = c(19.25, 33, 24.75, 198), rescale.p = TRUE)
chisq <- jury_test$statistic
chisq
## X-squared 
##   5.88961
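To see what chisq.test() is doing, here is a minimal sketch that checks the expected-count condition (every expected count at least 5, a common rule of thumb) and computes \(\chi^{2}=\sum\frac{(O-E)^{2}}{E}\) directly. The counts are copied from the output above rather than read from jury.txt. The independence condition concerns how the jurors were selected, so it cannot be checked from the counts alone.

```r
# Observed and expected counts, copied from the tally and the chisq.test() call above
observed <- c(black = 26, hispanic = 25, other = 19, white = 205)
expected <- c(black = 19.25, hispanic = 33, other = 24.75, white = 198)

# Expected-count condition: every expected count should be at least 5
all(expected >= 5)

# Chi-squared statistic: add up (O - E)^2 / E over all categories
chisq_by_hand <- sum((observed - expected)^2 / expected)
chisq_by_hand

# Should agree with the statistic reported by chisq.test()
chisq.test(observed, p = expected, rescale.p = TRUE)$statistic
```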

For the test statistic \(\chi^{2}\approx 5.89\) obtained from the distribution of races in the jury, we can compute a p-value as follows:

  1. Plot the null distribution.
pdist("chisq", df = 3, q = chisq)  # draws the chi-squared density with 3 df and returns P(X <= chisq)

## X-squared 
## 0.8828938
  2. Why is the number of degrees of freedom (df) equal to 3?
  3. Calculate the p-value.
1 - pdist("chisq", df = 3, q = 5.89, plot = FALSE)
## [1] 0.1170863
  4. Why did we use 1 minus pdist and not just pdist to compute the P-value here?

  5. Based on the P-value, would you decide to reject or fail to reject the null hypothesis of no discrimination in jury selection? Explain your answer.
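The pdist() function used above is loaded with library(mosaic); base R's pchisq() computes the same probabilities without drawing a plot. A minimal sketch, plugging in the statistic printed above (about 5.89) and 3 degrees of freedom (4 categories minus 1):

```r
# Chi-squared statistic from the jury example (printed above as X-squared)
chisq_stat <- 5.88961

# Goodness-of-fit degrees of freedom: number of categories minus 1
n_categories <- 4
df <- n_categories - 1

# Lower-tail (cumulative) probability P(X <= chisq_stat)
lower <- pchisq(chisq_stat, df = df)

# Upper-tail probability P(X >= chisq_stat), computed directly
p_value <- pchisq(chisq_stat, df = df, lower.tail = FALSE)

c(lower = lower, p_value = p_value, one_minus_lower = 1 - lower)
```

Asking pchisq() for the upper tail directly (lower.tail = FALSE) and subtracting the lower tail from 1 give the same number here.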

Post-hoc analysis

“Post-hoc” is a fancy Latin phrase for “after the fact”; that is, we’re analyzing our data further after computing a chi-squared test statistic and P-value.

When we reject the null, we are left with a very vague alternative hypothesis: the counts do not follow the null distribution, i.e. at least one category's proportion differs from its null value. Often, we want to follow up to figure out which categories are the ones deviating from the null expectation.

A natural way to do this is to look at the residuals. A residual for a cell measures how far the observed count is from the expected count. Simply calculating \(O-E\) is not enough, however: a cell with a large expected count can be far from its expected value in absolute terms just because the counts involved are large. What we want is some kind of relative distance.

We could use the chi-square component from each category, i.e. \[\frac{(O-E)^{2}}{E}\].
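To see why raw differences \(O-E\) can mislead, here is a minimal sketch that lists \(O-E\) next to the chi-square component \(\frac{(O-E)^{2}}{E}\) for each category. The counts are copied from the jury example above.

```r
# Observed and expected counts from the jury example above
observed <- c(black = 26, hispanic = 25, other = 19, white = 205)
expected <- c(black = 19.25, hispanic = 33, other = 24.75, white = 198)

# Raw differences: a large expected count can produce a large raw gap
raw_diff <- observed - expected

# Chi-square components: each squared gap is scaled by its expected count
components <- (observed - expected)^2 / expected

round(rbind(raw_diff, components), 3)
```

White has the second-largest raw difference (7) but by far the smallest component (about 0.25), because its expected count (198) is so large. Black, with a smaller raw gap (6.75), has the largest component (about 2.37).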

In statistics, it is more traditional to look at the signed square root of this quantity, called the Pearson residual: \[\frac{O-E}{\sqrt{E}},\] whose square is exactly the chi-square component \(\frac{(O-E)^{2}}{E}\).

Why? Note that we squared the “z-scores” for each category and then added them together to compute the chi-squared test statistic. Taking the square root “undoes” this squaring, thus taking us back to the original z-scores.

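Additionally, this residual can be positive or negative, which tells us the direction of the deviation: a positive residual means the category has more cases than the null model predicts, and a negative residual means it has fewer. We can view the residuals as follows: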

jury_test$residuals
## race
##      black   hispanic      other      white 
##  1.5384678 -1.3926212 -1.1557935  0.4974683
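A minimal sketch confirming that the residuals printed above are \(\frac{O-E}{\sqrt{E}}\), and that squaring and adding them recovers \(\chi^{2}\). The counts are copied from the jury example above rather than recomputed from jury.txt.

```r
# Observed and expected counts from the jury example above
observed <- c(black = 26, hispanic = 25, other = 19, white = 205)
expected <- c(black = 19.25, hispanic = 33, other = 24.75, white = 198)

# Pearson residuals: signed, standardized distance from the null expectation
residuals_by_hand <- (observed - expected) / sqrt(expected)
round(residuals_by_hand, 4)

# Same values as the residuals component returned by chisq.test()
chisq.test(observed, p = expected, rescale.p = TRUE)$residuals

# Squaring and summing the residuals gives back the chi-squared statistic
sum(residuals_by_hand^2)
```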
  1. Which category has the largest residual (in absolute value) for the jury pool above? What does this suggest in the context of racial discrimination in jury selection?