Testing to see if how a drive starts (drive_start2) and if it ends in a touchdown (drive_end2) are associated:

chisq_test(x = drives$drive_start2,
           y = drives$drive_end2)
## # A tibble: 1 × 6
##       n statistic        p    df method          p.signif
## * <int>     <dbl>    <dbl> <int> <chr>           <chr>   
## 1  4152      18.4 0.000018     1 Chi-square test ****

Let’s include another variable: yds_to_td -> how far away the drive starts from the endzone (closer to the endzone the easier it should be to score a TD)

Testing homogeneous associations

The test of homogeneous association assumes that the odds ratio between two variables, \(X\) and \(Y\), are the same across each level of a third variable, \(Z\)

\[\theta_{XY|Z_1} = \theta_{XY|Z_2} = ... = \theta_{XY|Z_K}\]

And if \(X\) and \(Y\) are are homogeneous across \(Z\), then any combo of two variables will be homogeneous across the third:

\[\theta_{XY|Z_1} = \theta_{XY|Z_2} = ... = \theta_{XY|Z_K} \\ \theta_{XZ|Y_1} = \theta_{XZ|Y_2} = ... = \theta_{XZ|Y_J} \\ \theta_{YZ|X_1} = \theta_{YZ|X_2} = ... = \theta_{YZ|X_I}\]

However, to calculate the odds ratio, we need the two variables not in the conditioning need to be binary

Which means, for our NFL example, we can only test to see if the association between drive_start2 and drive_end2 is constant across the different starting locations in yds_to_td

drives |> 
  summarize(
    .by = yds_to_td, 
    odds_ratio = oddsratio(table(drive_start2, drive_end2), rev = "col")$measure[2,1]
  )
##   yds_to_td odds_ratio
## 1      < 25  1.1221770
## 2     >= 75  1.2689733
## 3   25 - 50  0.9016621
## 4   50 - 75  1.0915382

The odds of scoring a touchdown from a turnover vs a kick are all around 1, with the last category being the farthest from 1, but not drastically different.

Breslow-Day Test

If we want to conduct a test of homogeneous association, we can use the Breslow-Day test using BreslowDayTest() in the DescTools package. It only has one argument, x, which should be a \(2\times 2 \times K\) table

drives_table <- 
  # forming the 2x2xK table
  xtabs(
    formula = ~ drive_start2 + drive_end2 + yds_to_td,  # formula = ~ X + Y + Z
    data = drives
  )  

# Conducting a Breslow-Day Test
BreslowDayTest(drives_table)
## 
##  Breslow-Day test on Homogeneity of Odds Ratios
## 
## data:  drives_table
## X-squared = 0.89892, df = 3, p-value = 0.8257

Unsurprisingly, we don’t reject our null hypothesis. The reason it is not surprising is because we already failed to reject the null hypothesis for conditional independence between \(X\) and \(Y\) conditioned on \(Z\), which is just a special case of homogeneous association where all the odds ratios are not only equal, but also equal to 1!

Estimated Odds-Ratio

If we fail to reject the null hypothesis, we can calculate the estimated homogeneous odds-ratio, \(\hat{\theta}\) as:

\[\hat{\theta} = \frac{\sum_{k = 1}^K\frac{n_{11k}\times n_{22k}}{n_{\bullet\bullet k}}}{\sum_{k = 1}^K\frac{n_{12k}\times n_{21k}}{n_{\bullet\bullet k}}}\]

# The numerator pairs: n11k * n22k
theta_num <- drives_table[1, 1, ] * drives_table[2, 2, ] 

# The denominator pairs: n12k * n21k
theta_den <- drives_table[1, 2, ] * drives_table[2, 1, ]

# The totals for each drive location: adding across X & Y for each k
ks <- margin.table(drives_table, margin = 3)


# Calculating the estimated homogeneous odds ratio:
theta_bd <- sum(theta_num / ks) / sum(theta_den / ks)
theta_bd
## [1] 0.9239877