Motivating Question: Does the way an NFL drive ends differ based on how the drive begins?

Both the explanatory and the response variables can be considered ordinal, since some ways for a drive to start are preferable to others, and likewise for the ways a drive can end. The orderings were listed above, so let's encode them as ordered factors using factor(levels = c(...), ordered = TRUE).

library(dplyr)  # for mutate()

drives <- 
  drives |> 
  mutate(
    # Reordering the groups (in the order listed above)
    drive_start = factor(drive_start,
                         levels = c("Kickoff", "Punt", "Interception", "Fumble"),
                         ordered = TRUE),
    
    # And let's reorder drive end again as well, from worst to best for the offense
    drive_end = factor(drive_end, 
                       levels = c("Turnover", "Punt", "Field Goal", "Touchdown"),
                       ordered = TRUE)
  )

# Build the resulting 4x4 table of drive start vs. drive end
drive_freq <- 
  xtabs(
    formula = ~ drive_start + drive_end,
    data = drives
  )

sum(drive_freq)
## [1] 6065

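One issue with Mantel-Haenszel's \(r\) is that it requires assigning a numeric score to each ordinal category of \(X\) and \(Y\). Unless \(X\) and \(Y\) are binary (where any two scores are just a linear rescaling and give the same \(|r|\)), the choice of scores can have a large impact on the resulting correlation.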

I only recommend using MH’s \(r\) if there are obvious choices for the scores of \(X\) and \(Y\). Otherwise, deciding on the scores can be highly subjective.
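To see how much the scores matter, here's a minimal sketch that computes MH's \(r\) (the Pearson correlation between the row and column scores, with each cell weighted by its count) for a small made-up 3x3 table under two different sets of column scores. The table, the helper mh_r(), and both score choices are hypothetical and only for illustration.

```r
# Made-up 3x3 table of counts, for illustration only
tab <- matrix(c(20, 10,  5,
                10, 15, 10,
                 5, 10, 20), nrow = 3, byrow = TRUE)

# Hypothetical helper: Pearson correlation between row scores u and column
# scores v, where cell (i, j) contributes tab[i, j] observations of (u[i], v[j])
mh_r <- function(tab, u, v) {
  n  <- sum(tab)
  rs <- rowSums(tab)
  cs <- colSums(tab)
  du <- u - sum(u * rs) / n   # row scores centered at their weighted mean
  dv <- v - sum(v * cs) / n   # column scores centered at their weighted mean
  sum(outer(du, dv) * tab) / sqrt(sum(du^2 * rs) * sum(dv^2 * cs))
}

r_equal  <- mh_r(tab, u = 1:3, v = 1:3)          # equally spaced scores
r_skewed <- mh_r(tab, u = 1:3, v = c(1, 2, 10))  # last column pushed far away
c(r_equal = r_equal, r_skewed = r_skewed)        # about 0.429 vs 0.391

# MH test statistic M^2 = (n - 1) r^2, compared to a chi-square with 1 df
M2 <- (sum(tab) - 1) * r_equal^2
pchisq(M2, df = 1, lower.tail = FALSE)
```

Only the column scores changed between the two calls, yet \(r\) moved; the counts in the table were identical.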

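When the categories have a natural order but no obvious numeric spacing (closer to nominal than interval, but still ordinal), an alternative is Goodman and Kruskal's \(\gamma\), which measures the association between two ordinal variables using only the ordering of the categories, so no scores are needed.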
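For reference, \(\gamma\) compares every pair of observations. A pair is concordant if the observation ranked higher on \(X\) is also ranked higher on \(Y\), and discordant if it is ranked higher on \(X\) but lower on \(Y\); pairs tied on either variable are dropped. With \(C\) concordant and \(D\) discordant pairs:

\[
\gamma = \frac{C - D}{C + D}, \qquad -1 \le \gamma \le 1
\]

For a 2x2 table with cell counts \(n_{11}, n_{12}, n_{21}, n_{22}\), this reduces to \(C = n_{11} n_{22}\) and \(D = n_{12} n_{21}\), so \(\gamma = \frac{n_{11} n_{22} - n_{12} n_{21}}{n_{11} n_{22} + n_{12} n_{21}}\).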

Goodman and Kruskal’s Gamma using R

Calculating the number of concordant and discordant pairs in R on our own would be fairly tedious, so let's just use a function from the DescTools package that we've already installed: GoodmanKruskalGamma() (they probably could have shortened the function name :( )

library(DescTools)  # for GoodmanKruskalGamma()

GoodmanKruskalGamma(x = drive_freq)
## [1] 0.08132836
# Can also calculate a confidence interval
GoodmanKruskalGamma(x = drive_freq,
                    conf.level = 0.95) 
##      gamma     lwr.ci     upr.ci 
## 0.08132836 0.04784259 0.11481414

Like all the other measures we've seen, \(\gamma\) shows a weak, positive, statistically significant association between how a drive starts and how it ends.

The downside is that GoodmanKruskalGamma() doesn't return the number of concordant or discordant pairs, so we can't use its output directly to conduct a test :(

The good news is that, since the 95% confidence interval (0.048, 0.115) doesn't contain 0, a two-sided test at the 5% level would also be significant, so we can conclude that there is a positive association between how a drive starts and how it ends (assuming both variables are ordinal).
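The pair counts that GoodmanKruskalGamma() doesn't report can be tallied directly from a two-way table. Here's a minimal sketch; the helper count_pairs() is hypothetical (not part of DescTools) and assumes the rows and columns are listed from the lowest to the highest category, as they are when xtabs() is built from ordered factors.

```r
# Hypothetical helper: count concordant (C) and discordant (D) pairs in a
# two-way table whose rows and columns run from lowest to highest category.
# Pairs in the same row or the same column (ties) are not counted.
count_pairs <- function(tab) {
  tab <- as.matrix(tab)
  I <- nrow(tab)
  J <- ncol(tab)
  C <- 0
  D <- 0
  for (i in seq_len(I)) {
    for (j in seq_len(J)) {
      # Cells below and to the right: higher on both variables (concordant)
      if (i < I && j < J) C <- C + tab[i, j] * sum(tab[(i + 1):I, (j + 1):J])
      # Cells below and to the left: higher on X, lower on Y (discordant)
      if (i < I && j > 1) D <- D + tab[i, j] * sum(tab[(i + 1):I, 1:(j - 1)])
    }
  }
  c(C = C, D = D, gamma = (C - D) / (C + D))
}

# Small made-up 2x2 example: C = 80, D = 10, gamma = 7/9 (about 0.778)
count_pairs(matrix(c(10, 5, 2, 8), nrow = 2, byrow = TRUE))
```

Running count_pairs(drive_freq) should give the same \(\gamma\) as GoodmanKruskalGamma(), since both use \((C - D)/(C + D)\).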

Custom function for a p-value

The file ‘gk_gamma.R’ contains a function, gk_test(), that accepts either of two inputs and returns GK's gamma, the test statistic, and the p-value:

  • df_tab = a two-column data frame whose columns are both ordered factors
  • tab_pair = a two-way table created from ordered factors
source('gk_gamma.R')

gk_test(df_tab = drives[ ,2:3])
## $gk_gamma
## [1] 0.08132836
## 
## $z_stat
## [1] 2.967283
## 
## $p_val
## [1] 0.003004438
gk_test(tab_pair = drive_freq)
## $gk_gamma
## [1] 0.08132836
## 
## $z_stat
## [1] 2.967283
## 
## $p_val
## [1] 0.003004438
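We can't see inside gk_gamma.R here, but a test like this can be built from \(C\), \(D\), and the sample size \(n\). Below is a hypothetical sketch, not the contents of gk_gamma.R, using a commonly cited large-sample approximation \(z = \gamma \sqrt{(C + D) / (n(1 - \gamma^2))}\) compared against a standard normal; the function name gk_z() and the example counts are made up for illustration.

```r
# Hypothetical sketch, NOT the contents of gk_gamma.R: a large-sample z test
# for Goodman and Kruskal's gamma, given the counts of concordant (C) and
# discordant (D) pairs and the total number of observations n.
gk_z <- function(C, D, n) {
  gamma <- (C - D) / (C + D)
  # Approximate standard normal statistic (infinite if gamma is exactly -1 or 1)
  z <- gamma * sqrt((C + D) / (n * (1 - gamma^2)))
  # Two-sided p-value
  p <- 2 * pnorm(-abs(z))
  list(gk_gamma = gamma, z_stat = z, p_val = p)
}

# Made-up example: a 2x2 table with cells 10, 5, 2, 8 has C = 80, D = 10, n = 25
gk_z(C = 80, D = 10, n = 25)
```

If gk_gamma.R uses a different standard error, its z statistic and p-value can differ from this sketch even though \(\gamma\) itself will match.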