Motivating Question: Does how an NFL drive ends differ based on how the drive begins?
Both the explanatory and the response variable can be considered
ordinal, since there are preferred ways for a drive to start and also
preferred ways for a drive to end. The orders have been listed above, so
let’s also order them in R using factor(levels = c(...)):
drives <-
  drives |>
  mutate(
    # Reorder the drive-start groups
    drive_start = factor(drive_start,
                         levels = c("Kickoff", "Punt", "Interception", "Fumble"),
                         ordered = TRUE),
    # And let's reorder drive end again as well
    drive_end = factor(drive_end,
                       levels = c("Turnover", "Punt", "Field Goal", "Touchdown"),
                       ordered = TRUE)
  )
# Look at our resulting 4x4 table
drive_freq <-
  xtabs(
    formula = ~ drive_start + drive_end,
    data = drives
  )
sum(drive_freq)
## [1] 6065
One issue with Mantel-Haenszel’s \(r\) is that it requires assigning a score to each ordinal group of \(X\) and \(Y\). Unless \(X\) and \(Y\) are binary, the choice of scores can have a large impact on the correlation.
I only recommend using MH’s \(r\) if there are obvious choices for the scores of \(X\) and \(Y\). Otherwise, deciding on the scores can be highly subjective.
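To see how much the scores matter, here is a minimal sketch that computes the score-based correlation on a small made-up 3x3 table under two different scorings. The table, the helper name mh_r(), and both sets of scores are assumptions for illustration only; they are not the NFL drive data.

```r
# Toy 3x3 table of counts (made-up numbers, not the NFL drive data)
toy <- matrix(c(20, 10,  5,
                10, 15, 10,
                 5, 10, 20),
              nrow = 3, byrow = TRUE)

# Hypothetical helper: Pearson correlation between the row scores and
# column scores, with each cell weighted by its count
mh_r <- function(tab, row_scores, col_scores) {
  n <- sum(tab)
  row_tot <- rowSums(tab)
  col_tot <- colSums(tab)
  x_bar <- sum(row_scores * row_tot) / n
  y_bar <- sum(col_scores * col_tot) / n
  sxy <- sum(outer(row_scores - x_bar, col_scores - y_bar) * tab)
  sxx <- sum((row_scores - x_bar)^2 * row_tot)
  syy <- sum((col_scores - y_bar)^2 * col_tot)
  sxy / sqrt(sxx * syy)
}

# Same table, two different (equally arbitrary) sets of scores
r_equal  <- mh_r(toy, row_scores = 1:3, col_scores = 1:3)
r_uneven <- mh_r(toy, row_scores = c(1, 2, 10), col_scores = c(1, 2, 10))
c(equal_spacing = r_equal, uneven_spacing = r_uneven)
```

The counts never change between the two calls, yet the correlation does, which is why the scores need a defensible justification.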
An alternative, when the groups are closer to nominal than ordinal (but still ordinal), is to use Goodman and Kruskal’s \(\gamma\) to measure the association between the two ordinal variables.
Calculating the number of concordant and discordant pairs in R would
be fairly difficult to do on our own, so let’s just use a function in
the DescTools package that we’ve already installed: GoodmanKruskalGamma()
(they probably could have shortened the function name :( )
GoodmanKruskalGamma(x = drive_freq)
## [1] 0.08132836
# Can also calculate a confidence interval
GoodmanKruskalGamma(x = drive_freq,
                    conf.level = 0.95)
## gamma lwr.ci upr.ci
## 0.08132836 0.04784259 0.11481414
Like all the other measures we’ve seen, this shows a weak, positive, statistically significant association between how a drive starts and how the drive ends.
The downside is that GoodmanKruskalGamma() doesn’t report the number of concordant or discordant pairs, so we can’t conduct a test from its output directly :(
The good news is that, since the confidence interval doesn’t contain 0, we know the test would be significant, and we can conclude that there is a positive association between how a drive starts and how the drive ends (assuming the two are ordinal).
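For intuition about what is being counted, here is a minimal sketch of tallying concordant and discordant pairs on a small made-up table, then forming \(\gamma = (C - D)/(C + D)\). The toy counts and the helper name count_pairs() are assumptions for illustration; this is not necessarily how DescTools or ‘gk_gamma.R’ does the calculation.

```r
# Toy 2x3 table of counts (made-up numbers, not the NFL drive data);
# rows and columns are assumed to already be in increasing order
toy <- matrix(c(10, 5, 2,
                 4, 8, 9),
              nrow = 2, byrow = TRUE)

# Hypothetical helper: count concordant (C) and discordant (D) pairs
count_pairs <- function(tab) {
  n_row <- nrow(tab)
  n_col <- ncol(tab)
  conc <- 0
  disc <- 0
  for (i in seq_len(n_row)) {
    for (j in seq_len(n_col)) {
      # Cells strictly below and to the right are concordant with cell (i, j)
      if (i < n_row && j < n_col) {
        conc <- conc + tab[i, j] * sum(tab[(i + 1):n_row, (j + 1):n_col])
      }
      # Cells strictly below and to the left are discordant with cell (i, j)
      if (i < n_row && j > 1) {
        disc <- disc + tab[i, j] * sum(tab[(i + 1):n_row, 1:(j - 1)])
      }
    }
  }
  c(C = conc, D = disc)
}

pair_counts <- count_pairs(toy)
gamma_hat <- unname((pair_counts["C"] - pair_counts["D"]) /
                      (pair_counts["C"] + pair_counts["D"]))
pair_counts
gamma_hat
```

Pairs that tie on either variable are ignored entirely, which is part of why \(\gamma\) does not need scores for the groups.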
The file ‘gk_gamma.R’ has a function, gk_test(), that takes one of two arguments and calculates GK’s gamma, the test statistic, and the p-value:
df_tab = a two-column data frame with both columns being ordered factors
tab_pair = a two-way table created from ordered factors
source('gk_gamma.R')
gk_test(df_tab = drives[ ,2:3])
## $gk_gamma
## [1] 0.08132836
##
## $z_stat
## [1] 2.967283
##
## $p_val
## [1] 0.003004438
gk_test(tab_pair = drive_freq)
## $gk_gamma
## [1] 0.08132836
##
## $z_stat
## [1] 2.967283
##
## $p_val
## [1] 0.003004438
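The reported p-value looks consistent with a two-sided test that compares the z statistic to a standard normal distribution. That is an inference from the printed numbers, not something confirmed from the contents of ‘gk_gamma.R’, so treat this as a rough check:

```r
# z statistic copied from the gk_test() output
z_stat <- 2.967283

# Two-sided p-value from the standard normal distribution
# (assumption: this is how the p-value is obtained from the z statistic)
p_val <- 2 * pnorm(-abs(z_stat))
p_val
```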