Goodman & Kruskal’s Gamma for Ordinal Data

Motivating Question: Does how an NFL drive ends differ based on how the drive begins?

Both the explanatory and the response variable can be considered ordinal, since there are preferred ways for a drive to start and also preferred ways for a drive to end. The orders have been listed above, so let’s also order them in R using factor(levels = c(...)).
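As a minimal sketch of that ordering step: the data frame, column names, and level names below are placeholders I made up for illustration, not the actual drive levels listed earlier.

```r
# Placeholder data: these column names and level names are invented for
# illustration and are NOT the levels used for the drive data.
toy_drives <- data.frame(
  start = c("Punt", "Kickoff", "Turnover", "Punt"),
  end   = c("Field Goal", "Punt", "Touchdown", "Punt")
)

# List the levels from least preferred to most preferred
toy_drives$start <- factor(toy_drives$start,
                           levels = c("Kickoff", "Punt", "Turnover"),
                           ordered = TRUE)
toy_drives$end <- factor(toy_drives$end,
                         levels = c("Punt", "Field Goal", "Touchdown"),
                         ordered = TRUE)

# The two-way table keeps its rows and columns in the stated level order
toy_freq <- table(toy_drives$start, toy_drives$end)
toy_freq
```

Without levels = c(...), R sorts the levels alphabetically, which would rarely match the intended order.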

One issue with Mantel-Haenszel’s \(r\) is that it requires assigning a score to each ordinal level of \(X\) and \(Y\). Unless \(X\) and \(Y\) are binary, the choice of scores can have a large impact on the resulting correlation.

I only recommend using MH’s \(r\) if there are obvious choices for the scores of \(X\) and \(Y\). Otherwise, deciding on the scores can be highly subjective.

An alternative when the groups are more nominal than ordinal (but still ordinal) is to use Goodman and Kruskal’s \(\gamma\) to measure the association between the two ordinal variables.
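Gamma is built from pairs of observations. A pair is concordant if the observation that is higher on \(X\) is also higher on \(Y\), and discordant if it is higher on one variable and lower on the other; pairs tied on either variable are ignored. Writing \(C\) and \(D\) for the numbers of concordant and discordant pairs (my notation here),

\[
\gamma = \frac{C - D}{C + D}
\]

so \(\gamma\) ranges from \(-1\) to \(1\) and depends only on the ordering of the levels, not on any scores.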

Goodman and Kruskal’s Gamma using R

Calculating the number of concordant and discordant pairs in R would be fairly difficult to do on our own, so let’s just use a function in the DescTools package that we’ve already installed: GoodmanKruskalGamma() (they probably could have shortened the function name :( )

GoodmanKruskalGamma(x = drive_freq)

## [1] 0.08132836

# Can also calculate a confidence interval
GoodmanKruskalGamma(x = drive_freq,
                    conf.level = 0.95)

##      gamma     lwr.ci     upr.ci 
## 0.08132836 0.04784259 0.11481414

As with the other measures we’ve seen, gamma shows a weak, positive, statistically significant association between how a drive starts and how the drive ends.

The downside is that GoodmanKruskalGamma() doesn’t report the number of concordant or discordant pairs, so we can’t use its output to conduct a test :(

The good news is that, since the 95% confidence interval doesn’t contain 0, a two-sided test at the 5% level would be significant, so we can conclude that there is a positive association between how a drive starts and how the drive ends (assuming the two are ordinal).
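For a feel of what goes into gamma, the pair counts can be worked out directly for a small table. This is only a base-R sketch: gamma_by_hand() and toy_tab are names I made up, the counts are not the drive data, and it assumes the rows and columns of the table are already in increasing order.

```r
# Count concordant and discordant pairs in a two-way table whose rows and
# columns are both in increasing order, then compute gamma from the counts.
gamma_by_hand <- function(tab) {
  tab <- as.matrix(tab)
  n_row <- nrow(tab)
  n_col <- ncol(tab)
  conc <- 0
  disc <- 0
  for (i in seq_len(n_row)) {
    for (j in seq_len(n_col)) {
      # Cells strictly below and to the right are concordant with cell (i, j)
      if (i < n_row && j < n_col) {
        conc <- conc + tab[i, j] * sum(tab[(i + 1):n_row, (j + 1):n_col])
      }
      # Cells strictly below and to the left are discordant with cell (i, j)
      if (i < n_row && j > 1) {
        disc <- disc + tab[i, j] * sum(tab[(i + 1):n_row, 1:(j - 1)])
      }
    }
  }
  list(concordant = conc,
       discordant = disc,
       gamma = (conc - disc) / (conc + disc))
}

# Made-up 2 x 2 counts, for illustration only
toy_tab <- matrix(c(3, 1,
                    1, 3), nrow = 2, byrow = TRUE)
res <- gamma_by_hand(toy_tab)
res$gamma  # (9 - 1) / (9 + 1) = 0.8
```

The double loop visits every cell once, so it is fine for small tables like ours but is not meant to be efficient.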

Custom function for a p-value

The file ‘gk_gamma.R’ has a function, gk_test(), that accepts either of two arguments and calculates GK’s gamma, the test statistic, and the p-value:

df_tab = a two column data frame with both columns being ordered factors
tab_pair = a two-way table created from ordered factors

source('gk_gamma.R')

gk_test(df_tab = drives[ ,2:3])

## $gk_gamma
## [1] 0.08132836
## 
## $z_stat
## [1] 2.967283
## 
## $p_val
## [1] 0.003004438

gk_test(tab_pair = drive_freq)

## $gk_gamma
## [1] 0.08132836
## 
## $z_stat
## [1] 2.967283
## 
## $p_val
## [1] 0.003004438

Chapter 2

STAT 5350
