Motivating Question: Is a team more likely to score a touchdown after a turnover compared to a punt or kickoff?

Relative risk is often used for two binary variables, so let’s convert our data into two binary variables:

drives2 <- 
  drives |> 
  mutate(
    drive_start = if_else(drive_start %in% c("Interception", "Fumble"),
                          true = "turnover",
                          false = "kick"),
    # Converting drive_start to a factor with group order turnover, kick
    drive_start = factor(drive_start,
                         levels = c("turnover", "kick")),
    
    
    drive_end = if_else(drive_end == "Touchdown",
                        true = "TD",
                        false = "noTD"),
    # Converting drive_end to a factor and ordering the groups as success then failure
    drive_end = factor(drive_end,
                       levels = c("TD", "noTD"))
  )

Next, let’s display the results in a two-way table using xtabs()

# Displaying a two-way table using xtabs()
drive_freq <- 
  xtabs(formula = ~ drive_start + drive_end, data = drives2)

# The addmargins() function will add the margins to the table
drive_freq |> 
  addmargins() 
##            drive_end
## drive_start   TD noTD  Sum
##    turnover  239  571  810
##    kick     1200 4055 5255
##    Sum      1439 4626 6065

And converting the table from counts to conditional proportions using prop.table(margin = "drive_start")

drive_freq |> 
  prop.table(margin = "drive_start") |> 
  
  round(digits = 3)
##            drive_end
## drive_start    TD  noTD
##    turnover 0.295 0.705
##    kick     0.228 0.772

Let’s create a bar chart to compare the probability of a touchdown between the two groups:

drive_freq |> 
  # Calculate the conditional proportions
  prop.table(margin = "drive_start") |> 
  # Convert the table to a data frame to be used by ggplot()
  data.frame() |> 
  # Rename Freq to prop
  rename(prop = Freq) |> 
  
  ggplot(mapping = aes(x = drive_start,
                       fill = drive_end,
                       y = prop)) + 
  
  geom_col(width = 0.5) + 
  
  labs(y = "Proportion",
       x = "How Drive Began",
       fill = "Drive \nResult") + 
  
  theme_test() + 
  
  scale_y_continuous(expand = c(0, 0, 0, 0.05)) + 
  
  scale_fill_manual(labels = c("Touchdown", "No Touchdown"),
                    values = c("steelblue", "darkorange"))

There are some differences, but it isn’t a very noticeable difference.

Risk Difference: p1 - p2

The basic test for comparing “risks” is by using the risk difference:

\[P(Y = 1|X = 1) - P(Y = 1|X = 2) = \pi_{1|1} - \pi_{1|2}\]

which is the difference in the “success” rate of the response variable between two groups of the explanatory variable:

Our best estimate of the risk difference is:

\[p_{1|1} - p_{1|2} = \frac{n_{11}}{n_{1\bullet}} - \frac{n_{21}}{n_{2\bullet}}\]

Let’s use prop_test() from the rstatix package again to calculate the risk difference, p-value, and confidence interval. The arguments for 2 binary variables are:

rstatix::prop_test(x = drive_freq,
                   detailed = T)
## # A tibble: 1 × 13
##       n    n1    n2 estimate1 estimate2 statistic         p    df conf.low
## * <dbl> <dbl> <dbl>     <dbl>     <dbl>     <dbl>     <dbl> <dbl>    <dbl>
## 1  6065   810  5255     0.295     0.228      16.9 0.0000396     1   0.0326
## # ℹ 4 more variables: conf.high <dbl>, method <chr>, alternative <chr>,
## #   p.signif <chr>

Let’s improve the names and add the risk difference to the results

risk_diff_results <- 
  rstatix::prop_test(
    x = drive_freq,
    detailed = T
  ) |> 
  # Renaming n1 and n2 to their respective group names
  rename(
    turnover_n = n1,
    kick_n = n2,
    turnover_p = estimate1,
    kick_p = estimate2,
    test_stat = statistic,
    p_val = p
  ) |> 
  
  # Calculating the risk difference
  mutate(risk_diff = turnover_p - kick_p) |> 
  
  # Picking with columns to display:
  dplyr::select(
    n, 
    turnover_n, 
    kick_n, 
    turnover_p, 
    kick_p, 
    risk_diff, 
    test_stat, 
    p_val, 
    conf.low, 
    conf.high
  )


risk_diff_results |> 
  signif(digits = 3)
## # A tibble: 1 × 10
##       n turnover_n kick_n turnover_p kick_p risk_diff test_stat   p_val conf.low
##   <dbl>      <dbl>  <dbl>      <dbl>  <dbl>     <dbl>     <dbl>   <dbl>    <dbl>
## 1  6060        810   5260      0.295  0.228    0.0667      16.9 3.96e-5   0.0326
## # ℹ 1 more variable: conf.high <dbl>

From our results, we can see that there is a statistically significant difference in how often drives end in touchdowns depending if they start from a turnover or a kick.

We can be 95% confidence that drives that start from a turnover are 3.3% to 10.1% more likely to end in a touchdown than drives that start from a kickoff or a punt

Risk Ratio

An alternative to risk difference is risk ratio:

\[R = \frac{P(Y = 1|X = 1)}{P(Y = 1|X = 2)} = \frac{\pi_{1|1}}{\pi_{1|2}}\]

The probability of success in group 1 over the probability of success of group 2

Formula: Log Relative Risk

Hypothesis tests and confidence intervals are done using the natural log of the relative risk, \(\ln R\), since it is symmetric around 0 and will follow a Normal distribution if the sample size is large enough (all \(n_{ij} \ge 10\) is the rule of thumb)

The Wald standard error for the estimated log relative risk is:

\[SE(\ln \hat{R}) = \sqrt{\frac{1}{n_{11}} - \frac{1}{n_{1\bullet}} + \frac{1}{n_{21}} - \frac{1}{n_{2\bullet}}} \]

where

  • \(n_{11}\) and \(n_{21}\) are the counts of successes for groups 1 and 2, respectively

  • \(n_{1\bullet}\) and \(n_{2\bullet}\) are the total sample sizes of groups 1 and 2, respectively

Hypothesis Test

Our null hypothesis is that there is no association between Y and X.

\[R = 1 \textrm{ or } \ln R = 0\]

In our context, the probability of a drive ends in a touchdown is the same if the drive starts as a result of a turnover or a kick

The test statistic is:

\[z = \frac{\ln \hat{R}}{\sqrt{\frac{1}{n_{11}} - \frac{1}{n_{1\bullet}} + \frac{1}{n_{21}} - \frac{1}{n_{2\bullet}}}}\]

Confidence interval

The confidence interval formula uses the log risk ratio and then converts the results back by calculating the exponentiation of the end points.

The confidence formula for \(\ln R\) is:

\[\ln{\hat{R}} \ \pm \ z^* \sqrt{\frac{1}{n_{11}} - \frac{1}{n_{1\bullet}} + \frac{1}{n_{21}} - \frac{1}{n_{2\bullet}}}\]

While you may be expected to calculate the Confidence Interval or hypothesis test by hand on exams, we won’t do that in R. We’ll use a canned function for our purposes!

\

Relative risk using the riskratio() function

We can compute the relative risk using the riskratio() function in the epitools package. The arguments are:

  1. x = a 2x2 table with:

    • column order of Failure/Success
    • row order of baseline group/interest group
  2. conf.level = the confidence level

  3. verbose = T if you want additional results returned

Note: riskratio() wants the row and column order swapped from prop_test(). We can swap the order of the rows and columns using the additional argument rev = both

pacman::p_load(epitools)

drives_rr <- 
  riskratio(
    x = drive_freq,
    verbose = F,
    conf.level = 0.95,
    rev = "both"
  )

drives_rr
## $data
##            drive_end
## drive_start noTD   TD Total
##    kick     4055 1200  5255
##    turnover  571  239   810
##    Total    4626 1439  6065
## 
## $measure
##            risk ratio with 95% C.I.
## drive_start estimate    lower    upper
##    kick     1.000000       NA       NA
##    turnover 1.292124 1.148907 1.453194
## 
## $p.value
##            two-sided
## drive_start   midp.exact fisher.exact   chi.square
##    kick               NA           NA           NA
##    turnover 4.724261e-05 5.233613e-05 3.264256e-05
## 
## $correction
## [1] FALSE
## 
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"

riskratio() has a lot of output. You can specify a specific quantity by using $ followed by the named table you want to display:

Confidence interval: $measure

# Risk ratio estimate and Confidence interval from $measure
drives_rr$measure |> 
  data.frame()
##          estimate    lower    upper
## kick     1.000000       NA       NA
## turnover 1.292124 1.148907 1.453194

The first row is the reference group (“kick”) and the proceeding rows are the risk ratio of that group vs the reference group.

We can interpret the estimate as

“We estimate drives are 1.29 times more likely to result in a touchdown if they start by a turnover compared to a punt or kickoff”

Likewise, we can interpret the confidence interval as

“We are 95% confident that drives that begin from a turnover are 1.15 to 1.45 times more likely to result in a touchdown that drives that begin from a punt or kickoff”

For the p-value of a hypothesis test, we use $p.value

# Hypothesis test results using $p.value
drives_rr$p.value |> 
  data.frame()
##            midp.exact fisher.exact   chi.square
## kick               NA           NA           NA
## turnover 4.724261e-05 5.233613e-05 3.264256e-05

note that there are 3 different p-values. We’ll look at midp.exact and fisher.exact soon, but the chi.square p-value is the one that uses the Normal approximation of \(\ln\hat{R}\)