Motivating Question: Is a team more likely to score a touchdown after a turnover compared to a punt or kickoff?
Relative risk is often used for two binary variables, so let’s convert our data into two binary variables:
drive_start: turnover - Interception/Fumble vs kick - Kickoff/Punt
drive_end: TD - Touchdown vs noTD - Field Goal/Turnover/Punt

drives2 <-
drives |>
mutate(
drive_start = if_else(drive_start %in% c("Interception", "Fumble"),
true = "turnover",
false = "kick"),
# Converting drive_start to a factor with group order turnover, kick
drive_start = factor(drive_start,
levels = c("turnover", "kick")),
drive_end = if_else(drive_end == "Touchdown",
true = "TD",
false = "noTD"),
# Converting drive_end to a factor and ordering the groups as success then failure
drive_end = factor(drive_end,
levels = c("TD", "noTD"))
)
Next, let’s display the results in a two-way table using xtabs():
# Displaying a two-way table using xtabs()
drive_freq <-
xtabs(formula = ~ drive_start + drive_end, data = drives2)
# The addmargins() function will add the margins to the table
drive_freq |>
addmargins()
## drive_end
## drive_start TD noTD Sum
## turnover 239 571 810
## kick 1200 4055 5255
## Sum 1439 4626 6065
Then we can convert the table from counts to conditional proportions using prop.table(margin = "drive_start"):
drive_freq |>
prop.table(margin = "drive_start") |>
round(digits = 3)
## drive_end
## drive_start TD noTD
## turnover 0.295 0.705
## kick 0.228 0.772
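If it helps to see where these conditional proportions come from, here is a quick sketch that computes them directly from the counts (each TD count divided by its drive_start row total):
# Each conditional proportion is a TD count divided by its row (drive_start) total
drive_freq["turnover", "TD"] / sum(drive_freq["turnover", ])  # 239 / 810   ~ 0.295
drive_freq["kick", "TD"]     / sum(drive_freq["kick", ])      # 1200 / 5255 ~ 0.228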
Let’s create a bar chart to compare the probability of a touchdown between the two groups:
drive_freq |>
# Calculate the conditional proportions
prop.table(margin = "drive_start") |>
# Convert the table to a data frame to be used by ggplot()
data.frame() |>
# Rename Freq to prop
rename(prop = Freq) |>
ggplot(mapping = aes(x = drive_start,
fill = drive_end,
y = prop)) +
geom_col(width = 0.5) +
labs(y = "Proportion",
x = "How Drive Began",
fill = "Drive \nResult") +
theme_test() +
scale_y_continuous(expand = c(0, 0, 0, 0.05)) +
scale_fill_manual(labels = c("Touchdown", "No Touchdown"),
values = c("steelblue", "darkorange"))
There is a difference between the two groups, but it isn’t very noticeable.
The most basic way to compare “risks” is the risk difference:
\[P(Y = 1|X = 1) - P(Y = 1|X = 2) = \pi_{1|1} - \pi_{1|2}\]
which is the difference in the “success” rate of the response variable between the two groups of the explanatory variable.
Our best estimate of the risk difference is:
\[p_{1|1} - p_{1|2} = \frac{n_{11}}{n_{1\bullet}} - \frac{n_{21}}{n_{2\bullet}}\]
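For example, plugging the counts from the drive_freq table above into this formula gives a quick by-hand estimate:
# Estimated risk difference computed directly from the counts in drive_freq
p_turnover <- 239 / 810     # p_{1|1}: proportion of turnover drives ending in a TD
p_kick     <- 1200 / 5255   # p_{1|2}: proportion of kick drives ending in a TD
p_turnover - p_kick         # about 0.067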
Let’s use prop_test() from the rstatix package again to calculate the risk difference, p-value, and confidence interval. The arguments for 2 binary variables are:

x = a table of counts with 1st column = success and 2nd column = failures
detailed = T to report the individual proportions and confidence interval

rstatix::prop_test(x = drive_freq,
                   detailed = T)
## # A tibble: 1 × 13
## n n1 n2 estimate1 estimate2 statistic p df conf.low
## * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 6065 810 5255 0.295 0.228 16.9 0.0000396 1 0.0326
## # ℹ 4 more variables: conf.high <dbl>, method <chr>, alternative <chr>,
## # p.signif <chr>
Let’s improve the names and add the risk difference to the results:
risk_diff_results <-
rstatix::prop_test(
x = drive_freq,
detailed = T
) |>
# Renaming n1 and n2 to their respective group names
rename(
turnover_n = n1,
kick_n = n2,
turnover_p = estimate1,
kick_p = estimate2,
test_stat = statistic,
p_val = p
) |>
# Calculating the risk difference
mutate(risk_diff = turnover_p - kick_p) |>
# Picking which columns to display:
dplyr::select(
n,
turnover_n,
kick_n,
turnover_p,
kick_p,
risk_diff,
test_stat,
p_val,
conf.low,
conf.high
)
risk_diff_results |>
signif(digits = 3)
## # A tibble: 1 × 10
## n turnover_n kick_n turnover_p kick_p risk_diff test_stat p_val conf.low
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 6060 810 5260 0.295 0.228 0.0667 16.9 3.96e-5 0.0326
## # ℹ 1 more variable: conf.high <dbl>
From our results, we can see that there is a statistically significant difference in how often drives end in touchdowns depending on whether they start from a turnover or a kick.
We can be 95% confident that drives that start from a turnover are 3.3 to 10.1 percentage points more likely to end in a touchdown than drives that start from a kickoff or a punt.
An alternative to risk difference is risk ratio:
\[R = \frac{P(Y = 1|X = 1)}{P(Y = 1|X = 2)} = \frac{\pi_{1|1}}{\pi_{1|2}}\]
That is, the probability of success in group 1 divided by the probability of success in group 2.
Hypothesis tests and confidence intervals are done using the natural log of the relative risk, \(\ln R\), since it is symmetric around 0 and will follow a Normal distribution if the sample size is large enough (all \(n_{ij} \ge 10\) is the rule of thumb)
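In our example the rule of thumb is easily met, which we can confirm with a one-line check of the counts table:
# Sample size rule of thumb: every cell of the two-way table is at least 10
all(drive_freq >= 10)   # TRUE here, since the smallest cell count is 239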
The Wald standard error for the estimated log relative risk is:
\[SE(\ln \hat{R}) = \sqrt{\frac{1}{n_{11}} - \frac{1}{n_{1\bullet}} + \frac{1}{n_{21}} - \frac{1}{n_{2\bullet}}} \]
where
\(n_{11}\) and \(n_{21}\) are the counts of successes for groups 1 and 2, respectively
\(n_{1\bullet}\) and \(n_{2\bullet}\) are the total sample sizes of groups 1 and 2, respectively
Our null hypothesis is that there is no association between Y and X.
\[R = 1 \textrm{ or } \ln R = 0\]
In our context, the null hypothesis says the probability that a drive ends in a touchdown is the same whether the drive starts as a result of a turnover or a kick.
The test statistic is:
\[z = \frac{\ln \hat{R}}{\sqrt{\frac{1}{n_{11}} - \frac{1}{n_{1\bullet}} + \frac{1}{n_{21}} - \frac{1}{n_{2\bullet}}}}\]
The confidence interval is built using the log risk ratio and then converted back by exponentiating the endpoints.
The confidence interval formula for \(\ln R\) is:
\[\ln{\hat{R}} \ \pm \ z^* \sqrt{\frac{1}{n_{11}} - \frac{1}{n_{1\bullet}} + \frac{1}{n_{21}} - \frac{1}{n_{2\bullet}}}\]
While you may be expected to calculate the confidence interval or hypothesis test by hand on exams, we won’t do that in R. We’ll use a canned function for our purposes!
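That said, if you want to see how the formulas above fit together, here is a brief optional sketch of the by-hand calculation using the counts from drive_freq; the exponentiated interval endpoints essentially match what riskratio() reports below.
# Counts from the drive_freq table above
n11    <- 239    # turnover drives ending in a touchdown
n1_tot <- 810    # total turnover drives
n21    <- 1200   # kick drives ending in a touchdown
n2_tot <- 5255   # total kick drives

# Estimated log risk ratio and its Wald standard error
log_rr    <- log((n11 / n1_tot) / (n21 / n2_tot))
se_log_rr <- sqrt(1/n11 - 1/n1_tot + 1/n21 - 1/n2_tot)

# Test statistic and two-sided p-value from the standard Normal
z <- log_rr / se_log_rr    # about 4.3
2 * pnorm(-abs(z))

# 95% CI on the log scale, exponentiated back to the risk ratio scale
exp(log_rr + c(-1, 1) * qnorm(0.975) * se_log_rr)   # about 1.15 to 1.45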
riskratio() function

We can compute the relative risk using the riskratio() function in the epitools package. The arguments are:

x = a 2x2 table of counts (see the note below about the row and column order)
conf.level = the confidence level
verbose = T if you want additional results returned

Note: riskratio() wants the row and column order swapped from prop_test(). We can swap the order of the rows and columns using the additional argument rev = "both".
pacman::p_load(epitools)
drives_rr <-
riskratio(
x = drive_freq,
verbose = F,
conf.level = 0.95,
rev = "both"
)
drives_rr
## $data
## drive_end
## drive_start noTD TD Total
## kick 4055 1200 5255
## turnover 571 239 810
## Total 4626 1439 6065
##
## $measure
## risk ratio with 95% C.I.
## drive_start estimate lower upper
## kick 1.000000 NA NA
## turnover 1.292124 1.148907 1.453194
##
## $p.value
## two-sided
## drive_start midp.exact fisher.exact chi.square
## kick NA NA NA
## turnover 4.724261e-05 5.233613e-05 3.264256e-05
##
## $correction
## [1] FALSE
##
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"
riskratio() has a lot of output. You can extract a specific quantity by using $ followed by the name of the table you want to display:
Confidence interval: $measure
# Risk ratio estimate and Confidence interval from $measure
drives_rr$measure |>
data.frame()
## estimate lower upper
## kick 1.000000 NA NA
## turnover 1.292124 1.148907 1.453194
The first row is the reference group (“kick”) and the following rows give the risk ratio of that group vs the reference group.
We can interpret the estimate as
“We estimate drives are 1.29 times more likely to result in a touchdown if they start from a turnover compared to a punt or kickoff”
Likewise, we can interpret the confidence interval as
“We are 95% confident that drives that begin from a turnover are 1.15 to 1.45 times more likely to result in a touchdown than drives that begin from a punt or kickoff”
For the p-value of a hypothesis test, we use
$p.value
# Hypothesis test results using $p.value
drives_rr$p.value |>
data.frame()
## midp.exact fisher.exact chi.square
## kick NA NA NA
## turnover 4.724261e-05 5.233613e-05 3.264256e-05
Note that there are 3 different p-values. We’ll look at midp.exact and fisher.exact soon, but the chi.square p-value is the one that uses the Normal approximation of \(\ln\hat{R}\).
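Since $measure and $p.value are just named tables, you can also pull out a single value by row and column name if that’s all you need (a small sketch using drives_rr from above):
# Extracting single values from the riskratio() output by row and column name
drives_rr$measure["turnover", "estimate"]     # estimated risk ratio, about 1.29
drives_rr$p.value["turnover", "chi.square"]   # the chi-square p-value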