Motivating Question: Does how an NFL drive end differ based on how the drive begins??
Now we can summarize our results in an \(I \times J\) table. Specifically, a \(4 \times 4\) table since how a drive starts and ends both have 4 different outcomes.
Let’s change the order of the groups using
factor(levels = c(...))
drives <-
drives |>
mutate(drive_start = factor(drive_start,
levels = c("Punt", "Kickoff", "Interception", "Fumble")), # Reordering the groups
# And let's reorder drive end again as well
drive_end = factor(drive_end,
levels = c("Turnover", "Punt", "Field Goal", "Touchdown"),
ordered = T))
# Look at our resulting 4x4 table
drive_freq <-
xtabs(formula = ~ drive_start + drive_end,
data = drives)
# Make the table look nice
drive_freq |>
data.frame() |>
pivot_wider(names_from = "drive_end",
values_from = "Freq") |>
rename(`Drive Start` = drive_start) |>
gt() |>
cols_align(align = "center") |>
gt_add_divider(columns = everything(),
color = "black",
sides = "all") |>
tab_header(title = "Drive Result")
Drive Result | ||||
Drive Start | Turnover | Punt | Field Goal | Touchdown |
---|---|---|---|---|
Punt | 356 | 1074 | 432 | 537 |
Kickoff | 459 | 1238 | 496 | 663 |
Interception | 40 | 106 | 107 | 114 |
Fumble | 60 | 138 | 120 | 125 |
Let’s convert from counts to conditional proportions using
prop.table()
:
drive_freq |>
# Convert counts to conditional proportions
prop.table(margin = "drive_start") |>
# Display 3 significant digits
signif(digits = 3) |>
# Convert to a data frame
data.frame() |>
# Changing drive_end from 1 column to a column per way the drive ends
pivot_wider(names_from = "drive_end",
values_from = "Freq") |>
# Renaming drive_start to a table more friendly display
rename(`Drive Start` = drive_start) |>
# Creating a nice looking table
gt() |>
cols_align(align = "center") |>
gt_add_divider(columns = everything(),
color = "black",
sides = "all") |>
tab_header(title = "Drive Result")
Drive Result | ||||
Drive Start | Turnover | Punt | Field Goal | Touchdown |
---|---|---|---|---|
Punt | 0.148 | 0.448 | 0.180 | 0.224 |
Kickoff | 0.161 | 0.433 | 0.174 | 0.232 |
Interception | 0.109 | 0.289 | 0.292 | 0.311 |
Fumble | 0.135 | 0.312 | 0.271 | 0.282 |
Always a good idea to include a graph of the data!
drives |>
ggplot(mapping = aes(x = drive_start,
fill = fct_rev(drive_end))) +
geom_bar(position = "fill") +
# Adding labels and title
labs(y = "Proportion",
x = "How Drive Began",
fill = "Drive \nResult") +
# Removing the grid lines
theme_test() +
# Removing the space at the bottom of the graph
scale_y_continuous(expand = c(0, 0, 0, 0.05))
There is a more noticeable difference between how a drive begins and how a drive ends when examining all four groups.
drives are more likely to end with a field goal and less likely to end with a punt after some sort of turnover (interception/fumble) than if the drive started from a punt or kickoff.
Touchdowns occurred the most often and turnovers the least often when the drive began from an interception than the other 3 ways a drive could start
These differences are very large, so it is possible that the differences we see are just due to sample variation and not indicative of a difference in the population. So what do we do?
We want to test to see if the Independence Model is adequate for the observed counts or if the more complicated Association Model is necessary to describe the population proportions
When we only have two variables, we only have the two different models (Independence/Association).
Our Null Hypothesis assumes that the two variables are independent (Independent Model is true about the population).
\[H_0: \textrm{How an NFL drive begins and how it ends are Independent of one another} \rightarrow \pi_{ij} = \pi_{i\bullet} \times \pi_{\bullet j}\]
The alternative claims that the Association Model is necessary to accurately describe the probabilities formed by the groups of the two variables.
\[H_a: \textrm{How an NFL drive begins and how it ends are associated with one another} \rightarrow \pi_{ij} \ne \pi_{i\bullet} \times \pi_{\bullet j}\]
Our statistic is the same as it was for our goodness of fit tests, just we replace \(\pi_{i,0}\) with the estimated joint probability assuming the two variables are independent: \(\hat{\pi}_{ij,0} = p_{i\bullet} \times p_{\bullet j}\)
\[\chi^2 = N \sum_{i,j}^{I,J} \frac{ (p_{ij} - \hat{\pi}_{ij,0} )^2}{\hat{\pi}_{ij,0}}\]
\[G = 2\sum_{i,j}^{I,J} n_{ij} \ln \frac{p_{ij}}{\hat{\pi}_{ij,0}}\]
Both test statistics will follow a \(\chi^2_r\) distribution as long as all of the expected counts are at least 5:
\[N\hat{\pi}_{ij,0} \ge 5\]
If it follows a \(\chi^2_r\) distribution, what is the degrees of freedom, \(r\)?
It follows the same pattern as other tests we’ve seen:
The degrees of freedom will be the difference between \(r_1\) and \(r_0\):
\[r = r_1 - r_0 = (IJ - 1) - ((I - 1) + (J - 1)) = (I-1)(J-1)\]
For our example, \(I = J = 4\), so our degrees of freedom will be \((4 - 1)(4 - 1) = 9\)
drives_TOA <-
drives |>
# Calculate the joint counts:
summarize(
.by = c(drive_start, drive_end),
counts = n()
) |>
# Calculate the joint proportion:
mutate(joint_prop = counts/sum(counts)) |> # n_ij/N
# Calculating the marginal proportions for drive_start
mutate(
.by = drive_start,
prop_start = sum(joint_prop)
) |> # sum of p_ij over j = p_i.
# Calculating the marginal proportions for drive_end
mutate(
.by = drive_end,
prop_end = sum(joint_prop)
) |> # sum of p_ij over i = p_.j
# Calculating the remaining values needed for the test:
mutate(
# The joint proportions assuming Ho is true: p_i. * p_.j
prop_null = prop_start * prop_end,
# The individual zi2
zi2 = sum(counts) * (joint_prop - prop_null)^2/prop_null,
# The individual g_i
g_i = counts * log(joint_prop/prop_null)
)
drives_TOA |>
mutate(across(.cols = where(is.numeric),
.fns = round,
digits = 3))
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `across(.cols = where(is.numeric), .fns = round, digits = 3)`.
## Caused by warning:
## ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
## Supply arguments directly to `.fns` through an anonymous function instead.
##
## # Previously
## across(a:b, mean, na.rm = TRUE)
##
## # Now
## across(a:b, \(x) mean(x, na.rm = TRUE))
## drive_start drive_end counts joint_prop prop_start prop_end prop_null
## 1 Fumble Field Goal 120 0.020 0.073 0.190 0.014
## 2 Fumble Punt 138 0.023 0.073 0.421 0.031
## 3 Fumble Touchdown 125 0.021 0.073 0.237 0.017
## 4 Fumble Turnover 60 0.010 0.073 0.151 0.011
## 5 Interception Field Goal 107 0.018 0.061 0.190 0.012
## 6 Interception Punt 106 0.017 0.061 0.421 0.026
## 7 Interception Touchdown 114 0.019 0.061 0.237 0.014
## 8 Interception Turnover 40 0.007 0.061 0.151 0.009
## 9 Kickoff Field Goal 496 0.082 0.471 0.190 0.090
## 10 Kickoff Punt 1238 0.204 0.471 0.421 0.198
## 11 Kickoff Touchdown 663 0.109 0.471 0.237 0.112
## 12 Kickoff Turnover 459 0.076 0.471 0.151 0.071
## 13 Punt Field Goal 432 0.071 0.396 0.190 0.075
## 14 Punt Punt 1074 0.177 0.396 0.421 0.167
## 15 Punt Touchdown 537 0.089 0.396 0.237 0.094
## 16 Punt Turnover 356 0.059 0.396 0.151 0.060
## zi2 g_i
## 1 15.053 42.283
## 2 12.701 -41.707
## 3 3.765 21.666
## 4 0.699 -6.472
## 5 19.704 45.571
## 6 15.313 -40.050
## 7 8.325 30.714
## 8 4.265 -13.005
## 9 4.216 -45.715
## 10 0.982 34.870
## 11 0.316 -14.464
## 12 1.836 29.027
## 13 1.353 -24.169
## 14 3.923 64.901
## 15 1.821 -31.266
## 16 0.097 -5.878
Calculating the test statitsics and p-values
tibble(
test = c( 'chisq', 'G'),
test_stat = c(sum(drives_TOA$zi2), 2*sum(drives_TOA$g_i)),
p_val = pchisq(q = test_stat, df = 9, lower = F) |> signif(digits = 4)
)
## # A tibble: 2 × 3
## test test_stat p_val
## <chr> <dbl> <dbl>
## 1 chisq 94.4 2.15e-16
## 2 G 92.6 4.86e-16
Both tests agree that there is incredibly strong evidence of an association between how a drive starts and how a drive ends
We can use our two tests we used earlier to conduct a test of independence:
chisq_test()
in the rstatix
package with
the following arguments:
x =
which can be a couple of different
options:
a two way table of counts
a vector for the groups of the explanatory variables. If this
x
is used we need to specify y
y =
the vector of the other factor variable. Only
used if x
is a single vector as well
# Using the drive_freq table:
nfl_chisq <-
chisq_test(x = drive_freq)
nfl_chisq
## # A tibble: 1 × 6
## n statistic p df method p.signif
## * <int> <dbl> <dbl> <int> <chr> <chr>
## 1 6065 94.4 2.15e-16 9 Chi-square test ****
# Using the the raw data.frame
chisq_test(x = drives$drive_start,
y = drives$drive_end)
## # A tibble: 1 × 6
## n statistic p df method p.signif
## * <int> <dbl> <dbl> <int> <chr> <chr>
## 1 6065 94.4 2.15e-16 9 Chi-square test ****
The Gtest()
function in DescTools
package
works the same way as the chisq_test()
function:
pacman::p_load(DescTools)
nfl_G <-
GTest(x = drive_freq)
nfl_G
##
## Log likelihood ratio (G-test) test of independence without correction
##
## data: drive_freq
## G = 92.612, X-squared df = 9, p-value = 4.441e-16
# Using the raw data instead of the summary table
GTest(x = drives$drive_start,
y = drives$drive_end)
##
## Log likelihood ratio (G-test) test of independence without correction
##
## data: drives$drive_start and drives$drive_end
## G = 92.612, X-squared df = 9, p-value = 4.441e-16
We should also check that the sample size is large enough
\[\tilde{n}_{ij} = N (p_{i\bullet}\times p_{\bullet j}) \ge 5\]
These can be displayed using the object created by
chisq_test()
in the expected_freq()
function:
expected_freq(nfl_chisq) |>
floor()
## drive_end
## drive_start Turnover Punt Field Goal Touchdown
## Punt 361 1011 456 569
## Kickoff 430 1203 543 677
## Interception 55 154 69 87
## Fumble 66 186 84 105
Since we just need the smallest to be at least 5, we can use
min()
to find the smallest expected
expected_freq(nfl_chisq) |>
floor() |>
min()
## [1] 55
Since the smallest expected count is 55, we have a large enough sample size to use the \(\chi^2\) distribution to find the p-value.
Just rejecting our null hypothesis and claiming that the two variables are not independent usually is not a very strong claim. We’d like to see where the association occurs in combination of groups.
Using Pearson’s test, we can find two different residuals:
\[z_i = \frac{ p_{ij} - \hat{\pi}_{ij,0}}{\sqrt{\frac{\hat{\pi}_{ij,0}}{N}}}\]
\[z_i = \frac{ p_{ij} - \hat{\pi}_{ij,0}}{\sqrt{\frac{\hat{\pi}_{ij,0}(1-\hat{\pi}_{ij,0})}{N}}}\]
Typically we use the Standardized residuals since they are closer to standard Normal variables than Pearson’s residuals.
We can find either of these using either
pearson_residuals()
or std_residuals()
with
the object created by chisq_test()
# Pearson's Residuals
drive_pearson <-
pearson_residuals(nfl_chisq)
drive_pearson |>
round(digits = 2)
## drive_end
## drive_start Turnover Punt Field Goal Touchdown
## Punt -0.31 1.98 -1.16 -1.35
## Kickoff 1.36 0.99 -2.05 -0.56
## Interception -2.07 -3.91 4.44 2.89
## Fumble -0.84 -3.56 3.88 1.94
Standardized residuals
drive_standard <-
std_residuals(nfl_chisq)
drive_standard |>
round(digits = 2)
## drive_end
## drive_start Turnover Punt Field Goal Touchdown
## Punt -0.43 3.35 -1.66 -1.99
## Kickoff 2.02 1.79 -3.14 -0.88
## Interception -2.31 -5.31 5.09 3.41
## Fumble -0.94 -4.87 4.48 2.31
The larger the table, the harder it is to notice any patters for the associations. So instead of a table, let’s make a graph!
We’ll just use the standardized residuals since they more closely follow a standard Normal distribution
With 16 different category combinations, if the standardized residual is at least 3 (or -3), we say that that combination occurs more (or less) frequently than would be expected if the variables were independent.
# Start with the residual table
drive_standard |>
# Convert to a data frame
data.frame() |>
# Rename Freq to residual
rename(residual = Freq) |>
# Creating a column called
mutate(sig = if_else(abs(residual) >= 3,
true = "Yes",
false = "No")) |>
# use ggplot() to create a tile graph
ggplot(mapping = aes(x = drive_start,
y = drive_end,
fill = residual)) +
geom_tile(mapping = aes(color = sig), # Draws a different color line around the tile if it is significant
size = 1,
width = 0.98,
height = 0.98) +
# Adding the residuals to the plot using geom_text()
geom_text(mapping = aes(label = round(residual, digits = 2))) +
labs(x = "How the Drive Started",
y = "Drive Result",
fill = "Standardized \nResidual") +
scale_x_discrete(expand = c(0,0)) +
scale_y_discrete(expand = c(0,0)) +
# Change the color gradient for the fill aesthetic
scale_fill_gradient2(low = "orange",
high = "blue") +
# Changint the line to be red if it is significant and white it if is not
scale_color_manual(values = c("Yes" = "red",
"No" = "white")) +
theme_test() +
guides(color = "none")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
With 16 different category combinations, if the standardized residual is at least 3 (or -3), we say that that combination occurs more (or less) frequently than would be expected if the variables were independent.
The residuals show that:
Drives are more likely to end with a field goal and less likely to end with a punt if they began as a result of a turnover (Interception or Fumble)
Drives that began from a punt are more likely to end with a punt.
Drives that began from a kickoff are less likely to end in a field goal.
A drive is more likely to end in a Touchdown if it started as a result of an interception.