ESPN uses a metric called Quarterback Rating (QBR) to judge a quarterback's (QB's) performance, and exactly how it is calculated is kept secret. The qbr data.csv file contains the QBR and game statistics for each quarterback's game performance.
The columns in the csv file are:
qbr (response variable): The quarterback rating assigned by ESPN, ranging from 0 to 100 (larger is better).
pts_added: An advanced metric that measures how many points the quarterback added compared to an “average” QB play. It is measured in points, and higher is better: a negative value means the QB performed below average, and a positive value means above average.
intercepted: Whether the quarterback threw at least one interception during the game (no interceptions = “no”, at least 1 = “yes”).
yds_attempt: Passing yards divided by the number of passes attempted. The larger the number, the better.
For this assignment, you’ll be using both points added and intercepted to predict ESPN’s QBR.
Create a single graph to display QBR, points added, and intercepted. Save it as gg_qbr. Describe any important characteristics for each variable.
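library(tidyverse)   # ggplot2, dplyr, and friends
library(moderndive)  # get_regression_table(), geom_parallel_slopes(), get_regression_summaries()

# Scatterplot of QBR vs. points added, colored by whether the QB threw an interception
gg_qbr <-
  ggplot(
    data = qbr_df,
    mapping = aes(
      x = pts_added,
      y = qbr,
      color = intercepted
    )
  ) +
  geom_point(alpha = 0.5) +
  labs(
    x = "Points added over expected",
    y = "ESPN's Quarterback Rating",
    color = "Did the QB throw\nan interception?"
  )
gg_qbr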
The association between points added and QBR is strong and positive, with no obvious outliers.
Games in which the quarterback didn’t throw an interception tend to have higher QBR and more points added.
Calculate the correlation for quarterbacks when they threw at least one interception and the correlation when they did not throw an interception. Are the correlations similar?
qbr_df |>
  summarize(
    .by = intercepted,
    corr = cor(qbr, pts_added)
  )
## intercepted corr
## 1 no 0.9307264
## 2 yes 0.9400106
Yes. Both correlations are strong and positive, and they are very similar (0.931 with no interception vs. 0.940 with at least one).
Create the linear interaction model to predict QBR using points added and intercepted. Display the results using get_regression_table() or summary().
# Interaction model: separate intercept and slope for each intercepted group
lm_int <-
  lm(formula = qbr ~ pts_added * intercepted,
     data = qbr_df)
get_regression_table(lm_int)
## # A tibble: 4 × 7
## term estimate std_error statistic p_value lower_ci upper_ci
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 intercept 50.6 0.209 243. 0 50.2 51
## 2 pts_added 6.16 0.054 115. 0 6.06 6.27
## 3 intercepted: yes -0.663 0.277 -2.39 0.017 -1.21 -0.119
## 4 pts_added:interceptedy… -0.163 0.072 -2.25 0.025 -0.305 -0.021
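Because intercepted is a yes/no indicator, the interaction model amounts to one fitted line per group. Plugging in the estimates as printed (rounded) in the table above gives:

\[
\begin{aligned}
\text{No interception:}\quad \hat{qbr} &= 50.6 + 6.16 \times \text{pts\_added} \\
\text{At least one interception:}\quad \hat{qbr} &= (50.6 - 0.663) + (6.16 - 0.163) \times \text{pts\_added} \\
&= 49.937 + 5.997 \times \text{pts\_added}
\end{aligned}
\]

The two slopes differ by only 0.163 QBR points per point added, so the lines are close to parallel.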
Use the output from question 2a) to predict the QBR for Patrick Mahomes and Brock Purdy in the 2024 Super Bowl. You can round your answers to 1 decimal place.
Patrick Mahomes: 4.3 points added, 1 interception (intercepted = “yes”)
\(\hat{qbr} = 50.6 + 6.16 \times 4.3 + (-0.663) + (-0.163) \times 4.3 \approx\) 75.7
Brock Purdy: 2.0 points added, 0 interceptions (intercepted = “no”)
\(\hat{qbr} = 50.6 + 6.16 \times 2.0 \approx\) 62.9
Using your answers from question 2b), what are the residuals for Patrick Mahomes (actual QBR = 75.8) and Brock Purdy (actual QBR = 69.8)?
Patrick Mahomes:
\[ e = qbr - \hat{qbr} = 75.8 - 75.7 = 0.1\]
Brock Purdy:
\[ e = qbr - \hat{qbr} = 69.8 - 62.9 = 6.9\]
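The hand calculations for the predictions and residuals can be checked with a short sketch. It plugs the coefficients exactly as printed in the get_regression_table(lm_int) output into the fitted interaction equation; the helper predict_qbr_int() and the players table are hypothetical names introduced here for illustration. Because the printed coefficients are rounded, the results are approximate.

```r
# Coefficients copied from the printed get_regression_table(lm_int) output (rounded)
b0 <- 50.6    # intercept
b1 <- 6.16    # pts_added
b2 <- -0.663  # intercepted: yes
b3 <- -0.163  # pts_added:interceptedyes

# Hypothetical helper: qbr_hat = b0 + b1*x + b2*yes + b3*x*yes
predict_qbr_int <- function(pts_added, intercepted) {
  yes <- as.numeric(intercepted == "yes")  # 1 if at least one interception, else 0
  b0 + b1 * pts_added + b2 * yes + b3 * pts_added * yes
}

players <- data.frame(
  player      = c("Patrick Mahomes", "Brock Purdy"),
  pts_added   = c(4.3, 2.0),
  intercepted = c("yes", "no"),
  actual_qbr  = c(75.8, 69.8)
)

# Predicted QBR rounded to 1 decimal place, then residual e = qbr - qbr_hat
players$qbr_hat  <- round(predict_qbr_int(players$pts_added, players$intercepted), 1)
players$residual <- players$actual_qbr - players$qbr_hat
players

# With qbr_df loaded, the unrounded coefficients could be used instead:
# predict(lm_int, newdata = players)
```

The predictions come out to 75.7 and 62.9, giving residuals of 0.1 and 6.9. Using predict() on the fitted model avoids rounding the coefficients at all.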
Create the linear additive model to predict QBR using points added and intercepted. Display the results using get_regression_table() or summary().
# Additive model: common slope, separate intercept for each intercepted group
lm_add <-
  lm(formula = qbr ~ pts_added + intercepted,
     data = qbr_df)
get_regression_table(lm_add)
## # A tibble: 3 × 7
## term estimate std_error statistic p_value lower_ci upper_ci
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 intercept 50.8 0.194 261. 0 50.4 51.1
## 2 pts_added 6.07 0.036 168. 0 6.00 6.14
## 3 intercepted: yes -0.791 0.271 -2.92 0.004 -1.32 -0.259
Interpret the model terms in the context of the variables.
If a quarterback has 0 points added and throws no interceptions, the predicted QBR is 50.8.
For every additional point added, the predicted QBR increases by about 6.07, holding intercepted constant.
If the quarterback throws at least one interception, the predicted QBR is about 0.79 lower, holding points added constant.
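Written out group by group with the printed (rounded) estimates, the additive model gives two parallel lines that share a slope and differ only in their intercepts:

\[
\begin{aligned}
\text{No interception:}\quad \hat{qbr} &= 50.8 + 6.07 \times \text{pts\_added} \\
\text{At least one interception:}\quad \hat{qbr} &= (50.8 - 0.791) + 6.07 \times \text{pts\_added} = 50.009 + 6.07 \times \text{pts\_added}
\end{aligned}
\]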
Add the lines for the interaction model to gg_qbr. Do the same for the additive model. Which one (interaction or additive) appears to be the better choice? Justify your answer!
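# Adding the lines for the interaction model
# (color = intercepted in gg_qbr gives each group its own fitted line)
gg_qbr +
  geom_smooth(
    method = "lm",
    se = FALSE,
    formula = y ~ x
  )
# Adding the lines for the additive model (same slope for both groups)
gg_qbr +
  geom_parallel_slopes(
    se = FALSE
  )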
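From the two graphs, the additive model appears to be the better choice: the interaction model's two lines are nearly parallel, so the fitted lines from the two models are almost identical, and the extra interaction term adds complexity without visibly changing the fit.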
Using the \(R^2\) values for both models, which one would you recommend, the interaction or the additive model? Make sure to justify your answer!
bind_rows(
  .id = "model",
  # Fit summaries for the interaction model
  "interaction" = get_regression_summaries(lm_int),
  # Fit summaries for the additive model
  "additive" = get_regression_summaries(lm_add)
)
## # A tibble: 2 × 10
## model r_squared adj_r_squared mse rmse sigma statistic p_value df nobs
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 inter… 0.889 0.889 66.1 8.13 8.13 10788. 0 3 4047
## 2 addit… 0.889 0.889 66.2 8.13 8.14 16164. 0 2 4047
Since the \(R^2\) and adjusted \(R^2\) values are the same for the two models (to 3 decimal places), the interaction term adds essentially no explanatory power, so we should use the simpler additive model.
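Adjusted \(R^2\) penalizes \(R^2\) for the number of predictors, but with n = 4047 games the penalty for one extra term is tiny, which is why both models show 0.889. The sketch below plugs the printed (rounded) \(R^2\) into the standard adjusted \(R^2\) formula; the function name adj_r2() is a hypothetical helper, and p is taken from the df column of the summary table.

```r
# Hypothetical helper implementing the standard adjusted R^2 formula:
#   adj_R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n  <- 4047   # nobs from the summary table
r2 <- 0.889  # printed (rounded) R^2, shared by both models

adj <- c(
  interaction = adj_r2(r2, n, p = 3),  # pts_added, intercepted, interaction term
  additive    = adj_r2(r2, n, p = 2)   # pts_added, intercepted
)
round(adj, 4)
```

Both values round to 0.8889, so neither \(R^2\) nor adjusted \(R^2\) separates the models, and the simpler additive model is the natural choice.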