ESPN has a metric it uses to judge a quarterback’s (QB) performance called Quarterback Rating (QBR), and how it is calculated is kept a secret. The qbr data.csv file has the QBR rating and game statistics for all quarterback and game performances.
The columns in the csv file are:
qbr (response variable): The quarterback rating assigned by ESPN and will be between 0 and 100 (Larger -> Better)
pts_added: An advanced metric that measures how many points the quarterback added compared to the “average” QB play. Measured in points and the higher the better. Negative means performed below average, positive means performed above average
interceptions: How many interceptions the QB committed. The values are “none”, “one”, and “multiple”. More is bad, fewer interceptions is better.
yds_attempt: The number of yards gained from passes divided by the number of passes attempted. The larger the number, the better.
The games are split into two data sets:
qbr_df: The data set to use the build the linear regression models
qbr23: A data frame for the 2023 NFL season you’ll use to make predictions after building the model.
For this assignment, you’ll be using points added to predict ESPN’s QBR.
Create the appropriate graphs to display QBR and points added individually. Describe any important characteristics for each variable.
qbr_df |>
pivot_longer(
cols = c(qbr, pts_added),
names_to = "stat",
values_to = 'score'
) |>
ggplot(
mapping = aes(
x = score
)
) +
geom_density(
fill = "blue"
) +
# Separate density plots for points added and qbr
facet_wrap(
facet = vars(stat),
scales = "free"
) +
# Have the density curves sit on the x-axis
scale_y_continuous(
expand = c(0, 0, 0.05, 0)
) +
labs(x = NULL)
Both variables appear to be approximately unimodal and symmetric
Create the appropriate graph to display the association between QBR and points added. Describe any important characteristics for each variable. Save the graph as gg_qbr and display it.
gg_qbr <-
ggplot(
data = qbr_df,
mapping = aes(
x = pts_added,
y = qbr
)
) +
geom_point() +
labs(
x = "Points Added Above the Average by the Quarterback",
y = "ESPN's Quarterback Rating"
)
gg_qbr
There is a positive direction, no clear outliers, a somewhat linear (somewhat S shaped) trend, and strong relationship
Calculate the appropriate summary of the association between points added and QBR. Does it indicate a strong association?
cor(
x = qbr_df$pts_added,
y = qbr_df$qbr
)
## [1] 0.9419234
Yes, the correlation is over 0.9, indicating a strong association!
Create the linear model to predict QBR using points added and
display the results using get_regression_table()
or
summary()
. Add the best fitting line to gg_qbr.
qbr_lm <-
lm(formula = qbr ~ pts_added,
data = qbr_df)
get_regression_table(qbr_lm)
## # A tibble: 2 × 7
## term estimate std_error statistic p_value lower_ci upper_ci
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 intercept 50.4 0.137 367. 0 50.1 50.6
## 2 pts_added 6.08 0.036 170. 0 6.01 6.15
# Adding the best fitting line to the scatterplot
gg_qbr +
geom_smooth(
method = "lm",
se = F,
formula = y ~ x
)
Interpret the slope in context of the data:
For every additional point added above average, the QBR is predicted to increase by 6.08
Interpret the intercept in context of the data:
If the points added above average is 0 (AKA, the average), the predicted QBR is 50.4
Use the model you created to predict the QBR for the qbr23 and create a set that has the actual QBR and predicted QBR for the 2023 games. Display the first 10 rows.
qbr_pred <-
data.frame(
qbr = qbr23$qbr,
qbr_hat = predict(object = qbr_lm, newdata = qbr23)
)
head(qbr_pred, n = 10)
## qbr qbr_hat
## 2023_1_B.Purdy 93.4 84.42034
## 2023_1_T.Tagovailoa 85.9 90.50407
## 2023_1_J.Love 79.6 64.34402
## 2023_1_D.Prescott 75.6 60.69379
## 2023_1_J.Goff 68.3 57.65192
## 2023_1_T.Lawrence 66.2 59.47704
## 2023_1_P.Mahomes 66.2 74.68637
## 2023_1_D.Watson 61.7 63.12728
## 2023_1_B.Mayfield 58.3 57.04355
## 2023_1_D.Carr 56.5 61.30216
Create a graph that shows the predicted QBR on the x-axis and
the actual QBR on the y-axis (this type of graph is called an R-squared
plot). Add geom_smooth()
to draw the line through the
graph
ggplot(
data = qbr_pred,
mapping = aes(
x = qbr_hat,
y = qbr
)
) +
geom_point() +
geom_smooth(
method = "lm",
se = F,
formula = y ~ x
) +
labs(
x = "Predicted QBR",
y = "Actual QBR",
title = "The predicted and actual QBR for the 2023 season"
)
Create the residual plot for your linear model in the code chunk below
get_regression_points(qbr_lm) |>
ggplot(
mapping = aes(
x = pts_added,
y = residual
)
) +
# Adding the points and making them somewhat see thru
geom_point(alpha = 0.5) +
# Adding a horizontal line at 0
geom_hline(
mapping = aes(yintercept = mean(residual)),
color = "red",
linewidth = 1
) +
labs(
x = "Expected Points Added above Average",
y = "Residuals",
title = "Residual plot for QBR Model"
)
Is the linear model appropriate to predict QBR from points added?
No, there is a noticable curve in the residual plot
Calculate the two fit statistics for the linear model. How well does the model make predictions? Use the fit statistics to justify your answer
get_regression_summaries(qbr_lm)
## # A tibble: 1 × 9
## r_squared adj_r_squared mse rmse sigma statistic p_value df nobs
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.887 0.887 67.0 8.19 8.19 28950. 0 1 3682
The \(R^2\) value is high (almost 0.9) and the rmse is about 8.2, indicating that the typical prediction error is about 8.2 points off the actual QBR.