ESPN has a metric it uses to judge a quarterback’s (QB) performance called Quarterback Rating (QBR), and how it is calculated is kept a secret. The qbr data.csv file has the QBR rating and game statistics for all quarterback and game performances.
The columns in the csv file are:
qbr (response variable): The quarterback rating assigned by ESPN and will be between 0 and 100 (Larger -> Better)
pts_added: An advanced metric that measures how many points the quarterback added compared to the “average” QB play. Measured in points and the higher the better. Negative means performed below average, positive means performed above average
interceptions: How many interceptions the QB committed. The values are “none”, “one”, and “multiple”. More is bad, fewer interceptions is better.
yds_attempt: The number of yards gained from passes divided by the number of passes attempted. The larger the number, the better.
The games are split into two data sets:
qbr_df: The data set to use the build the linear regression models
qbr23: A data frame for the 2023 NFL season you’ll use to make predictions after building the model.
For this assignment, you’ll be using points added to predict ESPN’s QBR.
Create the appropriate graphs to display QBR and points added individually. Describe any important characteristics for each variable.
qbr_df |> 
  pivot_longer(
    cols = c(qbr, pts_added),
    names_to = "stat",
    values_to = 'score'
  ) |> 
  ggplot(
    mapping = aes(
      x = score
    )
  ) + 
  
  geom_density(
    fill = "blue"
  ) +
  
  # Separate density plots for points added and qbr
  facet_wrap(
    facet = vars(stat),
    scales = "free"
  ) + 
  
  # Have the density curves sit on the x-axis
  scale_y_continuous(
    expand = c(0, 0, 0.05, 0)
  ) + 
  
  labs(x = NULL)
Both variables appear to be approximately unimodal and symmetric
Create the appropriate graph to display the association between QBR and points added. Describe any important characteristics for each variable. Save the graph as gg_qbr and display it.
gg_qbr <- 
  ggplot(
    data = qbr_df,
    mapping = aes(
      x = pts_added,
      y = qbr
    )
  ) + 
  
  geom_point() + 
  
  labs(
    x = "Points Added Above the Average by the Quarterback",
    y = "ESPN's Quarterback Rating"
  )
gg_qbr
There is a positive direction, no clear outliers, a somewhat linear (somewhat S shaped) trend, and strong relationship
Calculate the appropriate summary of the association between points added and QBR. Does it indicate a strong association?
cor(
  x = qbr_df$pts_added,
  y = qbr_df$qbr
)
## [1] 0.9419234
Yes, the correlation is over 0.9, indicating a strong association!
Create the linear model to predict QBR using points added and
display the results using get_regression_table() or
summary(). Add the best fitting line to gg_qbr.
qbr_lm <- 
  lm(formula = qbr ~ pts_added,
     data = qbr_df)
get_regression_table(qbr_lm)
## # A tibble: 2 × 7
##   term      estimate std_error statistic p_value lower_ci upper_ci
##   <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
## 1 intercept    50.4      0.137      367.       0    50.1     50.6 
## 2 pts_added     6.08     0.036      170.       0     6.01     6.15
# Adding the best fitting line to the scatterplot
gg_qbr +
  geom_smooth(
    method = "lm",
    se = F,
    formula = y ~ x
  )
Interpret the slope in context of the data:
For every additional point added above average, the QBR is predicted to increase by 6.08
Interpret the intercept in context of the data:
If the points added above average is 0 (AKA, the average), the predicted QBR is 50.4
Use the model you created to predict the QBR for the qbr23 and create a set that has the actual QBR and predicted QBR for the 2023 games. Display the first 10 rows.
qbr_pred <- 
  data.frame(
    qbr = qbr23$qbr,
    qbr_hat = predict(object = qbr_lm, newdata = qbr23)
  )
head(qbr_pred, n = 10)
##                      qbr  qbr_hat
## 2023_1_B.Purdy      93.4 84.42034
## 2023_1_T.Tagovailoa 85.9 90.50407
## 2023_1_J.Love       79.6 64.34402
## 2023_1_D.Prescott   75.6 60.69379
## 2023_1_J.Goff       68.3 57.65192
## 2023_1_T.Lawrence   66.2 59.47704
## 2023_1_P.Mahomes    66.2 74.68637
## 2023_1_D.Watson     61.7 63.12728
## 2023_1_B.Mayfield   58.3 57.04355
## 2023_1_D.Carr       56.5 61.30216
Create a graph that shows the predicted QBR on the x-axis and
the actual QBR on the y-axis (this type of graph is called an R-squared
plot). Add geom_smooth() to draw the line through the
graph
ggplot(
  data = qbr_pred,
  mapping = aes(
    x = qbr_hat,
    y = qbr
  )
) + 
  geom_point() + 
  geom_smooth(
    method = "lm",
    se = F,
    formula = y ~ x
  ) + 
  
  labs(
    x = "Predicted QBR",
    y = "Actual QBR",
    title = "The predicted and actual QBR for the 2023 season"
  )
Create the residual plot for your linear model in the code chunk below
get_regression_points(qbr_lm) |> 
  ggplot(
    mapping = aes(
      x = pts_added,
      y = residual
    )
  ) +
  
  # Adding the points and making them somewhat see thru
  geom_point(alpha = 0.5) +
  
  # Adding a horizontal line at 0
  geom_hline(
    mapping = aes(yintercept = mean(residual)),
    color = "red",
    linewidth = 1
  ) +
  
  labs(
    x = "Expected Points Added above Average",
    y = "Residuals",
    title = "Residual plot for QBR Model"
  )
Is the linear model appropriate to predict QBR from points added?
No, there is a noticable curve in the residual plot
Calculate the two fit statistics for the linear model. How well does the model make predictions? Use the fit statistics to justify your answer
get_regression_summaries(qbr_lm)
## # A tibble: 1 × 9
##   r_squared adj_r_squared   mse  rmse sigma statistic p_value    df  nobs
##       <dbl>         <dbl> <dbl> <dbl> <dbl>     <dbl>   <dbl> <dbl> <dbl>
## 1     0.887         0.887  67.0  8.19  8.19    28950.       0     1  3682
The \(R^2\) value is high (almost 0.9) and the rmse is about 8.2, indicating that the typical prediction error is about 8.2 points off the actual QBR.