The Philadelphia Eagles had a shocking season in 2017-18 when they ultimately beat the New England Patriots and won the Super Bowl on February 4, 2018. “Could anyone have predicted that?” is a question that went wild during and after the season. We will use league data from the 2016-17 season to build your model, then use that model to predict the wins for the Eagles in the 2017-18 season.
nfl2016<-read.csv("https://raw.githubusercontent.com/kvaranyak4/STAT3220/main/nfl2016.csv")
head(nfl2016)
hist(nfl2016$W, xlab="Wins", main="Histogram of Wins")
We explore the scatter plots and correlations for each explanatory variable then classify the relationships as linear, curvilinear, or none.
names(nfl2016)
[1] "Tm" "Code" "W" "OffPY" "OffRY"
[6] "PA" "FGP" "Penalty" "TurnoverDiff" "OppPY"
[11] "OppRY"
par(mfcol=c(2, 2))
for (i in names(nfl2016)[4:11]) {
plot(nfl2016[,i], nfl2016$W,xlab=i,ylab="Wins")
}
round(cor(nfl2016[4:11],nfl2016$W),3)
[,1]
OffPY 0.375
OffRY 0.151
PA -0.675
FGP 0.328
Penalty 0.198
TurnoverDiff 0.760
OppPY 0.180
OppRY -0.421
\(wins=\beta_0+\beta_1 TurnoverDiff+\beta_2 PA+\beta_3 OppRy+\beta_4 OppRy^2+\epsilon\)
nfl1<-lm(W~TurnoverDiff+PA+OppRY+I(OppRY^2),data=nfl2016)
summary(nfl1)
Call:
lm(formula = W ~ TurnoverDiff + PA + OppRY + I(OppRY^2), data = nfl2016)
Residuals:
Min 1Q Median 3Q Max
-3.2532 -1.1877 0.3448 1.1127 3.1319
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.778e+01 9.809e+00 1.813 0.0810 .
TurnoverDiff 2.082e-01 3.991e-02 5.218 1.69e-05 ***
PA -1.637e-02 7.584e-03 -2.159 0.0399 *
OppRY -2.036e-03 1.011e-02 -0.201 0.8420
I(OppRY^2) -1.057e-07 2.654e-06 -0.040 0.9685
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.759 on 27 degrees of freedom
Multiple R-squared: 0.7373, Adjusted R-squared: 0.6984
F-statistic: 18.94 on 4 and 27 DF, p-value: 1.595e-07
round(coef(nfl1),4)
(Intercept) TurnoverDiff PA OppRY I(OppRY^2)
17.7827 0.2082 -0.0164 -0.0020 0.0000
The prediction equation is:
\(\widehat{wins}=17.78+0.21TurnoverDiff-0.016PA-0.002OppRy-.0000OppRy^2\)
\(wins=\beta_0+\beta_1 TurnoverDiff+\beta_2 PA+\beta_3 OppRy+\beta_4 OppRy^2+\epsilon\)
Hypotheses:
Distribution of test statistic: F with 4, 27 DF
Test Statistic: F=18.94
Pvalue: <0.0001
Decision: 0.0001<0.05 -> REJECT H0
Conclusion: The model with Turnover Differential, Points Against, Opponents Rushing Yards, and Opponents Rushing Yards squared is adequate at predicting wins.
Refitting the model can change significance of other variables and will always change the beta estimates. Here the estimates changed only slightly because the term we removed was not significant.
nfl2<-lm(W~TurnoverDiff+PA+OppRY,data=nfl2016)
summary(nfl2)
Call:
lm(formula = W ~ TurnoverDiff + PA + OppRY, data = nfl2016)
Residuals:
Min 1Q Median 3Q Max
-3.2464 -1.1798 0.3529 1.1188 3.1307
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.159540 2.546624 7.131 9.27e-08 ***
TurnoverDiff 0.208215 0.039182 5.314 1.18e-05 ***
PA -0.016401 0.007417 -2.211 0.0353 *
OppRY -0.002436 0.001171 -2.081 0.0467 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.727 on 28 degrees of freedom
Multiple R-squared: 0.7373, Adjusted R-squared: 0.7091
F-statistic: 26.19 on 3 and 28 DF, p-value: 2.812e-08
round(coef(nfl2),4)
(Intercept) TurnoverDiff PA OppRY
18.1595 0.2082 -0.0164 -0.0024
round(confint(nfl2),4)
2.5 % 97.5 %
(Intercept) 12.9430 23.3761
TurnoverDiff 0.1280 0.2885
PA -0.0316 -0.0012
OppRY -0.0048 0.0000
\(\widehat{wins}=18.16+0.21TurnoverDiff-0.016PA-0.002OppRy\)
We see from the global F test this is significant, and OppRy is significant in the model.
We will cover this in Unit 3
nfl2<-lm(W~TurnoverDiff+PA+OppRY,data=nfl2016)
# The Eagles are observation 25
predict(nfl2)
1 2 3 4 5 6 7 8
8.524117 9.717652 10.452034 8.020043 7.580750 2.700531 9.203390 2.685731
9 10 11 12 13 14 15 16
10.927055 8.625321 7.935686 9.770712 7.436671 5.997037 4.118728 11.668834
17 18 19 20 21 22 23 24
5.362968 6.869346 11.246377 13.105708 6.127438 9.633048 3.435469 10.593994
25 26 27 28 29 30 31 32
9.955402 9.939474 5.946754 2.780175 9.955950 7.956021 8.517495 7.210088
nfl2$fitted.values[25]
25
9.955402
# Or we can create a data frame with the new values.
# the explanatory variable names must match those in the model, but there is no response variable.
new<-data.frame(TurnoverDiff=6,PA=331,OppRY=1652)
new
predict(nfl2,new, interval="prediction")
fit lwr upr
1 9.955402 6.332131 13.57867
#to compute confidence intervals for estimation we would use interval="confidence"
Now it is time to go back to our original research question- Could we have predicted the number of wins for the Eagles in the 2017 season from the 2016 season? In the 2016 season the Eagles had 3585 offensive passing yards; 1813 offensive rushing yards; 331 points against; 85.4% FPG; 212 penalty yards; 6 turnover differential; 3832 opponents’ passing yards; and 1652 opponents’ rushing yards. Note this is also observation 25 in the data.
The Eagles won 13 regular season games in 2017. How accurate was this model at predicting?