This data set covers the 2008 NFL season. More specifically, it records factors that go into NFL field goal attempts. The variables we use include kickteam, name, distance, timerem, defscore, and GOOD.
Kicking team (kickteam) - Name of the kicking team (categorical)
Name (name) - Name of the kicker
Distance (distance) - How far the ball is from the goal, in yards
Time Remaining (timerem) - How much time is left on the game clock, in seconds
Defensive Score (defscore) - The score of the opposing team
GOOD - Whether the field goal is made: 1 for a make and 0 for a miss
Most fans assume that the longer the distance, the less likely a field goal is to be made. Our question for this analysis is whether that assumption holds in the data. We explore the association between a made field goal and distance.
library(dplyr)  # provides %>% and select()
fieldgoals <- read.csv("https://raw.githubusercontent.com/TylerBattaglini/STA-321/refs/heads/main/nfl2008_fga.csv", header = TRUE)
# Drop rows with any missing value, then drop the columns we do not model
clean_fieldgoals <- na.omit(fieldgoals)
clean_fieldgoals <- clean_fieldgoals %>% select(-GameDate, -AwayTeam, -HomeTeam, -qtr, -min, -sec, -def, -down, -togo, -kicker, -ydline, -homekick, -offscore, -season, -Missed, -Blocked)
head(clean_fieldgoals)
kickteam name distance kickdiff timerem defscore GOOD
1 IND A.Vinatieri 30 -3 2822 3 1
2 IND A.Vinatieri 46 0 3287 0 1
3 IND A.Vinatieri 28 7 2720 0 1
4 IND A.Vinatieri 37 14 2742 0 1
5 IND A.Vinatieri 39 0 3056 0 1
6 IND A.Vinatieri 40 -3 3043 3 1
We remove any observation with a missing value. We also remove many variables because of a high likelihood of multicollinearity. We already have a variable for time remaining (timerem), so we drop the other time-related variables. GOOD already records a make, so the variables for a miss or a block would only repeat information already in our data. Most of the other dropped variables are categorical identifiers for the kicker or the teams, which kickteam and name already cover.
library(psych)
pairs.panels(clean_fieldgoals[,-9],
             method = "pearson",
             hist.col = "#00AFBB",
             density = TRUE,
             ellipses = TRUE
             )
All of our numeric predictor variables have unimodal distributions except for defscore.
par(mfrow=c(1,2))
hist(clean_fieldgoals$defscore, xlab="defscore", main = "")
Based on the histogram above, we discretize defscore into four groups.
defscore = clean_fieldgoals$defscore
grp.defscore = defscore
# The first group includes a score of 0, so it is labeled "0-10"
grp.defscore[defscore %in% c(0:10)] = "0-10"
grp.defscore[defscore %in% c(11:18)] = "11-18"
grp.defscore[defscore %in% c(19:26)] = "19-26"
grp.defscore[defscore %in% c(27:99)] = "27+"
clean_fieldgoals$grp.defscore = grp.defscore
head(grp.defscore)
[1] "0-10" "0-10" "0-10" "0-10" "0-10" "0-10"
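The same grouping can also be sketched with `cut()`, which returns a factor with an explicit level order, so the reference category does not depend on how the labels sort. This is only an illustration on made-up scores, not the code used for the results in this report; the break points are taken from the ranges above.

```r
# Toy defensive scores (made up for illustration; not from the data set)
defscore <- c(0, 3, 10, 11, 18, 19, 26, 27, 45)

# right = TRUE (the default) closes each interval on the right:
# (-Inf,10], (10,18], (18,26], (26,Inf]
grp.defscore <- cut(defscore,
                    breaks = c(-Inf, 10, 18, 26, Inf),
                    labels = c("0-10", "11-18", "19-26", "27+"))

# Count how many scores fall in each group
table(grp.defscore)

# The first level is the reference category when the factor is used in glm()
levels(grp.defscore)[1]
```

Unlike the `%in%` approach, `cut()` also assigns non-integer scores or scores above 99 to a group instead of leaving them unlabeled.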
The pairwise plot shows some correlation between kickdiff and defscore and between kickdiff and timerem.
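The correlations read off the pairwise plot can also be checked numerically. The sketch below uses simulated data rather than the real file. The `offscore` column, the 0.5 cutoff, and the definition of kickdiff as the kicking team's score minus the opponent's are assumptions made for illustration (that definition is consistent with the rows printed by `head()` above).

```r
# Toy stand-in for clean_fieldgoals; all values are simulated
set.seed(321)
n <- 200
toy <- data.frame(
  distance = sample(18:55, n, replace = TRUE),
  timerem  = sample(0:3600, n, replace = TRUE),
  defscore = rpois(n, lambda = 10),
  offscore = rpois(n, lambda = 10)   # hypothetical helper column
)
# Assumption: kickdiff is the kicking team's score minus the opponent's
toy$kickdiff <- toy$offscore - toy$defscore

# Pearson correlation matrix of the numeric predictors
num_vars <- c("distance", "kickdiff", "timerem", "defscore")
cor_mat  <- cor(toy[, num_vars], method = "pearson")
round(cor_mat, 2)

# List pairs whose absolute correlation exceeds an arbitrary 0.5 cutoff
high <- which(abs(cor_mat) > 0.5 & upper.tri(cor_mat), arr.ind = TRUE)
data.frame(var1 = rownames(cor_mat)[high[, "row"]],
           var2 = colnames(cor_mat)[high[, "col"]])
```

On the real data the same two calls, `cor()` and `which()`, would be applied to the numeric columns of clean_fieldgoals.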
In our smallest and final model we want to keep distance, the score differential (kickdiff), and time remaining (timerem), together with the grouped defensive score. Distance is widely expected to be a strong indicator of whether a field goal is good, and we expect the score differential and time remaining to matter as well.
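Written out, the smallest model described here is a logistic regression on the log-odds scale. The symbols $\beta_j$ and $\gamma_g$ are notation introduced for this sketch and do not appear in the original code; $\gamma_g$ is the effect of defensive score group $g$, with the lowest group as the reference.

```latex
\[
\log\frac{P(\text{GOOD}=1)}{1-P(\text{GOOD}=1)}
  = \beta_0 + \beta_1\,\text{distance} + \beta_2\,\text{timerem}
  + \beta_3\,\text{kickdiff} + \gamma_{g},
\qquad \gamma_{\text{reference}} = 0 .
\]
```

A negative $\beta_1$ means the odds of a make fall as distance increases, which is the question this analysis asks.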
library(knitr)  # provides kable()
full.model = glm(GOOD ~ grp.defscore + kickteam + name + distance + kickdiff + timerem,
                 family = binomial(link = "logit"),
                 data = clean_fieldgoals)
kable(summary(full.model)$coef,
      caption="Summary of inferential statistics of the full model")
Term | Estimate | Std. Error | z value | Pr(>|z|) |
---|---|---|---|---|
(Intercept) | 7.2558776 | 0.9592165 | 7.5643791 | 0.0000000 |
grp.defscore11-18 | 0.3900319 | 0.3310042 | 1.1783290 | 0.2386655 |
grp.defscore19-26 | 1.0910882 | 0.5275585 | 2.0681844 | 0.0386227 |
grp.defscore27+ | -0.1364864 | 0.5812398 | -0.2348194 | 0.8143489 |
kickteamATL | 0.2156559 | 0.9976163 | 0.2161711 | 0.8288544 |
kickteamBAL | -0.5216642 | 1.5651327 | -0.3333035 | 0.7389052 |
kickteamBUF | -0.7825280 | 0.7983419 | -0.9801915 | 0.3269916 |
kickteamCAR | 0.1170202 | 0.9133814 | 0.1281176 | 0.8980559 |
kickteamCHI | 0.6797891 | 1.2205735 | 0.5569424 | 0.5775668 |
kickteamCIN | -0.5650050 | 0.9285361 | -0.6084901 | 0.5428625 |
kickteamCLE | -0.6753615 | 0.8309901 | -0.8127191 | 0.4163791 |
kickteamDAL | 0.3877854 | 1.0088755 | 0.3843739 | 0.7007013 |
kickteamDEN | -1.1697085 | 0.7938203 | -1.4735180 | 0.1406114 |
kickteamDET | 15.8239761 | 757.6724102 | 0.0208850 | 0.9833374 |
kickteamGB | -0.3063564 | 0.8779672 | -0.3489383 | 0.7271357 |
kickteamHOU | -0.0434635 | 0.9242212 | -0.0470271 | 0.9624916 |
kickteamIND | -0.6654705 | 0.8746319 | -0.7608578 | 0.4467420 |
kickteamJAC | -0.7257592 | 0.8553650 | -0.8484790 | 0.3961713 |
kickteamKC | -2.2742405 | 0.9940930 | -2.2877543 | 0.0221518 |
kickteamMIA | -0.0259070 | 0.9163971 | -0.0282705 | 0.9774464 |
kickteamMIN | 0.3136299 | 0.9160708 | 0.3423642 | 0.7320768 |
kickteamNE | -0.4881056 | 0.8596060 | -0.5678248 | 0.5701540 |
kickteamNO | -1.9313398 | 1.4334513 | -1.3473355 | 0.1778722 |
kickteamNYG | 12.6228135 | 3956.1804062 | 0.0031907 | 0.9974542 |
kickteamNYJ | -21.0269475 | 3956.1803921 | -0.0053150 | 0.9957593 |
kickteamOAK | -0.5166534 | 0.8982161 | -0.5751995 | 0.5651564 |
kickteamPHI | -0.2582728 | 0.8231102 | -0.3137767 | 0.7536907 |
kickteamPIT | -0.3420525 | 0.8700066 | -0.3931608 | 0.6942007 |
kickteamSD | -0.2351465 | 0.9250057 | -0.2542108 | 0.7993327 |
kickteamSEA | -0.2348942 | 0.9248464 | -0.2539819 | 0.7995096 |
kickteamSF | -0.3368254 | 0.8686680 | -0.3877493 | 0.6982016 |
kickteamSTL | 0.0564636 | 0.8434508 | 0.0669435 | 0.9466267 |
kickteamTB | -0.8694446 | 0.8198702 | -1.0604662 | 0.2889326 |
kickteamTEN | -0.2234026 | 0.8285824 | -0.2696203 | 0.7874524 |
kickteamWAS | -1.2528242 | 0.7780121 | -1.6102888 | 0.1073348 |
nameC.Barth | 0.6544918 | 1.1324063 | 0.5779655 | 0.5632874 |
nameD.Rayner | 13.8651583 | 3956.1803931 | 0.0035047 | 0.9972037 |
nameG.Hartley | 16.4788689 | 988.0657044 | 0.0166779 | 0.9866936 |
nameJ.Carney | -12.6449218 | 3956.1803983 | -0.0031962 | 0.9974498 |
nameJ.Feely | 20.1237309 | 3956.1803830 | 0.0050867 | 0.9959415 |
nameM.Gramatica | 0.9339553 | 1.4794047 | 0.6313048 | 0.5278412 |
nameM.Stover | -0.2247927 | 1.5178776 | -0.1480967 | 0.8822665 |
distance | -0.1310321 | 0.0137846 | -9.5056979 | 0.0000000 |
kickdiff | 0.0094157 | 0.0135669 | 0.6940249 | 0.4876666 |
timerem | 0.0001139 | 0.0001443 | 0.7897314 | 0.4296846 |
reduced.model = glm(GOOD ~ distance + timerem + kickdiff + grp.defscore,
                    family = binomial(link = "logit"),
                    data = clean_fieldgoals)
kable(summary(reduced.model)$coef,
      caption="Summary of inferential statistics of the reduced model")
Term | Estimate | Std. Error | z value | Pr(>|z|) |
---|---|---|---|---|
(Intercept) | 6.4367599 | 0.6185545 | 10.4061313 | 0.0000000 |
distance | -0.1215100 | 0.0125796 | -9.6592898 | 0.0000000 |
timerem | 0.0001100 | 0.0001338 | 0.8218077 | 0.4111863 |
kickdiff | 0.0099186 | 0.0123044 | 0.8061065 | 0.4201815 |
grp.defscore11-18 | 0.2987973 | 0.3073211 | 0.9722642 | 0.3309191 |
grp.defscore19-26 | 1.2527248 | 0.4951909 | 2.5297816 | 0.0114134 |
grp.defscore27+ | -0.0516108 | 0.5375298 | -0.0960148 | 0.9235088 |
library(MASS)
final.model.forward = stepAIC(reduced.model,
                              scope = list(lower=formula(reduced.model),upper=formula(full.model)),
                              direction = "forward",
                              trace = 0
                              )
kable(summary(final.model.forward)$coef,
      caption="Summary of inferential statistics of the final model")
Term | Estimate | Std. Error | z value | Pr(>|z|) |
---|---|---|---|---|
(Intercept) | 6.4367599 | 0.6185545 | 10.4061313 | 0.0000000 |
distance | -0.1215100 | 0.0125796 | -9.6592898 | 0.0000000 |
timerem | 0.0001100 | 0.0001338 | 0.8218077 | 0.4111863 |
kickdiff | 0.0099186 | 0.0123044 | 0.8061065 | 0.4201815 |
grp.defscore11-18 | 0.2987973 | 0.3073211 | 0.9722642 | 0.3309191 |
grp.defscore19-26 | 1.2527248 | 0.4951909 | 2.5297816 | 0.0114134 |
grp.defscore27+ | -0.0516108 | 0.5375298 | -0.0960148 | 0.9235088 |
# Collect the residual deviance, null deviance, and AIC of a fitted glm
global.measure=function(s.logit){
  dev.resid = s.logit$deviance          # residual deviance of the fitted model
  dev.0.resid = s.logit$null.deviance   # deviance of the intercept-only model
  aic = s.logit$aic
  goodness = cbind(Deviance.residual =dev.resid, Null.Deviance.Residual = dev.0.resid,
                   AIC = aic)
  goodness
}
# One row per candidate model
goodness=rbind(full.model = global.measure(full.model),
               reduced.model=global.measure(reduced.model),
               final.model=global.measure(final.model.forward))
row.names(goodness) = c("full.model", "reduced.model", "final.model")
kable(goodness, caption ="Comparison of global goodness-of-fit statistics")
Model | Deviance.residual | Null.Deviance.Residual | AIC |
---|---|---|---|
full.model | 630.8280 | 809.6515 | 720.8280 |
reduced.model | 676.9963 | 809.6515 | 690.9963 |
final.model | 676.9963 | 809.6515 | 690.9963 |
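In our exploratory analysis, we saw that kickdiff was correlated with both defscore and timerem. Forward selection starting from the reduced model did not add kickteam or name, so the final model is the same as the reduced model. Although some of the defscore groups, the score differential (kickdiff), and time remaining are not statistically significant, we still keep them in our model because they are important to our study.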
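The goodness-of-fit table shows why the larger model loses on AIC even though its deviance is lower. For 0/1 responses the residual deviance equals minus twice the log-likelihood, so AIC is the deviance plus twice the number of estimated coefficients. The sketch below redoes that arithmetic from the reported numbers; the coefficient counts are the row counts of the coefficient tables above.

```r
# Residual deviances as reported in the goodness-of-fit table
deviance_full    <- 630.8280
deviance_reduced <- 676.9963

# Number of estimated coefficients (rows in each coefficient table)
k_full    <- 45  # intercept + 3 defscore + 31 kickteam + 7 name + 3 numeric
k_reduced <- 7   # intercept + 3 numeric + 3 defscore

# AIC = deviance + 2 * k for ungrouped binary data
aic_full    <- deviance_full + 2 * k_full
aic_reduced <- deviance_reduced + 2 * k_reduced

c(full = aic_full, reduced = aic_reduced)
```

The 38 extra coefficients in the full model reduce the deviance by about 46 but add 76 to the penalty, so the reduced model has the smaller AIC.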
model.coef.stats = summary(final.model.forward)$coef
odds.ratio = exp(coef(final.model.forward))
out.stats = cbind(model.coef.stats, odds.ratio = odds.ratio)
kable(out.stats,caption = "Summary Stats with Odds Ratios")
Term | Estimate | Std. Error | z value | Pr(>|z|) | odds.ratio |
---|---|---|---|---|---|
(Intercept) | 6.4367599 | 0.6185545 | 10.4061313 | 0.0000000 | 624.3804346 |
distance | -0.1215100 | 0.0125796 | -9.6592898 | 0.0000000 | 0.8855822 |
timerem | 0.0001100 | 0.0001338 | 0.8218077 | 0.4111863 | 1.0001100 |
kickdiff | 0.0099186 | 0.0123044 | 0.8061065 | 0.4201815 | 1.0099680 |
grp.defscore11-18 | 0.2987973 | 0.3073211 | 0.9722642 | 0.3309191 | 1.3482363 |
grp.defscore19-26 | 1.2527248 | 0.4951909 | 2.5297816 | 0.0114134 | 3.4998666 |
grp.defscore27+ | -0.0516108 | 0.5375298 | -0.0960148 | 0.9235088 | 0.9496984 |
The grp.defscore variable has 4 categories, with a defensive score of 0-10 as the baseline. From the table above, the odds ratio for a defensive score of 19-26 is about 3.5: with all other variables held at the same level, the odds of a made field goal are about 3.5 times the odds in the 0-10 baseline group. For a defensive score of 27+ the odds ratio is about 0.95, so the odds are about 5% lower than in the baseline, although this difference is not statistically significant. For distance, the odds ratio is about 0.886, so each additional yard lowers the odds of a make by about 11%.
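To make the odds ratios more concrete, the reported coefficients can be turned into predicted probabilities. The sketch below copies the intercept and distance coefficient from the table above and evaluates a baseline kick in the lowest defensive score group. Setting timerem and kickdiff to 0 is an assumption made only to keep the example short, and the four distances are arbitrary.

```r
# Coefficients copied from the final-model table
b0         <- 6.4367599
b_distance <- -0.1215100

# Odds ratio for one extra yard and for ten extra yards
or_1yd  <- exp(b_distance)
or_10yd <- exp(10 * b_distance)

# Predicted probability of a make for a baseline kick:
# lowest defscore group, timerem = 0 and kickdiff = 0 (illustrative only)
distance <- c(20, 30, 40, 50)
p_good   <- plogis(b0 + b_distance * distance)

round(c(or_1yd = or_1yd, or_10yd = or_10yd), 3)
round(setNames(p_good, distance), 3)
```

Because the model is linear on the log-odds scale, the odds ratio per yard is constant, while the drop in probability per yard depends on where the kick starts.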
This analysis focused on the association between a made field goal and the factors that go into it. After exploratory analysis we regrouped the defensive score variable into categories and defined dummy variables.
Because we expect distance, the score differential (kickdiff), and defensive score to influence whether a field goal is made, we keep these factors in the model even where they are not statistically significant. After variable selection, the significant terms are distance and the defensive score 19-26 group; kickdiff, time remaining, and the remaining defensive score dummy variables are not significant.
On our main question, the data agree with the common assumption: longer field goals are significantly less likely to be made. From previous football knowledge, a majority of games are decided with scores in the 20-28 range, which might suggest that kickers make the field goal in high-leverage situations. However, our analysis also shows that the odds of a make go down when the defensive score is 27+, although that group most likely has a smaller sample, which would make the estimate less reliable. We therefore cannot draw firm conclusions about the effect of defensive score and will need a deeper analysis.
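One simple first step for that deeper look is to count attempts and makes in each defensive score group, since a small group gives an unstable estimate. The sketch below uses a made-up ten-row data frame in place of clean_fieldgoals, so the counts shown are not results from this study.

```r
# Toy stand-in for clean_fieldgoals (made-up values)
toy <- data.frame(
  grp.defscore = c("0-10", "0-10", "0-10", "0-10", "11-18",
                   "11-18", "11-18", "19-26", "19-26", "27+"),
  GOOD = c(1, 1, 0, 1, 1, 0, 1, 1, 1, 0)
)

# Rows: defensive score group; columns: 0 = miss, 1 = make
counts <- table(toy$grp.defscore, toy$GOOD)
counts

# Attempts per group and the observed make rate in each group
n_per_group <- rowSums(counts)
make_rate   <- tapply(toy$GOOD, toy$grp.defscore, mean)
cbind(n = n_per_group, make_rate = round(make_rate, 2))
```

Groups with very few attempts, or with no misses at all, are the ones whose coefficients should be read with the most caution.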