Introduction
Description
In this data set we have data from the 2008 NFL season. More
specifically, we have factors that go into NFL field goal attempts. The
variables we use include the kicking team (kickteam), the kicker (name),
distance, timerem, defscore, and GOOD.
Kicking team (kickteam) - Name of the kicking team (categorical)
Name (name) - Name of the kicker
Distance (distance) - How far the ball is from the goal, in yards
Time Remaining (timerem) - Time left on the game clock, in seconds
Defensive Score (defscore) - The score of the opposing team
GOOD - Whether the field goal is made: 1 for a make, 0 for a miss
Question
Most fans assume that the longer the distance, the less likely a field
goal is to be made. The question for this analysis is whether that
assumption holds. We explore the association between a made field goal
and distance.
Data Cleaning
library(dplyr)   # provides %>% and select()
fieldgoals <- read.csv("https://raw.githubusercontent.com/TylerBattaglini/STA-321/refs/heads/main/nfl2008_fga.csv", header = TRUE)
# Drop every row that has a missing value in any column
clean_fieldgoals <- na.omit(fieldgoals)
# Remove variables that repeat information we already keep (see below)
clean_fieldgoals <- clean_fieldgoals %>% select(-GameDate, -AwayTeam, -HomeTeam, -qtr, -min, -sec, -def, -down, -togo, -kicker, -ydline, -homekick, -offscore, -season, -Missed, -Blocked)
# GOOD is already coded 1 = make, 0 = miss, so build the indicator by comparing with 1
# (comparing with a string label such as "pos" never matches and leaves every value at 0)
clean_fieldgoals$fieldgoal.01 = ifelse(clean_fieldgoals$GOOD == 1, 1, 0)
head(clean_fieldgoals)
  kickteam        name distance kickdiff timerem defscore GOOD fieldgoal.01
1      IND A.Vinatieri       30       -3    2822        3    1            1
2      IND A.Vinatieri       46        0    3287        0    1            1
3      IND A.Vinatieri       28        7    2720        0    1            1
4      IND A.Vinatieri       37       14    2742        0    1            1
5      IND A.Vinatieri       39        0    3056        0    1            1
6      IND A.Vinatieri       40       -3    3043        3    1            1
We remove any observations with a missing value. We also drop many
variables because they are likely to introduce multicollinearity. We
already have a variable for time (timerem), so we eliminated the other
time-related variables. We also already have a variable for a make
(GOOD), so we do not need the variables for a miss or a block; they would
only repeat information already in our data. The remaining dropped
variables are categorical identifiers for the kicker or kicking team,
which other variables we keep already describe.
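The multicollinearity argument can be checked directly by looking at pairwise correlations among the numeric columns before dropping them. The following is a minimal sketch: `high_cor_pairs()` is a hypothetical helper, the 0.8 cutoff is an arbitrary illustrative choice, and the toy data frame is made up rather than taken from the field goal file.

```r
# Hypothetical helper: list pairs of numeric columns whose absolute
# Pearson correlation exceeds a threshold (0.8 is an arbitrary choice).
high_cor_pairs <- function(df, threshold = 0.8) {
  num <- df[vapply(df, is.numeric, logical(1))]   # keep numeric columns only
  cm <- cor(num, use = "pairwise.complete.obs")
  # upper.tri() keeps each pair once and skips the diagonal
  idx <- which(abs(cm) > threshold & upper.tri(cm), arr.ind = TRUE)
  data.frame(var1 = rownames(cm)[idx[, 1]],
             var2 = colnames(cm)[idx[, 2]],
             r = cm[idx])
}

# Made-up toy data: minutes_left is derived from timerem, so the two
# carry the same information and should be flagged; defscore is unrelated.
set.seed(1)
timerem <- sample(0:3600, 50)
toy <- data.frame(timerem = timerem,
                  minutes_left = timerem %/% 60,
                  defscore = sample(0:30, 50, replace = TRUE))
high_cor_pairs(toy)
```

On the real file, something like `high_cor_pairs(na.omit(fieldgoals))` would list the flagged pairs before any columns are removed.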
Data Analysis
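# Histogram of kick distance on the density scale, with a smoothed density curve
dens = density(clean_fieldgoals$distance, adjust = 2)
h = hist(clean_fieldgoals$distance, plot = FALSE)
ylimit = max(h$density, dens$y)   # tall enough for both the bars and the curve
hist(clean_fieldgoals$distance, probability = TRUE, main = "Distance", xlab = "Distance (yards)",
     ylim = c(0, ylimit), col = "azure1", border = "lightseagreen")
lines(dens, col = "blue")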

We begin with an exploratory data analysis of our predictor variable.
The histogram above shows that distance is roughly symmetric, with no
noticeable skew in either direction.
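To put a number on "no skew", one option is the sample skewness coefficient. Base R has no skewness function, so the sketch below uses a hypothetical helper, `sample_skewness()`, that computes the moment-based version; values near 0 suggest a roughly symmetric distribution. The vectors are made-up illustrations, not the field goal data.

```r
# Hypothetical helper: moment-based sample skewness,
# mean((x - mean)^3) / (mean((x - mean)^2))^(3/2).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]          # ignore missing values
  d <- x - mean(x)           # deviations from the mean
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Made-up examples: a symmetric set of distances and a right-skewed one.
sample_skewness(c(20, 30, 40, 50, 60))   # symmetric, so 0
sample_skewness(c(20, 21, 22, 23, 55))   # long right tail, so positive

# On the real data this would be:
# sample_skewness(clean_fieldgoals$distance)
```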
s.logit = glm(GOOD ~ distance,
family = binomial(link = "logit"),
data = clean_fieldgoals)
result = summary(s.logit)
result
Call:
glm(formula = GOOD ~ distance, family = binomial(link = "logit"),
data = clean_fieldgoals)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 6.7056 0.5480 12.236 <2e-16 ***
distance -0.1194 0.0124 -9.631 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 809.65 on 1036 degrees of freedom
Residual deviance: 686.12 on 1035 degrees of freedom
AIC: 690.12
Number of Fisher Scoring iterations: 6
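Reading the two estimates off the output, the fitted model can be written as follows, where $\hat p$ is the estimated probability that a kick from the given distance (in yards) is good. This is simply the `glm` fit restated as an equation.

```latex
\log\!\left(\frac{\hat p}{1-\hat p}\right) = 6.7056 - 0.1194 \times \text{distance},
\qquad
\hat p = \frac{e^{6.7056 - 0.1194 \times \text{distance}}}{1 + e^{6.7056 - 0.1194 \times \text{distance}}}
```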
library(knitr)   # provides kable()
# Coefficient table plus 95% profile-likelihood confidence intervals
model.coef.stats = summary(s.logit)$coef
conf.ci = confint(s.logit)
sum.stats = cbind(model.coef.stats, conf.ci)
kable(sum.stats, caption = "The summary stats of regression coefficients")
The summary stats of regression coefficients
               Estimate   Std. Error     z value   Pr(>|z|)        2.5 %       97.5 %
(Intercept)   6.7056029    0.5480144   12.236179    < 2e-16    5.6755150    7.8271583
distance     -0.1194428    0.0124020   -9.630942    < 2e-16   -0.1445903   -0.0958982
From our output above, we see that distance is negatively associated
with a made field goal: the estimated coefficient is -0.1194. The 95%
confidence interval, [-0.145, -0.096], lies entirely below zero, which
also supports our hypothesis that longer kicks are less likely to be made.
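`confint()` on a `glm` object returns a profile-likelihood interval. As a rough cross-check, the Wald interval (estimate plus or minus 1.96 standard errors) can be computed by hand from the estimate and standard error in the table; the two should be close when the sample is large. The numbers below are copied from the output; in a live session they could be pulled from `coef(summary(s.logit))` instead.

```r
# Estimate and standard error for distance, copied from the summary output.
beta1 <- -0.1194428
se1   <- 0.0124020

# z statistic: estimate divided by its standard error.
z1 <- beta1 / se1

# 95% Wald interval: estimate +/- z_{0.975} * SE.
wald.ci <- beta1 + c(-1, 1) * qnorm(0.975) * se1

round(z1, 3)        # about -9.631, matching the z value in the table
round(wald.ci, 4)   # close to the profile interval [-0.1446, -0.0959]
```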
model.coef.stats = summary(s.logit)$coef
# Exponentiate the coefficients to move from log-odds to odds ratios
odds.ratio = exp(coef(s.logit))
out.stats = cbind(model.coef.stats, odds.ratio = odds.ratio)
kable(out.stats, caption = "Summary Stats with Odds Ratios")
Summary Stats with Odds Ratios
               Estimate   Std. Error     z value   Pr(>|z|)    odds.ratio
(Intercept)   6.7056029    0.5480144   12.236179    < 2e-16   816.9704676
distance     -0.1194428    0.0124020   -9.630942    < 2e-16     0.8874148
Next, we convert our estimate to an odds ratio. The odds ratio
associated with distance is 0.887, meaning that each one-yard increase in
distance multiplies the odds of a made field goal by 0.887, a decrease of
about 11.3%.
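Because the model is linear on the log-odds scale, the odds ratio for a change of k yards is exp(k × β1), not k times the one-yard figure. A short sketch using the distance coefficient copied from the output (`coef(s.logit)["distance"]` would give it directly); `odds_ratio()` is a hypothetical helper name.

```r
# Distance coefficient (change in log-odds per yard), copied from the output.
beta1 <- -0.1194428

# Hypothetical helper: odds ratio for a k-yard increase in distance.
odds_ratio <- function(k) exp(k * beta1)

odds_ratio(1)                 # about 0.887: odds fall by about 11.3% per yard
odds_ratio(10)                # about 0.303: odds fall by about 70% per 10 yards
(1 - odds_ratio(10)) * 100    # percent decrease in odds over 10 yards
```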
# Grid of distances spanning the observed range
dist.range = range(clean_fieldgoals$distance)
x = seq(dist.range[1], dist.range[2], length = 200)
# Fitted linear predictor and the corresponding probability of a make
beta.x = coef(s.logit)[1] + coef(s.logit)[2]*x
success.prob = exp(beta.x)/(1+exp(beta.x))
ylimit = max(success.prob)
plot(x, success.prob, type = "l", lwd = 2, col = "navy",
     main = "The probability of \n a made field goal",
     ylim = c(0, 1.1*ylimit),
     xlab = "distance (yards)",
     ylab = "probability",
     axes = FALSE,
     col.main = "navy",
     cex.main = 0.8)
axis(1, pos = 0)
axis(2)

The graph above shows the fitted S curve from the logistic model. It
slopes downward as we expected: the estimated probability of a made
field goal decreases as distance increases.
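To read specific values off the curve, plug a distance into the fitted equation and apply the inverse logit (`plogis()` in base R). The slope of the logistic curve at a given distance is β1 × p × (1 − p), which shows how many percentage points each extra yard costs at that point. The coefficients are copied from the output, and the distances 30, 40 and 50 yards are illustrative choices.

```r
# Coefficients copied from the fitted model output.
beta0 <- 6.7056029
beta1 <- -0.1194428

# Illustrative distances in yards (chosen for the example).
dist <- c(30, 40, 50)

# Predicted make probability: inverse logit of the linear predictor.
p_hat <- plogis(beta0 + beta1 * dist)

# Slope of the S curve: change in probability per extra yard at each distance.
slope <- beta1 * p_hat * (1 - p_hat)

round(data.frame(distance = dist, p_hat = p_hat, slope = slope), 3)
```

With the fitted object in memory, `predict(s.logit, newdata = data.frame(distance = dist), type = "response")` returns the same probabilities.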
