This data set contains the results of all NFL regular season field goal attempts for the 2008 season. There are 1039 observations with 23 variables. The variables are:

1. GameDate
2. AwayTeam
3. HomeTeam
4. qtr (quarter, 5 = overtime)
5. min (minutes remaining)
6. sec (seconds remaining, added to minutes)
7. kickteam (team kicking the field goal)
8. def (defending team)
9. down
10. togo (yards to go for 1st down)
11. kicker (ID #)
12. ydline (yardline of kicking team)
13. name (kicker’s name)
14. distance (yards)
15. homekick (1 if kicker at Home, 0 if Away)
16. kickdiff (kicking team lead +, or deficit -, prior to kick)
17. timerem (time remaining in seconds, negative = overtime)
18. offscore (kicking team’s score prior to kick)
19. defscore (defense team’s score prior to kick)
20. season (2008)
21. GOOD (1 is Success, 0 is Miss)
22. Missed (-1 if missed but not blocked, 0 otherwise)
23. Blocked (1 if Blocked, 0 otherwise)

The objective of this analysis is to explore the association between the distance of the kick from the goal and whether the kick was good.
Since we only study the simple logistic regression model, only one predictor variable is included in the model. We first perform exploratory data analysis on the predictor variable to make sure the variable is not extremely skewed.
Since the simple logistic regression contains only one continuous variable or one binary categorical variable as the predictor, there is no issue of potential imbalance. We therefore do not transform distance and fit a logistic regression directly to the data.
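To make the fitting step concrete, here is a minimal sketch of a simple logistic regression fitted by Newton-Raphson using only the Python standard library. It is not the code used for this report. The function name `fit_simple_logistic` is hypothetical, and the data are simulated rather than taken from the NFL data set. The generating values 6.76 and -0.12 are assumptions chosen only so that the simulated kicks look plausible.

```python
import math
import random

def fit_simple_logistic(x, y, n_iter=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0    # entries of the information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system (information) * step = score
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1

# SYNTHETIC data (assumption): simulated kicks, not the 2008 NFL observations.
random.seed(2008)
distance = [random.randint(18, 60) for _ in range(1000)]
good = [
    1 if random.random() < 1.0 / (1.0 + math.exp(-(6.76 - 0.12 * d))) else 0
    for d in distance
]

b0_hat, b1_hat = fit_simple_logistic(distance, good)
print(round(b0_hat, 3), round(b1_hat, 3))
```

On simulated data of this kind the estimated slope should come out negative, mirroring the direction of the association examined in this report.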
| | Estimate | Std. Error | z value | Pr(>\|z\|) | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| (Intercept) | 6.7627078 | 0.5444277 | 12.42168 | 0 | 5.7399740 | 7.877643 |
| distance | -0.1208357 | 0.0122852 | -9.83590 | 0 | -0.1457751 | -0.097540 |
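As a quick check on the columns of this table, the sketch below recomputes the z value as the estimate divided by its standard error, the two-sided p-value from the standard normal distribution, and a Wald 95% interval. The helper name `wald_summary` is hypothetical. The Wald interval is an assumption made here for illustration. If the table's limits come from a profile-likelihood method, they will differ slightly from the Wald limits.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

# (estimate, standard error) copied from the coefficient table
coefficients = {
    "(Intercept)": (6.7627078, 0.5444277),
    "distance": (-0.1208357, 0.0122852),
}

def wald_summary(estimate, std_error):
    """Return (z, two-sided p-value, lower, upper) for a Wald test and 95% CI."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * P(Z > |z|)
    half_width = Z_975 * std_error
    return z, p_value, estimate - half_width, estimate + half_width

for name, (est, se) in coefficients.items():
    z, p, lo, hi = wald_summary(est, se)
    print(f"{name}: z = {z:.5f}, p = {p:.3g}, Wald 95% CI = ({lo:.4f}, {hi:.4f})")
```

The p-values are extremely small but not exactly zero. The 0 shown in the table is a rounding effect.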
From the table above, distance is negatively associated with the success of the kick: the estimated coefficient is about -0.121 on the log-odds scale, and its 95% confidence interval excludes zero. This is expected, because the farther the kick is from the goal posts, the smaller the chance of scoring.
Next, we convert the estimated regression coefficients to the odds ratio.
| | Estimate | Std. Error | z value | Pr(>\|z\|) | odds.ratio |
|---|---|---|---|---|---|
| (Intercept) | 6.7627078 | 0.5444277 | 12.42168 | 0 | 864.9812621 |
| distance | -0.1208357 | 0.0122852 | -9.83590 | 0 | 0.8861796 |
The odds ratio associated with distance is 0.886, meaning that each additional yard of distance multiplies the odds of a successful field goal by 0.886, a decrease of about 11.4%.
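The conversion from a coefficient to an odds ratio is a single exponentiation. The short sketch below reproduces it from the reported distance coefficient and extends it to a 10-yard increase. The function name `odds_ratio` is hypothetical and introduced only for illustration.

```python
import math

B_DISTANCE = -0.1208357  # estimated distance coefficient from the table

def odds_ratio(beta, delta=1.0):
    """Odds ratio for an increase of `delta` units in the predictor."""
    return math.exp(beta * delta)

or_one_yard = odds_ratio(B_DISTANCE)
pct_decrease = (1.0 - or_one_yard) * 100.0
or_ten_yards = odds_ratio(B_DISTANCE, 10)

print(f"odds ratio per yard: {or_one_yard:.4f}")
print(f"percent decrease in odds per yard: {pct_decrease:.1f}%")
print(f"odds ratio per 10 yards: {or_ten_yards:.4f}")
```

Because odds ratios multiply, a 10-yard increase corresponds to the one-yard odds ratio raised to the 10th power, which is roughly 0.30.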
Some global goodness-of-fit measures are summarized in the following table.
| Residual deviance | Null deviance | AIC |
|---|---|---|
| 686.9 | 817.7 | 690.9 |
These global goodness-of-fit measures are based on the likelihood function. Because this simple logistic regression has no other candidate models with likelihoods on the same scale to compare against, we do not interpret them.
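Although the measures are not interpreted here, the table is internally consistent, as the small sketch below illustrates. It assumes an ungrouped binary response, for which the residual deviance equals minus twice the maximized log-likelihood, so the AIC is the residual deviance plus twice the number of estimated coefficients. The function name `aic_from_deviance` is hypothetical.

```python
def aic_from_deviance(residual_deviance, n_params):
    """AIC for an ungrouped binary model: -2 * logLik + 2 * k, with deviance = -2 * logLik."""
    return residual_deviance + 2 * n_params

# Values from the goodness-of-fit table; the model has 2 coefficients
# (intercept and distance).
aic = aic_from_deviance(686.9, 2)
print(round(aic, 1))
```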
The success probability curve (so-called S curve) is given below.
The left-hand plot in the figure is the standard S curve, showing how the probability of a successful field goal decreases as the distance increases. Looking more closely at the rate of change in the probability of a successful field goal, we obtain the curve on the right-hand side. It indicates that the rate of change decreases when the distance is less than about 55 yards and increases when the distance is greater than about 55 yards. The turning point is about 55 yards.
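The two curves can be reproduced from the fitted coefficients. The sketch below evaluates the fitted success probability and its derivative with respect to distance, and locates the turning point analytically as the distance at which the fitted probability equals 0.5. The function names `success_prob` and `prob_slope` are hypothetical, and the coefficients are those reported in the tables above.

```python
import math

B0 = 6.7627078    # intercept estimate
B1 = -0.1208357   # distance coefficient estimate

def success_prob(distance):
    """Fitted probability of a successful kick at the given distance (yards)."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * distance)))

def prob_slope(distance):
    """Derivative of the fitted probability with respect to distance."""
    p = success_prob(distance)
    return B1 * p * (1.0 - p)

# The slope is most negative where p = 0.5, i.e. where B0 + B1 * d = 0.
turning_point = -B0 / B1

print(f"turning point: {turning_point:.2f} yards")
for d in (20, 40, 55, 60):
    print(f"d = {d}: p = {success_prob(d):.3f}, slope = {prob_slope(d):.4f}")
```

With these coefficients the analytic turning point is just under 56 yards, in line with the roughly 55 yards read from the plot.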