shot_logs <- read.csv("C:/Users/13177/OneDrive/Stats for Data Science/filtered_shot_logs.csv")
head(shot_logs)
##    GAME_ID                  MATCHUP LOCATION W.L FINAL_MARGIN SHOT_NUMBER
## 1 21400899 MAR 04, 2015 - CHA @ BKN        A   W           24           1
## 2 21400899 MAR 04, 2015 - CHA @ BKN        A   W           24           2
## 3 21400899 MAR 04, 2015 - CHA @ BKN        A   W           24           3
## 4 21400899 MAR 04, 2015 - CHA @ BKN        A   W           24           4
## 5 21400899 MAR 04, 2015 - CHA @ BKN        A   W           24           5
## 6 21400899 MAR 04, 2015 - CHA @ BKN        A   W           24           6
##   PERIOD GAME_CLOCK SHOT_CLOCK DRIBBLES TOUCH_TIME SHOT_DIST PTS_TYPE
## 1      1       1:09       10.8        2        1.9       7.7        2
## 2      1       0:14        3.4        0        0.8      28.2        3
## 3      1       0:00         NA        3        2.7      10.1        2
## 4      2      11:47       10.3        2        1.9      17.2        2
## 5      2      10:34       10.9        2        2.7       3.7        2
## 6      2       8:15        9.1        2        4.4      18.4        2
##   SHOT_RESULT  CLOSEST_DEFENDER CLOSEST_DEFENDER_PLAYER_ID CLOSE_DEF_DIST FGM
## 1        made    Anderson, Alan                     101187            1.3   1
## 2      missed Bogdanovic, Bojan                     202711            6.1   0
## 3      missed Bogdanovic, Bojan                     202711            0.9   0
## 4      missed     Brown, Markel                     203900            3.4   0
## 5      missed   Young, Thaddeus                     201152            1.1   0
## 6      missed   Williams, Deron                     101114            2.6   0
##   PTS   player_name player_id
## 1   2 brian roberts    203148
## 2   0 brian roberts    203148
## 3   0 brian roberts    203148
## 4   0 brian roberts    203148
## 5   0 brian roberts    203148
## 6   0 brian roberts    203148
For this data dive, we build a logistic regression model to predict a binary outcome from the NBA Shot Logs dataset. We then interpret the coefficients and calculate a confidence interval for one of them.
The response variable is the FGM (Field Goal Made) column:
It is binary: 1 = shot made, 0 = shot missed.
It is a core performance metric in basketball.
library(dplyr)  # provides mutate()
shot_logs <- shot_logs |> mutate(FGM = as.factor(FGM))
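One detail worth checking here: when the response is a factor, glm() with a binomial family treats the first level as failure and all other levels as success. With levels "0" and "1", the model therefore estimates the probability that FGM = 1. The sketch below uses simulated data (the data frame `toy` and its columns are hypothetical, not part of the shot logs) to show that the factor coding and the numeric 0/1 coding give the same fit.

```r
set.seed(1)
# Hypothetical toy data: one predictor and a 0/1 outcome
toy <- data.frame(x = rnorm(500))
toy$y <- rbinom(500, size = 1, prob = plogis(0.5 - 1.2 * toy$x))
toy$y_factor <- as.factor(toy$y)

levels(toy$y_factor)  # the first level is treated as "failure"

fit_numeric <- glm(y ~ x, data = toy, family = binomial(link = "logit"))
fit_factor  <- glm(y_factor ~ x, data = toy, family = binomial(link = "logit"))

# Both codings model P(y = 1), so the coefficients agree
all.equal(coef(fit_numeric), coef(fit_factor))
```

If the factor levels were ordered the other way, every coefficient would flip sign, so a quick levels() check before fitting can save confusion later.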
We’ll use the following explanatory variables:
SHOT_DIST: Shot distance in feet
CLOSE_DEF_DIST: Distance to the closest defender in feet
SHOT_CLOCK: Seconds left on the shot clock
DRIBBLES: Number of dribbles before the shot
Each of these could plausibly affect whether a shot is made.
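The head() output shows an NA in SHOT_CLOCK, and glm() drops any row with a missing value in a model variable (the default na.action is na.omit). Below is a small sketch of how one might count such rows before fitting. It uses a hypothetical toy data frame with made-up column names rather than the real shot logs.

```r
set.seed(2)
# Hypothetical toy data with the same kind of gap as SHOT_CLOCK
toy <- data.frame(
  made       = rbinom(300, size = 1, prob = 0.45),
  shot_dist  = runif(300, min = 0, max = 30),
  shot_clock = runif(300, min = 0, max = 24)
)
toy$shot_clock[sample(300, 20)] <- NA  # blank out 20 values

# Missing values per column
colSums(is.na(toy))

# Rows with no missing values, i.e. the rows glm() can use
sum(complete.cases(toy))

fit <- glm(made ~ shot_dist + shot_clock, data = toy, family = binomial)
nobs(fit)  # matches the complete-case count
```

Running the same colSums(is.na(...)) check on the four predictors in shot_logs would show which column accounts for the rows dropped from the model.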
logit_model <- glm(FGM ~ SHOT_DIST + CLOSE_DEF_DIST + SHOT_CLOCK + DRIBBLES,
                   data = shot_logs,
                   family = binomial(link = "logit"))
summary(logit_model)
##
## Call:
## glm(formula = FGM ~ SHOT_DIST + CLOSE_DEF_DIST + SHOT_CLOCK +
## DRIBBLES, family = binomial(link = "logit"), data = shot_logs)
##
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     0.0129298  0.0192101   0.673    0.501    
## SHOT_DIST      -0.0600867  0.0008597 -69.894   <2e-16 ***
## CLOSE_DEF_DIST  0.1044951  0.0028132  37.144   <2e-16 ***
## SHOT_CLOCK      0.0178157  0.0010562  16.868   <2e-16 ***
## DRIBBLES       -0.0198401  0.0017884 -11.094   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 168479 on 122199 degrees of freedom
## Residual deviance: 161997 on 122195 degrees of freedom
## (5553 observations deleted due to missingness)
## AIC: 162007
##
## Number of Fisher Scoring iterations: 4
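The deviance lines in this output also say something about overall fit. As a rough sketch, the numbers can be copied by hand into a likelihood-ratio test of the four predictors against the intercept-only model, and into McFadden's pseudo R-squared. The variable names below are illustrative; only the four numeric values come from the summary.

```r
# Deviances and degrees of freedom copied from the summary() output
null_deviance     <- 168479
residual_deviance <- 161997
df_null           <- 122199
df_residual       <- 122195

# Likelihood-ratio test of the four predictors against the intercept-only model
lr_stat <- null_deviance - residual_deviance
lr_df   <- df_null - df_residual
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# McFadden's pseudo R-squared (valid here because the response is 0/1,
# so each deviance equals -2 times the log-likelihood)
pseudo_r2 <- 1 - residual_deviance / null_deviance

c(lr_stat = lr_stat, lr_df = lr_df, p_value = p_value, pseudo_r2 = pseudo_r2)
```

The predictors are jointly highly significant, but the pseudo R-squared is only about 0.038, so most of the shot-to-shot variation is left unexplained.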
Coefficient Interpretations (in log-odds):
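(Intercept) = 0.0129 (not statistically significant, p = 0.501): when all predictors are 0, the log-odds of making the shot are ~0.013, which is statistically indistinguishable from 0 (even odds).
That baseline describes a shot from 0 feet with the defender 0 feet away, no time on the shot clock, and no dribbles. It is not a realistic scenario, so we don’t interpret the intercept on its own here.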
SHOT_DIST = -0.0601 (significant, p < 2e-16): for every additional foot away from the basket, the log-odds of making the shot decrease by ~0.0601, holding the other predictors constant.
In terms of odds, this means:
exp(-0.0601)
## [1] 0.9416704
Each additional foot decreases the odds of making the shot by ~5.8%.
CLOSE_DEF_DIST = +0.1045 (significant, p < 2e-16): for each extra foot of space from the closest defender, the log-odds of making the shot increase by ~0.1045, holding the other predictors constant.
In odds:
exp(0.1045)
## [1] 1.110155
Each foot of defender space increases the odds of making the shot by ~11%.
SHOT_CLOCK = +0.0178 (significant, p < 2e-16): for each additional second left on the shot clock, the log-odds of making the shot increase by ~0.0178, holding the other predictors constant.
In odds:
exp(0.0178)
## [1] 1.017959
Each additional second increases the odds of making the shot by ~1.8%. More time left on the clock is associated with better shot quality; less time left, with worse.
DRIBBLES = -0.0198 (significant, p < 2e-16): for each extra dribble before the shot, the log-odds of making it decrease by ~0.0198, holding the other predictors constant.
In odds:
exp(-0.0198)
## [1] 0.9803947
Each dribble reduces the odds of making the shot by ~2%, possibly because shots taken off the dribble tend to be tougher and more contested.
exp(coef(logit_model))
##    (Intercept)      SHOT_DIST CLOSE_DEF_DIST     SHOT_CLOCK       DRIBBLES 
##      1.0130138      0.9416829      1.1101499      1.0179753      0.9803554 
Each odds ratio tells us how the odds of making a shot change with a one-unit increase in that predictor.
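Because the model is linear on the log-odds scale, the odds ratio for a change of k units is exp(k * coefficient), so per-unit effects compound multiplicatively instead of adding up. A minimal sketch follows, with the coefficients typed in by hand from the summary. The helper `odds_ratio()` is a hypothetical name, not part of any package.

```r
# Coefficients (log-odds scale) copied from the summary() output
beta <- c(SHOT_DIST = -0.0600867, CLOSE_DEF_DIST = 0.1044951,
          SHOT_CLOCK = 0.0178157, DRIBBLES = -0.0198401)

# Odds ratio for a k-unit increase is exp(k * beta), not k * exp(beta)
odds_ratio <- function(coef_name, k = 1) {
  unname(exp(k * beta[coef_name]))
}

odds_ratio("SHOT_DIST")              # one extra foot
odds_ratio("SHOT_DIST", k = 10)      # ten extra feet
odds_ratio("CLOSE_DEF_DIST", k = 3)  # three extra feet of space
```

Ten extra feet of shot distance gives an odds ratio of roughly 0.55, that is, about 45% lower odds, which is a much larger effect than the ~5.8% per-foot figure might suggest.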
What This Tells Us About Shot Success:
Longer shots = lower success (makes sense, harder to make from far).
More space = more success (less defensive pressure).
More time = better decisions = better outcomes.
More dribbles = lower odds, possibly due to tougher, rushed, or more contested shots.
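Odds ratios can be hard to picture, so it may help to turn the fitted log-odds into predicted probabilities for specific shots. The sketch below does this by hand with plogis(), using coefficients copied from the summary. The function `predict_make_prob()` and the two example shots are hypothetical illustrations.

```r
# Coefficients copied from the summary() output
beta <- c("(Intercept)" = 0.0129298, SHOT_DIST = -0.0600867,
          CLOSE_DEF_DIST = 0.1044951, SHOT_CLOCK = 0.0178157,
          DRIBBLES = -0.0198401)

# Predicted probability for one shot: inverse-logit of the linear predictor
predict_make_prob <- function(shot_dist, close_def_dist, shot_clock, dribbles) {
  eta <- beta[["(Intercept)"]] +
    beta[["SHOT_DIST"]] * shot_dist +
    beta[["CLOSE_DEF_DIST"]] * close_def_dist +
    beta[["SHOT_CLOCK"]] * shot_clock +
    beta[["DRIBBLES"]] * dribbles
  plogis(eta)
}

# Two hypothetical shots that differ only in distance
p_close <- predict_make_prob(shot_dist = 5,  close_def_dist = 4,
                             shot_clock = 12, dribbles = 2)
p_far   <- predict_make_prob(shot_dist = 25, close_def_dist = 4,
                             shot_clock = 12, dribbles = 2)
c(close = p_close, far = p_far)
```

Under these assumed inputs the predicted make probability falls from roughly 0.58 at 5 feet to roughly 0.29 at 25 feet. With the fitted object available, predict(logit_model, newdata = ..., type = "response") should return the same kind of probability without retyping coefficients.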
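# Pull the SHOT_DIST estimate and its standard error from the coefficient table
coefs <- coef(summary(logit_model))
coef_shot_dist <- coefs["SHOT_DIST", "Estimate"]
se_shot_dist <- coefs["SHOT_DIST", "Std. Error"]
# 95% Wald interval on the log-odds scale
ci_lower <- coef_shot_dist - 1.96 * se_shot_dist
ci_upper <- coef_shot_dist + 1.96 * se_shot_dist
# Exponentiate to get the interval for the odds ratio
exp(c(ci_lower, ci_upper))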
## [1] 0.9400975 0.9432709
This means:
We are 95% confident that, holding the other predictors constant, each additional foot of shot distance multiplies the odds of making the shot by between 0.9401 and 0.9433, a reduction of between ~5.7% and ~6.0%.
Here’s how we break it down:
The odds ratio for SHOT_DIST is ~0.942.
The 95% CI lies entirely below 1, so the effect is statistically significant at the 5% level and negative across the whole interval.
Specifically:
Lower bound = 0.9401, the largest plausible effect: a ~6.0% decrease in odds.
Upper bound = 0.9433, the smallest plausible effect: a ~5.7% decrease in odds.
This is a Wald interval (estimate ± 1.96 standard errors on the log-odds scale, then exponentiated). It is this narrow because the model was fit to more than 122,000 shots.
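As a cross-check on the hand calculation, base R's confint.default() computes the same kind of Wald interval directly from a fitted model. The sketch below demonstrates the equivalence on simulated data (the data frame `toy` and its columns are hypothetical stand-ins for the shot logs), using qnorm(0.975) in place of the rounded 1.96.

```r
set.seed(3)
# Hypothetical toy data standing in for the shot logs
toy <- data.frame(dist = runif(1000, min = 0, max = 30))
toy$made <- rbinom(1000, size = 1, prob = plogis(0.6 - 0.06 * toy$dist))

fit <- glm(made ~ dist, data = toy, family = binomial(link = "logit"))

# Manual Wald interval: estimate +/- z * SE on the log-odds scale
est <- coef(summary(fit))["dist", "Estimate"]
se  <- coef(summary(fit))["dist", "Std. Error"]
z   <- qnorm(0.975)  # about 1.96
manual_ci <- exp(c(est - z * se, est + z * se))

# Built-in Wald interval for the same coefficient
builtin_ci <- exp(confint.default(fit, parm = "dist", level = 0.95))

manual_ci
builtin_ci
```

On the real model, exp(confint.default(logit_model, parm = "SHOT_DIST")) should reproduce the interval above up to the rounding of 1.96. Calling confint() on a glm instead gives a profile-likelihood interval, which can differ slightly in small samples.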
In this data dive, we modeled the probability of a shot being made (FGM = 1) using a logistic regression model with four explanatory variables:
SHOT_DIST (distance of the shot in feet)
CLOSE_DEF_DIST (distance to the nearest defender)
SHOT_CLOCK (seconds left on the shot clock)
DRIBBLES (number of dribbles before the shot)
Key Findings:
SHOT_DIST had a significant negative effect on shot success.
Odds ratio: 0.942
95% Confidence Interval: [0.9401, 0.9433]
Interpretation: Each additional foot away from the basket reduces the odds of making the shot by ~5.7–6.0% (95% confidence interval).
CLOSE_DEF_DIST had a positive effect on shot success.
Odds ratio: 1.110
Interpretation: Each extra foot of defender space increases the odds of making the shot by ~11%.
SHOT_CLOCK had a modest positive effect.
Odds ratio: 1.018
Interpretation: Each additional second on the shot clock slightly increases the odds of success (~1.8% per second).
DRIBBLES had a negative effect.
Odds ratio: 0.980
Interpretation: Each extra dribble reduces the odds of making the shot by ~2%, possibly reflecting more contested or complex shot situations.
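Model Insights:
All four predictors were statistically significant (p < 0.001); the intercept was not (p = 0.501).
The model was fit to 122,200 shots; 5,553 rows were dropped because of missing values in a model variable.
The signs of all four coefficients agree with basketball intuition.
The confidence interval for SHOT_DIST excludes 1, so the effect is statistically significant. Because the per-foot effect compounds over the range of NBA shot distances, it is also practically meaningful.
The model explains only a small share of the variation in outcomes, though: the residual deviance (161,997) is about 4% below the null deviance (168,479).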
Final Takeaway
Shot success in the NBA is clearly associated with distance to the basket, defender proximity, and shot timing. This logistic model provides a useful statistical lens for understanding these patterns, and it can serve as a starting point for exploring player-specific performance, strategic decisions, or shot quality metrics.