NBA Dataset

Loading in the data:
Filtering out unnecessary games:

Setting the Stage for the Analysis

Selecting a Binary Column of Data

For this week’s data dive, I want to investigate what contributes to a team’s ability to win games. To do this, I will focus on the “wl_home” variable, a binary “w” or “l” indicating whether the home team won or lost the game in question (the model below refers to this outcome as home_win).
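The recoding from wl_home to the home_win response used in the model is not shown here. A minimal sketch of one way it could be done, assuming wl_home holds the letters "W" and "L" in either case; the `games` data frame is a hypothetical stand-in for NBA_Data:

```r
# Hypothetical toy data; the real NBA_Data is assumed to have a wl_home column.
games <- data.frame(wl_home = c("W", "L", "W", "l"))

# Recode to 1 = home win, 0 = home loss.
# toupper() guards against mixed-case entries.
games$home_win <- as.integer(toupper(games$wl_home) == "W")

print(games)
```

A 0/1 integer response like this is one of the forms glm() accepts with family = binomial, with 1 treated as the "success" outcome.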

Identifying Contributing Variables

Potential explanatory variables include the home team’s field goal percentage (how efficiently the home team shoots), rebounding total, turnover total, and assist total. I selected these because each is an important statistic that contributes to a team’s success on the court, and they are distinct enough from one another that collinearity should be limited.

Creating the Model

log_model <- glm(
  home_win ~ fg_pct_home + reb_home + tov_home + ast_home,
  data = NBA_Data,
  family = binomial
)

summary(log_model)
## 
## Call:
## glm(formula = home_win ~ fg_pct_home + reb_home + tov_home + 
##     ast_home, family = binomial, data = NBA_Data)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -17.644267   0.202500 -87.132  < 2e-16 ***
## fg_pct_home  28.168233   0.359937  78.259  < 2e-16 ***
## reb_home      0.166227   0.002373  70.058  < 2e-16 ***
## tov_home     -0.122756   0.003321 -36.961  < 2e-16 ***
## ast_home     -0.010968   0.002994  -3.663 0.000249 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 53314  on 39801  degrees of freedom
## Residual deviance: 37681  on 39797  degrees of freedom
## AIC: 37691
## 
## Number of Fisher Scoring iterations: 5

Home win probability and its drivers

The model estimates the log‑odds of a home win using four performance variables. Each coefficient represents the change in log‑odds of winning for a one‑unit increase in the predictor, holding the others constant.
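To make the log-odds scale concrete, here is a rough sketch that turns the fitted coefficients into a win probability for a single game. The coefficients are copied from the summary output; the box score values are hypothetical and chosen only for illustration:

```r
# Coefficients copied from the summary(log_model) output.
coefs <- c(
  intercept   = -17.644267,
  fg_pct_home =  28.168233,
  reb_home    =   0.166227,
  tov_home    =  -0.122756,
  ast_home    =  -0.010968
)

# Hypothetical box score for illustration (not taken from the dataset).
game <- c(fg_pct_home = 0.47, reb_home = 43, tov_home = 14, ast_home = 24)

# Linear predictor = intercept + sum(coefficient * value), on the log-odds scale.
log_odds <- coefs[["intercept"]] + sum(coefs[names(game)] * game)

# plogis() is the inverse logit: it maps log-odds to a probability.
win_prob <- plogis(log_odds)

print(round(c(log_odds = log_odds, win_prob = win_prob), 3))
```

With the fitted model object available, predict(log_model, newdata = ..., type = "response") should give the same kind of result without copying coefficients by hand.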

Shooting percentage (fg_pct_home)

This is the dominant predictor in the model. A coefficient of 28.17 means that even small changes in shooting percentage have a large effect on win probability. Because FG% is on a 0–1 scale, a one‑percentage‑point increase (0.01) changes the log‑odds by about 0.28, which corresponds to roughly a 32% increase in the odds of winning. This aligns with basketball intuition: shooting efficiency is one of the strongest determinants of game outcomes.
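The rescaling from a full unit of FG% to a single percentage point can be checked directly. The sketch below also shows what the odds change means in probability terms, assuming (purely for illustration) a game that would otherwise be a coin flip:

```r
beta_fg <- 28.168233   # fg_pct_home coefficient from the model summary
step    <- 0.01        # one percentage point on the 0-1 FG% scale

delta_log_odds <- beta_fg * step        # change in log-odds
odds_ratio     <- exp(delta_log_odds)   # multiplicative change in the odds

# Illustrative baseline (an assumption): a 50% win probability before the change.
p_before <- 0.50
p_after  <- plogis(qlogis(p_before) + delta_log_odds)

print(round(c(delta_log_odds = delta_log_odds,
              odds_ratio     = odds_ratio,
              p_after        = p_after), 3))
```

The probability shift depends on the starting point: the same change in log-odds moves the probability less when a game is already lopsided.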

Rebounds (reb_home)

The coefficient of 0.166 indicates that each additional rebound increases the log‑odds of winning. Converting to an odds ratio, a single rebound raises the odds of winning by about 18%. Rebounding extends possessions and limits opponent opportunities, so this effect is consistent with game dynamics.

Turnovers (tov_home)

The negative coefficient (–0.123) shows that turnovers reduce win probability. Each turnover decreases the odds of winning by about 11–12%, reflecting the cost of lost possessions and transition opportunities for opponents.
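The percentage figures for rebounds and turnovers come from exponentiating the coefficients. A short sketch, using the estimates from the summary output; the five-unit margin is an arbitrary illustration:

```r
# Coefficients copied from the model summary.
beta <- c(reb_home = 0.166227, tov_home = -0.122756)

# Percent change in the odds of a home win per one additional rebound or turnover.
pct_change_per_unit <- (exp(beta) - 1) * 100

# Effects compound multiplicatively: k extra units multiply the odds by exp(k * beta).
k <- 5  # illustrative margin, chosen arbitrarily
odds_multiplier_k <- exp(k * beta)

print(round(pct_change_per_unit, 2))
print(round(odds_multiplier_k, 3))
```

Because the effects multiply rather than add, five extra rebounds more than double the odds instead of raising them by five times 18%.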

Assists (ast_home)

The coefficient is small (–0.011) and negative, which may seem counterintuitive. This often happens when assists are highly correlated with shooting percentage and field goals made. Once FG% is in the model, assists add little unique information, and the sign can flip due to multicollinearity. The effect is statistically significant but practically small: each additional assist lowers the odds of winning by only about 1%.
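The multicollinearity explanation is a hypothesis that can be checked with variance inflation factors (VIFs). The sketch below computes them by hand on simulated data, since the real predictors are not reproduced here; the `toy` data frame, its parameters, and the `vif_manual` helper are all hypothetical and exist only to show the mechanics:

```r
# Simulated stand-in data (hypothetical): assists are built to track FG%,
# mimicking the overlap described above. Swap in the real predictors to test it.
set.seed(1)
n <- 500
fg_pct <- rnorm(n, mean = 0.46, sd = 0.05)
toy <- data.frame(
  fg_pct_home = fg_pct,
  reb_home    = rnorm(n, mean = 43, sd = 6),
  tov_home    = rnorm(n, mean = 14, sd = 4),
  ast_home    = 24 + 60 * (fg_pct - 0.46) + rnorm(n, sd = 3)
)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j
# on all of the other predictors.
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    fit <- lm(reformulate(others, response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  })
}

vifs <- vif_manual(toy)
print(round(vifs, 2))
```

A VIF near 1 means a predictor shares little with the others; a common rule of thumb treats values above roughly 5 to 10 as a sign of trouble. Running the same helper on the four real predictor columns would show whether assists and FG% overlap enough to explain the negative sign.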

Confidence Interval for a Coefficient

Using the standard error for fg_pct_home, the formula for a confidence interval is:

(28.168233 - (z * 0.359937), 28.168233 + (z * 0.359937))

For a 95% confidence level (a conventional choice that balances confidence against interval width), z is approximately 1.96, and this yields approximately:

(27.46, 28.87)

Because the entire interval is positive and far from zero, the data strongly support the conclusion that higher shooting percentage increases the probability of winning. In practical terms, even the lower end of the interval implies that small improvements in shooting efficiency have a meaningful impact on game outcomes.
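The interval can be reproduced from the estimate and standard error alone. A minimal sketch, which also re-expresses the endpoints as odds ratios per percentage point of FG% (that rescaling is an added illustration, not part of the original calculation):

```r
est <- 28.168233   # fg_pct_home estimate from the model summary
se  <- 0.359937    # its standard error

z  <- qnorm(0.975)               # two-sided 95% critical value, about 1.96
ci <- est + c(-1, 1) * z * se    # Wald interval on the log-odds scale

# Same interval expressed as an odds ratio per one percentage point of FG%.
ci_or_per_point <- exp(ci * 0.01)

print(round(ci, 2))
print(round(ci_or_per_point, 3))
```

With the fitted model in hand, confint.default(log_model) returns Wald intervals of this kind for every coefficient at once.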

Basketball interpretation

The model captures the essential logic of NBA games: