Week 10 Data Dive

For this week’s data dive I will be selecting a binary column of data and building a logistic regression model for this variable using 4 explanatory variables.

My binary column of data is the Playoffs column.

The explanatory variables are the TOV_per_100, ORB_per_100, X2p_Percent, and FTA_per_100 columns.

After building the logistic regression model, I will interpret the coefficients and explain what they mean. Then, using the standard error of the TOV_per_100 coefficient, I will build a confidence interval for that coefficient and interpret its meaning.

As always, I will also share the insights gathered, their significance, and some additional questions to explore.

Logistic Regression Model

Here is my logistic regression model built from the binary Playoffs variable and the 4 explanatory variables: TOV_per_100, ORB_per_100, X2p_Percent, and FTA_per_100.

log_model <- glm(Playoffs ~ ORB_per_100 + TOV_per_100 + FTA_per_100 + X2p_Percent,
                 data = NBA_log,
                 family = "binomial")

summary(log_model)
## 
## Call:
## glm(formula = Playoffs ~ ORB_per_100 + TOV_per_100 + FTA_per_100 + 
##     X2p_Percent, family = "binomial", data = NBA_log)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -14.24347    1.68673  -8.444  < 2e-16 ***
## ORB_per_100   0.14457    0.03992   3.621 0.000293 ***
## TOV_per_100  -0.17048    0.03865  -4.411 1.03e-05 ***
## FTA_per_100   0.17613    0.02214   7.957 1.76e-15 ***
## X2p_Percent  21.94868    2.61424   8.396  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1931.9  on 1401  degrees of freedom
## Residual deviance: 1757.5  on 1397  degrees of freedom
## AIC: 1767.5
## 
## Number of Fisher Scoring iterations: 4
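To make the fitted equation concrete, here is a minimal sketch of how the model turns the four predictors into a playoff probability: the coefficients are combined into log-odds, and the inverse logit (`plogis()`) maps that onto a 0 to 1 scale. The coefficients are copied from the summary output; the team profile is hypothetical and not taken from `NBA_log`, and the sketch assumes X2p_Percent is stored as a proportion (0.50 = 50%).

```r
# Coefficient estimates copied from the summary(log_model) output
b <- c(
  intercept   = -14.24347,
  ORB_per_100 =   0.14457,
  TOV_per_100 =  -0.17048,
  FTA_per_100 =   0.17613,
  X2p_Percent =  21.94868
)

# Hypothetical team profile (illustrative values only);
# assumes X2p_Percent is a proportion, e.g. 0.50 = 50%
team <- c(ORB_per_100 = 11, TOV_per_100 = 14, FTA_per_100 = 25, X2p_Percent = 0.50)

# Linear predictor on the log-odds scale: intercept + sum(coefficient * value)
log_odds <- b[["intercept"]] + sum(b[names(team)] * team)

# Inverse logit converts log-odds into a probability between 0 and 1
prob <- plogis(log_odds)

log_odds
prob
```

With the fitted model object itself, `predict(log_model, newdata = ..., type = "response")` performs the same calculation for any rows of new data.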

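Interpretations (each effect holds the other three variables constant; the odds ratios come from exponentiating the coefficients, as computed below):

- ORB_per_100: each additional offensive rebound per 100 possessions increases the log-odds of making the playoffs by 0.145, multiplying the odds by about 1.16 (a 15.6% increase).

- TOV_per_100: each additional turnover per 100 possessions decreases the log-odds by 0.170, multiplying the odds by about 0.84 (a 15.7% decrease).

- FTA_per_100: each additional free throw attempt per 100 possessions increases the log-odds by 0.176, multiplying the odds by about 1.19 (a 19.3% increase).

- X2p_Percent: the coefficient of 21.95 is per one-unit change. Assuming X2p_Percent is stored as a proportion, one unit is a jump from 0% to 100%, which is why its odds ratio is so large. On a more useful scale, a 1-percentage-point (0.01) increase multiplies the odds by about exp(0.2195) ≈ 1.25, roughly a 24.5% increase.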

exp(coef(log_model))
##  (Intercept)  ORB_per_100  TOV_per_100  FTA_per_100  X2p_Percent 
## 6.518376e-07 1.155540e+00 8.432635e-01 1.192588e+00 3.405571e+09
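The odds ratios above are easier to read as percent changes in the odds, and the X2p_Percent ratio only becomes meaningful after rescaling. A small sketch of both conversions, using the coefficients copied from the summary output and again assuming X2p_Percent is a proportion:

```r
# Log-odds coefficients copied from summary(log_model) (intercept omitted)
b <- c(ORB_per_100 = 0.14457, TOV_per_100 = -0.17048,
       FTA_per_100 = 0.17613, X2p_Percent = 21.94868)

# Odds ratio for a one-unit increase, and the implied percent change in odds
odds_ratio <- exp(b)
pct_change <- (odds_ratio - 1) * 100
round(pct_change, 1)

# Assuming X2p_Percent is a proportion, one unit spans 0% to 100%,
# so rescale to a 1-percentage-point (0.01) increase before exponentiating
or_2p_per_point <- exp(b[["X2p_Percent"]] * 0.01)
or_2p_per_point
```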

Here is the 95% confidence interval for the TOV_per_100 coefficient, built from its estimate and standard error (estimate ± 1.96 × SE):

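# Pull the TOV_per_100 estimate and standard error from the coefficient table
coef_val <- coef(summary(log_model))["TOV_per_100", "Estimate"]
se_val   <- coef(summary(log_model))["TOV_per_100", "Std. Error"]

# 95% Wald interval on the log-odds scale: estimate +/- 1.96 * SE
lower <- coef_val - 1.96 * se_val
upper <- coef_val + 1.96 * se_val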

lower
## [1] -0.2462219
upper
## [1] -0.09472972
exp(lower)
## [1] 0.7817487
exp(upper)
## [1] 0.9096188

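The confidence interval indicates that we are 95% confident that the true TOV_per_100 coefficient lies between -0.246 and -0.095 on the log-odds scale. Exponentiating the endpoints gives an odds-ratio interval of 0.782 to 0.910: holding the other variables constant, each additional turnover per 100 possessions reduces the odds of making the playoffs by roughly 9% to 22%.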
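The same interval can be wrapped in a small reusable function, which also shows that the odds-ratio interval is just the exponentiated log-odds interval and how the 9% to 22% figures follow from it. This is a sketch using the TOV_per_100 estimate and standard error copied from the summary output; `wald_ci` is a hypothetical helper name, and it uses `qnorm(0.975)` (about 1.959964) rather than the rounded 1.96, so the endpoints differ only in later decimal places.

```r
# Hypothetical helper: Wald interval, estimate +/- z * SE on the log-odds scale
wald_ci <- function(estimate, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # 1.959964 for a 95% interval
  c(lower = estimate - z * se, upper = estimate + z * se)
}

# TOV_per_100 estimate and standard error copied from summary(log_model)
ci_log <- wald_ci(-0.17048, 0.03865)
ci_or  <- exp(ci_log)  # same interval on the odds-ratio scale

# Percent reduction in playoff odds implied by each end of the interval
pct_cut <- c(min = 100 * (1 - ci_or[["upper"]]),
             max = 100 * (1 - ci_or[["lower"]]))

ci_log
ci_or
pct_cut
```

With the fitted model, `confint.default(log_model)` returns these Wald intervals for every coefficient at once, while `confint(log_model)` computes profile-likelihood intervals, which can differ slightly.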

Because the entire odds-ratio interval lies below 1 (equivalently, the log-odds interval lies entirely below 0), the TOV_per_100 coefficient is statistically significant at the 5% level. In this model, more turnovers are consistently associated with lower odds of making the playoffs.
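The link between "the interval excludes 1" and statistical significance can be checked directly: a 95% Wald interval excludes the null value exactly when the absolute z statistic exceeds about 1.96. A minimal sketch using the copied TOV_per_100 estimate and standard error, which reproduces the z value and p-value reported in the summary table:

```r
# TOV_per_100 estimate and standard error copied from summary(log_model)
est <- -0.17048
se  <-  0.03865

# Wald z statistic and two-sided p-value, as in the summary table
z <- est / se
p <- 2 * pnorm(-abs(z))

# A 95% interval excludes 0 (log-odds) / 1 (odds ratio) exactly when |z| > qnorm(0.975)
excludes_null <- abs(z) > qnorm(0.975)

z
p
excludes_null
```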

Insights, Significance, and Questions

The main insight I gather from the model is that all 4 explanatory variables are statistically significant predictors of whether a team makes the playoffs (every p-value is below 0.001). Higher offensive rebounding, 2pt%, and free throw attempts are each associated with higher playoff odds, while more turnovers are associated with lower playoff odds.

This insight is significant because it shows how valuable each possession is, in addition to simply making shots. Since more offensive rebounds and fewer turnovers both give a team additional possessions, the model suggests that winning the possession battle matters for playoff odds, not just shooting efficiency.

Some additional questions I might explore: Does 3pt% have a large impact on playoff prediction? Can teams offset their turnovers with additional offensive rebounding?
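As a first rough pass at the offsetting question, the current coefficients already imply an exchange rate between the two possession variables. This sketch assumes the model stays additive (it has no interaction term), so it only shows how many extra offensive rebounds per 100 possessions cancel one extra turnover per 100 in log-odds terms; properly testing whether rebounding offsets turnovers would need something like an `ORB_per_100:TOV_per_100` interaction in the formula. The +2/+2 shift below is a hypothetical scenario.

```r
# Coefficients copied from summary(log_model)
b_orb <-  0.14457  # ORB_per_100
b_tov <- -0.17048  # TOV_per_100

# Additive model: one extra turnover per 100 lowers log-odds by |b_tov|,
# so this many extra offensive rebounds per 100 would cancel it out
orb_to_offset_one_tov <- abs(b_tov) / b_orb
orb_to_offset_one_tov

# Hypothetical shift: +2 turnovers and +2 offensive rebounds per 100
net_change <- 2 * b_tov + 2 * b_orb
exp(net_change)  # multiplicative change in playoff odds (below 1 = net loss)
```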