Binary Variable (Playoffs)

This dataset already includes a binary variable, Playoffs. When Playoffs equals 1, the game occurred in the playoffs; when it equals 0, the game occurred in the regular season. It is an interesting variable to model because playoff games are typically more competitive and may show different performance patterns than regular season games.

Logistic Regression Model

I will build a logistic regression model using PTS (Points), TRB (Rebounds) and AST (Assists). These variables represent the main components of a player's performance and may differ between playoff and regular season games.

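# Recode Playoffs to 1 (playoff game) / 0 (regular season)
nba$Playoffs <- ifelse(nba$Playoffs == "true", 1, 0)
# Logistic regression of playoff status on points, rebounds and assists
model_log <- glm(Playoffs ~ PTS + TRB + AST, data = nba, family = "binomial")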
summary(model_log)
## 
## Call:
## glm(formula = Playoffs ~ PTS + TRB + AST, family = "binomial", 
##     data = nba)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -5.03843    0.48283 -10.435  < 2e-16 ***
## PTS          0.05410    0.01193   4.536 5.73e-06 ***
## TRB         -0.00334    0.03308  -0.101    0.920    
## AST         -0.01348    0.04703  -0.287    0.774    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 437.25  on 1702  degrees of freedom
## Residual deviance: 418.37  on 1699  degrees of freedom
## AIC: 426.37
## 
## Number of Fisher Scoring iterations: 6

Coefficients

In logistic regression, each coefficient is the change in the log-odds of the game being a playoff game for a one-unit increase in that predictor, holding the other predictors fixed. A positive PTS coefficient means higher-scoring performances are associated with higher odds of the game being a playoff game; a negative one would suggest scoring is lower or more constrained in playoff settings. Likewise, a positive TRB coefficient would indicate that stronger rebounding performances are more likely in playoff games, possibly due to increased physicality and emphasis on each possession, and a positive AST coefficient would suggest that higher assist numbers are associated with playoff games. Here the PTS estimate is positive (0.054), while the TRB (-0.003) and AST (-0.013) estimates are slightly negative and close to zero.
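Log-odds are hard to read directly, so a common extra step is to exponentiate each coefficient into an odds ratio. The sketch below does this with the estimates copied by hand from the summary output above rather than from the fitted object, so it runs without the nba data. The names coefs, odds_ratios and pct_change are illustrative only.

```r
# Estimates copied from the summary(model_log) output above
coefs <- c(PTS = 0.05410, TRB = -0.00334, AST = -0.01348)

# exp(coefficient) is the multiplicative change in the odds
# of a playoff game per one-unit increase in the predictor
odds_ratios <- exp(coefs)

# The same information expressed as a percent change in the odds
pct_change <- (odds_ratios - 1) * 100

print(round(cbind(odds_ratio = odds_ratios, pct_change = pct_change), 3))
```

Under this reading, each additional point corresponds to roughly a 5.6% increase in the odds of the game being a playoff game, with rebounds and assists held fixed. With the fitted object available, exp(coef(model_log)) should give the same odds ratios, plus the exponentiated intercept.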

Confidence Interval

We can construct an approximate 95% confidence interval for each coefficient as the estimate plus or minus 1.96 standard errors. These intervals are on the log-odds scale.

Points

coef_est <- coef(summary(model_log))["PTS", "Estimate"] 
std_err <- coef(summary(model_log))["PTS", "Std. Error"] 

lower <- coef_est - 1.96 * std_err 
upper <- coef_est + 1.96 * std_err 

c(lower, upper)
## [1] 0.03072438 0.07747367

Rebounds

coef_est <- coef(summary(model_log))["TRB", "Estimate"] 
std_err <- coef(summary(model_log))["TRB", "Std. Error"] 

lower <- coef_est - 1.96 * std_err 
upper <- coef_est + 1.96 * std_err 

c(lower, upper)
## [1] -0.06817260  0.06149268

Assists

coef_est <- coef(summary(model_log))["AST", "Estimate"] 
std_err <- coef(summary(model_log))["AST", "Std. Error"] 

lower <- coef_est - 1.96 * std_err 
upper <- coef_est + 1.96 * std_err 

c(lower, upper)
## [1] -0.10566233  0.07870611
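The three intervals above repeat the same arithmetic, so they can also be computed in one pass. This is a minimal sketch that uses the estimates and standard errors copied from the summary output (not the fitted object), and the names est, se, ci and excludes_zero are illustrative. It also exponentiates the bounds to put them on the odds-ratio scale.

```r
# Estimates and standard errors copied from the summary(model_log) output
est <- c(PTS = 0.05410, TRB = -0.00334, AST = -0.01348)
se  <- c(PTS = 0.01193, TRB = 0.03308, AST = 0.04703)

# Normal critical value for a 95% interval (about 1.96)
z <- qnorm(0.975)

# Intervals on the log-odds scale, one row per predictor
ci <- cbind(lower = est - z * se, upper = est + z * se)
print(round(ci, 4))

# The same intervals on the odds-ratio scale
print(round(exp(ci), 3))

# TRUE where the log-odds interval does not contain zero
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0
print(excludes_zero)
```

Small differences from the printed intervals come from rounding in the copied values. On the fitted model, confint.default(model_log) should return essentially the same intervals for all coefficients at once.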

Visualization

library(ggplot2)

nba |>
  ggplot(aes(x = PTS, y = Playoffs)) +
  geom_jitter(width = 0, height = 0.1, alpha = 0.5) +
  geom_smooth(method = "glm", method.args = list(family = "binomial"), se = FALSE, color = "red") +
  labs(
    title = "Logistic Fit for Playoff Probability",
    x = "Points",
    y = "Probability of Playoff Game"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
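Note that the red curve is fit by geom_smooth() with the formula y ~ x, so it uses PTS alone rather than the three-predictor model. As a rough complement, the sketch below evaluates the reported three-predictor coefficients over a grid of point totals. The values trb_fixed and ast_fixed (5 each) and the grid pts_grid are hypothetical choices for illustration, not values taken from the data.

```r
# Coefficients copied from the summary(model_log) output
b <- c(intercept = -5.03843, PTS = 0.05410, TRB = -0.00334, AST = -0.01348)

# Hypothetical values: a grid of point totals, rebounds and assists held fixed
pts_grid  <- seq(0, 40, by = 10)
trb_fixed <- 5
ast_fixed <- 5

# Linear predictor (log-odds) for each point total
log_odds <- b[["intercept"]] + b[["PTS"]] * pts_grid +
  b[["TRB"]] * trb_fixed + b[["AST"]] * ast_fixed

# plogis() is the inverse logit: it maps log-odds to probabilities
prob <- plogis(log_odds)

print(data.frame(PTS = pts_grid, prob = round(prob, 4)))
```

With the fitted object, predict(model_log, newdata = ..., type = "response") would be the more direct route to the same kind of predicted probability.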

The logistic regression results show that points scored is the only statistically significant predictor of whether a game is a playoff game (p < 0.001). The positive PTS coefficient (0.054) indicates that higher-scoring performances are associated with higher odds of the game being a playoff game, and the 95% confidence interval for PTS (0.031 to 0.077) excludes zero, so the relationship is statistically distinguishable from no effect. In contrast, total rebounds and assists are not statistically significant: their p-values are large (0.920 and 0.774) and both confidence intervals include zero. In this dataset, these variables show no clear or reliable relationship with whether a game is a playoff game.

Overall, the model provides some evidence that scoring performance differs between playoff and regular season games, but it also suggests that other aspects of player performance are not strong indicators of playoff context. The residual deviance (418.37) is only slightly below the null deviance (437.25), a drop of 18.88 on 3 degrees of freedom, so the model improves only modestly on an intercept-only baseline.
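The size of the deviance improvement can be quantified with a likelihood-ratio test and a pseudo R-squared. The sketch below uses the deviances and degrees of freedom copied from the summary output. The McFadden measure shown assumes an ungrouped 0/1 response, where the deviance equals minus twice the log-likelihood. All variable names are illustrative.

```r
# Deviances and degrees of freedom copied from the summary(model_log) output
null_dev  <- 437.25
resid_dev <- 418.37
null_df   <- 1702
resid_df  <- 1699

# Likelihood-ratio statistic: drop in deviance from adding the three predictors
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df

# Upper-tail chi-squared probability for the drop in deviance
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# McFadden's pseudo R-squared (assumes a 0/1 response, see the lead-in)
mcfadden_r2 <- 1 - resid_dev / null_dev

print(c(lr_stat = lr_stat, df = lr_df, p_value = p_value,
        mcfadden_r2 = mcfadden_r2))
```

A small p-value together with a pseudo R-squared near 0.04 would be consistent with the reading above: the predictors as a group carry a detectable signal, but they explain little of the variation in playoff status.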

Further Questions

Why is scoring the only significant predictor of the three? Would a more balanced outcome variable lead to a more stable and interpretable model? Are there other contextual variables that would better explain playoff classification? Would alternative modeling approaches or resampling techniques improve performance?