Response Variable

The response variable for this analysis is Game Score (GmSc), which measures a player’s overall performance in a single game. Game Score combines points, rebounds, assists, steals, blocks, turnovers, and shooting efficiency into one metric. Because it summarizes a player’s total contribution, it is a useful variable for evaluating unexpected performances in the NBA.
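For reference, the commonly cited form of John Hollinger's Game Score formula can be written as a short R function. This is only a sketch: it is not necessarily how the GmSc column in this data set was computed, the argument names follow common box-score abbreviations and are assumptions, and the stat line in the example is made up for illustration. Note that this form of the formula also subtracts personal fouls (PF).

```r
# Hollinger's Game Score, as commonly published.
# Argument names are assumed box-score abbreviations, not columns confirmed in `nba`.
game_score <- function(PTS, FG, FGA, FT, FTA, ORB, DRB, STL, AST, BLK, PF, TOV) {
  PTS +
    0.4 * FG -          # credit for made field goals
    0.7 * FGA -         # penalty for field goal attempts
    0.4 * (FTA - FT) +  # penalty for missed free throws
    0.7 * ORB +
    0.3 * DRB +
    STL +
    0.7 * AST +
    0.7 * BLK -
    0.4 * PF -
    TOV
}

# Hypothetical stat line (not taken from the data set)
example <- game_score(PTS = 30, FG = 10, FGA = 20, FT = 8, FTA = 10,
                      ORB = 2, DRB = 8, STL = 2, AST = 5, BLK = 1,
                      PF = 3, TOV = 4)
example
```

Because points enter the formula with a weight of 1 while shot attempts are penalized, a high-scoring game on poor shooting earns a lower Game Score than the same point total on efficient shooting.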

ANOVA

The categorical explanatory variable for the ANOVA test is Playoffs, which separates games played during the regular season from those played during the playoffs.

Research Question: Do players produce significantly different Game Scores in playoff games compared to regular season games?

Null Hypothesis: The mean Game Score is the same for playoff and regular season games. (H0: μplayoff = μregular)

Alternative Hypothesis: The mean Game Score is not the same for playoff and regular season games. (HA: μplayoff ≠ μregular)

Test

anova_model <- aov(GmSc ~ Playoffs, data = nba)
summary(anova_model)
##               Df Sum Sq Mean Sq F value   Pr(>F)    
## Playoffs       1   1292    1292    18.2 2.09e-05 ***
## Residuals   1701 120742      71                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value (2.09e-05) is far below 0.05, we reject the null hypothesis and conclude that mean Game Score differs significantly between playoff and regular season games. The F-test alone does not show which group has the higher mean.
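Statistical significance does not by itself say how large the difference is. As a rough check, eta-squared (the share of total variation in Game Score attributed to Playoffs) can be computed directly from the sums of squares in the ANOVA table. This is a minimal sketch; the variable names are illustrative, and the inputs are the rounded values printed in the output.

```r
# Sums of squares copied from the ANOVA table (rounded as printed)
ss_playoffs <- 1292
ss_residual <- 120742

# Eta-squared: proportion of total variation explained by the grouping variable
eta_sq <- ss_playoffs / (ss_playoffs + ss_residual)
round(eta_sq, 3)
```

The result is roughly 0.01, meaning Playoffs accounts for only about 1% of the variation in Game Score. The difference is statistically significant, but small relative to ordinary game-to-game variation.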

Visualization

ggplot(nba, aes(x = Playoffs, y = GmSc, fill = Playoffs)) +
  geom_boxplot(alpha = 0.7) +
  labs(
    title = "Playoffs vs Regular Season Game Score Distribution",
    x = "Playoffs?",
    y = "Game Score"
  ) +
  theme_minimal()

In the visualization, the playoff boxplot sits higher than the regular season boxplot, which suggests that players tend to produce stronger performances in playoff games. Possible explanations include higher stakes, stars getting more playing time, and a more competitive environment overall.

Linear Regression

For the regression model, the explanatory variable is points scored (PTS), a continuous variable that contributes strongly to a player’s overall Game Score.

Research Question: Does the number of points scored in a game significantly influence a player’s Game Score?

Model/Equation: GmSc = β0 + β1(PTS) + ϵ

Coefficients:

Intercept (β0): The predicted Game Score when a player scores zero points.

Slope (β1): The average increase in Game Score for each additional point scored. For example, a slope of 0.8 would indicate that each additional point scored is associated with an increase of approximately 0.8 in Game Score on average.

Test

reg_model <- lm(GmSc ~ PTS, data = nba)
summary(reg_model)
## 
## Call:
## lm(formula = GmSc ~ PTS, data = nba)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.4427 -2.2393 -0.2399  2.0597 15.0110 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5.585814   0.227045    24.6   <2e-16 ***
## PTS         0.750196   0.008101    92.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.446 on 1701 degrees of freedom
## Multiple R-squared:  0.8345, Adjusted R-squared:  0.8344 
## F-statistic:  8575 on 1 and 1701 DF,  p-value: < 2.2e-16

The fitted slope shows that each additional point scored in a game is associated with an increase of about 0.75 in Game Score on average. The intercept (5.586) is the predicted Game Score when a player scores 0 points. The coefficient for PTS is highly statistically significant (p < 2e-16), indicating a strong linear relationship between points scored and Game Score. The R-squared value shows that about 83.45% of the variability in Game Score is explained by points scored alone. This makes sense because scoring is a major component of the Game Score formula.
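To make the coefficients concrete, the fitted line can be evaluated by hand at a few point totals. The sketch below hard-codes the estimates from the regression output; `predict_gmsc` is a hypothetical helper name, and with the fitted model object available, `predict(reg_model, newdata = data.frame(PTS = ...))` would be the usual route.

```r
# Coefficient estimates copied from the regression output
b0 <- 5.585814  # intercept
b1 <- 0.750196  # slope for PTS

# Hypothetical helper: predicted Game Score for a given number of points
predict_gmsc <- function(pts) b0 + b1 * pts

# Predicted Game Score for 0-, 10-, 20-, and 30-point games
predict_gmsc(c(0, 10, 20, 30))
```

Under this model a 20-point game has a predicted Game Score of about 20.6. The residual standard error of 3.446 indicates that individual games typically fall a few Game Score units above or below the line.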

Visualization

ggplot(nba, aes(x = PTS, y = GmSc)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  labs(
    title = "Relationship Between Points Scored and Game Score",
    x = "PTS",
    y = "Game Score"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

This visualization shows that points scored is a very strong predictor of Game Score. Players who score more points generally have higher Game Scores, which aligns with how Game Score is designed to measure overall performance. Although many other statistics contribute to the metric, the strength of this relationship is not surprising given how heavily scoring counts toward Game Score.

Further Questions

Would adding rebounds and/or assists to the regression model improve the model's fit (for example, its adjusted R-squared)?

Are there any outlier performances where players score relatively few points but still achieve very high Game Scores?

Do players who log more minutes or play on better teams tend to generate disproportionately higher Game Scores?
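The question about rebounds and assists could be approached by fitting a larger model and comparing it with the points-only model. The sketch below is hypothetical throughout: it uses simulated data rather than the nba data set, the column names TRB and AST are assumed, and the numbers it produces say nothing about the real data. It only illustrates the comparison workflow.

```r
set.seed(1)
n <- 500

# Simulated stand-in data (NOT the nba data set; column names are assumptions)
sim <- data.frame(
  PTS = rpois(n, 15),
  TRB = rpois(n, 5),
  AST = rpois(n, 4)
)
# Made-up data-generating process so the example has something to detect
sim$GmSc <- 0.75 * sim$PTS + 0.3 * sim$TRB + 0.7 * sim$AST + rnorm(n, sd = 3)

reduced <- lm(GmSc ~ PTS, data = sim)              # points only
full    <- lm(GmSc ~ PTS + TRB + AST, data = sim)  # points, rebounds, assists

# Adjusted R-squared penalizes extra predictors, so it is a fairer comparison
c(reduced = summary(reduced)$adj.r.squared,
  full    = summary(full)$adj.r.squared)

# Nested-model F-test: do TRB and AST jointly improve the fit?
anova(reduced, full)
```

With the real data, the same two `lm()` calls on `nba` (using whatever the rebound and assist columns are actually called) followed by `anova()` would show whether the added predictors explain variation beyond points alone.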