A message for my Sweet Dear Mister Adam Sterling:

Key Findings

Across the full sample, offensive output (PRA) and defensive output (STOCKS) were strongly correlated (r = .84, p < .001). In other words, players who performed well offensively also tended to perform well defensively. However, this relationship weakened drastically after controlling for minutes played (partial r = .08, p = .04), suggesting that playing time explains a large portion of the overlap. Put simply, players who play more have more opportunity to accumulate stats, both offensively and defensively.

East vs. West

The correlation between conference and offensive output was weak and not statistically significant (r = -.07, p = .07). The correlation between conference and defensive output was also weak but statistically significant (r = -.08, p = .04), with the Western Conference showing slightly higher defensive output on average. Overall, the differences between conferences were minimal.

Limitations

As a basketball fan, I know that there is more to a player’s productivity than their statistical output. Many players put up “empty stats”: numbers that look impressive in this sort of analysis but may not correlate with winning. Defensive statistical output is a notoriously unreliable indicator of defensive ability for a variety of reasons. For example, a player who gambles for many steals may give up more points than another player who defends more carefully.

Please see my work below.

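# Packages used below: ggplot2 (plots), dplyr (mutate, %>%),
# ggcorrplot (correlation heatmap), ppcor (pcor.test)
library(ggplot2)
library(dplyr)
library(ggcorrplot)
library(ppcor)

ggplot(nba, aes(x = pra, y = stocks, color = conference)) +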
  geom_point(size = 3, alpha = 0.8) +
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed") +
  labs(
    title = "Relationship Between PRA and STOCKS by Conference",
    subtitle = "Each point represents a player's combined offensive (PRA) and defensive (STOCKS) output",
    x = "PRA (Points + Rebounds + Assists)",
    y = "STOCKS (Steals + Blocks)",
    color = "Conference"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "right"
  )

# Let's create another variable that captures combined PRA + STOCKS. 
nba <- nba %>%
  mutate(
    total_stat_output = pra + stocks
  )
# Do players from the West or East conferences produce more total statistical output? Do East vs. West award winners differ? 

ggplot(nba, aes(x = conference, y = total_stat_output, fill = factor(won_award))) +
  geom_boxplot(alpha = 0.7, outlier.shape = 21) +
  labs(
    title = "Total Stat Output by Conference and Award Status",
    subtitle = "Comparing East vs. West Players by Award Status",
    x = "Conference",
    y = "Total Stat Output (PRA + STOCKS)",
    fill = "Won Award\n(0 = No, 1 = Yes)"
  ) +
  scale_fill_manual(values = c("0" = "gray70", "1" = "goldenrod")) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))

cat("\nConference (East=1, West=0) ~ PRA\n")
## 
## Conference (East=1, West=0) ~ PRA
pb_pra <- cor.test(nba$east_binary, nba$pra)
print(pb_pra)
## 
##  Pearson's product-moment correlation
## 
## data:  nba$east_binary and nba$pra
## t = -1.8195, df = 650, p-value = 0.0693
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.147164250  0.005629906
## sample estimates:
##         cor 
## -0.07118475
cat("\nConference (East=1, West=0) ~ STOCKS\n")
## 
## Conference (East=1, West=0) ~ STOCKS
pb_stocks <- cor.test(nba$east_binary, nba$stocks)
print(pb_stocks)
## 
##  Pearson's product-moment correlation
## 
## data:  nba$east_binary and nba$stocks
## t = -2.094, df = 650, p-value = 0.03665
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.157650363 -0.005105577
## sample estimates:
##         cor 
## -0.08185737

Conference and PRA: r = -.071, p = .069. The correlation is weak and not statistically significant.

Conference and STOCKS: r = -.082, p = .037. The correlation is weak but statistically significant. Because East is coded 1 and West 0, the negative sign means the West has slightly higher STOCKS.
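
As a sanity check on the two results above, the test statistic, p-value, and confidence interval that `cor.test()` prints can be recomputed from r and the degrees of freedom alone. The sketch below is a minimal base-R illustration, not part of the original analysis. The helper name `check_cor_test` is hypothetical, and its only inputs are the r and df values shown in the output above.

```r
# Hypothetical helper: rebuild the cor.test() summary values from r and df.
check_cor_test <- function(r, df, conf_level = 0.95) {
  n <- df + 2                                # Pearson test uses df = n - 2
  t_stat <- r * sqrt(df / (1 - r^2))         # t statistic for H0: rho = 0
  p_value <- 2 * pt(-abs(t_stat), df)        # two-sided p-value
  z <- atanh(r)                              # Fisher z-transform of r
  half_width <- qnorm(1 - (1 - conf_level) / 2) / sqrt(n - 3)
  ci <- tanh(c(z - half_width, z + half_width))  # back-transform to the r scale
  list(t = t_stat, p = p_value, ci = ci, r_squared = r^2)
}

# r and df copied from the cor.test() output above
pra_check    <- check_cor_test(r = -0.07118475, df = 650)
stocks_check <- check_cor_test(r = -0.08185737, df = 650)

print(round(unlist(pra_check), 4))
print(round(unlist(stocks_check), 4))
```

The `r_squared` element puts the size of the effect in perspective: in both cases conference accounts for less than 1% of the variance in output, which is why the differences read as minimal even where p < .05.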

corr_data <- nba[, c("Age", "pra", "stocks")]
corr_matrix <- cor(corr_data, use = "complete.obs", method = "pearson")
print(round(corr_matrix, 3))
##          Age   pra stocks
## Age    1.000 0.124  0.077
## pra    0.124 1.000  0.840
## stocks 0.077 0.840  1.000
ggcorrplot(
  corr_matrix,
  lab = TRUE,           # add r values
  type = "lower",       # show lower triangle
  lab_size = 4,
  colors = c("red", "white", "blue"),
  title = "Correlation Matrix: Age, PRA, and STOCKS"
) +
  theme_minimal(base_size = 13)

Age is very weakly correlated with both offensive and defensive output. STOCKS and PRA are highly correlated with one another. This likely reflects shared drivers such as minutes played: more productive players get more playing time, and more playing time means more opportunity to accumulate both kinds of stats. Let’s control for this.

zero_r <- cor(nba$pra, nba$stocks, use = "complete.obs")
cat("\nZero-order correlation (PRA ~ STOCKS): r =", round(zero_r, 3), "\n")
## 
## Zero-order correlation (PRA ~ STOCKS): r = 0.84
pc <- pcor.test(nba$pra, nba$stocks, nba$MP, method = "pearson")
print(pc)
##     estimate    p.value statistic   n gp  Method
## 1 0.07911891 0.04359333  2.021931 652  1 pearson
cat("\nPartial correlation (PRA ~ STOCKS | MP): r =", round(pc$estimate, 3),
    "| p =", round(pc$p.value, 3), "\n")
## 
## Partial correlation (PRA ~ STOCKS | MP): r = 0.079 | p = 0.044

The zero-order correlation (r = .84) shows a very strong association between offensive and defensive output. However, after controlling for minutes played, the relationship drops to a very weak (r = .079) but still statistically significant (p = .044) correlation.
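
To illustrate why controlling for minutes can shrink a correlation this much, here is a small simulation in base R. It is a sketch on made-up data, not the `nba` data set: the variable names (`minutes`, `sim_pra`, `sim_stocks`) and all coefficients are assumptions chosen for illustration. It also shows that a partial correlation is the correlation between the residuals left over after regressing each variable on the control, which matches the usual closed-form formula.

```r
set.seed(42)  # arbitrary seed so the simulation is reproducible

n <- 652                                     # same sample size as the output above
minutes <- runif(n, min = 100, max = 3000)   # simulated minutes played

# Both outputs depend on minutes plus independent noise;
# by construction they share no other link.
sim_pra    <- 0.60 * minutes + rnorm(n, sd = 200)
sim_stocks <- 0.05 * minutes + rnorm(n, sd = 20)

zero_order <- cor(sim_pra, sim_stocks)

# Partial correlation via residuals: remove the linear effect of minutes
# from each variable, then correlate what is left.
resid_pra     <- resid(lm(sim_pra ~ minutes))
resid_stocks  <- resid(lm(sim_stocks ~ minutes))
partial_resid <- cor(resid_pra, resid_stocks)

# Same quantity from the closed-form first-order partial correlation formula.
r_xz <- cor(sim_pra, minutes)
r_yz <- cor(sim_stocks, minutes)
partial_formula <- (zero_order - r_xz * r_yz) /
  sqrt((1 - r_xz^2) * (1 - r_yz^2))

cat("Zero-order r:          ", round(zero_order, 3), "\n")
cat("Partial r (residuals): ", round(partial_resid, 3), "\n")
cat("Partial r (formula):   ", round(partial_formula, 3), "\n")
```

In this simulation the only connection between the two outputs is shared playing time, so the partial correlation lands near zero. In the real data a small positive association (r = .079) remains after the adjustment, so minutes explain most, but not all, of the overlap.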