Introduction

In this report, we analyze player season totals from all 30 NBA teams (2024-2025) to examine the relationship between offensive (PRA) and defensive (STOCKS) metrics. Specifically, we explore how these two variables relate to one another and whether players on Eastern and Western Conference teams differ in their overall performance.

Step 1 – Loading and Preparing the Data

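library(readxl)      # read_excel(), excel_sheets()
library(dplyr)       # %>%, mutate(), bind_rows(), left_join()
library(ggplot2)     # plots in Step 3
library(ggcorrplot)  # correlation-matrix plot in Step 4
library(ppcor)       # pcor.test() in Step 4

# Read one team's sheet and add the derived columns used throughout the report
loading_fun <- function(sheet_name) {
  df <- read_excel("NBA Team Total Data 2024-2025.xlsx", sheet = sheet_name) %>%
    mutate(
      Team = sheet_name,                        # sheet name doubles as team name
      Won_award = ifelse(is.na(Awards), 0, 1),  # 1 = any award listed, 0 = none
      PRA = PTS + TRB + AST,                    # points + rebounds + assists
      STOCKS = STL + BLK)                       # steals + blocks
  return(df)
}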

In this step, I created a function that loads one sheet from the Excel workbook and prepares it for analysis. The function adds a column for the team name (taken from the sheet), creates a binary column Won_award (0 = No, 1 = Yes), and adds two new columns for performance metrics: PRA (Points + Rebounds + Assists) representing offensive ability, and STOCKS (Steals + Blocks) representing defensive ability.
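
To make the derived columns concrete, here is a minimal sketch on a made-up two-player table. The player names and all numbers are invented for illustration, and it is written in base R so it runs without the workbook; only the column names (PTS, TRB, AST, STL, BLK, Awards) come from loading_fun.

```r
# Hypothetical data: two invented players, not taken from the workbook
toy <- data.frame(
  Player = c("Player A", "Player B"),
  PTS = c(1200, 300), TRB = c(400, 150), AST = c(350, 40),
  STL = c(80, 20),    BLK = c(30, 45),
  Awards = c("AS", NA)                      # NA means no award listed
)

# Same derivations as in loading_fun, written in base R
toy$Won_award <- ifelse(is.na(toy$Awards), 0, 1)
toy$PRA       <- toy$PTS + toy$TRB + toy$AST
toy$STOCKS    <- toy$STL + toy$BLK

print(toy[, c("Player", "Won_award", "PRA", "STOCKS")])
```

Player A ends up with Won_award = 1, PRA = 1950 and STOCKS = 110; Player B with Won_award = 0, PRA = 490 and STOCKS = 65.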

nba_data <- excel_sheets("NBA Team Total Data 2024-2025.xlsx") %>%
  lapply(loading_fun) %>% 
  bind_rows()

Next, I loaded the Excel file and used lapply() to apply my function to each of the 30 sheets. The resulting team data frames were then combined into a single dataset using bind_rows().
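
The apply-then-stack pattern can be sketched without the workbook. In the example below, fake_sheet() is a hypothetical stand-in for the sheet loader, the team names are invented, and do.call(rbind, ...) plays the role that bind_rows() plays in the report. The final table() call is the kind of sanity check one might run on the real data, where 30 distinct teams would be expected.

```r
# Hypothetical stand-in for the real loader: returns a tiny table per "sheet"
fake_sheet <- function(sheet_name) {
  data.frame(
    Player = paste(sheet_name, "player", 1:2),
    PTS    = c(100, 200),
    Team   = sheet_name
  )
}

sheets <- c("Team A", "Team B", "Team C")   # invented sheet names

# Apply the loader to every sheet, then stack the results row-wise
combined <- do.call(rbind, lapply(sheets, fake_sheet))

# Sanity check: one entry per team, with the number of rows each contributed
rows_per_team <- table(combined$Team)
print(rows_per_team)
```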

Step 2 – Adding Conference Information

conference_info <- read_excel("Team Conferences.xlsx")

nba_data <- left_join(nba_data, conference_info, by = "Team")

nba_data <- nba_data %>% mutate(conference_binary = ifelse(Conference == "East",1,0))

After combining all 30 sheets into one dataset, I used the provided conference lookup sheet to add each team’s conference (East or West). In addition, I recoded the conference as a binary variable for analysis, where East = 1 and West = 0.
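
One thing worth checking after a left join is whether every team found a match: a team name spelled differently in the lookup sheet would keep its rows but get NA for Conference, and ifelse() would pass that NA through to conference_binary. The sketch below shows the check on invented data; merge(..., all.x = TRUE) is the base R equivalent of left_join(), and the team names are hypothetical.

```r
# Hypothetical data: "Team C" is deliberately missing from the lookup
players <- data.frame(
  Player = c("P1", "P2", "P3"),
  Team   = c("Team A", "Team B", "Team C")
)
lookup <- data.frame(
  Team       = c("Team A", "Team B"),
  Conference = c("East", "West")
)

# Base R left join: keep every player row, attach Conference where it matches
joined <- merge(players, lookup, by = "Team", all.x = TRUE)

# Same recoding as in the report: East = 1, West = 0 (NA stays NA)
joined$conference_binary <- ifelse(joined$Conference == "East", 1, 0)

# Teams that did not match the lookup
unmatched <- unique(joined$Team[is.na(joined$Conference)])
print(unmatched)
```

An empty result for unmatched on the real data would confirm that all teams received a conference.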

Step 3 – Visual Exploration (2 plots)

1. SCATTER PLOT: PRA + STOCKS by Conference
ggplot(nba_data, aes(x = PRA, y = STOCKS, color = Conference)) +
  geom_point() +
  geom_smooth(method = "lm")+
  labs(title = "Relationship Between PRA and STOCKS by Conference",
       x = "Points + Rebounds + Assists (PRA)", 
       y = "Steals + Blocks (STOCKS)",
       color ="Conference") +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(size = 17, family = "Georgia", face = "bold"),
        axis.title.x = element_text(size = 12, family = "Georgia"),
        axis.title.y = element_text(size = 12, family = "Georgia"))
## `geom_smooth()` using formula = 'y ~ x'

Interpretation:

The scatterplot shows a clear positive relationship between PRA and STOCKS. As players’ PRA increases, their STOCKS are also likely to increase. This suggests that players who perform well offensively are also likely to perform well defensively. Additionally, both East and West conferences follow a similar upward trend.

2. BAR CHART: Average PRA by Team
team_avg_PRA <- nba_data %>% group_by(Team) %>% summarize(team_avg_PRA= mean(PRA, na.rm = TRUE))

ggplot(team_avg_PRA, aes(x = reorder(Team, desc(team_avg_PRA)), y = team_avg_PRA)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Average PRA by Team",
    x = "Team",
    y = "Average PRA (Points + Rebounds + Assists)"
  ) +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(size = 17, family = "serif", face = "bold"),
        axis.title.x = element_text(size = 12, family = "serif"),
        axis.title.y = element_text(size = 12, family = "serif"))

Interpretation:

The bar chart shows each team’s average Points + Rebounds + Assists (PRA). Right away, we can see that the Detroit Pistons have the lowest average PRA among all teams, standing out from the rest. In contrast, teams like the Nuggets and Celtics show much higher average PRA values, suggesting stronger overall offensive performance.

Step 4 – Correlation Analysis

1. Point-Biserial Correlation: Conference + PRA
cor.test(nba_data$conference_binary, nba_data$PRA)
## 
##  Pearson's product-moment correlation
## 
## data:  nba_data$conference_binary and nba_data$PRA
## t = -1.8195, df = 650, p-value = 0.0693
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.147164250  0.005629906
## sample estimates:
##         cor 
## -0.07118475

Interpretation:

After running a correlation test between Conference and PRA, we see a weak negative relationship (r = -0.07, p = 0.069). Because East is coded 1 and West is coded 0, the negative sign means players from the West have slightly higher offensive totals (points, rebounds, and assists) than those from the East in this sample, but the difference is not statistically significant at the .05 level.
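
A point-biserial correlation is a Pearson correlation in which one variable is binary, and its significance test is equivalent to a two-sample t-test with equal variances assumed. The sketch below illustrates this on simulated data (the numbers are random draws, not NBA data): the two tests give the same p-value and the same t statistic up to sign.

```r
set.seed(1)

# Simulated data: 20 "West" (0) and 20 "East" (1) observations
conference_binary <- rep(c(0, 1), each = 20)
pra <- rnorm(40, mean = 500 - 30 * conference_binary, sd = 100)

# Point-biserial correlation, as computed in the report
ct <- cor.test(conference_binary, pra)

# Pooled-variance two-sample t-test on the same data
tt <- t.test(pra ~ conference_binary, var.equal = TRUE)

# Same p-value; t statistics agree in magnitude (the sign depends on
# which group mean is subtracted from which)
print(c(cor_p = ct$p.value, ttest_p = tt$p.value))
print(c(cor_t = unname(ct$statistic), ttest_t = unname(tt$statistic)))
```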

2. Point-Biserial Correlation: Conference + STOCKS
cor.test(nba_data$conference_binary, nba_data$STOCKS)
## 
##  Pearson's product-moment correlation
## 
## data:  nba_data$conference_binary and nba_data$STOCKS
## t = -2.094, df = 650, p-value = 0.03665
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.157650363 -0.005105577
## sample estimates:
##         cor 
## -0.08185737

Interpretation:

For Conference and STOCKS, there is a weak negative but statistically significant correlation (r = -0.08, p = 0.037). With East coded 1 and West coded 0, this means players from the West show slightly higher defensive totals (steals and blocks) than those from the East. The effect is very small, however: conference accounts for less than 1% of the variance in STOCKS (r² ≈ 0.007).
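
The test statistic and effect size can be recovered from the reported correlation alone, using t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 = 650 degrees of freedom. The short check below plugs in the values printed by cor.test above; it is a worked calculation, not a re-analysis of the data.

```r
r <- -0.08185737   # correlation reported for Conference and STOCKS
n <- 652           # df + 2, from the cor.test output

# t statistic and two-sided p-value implied by r and n
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# Share of variance in STOCKS associated with conference
r_squared <- r^2

print(round(c(t = t_stat, p = p_value, r2 = r_squared), 4))
```

The result reproduces the reported t = -2.094 and p = 0.0367, with r² of about 0.0067.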

3. Correlation Matrix: Age + PRA + STOCKS

corr_matrix <- nba_data %>% dplyr::select(Age, PRA, STOCKS) %>%
  cor(use = "complete.obs")

corr_matrix
##               Age       PRA     STOCKS
## Age    1.00000000 0.1238926 0.07734898
## PRA    0.12389260 1.0000000 0.84021798
## STOCKS 0.07734898 0.8402180 1.00000000
ggcorrplot(corr_matrix,
           lab = TRUE,
           title = "Correlation Matrix: Age, PRA, and STOCKS",
           lab_size = 3,
           colors= c("lightgreen","lightblue","lightpink"))

Interpretation:

From the correlation matrix, the strongest relationship is between PRA and STOCKS (r = 0.84), indicating a strong positive correlation. This suggests that players who contribute more offensively (through points, rebounds, and assists) also tend to contribute more defensively (through steals and blocks). On the other hand, Age shows only weak correlations with both PRA (r = 0.12) and STOCKS (r = 0.08), suggesting that performance is not strongly linked to player age.

4. Partial Correlation: PRA + STOCKS + Age(controlled)

pcor.test(nba_data$PRA, nba_data$STOCKS, nba_data$Age)
##    estimate       p.value statistic   n gp  Method
## 1 0.8395996 3.657553e-174  39.37587 652  1 pearson

Interpretation:

After running a partial correlation between PRA and STOCKS while controlling for Age, the relationship remained strong, positive, and statistically significant (r = 0.84, p < .001). This means that even after accounting for differences in player age, offensive (PRA) and defensive (STOCKS) performance remain closely related. The strong link between offensive and defensive performance is therefore not explained by age; it holds regardless of how old the players are.
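
For a single control variable, the partial correlation follows directly from the three pairwise correlations: r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). As a check, the sketch below applies this formula to the values printed in the correlation matrix; the variable names are chosen here for illustration.

```r
# Pairwise correlations copied from the correlation matrix
r_pra_stocks <- 0.84021798
r_pra_age    <- 0.1238926
r_stocks_age <- 0.07734898

# First-order partial correlation of PRA and STOCKS, controlling for Age
partial_r <- (r_pra_stocks - r_pra_age * r_stocks_age) /
  sqrt((1 - r_pra_age^2) * (1 - r_stocks_age^2))

print(round(partial_r, 4))
```

The result agrees with the pcor.test estimate of 0.8396. It barely differs from the raw correlation of 0.84 because Age is only weakly correlated with both PRA and STOCKS.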

Step 5 – Communicate Findings

After analyzing NBA player data, results showed a strong positive relationship between offensive (PRA: Points + Rebounds + Assists) and defensive (STOCKS: Steals + Blocks) performance (r = 0.84, p < .001), meaning players who do well on offense also tend to do well on defense. Differences between the East and West were small: Western Conference players had slightly higher totals on both metrics, but the difference was not statistically significant for PRA (r = -0.07, p = 0.069) and, although significant for STOCKS (r = -0.08, p = 0.037), very weak. The relationship between PRA and STOCKS remained strong even after controlling for Age (partial r = 0.84), showing that age does not account for this link. One limitation of the current analysis is that it focused on total season statistics, which may not best reflect game-to-game performance. A potential next step would be to analyze per-game averages and track how these performance trends develop over time.
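
As a rough sketch of the per-game follow-up, season totals could be divided by games played. The example below uses invented numbers and assumes a games-played column named G, which is a hypothetical name; the actual workbook may label it differently or not include it.

```r
# Hypothetical data: G (games played) is an assumed column name
toy <- data.frame(
  Player = c("Player A", "Player B"),
  G      = c(80, 20),
  PRA    = c(2000, 600),
  STOCKS = c(120, 40)
)

# Convert season totals to per-game averages
toy$PRA_per_game    <- toy$PRA / toy$G
toy$STOCKS_per_game <- toy$STOCKS / toy$G

print(toy)
```

In this made-up case Player B has far lower season totals but higher per-game values (30 vs 25 PRA, 2 vs 1.5 STOCKS), which is the kind of difference that totals alone would hide.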