Motivation

The NBA uses a salary cap to promote competitive fairness, but can money still buy wins? This project examines whether team payroll predicts win percentage across all 30 NBA teams over the 2021–2023 seasons.

This question connects to sociologist Pierre Bourdieu’s (1986) theory of capital, which argues that economic resources convert into structural advantage even in systems designed to be fair. Prior research supports this: Berri & Schmidt (2006) found a weak but positive relationship between NBA payroll and wins, and reporting from FiveThirtyEight (Paine, 2015) showed large-market teams consistently spend above the luxury tax line while small-market teams cluster near the floor.


Data

I use two data sources joined by team name and season:

1. NBA Payroll (Kaggle)
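Historical team payroll from 1990–2023, downloaded from Kaggle (loganlauton/nba-players-and-team-data). Filtered to the 2021–2023 seasons and cleaned in R. Seasons are labeled by the year in which they end, so these correspond to seasonStartYear values 2020–2022 in the raw file.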

2. Win Percentage (ESPN via hoopR)
League win percentage pulled using the espn_nba_standings() function from the hoopR R package for the same seasons.

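# Load packages (tidyverse provides readr, dplyr, stringr, purrr, and ggplot2)
library(tidyverse)
library(hoopR)    # espn_nba_standings()
library(ggrepel)  # geom_text_repel()
library(scales)   # label_dollar(), label_percent()

# Load payroll
payroll_raw <- read_csv("~/Downloads/archive/NBA Payroll(1990-2023).csv")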
head(payroll_raw)
## # A tibble: 6 × 5
##    ...1 team      seasonStartYear payroll     inflationAdjPayroll
##   <dbl> <chr>               <dbl> <chr>       <chr>              
## 1     0 Cleveland            1990 $14,403,000 $32,854,246        
## 2     1 New York             1990 $13,290,000 $30,315,416        
## 3     2 Detroit              1990 $12,910,000 $29,448,608        
## 4     3 LA Lakers            1990 $12,120,000 $27,646,565        
## 5     4 Atlanta              1990 $11,761,000 $26,827,658        
## 6     5 Dallas               1990 $11,693,000 $26,672,548
payroll <- payroll_raw %>%
  filter(seasonStartYear %in% c(2020, 2021, 2022)) %>%
  mutate(
    payroll_clean = str_remove_all(payroll, "\\$|,") %>% as.numeric(),
    season = seasonStartYear + 1,
    team = case_when(
      team == "Cleveland"     ~ "Cleveland Cavaliers",
      team == "New York"      ~ "New York Knicks",
      team == "Detroit"       ~ "Detroit Pistons",
      team == "LA Lakers"     ~ "Los Angeles Lakers",
      team == "Atlanta"       ~ "Atlanta Hawks",
      team == "Dallas"        ~ "Dallas Mavericks",
      team == "Philadelphia"  ~ "Philadelphia 76ers",
      team == "Milwaukee"     ~ "Milwaukee Bucks",
      team == "Phoenix"       ~ "Phoenix Suns",
      team == "Brooklyn"      ~ "Brooklyn Nets",
      team == "Boston"        ~ "Boston Celtics",
      team == "Portland"      ~ "Portland Trail Blazers",
      team == "Golden State"  ~ "Golden State Warriors",
      team == "San Antonio"   ~ "San Antonio Spurs",
      team == "Indiana"       ~ "Indiana Pacers",
      team == "Utah"          ~ "Utah Jazz",
      team == "Oklahoma City" ~ "Oklahoma City Thunder",
      team == "Houston"       ~ "Houston Rockets",
      team == "Denver"        ~ "Denver Nuggets",
      team == "Miami"         ~ "Miami Heat",
      team == "Memphis"       ~ "Memphis Grizzlies",
      team == "Minnesota"     ~ "Minnesota Timberwolves",
      team == "New Orleans"   ~ "New Orleans Pelicans",
      team == "LA Clippers"   ~ "LA Clippers",
      team == "Chicago"       ~ "Chicago Bulls",
      team == "Sacramento"    ~ "Sacramento Kings",
      team == "Orlando"       ~ "Orlando Magic",
      team == "Washington"    ~ "Washington Wizards",
      team == "Charlotte"     ~ "Charlotte Hornets",
      team == "Toronto"       ~ "Toronto Raptors",
      TRUE ~ team
    )
  ) %>%
  select(team, season, payroll_clean)

head(payroll)
## # A tibble: 6 × 3
##   team                  season payroll_clean
##   <chr>                  <dbl>         <dbl>
## 1 Golden State Warriors   2021     171105334
## 2 Brooklyn Nets           2021     170444633
## 3 Philadelphia 76ers      2021     147825311
## 4 LA Clippers             2021     139722606
## 5 Los Angeles Lakers      2021     139334713
## 6 Utah Jazz               2021     136881324
# Pull win percentage from ESPN via hoopR
seasons <- c(2021, 2022, 2023)

standings <- map_df(seasons, function(s) {
  espn_nba_standings(s) %>%
    mutate(season = s) %>%
    select(team, leaguewinpercent, season)
})

head(standings)
## # A tibble: 6 × 3
##   team               leaguewinpercent season
##   <chr>                         <dbl>  <dbl>
## 1 Utah Jazz                     0.667   2021
## 2 Phoenix Suns                  0.714   2021
## 3 Philadelphia 76ers            0.738   2021
## 4 Brooklyn Nets                 0.619   2021
## 5 Denver Nuggets                0.619   2021
## 6 LA Clippers                   0.643   2021
# Join payroll and standings
teams <- inner_join(standings, payroll, by = c("team", "season")) %>%
  mutate(
    leaguewinpercent = as.numeric(leaguewinpercent),
    payroll_M = payroll_clean / 1e6
  )

glimpse(teams)
## Rows: 90
## Columns: 5
## $ team             <chr> "Utah Jazz", "Phoenix Suns", "Philadelphia 76ers", "B…
## $ leaguewinpercent <dbl> 0.6666667, 0.7142857, 0.7380952, 0.6190476, 0.6190476…
## $ season           <dbl> 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,…
## $ payroll_clean    <dbl> 136881324, 128858241, 147825311, 170444633, 129793210…
## $ payroll_M        <dbl> 136.8813, 128.8582, 147.8253, 170.4446, 129.7932, 139…

Data limitations: Payroll does not capture player quality, injuries, or mid-season trades. Neither data source measures market size directly; metro income would be at best a rough proxy for it.


Methods

I use a scatter plot with a linear regression line to examine the relationship between average team payroll and average win percentage across the 2021–2023 seasons.

Why linear regression? Win percentage is a continuous outcome and payroll is a continuous predictor. A one-predictor linear regression is easy to interpret (the slope is the change in win percentage per additional $1M of payroll) and is a reasonable choice for a sample of 30 teams.

Pre-processing: Team names were standardized across sources using case_when(). Payroll and win percentage were each averaged across the three seasons to produce one observation per team.
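
Because inner_join() silently drops rows whose keys do not match, it can help to confirm that the name standardization caught every team. The following is a minimal sketch on made-up miniature tables (standings_demo and payroll_demo are hypothetical names, not objects from this analysis); it uses anti_join() to list team-seasons that would be lost in the join.

```r
library(dplyr)

# Hypothetical miniature versions of the two tables (illustration only).
standings_demo <- data.frame(
  team   = c("Utah Jazz", "LA Clippers", "Los Angeles Lakers"),
  season = c(2021, 2021, 2021)
)
payroll_demo <- data.frame(
  team   = c("Utah Jazz", "LA Clippers", "LA Lakers"),  # last name left unmapped
  season = c(2021, 2021, 2021)
)

# Rows of standings with no payroll match; inner_join() would drop these.
unmatched <- anti_join(standings_demo, payroll_demo, by = c("team", "season"))
print(unmatched)

# The joined table keeps every team-season only when nothing is unmatched.
joined <- inner_join(standings_demo, payroll_demo, by = c("team", "season"))
nrow(joined) == nrow(standings_demo)
```

In the real data, the 90 rows reported by glimpse(teams) (30 teams × 3 seasons) are consistent with no team-season being dropped.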

# Average across seasons — one row per team
teams_avg <- teams %>%
  group_by(team) %>%
  summarise(
    avg_payroll_M  = mean(payroll_M, na.rm = TRUE),
    avg_winpct     = mean(leaguewinpercent, na.rm = TRUE)
  )

head(teams_avg)
## # A tibble: 6 × 3
##   team                avg_payroll_M avg_winpct
##   <chr>                       <dbl>      <dbl>
## 1 Atlanta Hawks                130.      0.524
## 2 Boston Celtics               135.      0.582
## 3 Brooklyn Nets                173.      0.604
## 4 Charlotte Hornets            117.      0.505
## 5 Chicago Bulls                134.      0.538
## 6 Cleveland Cavaliers          134.      0.473
# Linear regression: does payroll predict win percentage?
model <- lm(avg_winpct ~ avg_payroll_M, data = teams_avg)
summary(model)
## 
## Call:
## lm(formula = avg_winpct ~ avg_payroll_M, data = teams_avg)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.261601 -0.042897  0.004459  0.079719  0.241765 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.028014   0.177158   0.158    0.875  
## avg_payroll_M 0.003497   0.001302   2.685    0.012 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1209 on 28 degrees of freedom
## Multiple R-squared:  0.2048, Adjusted R-squared:  0.1764 
## F-statistic:  7.21 on 1 and 28 DF,  p-value: 0.01205
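
To make the slope easier to read, the sketch below converts the reported coefficients into more familiar units. The coefficient values are copied from the summary(model) output; the payroll levels of $130M and $170M and the 82-game season length are illustrative choices made here, not part of the original analysis.

```r
# Coefficients copied from the summary(model) output.
intercept <- 0.028014
slope     <- 0.003497   # change in win percentage per $1M of average payroll

# Effect of an extra $10M in average payroll, as a win-percentage fraction
effect_10M <- slope * 10
# The same effect expressed in wins over an 82-game regular season
wins_10M <- effect_10M * 82

# Fitted win percentage at two illustrative payroll levels (in $M)
pred <- intercept + slope * c(130, 170)

# Correlation implied by R-squared (positive because the slope is positive)
r <- sqrt(0.2048)

round(c(effect_10M = effect_10M, wins_10M = wins_10M), 3)
round(pred, 3)
round(r, 3)
```

Under these coefficients, an extra $10M in average payroll is associated with roughly 3.5 percentage points of win percentage, or just under three wins over 82 games.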

Results

ggplot(teams_avg, aes(x = avg_payroll_M, y = avg_winpct, label = team)) +
  geom_point(size = 3.5, alpha = 0.85, color = "#0D1B2A") +
  geom_smooth(method = "lm", se = TRUE, color = "#F5A623", fill = "#F5A62340") +
  geom_text_repel(size = 2.6, color = "#5F6368", max.overlaps = 30) +
  scale_x_continuous(labels = label_dollar(suffix = "M")) +
  scale_y_continuous(labels = label_percent(), limits = c(0, 1)) +
  labs(
    title    = "Do Richer Teams Win More?",
    subtitle = "Average payroll vs. average win percentage per team (2021–2023)",
    x        = "Average Team Payroll",
    y        = "Win Percentage",
    caption  = "Sources: Kaggle (NBA Payroll 1990-2023), ESPN via hoopR"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title       = element_text(face = "bold", color = "#0D1B2A"),
    plot.subtitle    = element_text(color = "#5F6368", size = 10),
    plot.caption     = element_text(color = "#9AA0A6", size = 8),
    panel.grid.minor = element_blank()
  )

The regression line shows a positive relationship between payroll and win percentage. The slope is statistically significant (p = 0.012), but with R² = 0.20 payroll explains only about a fifth of the variation in average win percentage, so the relationship is detectable but far from deterministic. Higher payroll shifts the expected win percentage upward without guaranteeing outcomes. Notable outliers include the Phoenix Suns (overperformed their budget) and the Houston Rockets (underperformed despite high spending).
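
One way to identify over- and underperformers systematically is to rank teams by their regression residuals (actual minus fitted win percentage). The sketch below assumes only the six rounded rows printed by head(teams_avg), so its fitted line and residuals will differ from those of the full 30-team model; teams_demo and demo_model are names introduced here for illustration.

```r
# First six rows of teams_avg as printed by head(teams_avg); values are rounded.
teams_demo <- data.frame(
  team = c("Atlanta Hawks", "Boston Celtics", "Brooklyn Nets",
           "Charlotte Hornets", "Chicago Bulls", "Cleveland Cavaliers"),
  avg_payroll_M = c(130, 135, 173, 117, 134, 134),
  avg_winpct    = c(0.524, 0.582, 0.604, 0.505, 0.538, 0.473)
)

demo_model <- lm(avg_winpct ~ avg_payroll_M, data = teams_demo)

# Residual = actual minus fitted win percentage.
teams_demo$residual <- residuals(demo_model)

# Sort from biggest overperformer to biggest underperformer.
ranked <- teams_demo[order(-teams_demo$residual), c("team", "residual")]
print(ranked)
```

Applying the same steps to teams_avg and model would rank all 30 teams against the fitted line.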


Conclusions

This analysis is consistent with the structural advantage hypothesis: average payroll is positively associated with average win percentage. Teams at the bottom of the payroll range rarely exceed a 50% win rate, while high-spending teams cluster near the top. The result is correlational, so it does not show that spending causes wins, but it fits Bourdieu’s argument that economic capital converts into competitive advantage even in regulated systems like the NBA’s soft cap.

The luxury tax was designed to level the playing field, but large-market teams consistently absorb the penalty (Paine, 2015), maintaining a spending advantage that small-market teams cannot match.

Outliers matter: Memphis and Dallas outperformed their payroll through drafting and development, showing that financial disadvantage can be partially offset, but these remain exceptions, not the rule.

Next steps:

  • Control for star player presence using an all-star dummy variable
  • Extend the time window to include more seasons
  • Compare across leagues (MLS, EPL) with different cap structures
  • Use ticket prices as a richer market size proxy

References

  • Berri, D. & Schmidt, M. (2006). On the Road With the NBA’s Superstar Externality. Journal of Sports Economics.
  • Bourdieu, P. (1986). The Forms of Capital. In J. Richardson (Ed.), Handbook of Theory and Research for the Sociology of Education.
  • Paine, N. (2015). The NBA Has A Spending Problem. FiveThirtyEight.
  • Kaggle Dataset: loganlauton/nba-players-and-team-data
  • hoopR R package: Saiem Gilani et al.