2025-12-01

Goal: Understand Correlations between NBA performance stats

  • Recreate Heatmapped Correlation Matrix from the article “A Hybrid Machine Learning Model for Predicting USA NBA All-Stars”
  • Quantify uncertainty using bootstrapping
  • Simulate season-to-season variability with Monte Carlo
  • Discuss important stats and features of the league that would lead to differing correlations across seasons.
  • Discuss Monte Carlo distributions of NBA stats

Data and Methods

  • Dataset: Top 15 NBA players (2018 and 2022 season stats)

  • Small dataset makes bootstrapping useful for quantifying uncertainty

  • Goal: Understand relationships between NBA performance metrics using correlation matrices

Methods Used

  • Correlation analysis (visualized using heatmaps)

  • Bootstrap resampling to estimate variability in correlations

  • Monte Carlo simulation to assess stability of correlation matrices
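
As a rough illustration of the bootstrap step, the sketch below resamples players with replacement and reports a 95% percentile interval for one correlation. The minutes and points values are made up for 15 hypothetical players, and the function names (`pearson`, `bootstrap_correlation`) are our own illustrative choices, not code from the original study.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def bootstrap_correlation(x, y, n_resamples=2000, seed=0):
    """95% percentile interval for r, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # Skip degenerate resamples where a column is constant (r is undefined).
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue
        estimates.append(pearson(xs, ys))
    estimates.sort()
    lower = estimates[int(0.025 * n_resamples)]
    upper = estimates[int(0.975 * n_resamples) - 1]
    return lower, upper

# Hypothetical minutes and points per game for 15 players (not the real data).
mpg = [36.9, 35.2, 34.8, 34.1, 33.7, 33.5, 33.0, 32.6,
       32.4, 31.9, 31.5, 30.8, 30.2, 29.6, 28.9]
ppg = [30.4, 26.1, 28.7, 24.9, 27.3, 22.8, 25.5, 21.9,
       24.0, 20.7, 23.2, 19.8, 21.1, 18.5, 19.9]

lower, upper = bootstrap_correlation(mpg, ppg)
print(f"observed r = {pearson(mpg, ppg):.2f}, 95% interval = ({lower:.2f}, {upper:.2f})")
```

With only 15 players the interval is usually wide, which is the uncertainty the bootstrap is meant to expose.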

Motivation

  • Why do this study?

  • What is special about the 2018 and 2022 seasons?

  • Some NBA stats are highly correlated; others are only weakly correlated

  • MVP discussions often rely on an “eye test” plus incomplete narratives, which makes MVP status difficult to predict

  • Some statistics are very important to some positions, not so much to others.

Introduction to Basic Stats

  • Percent stats
    • 3pt, Ft, Fg
  • Per game stats
    • Blocks, Steals, Assists, Rebounds, Points, Minutes
  • Attempts
    • 3pt, Ft, Fg
  • Totals
    • Games Played

What We Notice from Season to Season

##   season pace tov_pct orb_pct num_players
## 1   2018 97.3    13.3    22.4         494
## 2   2022 97.0    12.8    23.1          15
  • Pace is down
  • Fewer turnovers
  • Offensive rebounding is up
  • 3PA have skyrocketed

Their Matrix

Original Study

  • The original study in the paper we chose created this correlation matrix using a machine learning model to predict MVP points.

  • We recreated a version of this matrix using Pearson correlations and simplified their overall study, while taking on the task of comparing these correlations and how they change from season to season.

  • Correlation gives a quantitative view of NBA stat relationships -> then test stability using Monte Carlo and Bootstrapping
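
A minimal sketch of how a Pearson correlation matrix can be built from per-player stat columns. The numbers are invented for five hypothetical players, and `correlation_matrix` is an illustrative helper, not the code behind the matrices in these slides.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(stats):
    """Return {(stat_a, stat_b): r} for every ordered pair of columns."""
    names = list(stats)
    return {(a, b): pearson(stats[a], stats[b]) for a in names for b in names}

# Made-up per-game numbers for five hypothetical players (not the study's data).
stats = {
    "PTS": [30.1, 27.4, 25.9, 24.3, 22.8],
    "FGA": [21.5, 20.2, 18.9, 18.1, 17.0],
    "FT%": [0.78, 0.91, 0.70, 0.85, 0.81],
}

matrix = correlation_matrix(stats)
for a in stats:
    # One row of the matrix per stat, rounded for display.
    print(a, [round(matrix[(a, b)], 2) for b in stats])
```

Each row printed here corresponds to one row of a heatmap; a plotting library would then color every cell by its value between -1 and 1.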

Replicated Correlation Matrix from 2018 season

Replicated Correlation Matrix from 2022 season

Statistics to Note from Matrices:

Red squares mark the least correlated pairs of statistics, whereas green squares mark highly correlated pairs. Correlations range from -1 to 1.

Strong Relationships 2022 Season:

  • PTS and FGA (0.86)
  • REB and BLK (0.82)
  • FG% and REB (0.82)

Weak relationships

  • FT% and most other statistics

  • Correlations may change based on position

Differences between the Matrices

  • Many more statistics are highly correlated in 2018 than in 2022. This could be due to how superstars have changed over time.

Minutes per game

  • In 2018 minutes per game drove almost every stat (0.40 - 0.90)

  • In 2022 minutes per game correlations dropped to near 0 (0.04 - 0.40)

  • This means that performance is less dependent on which players play the most.

  • Lineups and minutes have both become more standardized

3-Point Attempts

Points vs 3PA

  • 2018: 0.91

  • 2022: 0.71

  • Why? The top-scoring teams were more 3pt-dominant (e.g., the Warriors)

3-Point Attempts Graph 2018

3-Point Attempt Graph 2022

3-Point Attempts vs Field Goal %

3PA vs FG%

  • 2018: -0.32

  • 2022: -0.29

  • In 2018, only high-usage players shot a lot of threes, while in 2022 nearly everybody shoots them -> thanks to the three-point revolution, three-point volume does less to separate the elite scorers.

3-Point Attempts vs Field Goal % Graph 2018

3-Point Attempts vs Field Goal % Graph 2022

FG Attempts vs Scoring

  • 2018: FGA vs PTS = 0.91

  • 2022: FGA vs PTS = 0.71

  • In 2022, more teams rely on ball movement, efficiency, and possession-by-possession matchups rather than just volume shooting.

  • This hints at the rise of heliocentric offenses built around players like Luka and Jokic, where playmaking replaces some raw shot attempts. Offenses now distribute through one player rather than funneling shots to one guy.

FG Attempts vs Scoring Graph 2018

FG Attempts vs Scoring Graph 2022

Bootstrapping Simulation for League Stats

Bootstrapping for League Stats

  • Used bootstrapping to estimate the league averages we would see if we sampled from our dataset multiple times

  • This plot gives very smooth league averages for all of the stats in our dataset.

  • The plot represents the most “average” correlation of the league's stats.

  • How does this compare to the actual data we have?
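
One way to make the comparison with the actual data concrete is to bootstrap the mean of a single stat and set it against the observed mean. This is a sketch with invented points-per-game values for 15 hypothetical players; `bootstrap_means` is an illustrative name.

```python
import random
import statistics

def bootstrap_means(values, n_resamples=5000, seed=1):
    """Means of `n_resamples` resamples drawn with replacement from `values`."""
    rng = random.Random(seed)
    n = len(values)
    return [statistics.mean(rng.choices(values, k=n)) for _ in range(n_resamples)]

# Hypothetical points per game for 15 players (not the real dataset).
ppg = [30.4, 26.1, 28.7, 24.9, 27.3, 22.8, 25.5, 21.9,
       24.0, 20.7, 23.2, 19.8, 21.1, 18.5, 19.9]

boot = sorted(bootstrap_means(ppg))
lower = boot[int(0.025 * len(boot))]       # 2.5th percentile
upper = boot[int(0.975 * len(boot)) - 1]   # 97.5th percentile

print(f"observed mean  = {statistics.mean(ppg):.2f}")
print(f"bootstrap mean = {statistics.mean(boot):.2f}")
print(f"95% interval   = ({lower:.2f}, {upper:.2f})")
```

The bootstrap mean sits very close to the observed mean by construction; the useful output is the width of the interval, which shows how much the average could move with a different set of 15 players.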

Monte Carlo Distributions for League Stats

Monte Carlo Distributions Continued

  • The previous plots are Monte Carlo simulated distributions of all the NBA stats we are working with.

  • Some almost seem bimodal while others are an almost perfect normal distribution

  • These distributions show the expected range of each statistic across the league and how those ranges contribute to understanding how the stats change and affect the league.
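
A simple Monte Carlo sketch for one stat, under a loud assumption: the stat is modeled as normal with the sample mean and standard deviation. The slides do not state which sampling model was used, so this is only one plausible setup, and the data below are invented.

```python
import random
import statistics

def simulate_stat(values, n_draws=10000, seed=2):
    """Draw simulated values from a normal fitted to `values` (assumed model)."""
    rng = random.Random(seed)
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    return [rng.gauss(mu, sigma) for _ in range(n_draws)]

# Hypothetical points per game for 15 players (not the real dataset).
ppg = [30.4, 26.1, 28.7, 24.9, 27.3, 22.8, 25.5, 21.9,
       24.0, 20.7, 23.2, 19.8, 21.1, 18.5, 19.9]

sim = sorted(simulate_stat(ppg))
p05 = sim[int(0.05 * len(sim))]   # 5th percentile of simulated values
p50 = sim[len(sim) // 2]          # median of simulated values
p95 = sim[int(0.95 * len(sim))]   # 95th percentile of simulated values

print(f"simulated 5th / 50th / 95th percentiles: {p05:.1f} / {p50:.1f} / {p95:.1f}")
```

A normal model always produces a single-peaked distribution, so it cannot reproduce the bimodal shapes noted above; resampling the observed values directly or fitting a mixture would be needed for those stats.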

Conclusion

  • The NBA is random; no combination of predictors will ever fully explain the game.

  • Most correlations are obvious because they go hand in hand (e.g., shooting and scoring)

  • Correlations can highlight which facets of the game are most worth adjusting and how those adjustments would affect other statistics.

  • Bootstrapping and Monte Carlo methods are particularly useful for smaller sample sizes, as they allow for deeper interpretation of data that might be otherwise difficult to analyze reliably.