Unexpected NBA Player Performance Analysis

Author

Lucas Tetrault

Audience

This analysis is intended for an NBA front office or analytics department, including general managers and coaching staff, who want to use data to evaluate player performance and inform decision-making. It could also be presented to the NBA commissioner as an overview of the statistical history of the league as a whole.


Background & Objective

Modern NBA teams rely heavily on analytics to evaluate players and optimize performance. Metrics like Game Score (GmSc) summarize a player’s overall contribution in a game. The main objective of this project is to determine how unexpected player performances vary across teams and over time, identifying key factors that contribute to high-impact performances along the way.
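For reference, Game Score is usually computed from a single game's box-score line using John Hollinger's formula. The sketch below assumes that standard definition. The argument names follow common box-score abbreviations and are assumptions, not confirmed column names in nba.csv, and the stat line passed in at the end is hypothetical.

```r
# Hollinger's Game Score for one player-game.
# Argument names mirror common box-score abbreviations (assumed, not taken
# from nba.csv): FG/FGA = field goals made/attempted, FT/FTA = free throws
# made/attempted, ORB/DRB = offensive/defensive rebounds, PF = personal fouls.
game_score <- function(PTS, FG, FGA, FT, FTA, ORB, DRB, STL, AST, BLK, PF, TOV) {
  PTS + 0.4 * FG - 0.7 * FGA - 0.4 * (FTA - FT) +
    0.7 * ORB + 0.3 * DRB + STL + 0.7 * AST + 0.7 * BLK -
    0.4 * PF - TOV
}

# Hypothetical stat line, for illustration only
example_gmsc <- game_score(PTS = 30, FG = 10, FGA = 20, FT = 8, FTA = 10,
                           ORB = 2, DRB = 6, STL = 1, AST = 5, BLK = 1,
                           PF = 2, TOV = 3)
example_gmsc
```

Because points enter the formula with a weight of 1 while missed shots and turnovers subtract from it, a high-scoring game does not guarantee a high Game Score if it comes on poor efficiency.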


library(tidyverse)
library(ggthemes)

nba <- read.csv("nba.csv")

Initial Exploration

Game Score Distribution by Team

ggplot(nba, aes(x = GmSc, y = reorder(Tm, GmSc))) +
  geom_boxplot(fill = "blue") +
  theme_fivethirtyeight() +
  labs(title = "Game Score Distribution by Team",
       x = "Game Score",
       y = "Team")

Points vs Game Score

ggplot(nba, aes(x = PTS, y = GmSc)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", color = "red") +
  theme_minimal() +
  labs(title = "Points vs Game Score")

Initial exploration shows that Game Score distributions vary widely across teams: some teams show a greater spread of high Game Score performances, several of which appear as outliers in the boxplots. There also appears to be a strong relationship between points scored and overall Game Score.
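The visual impressions above could be backed by a few numbers. The following is a minimal sketch, not part of the original analysis: it uses a small made-up data frame named toy in place of nba.csv so that it runs on its own, and it assumes only the columns Tm, PTS, and GmSc that the plots already use.

```r
library(dplyr)

# Toy stand-in for nba.csv; all values are invented for illustration.
toy <- data.frame(
  Tm   = rep(c("AAA", "BBB"), each = 6),
  PTS  = c(10, 12, 15, 18, 20, 45, 8, 11, 14, 16, 19, 22),
  GmSc = c(6, 8, 10, 13, 15, 40, 5, 7, 9, 11, 14, 16)
)

# Per-team centre and spread, plus a count of games above the upper
# boxplot fence (Q3 + 1.5 * IQR), the rule geom_boxplot() uses by default
# to draw points as outliers.
team_spread <- toy |>
  group_by(Tm) |>
  summarise(
    median_gmsc   = median(GmSc, na.rm = TRUE),
    iqr_gmsc      = IQR(GmSc, na.rm = TRUE),
    high_outliers = sum(
      GmSc > quantile(GmSc, 0.75, na.rm = TRUE) + 1.5 * IQR(GmSc, na.rm = TRUE),
      na.rm = TRUE
    ),
    .groups = "drop"
  )
team_spread

# Pearson correlation between points and Game Score
pts_gmsc_cor <- cor(toy$PTS, toy$GmSc, use = "complete.obs")
pts_gmsc_cor
```

Swapping toy for nba would give the same summaries for the real data, provided those three columns are numeric and named as in the plots.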

Assumptions

This analysis assumes that Game Score is a valid measure of overall player performance, that observations are independent, and that the dataset represents a mix of regular-season and playoff games.

Mitigating Risks

To mitigate risk, the analysis avoids overinterpreting aggregated data such as totals or averages, acknowledges missing variables such as minutes played or pace of play, and uses multiple models for a more thorough analysis.
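As one illustration of what using multiple models might look like, the sketch below fits two nested linear models and compares them. It is an assumption about how such a comparison could be set up, not the models used in this project. The data frame toy is simulated, and the model names m_pts and m_pts_team are hypothetical.

```r
# Simulated stand-in for nba.csv; values are invented for illustration.
set.seed(1)
toy <- data.frame(
  Tm  = rep(c("AAA", "BBB", "CCC"), each = 20),
  PTS = round(runif(60, min = 0, max = 40))
)
toy$GmSc <- 0.8 * toy$PTS - 2 + rnorm(60, sd = 3)

# Model 1: Game Score explained by points alone
m_pts <- lm(GmSc ~ PTS, data = toy)

# Model 2: points plus a team effect
m_pts_team <- lm(GmSc ~ PTS + Tm, data = toy)

# AIC penalises the extra team parameters; lower is better
model_aic <- AIC(m_pts, m_pts_team)
model_aic

# Nested F-test: does team explain variation beyond points?
anova(m_pts, m_pts_team)
```

Agreement between the two comparisons would lend more confidence to a conclusion than either one alone, and neither accounts for omitted variables such as minutes played.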