This analysis is intended for NBA Commissioner Adam Silver and his team, along with any interested front offices around the league. It could also be useful to sports analysts, broadcasters, and anyone who creates content for sports media outlets such as ESPN, B/R, or Barstool.
Background & Objective
Games in which a player greatly exceeds expectations are the performances that NBA fans remember most. Metrics like Game Score (GmSc) summarize a player’s overall contribution in a single game as one number. The main objective of this project is to determine how unexpected player performances vary across teams and over time, and to identify the key factors that contribute to high-impact performances along the way.
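For reference, Game Score is a linear combination of box-score counts introduced by John Hollinger. The sketch below writes out that standard formula in R. The argument names follow common Basketball-Reference abbreviations; whether nba.csv carries all of these columns under these names is an assumption, since only PTS, GmSc, and Tm appear in the code that follows.

```r
# Hollinger's Game Score computed from box-score counts.
# Argument names are assumed to match Basketball-Reference style columns;
# they are not confirmed against nba.csv.
game_score <- function(PTS, FG, FGA, FT, FTA, ORB, DRB, STL, AST, BLK, PF, TOV) {
  PTS + 0.4 * FG - 0.7 * FGA - 0.4 * (FTA - FT) +
    0.7 * ORB + 0.3 * DRB + STL + 0.7 * AST + 0.7 * BLK -
    0.4 * PF - TOV
}

# Hypothetical box-score line, for illustration only (not taken from the dataset)
example_gmsc <- game_score(PTS = 30, FG = 10, FGA = 20, FT = 8, FTA = 10,
                           ORB = 2, DRB = 6, STL = 1, AST = 5, BLK = 1,
                           PF = 2, TOV = 3)
print(example_gmsc)
```

Because points enter the formula with a weight of 1, high-scoring games tend to produce high Game Scores, while missed shots, fouls, and turnovers pull the score down.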
library(tidyverse)
library(ggthemes)
nba <- read.csv("nba.csv")
Initial Exploration
Game Score Distribution by Team
ggplot(nba, aes(x = GmSc, y = reorder(Tm, GmSc))) +
  geom_boxplot(fill = "blue") +
  theme_fivethirtyeight() +
  labs(title = "Game Score Distribution by Team", x = "Game Score", y = "Team")
Points vs Game Score
ggplot(nba, aes(x = PTS, y = GmSc)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", color = "red") +
  theme_minimal() +
  labs(title = "Points vs Game Score")
Initial EDA shows that player performance varies widely across teams. Some teams have a wider spread of Game Scores than others, and some high Game Score performances appear as outliers in the boxplots. There also appears to be a positive relationship between points scored and Game Score, which is expected in part because points are a direct input to the Game Score formula.
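The visual impression of a positive relationship can also be checked numerically. The following is a minimal sketch, assuming PTS and GmSc are numeric columns; the helper name summarise_pts_gmsc is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: Pearson correlation and least-squares slope of GmSc on PTS.
# Assumes `df` has numeric columns PTS and GmSc; rows missing either value are dropped.
summarise_pts_gmsc <- function(df) {
  complete <- df[complete.cases(df[, c("PTS", "GmSc")]), ]
  fit <- lm(GmSc ~ PTS, data = complete)
  list(
    n = nrow(complete),
    correlation = cor(complete$PTS, complete$GmSc),
    slope = unname(coef(fit)["PTS"])
  )
}

# Usage on the data loaded earlier (not run here):
# summarise_pts_gmsc(nba)
```

The slope estimates how many Game Score points accompany one additional point scored, on average. It describes an association only and does not account for other box-score statistics.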
Assumptions
This analysis assumes that Game Score is a valid measure of overall player performance, that observations are independent, and that the dataset represents a mix of regular season and playoff games.
Mitigating Risks
To mitigate these risks, we avoid overinterpreting aggregated data such as totals or averages, acknowledge missing variables such as minutes played or pace of play, and use multiple models for a more thorough analysis.
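One way to avoid leaning too heavily on a single average is to report sample size and spread next to each team's typical Game Score. The sketch below does this with dplyr, using the Tm and GmSc columns from the plots above; the helper name team_gmsc_summary and the output column names are hypothetical.

```r
library(dplyr)

# Hypothetical helper: per-team Game Score summary that reports the number of
# observations and the interquartile range alongside the median.
# Assumes `df` has columns Tm (team) and GmSc (Game Score).
team_gmsc_summary <- function(df) {
  df %>%
    filter(!is.na(GmSc)) %>%
    group_by(Tm) %>%
    summarise(
      n_obs = n(),
      median_gmsc = median(GmSc),
      iqr_gmsc = IQR(GmSc),
      .groups = "drop"
    ) %>%
    arrange(desc(median_gmsc))
}

# Usage on the data loaded earlier (not run here):
# team_gmsc_summary(nba)
```

A team with a high median but few observations or a very wide interquartile range warrants more caution than its position in the ranking alone would suggest.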