The response variable is GmSc (Game Score) and the explanatory variable is PTS (Points). We combine these with mutate() to create a new metric called ScoringShare, the ratio of a player’s points to their Game Score, as a rough measure of how much of the Game Score comes from scoring alone.
nba <- nba |>
  mutate(
    ScoringShare = PTS / GmSc
  )
cor(nba$ScoringShare, nba$GmSc, use = "complete.obs")
## [1] 0.1165962
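A correlation this low means ScoringShare has only a weak linear relationship with GmSc. That is consistent with Game Score depending on many more factors than scoring, although a single correlation cannot prove it.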
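Because ScoringShare is a ratio, it is undefined when GmSc is zero and changes sign when GmSc is negative. The sketch below shows one possible guard, computing the share only for positive Game Scores. The `toy` data frame is made up for illustration, and whether the nba data contains any non-positive Game Scores is an assumption that has not been checked here.

```r
# Toy data, invented for illustration only (not taken from the nba dataset)
toy <- data.frame(
  PTS  = c(30, 12, 4, 20),
  GmSc = c(25.0, 10.0, 0.0, -2.0)
)

# Compute the share only where Game Score is positive; otherwise mark it NA
toy$ScoringShare <- ifelse(toy$GmSc > 0, toy$PTS / toy$GmSc, NA_real_)

toy
```

Rows set to NA this way are dropped by `cor(..., use = "complete.obs")`, so they would not distort the correlation.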
The original variable I am using is GmScMovingZ, and I am creating a new variable called HighlyUnexpected, a logical (TRUE/FALSE) indicator that flags games whose absolute z-score exceeds 5. I will use this to determine whether extremely unexpected performances correspond to higher raw Game Scores.
nba <- nba |>
  mutate(
    HighlyUnexpected = abs(GmScMovingZ) > 5
  )
nba |>
  ggplot(aes(x = factor(HighlyUnexpected), y = GmSc)) +
  geom_boxplot(fill = "steelblue", alpha = 0.7) +
  theme_minimal() +
  labs(
    title = "Game Score by Unexpectedness Level",
    x = "Highly Unexpected Performance (|Z| > 5)",
    y = "Game Score"
  )
This visualization shows that games with an absolute z-score above 5 are typically associated with slightly higher Game Scores on average. However, there are many cases where extremely high Game Scores, some of which could even be outliers, are not flagged as highly unexpected because of the player’s own baseline. Unexpectedness has more to do with the individual player than with raw performance metrics.
cor(nba$HighlyUnexpected, nba$GmSc, use = "complete.obs")
## [1] 0.04303588
This very low correlation is consistent with the boxplot above: whether a game is flagged as highly unexpected has almost no linear relationship with its raw GmSc.
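A correlation with a TRUE/FALSE indicator is easier to read alongside the group sizes and group medians, since a very unbalanced split shrinks the correlation for a given difference between groups. The following is a minimal sketch on made-up values; `toy` and its numbers are hypothetical and only show the shape of the check.

```r
# Toy data, invented for illustration only (not taken from the nba dataset)
toy <- data.frame(
  GmSc        = c(30.1, 22.4, 41.0, 18.9, 35.2, 27.3),
  GmScMovingZ = c(3.2, 5.4, 4.1, 6.0, 2.9, -5.1)
)
toy$HighlyUnexpected <- abs(toy$GmScMovingZ) > 5

# Number of games in each group
group_n <- table(toy$HighlyUnexpected)

# Median Game Score within each group
group_median <- tapply(toy$GmSc, toy$HighlyUnexpected, median)

group_n
group_median
```

Running the same two summaries on nba would show how many games actually pass the |Z| > 5 threshold and how far apart the group medians are.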
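# Quartiles of Game Score (na.rm = TRUE matches the later mean/sd calls)
Q1 <- quantile(nba$GmSc, 0.25, na.rm = TRUE)
Q3 <- quantile(nba$GmSc, 0.75, na.rm = TRUE)
IQR_value <- Q3 - Q1
# 1.5 * IQR fences for flagging outliers
lower <- Q1 - 1.5 * IQR_value
upper <- Q3 + 1.5 * IQR_value
# Keep only the games outside the fences
nba |> filter(GmSc < lower | GmSc > upper)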
## bbrID Date Tm Opp TRB AST STL BLK PTS GmSc Season Playoffs Year
## 1 antetgi01 2019-03-17 MIL PHI 16 7 2 1 52 50.4 2018-19 false 2019
## 2 anthoca01 2014-01-24 NYK CHA 13 0 0 0 62 50.6 2013-14 false 2014
## 3 arenagi01 2006-12-17 WAS LAL 8 8 2 0 60 47.8 2006-07 false 2007
## 4 barklch01 1994-05-04 PHO GSW 14 4 3 1 56 52.6 1993-94 true 1994
## 5 bookede01 2017-03-24 PHO BOS 8 6 3 1 70 54.5 2016-17 false 2017
## 6 bryanko01 2006-01-22 LAL TOR 6 2 3 1 81 63.5 2005-06 false 2006
## 7 burtowi01 1994-12-13 PHI MIA 8 3 1 2 53 49.6 1994-95 false 1995
## 8 butleji01 2017-01-02 CHI CHO 12 6 3 1 52 51.5 2016-17 false 2017
## 9 davisan02 2016-02-21 NOP DET 20 4 0 1 59 53.9 2015-16 false 2016
## 10 duranke01 2021-06-15 BRK MIL 17 10 3 2 49 50.4 2020-21 true 2021
## 11 embiijo01 2017-11-15 PHI LAL 15 7 0 7 46 47.9 2017-18 false 2018
## 12 hardeja01 2018-01-30 HOU ORL 10 11 4 1 60 56.6 2017-18 false 2018
## 13 irvinky01 2015-03-12 CLE SAS 3 5 4 0 57 48.2 2014-15 false 2015
## 14 jamesle01 2017-11-03 CLE WAS 11 7 3 2 57 53.2 2017-18 false 2018
## 15 jordami01 1990-03-28 CHI CLE 18 6 4 1 69 64.6 1989-90 false 1990
## 16 lillada01 2016-02-19 POR GSW 0 7 6 0 51 47.9 2015-16 false 2016
## 17 malonka01 1990-01-27 UTA MIL 18 2 3 0 61 60.2 1989-90 false 1990
## 18 millere01 1992-11-28 IND CHH 5 8 1 0 57 50.4 1992-93 false 1993
## 19 olajuha01 1987-05-14 HOU SEA 25 2 2 6 49 48.6 1986-87 true 1987
## 20 paulch01 2015-04-01 LAC POR 5 17 4 0 41 47.5 2014-15 false 2015
## 21 robinda01 1993-01-16 SAS CHH 14 3 0 7 52 48.0 1992-93 false 1993
## 22 stoudam01 2008-11-05 PHO IND 11 6 5 2 49 50.2 2008-09 false 2009
## 23 tatumja01 2021-04-30 BOS SAS 8 5 0 1 60 47.9 2020-21 false 2021
## 24 vanvlfr01 2021-02-02 TOR ORL 3 2 3 3 54 50.3 2020-21 false 2021
## 25 wadedw01 2009-03-09 MIA CHI 6 12 4 3 48 49.0 2008-09 false 2009
## 26 youngtr01 2022-01-03 ATL POR 4 14 0 0 56 51.6 2021-22 false 2022
## GameIndex GmScMovingZ GmScMovingZTop2Delta Date2 GmSc2 GmScMovingZ2
## 1 476 3.30 0.48 2015-03-09 29.6 2.82
## 2 818 4.16 0.85 2008-04-05 42.5 3.31
## 3 382 2.87 0.13 2009-12-18 36.8 2.74
## 4 828 3.60 0.47 1997-11-29 39.6 3.13
## 5 146 5.03 1.99 2022-03-24 45.8 3.04
## 6 784 4.43 0.77 2002-01-14 46.3 3.66
## 7 240 6.48 3.56 1992-01-24 24.6 2.92
## 8 389 3.69 0.73 2020-10-04 42.0 2.96
## 9 251 3.49 0.53 2019-12-08 48.5 2.96
## 10 1032 3.47 0.49 2012-02-19 42.1 2.98
## 11 42 4.06 1.09 2021-02-19 50.8 2.97
## 12 744 3.60 0.23 2009-12-07 27.1 3.37
## 13 242 4.13 0.49 2022-03-15 53.8 3.64
## 14 1286 3.72 0.05 2014-03-03 48.8 3.67
## 15 450 4.26 1.06 1996-03-07 47.7 3.20
## 16 309 3.74 0.23 2017-04-08 48.6 3.51
## 17 387 4.31 0.18 1998-04-07 46.2 4.13
## 18 424 4.56 1.11 2003-11-15 30.0 3.45
## 19 259 3.56 0.29 2001-04-17 26.2 3.27
## 20 745 3.82 0.60 2018-05-08 40.1 3.22
## 21 279 3.30 0.08 1994-04-24 51.8 3.22
## 22 421 3.97 1.02 2011-02-06 38.4 2.95
## 23 326 3.15 0.17 2020-01-11 39.2 2.98
## 24 299 4.44 1.36 2018-01-24 24.4 3.08
## 25 431 3.07 0.02 2018-11-25 27.8 3.05
## 26 252 3.44 1.29 2020-01-26 39.4 2.15
## ScoringShare HighlyUnexpected
## 1 1.0317460 FALSE
## 2 1.2252964 FALSE
## 3 1.2552301 FALSE
## 4 1.0646388 FALSE
## 5 1.2844037 TRUE
## 6 1.2755906 FALSE
## 7 1.0685484 TRUE
## 8 1.0097087 FALSE
## 9 1.0946197 FALSE
## 10 0.9722222 FALSE
## 11 0.9603340 FALSE
## 12 1.0600707 FALSE
## 13 1.1825726 FALSE
## 14 1.0714286 FALSE
## 15 1.0681115 FALSE
## 16 1.0647182 FALSE
## 17 1.0132890 FALSE
## 18 1.1309524 FALSE
## 19 1.0082305 FALSE
## 20 0.8631579 FALSE
## 21 1.0833333 FALSE
## 22 0.9760956 FALSE
## 23 1.2526096 FALSE
## 24 1.0735586 FALSE
## 25 0.9795918 FALSE
## 26 1.0852713 FALSE
All 26 of these outliers look like true elite performances rather than potential data anomalies. Every one of them lies above the upper fence; no game in this dataset has a Game Score low enough to count as an outlier on the lower side.
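To make the split between low and high outliers explicit, the fence logic can be wrapped in a small function. This is only a sketch: `flag_outliers` is a hypothetical helper, not part of the original analysis, and the `toy_gmsc` values are invented.

```r
# Hypothetical helper: classify values against the 1.5 * IQR fences
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence_width <- k * (q[2] - q[1])
  lower <- q[1] - fence_width
  upper <- q[2] + fence_width
  ifelse(x < lower, "low", ifelse(x > upper, "high", "none"))
}

# Toy Game Scores, invented for illustration only
toy_gmsc <- c(20, 22, 23, 24, 25, 26, 27, 28, 60)
flags <- flag_outliers(toy_gmsc)

# Count how many values fall in each category
table(flags)
```

Applied to `nba$GmSc`, a table like this would confirm directly whether any outliers fall on the low side.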
mean_gmsc <- mean(nba$GmSc, na.rm = TRUE)
sd_gmsc <- sd(nba$GmSc, na.rm = TRUE)
n <- sum(!is.na(nba$GmSc))
margin_error <- qt(0.975, df = n - 1) * (sd_gmsc / sqrt(n))
lower_ci <- mean_gmsc - margin_error
upper_ci <- mean_gmsc + margin_error
c(lower_ci, upper_ci)
## [1] 24.73472 25.53962
From this confidence interval that I have calculated, we can conclude with 95% confidence that the true mean Game Score for unexpected performances lies between 24.7347 and 25.5396. The relatively narrow width of this interval suggests that our estimate is precise. However, this estimate applies specifically to the population of each player’s most statistically surprising game, not to all NBA games. Because the dataset only includes peak unexpected performances, the average Game Score is elevated relative to typical NBA performance. Therefore, conclusions drawn from this interval must be interpreted within the context of how the data was constructed.
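As a cross-check, the manual t-interval can be compared with the one reported by base R's `t.test()`. The sketch below uses a made-up vector, `toy_gmsc`, rather than the nba data; the two intervals should agree up to floating-point tolerance.

```r
# Toy Game Scores, invented for illustration only
toy_gmsc <- c(21.5, 24.0, 26.3, 25.1, 23.8, 27.9, 22.4, 25.6)

# Manual 95% t-interval: mean +/- t critical value * standard error
n  <- length(toy_gmsc)
se <- sd(toy_gmsc) / sqrt(n)
manual_ci <- mean(toy_gmsc) + c(-1, 1) * qt(0.975, df = n - 1) * se

# Interval from the built-in one-sample t-test
builtin_ci <- as.numeric(t.test(toy_gmsc, conf.level = 0.95)$conf.int)

# Compare the two intervals
all.equal(manual_ci, builtin_ci)
```

On the real data, `t.test(nba$GmSc)$conf.int` would be the equivalent call; it drops missing values before computing the interval.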
Does a player’s position affect ScoringShare? Is the mean GmSc stable across eras, or has it gradually increased over time? How can we refine the threshold for what is truly “unexpected”?