Pair 1 (Explanatory vs Response)

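The response variable is GmSc (Game Score) and the explanatory variable is PTS (Points). We combine them into a new metric, ScoringShare (labeled PointShare in the plot and discussion below), defined as PTS / GmSc. It measures how a player's point total compares with their overall Game Score. Because Game Score also subtracts for missed shots, turnovers, and fouls, this ratio is not a true proportion and can exceed 1.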

library(dplyr)
library(ggplot2)

# Ratio of points scored to overall Game Score
nba <- nba |>
  mutate(
    ScoringShare = PTS / GmSc
  )
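To see why ScoringShare can rise above 1, it helps to write out John Hollinger's Game Score formula as published by Basketball-Reference. This is a sketch that assumes the dataset's GmSc column follows that standard formula; the `bbrID` column suggests Basketball-Reference data, but that is an inference. The function name `game_score` and the stat line below are hypothetical, since the columns it needs (FG, FGA, FT, FTA, ORB, DRB, PF, TOV) do not appear in the printed output of this dataset.

```r
# Hollinger's Game Score formula (assumed to match the dataset's GmSc column)
game_score <- function(PTS, FG, FGA, FT, FTA, ORB, DRB, STL, AST, BLK, PF, TOV) {
  PTS + 0.4 * FG - 0.7 * FGA - 0.4 * (FTA - FT) +
    0.7 * ORB + 0.3 * DRB + STL + 0.7 * AST + 0.7 * BLK -
    0.4 * PF - TOV
}

# Hypothetical stat line: 30 points on inefficient shooting with 4 turnovers
gs <- game_score(PTS = 30, FG = 10, FGA = 25, FT = 8, FTA = 10,
                 ORB = 1, DRB = 4, STL = 1, AST = 3, BLK = 0, PF = 3, TOV = 4)
gs       # 15.5
30 / gs  # ScoringShare of about 1.94, i.e. above 1
```

Here the deductions for 15 missed field goals, 2 missed free throws, fouls, and turnovers outweigh the other positive terms, so the player's points exceed their Game Score.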

Visualization 1 (GmSc vs PointShare)

nba |>
  ggplot(aes(x = ScoringShare, y = GmSc)) +
  geom_point(alpha = 0.6) +
  theme_minimal() +
  labs(
    title = "Game Score vs PointShare",
    x = "Proportion of Game Score from Points",
    y = "Game Score"
  )

In this plot, the highest Game Scores occur at moderate PointShare values around 1.0, where a player's points roughly equal their overall Game Score. This suggests that the best games pair heavy scoring with enough rebounds, assists, steals, and other contributions to offset the deductions for missed shots and turnovers, rather than relying on scoring alone.
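One way to check this visual impression numerically is to bin ScoringShare and compare the mean Game Score within each bin. The sketch below uses base R `cut()` and `tapply()`; the helper name `mean_gmsc_by_share`, the bin breaks, and the toy values are all hypothetical and not taken from the nba data. On the real data the call would be `mean_gmsc_by_share(nba$ScoringShare, nba$GmSc, breaks)` with breaks chosen to cover the observed range.

```r
# Mean Game Score within ScoringShare bins (base R sketch)
mean_gmsc_by_share <- function(share, gmsc, breaks) {
  bins <- cut(share, breaks = breaks)  # right-closed intervals such as (0.9,1.1]
  # One mean per bin; bins with no observations come back as NA
  tapply(gmsc, bins, mean, na.rm = TRUE)
}

# Hypothetical toy values, not from the nba data
share <- c(0.6, 0.8, 0.95, 1.0, 1.05, 1.3, 1.6, 2.0)
gmsc  <- c(30, 35, 48, 50, 46, 32, 20, 15)

res <- mean_gmsc_by_share(share, gmsc, breaks = c(0, 0.9, 1.1, 3))
res  # (0,0.9] 32.5, (0.9,1.1] 48, (1.1,3] about 22.33
```

The choice of breaks is arbitrary, so it is worth trying a few bin widths before reading too much into any single summary.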

Correlation 1

cor(nba$ScoringShare, nba$GmSc, use = "complete.obs")
## [1] 0.1165962

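A correlation of about 0.12 indicates only a weak positive linear relationship between ScoringShare and Game Score: knowing how much of a Game Score comes from points tells us little about how high that Game Score is. This is consistent with Game Score depending on many factors besides scoring, although a weak correlation alone does not prove it, and Pearson's r captures only linear association.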
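A single value of r does not say how precisely it is estimated. Base R's `cor.test()` reports the same Pearson estimate together with a confidence interval and a p-value. The sketch below runs it on hypothetical toy values standing in for ScoringShare and GmSc; on the real data the call would be `cor.test(nba$ScoringShare, nba$GmSc)`, which drops incomplete pairs by default.

```r
# Hypothetical toy data standing in for ScoringShare and GmSc
share <- c(0.85, 0.95, 1.00, 1.05, 1.10, 1.20, 1.30, 1.45)
gmsc  <- c(31, 40, 44, 38, 35, 33, 36, 28)

ct <- cor.test(share, gmsc, method = "pearson", conf.level = 0.95)
ct$estimate  # same value as cor(share, gmsc)
ct$conf.int  # 95% interval for r (Fisher z transformation)
ct$p.value   # test of H0: true correlation is 0
```

With a large sample, even a small r such as 0.12 can come with a narrow interval, so the interval is a better guide to how weak the relationship really is than the point estimate alone.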

Pair 2 (Unexpectedness)

The original variable I am using is GmScMovingZ, and I create a new variable, HighlyUnexpected, a logical (TRUE/FALSE) indicator that flags games whose GmScMovingZ exceeds 5 in absolute value. I use this indicator to check whether extremely unexpected performances correspond to higher raw Game Scores.

nba <- nba |>
  mutate(
    # TRUE when the game's moving z-score is more than 5 in absolute value
    HighlyUnexpected = abs(GmScMovingZ) > 5
  )

Visualization 2 (GmSc vs Unexpectedness)

nba |>
  ggplot(aes(x = factor(HighlyUnexpected), y = GmSc)) +
  geom_boxplot(fill = "steelblue", alpha = 0.7) +
  theme_minimal() +
  labs(
    title = "Game Score by Unexpectedness Level",
    x = "Highly Unexpected Performance (|Z| > 5)",
    y = "Game Score"
  )

This boxplot shows that games with |Z| > 5 tend to have slightly higher Game Scores, but many games with extremely high Game Scores, some of which may be outliers, are not flagged as highly unexpected because they are measured against the player's own baseline. Whether a game counts as unexpected depends more on the individual player's usual level than on the raw performance.
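To illustrate why the same raw Game Score can be highly unexpected for one player and ordinary for another, the sketch below scores a single 30-point Game Score against two different baselines. This is a hypothetical illustration only: the helper `baseline_z`, the baselines, and the use of a plain mean and standard deviation of earlier games are assumptions, and the dataset's GmScMovingZ may be computed differently.

```r
# Hypothetical illustration: z-score of one game against a player's own
# earlier games. This is NOT necessarily how GmScMovingZ was computed.
baseline_z <- function(game_score, previous_scores) {
  (game_score - mean(previous_scores)) / sd(previous_scores)
}

player_a <- c(10, 12, 8, 11, 9)   # low-scoring baseline (made-up values)
player_b <- c(25, 30, 20, 28, 22) # high-scoring baseline (made-up values)

z_a <- baseline_z(30, player_a)
z_b <- baseline_z(30, player_b)
round(c(player_a = z_a, player_b = z_b), 2)  # player_a 12.65, player_b 1.21
```

The identical Game Score clears the |Z| > 5 threshold for player A but not for player B, which is the pattern the boxplot points to.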

Correlation 2

# Point-biserial correlation: the logical flag is treated as 0/1
cor(as.numeric(nba$HighlyUnexpected), nba$GmSc, use = "complete.obs")
## [1] 0.04303588

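This near-zero correlation is consistent with the boxplot: being flagged as highly unexpected says little about a game's raw Game Score, because unexpectedness is measured against each player's own baseline rather than against the league. A correlation this small does not by itself show that the two are unrelated, only that there is almost no linear association between the flag and Game Score.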
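Because HighlyUnexpected is binary, the correlation above is a point-biserial correlation. Writing it out by hand shows what it measures: the difference in mean Game Score between flagged and unflagged games, divided by the spread of Game Score and multiplied by sqrt(p * q), where p and q are the shares of flagged and unflagged games. The helper `point_biserial` and the toy data are hypothetical, and the sketch assumes no missing values.

```r
# Point-biserial correlation written out by hand (assumes no NAs)
point_biserial <- function(flag, y) {
  n1 <- sum(flag)
  n0 <- sum(!flag)
  n  <- n1 + n0
  m1 <- mean(y[flag])   # mean Game Score of flagged games
  m0 <- mean(y[!flag])  # mean Game Score of unflagged games
  s_n <- sqrt(mean((y - mean(y))^2))  # population SD of y
  (m1 - m0) / s_n * sqrt(n1 * n0 / n^2)
}

# Hypothetical toy data: two flagged games among eight
flag <- c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
gmsc <- c(30, 42, 50, 35, 48, 38, 45, 33)

c(by_hand = point_biserial(flag, gmsc),
  cor     = cor(as.numeric(flag), gmsc))
```

The sqrt(p * q) factor is small when only a few games are flagged, so r can stay close to zero even when flagged games score noticeably higher on average. Comparing the two group means directly is a useful complement to the correlation.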

Outlier Check

# Tukey's 1.5 * IQR rule on Game Score
Q1 <- quantile(nba$GmSc, 0.25, na.rm = TRUE)
Q3 <- quantile(nba$GmSc, 0.75, na.rm = TRUE)
IQR_value <- Q3 - Q1

lower <- Q1 - 1.5 * IQR_value
upper <- Q3 + 1.5 * IQR_value

# Games outside either fence
nba |> filter(GmSc < lower | GmSc > upper)
##        bbrID       Date  Tm Opp TRB AST STL BLK PTS GmSc  Season Playoffs Year
## 1  antetgi01 2019-03-17 MIL PHI  16   7   2   1  52 50.4 2018-19    false 2019
## 2  anthoca01 2014-01-24 NYK CHA  13   0   0   0  62 50.6 2013-14    false 2014
## 3  arenagi01 2006-12-17 WAS LAL   8   8   2   0  60 47.8 2006-07    false 2007
## 4  barklch01 1994-05-04 PHO GSW  14   4   3   1  56 52.6 1993-94     true 1994
## 5  bookede01 2017-03-24 PHO BOS   8   6   3   1  70 54.5 2016-17    false 2017
## 6  bryanko01 2006-01-22 LAL TOR   6   2   3   1  81 63.5 2005-06    false 2006
## 7  burtowi01 1994-12-13 PHI MIA   8   3   1   2  53 49.6 1994-95    false 1995
## 8  butleji01 2017-01-02 CHI CHO  12   6   3   1  52 51.5 2016-17    false 2017
## 9  davisan02 2016-02-21 NOP DET  20   4   0   1  59 53.9 2015-16    false 2016
## 10 duranke01 2021-06-15 BRK MIL  17  10   3   2  49 50.4 2020-21     true 2021
## 11 embiijo01 2017-11-15 PHI LAL  15   7   0   7  46 47.9 2017-18    false 2018
## 12 hardeja01 2018-01-30 HOU ORL  10  11   4   1  60 56.6 2017-18    false 2018
## 13 irvinky01 2015-03-12 CLE SAS   3   5   4   0  57 48.2 2014-15    false 2015
## 14 jamesle01 2017-11-03 CLE WAS  11   7   3   2  57 53.2 2017-18    false 2018
## 15 jordami01 1990-03-28 CHI CLE  18   6   4   1  69 64.6 1989-90    false 1990
## 16 lillada01 2016-02-19 POR GSW   0   7   6   0  51 47.9 2015-16    false 2016
## 17 malonka01 1990-01-27 UTA MIL  18   2   3   0  61 60.2 1989-90    false 1990
## 18 millere01 1992-11-28 IND CHH   5   8   1   0  57 50.4 1992-93    false 1993
## 19 olajuha01 1987-05-14 HOU SEA  25   2   2   6  49 48.6 1986-87     true 1987
## 20  paulch01 2015-04-01 LAC POR   5  17   4   0  41 47.5 2014-15    false 2015
## 21 robinda01 1993-01-16 SAS CHH  14   3   0   7  52 48.0 1992-93    false 1993
## 22 stoudam01 2008-11-05 PHO IND  11   6   5   2  49 50.2 2008-09    false 2009
## 23 tatumja01 2021-04-30 BOS SAS   8   5   0   1  60 47.9 2020-21    false 2021
## 24 vanvlfr01 2021-02-02 TOR ORL   3   2   3   3  54 50.3 2020-21    false 2021
## 25  wadedw01 2009-03-09 MIA CHI   6  12   4   3  48 49.0 2008-09    false 2009
## 26 youngtr01 2022-01-03 ATL POR   4  14   0   0  56 51.6 2021-22    false 2022
##    GameIndex GmScMovingZ GmScMovingZTop2Delta      Date2 GmSc2 GmScMovingZ2
## 1        476        3.30                 0.48 2015-03-09  29.6         2.82
## 2        818        4.16                 0.85 2008-04-05  42.5         3.31
## 3        382        2.87                 0.13 2009-12-18  36.8         2.74
## 4        828        3.60                 0.47 1997-11-29  39.6         3.13
## 5        146        5.03                 1.99 2022-03-24  45.8         3.04
## 6        784        4.43                 0.77 2002-01-14  46.3         3.66
## 7        240        6.48                 3.56 1992-01-24  24.6         2.92
## 8        389        3.69                 0.73 2020-10-04  42.0         2.96
## 9        251        3.49                 0.53 2019-12-08  48.5         2.96
## 10      1032        3.47                 0.49 2012-02-19  42.1         2.98
## 11        42        4.06                 1.09 2021-02-19  50.8         2.97
## 12       744        3.60                 0.23 2009-12-07  27.1         3.37
## 13       242        4.13                 0.49 2022-03-15  53.8         3.64
## 14      1286        3.72                 0.05 2014-03-03  48.8         3.67
## 15       450        4.26                 1.06 1996-03-07  47.7         3.20
## 16       309        3.74                 0.23 2017-04-08  48.6         3.51
## 17       387        4.31                 0.18 1998-04-07  46.2         4.13
## 18       424        4.56                 1.11 2003-11-15  30.0         3.45
## 19       259        3.56                 0.29 2001-04-17  26.2         3.27
## 20       745        3.82                 0.60 2018-05-08  40.1         3.22
## 21       279        3.30                 0.08 1994-04-24  51.8         3.22
## 22       421        3.97                 1.02 2011-02-06  38.4         2.95
## 23       326        3.15                 0.17 2020-01-11  39.2         2.98
## 24       299        4.44                 1.36 2018-01-24  24.4         3.08
## 25       431        3.07                 0.02 2018-11-25  27.8         3.05
## 26       252        3.44                 1.29 2020-01-26  39.4         2.15
##    ScoringShare HighlyUnexpected
## 1     1.0317460            FALSE
## 2     1.2252964            FALSE
## 3     1.2552301            FALSE
## 4     1.0646388            FALSE
## 5     1.2844037             TRUE
## 6     1.2755906            FALSE
## 7     1.0685484             TRUE
## 8     1.0097087            FALSE
## 9     1.0946197            FALSE
## 10    0.9722222            FALSE
## 11    0.9603340            FALSE
## 12    1.0600707            FALSE
## 13    1.1825726            FALSE
## 14    1.0714286            FALSE
## 15    1.0681115            FALSE
## 16    1.0647182            FALSE
## 17    1.0132890            FALSE
## 18    1.1309524            FALSE
## 19    1.0082305            FALSE
## 20    0.8631579            FALSE
## 21    1.0833333            FALSE
## 22    0.9760956            FALSE
## 23    1.2526096            FALSE
## 24    1.0735586            FALSE
## 25    0.9795918            FALSE
## 26    1.0852713            FALSE

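All 26 games flagged by the 1.5 × IQR rule lie above the upper fence, with Game Scores between 47.5 and 64.6, and they look like genuine elite performances rather than data anomalies. No game falls below the lower fence, so no unexpectedly poor performance in this dataset is extreme enough to count as an outlier. Only two of the 26 (rows 5 and 7) are flagged as HighlyUnexpected, which fits the earlier observation that very high raw Game Scores are often not unusual relative to a player's own baseline.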
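The fence calculation in the Outlier Check can be wrapped in a small reusable helper, which makes it easy to try other multipliers (for example k = 3 for "extreme" outliers). This is a base R sketch; the name `tukey_fences` and the toy vector are hypothetical.

```r
# Tukey fences as a reusable helper (base R sketch)
tukey_fences <- function(x, k = 1.5) {
  # quantile() uses type 7 by default, matching the Outlier Check code
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)  # made-up values
f <- tukey_fences(x)
f                                                # lower -3.5, upper 14.5
x[x < f[["lower"]] | x > f[["upper"]]]           # 100
```

Note that `boxplot.stats()` in base R computes its fences from `fivenum()` hinges rather than `quantile()`, so it can occasionally flag slightly different points than this helper.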

Confidence Interval for Response Variable (GmSc)

# 95% t-based confidence interval for the mean Game Score
mean_gmsc <- mean(nba$GmSc, na.rm = TRUE)
sd_gmsc <- sd(nba$GmSc, na.rm = TRUE)
n <- sum(!is.na(nba$GmSc))

margin_error <- qt(0.975, df = n - 1) * (sd_gmsc / sqrt(n))

lower_ci <- mean_gmsc - margin_error
upper_ci <- mean_gmsc + margin_error

c(lower_ci, upper_ci)
## [1] 24.73472 25.53962

Based on the confidence interval I calculated, we are 95% confident that the true mean Game Score for unexpected performances lies between 24.7347 and 25.5396. The interval's narrow width (about 0.8 points) suggests that the estimate is precise. However, this estimate applies specifically to the population of each player's most statistically surprising game, not to all NBA games. Because the dataset includes only peak unexpected performances, its average Game Score is elevated relative to typical NBA performance, so conclusions drawn from this interval must be interpreted in the context of how the data was constructed.
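The hand-computed interval should agree with the one reported by base R's `t.test()`, which makes a convenient cross-check. The sketch below wraps the same formula in a hypothetical helper, `t_interval`, and compares it with `t.test()` on made-up values; on the real data the comparison would be `t.test(nba$GmSc)$conf.int` (which drops NAs by default).

```r
# 95% t-interval computed by hand, checked against t.test()
t_interval <- function(x, level = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  se <- sd(x) / sqrt(n)
  moe <- qt(1 - (1 - level) / 2, df = n - 1) * se  # margin of error
  c(lower = mean(x) - moe, upper = mean(x) + moe)
}

gmsc <- c(22.1, 30.4, 18.7, 27.9, 25.3, 35.0, 20.6, 24.8)  # made-up values
by_hand <- t_interval(gmsc)
from_t_test <- t.test(gmsc, conf.level = 0.95)$conf.int
rbind(by_hand, from_t_test = as.numeric(from_t_test))
```

Both rows should match, since the one-sample `t.test()` interval is mean ± t(0.975, n - 1) × s / sqrt(n), the same formula used above.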

Further Questions

Does a player’s position affect PointShare? Is the GmSc mean stable across eras or has it gradually increased over time? How can we refine the threshold for what is truly “unexpected”?