Madison Square Garden (MSG), or “The Garden”, is a historic arena located in the heart of Manhattan. The current Garden opened on February 11, 1968, making it the oldest major sporting facility in the New York metropolitan area and the oldest active arena in the NBA. Beyond basketball, the arena hosts concerts, boxing matches, political events, and other major cultural performances, making it one of the most recognizable venues in the world. The Garden is especially iconic in basketball, where many revere it as the “Mecca of Basketball”.
MSG is the premier indoor venue of New York City, the most visited and most densely populated major city in the United States, as well as the nation’s largest media market. Events held at MSG often receive disproportionate national and international attention, and NBA games played at The Garden are more likely to be nationally televised, widely discussed in sports media, and preserved in highlight compilations than games played elsewhere.
MSG is also the home court of the New York Knicks, one of the NBA’s most valuable and widely followed franchises, regardless of on-court success. Knicks fans are notoriously vocal, expressive, and unafraid to engage with players. Courtside seats are typically filled with famous actors, musicians, athletes, and public figures, many of whom are highlighted on broadcasts and arena displays. On any given night, there are dozens of high-profile celebrities present throughout the arena, creating a feeling that the game is a public showcase, rather than just another regular-season game. Given the arena’s historic status, visibility, and the fans it brings in, performances at MSG are perceived as particularly meaningful and memorable.
In line with this, MSG boasts a long list of iconic player performances. Michael Jordan scored 55 points in his “double-nickel” game at MSG on March 28, 1995, shortly after returning from his first retirement, reinstating himself as the face of the league. On February 2, 2009, Kobe Bryant set what was then the record for most points scored in a game at The Garden with a 61-point performance, solidifying his legacy as one of the best scorers the NBA has ever seen. Stephen Curry’s breakout game on February 27, 2013, in which he made 11 of his 13 three-point attempts and scored 54 points at The Garden, is often referenced as the moment he rose to star status. Nearly nine years later at The Garden, Curry hit his 2,974th career three-pointer, surpassing Ray Allen and breaking the NBA’s all-time record for three-pointers made. The list goes on.
This combination of historic performances, celebrity presence, media visibility, and competitive intensity has produced a widely accepted narrative that the pressures associated with MSG uniquely influence individual games and player performances. Clutch star players rise to the occasion with standout performances, while others “shrink under the lights” and struggle to perform, further entrenching the belief that the arena itself exerts an influence on performance.
While perceptions of the “MSG Effect” are deeply ingrained in basketball culture, they are rooted in isolated moments, media framing, and retrospective storytelling rather than statistics. The present analysis seeks to address this gap by examining whether this “MSG Effect” is actually detectable in statistics, or if it is simply a product of our narrative-obsessed imagination.
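# Packages used throughout the analysis (ggrepel is loaded later, before the scatterplot).
library(hoopR)
library(dplyr)
library(tidyr)
library(ggplot2)
# Seasons covered by the analysis: 2002 through the most recent NBA season.
seasons <- 2002:most_recent_nba_season()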
# Let's download game-level schedule data for every game played in this era.
sched <- load_nba_schedule(seasons = seasons)
# Only standard NBA games (excludes ALLSTAR, USA/WORLD, EAST/WEST, etc.)
sched <- sched %>%
filter(type_abbreviation == "STD")
nba_abbrevs <- sched %>%
select(home_abbreviation, away_abbreviation) %>%
pivot_longer(cols = everything(), values_to = "team_abbreviation") %>%
distinct(team_abbreviation)
# Let's create a dataset with only games played at MSG.
msg_games <- sched %>%
filter(venue_full_name == "Madison Square Garden") %>% # venue name is in the schedule data
transmute(
game_id,
season,
season_type,
game_date,
venue_full_name,
home_abbreviation,
away_abbreviation,
home_score,
away_score,
home_winner,
neutral_site
)
# Inspect the MSG schedule data, then keep only Knicks home games. Note that only season_type 2 (the regular season in ESPN's coding) appears below, so no playoff games are included.
msg_games %>% count(season_type, sort = TRUE)
## ── ESPN NBA Schedule from hoopR data repository ───────────────── hoopR 2.1.0 ──
## ℹ Data updated: 2025-12-18 07:28:59 EST
## # A tibble: 1 × 2
## season_type n
## <int> <int>
## 1 2 1001
msg_games %>% count(home_abbreviation, sort = TRUE) %>% head(10)
## # A tibble: 3 × 2
## home_abbreviation n
## <chr> <int>
## 1 NY 999
## 2 EAST 1
## 3 IND 1
msg_knicks_home_games <- msg_games %>%
filter(home_abbreviation == "NY", neutral_site == FALSE)
msg_knicks_home_games %>%
count(season_type, sort = TRUE)
## # A tibble: 1 × 2
## season_type n
## <int> <int>
## 1 2 999
# Load player box scores for all games in all seasons.
pb <- load_nba_player_box(seasons = seasons)
pb %>%
filter(team_abbreviation %in% c("NY", "NYK")) %>%
count(team_abbreviation, sort = TRUE)
## ── ESPN NBA Player Boxscores from hoopR data repository ───────── hoopR 2.1.0 ──
## ℹ Data updated: 2025-12-18 07:28:04 EST
## # A tibble: 1 × 2
## team_abbreviation n
## <chr> <int>
## 1 NY 26747
# Let's add some composite measures of offensive and defensive stat creation.
pb <- pb %>%
filter(!did_not_play, minutes > 0) %>%
mutate(
# True Shooting Percentage
denom = 2 * (field_goals_attempted + 0.44 * free_throws_attempted),
ts = if_else(denom > 0, points / denom, NA_real_),
# Composite performance metrics
offensive_output = points + rebounds + assists,
defensive_output = steals + blocks
)
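As a quick sanity check on the True Shooting formula used above, here is a minimal sketch in base R. The helper name `true_shooting()` and both stat lines are hypothetical and exist only for illustration; the 0.44 free-throw weight and the zero-attempt guard mirror the code in this analysis.

```r
# Hypothetical helper mirroring the TS% definition used in the analysis:
# TS = PTS / (2 * (FGA + 0.44 * FTA)), undefined when a player has no attempts.
true_shooting <- function(points, fga, fta) {
  denom <- 2 * (fga + 0.44 * fta)
  ifelse(denom > 0, points / denom, NA_real_)
}

# Made-up stat lines: a 30-point game on 20 FGA and 10 FTA,
# and a player who logged minutes without attempting a shot.
ts_example <- true_shooting(points = c(30, 0), fga = c(20, 0), fta = c(10, 0))
round(ts_example, 3)
```

The second value comes back as `NA` rather than `NaN`, which is why the guarded version is a little safer than dividing directly.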
# Create dataset of all player box scores only from games at MSG. Categorize home/away players. Calculate TS%.
pb_msg <- pb %>%
inner_join(
msg_knicks_home_games,
by = c("game_id", "season", "season_type", "game_date")
) %>%
mutate(
at_msg = TRUE,
is_knicks = (team_abbreviation == "NY"),
is_home = (home_away == "home"),
is_away = (home_away == "away"),
ts = points / (2 * (field_goals_attempted + 0.44 * free_throws_attempted))
)
pb_road_flagged <- pb %>%
filter(home_away == "away", !did_not_play, minutes > 0) %>%
left_join(
msg_knicks_home_games %>% transmute(game_id, at_msg = TRUE),
by = "game_id"
) %>%
mutate(
at_msg = if_else(is.na(at_msg), FALSE, at_msg),
ts = points / (2 * (field_goals_attempted + 0.44 * free_throws_attempted))
)
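The left join followed by replacing `NA` with `FALSE` amounts to a membership test on `game_id`, assuming each game appears only once in the MSG lookup. The toy sketch below shows the same flag in base R; the data frame, ids, and point totals are invented for illustration and are not taken from the hoopR data.

```r
# Hypothetical toy data: four away-player rows and a lookup of MSG game ids.
away_rows <- data.frame(
  game_id = c(101, 102, 103, 104),
  points  = c(12, 25, 8, 30)
)
msg_ids <- c(102, 104)  # assumed ids of games played at MSG

# Membership test: TRUE when the game is in the MSG lookup, FALSE otherwise.
away_rows$at_msg <- away_rows$game_id %in% msg_ids

# Mean points by flag, analogous to the grouped summaries in the analysis.
mean_by_flag <- tapply(away_rows$points, away_rows$at_msg, mean)
mean_by_flag
```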
# Let's find each team's home court advantage (average total points scored at home - average total points scored away).
non_nba_teams <- c("EAST", "WEST", "USA", "WORLD", "GIA", "LEB")
team_game <- pb %>%
filter(
!did_not_play,
minutes > 0,
) %>%
group_by(game_id, team_abbreviation, home_away) %>%
summarise(
team_points = sum(points, na.rm = TRUE),
.groups = "drop"
)
team_home_away <- team_game %>%
filter(!team_abbreviation %in% non_nba_teams) %>%
group_by(team_abbreviation, home_away) %>%
summarise(
avg_points = mean(team_points, na.rm = TRUE),
.groups = "drop"
)
team_home_advantage <- team_home_away %>%
pivot_wider(
names_from = home_away,
values_from = avg_points
) %>%
mutate(
home_court_advantage = home - away
) %>%
select(team_abbreviation, home_court_advantage)
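To make the home court advantage definition concrete, here is a small base R sketch on invented team-game totals. The team codes `AAA` and `BBB` and all point values are hypothetical; only the "home average minus away average" definition comes from the analysis.

```r
# Hypothetical team-game point totals (not real results).
toy_games <- data.frame(
  team      = c("AAA", "AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "BBB"),
  home_away = c("home", "home", "away", "away", "home", "home", "away", "away"),
  points    = c(110, 106, 101, 103, 98, 100, 99, 101)
)

# Average points per team and location, as a team x location matrix.
avg_points <- tapply(toy_games$points,
                     list(toy_games$team, toy_games$home_away),
                     mean)

# Home court advantage as defined above: home average minus away average.
home_court_advantage <- avg_points[, "home"] - avg_points[, "away"]
home_court_advantage
```

Note that this definition compares a team's own scoring at home and on the road; it does not account for points allowed, so it is not a point-margin measure.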
team_home_advantage_nba <- team_home_advantage %>%
semi_join(nba_abbrevs, by = "team_abbreviation")
knicks_abbrevs <- c("NY", "NYK")
team_home_advantage_ranked <- team_home_advantage_nba %>%
mutate(is_knicks = team_abbreviation %in% knicks_abbrevs) %>%
arrange(desc(home_court_advantage)) %>%
mutate(rank = row_number())
# Display:
knicks_row <- team_home_advantage_ranked %>%
filter(is_knicks)
display_table <- team_home_advantage_ranked %>%
filter(!team_abbreviation %in% non_nba_teams) %>%
mutate(
home_court_advantage = round(home_court_advantage, 2)
) %>%
select(rank, team_abbreviation, home_court_advantage)
knitr::kable(
display_table,
caption = "Team-Level Home Court Advantage (Home − Away Points)"
)
| rank | team_abbreviation | home_court_advantage |
|---|---|---|
| 1 | DEN | 4.49 |
| 2 | POR | 4.23 |
| 3 | ATL | 3.80 |
| 4 | WSH | 3.69 |
| 5 | MIL | 3.62 |
| 6 | MIA | 3.50 |
| 7 | SAC | 3.50 |
| 8 | GS | 3.47 |
| 9 | OKC | 3.47 |
| 10 | SA | 3.43 |
| 11 | UTAH | 3.35 |
| 12 | IND | 3.35 |
| 13 | NJ | 3.16 |
| 14 | NO | 3.02 |
| 15 | DAL | 3.00 |
| 16 | CLE | 2.99 |
| 17 | ORL | 2.90 |
| 18 | MEM | 2.75 |
| 19 | TOR | 2.74 |
| 20 | DET | 2.63 |
| 21 | PHX | 2.54 |
| 22 | LAL | 2.52 |
| 23 | NY | 2.47 |
| 24 | CHA | 2.41 |
| 25 | PHI | 2.27 |
| 26 | BOS | 2.26 |
| 27 | SEA | 2.04 |
| 28 | LAC | 1.96 |
| 29 | HOU | 1.64 |
| 30 | MIN | 1.02 |
| 31 | CHI | 0.88 |
| 32 | BKN | 0.48 |
league_home_away <- pb %>%
filter(!did_not_play, minutes > 0) %>%
mutate(
location = if_else(home_away == "home", "Home", "Away"),
ts = points / (2 * (field_goals_attempted + 0.44 * free_throws_attempted))
) %>%
group_by(location) %>%
summarise(
games = n_distinct(game_id),
pts = mean(points, na.rm = TRUE),
ts = mean(ts, na.rm = TRUE),
tov = mean(turnovers, na.rm = TRUE),
offensive_output = mean(offensive_output, na.rm = TRUE),
defensive_output = mean(defensive_output, na.rm = TRUE),
.groups = "drop"
)
league_home_away
## # A tibble: 2 × 7
## location games pts ts tov offensive_output defensive_output
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Away 31083 9.83 0.519 1.33 16.0 1.18
## 2 Home 31083 10.1 0.531 1.31 16.6 1.23
t.test(points ~ home_away, data = pb)
##
## Welch Two Sample t-test
##
## data: points by home_away
## t = -13.275, df = 642733, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group away and group home is not equal to 0
## 95 percent confidence interval:
## -0.3154677 -0.2342968
## sample estimates:
## mean in group away mean in group home
## 9.831734 10.106616
# Players score more points (+0.27 PTS/G) at home vs. away (p-value < .001).
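For readers unfamiliar with the Welch test reported here, the sketch below reproduces the statistic by hand on simulated data and compares it with `t.test()`. The simulated means, standard deviation, and sample sizes are assumptions chosen for illustration, not values from the box scores. Because the formula interface orders groups alphabetically ("away" before "home"), the statistic is the away mean minus the home mean, which is why a home advantage shows up as a negative t.

```r
set.seed(1)
# Simulated (not real) per-game point totals for two groups.
home <- rnorm(200, mean = 10.1, sd = 8)
away <- rnorm(250, mean = 9.8, sd = 8)

# Welch t statistic: difference in means over the unpooled standard error.
se <- sqrt(var(away) / length(away) + var(home) / length(home))
t_manual <- (mean(away) - mean(home)) / se

# Welch-Satterthwaite degrees of freedom.
va <- var(away) / length(away)
vh <- var(home) / length(home)
df_manual <- (va + vh)^2 /
  (va^2 / (length(away) - 1) + vh^2 / (length(home) - 1))

# t.test() defaults to the Welch test (var.equal = FALSE).
fit <- t.test(away, home)
c(manual = t_manual, builtin = unname(fit$statistic))
```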
t.test(ts ~ home_away, data = pb)
##
## Welch Two Sample t-test
##
## data: ts by home_away
## t = -18.289, df = 615885, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group away and group home is not equal to 0
## 95 percent confidence interval:
## -0.01316832 -0.01061907
## sample estimates:
## mean in group away mean in group home
## 0.5190604 0.5309541
# Players score more efficiently (+1.19 TS percentage points) at home vs. away (p-value < .001).
t.test(turnovers ~ home_away, data = pb)
##
## Welch Two Sample t-test
##
## data: turnovers by home_away
## t = 7.7773, df = 642729, p-value = 7.42e-15
## alternative hypothesis: true difference in means between group away and group home is not equal to 0
## 95 percent confidence interval:
## 0.02042106 0.03418153
## sample estimates:
## mean in group away mean in group home
## 1.332918 1.305616
# Players turn the ball over less often (-0.03 TO/G) at home vs. away (p-value < .001).
t.test(offensive_output ~ home_away, data = pb)
##
## Welch Two Sample t-test
##
## data: offensive_output by home_away
## t = -18.012, df = 642575, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group away and group home is not equal to 0
## 95 percent confidence interval:
## -0.5793055 -0.4656053
## sample estimates:
## mean in group away mean in group home
## 16.03439 16.55684
# Players produce more offensive stats (+0.52) at home vs. away (p-value < .001).
t.test(defensive_output ~ home_away, data = pb)
##
## Welch Two Sample t-test
##
## data: defensive_output by home_away
## t = -14.845, df = 641993, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group away and group home is not equal to 0
## 95 percent confidence interval:
## -0.05663887 -0.04342695
## sample estimates:
## mean in group away mean in group home
## 1.178529 1.228562
# Players produce more defensive stats (+0.05) at home vs. away (p-value < .001).
league_long <- league_home_away %>%
pivot_longer(
cols = c(pts, ts, tov, offensive_output, defensive_output),
names_to = "metric",
values_to = "value"
) %>%
mutate(
metric = recode(metric,
pts = "Points per player-game",
ts = "True Shooting (TS%)",
tov = "Turnovers per player-game",
offensive_output = "Offensive Output (PTS + REB + AST)",
defensive_output = "Defensive Output (STL + BLK)"
)
)
league_long <- league_long %>%
group_by(metric) %>%
mutate(z_value = (value - mean(value)) / sd(value)) %>%
ungroup()
# Bar plot
ggplot(league_long, aes(x = location, y = value, fill = location)) +
geom_col(width = .85) +
facet_wrap(~ metric, scales = "free_y") +
labs(
title = "League-Wide Home vs Away Performance",
x = NULL,
y = NULL
) +
theme_minimal(base_size = 12) +
theme(
legend.position = "none",
strip.text = element_text(face = "bold")
)
# Do visiting players post better or worse averages at MSG compared to other arenas?
# Note: To isolate the effect of Madison Square Garden on visiting teams, the analysis is restricted to away-team player rows. Knicks players' home performances are therefore excluded, and the MSG group reflects visiting players only.
opponent_msg_summary <- pb_road_flagged %>%
group_by(at_msg) %>%
summarise(
games = n_distinct(game_id),
pts = mean(points, na.rm = TRUE),
ts = mean(ts, na.rm = TRUE),
tov = mean(turnovers, na.rm = TRUE),
offensive_output = mean(offensive_output, na.rm = TRUE),
defensive_output = mean(defensive_output, na.rm = TRUE),
.groups = "drop"
)
opponent_msg_summary
## # A tibble: 2 × 7
## at_msg games pts ts tov offensive_output defensive_output
## <lgl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 FALSE 30113 9.83 0.519 1.33 16.0 1.18
## 2 TRUE 970 9.95 0.529 1.30 16.0 1.13
t.test(points ~ at_msg, data = pb_road_flagged)
##
## Welch Two Sample t-test
##
## data: points by at_msg
## t = -1.3879, df = 10655, p-value = 0.1652
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
## -0.28513634 0.04873972
## sample estimates:
## mean in group FALSE mean in group TRUE
## 9.828047 9.946245
# The difference in scoring (+0.12 PTS/G) by visiting players at MSG compared to other arenas is not statistically significant (p-value = 0.17).
t.test(ts ~ at_msg, data = pb_road_flagged)
##
## Welch Two Sample t-test
##
## data: ts by at_msg
## t = -3.6774, df = 10195, p-value = 0.0002369
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
## -0.015069238 -0.004590022
## sample estimates:
## mean in group FALSE mean in group TRUE
## 0.5187547 0.5285843
# The difference in shooting efficiency (+1.0 TS percentage points) at MSG compared to other arenas *is* statistically significant (p-value < .001).
t.test(turnovers ~ at_msg, data = pb_road_flagged)
##
## Welch Two Sample t-test
##
## data: turnovers by at_msg
## t = 2.2579, df = 10703, p-value = 0.02397
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
## 0.004224023 0.059839513
## sample estimates:
## mean in group FALSE mean in group TRUE
## 1.333917 1.301885
# The difference in turnovers committed (-0.03 TO/G) at MSG compared to other arenas *is* statistically significant (p-value = .024).
t.test(offensive_output ~ at_msg, data = pb_road_flagged)
##
## Welch Two Sample t-test
##
## data: offensive_output by at_msg
## t = -0.1314, df = 10660, p-value = 0.8955
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
## -0.2477500 0.2166222
## sample estimates:
## mean in group FALSE mean in group TRUE
## 16.03390 16.04947
# The difference in offensive stat creation (+0.02) is not statistically significant (p-value = 0.90).
t.test(defensive_output ~ at_msg, data = pb_road_flagged)
##
## Welch Two Sample t-test
##
## data: defensive_output by at_msg
## t = 3.6927, df = 10716, p-value = 0.000223
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
## 0.02275366 0.07424042
## sample estimates:
## mean in group FALSE mean in group TRUE
## 1.180042 1.131545
# The difference in defensive stat creation (-0.05) *is* statistically significant (p-value < .001): visiting players record fewer steals and blocks at MSG.
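Since five tests are run on the same visiting-player data, one optional check is to adjust the p-values for multiple comparisons. The sketch below applies Holm's method via base R's `p.adjust()` to the p-values printed in the test output above. Choosing Holm is an assumption on my part; the original analysis does not apply any correction.

```r
# p-values copied from the five Welch tests reported above.
p_raw <- c(
  points           = 0.1652,
  ts               = 0.0002369,
  turnovers        = 0.02397,
  offensive_output = 0.8955,
  defensive_output = 0.000223
)

# Holm's step-down adjustment controls the family-wise error rate.
p_holm <- p.adjust(p_raw, method = "holm")
round(p_holm, 4)
```

Under this particular adjustment the TS% and defensive output differences stay well below .05, while the turnover difference rises to about .07. That is a property of the correction chosen here rather than a finding of the analysis itself.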
road_means <- pb_road_flagged %>%
group_by(at_msg) %>%
summarise(
ts = mean(ts, na.rm = TRUE),
turnovers = mean(turnovers, na.rm = TRUE),
defensive_output = mean(defensive_output, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(
location = if_else(at_msg, "Away at MSG", "Away (Other)")
) %>%
select(location, ts, turnovers, defensive_output)
road_long <- road_means %>%
pivot_longer(
cols = c(ts, turnovers, defensive_output),
names_to = "metric",
values_to = "value"
) %>%
mutate(
metric = recode(metric,
ts = "True Shooting (TS%)",
turnovers = "Turnovers per player-game",
defensive_output = "Defensive Output (STL + BLK)"
),
location = factor(location, levels = c("Away (Other)", "Away at MSG"))
)
ggplot(road_long, aes(x = location, y = value, fill = location)) +
geom_col(width = 0.8) +
facet_wrap(~ metric, scales = "free_y") +
geom_hline(yintercept = 0, linetype = "dashed") +
labs(
title = "Visiting Player Performance: MSG vs Other Away Arenas",
x = NULL,
y = NULL
) +
theme_minimal(base_size = 12) +
# Custom theme() tweaks must come after theme_minimal(), which would otherwise reset them.
theme(
axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
legend.position = "none",
strip.text = element_text(face = "bold")
)
player_msg_overall <- pb_road_flagged %>%
filter(at_msg == TRUE, !did_not_play, minutes >= 15) %>%
mutate(
total_output = offensive_output + defensive_output
) %>%
group_by(athlete_id, athlete_display_name) %>%
summarise(
games = n(),
avg_pts = mean(points, na.rm = TRUE),
avg_ts = mean(ts, na.rm = TRUE),
avg_off = mean(offensive_output, na.rm = TRUE),
avg_def = mean(defensive_output, na.rm = TRUE),
avg_tot = mean(total_output, na.rm = TRUE),
.groups = "drop"
)
player_msg_overall_clean <- player_msg_overall %>%
filter(games >= 8)
# Here are the top 20 visiting players by average total output at MSG (minimum 8 MSG games of 15+ minutes).
player_msg_overall_clean %>%
arrange(desc(avg_tot)) %>%
select(
athlete_display_name,
games,
avg_pts,
avg_ts,
avg_off,
avg_def,
avg_tot
) %>%
head(20)
## # A tibble: 20 × 7
## athlete_display_name games avg_pts avg_ts avg_off avg_def avg_tot
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Kobe Bryant 12 33.9 0.623 44.1 1.83 45.9
## 2 Anthony Davis 9 28.6 0.602 42.6 3 45.6
## 3 LeBron James 31 28.2 0.590 42.7 2.58 45.3
## 4 Kevin Durant 11 31.2 0.659 43.4 1.55 44.9
## 5 James Harden 14 27.6 0.633 42.1 2.71 44.9
## 6 Joel Embiid 10 27.7 0.590 41.4 2.5 43.9
## 7 Giannis Antetokounmpo 20 23.4 0.593 40.0 2.5 42.4
## 8 Stephen Curry 12 28.3 0.623 39.9 1.83 41.8
## 9 Russell Westbrook 17 22.2 0.530 39.1 1.94 41
## 10 Allen Iverson 12 26.6 0.469 38.1 2.5 40.6
## 11 Trae Young 11 25.6 0.529 38.4 1.18 39.5
## 12 Devin Booker 9 31.2 0.613 37.9 1.33 39.2
## 13 Nikola Jokic 10 23.4 0.657 37.9 1.2 39.1
## 14 Dirk Nowitzki 15 26.5 0.636 36.5 1.47 38
## 15 Jayson Tatum 14 23.6 0.572 35.1 2.57 37.6
## 16 DeMarcus Cousins 9 20.2 0.528 34.3 3.22 37.6
## 17 Zach LaVine 10 26.8 0.615 35.5 1.7 37.2
## 18 Kyrie Irving 12 26 0.593 35.9 1.25 37.2
## 19 Donovan Mitchell 10 25.9 0.573 35.4 1.6 37
## 20 Tracy McGrady 11 23.1 0.516 34.1 2.27 36.4
# Here are the top 20 visiting players by average true shooting % at MSG (minimum 8 MSG games of 15+ minutes).
player_msg_overall_clean %>%
arrange(desc(avg_ts)) %>%
select(
athlete_display_name,
games,
avg_pts,
avg_ts,
avg_off,
avg_def
) %>%
head(20)
## # A tibble: 20 × 6
## athlete_display_name games avg_pts avg_ts avg_off avg_def
## <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 Joe Ingles 8 10.6 0.774 17.1 0.625
## 2 Patrick Patterson 10 8.5 0.772 13.3 1.6
## 3 DeAndre Jordan 12 11.1 0.749 21.7 2.08
## 4 Rudy Gobert 11 12.4 0.739 22.5 2.27
## 5 Kevin Martin 8 24.2 0.712 30 1.38
## 6 Jae Crowder 11 14.1 0.690 19.4 1.09
## 7 Jonas Jerebko 10 8.5 0.685 14.7 1
## 8 Wally Szczerbiak 9 16.4 0.684 22.7 1
## 9 Richaun Holmes 8 10.2 0.682 16.9 1.5
## 10 Joe Harris 9 9.67 0.682 13.7 0.667
## 11 Nick Collison 8 6.88 0.680 12.8 0.875
## 12 Blake Griffin 10 21 0.676 30.7 1.6
## 13 Cameron Johnson 8 16.2 0.674 21.5 0.75
## 14 Kelly Olynyk 14 10.6 0.671 16.8 0.786
## 15 Domantas Sabonis 12 19.1 0.670 34.1 0.75
## 16 JJ Redick 15 16 0.666 20.4 0.333
## 17 Ed Davis 9 9.67 0.665 17.8 1
## 18 Draymond Green 12 10.6 0.664 23.8 2.92
## 19 Corey Maggette 10 18.9 0.661 25 0.9
## 20 Kevin Durant 11 31.2 0.659 43.4 1.55
# Let's compute every player's MSG advantage score = (average offensive + defensive output at MSG away games - average offensive + defensive output at other away games).
player_msg_advantage <- pb_road_flagged %>%
filter(!did_not_play, minutes >= 15) %>%
mutate(total_output = offensive_output + defensive_output) %>%
group_by(athlete_id, athlete_display_name, at_msg) %>%
summarise(
games = n(),
avg_total = mean(total_output, na.rm = TRUE),
avg_pts = mean(points, na.rm = TRUE),
avg_ts = mean(ts, na.rm = TRUE),
.groups = "drop"
) %>%
pivot_wider(
names_from = at_msg,
values_from = c(games, avg_total, avg_pts, avg_ts)
) %>%
mutate(
msg_adv_total = avg_total_TRUE - avg_total_FALSE,
msg_adv_pts = avg_pts_TRUE - avg_pts_FALSE,
msg_adv_ts = avg_ts_TRUE - avg_ts_FALSE
)
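The advantage score is a within-player difference, so each player serves as their own baseline. The base R sketch below shows the same calculation on invented rows; the player labels `P1` and `P2` and their outputs are hypothetical.

```r
# Hypothetical away-game rows for two players (not real box scores).
toy <- data.frame(
  player       = c("P1", "P1", "P1", "P1", "P2", "P2", "P2", "P2"),
  at_msg       = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE),
  total_output = c(40, 44, 35, 37, 20, 22, 26, 28)
)

# Per-player averages by location: rows are players, columns are at_msg.
avg_total <- tapply(toy$total_output, list(toy$player, toy$at_msg), mean)

# MSG advantage = average at MSG minus average in other away games.
msg_adv_total <- avg_total[, "TRUE"] - avg_total[, "FALSE"]
msg_adv_total
```

In the real data the MSG side of each comparison can rest on as few as 8 games under the filter used here, so individual scores should be read as noisy.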
player_msg_advantage <- player_msg_advantage %>%
filter(games_TRUE >= 8)
# Let's identify the top MSG risers and chokers.
msg_extremes <- bind_rows(
player_msg_advantage %>% arrange(desc(msg_adv_total)) %>% slice_head(n = 20) %>% mutate(group = "MSG Risers"),
player_msg_advantage %>% arrange(msg_adv_total) %>% slice_head(n = 20) %>% mutate(group = "MSG Chokers")
) %>%
mutate(
athlete_display_name = factor(athlete_display_name, levels = athlete_display_name[order(msg_adv_total)])
)
ggplot(msg_extremes, aes(x = msg_adv_total, y = athlete_display_name, fill = group)) +
geom_col(width = 0.75) +
facet_wrap(~ group, scales = "free_y") +
geom_vline(xintercept = 0, linetype = "dashed", color = "grey40") +
labs(
title = "MSG Risers and Chokers: Total Stat Output",
subtitle = "Within-player difference in total output at MSG vs other away games",
x = "MSG Total Output − Other Away Total Output",
y = NULL
) +
theme_minimal(base_size = 12) +
theme(legend.position = "none", strip.text = element_text(face = "bold"))
ts_extremes <- bind_rows(
player_msg_advantage %>% arrange(desc(msg_adv_ts)) %>% slice_head(n = 20) %>% mutate(group = "TS Risers"),
player_msg_advantage %>% arrange(msg_adv_ts) %>% slice_head(n = 20) %>% mutate(group = "TS Chokers")
) %>%
mutate(
athlete_display_name = factor(athlete_display_name, levels = athlete_display_name[order(msg_adv_ts)])
)
ggplot(ts_extremes, aes(x = msg_adv_ts, y = athlete_display_name, fill = group)) +
geom_col(width = 0.75) +
facet_wrap(~ group, scales = "free_y") +
geom_vline(xintercept = 0, linetype = "dashed", color = "grey40") +
labs(
title = "MSG Risers and Chokers: Shooting Efficiency",
subtitle = "Within-player difference in True Shooting at MSG vs other away games",
x = "MSG TS − Other Away TS (proportion)",
y = NULL
) +
theme_minimal(base_size = 12) +
theme(legend.position = "none", strip.text = element_text(face = "bold"))
# Let's make a dataset using recent all-stars from 2024 and 2025.
recent_all_stars <- c(
"LeBron James", "Stephen Curry", "Kevin Durant", "Giannis Antetokounmpo",
"Nikola Jokic", "Joel Embiid", "Luka Doncic", "Jayson Tatum",
"Jimmy Butler", "Damian Lillard", "Anthony Davis", "Kawhi Leonard",
"Shai Gilgeous-Alexander", "Devin Booker", "Jaylen Brown", "Kyrie Irving",
"Tyrese Haliburton", "Donovan Mitchell", "Bam Adebayo", "Jalen Brunson",
"Anthony Edwards", "Julius Randle", "Trae Young", "Pascal Siakam",
"James Harden", "Jalen Williams", "Evan Mobley", "Victor Wembanyama",
"Cade Cunningham", "Tyler Herro", "Jaren Jackson Jr.", "Darius Garland",
"Alperen Sengun", "Tyrese Maxey", "Paolo Banchero", "Scottie Barnes"
)
# Recompute the advantage scores without the 8-game MSG minimum so the younger players on this list are included. (This repeats the pipeline above, which is ugly but works.)
player_msg_advantage1 <- pb_road_flagged %>%
filter(!did_not_play, minutes >= 15) %>%
mutate(total_output = offensive_output + defensive_output) %>%
group_by(athlete_id, athlete_display_name, at_msg) %>%
summarise(
games = n(),
avg_total = mean(total_output, na.rm = TRUE),
avg_pts = mean(points, na.rm = TRUE),
avg_ts = mean(ts, na.rm = TRUE),
.groups = "drop"
) %>%
pivot_wider(
names_from = at_msg,
values_from = c(games, avg_total, avg_pts, avg_ts)
) %>%
mutate(
msg_adv_total = avg_total_TRUE - avg_total_FALSE,
msg_adv_pts = avg_pts_TRUE - avg_pts_FALSE,
msg_adv_ts = avg_ts_TRUE - avg_ts_FALSE
)
allstar_msg_adv <- player_msg_advantage1 %>%
filter(athlete_display_name %in% recent_all_stars) %>%
mutate(
msg_adv_ts = msg_adv_ts * 100)
# Let's make a scatterplot with the TS% change on the y axis and the change in total (offensive + defensive) output on the x axis.
library(ggrepel)
# Plot only the All-Star subset, whose TS difference was rescaled to percentage points above.
ggplot(allstar_msg_adv,
aes(x = msg_adv_total, y = msg_adv_ts)) +
geom_point(size = 3) +
geom_text_repel(
data = allstar_msg_adv,
aes(label = athlete_display_name),
size = 3,
color = "blue",
max.overlaps = Inf
) +
geom_hline(yintercept = 0, linetype = "dashed", color = "grey50") +
geom_vline(xintercept = 0, linetype = "dashed", color = "grey50") +
labs(
title = "How Recent NBA All-Stars Perform at Madison Square Garden",
subtitle = "Differences in total stat output (x) and shooting efficiency (y) at MSG vs other away games",
x = "Change in Total Stat (Offensive + Defensive) Output",
y = "Change in TS percentage points"
) +
theme_minimal(base_size = 12)