Setup and Data

library(tidyverse)
library(ggplot2)
library(plotly)
library(geosphere)

schedule <- read_csv("~/Downloads/Datasets/schedule.csv")
draft_schedule <- read_csv("~/Downloads/Datasets/schedule_24_partial.csv")
locations <- read_csv("~/Downloads/Datasets/locations.csv")
game_data <- read_csv("~/Downloads/Datasets/team_game_data.csv")

Part 1 – Schedule Analysis

Question 1

QUESTION: How many times are the Thunder scheduled to play 4 games in 6 nights in the provided 80-game draft of the 2024-25 season schedule? (Note: clarification, the stretches can overlap, the question is really “How many games are the 4th game played over the past 6 nights?”)


# Build a function that counts 4-in-6 stretches for any team, provided the
# data set has the same structure (one row per team-game with team and gamedate).

count_four_in_six <- function(df, team_name = 'OKC') {
  df %>%
    filter(team == team_name) %>%
    arrange(gamedate) %>%
    mutate(
      lag3 = lag(gamedate, 3),                           # Date of the game three games earlier
      four_in_last_6 = as.integer(gamedate - lag3) <= 5  # TRUE if this is the 4th game in 6 nights
    ) %>%
    summarise(total_stretches = sum(four_in_last_6, na.rm = TRUE))
}


okc_four_in_six <- count_four_in_six(draft_schedule, 'OKC')

okc_four_in_six

## # A tibble: 1 × 1
##   total_stretches
##             <int>
## 1              26
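The lag-3 check can also be shown without the tidyverse. Below is a minimal base-R sketch. The function name `is_fourth_in_six()` and the toy dates are hypothetical. A game counts when it falls at most 5 days after the game three positions earlier, so overlapping stretches each contribute one qualifying game.

```r
# Flag games that are the 4th game within a 6-night window (hypothetical helper).
# Game i qualifies when it is at most 5 days after game i - 3.
is_fourth_in_six <- function(dates) {
  dates <- sort(as.Date(dates))
  n <- length(dates)
  out <- rep(FALSE, n)
  if (n >= 4) {
    gap <- as.integer(dates[4:n] - dates[1:(n - 3)])  # days spanned by 4 games
    out[4:n] <- gap <= 5
  }
  out
}

# Hypothetical schedule: Oct 1, 2, 4, 6, 7 form two overlapping stretches.
toy_dates <- as.Date(c("2024-10-01", "2024-10-02", "2024-10-04",
                       "2024-10-06", "2024-10-07", "2024-10-15"))
flags <- is_fourth_in_six(toy_dates)
sum(flags)  # 2
```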

ANSWER 1:

OKC's draft schedule has 26 games that are the 4th game in 6 nights.

Question 2

QUESTION: From 2014-15 to 2023-24, what is the average number of 4-in-6 stretches for a team in a season? Adjust each team/season to per-82 games before taking your final average.

team_season <- schedule %>%
  group_by(season, team) %>%
  arrange(gamedate, .by_group = TRUE) %>%
  mutate(
    lag3  = lag(gamedate, 3), # Creating a variable that checks game three games prior
    is_4th_in_6 = !is.na(lag3) & as.integer(gamedate - lag3) <= 5 # Determines whether the game being checked is within time frame.
  ) %>%
  summarize(
    games_played = n(),
    four_in_6    = sum(is_4th_in_6), # Summing the total number of 4-in-6s
    .groups = "drop"
  ) %>%
  mutate(four_in_6_per82 = four_in_6 * 82 / games_played) # Normalizing data to 82 games per season

avg_4in6_per_team_season <- mean(team_season$four_in_6_per82, na.rm = TRUE) # Calculating average

avg_4in6_per_team_season

## [1] 25.10331
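The question asks for the average of per-82 rates, which gives every team-season equal weight. A small base-R sketch with invented numbers shows the normalization. It also shows how the result differs from pooling all games first; the pooled version is included only for contrast.

```r
# Hypothetical team-seasons: 4-in-6 counts and games played.
toy_seasons <- data.frame(
  team         = c("AAA", "BBB", "CCC"),
  four_in_6    = c(20, 26, 18),
  games_played = c(72, 82, 65)
)

# Scale each team-season to an 82-game pace, then average the rates.
toy_seasons$per82 <- toy_seasons$four_in_6 * 82 / toy_seasons$games_played
avg_of_rates <- mean(toy_seasons$per82)

# For contrast: a pooled rate weights longer seasons more heavily.
pooled_rate <- sum(toy_seasons$four_in_6) * 82 / sum(toy_seasons$games_played)

c(avg_of_rates = avg_of_rates, pooled_rate = pooled_rate)
```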

ANSWER 2:

The average team-season had 25.1 4-in-6 stretches per 82 games.

Question 3

QUESTION: Which of the 30 NBA teams has had the highest average number of 4-in-6 stretches between 2014-15 and 2023-24? Which team has had the lowest average? Adjust each team/season to per-82 games.

league_by_team <- team_season %>%
  group_by(team) %>%
  summarise(
    seasons            = n(),                          # seasons included
    four_in_6_total    = sum(four_in_6),               # total across seasons
    games_played_total = sum(games_played),            # total games across seasons
    avg_per82          = mean(82 * four_in_6 / games_played), # avg per-season rate,
    .groups = 'drop'
  ) %>%
  arrange(desc(avg_per82))

head(league_by_team$avg_per82, 1)

## [1] 28.10919

tail(league_by_team$avg_per82,1)

## [1] 22.18611
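The two printed values above don't show which team each belongs to. A base-R sketch can return the team alongside its value. It uses a hypothetical per-team summary shaped like `league_by_team`.

```r
# Hypothetical per-team summary (same columns as league_by_team).
toy_by_team <- data.frame(
  team      = c("AAA", "BBB", "CCC"),
  avg_per82 = c(24.5, 27.9, 22.3)
)

# which.max()/which.min() return row positions, so the team name comes along.
highest <- toy_by_team[which.max(toy_by_team$avg_per82), ]
lowest  <- toy_by_team[which.min(toy_by_team$avg_per82), ]

sprintf("Highest: %s (%.1f); lowest: %s (%.1f)",
        highest$team, highest$avg_per82, lowest$team, lowest$avg_per82)
```

With the tidyverse already loaded, `slice_max()` and `slice_min()` on `league_by_team` do the same thing.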

ANSWER 3:

Most 4-in-6 stretches on average: CHA (28.1)
Fewest 4-in-6 stretches on average: NYK (22.2)

Question 4

QUESTION: Is the difference between most and least from Q3 surprising, or do you expect that size difference is likely to be the result of chance?

ANSWER 4:

The difference initially surprised me. I then dug into how the NBA builds its schedule: it balances competitive fairness against travel cost, under constraints such as when a team's home arena is available. After that, I came to think much of the gap is likely due to chance. Arena availability changes from year to year, some arenas host more than one team, and major touring acts book dates unpredictably. If an arena hosts a show one year but not the next, that change ripples through the whole schedule. A team therefore cannot count on its arena being free on the same dates the following year. I also noticed that teams with higher counts of 4-in-6s sometimes still finished with more wins, so a higher count does not obviously translate into a competitive disadvantage.
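One way to check whether a top-to-bottom gap like 28.1 vs. 22.2 could arise by chance is a permutation test. The test shuffles the team labels and counts how often the shuffled max-minus-min spread is at least as large as the observed one. Below is a minimal base-R sketch. It assumes one row per team-season, like `team_season`, and shuffles labels within each season so season-wide effects such as compressed pandemic schedules are kept. The function names and toy values are hypothetical.

```r
# Spread between the highest and lowest team averages.
team_spread <- function(rate, team) {
  team_means <- tapply(rate, team, mean)
  max(team_means) - min(team_means)
}

# Permutation test: shuffle team labels within each season, recompute the spread.
permutation_spread_test <- function(rate, team, season, n_perm = 2000) {
  observed <- team_spread(rate, team)
  perm_spreads <- replicate(n_perm, {
    shuffled <- unsplit(lapply(split(team, season), sample), season)
    team_spread(rate, shuffled)
  })
  # Add-one correction keeps the p-value strictly above zero.
  p_value <- (sum(perm_spreads >= observed) + 1) / (n_perm + 1)
  list(observed = observed, p_value = p_value)
}

# Hypothetical toy data: 3 teams x 4 seasons of per-82 4-in-6 rates.
toy <- data.frame(
  team   = rep(c("AAA", "BBB", "CCC"), times = 4),
  season = rep(2014:2017, each = 3),
  rate   = c(27, 24, 25, 26, 23, 25, 28, 24, 26, 25, 22, 24)
)
set.seed(1)
result <- permutation_spread_test(toy$rate, toy$team, toy$season)
result$observed  # 26.5 - 23.25 = 3.25
```

On the real data, the call would be `permutation_spread_test(team_season$four_in_6_per82, team_season$team, team_season$season)`. A small p-value would suggest the gap is larger than shuffling alone tends to produce.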

Question 5

QUESTION: What was BKN’s defensive eFG% in the 2023-24 season? What was their defensive eFG% that season in situations where their opponent was on the second night of back-to-back?

# Formula for defensive eFG% = (Opponent FGM + (.5 * Opponent 3PM)) / Opponent FGA

def_e_fg_percent <- game_data %>%
  filter(season == 2023, def_team == 'BKN') %>%
  summarize(
    total_opp_fgm = sum(fgmade),
    opp_3pm = sum(fg3made),
    total_opp_fga = sum(fgattempted)
  ) %>%
  mutate('Defensive eFG%'= ( 100 * (total_opp_fgm +(.5 * opp_3pm)) / total_opp_fga)) 

def_efg = def_e_fg_percent$`Defensive eFG%`

sprintf("BKN's Defensive eFG%% is %.1f%%", def_efg)

## [1] "BKN's Defensive eFG% is 54.3%"
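The eFG% formula in the comment above credits a made three as 1.5 made twos. The short base-R sketch below uses invented shooting lines. It also shows that summing makes and attempts before dividing, as the code above does, weights high-volume games more than averaging per-game percentages would.

```r
# eFG% = (FGM + 0.5 * 3PM) / FGA, expressed as a percentage.
efg_pct <- function(fgm, fg3m, fga) {
  100 * (fgm + 0.5 * fg3m) / fga
}

# Hypothetical single game: 40 FGM, 12 3PM, 88 FGA -> (40 + 6) / 88.
single_game <- efg_pct(40, 12, 88)

# Two hypothetical games: totals-then-divide vs. average of per-game eFG%.
fgm  <- c(40, 35)
fg3m <- c(12, 8)
fga  <- c(88, 70)
season_efg   <- efg_pct(sum(fgm), sum(fg3m), sum(fga))
avg_game_efg <- mean(efg_pct(fgm, fg3m, fga))
```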

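# Flag each team's games played on the second night of a back-to-back.
# Grouping by off_team keeps lag() from comparing one team's game with another's.
back_to_back <- game_data %>%
  filter(season == 2023) %>%
  arrange(off_team, gamedate) %>%
  mutate(is_btb = as.integer(gamedate - lag(gamedate)) == 1, .by = off_team)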

# Opponent shooting against BKN, restricted to games where the opponent was on a B2B.
# filter() also drops the NA flag on each team's first game.
back_to_back_efg <- back_to_back %>%
  filter(def_team == 'BKN', is_btb) %>%
  summarize(
    total_opp_fgm = sum(fgmade),
    opp_3pm       = sum(fg3made),
    total_opp_fga = sum(fgattempted)
  ) %>%
  mutate('Defensive eFG%' = 100 * (total_opp_fgm + 0.5 * opp_3pm) / total_opp_fga)

def_efg_btb <- back_to_back_efg$`Defensive eFG%`

sprintf("BKN's Defensive eFG%% with back to back is %.1f%%", def_efg_btb)

## [1] "BKN's Defensive eFG% with back to back is 53.5%"

ANSWER 5:

BKN Defensive eFG%: 54.35%
When opponent on a B2B: 53.49%
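The key step in the back-to-back split is measuring rest within each opposing team's own schedule, then keeping only BKN's defensive games against a tired opponent. Below is a base-R sketch of that logic on a hypothetical four-row game log. Column names mirror `game_data`, but all values are invented.

```r
# Hypothetical game log: one row per team-game, from the offense's side.
games <- data.frame(
  off_team    = c("AAA", "AAA", "CCC", "CCC"),
  def_team    = c("XXX", "BKN", "YYY", "BKN"),
  gamedate    = as.Date(c("2023-10-31", "2023-11-01", "2023-10-30", "2023-11-03")),
  fgmade      = c(38, 40, 41, 42),
  fg3made     = c(12, 10, 11, 14),
  fgattempted = c(90, 85, 87, 88)
)

# Days since the offense's previous game, computed within each offense's schedule.
games <- games[order(games$off_team, games$gamedate), ]
days_rest <- ave(as.numeric(games$gamedate), games$off_team,
                 FUN = function(d) c(NA, diff(d)))
games$opp_on_btb <- !is.na(days_rest) & days_rest == 1

efg_pct <- function(fgm, fg3m, fga) 100 * (fgm + 0.5 * fg3m) / fga

vs_bkn     <- games[games$def_team == "BKN", ]
vs_bkn_btb <- vs_bkn[vs_bkn$opp_on_btb, ]

overall_efg <- efg_pct(sum(vs_bkn$fgmade), sum(vs_bkn$fg3made), sum(vs_bkn$fgattempted))
btb_efg     <- efg_pct(sum(vs_bkn_btb$fgmade), sum(vs_bkn_btb$fg3made),
                       sum(vs_bkn_btb$fgattempted))
```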

Part 2 – Trends and Visualizations

Question 6

QUESTION: Please identify at least 2 trends in scheduling over time. In other words, how are the more recent schedules different from the schedules of the past? Please include a visual (plot or styled table) highlighting or explaining each trend and include a brief written description of your findings.

ANSWER 6:

plot <- ggplot(
  data = team_season,
  mapping = aes(x = season, y = four_in_6, color = team, group = team)
) +
  geom_line() +
  geom_point(size = 2) +
  labs(
    title = "4 Games in 6 Nights Stretches by Season",
    x = 'Season (year)', y = 'Stretches', color = "Team"
  ) +
  theme_minimal()

# ggplotly's tooltip argument takes aesthetic names, not column names.
ggplotly(plot, tooltip = c("x", "y", "colour"))

Trend #1:

Building on the earlier questions, I plotted each team's 4-in-6 stretches by season. Seasons are labeled by their starting year, so 2014 is 2014-15. Since 2014-15, the number of 4-in-6s per team has generally declined, with a clear outlier spike around the pandemic-compressed season. For example, CHA had 38 stretches in 2014-15 and 28 in 2023-24. Most teams follow a similar pattern, which points to less schedule compression and more recovery time between games.
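One way to put a number on "generally declined" is a least-squares slope through the league-average count per season. The pandemic-affected seasons (2019-20 and 2020-21) are left out so the spike doesn't distort the slope. Below is a base-R sketch on invented data. With the real data, `team_season` could be passed in place of the hypothetical `toy_trend`, since it has the same `season` and `four_in_6` columns.

```r
# Hypothetical team-season counts (two teams per season); 2020 plays the
# role of a compressed pandemic season.
toy_trend <- data.frame(
  season    = rep(c(2014:2018, 2020), each = 2),
  four_in_6 = c(31, 29, 30, 28, 29, 27, 28, 26, 27, 25, 35, 33)
)

# League-average count per season.
league_avg <- aggregate(four_in_6 ~ season, data = toy_trend, FUN = mean)

# Least-squares slope (stretches per season), excluding pandemic-affected seasons.
normal <- league_avg[!league_avg$season %in% c(2019, 2020), ]
slope  <- unname(coef(lm(four_in_6 ~ season, data = normal))["season"])
slope  # -1 stretch per season in this toy example
```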

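# Count back-to-backs per team-season. Grouping the lag by team and season
# keeps it from comparing across teams or across seasons.
back_to_back_2 <- game_data %>%
  arrange(off_team, gamedate) %>%
  mutate(is_btb = as.integer(gamedate - lag(gamedate)) == 1,
         .by = c(off_team, season)) %>%
  summarize(
    num_back_to_back = sum(is_btb, na.rm = TRUE),
    .by = c(off_team, season)
  )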

num_back_to_back <- back_to_back_2 %>%
  filter(!is.na(num_back_to_back)) %>%
  ggplot(aes(x = season,
             y = num_back_to_back,
             color = off_team,      # color by team
             group = off_team)) +
  geom_line(linewidth = 0.9) +
  geom_point(size = 2) +
  labs(title = "Back-to-Backs per Season by Team",
       x = "Season (year)", y = "Back-to-Backs (games)", color = "Team") +
  theme_minimal()

ggplotly(num_back_to_back, tooltip = c("colour", "x", "y"))

Trend #2:

The second trend I looked at was the number of back-to-backs each team played from 2014-15 to 2023-24. Like the first trend, this matters because it ties directly to recovery: more back-to-backs mean more fatigue, which can hurt performance. Counts declined steadily over the decade. There are a few outliers, such as GSW having only 7 back-to-backs in 2019-20, a season cut short by the pandemic. There is also a clear spike during the pandemic. Overall, recent schedules contain noticeably fewer back-to-backs than those of the past.
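The counting step behind this trend depends on restarting the rest-day gap for every team-season. Otherwise the first game of a season could be compared with another team's or another season's last game. Below is a base-R sketch on hypothetical dates.

```r
# Hypothetical game dates for two teams across two seasons.
games <- data.frame(
  team     = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  season   = c(2014, 2014, 2014, 2015, 2015, 2014, 2014, 2014),
  gamedate = as.Date(c("2014-10-01", "2014-10-02", "2014-10-05",
                       "2015-10-01", "2015-10-02",
                       "2014-10-03", "2014-10-04", "2014-10-05"))
)

games <- games[order(games$team, games$season, games$gamedate), ]

# Gap to the previous game, restarted for every team-season.
gap <- ave(as.numeric(games$gamedate), games$team, games$season,
           FUN = function(d) c(NA, diff(d)))
games$is_btb <- !is.na(gap) & gap == 1

# Back-to-backs per team-season.
btb_counts <- aggregate(is_btb ~ team + season, data = games, FUN = sum)
btb_counts
```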

Side note for both trends: I dug into the teams with the highest and lowest counts of BTBs or 4-in-6 stretches, and it’s hard to say by eye how much these alone drive win/loss records. Rosters, injuries, age, and a bunch of other factors matter too. These schedule stats feel important and probably do move the needle, but on their own they might not be a huge factor.

Question 7

QUESTION: Please design a plotting tool to help visualize a team’s schedule for a season. The plot should cover the whole season and should help the viewer contextualize and understand a team’s schedule, potentially highlighting periods of excessive travel, dense blocks of games, or other schedule anomalies. If you can, making the plots interactive (for example through the plotly package) is a bonus.

Please use this tool to plot OKC and DEN’s provided 80-game 2024-25 schedules.

ANSWER 7:

game_density <- draft_schedule %>%
  mutate(HA = factor(home, levels = c(0, 1), labels = c("Away", "Home")))

plot3 <- ggplot(game_density,
                aes(x = gamedate, y = HA, color = team,
                    text = paste0("Opponent: ", opponent, "<br>", HA))) +
  geom_point(position = position_jitter(height = 0.06, width = 0), size = 2) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  labs(title = "OKC & DEN 2024-25: Home/Away Timeline",
       x = "Game Date", y = "", color = "Team") +
  theme_minimal()

# tooltip takes aesthetic names; opponent and home/away are passed via "text".
game_density_plot <- ggplotly(plot3, tooltip = c("x", "colour", "text"))

# Two joins: one adds the team's own arena coordinates, the other adds the opponent's.
travel <- draft_schedule %>%
  left_join(locations, by = 'team') %>%
  left_join(
    locations %>% rename(opponent = team, opp_lat = latitude, opp_lon = longitude),
    by = 'opponent'
  )
# Game location: the team's arena for home games, the opponent's arena for away games.
draft_schedule_with_loc <- travel %>%
  mutate(
    game_lat = ifelse(home == 1, latitude, opp_lat),
    game_lon = ifelse(home == 1, longitude, opp_lon)
  )

# Use geosphere::distHaversine() to compute the great-circle distance between arenas.
# Build a data frame of miles traveled since each team's previous game.
distance <- draft_schedule_with_loc %>%
  arrange(team, gamedate) %>%
  group_by(team) %>%
  mutate(
    prev_lat = lag(game_lat),
    prev_lon = lag(game_lon),

    # distHaversine() returns meters; multiply by 0.000621371 to get miles.
    # A team's first game has no previous location, so its distance is 0.
    dist_miles = if_else(
      is.na(prev_lat) | is.na(prev_lon),
      0,
      as.numeric(distHaversine(
        cbind(prev_lon, prev_lat),
        cbind(game_lon, game_lat)
      )) * 0.000621371
    ),

    cum_miles = cumsum(dist_miles),
    days_since_last = as.integer(gamedate - lag(gamedate)),
    # Second night of a back-to-back: there is a previous game and it was the day before.
    is_btb = !is.na(days_since_last) & days_since_last == 1
  ) %>%
  ungroup()
 
p_game <- ggplot(
  distance,
  aes(
    x = gamedate, y = dist_miles, fill = team,
    text = paste0(
      "Team: ", team,
      "<br>Date: ", gamedate,
      "<br>Miles since last game: ", round(dist_miles, 0)
    )
  )
) +
  geom_col(position = "identity", alpha = 0.7) +
  labs(
    title = "OKC & DEN Miles Between Consecutive Games",
    x = "Date", y = "Miles"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.title = element_blank())

p_gamely <- ggplotly(p_game, tooltip = "text") 

p_gamely

game_density_plot
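The distance step can be checked by hand with the haversine formula. Below is a base-R sketch. `haversine_miles()` is a hypothetical helper, not part of geosphere. It takes longitude before latitude to mirror the `cbind(lon, lat)` order used above. The default radius of 6378137 meters matches `distHaversine()`'s default.

```r
# Great-circle distance in miles via the haversine formula.
haversine_miles <- function(lon1, lat1, lon2, lat2, radius_m = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  meters <- 2 * radius_m * asin(pmin(1, sqrt(a)))  # pmin guards rounding above 1
  meters * 0.000621371                             # same meters-to-miles factor as above
}

# One degree of longitude along the equator is radius * pi / 180 meters (about 69 miles).
haversine_miles(0, 0, 1, 0)
```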

Question 8

QUESTION: Using your tool, what is the best and worst part of OKC’s 2024-25 draft schedule? Please give your answer as a short brief to members of the front office and coaching staff to set expectations going into the season. You can include context from past schedules.

ANSWER 8:

Worst parts:

Front-loaded difficulty. Before December, OKC plays 19 games, many against projected contenders (GSW, HOU, LAL, DEN). That’s a high-stress on-ramp.

Recovery risk. Early travel and opponent quality raise the load; conditioning and recovery protocols need to be front-loaded.

March crunch. The toughest window hits in March, right after the All-Star break: a road-heavy, dense stretch with elevated travel. Fatigue risk is highest here, so minutes management is critical to protect the playoff runway.

Best parts:

Early homestand. The season opens with a favorable home block—prime runway to bank wins and build momentum.

December reset. A ~12-day gap provides rare recovery time and a chance for high-quality practice.

Travel advantage. Based on the game-to-game mileage from the Question 7 tool, OKC travels roughly 2,500 fewer miles than Denver, so fewer of its results should hinge on travel fatigue rather than on-court performance.

Recommendations

Manage load early. Plan rotation discipline and recovery modalities through the first 19 games.

Target March mitigation. Enter the post-ASG road block with predefined minute caps, staggered rest, and travel-day recovery protocols.

Exploit the gap. Use the December window for tactical install and conditioning tune-ups to carry into the mid-season grind.

Part 3 – Modeling

Question 9

QUESTION: Please estimate how many more/fewer regular season wins each team has had due to schedule-related factors from 2019-20 through 2023-24. Your final answer should have one number for each team, representing the total number of wins (not per 82, and not a per-season average). You may consider the on-court strength of the scheduled opponents as well as the impact of travel/schedule density. Please include the teams and estimates for the most helped and most hurt in the answer key.

If you fit a model to help answer this question, please write a paragraph explaining your model, and include a simple model diagnostic (eg a printed summary of a regression, a variable importance plot, etc).

model_schedule <- schedule %>%
  filter(season %in% 2019:2023) %>%                     
  left_join(locations, by = "team") %>%
  left_join(
    locations %>% rename(opponent = team, opp_lat = latitude, opp_lon = longitude),
    by = "opponent"
  ) %>%
  mutate(
    game_lat = ifelse(home == 1, latitude, opp_lat),
    game_lon = ifelse(home == 1, longitude, opp_lon)
  ) %>%
  arrange(team, season, gamedate) %>%                   # ensure proper order
  group_by(team, season) %>%                            
  mutate(
    prev_lat = lag(game_lat),
    prev_lon = lag(game_lon),

    dist_miles = dplyr::if_else(
      is.na(prev_lat) | is.na(prev_lon),
      0,
      as.numeric(distHaversine(
        cbind(prev_lon, prev_lat),
        cbind(game_lon, game_lat)
      )) * 0.000621371
    ),

    cum_miles_season = cumsum(dist_miles),              # Cumulative miles reset each season
    days_since_last   = as.integer(gamedate - lag(gamedate)),
    is_btb            = !is.na(days_since_last) & days_since_last == 1
  ) %>%
  ungroup() %>%
  group_by(team) %>%
  mutate(cum_miles_all = cumsum(dist_miles)) %>%        # Running total across years
  ungroup()


# Creating a data frame for the linear regression model

df <- model_schedule %>%
  filter(season %in% 2019:2023) %>%
  arrange(team, season, gamedate) %>%
  mutate(
    days_used  = ifelse(is.na(days_since_last), 3L, days_since_last),
    is_btb     = as.integer(is_btb),    # numeric 0/1 keeps types consistent
    home       = as.integer(home),
    dist_miles = as.numeric(dist_miles),
    team_season = interaction(team, season, drop = TRUE)
  )

lm_fit <- lm(
  win ~ home + is_btb + days_used + scale(dist_miles) + team_season,
  data = df
)
print(summary(lm_fit))   # simple required diagnostic

## 
## Call:
## lm(formula = win ~ home + is_btb + days_used + scale(dist_miles) + 
##     team_season, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.85200 -0.46528  0.04136  0.43996  0.93569 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          0.2608320  0.0588101   4.435 9.29e-06 ***
## home                 0.0963422  0.0093683  10.284  < 2e-16 ***
## is_btb              -0.0584676  0.0119542  -4.891 1.02e-06 ***
## days_used           -0.0001636  0.0007232  -0.226 0.820992    
## scale(dist_miles)   -0.0119223  0.0047505  -2.510 0.012096 *  
## team_seasonBKN.2019  0.1868841  0.0813427   2.297 0.021609 *  
## team_seasonBOS.2019  0.3689583  0.0813457   4.536 5.80e-06 ***
## team_seasonCHA.2019  0.0568930  0.0834149   0.682 0.495221    
## team_seasonCHI.2019  0.0339436  0.0834192   0.407 0.684086    
## team_seasonCLE.2019 -0.0139828  0.0834245  -0.168 0.866893    
## team_seasonDAL.2019  0.2758330  0.0805529   3.424 0.000619 ***
## team_seasonDEN.2019  0.3312605  0.0810732   4.086 4.42e-05 ***
## team_seasonDET.2019  0.0035581  0.0830931   0.043 0.965845    
## team_seasonGSW.2019 -0.0712539  0.0834211  -0.854 0.393041    
## team_seasonHOU.2019  0.3138743  0.0813476   3.858 0.000115 ***
## team_seasonIND.2019  0.3151069  0.0810772   3.887 0.000102 ***
## team_seasonLAC.2019  0.3822965  0.0813427   4.700 2.63e-06 ***
## team_seasonLAL.2019  0.4339324  0.0816208   5.316 1.08e-07 ***
## team_seasonMEM.2019  0.1673267  0.0810738   2.064 0.039051 *  
## team_seasonMIA.2019  0.3081892  0.0810805   3.801 0.000145 ***
## team_seasonMIL.2019  0.4702727  0.0810729   5.801 6.78e-09 ***
## team_seasonMIN.2019 -0.0024551  0.0837447  -0.029 0.976613    
## team_seasonNOP.2019  0.1207823  0.0813554   1.485 0.137671    
## team_seasonNYK.2019  0.0180765  0.0830932   0.218 0.827788    
## team_seasonOKC.2019  0.3116272  0.0813428   3.831 0.000128 ***
## team_seasonORL.2019  0.1556655  0.0810731   1.920 0.054875 .  
## team_seasonPHI.2019  0.2922798  0.0810732   3.605 0.000313 ***
## team_seasonPHX.2019  0.1667414  0.0810795   2.057 0.039755 *  
## team_seasonPOR.2019  0.1800833  0.0808247   2.228 0.025895 *  
## team_seasonSAC.2019  0.1368506  0.0813549   1.682 0.092568 .  
## team_seasonSAS.2019  0.1541185  0.0816219   1.888 0.059024 .  
## team_seasonTOR.2019  0.4370094  0.0813427   5.372 7.92e-08 ***
## team_seasonUTA.2019  0.3142786  0.0813475   3.863 0.000112 ***
## team_seasonWAS.2019  0.0458185  0.0813463   0.563 0.573274    
## team_seasonATL.2020  0.2709440  0.0813351   3.331 0.000867 ***
## team_seasonBKN.2020  0.3698635  0.0813301   4.548 5.48e-06 ***
## team_seasonBOS.2020  0.2042916  0.0813299   2.512 0.012022 *  
## team_seasonCHA.2020  0.1603490  0.0813370   1.971 0.048700 *  
## team_seasonCHI.2020  0.1345670  0.0813335   1.655 0.098051 .  
## team_seasonCLE.2020  0.0089247  0.0813355   0.110 0.912628    
## team_seasonDAL.2020  0.2875036  0.0813307   3.535 0.000409 ***
## team_seasonDEN.2020  0.3552680  0.0813291   4.368 1.26e-05 ***
## team_seasonDET.2020 -0.0200344  0.0813362  -0.246 0.805442    
## team_seasonGSW.2020  0.2461960  0.0813302   3.027 0.002474 ** 
## team_seasonHOU.2020 -0.0608240  0.0813292  -0.748 0.454552    
## team_seasonIND.2020  0.1764179  0.0813330   2.169 0.030097 *  
## team_seasonLAC.2020  0.3552361  0.0813290   4.368 1.27e-05 ***
## team_seasonLAL.2020  0.2855959  0.0813290   3.512 0.000447 ***
## team_seasonMEM.2020  0.2310028  0.0813324   2.840 0.004516 ** 
## team_seasonMIA.2020  0.2577736  0.0813290   3.170 0.001531 ** 
## team_seasonMIL.2020  0.3399479  0.0813332   4.180 2.94e-05 ***
## team_seasonMIN.2020  0.0247564  0.0813311   0.304 0.760836    
## team_seasonNOP.2020  0.1342136  0.0813295   1.650 0.098920 .  
## team_seasonNYK.2020  0.2717583  0.0813298   3.341 0.000836 ***
## team_seasonOKC.2020  0.0090041  0.0813352   0.111 0.911853    
## team_seasonORL.2020 -0.0041763  0.0813303  -0.051 0.959048    
## team_seasonPHI.2020  0.3825313  0.0813304   4.703 2.59e-06 ***
## team_seasonPHX.2020  0.4134897  0.0813318   5.084 3.75e-07 ***
## team_seasonPOR.2020  0.2899286  0.0813333   3.565 0.000366 ***
## team_seasonSAC.2020  0.1371679  0.0813346   1.686 0.091734 .  
## team_seasonSAS.2020  0.1624475  0.0813308   1.997 0.045809 *  
## team_seasonTOR.2020  0.0787689  0.0813312   0.968 0.332817    
## team_seasonUTA.2020  0.4257278  0.0813294   5.235 1.68e-07 ***
## team_seasonWAS.2020  0.1757878  0.0813348   2.161 0.030694 *  
## team_seasonATL.2021  0.2265377  0.0789037   2.871 0.004098 ** 
## team_seasonBKN.2021  0.2380644  0.0789027   3.017 0.002557 ** 
## team_seasonBOS.2021  0.3237792  0.0789029   4.104 4.10e-05 ***
## team_seasonCHA.2021  0.2246264  0.0789075   2.847 0.004425 ** 
## team_seasonCHI.2021  0.2635771  0.0789046   3.340 0.000839 ***
## team_seasonCLE.2021  0.2365338  0.0789089   2.998 0.002727 ** 
## team_seasonDAL.2021  0.3329250  0.0789064   4.219 2.47e-05 ***
## team_seasonDEN.2021  0.2862528  0.0789029   3.628 0.000287 ***
## team_seasonDET.2021 -0.0202406  0.0789064  -0.257 0.797559    
## team_seasonGSW.2021  0.3504682  0.0789052   4.442 9.01e-06 ***
## team_seasonHOU.2021 -0.0560221  0.0789040  -0.710 0.477717    
## team_seasonIND.2021  0.0041296  0.0789092   0.052 0.958264    
## team_seasonLAC.2021  0.2147027  0.0789030   2.721 0.006516 ** 
## team_seasonLAL.2021  0.1032203  0.0789035   1.308 0.190837    
## team_seasonMEM.2021  0.3850192  0.0789026   4.880 1.08e-06 ***
## team_seasonMIA.2021  0.3488038  0.0789030   4.421 9.93e-06 ***
## team_seasonMIL.2021  0.3218350  0.0789063   4.079 4.56e-05 ***
## team_seasonMIN.2021  0.2628065  0.0789033   3.331 0.000869 ***
## team_seasonNOP.2021  0.1414536  0.0789025   1.793 0.073037 .  
## team_seasonNYK.2021  0.1514576  0.0789035   1.920 0.054942 .  
## team_seasonOKC.2021 -0.0059930  0.0789028  -0.076 0.939457    
## team_seasonORL.2021 -0.0284468  0.0789027  -0.361 0.718458    
## team_seasonPHI.2021  0.3209662  0.0789050   4.068 4.78e-05 ***
## team_seasonPHX.2021  0.4837495  0.0789046   6.131 9.03e-10 ***
## team_seasonPOR.2021  0.0333728  0.0789051   0.423 0.672341    
## team_seasonSAC.2021  0.0677925  0.0789025   0.859 0.390252    
## team_seasonSAS.2021  0.1158913  0.0789040   1.469 0.141924    
## team_seasonTOR.2021  0.2870067  0.0789078   3.637 0.000277 ***
## team_seasonUTA.2021  0.2992332  0.0789031   3.792 0.000150 ***
## team_seasonWAS.2021  0.1266792  0.0789064   1.605 0.108425    
## team_seasonATL.2022  0.1991913  0.0789067   2.524 0.011603 *  
## team_seasonBKN.2022  0.2484742  0.0789071   3.149 0.001643 ** 
## team_seasonBOS.2022  0.3954047  0.0789036   5.011 5.49e-07 ***
## team_seasonCHA.2022  0.0270820  0.0789107   0.343 0.731456    
## team_seasonCHI.2022  0.1888229  0.0789046   2.393 0.016725 *  
## team_seasonCLE.2022  0.3192914  0.0789134   4.046 5.24e-05 ***
## team_seasonDAL.2022  0.1633129  0.0789044   2.070 0.038498 *  
## team_seasonDEN.2022  0.3478612  0.0789045   4.409 1.05e-05 ***
## team_seasonDET.2022 -0.0937135  0.0789107  -1.188 0.235020    
## team_seasonGSW.2022  0.2396548  0.0789028   3.037 0.002392 ** 
## team_seasonHOU.2022 -0.0311975  0.0789033  -0.395 0.692563    
## team_seasonIND.2022  0.1258213  0.0789106   1.594 0.110856    
## team_seasonLAC.2022  0.2390138  0.0789025   3.029 0.002457 ** 
## team_seasonLAL.2022  0.2240332  0.0789042   2.839 0.004529 ** 
## team_seasonMEM.2022  0.3221067  0.0789035   4.082 4.49e-05 ***
## team_seasonMIA.2022  0.2378083  0.0789029   3.014 0.002584 ** 
## team_seasonMIL.2022  0.4067298  0.0789059   5.155 2.58e-07 ***
## team_seasonMIN.2022  0.2143370  0.0789026   2.716 0.006608 ** 
## team_seasonNOP.2022  0.2113580  0.0789052   2.679 0.007403 ** 
## team_seasonNYK.2022  0.2720159  0.0789062   3.447 0.000568 ***
## team_seasonOKC.2022  0.1885029  0.0789031   2.389 0.016908 *  
## team_seasonORL.2022  0.1147922  0.0789039   1.455 0.145742    
## team_seasonPHI.2022  0.3573479  0.0789084   4.529 6.00e-06 ***
## team_seasonPHX.2022  0.2493345  0.0789034   3.160 0.001582 ** 
## team_seasonPOR.2022  0.1049703  0.0789030   1.330 0.183422    
## team_seasonSAC.2022  0.2884147  0.0789028   3.655 0.000258 ***
## team_seasonSAS.2022 -0.0317967  0.0789062  -0.403 0.686979    
## team_seasonTOR.2022  0.1974375  0.0789128   2.502 0.012364 *  
## team_seasonUTA.2022  0.1538712  0.0789025   1.950 0.051183 .  
## team_seasonWAS.2022  0.1275108  0.0789088   1.616 0.106138    
## team_seasonATL.2023  0.1400422  0.0789046   1.775 0.075953 .  
## team_seasonBKN.2023  0.0912930  0.0789031   1.157 0.247286    
## team_seasonBOS.2023  0.4820096  0.0789026   6.109 1.04e-09 ***
## team_seasonCHA.2023 -0.0429934  0.0789050  -0.545 0.585850    
## team_seasonCHI.2023  0.1760385  0.0789045   2.231 0.025698 *  
## team_seasonCLE.2023  0.2845912  0.0789131   3.606 0.000312 ***
## team_seasonDAL.2023  0.3120589  0.0789028   3.955 7.70e-05 ***
## team_seasonDEN.2023  0.3968294  0.0789032   5.029 5.00e-07 ***
## team_seasonDET.2023 -0.1276279  0.0789071  -1.617 0.105810    
## team_seasonGSW.2023  0.2661258  0.0789050   3.373 0.000747 ***
## team_seasonHOU.2023  0.2008229  0.0789035   2.545 0.010935 *  
## team_seasonIND.2023  0.2713896  0.0789115   3.439 0.000586 ***
## team_seasonLAC.2023  0.3262837  0.0789059   4.135 3.57e-05 ***
## team_seasonLAL.2023  0.2739210  0.0789027   3.472 0.000519 ***
## team_seasonMEM.2023  0.0307478  0.0789026   0.390 0.696770    
## team_seasonMIA.2023  0.2625119  0.0789030   3.327 0.000881 ***
## team_seasonMIL.2023  0.2958971  0.0789074   3.750 0.000178 ***
## team_seasonMIN.2023  0.3845760  0.0789031   4.874 1.11e-06 ***
## team_seasonNOP.2023  0.3003519  0.0789032   3.807 0.000142 ***
## team_seasonNYK.2023  0.3095888  0.0789046   3.924 8.77e-05 ***
## team_seasonOKC.2023  0.3956718  0.0789041   5.015 5.39e-07 ***
## team_seasonORL.2023  0.2746800  0.0789034   3.481 0.000501 ***
## team_seasonPHI.2023  0.2734000  0.0789051   3.465 0.000532 ***
## team_seasonPHX.2023  0.2988874  0.0789028   3.788 0.000153 ***
## team_seasonPOR.2023 -0.0416472  0.0789027  -0.528 0.597628    
## team_seasonSAC.2023  0.2640548  0.0789028   3.347 0.000821 ***
## team_seasonSAS.2023 -0.0307347  0.0789029  -0.390 0.696894    
## team_seasonTOR.2023  0.0040032  0.0789098   0.051 0.959540    
## team_seasonUTA.2023  0.0783214  0.0789036   0.993 0.320916    
## team_seasonWAS.2023 -0.1179967  0.0789072  -1.495 0.134841    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4791 on 11504 degrees of freedom
## Multiple R-squared:  0.09392,    Adjusted R-squared:  0.08187 
## F-statistic: 7.794 on 153 and 11504 DF,  p-value: < 2.2e-16

# Neutral schedule: no back-to-backs, no travel, and a standard two days
# between games. Home/away stays as scheduled.
neutral_df <- df %>%
  mutate(
    is_btb   = 0,
    days_used = 2,       # choose 2 as a standard "typical" rest day
    dist_miles = 0
  )

# Predicted win probabilities on the actual vs. neutral schedule, clipped to [0, 1]
p_actual  <- pmin(pmax(predict(lm_fit, newdata = df),         0), 1)
p_neutral <- pmin(pmax(predict(lm_fit, newdata = neutral_df), 0), 1)

# Schedule effect per game, then summed per team
sched_delta <- p_actual - p_neutral

team_totals <- df %>%
  select(team) %>%
  mutate(sched_delta = sched_delta) %>%
  group_by(team) %>%
  summarise(total_sched_wins = sum(sched_delta, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(total_sched_wins))

most_helped <- team_totals %>% slice_max(total_sched_wins, n = 1)
most_hurt   <- team_totals %>% slice_min(total_sched_wins, n = 1)

cat(sprintf("Most helped: %s (%.2f wins)\n",
            most_helped$team, most_helped$total_sched_wins))

## Most helped: CLE (-7.57 wins)

cat(sprintf("Most hurt:   %s (%.2f wins)\n",
            most_hurt$team,   most_hurt$total_sched_wins))

## Most hurt:   POR (-9.50 wins)

The idea behind this linear regression model was to estimate how each team was affected by schedule variables: whether the game was home or away, whether it was a back-to-back, the days since the last game, and the distance traveled for that game. A team-season term (team_season) absorbs each team's overall strength in a given season. The schedule coefficients are therefore estimated from variation within a team-season. To build the model I created a data set similar to the one used to visualize OKC and DEN's 2024-25 draft schedule, adding longitudes, latitudes, distance traveled, and a back-to-back flag. I then computed predicted win probabilities on the actual data and on neutral data. Neutral means the team did not travel, did not play a back-to-back, and had a typical two days between games, with home/away held constant. Summing the per-game differences gives each team's schedule-related wins over the five seasons. Every team comes out negative, and this is by construction. The back-to-back and distance coefficients are both negative, the rest-day coefficient is near zero, and no real schedule has zero travel and zero back-to-backs. The estimates are therefore best read relative to one another. Cleveland was the most helped (least hurt) by its schedule at -7.57 wins, and Portland was the most hurt at -9.50 wins. The summary also shows an R-squared of about 0.09, so these variables explain only a small share of game-to-game variation in wins.

Looking at the results from a bird's-eye view, teams on the West or East Coast appear to be affected the most, since they travel much farther than centrally located teams. Of course, many other factors matter too, such as the players on the roster, how young or old those players are, how healthy they are, and win streaks.

This underscores how demanding and draining it is to be a professional athlete. The grind these players go through to perform at their best is far greater than it looks on a TV screen. This project gave me even more respect for them: they are not only performing at the highest level, they are doing it with the odds against them much of the time.
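The counterfactual step can be reproduced on invented data. Below is a minimal base-R sketch. Team names and values are hypothetical. Days of rest is left out, and there is no [0, 1] clipping, so the arithmetic is exact. The sketch fits a linear probability model, predicts on the actual schedule and on a neutral one, and sums the per-game differences by team. It also shows a league-average baseline, an alternative included only for comparison and not the baseline used above. Because the model is linear, differences against the average baseline sum to zero across the league, so teams split into helped and hurt.

```r
# Hypothetical game log for two teams (all values invented).
toy <- data.frame(
  team       = rep(c("AAA", "BBB"), each = 6),
  win        = c(1, 0, 1, 1, 0, 1,  0, 0, 1, 0, 1, 0),
  home       = c(1, 0, 1, 0, 1, 0,  1, 0, 1, 0, 1, 0),
  is_btb     = c(0, 1, 0, 0, 1, 0,  0, 1, 1, 0, 0, 1),
  dist_miles = c(0, 800, 300, 0, 1200, 500,  0, 1500, 900, 200, 0, 1100)
)

# Linear probability model with a team effect (stand-in for team_season).
fit <- lm(win ~ home + is_btb + dist_miles + team, data = toy)

# Baseline A (as in the analysis above): no back-to-backs, no travel.
zero_base <- transform(toy, is_btb = 0, dist_miles = 0)
# Baseline B (comparison only): league-average back-to-back rate and travel.
avg_base  <- transform(toy, is_btb = mean(toy$is_btb), dist_miles = mean(toy$dist_miles))

delta_zero <- predict(fit, newdata = toy) - predict(fit, newdata = zero_base)
delta_avg  <- predict(fit, newdata = toy) - predict(fit, newdata = avg_base)

# Schedule wins per team under each baseline.
tapply(delta_zero, toy$team, sum)
tapply(delta_avg,  toy$team, sum)
```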

ANSWER 9:

Most Helped by Schedule: CLE (-7.57 wins)
Most Hurt by Schedule: POR (-9.50 wins)

Data Science Project

Nicholas Calip

09/05/25

Introduction