The purpose of this project is to gauge your technical skills and problem solving ability by working through something similar to a real NBA data science project. You will work your way through this R Markdown document, answering questions as you go along. Please begin by adding your name to the “author” key in the YAML header. When you’re finished with the document, come back and type your answers into the answer key at the top. Please leave all your work below and have your answers where indicated below as well. Please note that we will be reviewing your code, so keep it clear and concise and avoid long printouts. Feel free to add as many new code chunks as you’d like.
Remember that we will be grading the quality of your code and visuals alongside the correctness of your answers. Please try to use the tidyverse as much as possible (instead of base R and explicit loops). Please do not bring in any outside data.
Note: Throughout this document, any season column represents the year each season started. For example, the 2015-16 season will be in the dataset as 2015. For most of the rest of the project, we will refer to a season by just this number (e.g. 2015) instead of the full text (e.g. 2015-16).
library(tidyverse)
# Importing the player & team data
library(readr)
team_game_data <- read_csv("team_game_data.csv")
player_game_data <- read_csv("player_game_data.csv")
# Converting from tibble to data frame
team_game_data <- as.data.frame(team_game_data)
player_game_data <- as.data.frame(player_game_data)
Question 1:
Question 2: 40.9%
Question 3: 23.1%
Question 4: This is a written question. Please leave your response in the document under Question 4.
Question 5: 83.5% of games
Question 6:
Question 7:
Please show your work in the document, you don’t need anything here.
Please write your response in the document, you don’t need anything here.
In this section, you’re going to work to answer questions using data from both team and player stats. All provided stats are on the game level.
QUESTION: What was the Warriors’ Team offensive and defensive eFG% in the 2015-16 regular season? Remember that this is in the data as the 2015 season.
# Per https://www.breakthroughbasketball.com/stats/effective-field-goal-percentage.html, eFG% = (2pt FGM + 1.5 * 3pt FGM) / FGA.
# Calculating offensive eFG% for the GSW Warriors in the regular season (gametype == 2) for the 2015 season
GSW_off_efg <- team_game_data %>%
filter(gametype == 2 & season == 2015 & off_team == "GSW") %>%
summarise(off_efg = round((sum(fg2made) + 1.5*sum(fg3made))/sum(fgattempted)*100, 1))
# Calculating defensive eFG% for the GSW Warriors in the regular season for the 2015 season
GSW_def_efg <- team_game_data %>%
filter(gametype == 2 & season == 2015 & def_team == "GSW") %>%
summarise(def_efg = round((sum(fg2made) + 1.5*sum(fg3made))/sum(fgattempted)*100, 1))
GSW_off_efg # GSW Warriors' offensive eFG% for the 2015 season
## off_efg
## 1 56.3
GSW_def_efg # GSW Warriors' defensive eFG% for the 2015 season
## def_efg
## 1 47.9
ANSWER 1:
Offensive: 56.3% eFG
Defensive: 47.9% eFG
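The season figures above pool makes and attempts across all games before dividing, which is not the same as averaging each game's eFG%. A minimal sketch of the difference, using a hypothetical helper `efg_pct()` and made-up box score numbers (neither comes from the project data):

```r
# Hypothetical helper (not part of the original analysis):
# eFG% from pooled made 2s, made 3s, and field goal attempts
efg_pct <- function(fg2made, fg3made, fgattempted) {
  100 * (sum(fg2made) + 1.5 * sum(fg3made)) / sum(fgattempted)
}

# Made-up box score lines for two games
toy <- data.frame(
  fg2made     = c(30, 28),
  fg3made     = c(10, 12),
  fgattempted = c(85, 90)
)

# Pooled over both games, as in the summarise() calls above
season_efg <- efg_pct(toy$fg2made, toy$fg3made, toy$fgattempted)

# Average of the two single-game values, which weights games equally
mean_of_games <- mean(100 * (toy$fg2made + 1.5 * toy$fg3made) / toy$fgattempted)
```

The two numbers differ whenever attempts vary from game to game, so the pooled version is the one that matches a season-level eFG%.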
QUESTION: What percent of the time does the team with the higher eFG% in a given game win that game? Use games from the 2014-2023 regular seasons. If the two teams have an exactly equal eFG%, remove that game from the calculation.
# Building a df with an offensive eFG% column for the 2014-2023 regular seasons, ordered by nbagameid so that the two teams from the same game sit on adjacent rows; the swap_pair_of_stats function in the next chunk relies on this ordering to fill in defensive eFG%
team_game_data_2 <- team_game_data %>%
filter(gametype == 2) %>%
mutate(off_efg = round(((fg2made + 1.5*fg3made)/fgattempted*100), 1)) %>%
arrange(nbagameid)
# Creating the def_efg column by taking the off_efg of the opposing team (the adjacent row with the same nbagameid) using the swap_pair_of_stats function; this assumes every game has exactly two rows
swap_pair_of_stats <- function(x) {
for (i in seq(1, length(x) - 1, by = 2)) {
temp <- x[i]
x[i] <- x[i + 1]
x[i + 1] <- temp
}
return(x)
}
team_game_data_2$def_efg <- swap_pair_of_stats(team_game_data_2$off_efg)
# Calculating the winning percentage of teams that had the higher eFG% in the game they played while excluding games where teams had the same eFG%
team_game_data_2 %>%
filter(off_efg != def_efg) %>%
summarise(efg_winner = round(100*sum((off_win == 1 & off_efg > def_efg))/n(),1))
## efg_winner
## 1 40.9
ANSWER 2:
40.9%
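Because team_game_data holds one row per team per game, each non-tied game contributes two rows to the filtered data, and only one of them can belong to the team with the higher eFG%. Dividing by n() over all of those rows therefore yields half of the per-game share. The sketch below uses made-up data (`toy_games` and its values are hypothetical, not from the project files) to show a grouped alternative to swap_pair_of_stats and the effect of the two denominators. It is an illustration under those assumptions, not a recomputation of the answer.

```r
library(dplyr)

# Made-up data: three games, two rows per game (one per team on offense)
toy_games <- tibble(
  nbagameid = c(1, 1, 2, 2, 3, 3),
  off_team  = c("AAA", "BBB", "AAA", "CCC", "BBB", "CCC"),
  off_efg   = c(55.0, 50.0, 48.0, 52.0, 51.0, 51.0),
  off_win   = c(1, 0, 1, 0, 0, 1)
)

paired <- toy_games %>%
  group_by(nbagameid) %>%
  filter(n() == 2) %>%                 # keep only games with exactly two team rows
  mutate(def_efg = rev(off_efg)) %>%   # opponent's value within the same game
  ungroup()

# One row per game: the team with the strictly higher eFG% (ties drop out)
per_game <- paired %>% filter(off_efg > def_efg)
rate_per_game <- 100 * mean(per_game$off_win == 1)

# Same numerator divided by all non-tied rows (two per game)
non_tied <- paired %>% filter(off_efg != def_efg)
rate_per_row <- 100 *
  sum(non_tied$off_win == 1 & non_tied$off_efg > non_tied$def_efg) / nrow(non_tied)
```

Grouping by nbagameid also removes the dependence on row order that the explicit swap loop has.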
QUESTION: What percent of the time does the team with more offensive rebounds in a given game win that game? Use games from the 2014-2023 regular seasons. If the two teams have an exactly equal number of offensive rebounds, remove that game from the calculation.
# creating a new df that includes the number of offensive rebounds a team conceded while they were on defense by taking the reboffensive from the team with the same nbagameid using swap_pair_of_stats function for the 2014-2023 regular seasons (similar process to question #2 above)
team_game_data_3 <- team_game_data %>%
filter(gametype == 2) %>%
arrange(nbagameid) %>%
mutate(offreb_conceded = swap_pair_of_stats(reboffensive), .after = reboffensive)
# Calculating the win percentage of teams that had more offensive rebounds than the team they faced & excluding games where teams had the same amount of offensive rebounds
team_game_data_3 %>%
filter(reboffensive != offreb_conceded) %>%
summarise(offreb_winner = round(100*sum((off_win == 1 & reboffensive > offreb_conceded))/n(),1))
## offreb_winner
## 1 23.1
ANSWER 3:
23.1%
QUESTION: Do you have any theories as to why the answer to question 3 is lower than the answer to question 2? Try to be clear and concise with your answer.
ANSWER 4:
To win a basketball game, you need to score more points than your opponent, and effective field goal percentage (eFG%) is more directly tied to scoring than offensive rebounding is. A team with the higher eFG% will outscore its opponent whenever the two teams take about the same number of shots and make about the same number of free throws. In contrast, grabbing more offensive rebounds than your opponent only provides additional shot attempts; it does not guarantee more points, since you can still miss the follow-up attempt. Offensive rebounds also require missed shots, so a team that shoots poorly has more chances to collect them. Also, the pace of the game has increased significantly, and teams already get plenty of possessions and shot attempts, so a couple of extra chances from offensive rebounds matter less. Higher-quality possessions and better efficiency, especially on 3-pointers, are more advantageous, and eFG% accounts for the added value of 3-pointers by weighting them more heavily than 2-pointers. Making 3-pointers has become instrumental to winning games over the past decade, which eFG% reflects better than offensive rebounds do.
QUESTION: Look at players who played at least 25% of their possible games in a season and scored at least 25 points per game played. Of those player-seasons, what percent of games were they available for on average? Use games from the 2014-2023 regular seasons.
For example:
# Calculating the total number of games played by each team in each season
reg_games <- player_game_data %>%
filter(gametype == 2) %>%
group_by(team, season) %>%
summarise(total_games = sum(starter) / 5)
## `summarise()` has grouped output by 'team'. You can override using the
## `.groups` argument.
head(reg_games)
## # A tibble: 6 × 3
## # Groups: team [1]
## team season total_games
## <chr> <dbl> <dbl>
## 1 ATL 2014 82
## 2 ATL 2015 82
## 3 ATL 2016 82
## 4 ATL 2017 82
## 5 ATL 2018 82
## 6 ATL 2019 67
# For each player-team-season, calculating games played, points per game, games missed, and the share of the team's games the player was available for
player_availability <- player_game_data %>%
filter(gametype == 2) %>%
group_by(player_name, team, season) %>%
summarise(
games_played = sum(seconds > 0),
points_per_game = ifelse(games_played > 0, sum(points)/games_played, 0),
total_missed = sum(missed)
) %>%
left_join(reg_games, by = c("team", "season")) %>%
mutate(availability = (total_games - total_missed) / total_games)
## `summarise()` has grouped output by 'player_name', 'team'. You can override
## using the `.groups` argument.
head(player_availability)
## # A tibble: 6 × 8
## # Groups: player_name, team [4]
## player_name team season games_played points_per_game total_missed total_games
## <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
## 1 A.J. Green MIL 2022 35 4.4 11 82
## 2 A.J. Green MIL 2023 56 4.5 4 82
## 3 A.J. Hammo… DAL 2016 22 2.18 1 82
## 4 A.J. Hammo… MIA 2017 0 0 55 82
## 5 A.J. Lawson DAL 2022 14 3.86 5 82
## 6 A.J. Lawson DAL 2023 42 3.24 0 82
## # ℹ 1 more variable: availability <dbl>
# Filtering players who played at least 25% of their team's games and scored at least 25 points per game
player_game_data_5 <- player_availability %>%
filter(
points_per_game >= 25,
games_played >= 0.25 * total_games
)
# View the resulting data
head(player_game_data_5)
## # A tibble: 6 × 8
## # Groups: player_name, team [3]
## player_name team season games_played points_per_game total_missed total_games
## <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
## 1 Anthony Da… LAL 2019 62 26.1 9 71
## 2 Anthony Da… LAL 2022 56 25.9 25 82
## 3 Anthony Da… NOP 2016 75 28.0 7 82
## 4 Anthony Da… NOP 2017 75 28.1 7 82
## 5 Anthony Da… NOP 2018 56 25.9 25 82
## 6 Anthony Ed… MIN 2023 79 25.9 3 82
## # ℹ 1 more variable: availability <dbl>
# Getting the average percentage of games that qualifying player-seasons (at least 25% of their team's games played and at least 25 points per game) were available for
round(mean(player_game_data_5$availability)*100, 1)
## [1] 83.5
ANSWER 5:
83.5% of games
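The two filters and the final average can be checked by hand on a small example. The sketch below uses made-up player-season summaries (`toy_seasons`, its player labels, and all of its numbers are hypothetical) and mirrors the availability definition used above: games not missed divided by the team's total games.

```r
library(dplyr)

# Made-up player-season summaries (illustrative values only)
toy_seasons <- tibble(
  player_name  = c("Player A", "Player B", "Player C", "Player D"),
  games_played = c(70, 20, 60, 41),
  total_points = c(1890, 560, 1200, 1107),
  total_missed = c(12, 62, 22, 41),
  total_games  = c(82, 82, 82, 82)
)

qualified <- toy_seasons %>%
  mutate(
    points_per_game = total_points / games_played,
    availability    = (total_games - total_missed) / total_games
  ) %>%
  # At least 25 points per game and at least 25% of the team's games played
  filter(points_per_game >= 25, games_played >= 0.25 * total_games)

# Unweighted mean across qualifying player-seasons, as a percentage
avg_availability <- round(100 * mean(qualified$availability), 1)
```

Player B scores enough but falls short of the 25% games threshold, and Player C plays enough but scores too little, so only two player-seasons enter the average.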
QUESTION: What % of playoff series are won by the team with home court advantage? Give your answer by round. Use playoffs series from the 2014-2022 seasons. Remember that the 2023 playoffs took place during the 2022 season (i.e. 2022-23 season).
# Creating a df that has playoff data for each team for the 2014-2022 seasons
playoffs_6 <- team_game_data %>%
filter(gametype == 4, season %in% 2014:2022) %>%
arrange(nbagameid, gamedate)
# Creating a function to extract the digits from a number - will be used on nbagameid to create columns for the round, and number game in the series for each playoff game
extract_digit <- function(number, position) {
# Convert the number to a character string
number_str <- as.character(abs(number))
# Reverse the string to simplify extraction from right to left
number_str <- rev(strsplit(number_str, "")[[1]])
# Check if the position is valid
if(position > length(number_str) || position < 1) {
return(NA) # Return NA if the position is out of bounds
}
# Extract the digit at the specified position
digit <- number_str[position]
# Convert the digit back to a numeric value
digit <- as.numeric(digit)
return(digit)
}
# Creating a round column based on nbagameid to know which round in the playoffs each game is played
playoffs_6$round <- sapply(playoffs_6$nbagameid, extract_digit, position = 3)
# Creating a series column based on nbagameid to know which series each game is played
playoffs_6 <- playoffs_6 %>%
mutate(series = if_else(nbagameid < 0,
-as.numeric(substr(as.character(abs(nbagameid)), 1, nchar(as.character(abs(nbagameid))) - 1)),
as.numeric(substr(as.character(nbagameid), 1, nchar(as.character(nbagameid)) - 1))))
# Creating a game column based on nbagameid to know which game in the series each game is played
playoffs_6$game <- sapply(playoffs_6$nbagameid, extract_digit, position = 1)
library(dplyr)
# Identifying the team with home-court advantage in each series as the home team in game 1, since the team with home-court advantage always hosts game 1
home_court_teams <- playoffs_6 %>%
filter(game == 1 & off_home == 1) %>%
select(series, off_team) %>%
rename(home_court_team = off_team)
# Determining the series winners
series_max_game <- playoffs_6 %>%
group_by(series) %>%
summarise(max_game = max(game)) %>%
ungroup()
series_winners <- playoffs_6 %>%
inner_join(series_max_game, by = "series") %>%
filter(game == max_game & off_win == 1) %>%
select(series, off_team) %>%
rename(series_winner = off_team)
# Merging to identify wins by home-court teams
home_court_wins <- home_court_teams %>%
inner_join(series_winners, by = "series") %>%
mutate(home_court_win = home_court_team == series_winner)
# Calculating the percentage of wins by home-court teams by round
series_rounds <- playoffs_6 %>%
distinct(series, round)
home_court_win_percentage <- home_court_wins %>%
inner_join(series_rounds, by = "series") %>%
group_by(round) %>%
summarise(home_court_win_percentage = round(mean(home_court_win) * 100, 1))
# Percentage of teams with homecourt advantage that won the series from the 2014-2022 playoffs by round
home_court_win_percentage
## # A tibble: 4 × 2
## round home_court_win_percentage
## <dbl> <dbl>
## 1 1 84.7
## 2 2 63.9
## 3 3 55.6
## 4 4 77.8
ANSWER 6:
Round 1: 84.7%
Round 2: 63.9%
Conference Finals: 55.6%
Finals: 77.8%
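The round, series, and game columns can also be derived with vectorized integer arithmetic instead of sapply() and string splitting. This is a minimal sketch assuming positive IDs with the layout implied by the extract_digit() calls above (game number in the last digit, round in the third digit from the right); the IDs in `toy_ids` are made up for illustration.

```r
library(dplyr)

# Made-up playoff game IDs following the assumed layout:
# ... round digit (3rd from right), matchup digit (2nd from right), game digit (last)
toy_ids <- tibble(nbagameid = c(41400101, 41400102, 41400237, 41400404))

parsed <- toy_ids %>%
  mutate(
    game   = nbagameid %% 10,             # last digit
    round  = (nbagameid %/% 100) %% 10,   # third digit from the right
    series = nbagameid %/% 10             # everything except the game digit
  )
```

Because `%%` and `%/%` operate on whole columns, this runs inside a single mutate() and avoids the per-row function calls.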
QUESTION: Among teams that had at least a +5.0 net rating in the regular season, what percent of them made the second round of the playoffs the following year? Among those teams, what percent of their top 5 total minutes played players (regular season) in the +5.0 net rating season played in that 2nd round playoffs series? Use the 2014-2021 regular seasons to determine the +5 teams and the 2015-2022 seasons of playoffs data.
For example, the Thunder had a better than +5 net rating in the 2023 season. If we make the 2nd round of the playoffs next season (2024-25), we would qualify for this question. Our top 5 minutes played players this season were Shai Gilgeous-Alexander, Chet Holmgren, Luguentz Dort, Jalen Williams, and Josh Giddey. If three of them play in a hypothetical 2nd round series next season, it would count as 3/5 for this question.
Hint: The definition for net rating is in the data dictionary.
# Creating a new df that has team stats and adding 2 additional columns: 1) includes the number of points a team allowed while they were on defense (points scored by the other team with the same nbagameid) and 2) includes the number of defensive possessions (# of offensive possessions the other team with the same nbagameid had) a team had for each regular season from 2014-2021
team_game_data_7 <- team_game_data %>%
filter(gametype == 2 & season %in% 2014:2021) %>%
arrange(nbagameid) %>%
mutate(points_allowed = swap_pair_of_stats(points)) %>%
mutate(def_possessions = swap_pair_of_stats(possessions))
# Adding columns to include offensive rating, defensive rating and net rating to the df made in the chunk above
team_game_data_7 <- team_game_data_7 %>%
mutate(off_rating = points/(possessions/100)) %>%
mutate(def_rating = points_allowed/(def_possessions/100)) %>%
mutate(net_rating = off_rating - def_rating)
# Getting the net ratings for all 30 teams for each regular season from 2014-2021
team_game_data_7_2 <- team_game_data_7 %>%
group_by(off_team, season) %>%
summarise(mean_net_rating = mean(net_rating, na.rm = TRUE)) %>%
ungroup()
## `summarise()` has grouped output by 'off_team'. You can override using the
## `.groups` argument.
head(team_game_data_7_2)
## # A tibble: 6 × 3
## off_team season mean_net_rating
## <chr> <dbl> <dbl>
## 1 ATL 2014 6.21
## 2 ATL 2015 3.62
## 3 ATL 2016 -1.21
## 4 ATL 2017 -5.79
## 5 ATL 2018 -5.93
## 6 ATL 2019 -7.41
# Filtering to get the teams with at least a +5 net rating in each regular season from 2014-2021
plus5_net_rating <- team_game_data_7_2 %>%
filter(mean_net_rating >= 5)
plus5_net_rating <- as.data.frame(plus5_net_rating)
# Getting the teams that made it to the 2nd round of the playoffs each season from 2015-2022 using the team playoff df from question 6
unique_teams_second_round <- playoffs_6 %>%
filter(season != 2014 & round == 2) %>%
distinct(off_team, season)
head(unique_teams_second_round)
## off_team season
## 1 ATL 2015
## 2 CLE 2015
## 3 MIA 2015
## 4 TOR 2015
## 5 GSW 2015
## 6 POR 2015
# Adding next_season in net_ratings to be one year ahead to align with the following year's playoffs
net_ratings_next_year <- plus5_net_rating %>%
mutate(next_season = season + 1)
head(net_ratings_next_year)
## off_team season mean_net_rating next_season
## 1 ATL 2014 6.207150 2015
## 2 BOS 2019 6.370748 2020
## 3 BOS 2021 7.783639 2022
## 4 CLE 2015 6.324848 2016
## 5 GSW 2014 10.629787 2015
## 6 GSW 2015 10.766214 2016
# Performing the join operation to check if the same team made the second round of the playoffs the next year
teams_made_second_round_next_year <- unique_teams_second_round %>%
inner_join(net_ratings_next_year, by = c("off_team", "season" = "next_season"))
# Getting the teams with +5 net rating during the 2014-2021 regular seasons that made the 2nd round of the playoffs the following year while naming the reg_season they had the +5 net rating and playoff_season as the season they made it to the 2nd round
teams_made_second_round_next_year <- teams_made_second_round_next_year %>%
rename(reg_season = season.y) %>%
rename(playoff_season = season)
teams_made_second_round_next_year
## off_team playoff_season reg_season mean_net_rating
## 1 ATL 2015 2014 6.207150
## 2 GSW 2015 2014 10.629787
## 3 SAS 2015 2014 6.464015
## 4 CLE 2016 2015 6.324848
## 5 GSW 2016 2015 10.766214
## 6 SAS 2016 2015 11.094909
## 7 HOU 2017 2016 5.687542
## 8 GSW 2017 2016 11.786450
## 9 TOR 2018 2017 7.990256
## 10 GSW 2018 2017 6.272020
## 11 HOU 2018 2017 8.570834
## 12 MIL 2019 2018 8.599643
## 13 TOR 2019 2018 5.726189
## 14 MIL 2020 2019 9.631936
## 15 LAC 2020 2019 6.485266
## 16 PHI 2021 2020 5.675074
## 17 MIL 2021 2020 5.564283
## 18 PHX 2021 2020 5.978664
## 19 BOS 2022 2021 7.783639
## 20 PHX 2022 2021 7.666856
## 21 GSW 2022 2021 5.676754
# Calculating the percentage of teams with a +5 net rating during the 2014-2021 regular seasons that made it to the 2nd round of the playoffs the next year
round(nrow(teams_made_second_round_next_year)/nrow(plus5_net_rating)*100, 1)
## [1] 63.6
# Creating a df that has the regular season stats of the players on the teams with +5 net rating from the 2014-2021 regular seasons that made it to the 2nd round the next year
player_game_data_7 <- player_game_data %>%
filter(gametype == 2) %>%
semi_join(teams_made_second_round_next_year, by = c("team" = "off_team", "season" = "reg_season"))
# Getting the players who were top 5 in minutes for their team (only teams with +5 net rating from 2014-2021 regular seasons)
top_5_minutes <- player_game_data_7 %>%
group_by(team, season, player_name) %>%
summarise(minutes = sum(seconds/60)) %>%
slice_max(minutes, n = 5) %>%
rename(reg_season = season) %>%
mutate(playoff_season = reg_season + 1)
## `summarise()` has grouped output by 'team', 'season'. You can override using
## the `.groups` argument.
# Creating a df that has the playoff stats of the players on the teams with +5 net rating from the 2014-2021 regular seasons that made it to the 2nd round the next year
player_playoff_data_7 <- player_game_data %>%
mutate(round = sapply(player_game_data$nbagameid, extract_digit, position = 3)) %>%
filter(gametype == 4, round == 2) %>%
semi_join(teams_made_second_round_next_year, by = c("team" = "off_team", "season" = "playoff_season"))
top_5_minutes_next_playoffs <- inner_join(top_5_minutes, player_playoff_data_7, by = c("player_name", "team", "playoff_season" = "season"))
# Making a df of the players who played in the 2nd round
top_5_minutes_2nd_round <- top_5_minutes_next_playoffs %>%
group_by(player_name, team, playoff_season) %>%
filter(sum(seconds) > 0)
top_5_minutes_2nd_round <- as.data.frame(top_5_minutes_2nd_round)
# Getting the distinct players for each team/season
top_5_minutes_2nd_round <- top_5_minutes_2nd_round %>%
distinct(player_name, team, playoff_season)
# The percent of top 5 minutes played players who played in those 2nd round series
round(nrow(top_5_minutes_2nd_round)/nrow(top_5_minutes)*100, 1)
## [1] 78.1
ANSWER 7:
Percent of +5.0 net rating teams making the 2nd round next year:
63.6%
Percent of top 5 minutes played players who played in those 2nd round
series: 78.1%
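The season net ratings above are the mean of single-game net ratings. If the data dictionary defines net rating on season totals (points per 100 possessions over the whole season), the two approaches can give slightly different values when possessions vary by game. A minimal sketch of the comparison on made-up numbers (`toy_team_games` and its values are hypothetical; the column names follow team_game_data_7):

```r
library(dplyr)

# Made-up team-game rows: two games each for two teams
toy_team_games <- tibble(
  off_team        = c("AAA", "AAA", "BBB", "BBB"),
  season          = 2014,
  points          = c(110, 90, 100, 105),
  possessions     = c(100, 90, 100, 100),
  points_allowed  = c(100, 95, 110, 90),
  def_possessions = c(100, 92, 100, 100)
)

season_ratings <- toy_team_games %>%
  group_by(off_team, season) %>%
  summarise(
    # Season totals: pooled points per 100 pooled possessions
    net_from_totals = 100 * sum(points) / sum(possessions) -
      100 * sum(points_allowed) / sum(def_possessions),
    # Mean of single-game net ratings, as in the chunk above
    net_mean_of_games = mean(
      100 * points / possessions - 100 * points_allowed / def_possessions
    ),
    .groups = "drop"
  )
```

For team BBB, where every game has the same number of possessions, both columns agree; for team AAA they differ, which could matter for teams close to the +5 cutoff.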
For this part, you will work to fit a model that predicts the winner and the number of games in a playoffs series between any given two teams.
This is an intentionally open ended question, and there are multiple approaches you could take. Here are a few notes and specifications:
Your final output must include the probability of each team winning the series. For example: “Team A has a 30% chance to win and team B has a 70% chance.” instead of “Team B will win.” You must also predict the number of games in the series. This can be probabilistic or a point estimate.
You may use any data provided in this project, but please do not bring in any external sources of data.
You can only use data available prior to the start of the series. For example, you can’t use a team’s stats from the 2016-17 season to predict a playoffs series from the 2015-16 season.
The best models are explainable and lead to actionable insights around team and roster construction. We’re more interested in your thought process and critical thinking than we are in specific modeling techniques. Using smart features is more important than using fancy mathematical machinery.
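One simple way to turn a single-game win probability into the two required outputs (series win probability and number of games) is to treat the games as independent with a constant probability p for Team A. That assumption is not from the project text, it ignores home court, and `series_distribution()` is a hypothetical helper written only for this sketch.

```r
# Hypothetical helper: best-of-seven outcome distribution from a
# constant per-game win probability p for Team A
series_distribution <- function(p, wins_needed = 4) {
  games <- wins_needed:(2 * wins_needed - 1)
  # A team wins in exactly g games if it wins game g and exactly
  # wins_needed - 1 of the first g - 1 games
  ways <- choose(games - 1, wins_needed - 1)
  data.frame(
    games       = games,
    team_a_wins = ways * p^wins_needed * (1 - p)^(games - wins_needed),
    team_b_wins = ways * (1 - p)^wins_needed * p^(games - wins_needed)
  )
}

dist <- series_distribution(p = 0.6)

# Probability Team A wins the series
team_a_series_prob <- sum(dist$team_a_wins)

# Point estimate for series length: the expected number of games
expected_games <- sum(dist$games * (dist$team_a_wins + dist$team_b_wins))
```

With p = 0.6, Team A's series win probability works out to about 0.71, and the rows of `dist` give the probability of each series length for either winner.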
Include, as part of your answer:
library(ggplot2)
library(teamcolors)
library(ggdark)
library(ggimage)
source("elo_funcs.r")
# Making reg_data_year dfs to contain regular season team data for each season from 2014-2023
reg_data_14 <- team_game_data %>%
filter(season == 2014, gametype == 2) %>%
arrange(nbagameid, gamedate)
reg_data_15 <- team_game_data %>%
filter(season == 2015, gametype == 2) %>%
arrange(nbagameid, gamedate)
reg_data_16 <- team_game_data %>%
filter(season == 2016, gametype == 2) %>%
arrange(nbagameid, gamedate)
reg_data_17 <- team_game_data %>%
filter(season == 2017, gametype == 2) %>%
arrange(nbagameid, gamedate)
reg_data_18 <- team_game_data %>%
filter(season == 2018, gametype == 2) %>%
arrange(nbagameid, gamedate)
reg_data_19 <- team_game_data %>%
filter(season == 2019, gametype == 2) %>%
arrange(nbagameid, gamedate)
reg_data_20 <- team_game_data %>%
filter(season == 2020, gametype == 2) %>%
arrange(nbagameid, gamedate)
reg_data_21 <- team_game_data %>%
filter(season == 2021, gametype == 2) %>%
arrange(nbagameid, gamedate)
reg_data_22 <- team_game_data %>%
filter(season == 2022, gametype == 2) %>%
arrange(nbagameid, gamedate)
reg_data_23 <- team_game_data %>%
filter(season == 2023, gametype == 2) %>%
arrange(nbagameid, gamedate)
# Making game_res dfs that keep only selected columns from the reg_data dfs above
game_res_14 <- reg_data_14[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]
game_res_15 <- reg_data_15[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]
game_res_16 <- reg_data_16[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]
game_res_17 <- reg_data_17[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]
game_res_18 <- reg_data_18[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]
game_res_19 <- reg_data_19[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]
game_res_20 <- reg_data_20[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]
game_res_21 <- reg_data_21[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]
game_res_22 <- reg_data_22[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]
game_res_23 <- reg_data_23[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]
# Getting the unique teams from each year
teams_14 <- unique(game_res_14[,c("off_team", "off_team_name" )])
teams_15 <- unique(game_res_15[,c("off_team", "off_team_name" )])
teams_16 <- unique(game_res_16[,c("off_team", "off_team_name" )])
teams_17 <- unique(game_res_17[,c("off_team", "off_team_name" )])
teams_18 <- unique(game_res_18[,c("off_team", "off_team_name" )])
teams_19 <- unique(game_res_19[,c("off_team", "off_team_name" )])
teams_20 <- unique(game_res_20[,c("off_team", "off_team_name" )])
teams_21 <- unique(game_res_21[,c("off_team", "off_team_name" )])
teams_22 <- unique(game_res_22[,c("off_team", "off_team_name" )])
teams_23 <- unique(game_res_23[,c("off_team", "off_team_name" )])
# Create a data frame of teams for each season with a starting Elo of 1500,
# then name the first column "teams" and the third column "elo"
team_db_14 <- cbind.data.frame(teams_14, rep(1500, nrow(teams_14)))
names(team_db_14)[c(1,3)] <- c("teams" ,"elo")
team_db_15 <- cbind.data.frame(teams_15, rep(1500, nrow(teams_15)))
names(team_db_15)[c(1,3)] <- c("teams" ,"elo")
team_db_16 <- cbind.data.frame(teams_16, rep(1500, nrow(teams_16)))
names(team_db_16)[c(1,3)] <- c("teams" ,"elo")
team_db_17 <- cbind.data.frame(teams_17, rep(1500, nrow(teams_17)))
names(team_db_17)[c(1,3)] <- c("teams" ,"elo")
team_db_18 <- cbind.data.frame(teams_18, rep(1500, nrow(teams_18)))
names(team_db_18)[c(1,3)] <- c("teams" ,"elo")
team_db_19 <- cbind.data.frame(teams_19, rep(1500, nrow(teams_19)))
names(team_db_19)[c(1,3)] <- c("teams" ,"elo")
team_db_20 <- cbind.data.frame(teams_20, rep(1500, nrow(teams_20)))
names(team_db_20)[c(1,3)] <- c("teams" ,"elo")
team_db_21 <- cbind.data.frame(teams_21, rep(1500, nrow(teams_21)))
names(team_db_21)[c(1,3)] <- c("teams" ,"elo")
team_db_22 <- cbind.data.frame(teams_22, rep(1500, nrow(teams_22)))
names(team_db_22)[c(1,3)] <- c("teams" ,"elo")
team_db_23 <- cbind.data.frame(teams_23, rep(1500, nrow(teams_23)))
names(team_db_23)[c(1,3)] <- c("teams" ,"elo")
# Define the list of team_db and game_res data frames
team_db_list <- list(team_db_14, team_db_15, team_db_16, team_db_17,
team_db_18, team_db_19, team_db_20, team_db_21,
team_db_22, team_db_23)
game_res_list <- list(game_res_14, game_res_15, game_res_16, game_res_17,
game_res_18, game_res_19, game_res_20, game_res_21,
game_res_22, game_res_23)
# Loop through each pair of team_db and game_res data frames; game_res has one row per team per game, so each game triggers two Elo updates
for (i in seq_along(team_db_list)) {
team_db <- team_db_list[[i]]
game_res <- game_res_list[[i]]
for (j in 1:nrow(game_res)) {
# Extract match
match <- game_res[j, ]
# Extract team 1 Elo
team1_elo <- team_db$elo[team_db$teams == match$off_team]
# Extract team 2 Elo
team2_elo <- team_db$elo[team_db$teams == match$def_team]
# Calculate new Elo ratings
new_elo <- elo.calc(wins.A = match$off_win, # Select game outcome
elo.A = team1_elo, # Set Elo for team 1
elo.B = team2_elo, # Set Elo for team 2
k = 50) # Set update speed
# Store new Elo rating for the offensive team (team 1)
team_db$elo[team_db$teams == match$off_team] <- new_elo[1, 1]
# Store new Elo rating for the defensive team (team 2)
team_db$elo[team_db$teams == match$def_team] <- new_elo[1, 2]
}
# Assign the updated Elo ratings back to the correct team_db in the list
team_db_list[[i]] <- team_db
}
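The elo.calc() function comes from elo_funcs.r, which is not shown here. For readers unfamiliar with Elo, the standard update looks roughly like the sketch below; `elo_update_sketch()` is a hypothetical stand-in written for illustration and may differ from what elo.calc() actually does.

```r
# Hypothetical stand-in for illustration; elo.calc() in elo_funcs.r may differ
elo_update_sketch <- function(score_a, elo_a, elo_b, k = 50) {
  # Expected score for team A under the standard logistic Elo curve
  expected_a <- 1 / (1 + 10^((elo_b - elo_a) / 400))
  # Rating change scales with how surprising the result was
  change <- k * (score_a - expected_a)
  # Zero-sum update: whatever A gains, B loses
  c(elo_a = elo_a + change, elo_b = elo_b - change)
}

# Two evenly rated teams; team A wins (score_a = 1)
updated <- elo_update_sketch(score_a = 1, elo_a = 1500, elo_b = 1500)
```

With equal ratings the expected score is 0.5, so a win moves each rating by k / 2 in opposite directions. A larger k makes ratings react faster to recent results.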
# Pull each season's updated team_db back out of the list, sorted by Elo
team_db_14 <- team_db_list[[1]] %>%
arrange(desc(elo))
team_db_15 <- team_db_list[[2]] %>%
arrange(desc(elo))
team_db_16 <- team_db_list[[3]] %>%
arrange(desc(elo))
team_db_17 <- team_db_list[[4]] %>%
arrange(desc(elo))
team_db_18 <- team_db_list[[5]] %>%
arrange(desc(elo))
team_db_19 <- team_db_list[[6]] %>%
arrange(desc(elo))
team_db_20 <- team_db_list[[7]] %>%
arrange(desc(elo))
team_db_21 <- team_db_list[[8]] %>%
arrange(desc(elo))
team_db_22 <- team_db_list[[9]] %>%
arrange(desc(elo))
team_db_23 <- team_db_list[[10]] %>%
arrange(desc(elo))
# Define a list of team_db data frames
team_db_list <- list(team_db_14, team_db_15, team_db_16, team_db_17,
team_db_18, team_db_19, team_db_20, team_db_21,
team_db_22, team_db_23)
# Loop through each team_db data frame
for (i in seq_along(team_db_list)) {
team_db <- team_db_list[[i]]
# Initialize wins and losses vectors
wins <- rep(0, nrow(team_db))
losses <- rep(0, nrow(team_db))
for(j in 1:nrow(team_db)){
team <- team_db$teams[j]
# Calculate wins: rows where the team is on offense and won
wins[j] <- sum(game_res_list[[i]]$off_team == team & game_res_list[[i]]$off_win == 1)
# Calculate losses: rows where the team is on defense and the opposing (offensive) team won
losses[j] <- sum(game_res_list[[i]]$def_team == team & game_res_list[[i]]$off_win == 1)
}
# Add wins and losses to team_db
team_db$wins <- wins
team_db$losses <- losses
# Calculate win percentage
team_db$win_pct <- team_db$wins / (team_db$wins + team_db$losses)
team_db_list[[i]] <- team_db
}
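The nested loop above can also be expressed as a grouped summary. Since each game appears once with a given team as off_team, counting that team's off_win values gives its record directly. A minimal sketch on made-up results (`toy_res` and its team codes are hypothetical):

```r
library(dplyr)

# Made-up results: four games, two rows per game (one per team on offense)
toy_res <- tibble(
  nbagameid = c(1, 1, 2, 2, 3, 3, 4, 4),
  off_team  = c("AAA", "BBB", "AAA", "CCC", "BBB", "CCC", "AAA", "BBB"),
  def_team  = c("BBB", "AAA", "CCC", "AAA", "CCC", "BBB", "BBB", "AAA"),
  off_win   = c(1, 0, 0, 1, 1, 0, 1, 0)
)

records <- toy_res %>%
  group_by(teams = off_team) %>%
  summarise(
    wins    = sum(off_win == 1),   # rows where the team is on offense and won
    losses  = sum(off_win == 0),   # rows where the team is on offense and lost
    win_pct = wins / (wins + losses),
    .groups = "drop"
  )
```

The result could then be attached to a team_db with a left_join() on teams, which avoids indexing by position.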
# Pull each season's team_db (now with win-loss records) back out of the list, sorted by Elo
team_db_14 <- team_db_list[[1]] %>%
arrange(desc(elo))
team_db_15 <- team_db_list[[2]] %>%
arrange(desc(elo))
team_db_16 <- team_db_list[[3]] %>%
arrange(desc(elo))
team_db_17 <- team_db_list[[4]] %>%
arrange(desc(elo))
team_db_18 <- team_db_list[[5]] %>%
arrange(desc(elo))
team_db_19 <- team_db_list[[6]] %>%
arrange(desc(elo))
team_db_20 <- team_db_list[[7]] %>%
arrange(desc(elo))
team_db_21 <- team_db_list[[8]] %>%
arrange(desc(elo))
team_db_22 <- team_db_list[[9]] %>%
arrange(desc(elo))
team_db_23 <- team_db_list[[10]] %>%
arrange(desc(elo))
# Define a list of team_db data frames
team_db_list <- list(team_db_14, team_db_15, team_db_16, team_db_17,
team_db_18, team_db_19, team_db_20, team_db_21,
team_db_22, team_db_23)
# Define lists of teams for Eastern and Western conferences
eastern <- c("BOS", "MIL", "PHI", "CLE", "BKN",
"NYK", "MIA", "ATL", "WAS", "TOR",
"CHI", "IND", "ORL", "CHA", "DET")
western <- c("DEN", "MEM", "SAC", "LAC", "PHX",
"DAL", "NOP", "MIN", "GSW", "OKC",
"UTA", "POR", "LAL", "SAS", "HOU")
# Loop through each team_db data frame
for (i in seq_along(team_db_list)) {
team_db <- team_db_list[[i]]
# Calculate conference and conf_rank
conference <- rep(NA, nrow(team_db))
conf_rank <- rep(NA, nrow(team_db))
conference[team_db$teams %in% eastern] <- "East"
conference[team_db$teams %in% western] <- "West"
conf_rank[team_db$teams %in% eastern] <- 16 - rank(team_db$win_pct[team_db$teams %in% eastern], ties.method = "random")
conf_rank[team_db$teams %in% western] <- 16 - rank(team_db$win_pct[team_db$teams %in% western], ties.method = "random")
team_db$conference <- conference
team_db$conf_rank <- conf_rank
# Manually correcting conference ranks for teams with identical records so they reflect the actual tie-breakers and play-in results. Only playoff teams are corrected (so the playoff matchups are right); teams outside the top 8 in their conference may still have the wrong rank
if (i == 1) { # For the 2014 season
team_db$conf_rank[team_db$teams == "BKN"] <- 8
team_db$conf_rank[team_db$teams == "IND"] <- 9
team_db$conf_rank[team_db$teams == "HOU"] <- 2
team_db$conf_rank[team_db$teams == "LAC"] <- 3
team_db$conf_rank[team_db$teams == "POR"] <- 4
team_db$conf_rank[team_db$teams == "MEM"] <- 5
team_db$conf_rank[team_db$teams == "SAS"] <- 6
}
if (i == 2) { # For the 2015 season
team_db$conf_rank[team_db$teams == "MIA"] <- 3
team_db$conf_rank[team_db$teams == "ATL"] <- 4
team_db$conf_rank[team_db$teams == "BOS"] <- 5
team_db$conf_rank[team_db$teams == "CHA"] <- 6
team_db$conf_rank[team_db$teams == "MEM"] <- 7
team_db$conf_rank[team_db$teams == "DAL"] <- 6
}
if (i == 3) { # For the 2016 season
team_db$conf_rank[team_db$teams == "MIL"] <- 6
team_db$conf_rank[team_db$teams == "IND"] <- 7
team_db$conf_rank[team_db$teams == "LAC"] <- 4
team_db$conf_rank[team_db$teams == "UTA"] <- 5
}
if (i == 4) { # For the 2017 season
team_db$conf_rank[team_db$teams == "MIA"] <- 6
team_db$conf_rank[team_db$teams == "MIL"] <- 7
team_db$conf_rank[team_db$teams == "OKC"] <- 4
team_db$conf_rank[team_db$teams == "UTA"] <- 5
team_db$conf_rank[team_db$teams == "NOP"] <- 6
team_db$conf_rank[team_db$teams == "SAS"] <- 7
team_db$conf_rank[team_db$teams == "MIN"] <- 8
}
if (i == 5) { # For the 2018 season
team_db$conf_rank[team_db$teams == "BKN"] <- 6
team_db$conf_rank[team_db$teams == "ORL"] <- 7
team_db$conf_rank[team_db$teams == "POR"] <- 3
team_db$conf_rank[team_db$teams == "HOU"] <- 4
}
if (i == 6) { # For the 2019 season
team_db$conf_rank[team_db$teams == "HOU"] <- 4
team_db$conf_rank[team_db$teams == "OKC"] <- 5
team_db$conf_rank[team_db$teams == "UTA"] <- 6
}
if (i == 7) { # For the 2020 season
team_db$conf_rank[team_db$teams == "NYK"] <- 4
team_db$conf_rank[team_db$teams == "ATL"] <- 5
team_db$conf_rank[team_db$teams == "WAS"] <- 8
team_db$conf_rank[team_db$teams == "IND"] <- 9
team_db$conf_rank[team_db$teams == "DEN"] <- 3
team_db$conf_rank[team_db$teams == "LAC"] <- 4
team_db$conf_rank[team_db$teams == "DAL"] <- 5
team_db$conf_rank[team_db$teams == "POR"] <- 6
team_db$conf_rank[team_db$teams == "LAL"] <- 7
team_db$conf_rank[team_db$teams == "MEM"] <- 8
}
if (i == 8) { # For the 2021 season
team_db$conf_rank[team_db$teams == "BOS"] <- 2
team_db$conf_rank[team_db$teams == "MIL"] <- 3
team_db$conf_rank[team_db$teams == "PHI"] <- 4
team_db$conf_rank[team_db$teams == "ATL"] <- 8
team_db$conf_rank[team_db$teams == "CLE"] <- 9
team_db$conf_rank[team_db$teams == "CHA"] <- 10
team_db$conf_rank[team_db$teams == "NOP"] <- 8
team_db$conf_rank[team_db$teams == "LAC"] <- 9
}
if (i == 9) { # For the 2022 season
team_db$conf_rank[team_db$teams == "ATL"] <- 7
team_db$conf_rank[team_db$teams == "MIA"] <- 8
team_db$conf_rank[team_db$teams == "LAC"] <- 5
team_db$conf_rank[team_db$teams == "GSW"] <- 6
team_db$conf_rank[team_db$teams == "MIN"] <- 8
team_db$conf_rank[team_db$teams == "NOP"] <- 9
}
if (i == 10) { # For the 2023 season
team_db$conf_rank[team_db$teams == "ORL"] <- 5
team_db$conf_rank[team_db$teams == "IND"] <- 6
team_db$conf_rank[team_db$teams == "PHI"] <- 7
team_db$conf_rank[team_db$teams == "OKC"] <- 1
team_db$conf_rank[team_db$teams == "DEN"] <- 2
team_db$conf_rank[team_db$teams == "LAL"] <- 7
team_db$conf_rank[team_db$teams == "NOP"] <- 8
}
team_db_list[[i]] <- team_db
}
team_db_14 <- as.data.frame(team_db_list[[1]]) %>%
arrange(desc(elo))
team_db_15 <- as.data.frame(team_db_list[[2]]) %>%
arrange(desc(elo))
team_db_16 <- as.data.frame(team_db_list[[3]]) %>%
arrange(desc(elo))
team_db_17 <- as.data.frame(team_db_list[[4]]) %>%
arrange(desc(elo))
team_db_18 <- as.data.frame(team_db_list[[5]]) %>%
arrange(desc(elo))
team_db_19 <- as.data.frame(team_db_list[[6]]) %>%
arrange(desc(elo))
team_db_20 <- as.data.frame(team_db_list[[7]]) %>%
arrange(desc(elo))
team_db_21 <- as.data.frame(team_db_list[[8]]) %>%
arrange(desc(elo))
team_db_22 <- as.data.frame(team_db_list[[9]]) %>%
arrange(desc(elo))
team_db_23 <- as.data.frame(team_db_list[[10]]) %>%
arrange(desc(elo))
team_db2 <- team_db_list
# Function to calculate the probability of team A winning
elo.prob <- function(elo.A, elo.B) {
return(1 / (1 + 10^((elo.B - elo.A) / 400)))
}
# Function to simulate a playoff series
sim_series <- function(team_1, team_2, team_db) {
series_res <- data.frame(game_res = rep(NA, 7),
game_win_prob = rep(NA, 7),
game_sim_val = rep(NA, 7))
stop <- FALSE
i <- 0
team_1_wins <- 0
team_2_wins <- 0
# Get Elo ratings
team_1_elo <- team_db$elo[team_db$teams == team_1]
team_2_elo <- team_db$elo[team_db$teams == team_2]
while (!stop && i < 7) {
i <- i + 1
# Calculate win probability for team_1
series_res$game_win_prob[i] <- elo.prob(team_1_elo, team_2_elo)
# Simulate game outcome
series_res$game_sim_val[i] <- runif(1, min = 0, max = 1)
if (series_res$game_sim_val[i] <= series_res$game_win_prob[i]) {
series_res$game_res[i] <- 1
team_1_wins <- team_1_wins + 1
} else {
series_res$game_res[i] <- 0
team_2_wins <- team_2_wins + 1
}
if (team_1_wins == 4 || team_2_wins == 4) {
stop <- TRUE
}
}
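# Determine winner and loser. Because the per-game probability is the same for
# every game, series_win_prob below equals the winner's single-game win
# probability, not the probability of winning the whole series.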
if (team_1_wins == 4) {
winner <- team_1
loser <- team_2
series_win_prob <- mean(series_res$game_win_prob[series_res$game_res == 1], na.rm = TRUE)
} else {
winner <- team_2
loser <- team_1
series_win_prob <- mean(1 - series_res$game_win_prob[series_res$game_res == 0], na.rm = TRUE)
}
num_games <- i
return(list(winner = winner,
loser = loser,
series_res = series_res,
num_games = num_games,
series_win_prob = series_win_prob))
}
# Function to run the simulation
run_simulation <- function(team1, team2, team_db) {
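# Fix the seed so results are reproducible. Note that reseeding on every call
# makes every series reuse the same sequence of random draws.
set.seed(123456)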
result <- sim_series(team_1 = team1, team_2 = team2, team_db = team_db)
print(paste(result$winner, "is projected to beat", result$loser,
"with a", round(result$series_win_prob * 100, 0),
"% chance of winning in", result$num_games,
"games but", result$loser, "has a",
round((1 - result$series_win_prob) * 100, 0),
"% chance of beating", result$winner))
}
# Round 1
run_simulation("BKN", "ATL", team_db_14)
## [1] "BKN is projected to beat ATL with a 54 % chance of winning in 6 games but ATL has a 46 % chance of beating BKN"
run_simulation("MIL", "CHI", team_db_14)
## [1] "CHI is projected to beat MIL with a 63 % chance of winning in 7 games but MIL has a 37 % chance of beating CHI"
run_simulation("BOS", "CLE", team_db_14)
## [1] "BOS is projected to beat CLE with a 59 % chance of winning in 6 games but CLE has a 41 % chance of beating BOS"
run_simulation("TOR", "WAS", team_db_14)
## [1] "TOR is projected to beat WAS with a 55 % chance of winning in 6 games but WAS has a 45 % chance of beating TOR"
run_simulation("NOP", "GSW", team_db_14)
## [1] "GSW is projected to beat NOP with a 74 % chance of winning in 4 games but NOP has a 26 % chance of beating GSW"
run_simulation("DAL", "HOU", team_db_14)
## [1] "HOU is projected to beat DAL with a 70 % chance of winning in 4 games but DAL has a 30 % chance of beating HOU"
run_simulation("SAS", "LAC", team_db_14)
## [1] "SAS is projected to beat LAC with a 52 % chance of winning in 6 games but LAC has a 48 % chance of beating SAS"
run_simulation("MEM", "POR", team_db_14)
## [1] "MEM is projected to beat POR with a 74 % chance of winning in 6 games but POR has a 26 % chance of beating MEM"
# Round 2
run_simulation("TOR", "BKN", team_db_14)
## [1] "TOR is projected to beat BKN with a 55 % chance of winning in 6 games but BKN has a 45 % chance of beating TOR"
run_simulation("CHI", "BOS", team_db_14)
## [1] "BOS is projected to beat CHI with a 71 % chance of winning in 4 games but CHI has a 29 % chance of beating BOS"
run_simulation("GSW", "MEM", team_db_14)
## [1] "GSW is projected to beat MEM with a 74 % chance of winning in 6 games but MEM has a 26 % chance of beating GSW"
run_simulation("SAS", "HOU", team_db_14)
## [1] "SAS is projected to beat HOU with a 62 % chance of winning in 6 games but HOU has a 38 % chance of beating SAS"
# Round 3
run_simulation("TOR", "BOS", team_db_14)
## [1] "BOS is projected to beat TOR with a 77 % chance of winning in 4 games but TOR has a 23 % chance of beating BOS"
run_simulation("GSW", "SAS", team_db_14)
## [1] "GSW is projected to beat SAS with a 50 % chance of winning in 6 games but SAS has a 50 % chance of beating GSW"
# Finals
run_simulation("GSW", "BOS", team_db_14)
## [1] "GSW is projected to beat BOS with a 60 % chance of winning in 6 games but BOS has a 40 % chance of beating GSW"
# Round 1
run_simulation("BOS", "ATL", team_db_15)
## [1] "BOS is projected to beat ATL with a 53 % chance of winning in 6 games but ATL has a 47 % chance of beating BOS"
run_simulation("DET", "CLE", team_db_15)
## [1] "DET is projected to beat CLE with a 59 % chance of winning in 6 games but CLE has a 41 % chance of beating DET"
run_simulation("CHA", "MIA", team_db_15)
## [1] "CHA is projected to beat MIA with a 56 % chance of winning in 6 games but MIA has a 44 % chance of beating CHA"
run_simulation("TOR", "IND", team_db_15)
## [1] "TOR is projected to beat IND with a 67 % chance of winning in 6 games but IND has a 33 % chance of beating TOR"
run_simulation("GSW", "HOU", team_db_15)
## [1] "GSW is projected to beat HOU with a 84 % chance of winning in 4 games but HOU has a 16 % chance of beating GSW"
run_simulation("DAL", "OKC", team_db_15)
## [1] "DAL is projected to beat OKC with a 57 % chance of winning in 6 games but OKC has a 43 % chance of beating DAL"
run_simulation("LAC", "POR", team_db_15)
## [1] "LAC is projected to beat POR with a 52 % chance of winning in 6 games but POR has a 48 % chance of beating LAC"
run_simulation("SAS", "MEM", team_db_15)
## [1] "SAS is projected to beat MEM with a 92 % chance of winning in 4 games but MEM has a 8 % chance of beating SAS"
# Round 2
run_simulation("BOS", "DET", team_db_15)
## [1] "BOS is projected to beat DET with a 49 % chance of winning in 6 games but DET has a 51 % chance of beating BOS"
run_simulation("TOR", "CHA", team_db_15)
## [1] "TOR is projected to beat CHA with a 60 % chance of winning in 6 games but CHA has a 40 % chance of beating TOR"
run_simulation("GSW", "LAC", team_db_15)
## [1] "GSW is projected to beat LAC with a 76 % chance of winning in 5 games but LAC has a 24 % chance of beating GSW"
run_simulation("DAL", "SAS", team_db_15)
## [1] "SAS is projected to beat DAL with a 70 % chance of winning in 4 games but DAL has a 30 % chance of beating SAS"
# Round 3
run_simulation("BOS", "TOR", team_db_15)
## [1] "TOR is projected to beat BOS with a 61 % chance of winning in 7 games but BOS has a 39 % chance of beating TOR"
run_simulation("GSW", "SAS", team_db_15)
## [1] "GSW is projected to beat SAS with a 64 % chance of winning in 6 games but SAS has a 36 % chance of beating GSW"
# Finals
run_simulation("GSW", "BOS", team_db_15)
## [1] "GSW is projected to beat BOS with a 80 % chance of winning in 5 games but BOS has a 20 % chance of beating GSW"
# Round 1
run_simulation("BOS", "CHI", team_db_16)
## [1] "BOS is projected to beat CHI with a 63 % chance of winning in 6 games but CHI has a 37 % chance of beating BOS"
run_simulation("IND", "CLE", team_db_16)
## [1] "IND is projected to beat CLE with a 73 % chance of winning in 6 games but CLE has a 27 % chance of beating IND"
run_simulation("TOR", "MIL", team_db_16)
## [1] "TOR is projected to beat MIL with a 71 % chance of winning in 6 games but MIL has a 29 % chance of beating TOR"
run_simulation("WAS", "ATL", team_db_16)
## [1] "WAS is projected to beat ATL with a 52 % chance of winning in 6 games but ATL has a 48 % chance of beating WAS"
run_simulation("POR", "GSW", team_db_16)
## [1] "GSW is projected to beat POR with a 70 % chance of winning in 4 games but POR has a 30 % chance of beating GSW"
run_simulation("HOU", "OKC", team_db_16)
## [1] "HOU is projected to beat OKC with a 55 % chance of winning in 6 games but OKC has a 45 % chance of beating HOU"
run_simulation("SAS", "MEM", team_db_16)
## [1] "SAS is projected to beat MEM with a 80 % chance of winning in 4 games but MEM has a 20 % chance of beating SAS"
run_simulation("UTA", "LAC", team_db_16)
## [1] "UTA is projected to beat LAC with a 53 % chance of winning in 6 games but LAC has a 47 % chance of beating UTA"
# Round 2
run_simulation("BOS", "WAS", team_db_16)
## [1] "BOS is projected to beat WAS with a 66 % chance of winning in 6 games but WAS has a 34 % chance of beating BOS"
run_simulation("IND", "TOR", team_db_16)
## [1] "TOR is projected to beat IND with a 62 % chance of winning in 7 games but IND has a 38 % chance of beating TOR"
run_simulation("GSW", "UTA", team_db_16)
## [1] "GSW is projected to beat UTA with a 63 % chance of winning in 6 games but UTA has a 37 % chance of beating GSW"
run_simulation("HOU", "SAS", team_db_16)
## [1] "HOU is projected to beat SAS with a 53 % chance of winning in 6 games but SAS has a 47 % chance of beating HOU"
# Round 3
run_simulation("TOR", "BOS", team_db_16)
## [1] "TOR is projected to beat BOS with a 55 % chance of winning in 6 games but BOS has a 45 % chance of beating TOR"
run_simulation("GSW", "HOU", team_db_16)
## [1] "GSW is projected to beat HOU with a 80 % chance of winning in 5 games but HOU has a 20 % chance of beating GSW"
# Finals
run_simulation("GSW", "TOR", team_db_16)
## [1] "GSW is projected to beat TOR with a 71 % chance of winning in 6 games but TOR has a 29 % chance of beating GSW"
# Round 1
run_simulation("BOS", "MIL", team_db_17)
## [1] "BOS is projected to beat MIL with a 53 % chance of winning in 6 games but MIL has a 47 % chance of beating BOS"
run_simulation("CLE", "IND", team_db_17)
## [1] "CLE is projected to beat IND with a 50 % chance of winning in 6 games but IND has a 50 % chance of beating CLE"
run_simulation("PHI", "MIA", team_db_17)
## [1] "PHI is projected to beat MIA with a 85 % chance of winning in 4 games but MIA has a 15 % chance of beating PHI"
run_simulation("TOR", "WAS", team_db_17)
## [1] "TOR is projected to beat WAS with a 86 % chance of winning in 4 games but WAS has a 14 % chance of beating TOR"
run_simulation("SAS", "GSW", team_db_17)
## [1] "SAS is projected to beat GSW with a 65 % chance of winning in 6 games but GSW has a 35 % chance of beating SAS"
run_simulation("HOU", "MIN", team_db_17)
## [1] "HOU is projected to beat MIN with a 72 % chance of winning in 6 games but MIN has a 28 % chance of beating HOU"
run_simulation("NOP", "POR", team_db_17)
## [1] "NOP is projected to beat POR with a 56 % chance of winning in 6 games but POR has a 44 % chance of beating NOP"
run_simulation("UTA", "OKC", team_db_17)
## [1] "UTA is projected to beat OKC with a 56 % chance of winning in 6 games but OKC has a 44 % chance of beating UTA"
# Round 2
run_simulation("BOS", "PHI", team_db_17)
## [1] "PHI is projected to beat BOS with a 83 % chance of winning in 4 games but BOS has a 17 % chance of beating PHI"
run_simulation("TOR", "CLE", team_db_17)
## [1] "TOR is projected to beat CLE with a 59 % chance of winning in 6 games but CLE has a 41 % chance of beating TOR"
run_simulation("SAS", "NOP", team_db_17)
## [1] "NOP is projected to beat SAS with a 63 % chance of winning in 7 games but SAS has a 37 % chance of beating NOP"
run_simulation("HOU", "UTA", team_db_17)
## [1] "HOU is projected to beat UTA with a 58 % chance of winning in 6 games but UTA has a 42 % chance of beating HOU"
# Round 3
run_simulation("TOR", "BOS", team_db_17)
## [1] "TOR is projected to beat BOS with a 67 % chance of winning in 6 games but BOS has a 33 % chance of beating TOR"
run_simulation("HOU", "NOP", team_db_17)
## [1] "HOU is projected to beat NOP with a 60 % chance of winning in 6 games but NOP has a 40 % chance of beating HOU"
# Finals
run_simulation("TOR", "HOU", team_db_17)
## [1] "HOU is projected to beat TOR with a 63 % chance of winning in 7 games but TOR has a 37 % chance of beating HOU"
# Round 1
run_simulation("BOS", "IND", team_db_18)
## [1] "BOS is projected to beat IND with a 64 % chance of winning in 6 games but IND has a 36 % chance of beating BOS"
run_simulation("MIL", "DET", team_db_18)
## [1] "MIL is projected to beat DET with a 72 % chance of winning in 6 games but DET has a 28 % chance of beating MIL"
run_simulation("PHI", "BKN", team_db_18)
## [1] "BKN is projected to beat PHI with a 67 % chance of winning in 4 games but PHI has a 33 % chance of beating BKN"
run_simulation("ORL", "TOR", team_db_18)
## [1] "ORL is projected to beat TOR with a 61 % chance of winning in 6 games but TOR has a 39 % chance of beating ORL"
run_simulation("SAS", "DEN", team_db_18)
## [1] "SAS is projected to beat DEN with a 54 % chance of winning in 6 games but DEN has a 46 % chance of beating SAS"
run_simulation("GSW", "LAC", team_db_18)
## [1] "GSW is projected to beat LAC with a 53 % chance of winning in 6 games but LAC has a 47 % chance of beating GSW"
run_simulation("HOU", "UTA", team_db_18)
## [1] "HOU is projected to beat UTA with a 63 % chance of winning in 6 games but UTA has a 37 % chance of beating HOU"
run_simulation("POR", "OKC", team_db_18)
## [1] "POR is projected to beat OKC with a 58 % chance of winning in 6 games but OKC has a 42 % chance of beating POR"
# Round 2
run_simulation("MIL", "BOS", team_db_18)
## [1] "MIL is projected to beat BOS with a 56 % chance of winning in 6 games but BOS has a 44 % chance of beating MIL"
run_simulation("BKN", "ORL", team_db_18)
## [1] "ORL is projected to beat BKN with a 65 % chance of winning in 5 games but BKN has a 35 % chance of beating ORL"
run_simulation("GSW", "HOU", team_db_18)
## [1] "HOU is projected to beat GSW with a 62 % chance of winning in 7 games but GSW has a 38 % chance of beating HOU"
run_simulation("POR", "SAS", team_db_18)
## [1] "POR is projected to beat SAS with a 67 % chance of winning in 6 games but SAS has a 33 % chance of beating POR"
# Round 3
run_simulation("ORL", "MIL", team_db_18)
## [1] "ORL is projected to beat MIL with a 62 % chance of winning in 6 games but MIL has a 38 % chance of beating ORL"
run_simulation("POR", "HOU", team_db_18)
## [1] "POR is projected to beat HOU with a 53 % chance of winning in 6 games but HOU has a 47 % chance of beating POR"
# Finals
run_simulation("POR", "ORL", team_db_18)
## [1] "POR is projected to beat ORL with a 51 % chance of winning in 6 games but ORL has a 49 % chance of beating POR"
# Round 1
run_simulation("BOS", "PHI", team_db_19)
## [1] "BOS is projected to beat PHI with a 58 % chance of winning in 6 games but PHI has a 42 % chance of beating BOS"
run_simulation("MIA", "IND", team_db_19)
## [1] "IND is projected to beat MIA with a 76 % chance of winning in 4 games but MIA has a 24 % chance of beating IND"
run_simulation("ORL", "MIL", team_db_19)
## [1] "ORL is projected to beat MIL with a 55 % chance of winning in 6 games but MIL has a 45 % chance of beating ORL"
run_simulation("TOR", "BKN", team_db_19)
## [1] "TOR is projected to beat BKN with a 74 % chance of winning in 6 games but BKN has a 26 % chance of beating TOR"
run_simulation("UTA", "DEN", team_db_19)
## [1] "UTA is projected to beat DEN with a 54 % chance of winning in 6 games but DEN has a 46 % chance of beating UTA"
run_simulation("HOU", "OKC", team_db_19)
## [1] "OKC is projected to beat HOU with a 66 % chance of winning in 4 games but HOU has a 34 % chance of beating OKC"
run_simulation("LAC", "DAL", team_db_19)
## [1] "LAC is projected to beat DAL with a 79 % chance of winning in 5 games but DAL has a 21 % chance of beating LAC"
run_simulation("LAL", "POR", team_db_19)
## [1] "POR is projected to beat LAL with a 66 % chance of winning in 5 games but LAL has a 34 % chance of beating POR"
# Round 2
run_simulation("BOS", "TOR", team_db_19)
## [1] "TOR is projected to beat BOS with a 75 % chance of winning in 4 games but BOS has a 25 % chance of beating TOR"
run_simulation("IND", "ORL", team_db_19)
## [1] "IND is projected to beat ORL with a 74 % chance of winning in 6 games but ORL has a 26 % chance of beating IND"
run_simulation("UTA", "LAC", team_db_19)
## [1] "LAC is projected to beat UTA with a 77 % chance of winning in 4 games but UTA has a 23 % chance of beating LAC"
run_simulation("POR", "OKC", team_db_19)
## [1] "POR is projected to beat OKC with a 62 % chance of winning in 6 games but OKC has a 38 % chance of beating POR"
# Round 3
run_simulation("TOR", "IND", team_db_19)
## [1] "TOR is projected to beat IND with a 67 % chance of winning in 6 games but IND has a 33 % chance of beating TOR"
run_simulation("LAC", "POR", team_db_19)
## [1] "LAC is projected to beat POR with a 54 % chance of winning in 6 games but POR has a 46 % chance of beating LAC"
# Finals
run_simulation("LAC", "TOR", team_db_19)
## [1] "TOR is projected to beat LAC with a 62 % chance of winning in 7 games but LAC has a 38 % chance of beating TOR"
# Round 1
run_simulation("ATL", "NYK", team_db_20)
## [1] "ATL is projected to beat NYK with a 57 % chance of winning in 6 games but NYK has a 43 % chance of beating ATL"
run_simulation("BKN", "BOS", team_db_20)
## [1] "BKN is projected to beat BOS with a 78 % chance of winning in 5 games but BOS has a 22 % chance of beating BKN"
run_simulation("MIA", "MIL", team_db_20)
## [1] "MIA is projected to beat MIL with a 53 % chance of winning in 6 games but MIL has a 47 % chance of beating MIA"
run_simulation("PHI", "WAS", team_db_20)
## [1] "PHI is projected to beat WAS with a 54 % chance of winning in 6 games but WAS has a 46 % chance of beating PHI"
run_simulation("DEN", "POR", team_db_20)
## [1] "POR is projected to beat DEN with a 63 % chance of winning in 7 games but DEN has a 37 % chance of beating POR"
run_simulation("DAL", "LAC", team_db_20)
## [1] "DAL is projected to beat LAC with a 60 % chance of winning in 6 games but LAC has a 40 % chance of beating DAL"
run_simulation("PHX", "LAL", team_db_20)
## [1] "PHX is projected to beat LAL with a 61 % chance of winning in 6 games but LAL has a 39 % chance of beating PHX"
run_simulation("UTA", "MEM", team_db_20)
## [1] "UTA is projected to beat MEM with a 67 % chance of winning in 6 games but MEM has a 33 % chance of beating UTA"
# Round 2
run_simulation("ATL", "PHI", team_db_20)
## [1] "ATL is projected to beat PHI with a 63 % chance of winning in 6 games but PHI has a 37 % chance of beating ATL"
run_simulation("BKN", "MIA", team_db_20)
## [1] "BKN is projected to beat MIA with a 53 % chance of winning in 6 games but MIA has a 47 % chance of beating BKN"
run_simulation("DAL", "UTA", team_db_20)
## [1] "UTA is projected to beat DAL with a 61 % chance of winning in 7 games but DAL has a 39 % chance of beating UTA"
run_simulation("PHX", "POR", team_db_20)
## [1] "PHX is projected to beat POR with a 52 % chance of winning in 6 games but POR has a 48 % chance of beating PHX"
# Round 3
run_simulation("ATL", "BKN", team_db_20)
## [1] "ATL is projected to beat BKN with a 57 % chance of winning in 6 games but BKN has a 43 % chance of beating ATL"
run_simulation("PHX", "UTA", team_db_20)
## [1] "PHX is projected to beat UTA with a 62 % chance of winning in 6 games but UTA has a 38 % chance of beating PHX"
# Finals
run_simulation("PHX", "ATL", team_db_20)
## [1] "PHX is projected to beat ATL with a 50 % chance of winning in 6 games but ATL has a 50 % chance of beating PHX"
# Round 1
run_simulation("BOS", "BKN", team_db_21)
## [1] "BOS is projected to beat BKN with a 66 % chance of winning in 6 games but BKN has a 34 % chance of beating BOS"
run_simulation("MIA", "ATL", team_db_21)
## [1] "MIA is projected to beat ATL with a 58 % chance of winning in 6 games but ATL has a 42 % chance of beating MIA"
run_simulation("MIL", "CHI", team_db_21)
## [1] "MIL is projected to beat CHI with a 69 % chance of winning in 6 games but CHI has a 31 % chance of beating MIL"
run_simulation("TOR", "PHI", team_db_21)
## [1] "TOR is projected to beat PHI with a 57 % chance of winning in 6 games but PHI has a 43 % chance of beating TOR"
run_simulation("DAL", "UTA", team_db_21)
## [1] "DAL is projected to beat UTA with a 78 % chance of winning in 5 games but UTA has a 22 % chance of beating DAL"
run_simulation("GSW", "DEN", team_db_21)
## [1] "GSW is projected to beat DEN with a 59 % chance of winning in 6 games but DEN has a 41 % chance of beating GSW"
run_simulation("MEM", "MIN", team_db_21)
## [1] "MEM is projected to beat MIN with a 60 % chance of winning in 6 games but MIN has a 40 % chance of beating MEM"
run_simulation("PHX", "NOP", team_db_21)
## [1] "PHX is projected to beat NOP with a 65 % chance of winning in 6 games but NOP has a 35 % chance of beating PHX"
# Round 2
run_simulation("BOS", "MIL", team_db_21)
## [1] "BOS is projected to beat MIL with a 64 % chance of winning in 6 games but MIL has a 36 % chance of beating BOS"
run_simulation("TOR", "MIA", team_db_21)
## [1] "TOR is projected to beat MIA with a 53 % chance of winning in 6 games but MIA has a 47 % chance of beating TOR"
run_simulation("DAL", "PHX", team_db_21)
## [1] "DAL is projected to beat PHX with a 69 % chance of winning in 6 games but PHX has a 31 % chance of beating DAL"
run_simulation("MEM", "GSW", team_db_21)
## [1] "MEM is projected to beat GSW with a 56 % chance of winning in 6 games but GSW has a 44 % chance of beating MEM"
# Round 3
run_simulation("BOS", "TOR", team_db_21)
## [1] "BOS is projected to beat TOR with a 55 % chance of winning in 6 games but TOR has a 45 % chance of beating BOS"
run_simulation("MEM", "DAL", team_db_21)
## [1] "DAL is projected to beat MEM with a 61 % chance of winning in 7 games but MEM has a 39 % chance of beating DAL"
# Finals
run_simulation("BOS", "DAL", team_db_21)
## [1] "BOS is projected to beat DAL with a 50 % chance of winning in 6 games but DAL has a 50 % chance of beating BOS"
# Round 1
run_simulation("BOS", "ATL", team_db_22)
## [1] "BOS is projected to beat ATL with a 76 % chance of winning in 5 games but ATL has a 24 % chance of beating BOS"
run_simulation("MIA", "MIL", team_db_22)
## [1] "MIL is projected to beat MIA with a 62 % chance of winning in 7 games but MIA has a 38 % chance of beating MIL"
run_simulation("CLE", "NYK", team_db_22)
## [1] "CLE is projected to beat NYK with a 59 % chance of winning in 6 games but NYK has a 41 % chance of beating CLE"
run_simulation("PHI", "BKN", team_db_22)
## [1] "PHI is projected to beat BKN with a 70 % chance of winning in 6 games but BKN has a 30 % chance of beating PHI"
run_simulation("DEN", "MIN", team_db_22)
## [1] "MIN is projected to beat DEN with a 68 % chance of winning in 4 games but DEN has a 32 % chance of beating MIN"
run_simulation("GSW", "SAC", team_db_22)
## [1] "GSW is projected to beat SAC with a 76 % chance of winning in 5 games but SAC has a 24 % chance of beating GSW"
run_simulation("LAL", "MEM", team_db_22)
## [1] "LAL is projected to beat MEM with a 65 % chance of winning in 6 games but MEM has a 35 % chance of beating LAL"
run_simulation("LAC", "PHX", team_db_22)
## [1] "LAC is projected to beat PHX with a 61 % chance of winning in 6 games but PHX has a 39 % chance of beating LAC"
# Round 2
run_simulation("BOS", "PHI", team_db_22)
## [1] "BOS is projected to beat PHI with a 56 % chance of winning in 6 games but PHI has a 44 % chance of beating BOS"
run_simulation("MIL", "CLE", team_db_22)
## [1] "MIL is projected to beat CLE with a 54 % chance of winning in 6 games but CLE has a 46 % chance of beating MIL"
run_simulation("LAC", "MIN", team_db_22)
## [1] "LAC is projected to beat MIN with a 60 % chance of winning in 6 games but MIN has a 40 % chance of beating LAC"
run_simulation("LAL", "GSW", team_db_22)
## [1] "LAL is projected to beat GSW with a 59 % chance of winning in 6 games but GSW has a 41 % chance of beating LAL"
# Round 3
run_simulation("MIL", "BOS", team_db_22)
## [1] "BOS is projected to beat MIL with a 63 % chance of winning in 7 games but MIL has a 37 % chance of beating BOS"
run_simulation("LAL", "LAC", team_db_22)
## [1] "LAL is projected to beat LAC with a 51 % chance of winning in 6 games but LAC has a 49 % chance of beating LAL"
# Finals
run_simulation("BOS", "LAL", team_db_22)
## [1] "BOS is projected to beat LAL with a 51 % chance of winning in 6 games but LAL has a 49 % chance of beating BOS"
# Round 1
run_simulation("BOS", "MIA", team_db_23)
## [1] "BOS is projected to beat MIA with a 67 % chance of winning in 6 games but MIA has a 33 % chance of beating BOS"
run_simulation("CLE", "ORL", team_db_23)
## [1] "CLE is projected to beat ORL with a 64 % chance of winning in 6 games but ORL has a 36 % chance of beating CLE"
run_simulation("IND", "MIL", team_db_23)
## [1] "IND is projected to beat MIL with a 77 % chance of winning in 5 games but MIL has a 23 % chance of beating IND"
run_simulation("NYK", "PHI", team_db_23)
## [1] "PHI is projected to beat NYK with a 65 % chance of winning in 5 games but NYK has a 35 % chance of beating PHI"
run_simulation("DAL", "LAC", team_db_23)
## [1] "DAL is projected to beat LAC with a 74 % chance of winning in 6 games but LAC has a 26 % chance of beating DAL"
run_simulation("LAL", "DEN", team_db_23)
## [1] "LAL is projected to beat DEN with a 77 % chance of winning in 5 games but DEN has a 23 % chance of beating LAL"
run_simulation("PHX", "MIN", team_db_23)
## [1] "MIN is projected to beat PHX with a 62 % chance of winning in 7 games but PHX has a 38 % chance of beating MIN"
run_simulation("OKC", "NOP", team_db_23)
## [1] "OKC is projected to beat NOP with a 65 % chance of winning in 6 games but NOP has a 35 % chance of beating OKC"
# Round 2
run_simulation("BOS", "CLE", team_db_23)
## [1] "BOS is projected to beat CLE with a 67 % chance of winning in 6 games but CLE has a 33 % chance of beating BOS"
run_simulation("IND", "PHI", team_db_23)
## [1] "IND is projected to beat PHI with a 52 % chance of winning in 6 games but PHI has a 48 % chance of beating IND"
run_simulation("DAL", "OKC", team_db_23)
## [1] "DAL is projected to beat OKC with a 55 % chance of winning in 6 games but OKC has a 45 % chance of beating DAL"
run_simulation("MIN", "LAL", team_db_23)
## [1] "LAL is projected to beat MIN with a 62 % chance of winning in 7 games but MIN has a 38 % chance of beating LAL"
# Round 3
run_simulation("IND", "BOS", team_db_23)
## [1] "IND is projected to beat BOS with a 54 % chance of winning in 6 games but BOS has a 46 % chance of beating IND"
run_simulation("LAL", "DAL", team_db_23)
## [1] "LAL is projected to beat DAL with a 52 % chance of winning in 6 games but DAL has a 48 % chance of beating LAL"
# Finals
run_simulation("LAL", "IND", team_db_23)
## [1] "LAL is projected to beat IND with a 61 % chance of winning in 6 games but IND has a 39 % chance of beating LAL"
At a high level, the model predicts NBA playoff series outcomes for each season from 2014 to 2023 using only that season's regular season results (e.g., 2014-15 regular season data is used to predict the playoff series played in 2015, both of which belong to the 2014 season). The model uses Elo ratings, a well-established method for assessing team strength, to calculate the probability of each team winning a game. Given the two teams' end-of-regular-season Elo ratings, it simulates each game of a best-of-seven series until one team reaches four wins, so every simulated series lasts between 4 and 7 games. The output names the projected winner, the projected number of games, and a win probability for each team, giving front office decision-makers a quick view of likely series outcomes to plan around. Running a matchup requires only the two team abbreviations and that season's team table, so the tool needs minimal statistical expertise to use.
Data Preparation
For each season from 2014 to 2023, the game data is filtered to regular season games only, so that each season's playoffs are predicted solely from that season's regular season. These games are then ordered by game ID and date, since Elo ratings depend on the order in which games are played. The refined data retains only the essential columns: season, game date, game ID, offensive team, offensive team name, defensive team, defensive team name, and game outcome (whether the offensive team won).
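The filtering and ordering described above can be sketched as follows. This is a minimal illustration in base R on toy data, not the project's actual code: the column names (`gametype`, `gamedate`, `nbagameid`, `off_team`, `def_team`, `off_win`), the coding `gametype == 2` for regular season games, and the helper name `prep_season` are all assumptions made for the example.

```r
# Toy stand-in for the game data; every column name here is hypothetical.
games <- data.frame(
  season    = c(2014, 2014, 2014, 2015),
  gametype  = c(2, 2, 4, 2),  # assumed coding: 2 = regular season, 4 = playoffs
  gamedate  = as.Date(c("2014-11-02", "2014-10-28", "2015-04-18", "2015-10-27")),
  nbagameid = c(102, 101, 401, 201),
  off_team  = c("BOS", "ATL", "BOS", "ATL"),
  def_team  = c("ATL", "BOS", "ATL", "BOS"),
  off_win   = c(1, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Keep one season's regular season games, in the order they were played,
# with only the columns needed for the Elo calculation.
prep_season <- function(games, yr) {
  reg <- games[games$season == yr & games$gametype == 2, ]
  reg <- reg[order(reg$gamedate, reg$nbagameid), ]
  reg[, c("season", "gamedate", "nbagameid", "off_team", "def_team", "off_win")]
}

reg_2014 <- prep_season(games, 2014)
```

Sorting before any rating is computed matters because each Elo update depends on the ratings left behind by all earlier games.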
Creating Elo Databases
To create team data frames, unique team identifiers and names are extracted for each season. Each team starts with an initial Elo rating of 1500.
Updating Elo Ratings
The Elo ratings are updated for each game through a loop that iterates through each season’s team and game data frames. For each game, the Elo ratings of the competing teams are retrieved and updated using the elo.calc function, which considers the game outcome, initial Elo ratings, and a k-factor that determines the adjustment magnitude. The updated ratings are then stored back in the team data frame.
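To make the update step concrete, here is a minimal base R sketch of a season-long Elo loop. It assumes the standard Elo update (rating change = k * (actual result - expected result)) with no extra adjustments, which may differ in detail from what elo.calc does. The helper names `expected_score` and `run_elo_season` and the toy columns `team_a`, `team_b`, and `a_win` are hypothetical.

```r
# Expected score for team A, the same expression used by elo.prob() in the simulation.
expected_score <- function(elo_a, elo_b) {
  1 / (1 + 10^((elo_b - elo_a) / 400))
}

# Hypothetical helper: apply Elo updates over games that are already in playing order.
# Every team starts at `start`; the winner gains exactly what the loser gives up.
run_elo_season <- function(games, k = 50, start = 1500) {
  teams <- sort(unique(c(games$team_a, games$team_b)))
  elo <- setNames(rep(start, length(teams)), teams)
  for (g in seq_len(nrow(games))) {
    a <- games$team_a[g]
    b <- games$team_b[g]
    delta <- k * (games$a_win[g] - expected_score(elo[[a]], elo[[b]]))
    elo[[a]] <- elo[[a]] + delta
    elo[[b]] <- elo[[b]] - delta
  }
  elo
}

# Toy schedule (made-up results, not real data).
games <- data.frame(
  team_a = c("BOS", "ATL", "BOS"),
  team_b = c("ATL", "CLE", "CLE"),
  a_win  = c(1, 0, 1),
  stringsAsFactors = FALSE
)
ratings <- run_elo_season(games)
```

An explicit loop is used here because each update depends on the ratings produced by the previous game, so the calculation is inherently sequential.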
Calculating Win Percentages
For each team, the number of wins and losses is calculated from the game results, and win percentages are computed as the ratio of wins to total games played.
Determining Conference Rankings
Teams are assigned to either the Eastern or Western conference and ranked within their conferences based on win percentages. Manual adjustments are made to correct the rankings for tie-breaker systems and play-in results, ensuring accurate playoff matchups.
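The win percentage and ranking steps can be sketched together as below. This is an illustrative base R version on made-up records, with a shortened `eastern` vector and a hypothetical `needs_tiebreak` table; unlike the code above, it uses `ties.method = "min"` so that tied teams share a rank and are easy to spot before any manual tie-breaker correction.

```r
# Toy standings: real team codes, made-up records.
standings <- data.frame(
  teams  = c("BOS", "ATL", "MIA", "GSW", "LAL", "DEN"),
  wins   = c(60, 45, 45, 57, 47, 57),
  losses = c(22, 37, 37, 25, 35, 25),
  stringsAsFactors = FALSE
)
eastern <- c("BOS", "ATL", "MIA")  # shortened conference list for the example

standings$win_pct <- standings$wins / (standings$wins + standings$losses)
standings$conference <- ifelse(standings$teams %in% eastern, "East", "West")

# Rank 1 = best record within each conference; tied teams get the same rank.
standings$conf_rank <- ave(
  -standings$win_pct, standings$conference,
  FUN = function(x) rank(x, ties.method = "min")
)

# Teams that share a rank with another team in their conference
# are the ones that need a manual tie-breaker.
key <- standings[c("conference", "conf_rank")]
needs_tiebreak <- standings[duplicated(key) | duplicated(key, fromLast = TRUE), ]
```

Listing the tied teams first narrows the manual overrides to the cases where the standings and the win percentages genuinely disagree.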
Organizing and Displaying Results
Teams are sorted by their updated Elo ratings and win percentages for each season. The team databases are then printed for review.
Playoff Simulation Execution
With the updated Elo ratings and accurate conference rankings, playoff series are simulated by iterating through matchups and determining the likelihood of a team winning a series based on their Elo ratings.
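If the per-game win probability is held constant across a series and games are treated as independent (which is what a fixed pair of Elo ratings implies), the chance of winning a best-of-7 can also be computed exactly rather than read off a single simulated series. The helper below is a sketch under those assumptions; `best_of_7_win_prob` is a hypothetical name and is not part of the code in this document.

```r
# Probability of winning a best-of-7 given a constant per-game win probability p.
# The series can be won with j = 0..3 losses; the clinching game is always a win,
# so the j losses can fall anywhere among the first 3 + j games.
best_of_7_win_prob <- function(p) {
  losses <- 0:3
  sum(choose(3 + losses, losses) * p^4 * (1 - p)^losses)
}

best_of_7_win_prob(0.5)  # evenly matched teams: 0.5
best_of_7_win_prob(0.6)  # a 60% per-game favorite wins the series about 71% of the time
```

The per-game probability `p` could come from `elo.prob`; home-court advantage is ignored in this sketch.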
Notes
The code uses the elo.calc function for updating Elo ratings, which follows the standard Elo rating formula:
$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$
where $E_A$ is the expected score for team A, and $R_A$ and $R_B$ are the Elo ratings for teams A and B, respectively. The k-factor, set to 50, determines the sensitivity of the Elo rating adjustments after each game. This setup allows for simulating playoff series outcomes from Elo ratings that have been updated after every regular season game, so the ratings entering the playoffs reflect how each team performed over the course of the season.
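For reference, the rating update that accompanies this expected score in the standard Elo system is written out below, with $S_A$ equal to 1 if team A won and 0 otherwise. The ratings in the worked example are illustrative and not taken from the project data.

```latex
R_A' = R_A + k\,(S_A - E_A), \qquad R_B' = R_B - k\,(S_A - E_A)

% Worked example with R_A = 1600, R_B = 1500, k = 50:
E_A = \frac{1}{1 + 10^{(1500 - 1600)/400}} = \frac{1}{1 + 10^{-0.25}} \approx 0.640

% If A wins  (S_A = 1):  R_A' \approx 1600 + 50\,(1 - 0.640) = 1618.0
% If A loses (S_A = 0):  R_A' \approx 1600 + 50\,(0 - 0.640) = 1568.0
```

The favorite gains less for a win (about 18 points) than it gives up for a loss (about 32 points), which is how the ratings account for the quality of the opponent.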
Strengths
* Uses a good sample size (the entire regular season leading up to the playoffs) to calculate playoff win percentage based on Elo
* Accounts for strength of schedule and quality of wins and losses (the amount Elo goes up or down depends on how good the opponent is: it goes up more for beating better teams and down more for losing to worse teams, and vice versa)
* Includes some randomness in its calculation, which simulates the randomness that can occur in the playoffs
Weaknesses
1) Some variability depending on whether teams are team1 or team2 in the run_simulation function, which shouldn’t be the case
2) Relies heavily on a team’s regular season performance
3) Doesn’t account for specific stats (e.g., points, 3-point %, assists, etc.)
4) Doesn’t incorporate player stats
5) Seems to default to 4 and 6 game series (not many 5 or 7 game series), indicating that the length of series isn’t calculated correctly
6) Involves some complicated calculations that can be hard to interpret
7) Simplified Elo Calculations: The model uses a basic Elo rating formula with a fixed k-factor, which may not capture the full complexity of team dynamics and matchups
8) Lack of Contextual Factors: The model does not account for injuries, player trades, or other contextual factors that could significantly impact team performance
9) Static Initial Ratings: All teams start with the same initial Elo rating, which may not accurately reflect the starting strength of teams each season
10) Manual Adjustments: The need for manual adjustments to correct rankings for tie-breaker systems and play-in results can introduce human error and inconsistencies
11) Simplified Outcome Prediction: The model predicts outcomes based on Elo ratings alone, without incorporating additional statistical methods or machine learning techniques that could improve prediction accuracy
12) Inefficient coding: A lot of the code is repeated, especially across the different seasons, and could be more efficient
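Weaknesses 1 and 5 may partly reflect the fact that each matchup is simulated only once: a single draw can favor either side and says little about how long a series typically lasts. One possible remedy, sketched below, is to repeat the simulation many times and summarize the results. The function name `simulate_series_many` and its arguments are hypothetical, and the sketch assumes a constant per-game win probability for team 1.

```r
# Simulate many best-of-7 series at once for a given per-game win probability p
simulate_series_many <- function(p, n_sims = 10000) {
  # One row per simulated series, one column per potential game (1 = team 1 wins)
  games <- matrix(rbinom(n_sims * 7, size = 1, prob = p), nrow = n_sims)
  # Whoever wins at least 4 of 7 potential games is also the first team to 4 wins
  team_1_won <- rowSums(games) >= 4
  # Re-code each row from the eventual winner's point of view
  winner_games <- games
  winner_games[!team_1_won, ] <- 1 - games[!team_1_won, ]
  # Series length = game in which the winner records win number 4
  series_length <- apply(winner_games, 1, function(g) which(cumsum(g) == 4)[1])
  list(
    team_1_win_rate = mean(team_1_won),
    length_dist     = prop.table(table(factor(series_length, levels = 4:7)))
  )
}

set.seed(2024)
sim_summary <- simulate_series_many(p = 0.5, n_sims = 20000)
sim_summary$team_1_win_rate
sim_summary$length_dist
```

Because the summary averages over many series, swapping which team is passed as team 1 only replaces `p` with `1 - p`, and the estimated advancement probabilities stay consistent up to simulation noise.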
7) Enhanced Elo Calculations: Introduce a dynamic k-factor that adjusts based on the importance of the game (e.g., regular season vs. playoffs) or the margin of victory. This would make the Elo rating adjustments more sensitive to context
8) Contextual Data Integration: Incorporate additional data such as player injuries, trades, and other contextual factors. This could be achieved by integrating real-time sports analytics and news sources
9) Customized Initial Ratings: Use historical performance data to assign more accurate initial Elo ratings for each team at the start of each season, reflecting their true starting strength.
10) Automated Adjustments: Develop an automated system for handling tie-breaker systems and play-in results to reduce human error and ensure consistency
11) Advanced Prediction Methods: Incorporate machine learning techniques to enhance outcome predictions. Techniques such as regression analysis, decision trees, or neural networks could be used to improve the accuracy of series outcome predictions based on a wider array of variables
12) Take more time to find better ways to be more efficient with my code
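Item 7 could be prototyped along the following lines. The multiplier used here (a logarithm of the margin of victory, plus a fixed boost for playoff games) is purely illustrative and has not been tuned against data; `dynamic_k`, `base_k`, and `playoff_boost` are hypothetical names.

```r
# Scale a base k-factor by margin of victory and game importance.
# log1p() dampens blowouts so a 30-point win does not count 30 times a 1-point win.
dynamic_k <- function(margin, is_playoff = FALSE, base_k = 20, playoff_boost = 1.25) {
  k <- base_k * log1p(abs(margin))
  k * ifelse(is_playoff, playoff_boost, 1)
}

dynamic_k(margin = c(1, 10, 30))            # k grows with the margin, but slowly
dynamic_k(margin = 10, is_playoff = TRUE)   # playoff games move ratings 25% more
```

The resulting value would replace the fixed k-factor of 50 in the rating update for that game.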
# Load team colors database (the `teamcolors` dataset from the teamcolors package)
library(teamcolors)
temp <- teamcolors
# Renaming LA Clippers in team_db to Los Angeles Clippers to match temp df
team_logos <- team_db %>%
  mutate(off_team_name = ifelse(off_team_name == "LA Clippers", "Los Angeles Clippers", off_team_name))
# Merge data and team colors
team_plot <- merge(team_logos, temp, by.x = "off_team_name", by.y = "name", all.x = TRUE)
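library(dplyr)
library(ggplot2)
library(gganimate)
library(ggimage)  # provides geom_image(), used in the plot below
library(elo)      # provides elo.prob(), used in sim_series()
library(grid)
library(png)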
# Function to simulate a playoff series (best of 7)
sim_series <- function(team_1, team_2, team_db) {
  series_res <- data.frame(game_res = rep(NA, 7),
                           game_win_prob = rep(NA, 7),
                           game_sim_val = rep(NA, 7))
  i <- 0
  team_1_wins <- 0
  team_2_wins <- 0
  # Get Elo ratings (held fixed for the whole series)
  team_1_elo <- team_db$elo[team_db$teams == team_1]
  team_2_elo <- team_db$elo[team_db$teams == team_2]
  # Play games until one team reaches 4 wins (at most 7 games)
  while (team_1_wins < 4 && team_2_wins < 4) {
    i <- i + 1
    # Calculate win probability for team_1
    series_res$game_win_prob[i] <- elo.prob(team_1_elo, team_2_elo)
    # Simulate game outcome: team_1 wins if the uniform draw is at or below its win probability
    series_res$game_sim_val[i] <- runif(1, min = 0, max = 1)
    if (series_res$game_sim_val[i] <= series_res$game_win_prob[i]) {
      series_res$game_res[i] <- 1
      team_1_wins <- team_1_wins + 1
    } else {
      series_res$game_res[i] <- 0
      team_2_wins <- team_2_wins + 1
    }
  }
  # Determine winner and loser.
  # Note: series_win_prob is the winner's average per-game win probability
  # over the games it won, not a series-level probability.
  if (team_1_wins == 4) {
    winner <- team_1
    loser <- team_2
    series_win_prob <- mean(series_res$game_win_prob[series_res$game_res == 1], na.rm = TRUE)
  } else {
    winner <- team_2
    loser <- team_1
    series_win_prob <- mean(1 - series_res$game_win_prob[series_res$game_res == 0], na.rm = TRUE)
  }
  num_games <- i
  return(list(winner = winner,
              loser = loser,
              series_res = series_res,
              num_games = num_games,
              series_win_prob = series_win_prob))
}
# Function to run the simulation and save the results
run_simulation <- function(team1, team2, team_db, round_num) {
  result <- sim_series(team_1 = team1, team_2 = team2, team_db = team_db)
  return(data.frame(
    team1 = team1,
    team2 = team2,
    winner = result$winner,
    loser = result$loser,
    # series_win_prob belongs to the winner, so flip it when team2 won;
    # win_prob is therefore always expressed from team1's perspective
    win_prob = ifelse(result$winner == team1, result$series_win_prob, 1 - result$series_win_prob),
    round = round_num
  ))
}
# Simulate the playoffs and save the results
results <- list()
# Round 1
results[[1]] <- run_simulation("BOS", "MIA", team_db_23, 1)
results[[2]] <- run_simulation("CLE", "ORL", team_db_23, 1)
results[[3]] <- run_simulation("IND", "MIL", team_db_23, 1)
results[[4]] <- run_simulation("NYK", "PHI", team_db_23, 1)
results[[5]] <- run_simulation("DAL", "LAC", team_db_23, 1)
results[[6]] <- run_simulation("LAL", "DEN", team_db_23, 1)
results[[7]] <- run_simulation("PHX", "MIN", team_db_23, 1)
results[[8]] <- run_simulation("OKC", "NOP", team_db_23, 1)
# Round 2
results[[9]] <- run_simulation("BOS", "CLE", team_db_23, 2)
results[[10]] <- run_simulation("IND", "PHI", team_db_23, 2)
results[[11]] <- run_simulation("DAL", "OKC", team_db_23, 2)
results[[12]] <- run_simulation("MIN", "LAL", team_db_23, 2)
# Round 3
results[[13]] <- run_simulation("IND", "BOS", team_db_23, 3)
results[[14]] <- run_simulation("LAL", "DAL", team_db_23, 3)
# Finals
results[[15]] <- run_simulation("LAL", "IND", team_db_23, 4)
# Combine all results into a single data frame
playoff_results <- do.call(rbind, results)
# Merge playoff results with team logos
playoff_results <- playoff_results %>%
  left_join(team_plot, by = c("team1" = "teams")) %>%
  rename(team1_logo = logo) %>%
  left_join(team_plot, by = c("team2" = "teams")) %>%
  rename(team2_logo = logo)
# Prepare data for plotting
plot_data <- playoff_results %>%
  pivot_longer(cols = c(team1, team2), names_to = "team_type", values_to = "teams") %>%
  mutate(win_prob = ifelse(team_type == "team1", win_prob, 1 - win_prob),
         logo = ifelse(team_type == "team1", team1_logo, team2_logo)) %>%
  arrange(desc(win_prob))
# Plot for win% of that round (% chance of advancing to next round - or winning Finals for round 4)
# plot_data already holds one row per team per series with that team's own
# win probability and logo, so keep those columns directly; pairing
# winner/loser with team1/team2 values mismatches them whenever team2 wins
plot_data_2 <- plot_data %>%
  select(teams, win_prob, round, logo) %>%
  distinct(teams, round, .keep_all = TRUE)
# Create the animated plot
p <- ggplot(plot_data_2, aes(x = teams, y = win_prob)) +
  geom_image(aes(image = logo), size = 0.1) +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  labs(x = "Teams", y = "Win Probability", title = "Win Probabilities by Round", subtitle = "Round: {closest_state}") +
  theme_minimal() +
  transition_states(round, transition_length = 2, state_length = 3) +
  enter_fade() +
  exit_fade()
# Render the animation
animate(p, fps = 10)
Find two teams that had a competitive window of 2 or more consecutive seasons making the playoffs and that under performed your model’s expectations for them, losing series they were expected to win. Why do you think that happened? Classify one of them as bad luck and one of them as relating to a cause not currently accounted for in your model. If given more time and data, how would you use what you found to improve your model?
ANSWER :
One team that the model projected to go far in the playoffs for 2 consecutive years, but that underperformed in real life, was the Portland Trail Blazers during the 2018-19 and 2019-20 seasons. According to the model, the Trail Blazers were projected to win the NBA Finals in 2018-19, but they were swept by the Golden State Warriors in the Western Conference Finals (Round 3). In 2019-20, they were projected to make it to the Western Conference Finals but lost to the LA Lakers in the 1st Round.
I think the Trail Blazers’ lack of success in real life relative to the model was mostly due to bad luck, particularly injuries. In the 2018 season, Damian Lillard suffered a separated rib in Game 2 of the Western Conference Finals and was not at 100% for the rest of the series. Lillard averaged 33 ppg on 46% FG and 48% 3P in the 1st Round but only 22.3 ppg on 37% FG and 37% 3P in the Conference Finals. It was a devastating blow for the Trail Blazers not to have their star player at full health against a great Warriors team. The Trail Blazers were also missing their starting center, Jusuf Nurkic, for that series (and the entire playoffs) due to a broken leg that ended his season. Nurkic had been averaging more than 15 points and 10 rebounds on 51% FG, and he would have been a significant addition in that series.
In 2019, Damian Lillard got hurt again in the playoffs, this time injuring his knee in Game 4 of the 1st Round. The Trail Blazers had won Game 1 and narrowly lost Game 3; a win there would have put them up 2-1, and if Lillard doesn’t go down in Game 4, they possibly win that game as well to go up 3-1. They also lost Game 5 by only single digits without Lillard; with him in the lineup, they could easily have won that game and possibly won the series 4-1 or clinched it in 6 or 7 games.
Another team that the model projected to go far in the playoffs for 2 consecutive years, but that underperformed in real life, was the Boston Celtics in 2014-15 and 2015-16. The model had the Celtics going to the Finals in both seasons, but they lost in the 1st Round to the Cavs in 2014-15 and to the Hawks in 2015-16.
I believe that this discrepancy between the model and real life is due to how the model works. The model does not incorporate player stats and thus does not account for star power. In the NBA playoffs, it is important to have star players who can carry the load, especially offensively: as rotations become tighter, opposing defenses’ intensity rises, and game planning improves, it is incumbent on the team’s stars to shine through and deliver. During this run, the Boston Celtics had only 1 All-Star caliber player in Isaiah Thomas, which typically isn’t enough to be a Finals contender. I think the model overrated their regular season play (Elo rating) since they were a well-rounded team with a great coach in Brad Stevens: they didn’t lose to many bad teams (which would have significantly dropped their Elo) and they had wins against really good teams (which significantly boosted their Elo).
Taking a look at the 2018-2019 Trail Blazers and the 2014-2015 Boston Celtics shows some of the weaknesses of the model mentioned earlier and makes it clear that the model does not account for injuries or the level of talent (star power) on a team. In the future, I would want to incorporate a team’s health into the playoff simulation since it is such an important factor in winning a championship. I would add a column that keeps track of player availability and use it as a component of the simulation. I would also want to incorporate a team’s level of talent and give more favorable odds to teams with more All-Stars and players with high stats, especially points. To win a championship, teams typically need at least 2 star players. Teams with good players and coaching may be able to have success in the regular season, such as the 60-win Atlanta Hawks in 2014-15, but not having at least 2 All-NBA caliber players drastically decreases a team’s chances of winning a ring.
I might also incorporate a team’s playoff experience in the model because young teams without deep playoff runs rarely win the championship. It’s typically teams with players who have played in the Finals (or at least have had a couple of years of deep runs) that win. Take the Boston Celtics, for example: they lost in 2022 against the Warriors even though they had a more talented team and had over an 80% chance of winning according to ESPN’s BPI model. I think the experience of GSW’s main players like Curry and Draymond had a lot to do with it. They had been there and done that and were able to stay on course even after dropping Game 1 at home and being down 2-1 at the start of the series. Fast forward to this past Finals: the Celtics were able to beat the Mavericks, and the Celtics’ core players had more playoff and Finals experience than Dallas’s.
Going up against the best teams (players and coaches) in the league that are specifically game-planning for you and playing in the brightest of lights where the pressure is on, you need players who know what to do in high-pressure situations, which comes with experience.