Introduction

The purpose of this project is to gauge your technical skills and problem solving ability by working through something similar to a real NBA data science project. You will work your way through this R Markdown document, answering questions as you go along. Please begin by adding your name to the “author” key in the YAML header. When you’re finished with the document, come back and type your answers into the answer key at the top. Please leave all your work below and have your answers where indicated below as well. Please note that we will be reviewing your code so make it clear, concise and avoid long printouts. Feel free to add in as many new code chunks as you’d like.

Remember that we will be grading the quality of your code and visuals alongside the correctness of your answers. Please try to use the tidyverse as much as possible (instead of base R and explicit loops). Please do not bring in any outside data.

Note:

Throughout this document, any season column represents the year each season started. For example, the 2015-16 season will be in the dataset as 2015. For most of the rest of the project, we will refer to a season by just this number (e.g. 2015) instead of the full text (e.g. 2015-16).

Answers

library(tidyverse)

# Importing the player & team data
library(readr)
team_game_data <- read_csv("team_game_data.csv")
player_game_data <- read_csv("player_game_data.csv")

# Converting from tibble to data frame
team_game_data <- as.data.frame(team_game_data)
player_game_data <- as.data.frame(player_game_data)

Part 1

Question 1:

Offensive: 56.3% eFG
Defensive: 47.9% eFG

Question 2: 40.9%

Question 3: 23.1%

Question 4: This is a written question. Please leave your response in the document under Question 4.

Question 5: 83.5% of games

Question 6:

Round 1: 84.7%
Round 2: 63.9%
Conference Finals: 55.6%
Finals: 77.8%

Question 7:

Percent of +5.0 net rating teams making the 2nd round next year: 63.6%
Percent of top 5 minutes played players who played in those 2nd round series: 78.1%

Part 2

Please show your work in the document, you don’t need anything here.

Part 3

Please write your response in the document, you don’t need anything here.

Setup and Data

Part 1 – Data Cleaning

In this section, you’re going to work to answer questions using data from both team and player stats. All provided stats are on the game level.

Question 1

QUESTION: What was the Warriors’ Team offensive and defensive eFG% in the 2015-16 regular season? Remember that this is in the data as the 2015 season.

# According to (Source - https://www.breakthroughbasketball.com/stats/effective-field-goal-percentage.html), eFG% = (2pt FGM + 1.5*3pt FGM) / FGA. 

# Calculating offensive eFG% for the GSW Warriors in the regular season (gametype == 2) for the 2015 season
GSW_off_efg <- team_game_data %>% 
  filter(gametype == 2 & season == 2015 & off_team == "GSW") %>%
  summarise(off_efg = round((sum(fg2made) + 1.5*sum(fg3made))/sum(fgattempted)*100, 1))

# Calculating defensive eFG% for the GSW Warriors in the regular season for the 2015 season
GSW_def_efg <- team_game_data %>% 
  filter(gametype == 2 & season == 2015 & def_team == "GSW") %>%
  summarise(def_efg = round((sum(fg2made) + 1.5*sum(fg3made))/sum(fgattempted)*100, 1))

GSW_off_efg # GSW Warriors' offensive eFG% for the 2015 season

##   off_efg
## 1    56.3

GSW_def_efg # GSW Warriors' defensive eFG% for the 2015 season

##   def_efg
## 1    47.9

ANSWER 1:

Offensive: 56.3% eFG
Defensive: 47.9% eFG
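
As a quick check on the formula used above, the sketch below computes eFG% for a single made-up box score. The function name `efg_pct` and the shooting totals are illustrative assumptions, not values from the project data.

```r
# Effective field goal percentage: a made three counts 1.5x a made two
efg_pct <- function(fg2made, fg3made, fgattempted) {
  (fg2made + 1.5 * fg3made) / fgattempted
}

# Hypothetical totals: 28 twos and 12 threes on 85 attempts
toy_efg <- efg_pct(fg2made = 28, fg3made = 12, fgattempted = 85)
round(100 * toy_efg, 1)
```

Summing makes and attempts before dividing, as the chunks above do, weights every shot equally; averaging per-game percentages would instead weight every game equally.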

Question 2

QUESTION: What percent of the time does the team with the higher eFG% in a given game win that game? Use games from the 2014-2023 regular seasons. If the two teams have an exactly equal eFG%, remove that game from the calculation.

# Making a df with an offensive eFG% column for the 2014-2023 regular seasons, ordered by nbagameid so that the two teams from the same game sit in adjacent rows. This ordering matters for the next chunk, where defensive eFG% is filled in with the swap_pair_of_stats function

team_game_data_2 <- team_game_data %>%
  filter(gametype == 2) %>%
  mutate(off_efg = round(((fg2made + 1.5*fg3made)/fgattempted*100), 1)) %>%
  arrange(nbagameid)

# creating def_efg column by taking the off_efg of the team they faced (team with the same nbagameid) using swap_pair_of_stats function

swap_pair_of_stats <- function(x) {
  for (i in seq(1, length(x) - 1, by = 2)) {
    temp <- x[i]
    x[i] <- x[i + 1]
    x[i + 1] <- temp
  }
  return(x)
}

team_game_data_2$def_efg <- swap_pair_of_stats(team_game_data_2$off_efg)

# Calculating the winning percentage of teams that had the higher eFG% in the game they played while excluding games where teams had the same eFG%

team_game_data_2 %>% 
  filter(off_efg != def_efg) %>%
  summarise(efg_winner = round(100*sum((off_win == 1 & off_efg > def_efg))/n(),1))

##   efg_winner
## 1       40.9

ANSWER 2:

40.9%
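
One thing worth checking in this calculation is the denominator. Each game appears twice in the team-level data (once per team on offense), so dividing by `n()` over all rows divides by twice the number of games. The sketch below is one possible alternative under stated assumptions: it pairs each team with its opponent through an explicit join on the game ID instead of relying on row order, then keeps one row per game (the team with the higher eFG%). The toy data and the names `toy_games`, `opponent_efg` and `higher_efg_result` are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: two rows per game, one for each team on offense
toy_games <- data.frame(
  nbagameid = c(1, 1, 2, 2, 3, 3),
  off_team  = c("AAA", "BBB", "AAA", "CCC", "BBB", "CCC"),
  def_team  = c("BBB", "AAA", "CCC", "AAA", "CCC", "BBB"),
  off_win   = c(1, 0, 0, 1, 1, 0),
  off_efg   = c(55, 50, 52, 48, 51, 51)
)

# The opponent's offensive eFG% is this team's defensive eFG%
opponent_efg <- toy_games %>%
  select(nbagameid, def_team = off_team, def_efg = off_efg)

higher_efg_result <- toy_games %>%
  inner_join(opponent_efg, by = c("nbagameid", "def_team")) %>%
  filter(off_efg > def_efg) %>%   # one row per game; tied games drop out
  summarise(games = n(), win_pct = 100 * mean(off_win == 1))

higher_efg_result
```

In the toy data the higher-eFG% team wins game 1 and loses game 2, and game 3 is a tie, so the result covers two games. The same pattern would apply to the offensive rebound comparison in Question 3.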

Question 3

QUESTION: What percent of the time does the team with more offensive rebounds in a given game win that game? Use games from the 2014-2023 regular seasons. If the two teams have an exactly equal number of offensive rebounds, remove that game from the calculation.

# creating a new df that includes the number of offensive rebounds a team conceded while they were on defense by taking the reboffensive from the team with the same nbagameid using swap_pair_of_stats function for the 2014-2023 regular seasons (similar process to question #2 above)

team_game_data_3 <- team_game_data %>%
  filter(gametype == 2) %>%
  arrange(nbagameid) %>%
  mutate(offreb_conceded = swap_pair_of_stats(reboffensive), .after = reboffensive)

# Calculating the win percentage of teams that had more offensive rebounds than the team they faced & excluding games where teams had the same amount of offensive rebounds 

team_game_data_3 %>% 
  filter(reboffensive != offreb_conceded) %>%
  summarise(offreb_winner = round(100*sum((off_win == 1 & reboffensive > offreb_conceded))/n(),1))

##   offreb_winner
## 1          23.1

ANSWER 3:

23.1%

Question 4

QUESTION: Do you have any theories as to why the answer to question 3 is lower than the answer to question 2? Try to be clear and concise with your answer.

ANSWER 4:

To win a basketball game, you need to score more points than your opponent, and effective field goal percentage (eFG%) is tied much more directly to scoring than offensive rebounding is. A higher eFG% means you will outscore your opponent as long as both teams take roughly the same number of shots and make roughly the same number of free throws. Grabbing more offensive rebounds, by contrast, only provides additional scoring chances; it does not guarantee more points, because the follow-up attempt can still miss. In addition, the pace of the game has increased significantly, so teams already get plenty of possessions and shot attempts, and a couple of extra chances from offensive rebounds matter less than higher-quality, more efficient possessions. eFG% captures that efficiency, including the added value of 3-pointers, which it weights more heavily than 2-pointers. Making 3-pointers has become instrumental to winning over the past decade, and eFG% reflects that better than offensive rebound counts do.

Question 5

QUESTION: Look at players who played at least 25% of their possible games in a season and scored at least 25 points per game played. Of those player-seasons, what percent of games were they available for on average? Use games from the 2014-2023 regular seasons.

For example:

Ja Morant does not count in the 2023-24 season, as he played just 9 out of 82 games this year, even though he scored 25.1 points per game.
Chet Holmgren does not count in the 2023-24 season, as he played all 82 games this year but scored 16.5 points per game.
LeBron James does count in the 2023-24 season, as he played 71 games and scored 25.7 points per game.

# Calculating the total number of games played by each team in each season
reg_games <- player_game_data %>%
  filter(gametype == 2) %>%
  group_by(team, season) %>%
  summarise(total_games = sum(starter) / 5)

## `summarise()` has grouped output by 'team'. You can override using the
## `.groups` argument.

head(reg_games)

## # A tibble: 6 × 3
## # Groups:   team [1]
##   team  season total_games
##   <chr>  <dbl>       <dbl>
## 1 ATL     2014          82
## 2 ATL     2015          82
## 3 ATL     2016          82
## 4 ATL     2017          82
## 5 ATL     2018          82
## 6 ATL     2019          67

# Calculating the percentage of games each player was available for while calculating the number of games they played, their ppg that season and the number of games they were unavailable 
player_availability <- player_game_data %>%
  filter(gametype == 2) %>%
  group_by(player_name, team, season) %>%
  summarise(
    games_played = sum(seconds > 0),
    points_per_game = ifelse(games_played > 0, sum(points)/games_played, 0),
    total_missed = sum(missed)
  ) %>%
  left_join(reg_games, by = c("team", "season")) %>%
  mutate(availability = (total_games - total_missed) / total_games)

## `summarise()` has grouped output by 'player_name', 'team'. You can override
## using the `.groups` argument.

head(player_availability)

## # A tibble: 6 × 8
## # Groups:   player_name, team [4]
##   player_name team  season games_played points_per_game total_missed total_games
##   <chr>       <chr>  <dbl>        <int>           <dbl>        <dbl>       <dbl>
## 1 A.J. Green  MIL     2022           35            4.4            11          82
## 2 A.J. Green  MIL     2023           56            4.5             4          82
## 3 A.J. Hammo… DAL     2016           22            2.18            1          82
## 4 A.J. Hammo… MIA     2017            0            0              55          82
## 5 A.J. Lawson DAL     2022           14            3.86            5          82
## 6 A.J. Lawson DAL     2023           42            3.24            0          82
## # ℹ 1 more variable: availability <dbl>

# Filtering players who played at least 25% of their team's games and scored at least 25 points per game
player_game_data_5 <- player_availability %>%
  filter(
    points_per_game >= 25,
    games_played >= 0.25 * total_games
  )

# View the resulting data
head(player_game_data_5)

## # A tibble: 6 × 8
## # Groups:   player_name, team [3]
##   player_name team  season games_played points_per_game total_missed total_games
##   <chr>       <chr>  <dbl>        <int>           <dbl>        <dbl>       <dbl>
## 1 Anthony Da… LAL     2019           62            26.1            9          71
## 2 Anthony Da… LAL     2022           56            25.9           25          82
## 3 Anthony Da… NOP     2016           75            28.0            7          82
## 4 Anthony Da… NOP     2017           75            28.1            7          82
## 5 Anthony Da… NOP     2018           56            25.9           25          82
## 6 Anthony Ed… MIN     2023           79            25.9            3          82
## # ℹ 1 more variable: availability <dbl>

# Getting the average percentage of games that players were available for, among those who played at least 25% of their team's games and averaged at least 25 points per game
round(mean(player_game_data_5$availability)*100, 1)

## [1] 83.5

ANSWER 5:

83.5% of games
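
The filter-then-average logic above can be illustrated on a tiny made-up example. This is only a sketch: the players, the four-game "season" held in `team_games`, and the object names are hypothetical. Passing `.groups = "drop"` to `summarise()` returns an ungrouped result and avoids the grouping message shown in the output above.

```r
library(dplyr)

# Hypothetical player-game rows for one team with a 4-game season
toy_players <- data.frame(
  player_name = rep(c("Player A", "Player B"), each = 4),
  team        = "AAA",
  season      = 2023,
  seconds     = c(1800, 0, 2000, 1900, 600, 700, 0, 0),
  points      = c(30, 0, 26, 28, 4, 6, 0, 0),
  missed      = c(0, 1, 0, 0, 0, 0, 1, 1)
)
team_games <- 4

toy_summary <- toy_players %>%
  group_by(player_name, team, season) %>%
  summarise(
    games_played    = sum(seconds > 0),
    points_per_game = sum(points) / games_played,
    availability    = (team_games - sum(missed)) / team_games,
    .groups = "drop"
  )

# Keep player-seasons with 25+ points per game and 25%+ of games played
qualifying <- toy_summary %>%
  filter(points_per_game >= 25, games_played >= 0.25 * team_games)

mean(qualifying$availability)
```

Only Player A qualifies here (28 points per game over 3 of 4 games), so the average availability is that single player's 75%.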

Question 6

QUESTION: What % of playoff series are won by the team with home court advantage? Give your answer by round. Use playoffs series from the 2014-2022 seasons. Remember that the 2023 playoffs took place during the 2022 season (i.e. 2022-23 season).

# Creating a df that has playoff data for each team for the 2014-2022 seasons
playoffs_6 <- team_game_data %>% 
  filter(gametype == 4, season %in% 2014:2022) %>%
  arrange(nbagameid, gamedate)

# Creating a function to extract a single digit from a number, counting positions from the right. It is applied to nbagameid to create columns for the playoff round and the game number within the series
extract_digit <- function(number, position) {
  # Convert the number to a character string
  number_str <- as.character(abs(number))
  
  # Reverse the string to simplify extraction from right to left
  number_str <- rev(strsplit(number_str, "")[[1]])
  
  # Check if the position is valid
  if(position > length(number_str) || position < 1) {
    return(NA) # Return NA if the position is out of bounds
  }
  
  # Extract the digit at the specified position
  digit <- number_str[position]
  
  # Convert the digit back to a numeric value
  digit <- as.numeric(digit)
  
  return(digit)
}

# Creating a round column based on nbagameid to know which round in the playoffs each game is played
playoffs_6$round <- sapply(playoffs_6$nbagameid, extract_digit, position = 3)

# Creating a series column based on nbagameid to know which series each game is played
playoffs_6 <- playoffs_6 %>%
  mutate(series = if_else(nbagameid < 0, 
                             -as.numeric(substr(as.character(abs(nbagameid)), 1, nchar(as.character(abs(nbagameid))) - 1)), 
                             as.numeric(substr(as.character(nbagameid), 1, nchar(as.character(nbagameid)) - 1))))

# Creating a game column based on nbagameid to know which game in the series each game is played
playoffs_6$game <- sapply(playoffs_6$nbagameid, extract_digit, position = 1)

# Identifying the team with home-court advantage in each series: it is the home team in Game 1, since the team with home-court advantage always hosts the first game of the series
home_court_teams <- playoffs_6 %>%
  filter(game == 1 & off_home == 1) %>%
  select(series, off_team) %>%
  rename(home_court_team = off_team)

# Determining the series winners
series_max_game <- playoffs_6 %>%
  group_by(series) %>%
  summarise(max_game = max(game)) %>%
  ungroup()

series_winners <- playoffs_6 %>%
  inner_join(series_max_game, by = "series") %>%
  filter(game == max_game & off_win == 1) %>%
  select(series, off_team) %>%
  rename(series_winner = off_team)

# Merging to identify wins by home-court teams
home_court_wins <- home_court_teams %>%
  inner_join(series_winners, by = "series") %>%
  mutate(home_court_win = home_court_team == series_winner)

# Calculating the percentage of wins by home-court teams by round
series_rounds <- playoffs_6 %>%
  distinct(series, round)

home_court_win_percentage <- home_court_wins %>%
  inner_join(series_rounds, by = "series") %>%
  group_by(round) %>%
  summarise(home_court_win_percentage = round(mean(home_court_win) * 100, 1))

# Percentage of teams with homecourt advantage that won the series from the 2014-2022 playoffs by round
home_court_win_percentage

## # A tibble: 4 × 2
##   round home_court_win_percentage
##   <dbl>                     <dbl>
## 1     1                      84.7
## 2     2                      63.9
## 3     3                      55.6
## 4     4                      77.8

ANSWER 6:

Round 1: 84.7%
Round 2: 63.9%
Conference Finals: 55.6%
Finals: 77.8%
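
The round, series and game columns can also be derived with vectorised integer arithmetic rather than string manipulation and `sapply()`. The sketch below assumes, as the code above does, that `nbagameid` is a whole number whose last digit is the game number and whose third digit from the right is the round. The helper name `digit_from_right` and the two IDs are hypothetical.

```r
# Vectorised digit lookup, counting positions from the right (1 = last digit)
digit_from_right <- function(x, position) {
  (abs(x) %/% 10^(position - 1)) %% 10
}

toy_ids <- c(41400231, 41400407)  # hypothetical IDs in the assumed layout

toy_lookup <- data.frame(
  nbagameid = toy_ids,
  round     = digit_from_right(toy_ids, 3),
  game      = digit_from_right(toy_ids, 1),
  series    = toy_ids %/% 10      # drop the game digit to identify the series
)
toy_lookup
```

Because the helper works on whole vectors, it can be used directly inside `mutate()` without a row-by-row apply.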

Question 7

QUESTION: Among teams that had at least a +5.0 net rating in the regular season, what percent of them made the second round of the playoffs the following year? Among those teams, what percent of their top 5 total minutes played players (regular season) in the +5.0 net rating season played in that 2nd round playoffs series? Use the 2014-2021 regular seasons to determine the +5 teams and the 2015-2022 seasons of playoffs data.

For example, the Thunder had a better than +5 net rating in the 2023 season. If we make the 2nd round of the playoffs next season (2024-25), we would qualify for this question. Our top 5 minutes played players this season were Shai Gilgeous-Alexander, Chet Holmgren, Luguentz Dort, Jalen Williams, and Josh Giddey. If three of them play in a hypothetical 2nd round series next season, it would count as 3/5 for this question.

Hint: The definition for net rating is in the data dictionary.

# Creating a new df that has team stats and adding 2 additional columns: 1) includes the number of points a team allowed while they were on defense (points scored by the other team with the same nbagameid) and 2) includes the number of defensive possessions (# of offensive possessions the other team with the same nbagameid had) a team had for each regular season from 2014-2021

team_game_data_7 <- team_game_data %>%
  filter(gametype == 2 & season %in% 2014:2021) %>%
  arrange(nbagameid) %>%
  mutate(points_allowed = swap_pair_of_stats(points)) %>%
  mutate(def_possessions = swap_pair_of_stats(possessions))

# Adding columns to include offensive rating, defensive rating and net rating to the df made in the chunk above

team_game_data_7 <- team_game_data_7 %>%
  mutate(off_rating = points/(possessions/100)) %>%
  mutate(def_rating = points_allowed/(def_possessions/100)) %>%
  mutate(net_rating = off_rating - def_rating)

# Getting the net ratings for all 30 teams for each regular season from 2014-2021
team_game_data_7_2 <- team_game_data_7 %>%
  group_by(off_team, season) %>%
  summarise(mean_net_rating = mean(net_rating, na.rm = TRUE)) %>%
  ungroup()

## `summarise()` has grouped output by 'off_team'. You can override using the
## `.groups` argument.

head(team_game_data_7_2)

## # A tibble: 6 × 3
##   off_team season mean_net_rating
##   <chr>     <dbl>           <dbl>
## 1 ATL        2014            6.21
## 2 ATL        2015            3.62
## 3 ATL        2016           -1.21
## 4 ATL        2017           -5.79
## 5 ATL        2018           -5.93
## 6 ATL        2019           -7.41

# Filtering to get the teams with at least a +5.0 net rating in each regular season from 2014-2021

plus5_net_rating <- team_game_data_7_2 %>%
  filter(mean_net_rating >= 5)

plus5_net_rating <- as.data.frame(plus5_net_rating)

# Getting the teams that made it to the 2nd round of the playoffs each season from 2015-2022 using the team playoff df from question 6 

unique_teams_second_round <- playoffs_6 %>%
  filter(season != 2014 & round == 2) %>%
  distinct(off_team, season)

head(unique_teams_second_round)

##   off_team season
## 1      ATL   2015
## 2      CLE   2015
## 3      MIA   2015
## 4      TOR   2015
## 5      GSW   2015
## 6      POR   2015

# Adding next_season in net_ratings to be one year ahead to align with the following year's playoffs
net_ratings_next_year <- plus5_net_rating %>%
  mutate(next_season = season + 1)

head(net_ratings_next_year)

##   off_team season mean_net_rating next_season
## 1      ATL   2014        6.207150        2015
## 2      BOS   2019        6.370748        2020
## 3      BOS   2021        7.783639        2022
## 4      CLE   2015        6.324848        2016
## 5      GSW   2014       10.629787        2015
## 6      GSW   2015       10.766214        2016

# Joining to check whether the same team reached the 2nd round of the playoffs the next year
teams_made_second_round_next_year <- unique_teams_second_round %>%
  inner_join(net_ratings_next_year, by = c("off_team", "season" = "next_season"))

# Getting the teams with +5 net rating during the 2014-2021 regular seasons that made the 2nd round of the playoffs the following year while naming the reg_season they had the +5 net rating and playoff_season as the season they made it to the 2nd round 
teams_made_second_round_next_year <- teams_made_second_round_next_year %>%
  rename(reg_season = season.y) %>%
  rename(playoff_season = season)

teams_made_second_round_next_year

##    off_team playoff_season reg_season mean_net_rating
## 1       ATL           2015       2014        6.207150
## 2       GSW           2015       2014       10.629787
## 3       SAS           2015       2014        6.464015
## 4       CLE           2016       2015        6.324848
## 5       GSW           2016       2015       10.766214
## 6       SAS           2016       2015       11.094909
## 7       HOU           2017       2016        5.687542
## 8       GSW           2017       2016       11.786450
## 9       TOR           2018       2017        7.990256
## 10      GSW           2018       2017        6.272020
## 11      HOU           2018       2017        8.570834
## 12      MIL           2019       2018        8.599643
## 13      TOR           2019       2018        5.726189
## 14      MIL           2020       2019        9.631936
## 15      LAC           2020       2019        6.485266
## 16      PHI           2021       2020        5.675074
## 17      MIL           2021       2020        5.564283
## 18      PHX           2021       2020        5.978664
## 19      BOS           2022       2021        7.783639
## 20      PHX           2022       2021        7.666856
## 21      GSW           2022       2021        5.676754

# Calculating the percentage of teams with a +5 net rating during the 2014-2021 regular seasons that made it to the 2nd round of the playoffs the next year 

round(nrow(teams_made_second_round_next_year)/nrow(plus5_net_rating)*100, 1)

## [1] 63.6

# Creating a df that has the regular season stats of the players on the teams with +5 net rating from the 2014-2021 regular seasons that made it to the 2nd round the next year

player_game_data_7  <- player_game_data %>%
  filter(gametype == 2) %>%
  semi_join(teams_made_second_round_next_year, by = c("team" = "off_team", "season" = "reg_season"))

# Getting the players who were top 5 in minutes for their team (only teams with +5 net rating from 2014-2021 regular seasons)

top_5_minutes <- player_game_data_7 %>%
  group_by(team, season, player_name) %>%
  summarise(minutes = sum(seconds/60)) %>%
  slice_max(minutes, n = 5) %>%
  rename(reg_season = season) %>%
  mutate(playoff_season = reg_season + 1)

## `summarise()` has grouped output by 'team', 'season'. You can override using
## the `.groups` argument.

# Creating a df that has the playoff stats of the players on the teams with +5 net rating from the 2014-2021 regular seasons that made it to the 2nd round the next year

player_playoff_data_7 <- player_game_data %>%
  mutate(round = sapply(player_game_data$nbagameid, extract_digit, position = 3)) %>%
  filter(gametype == 4, round == 2) %>%
  semi_join(teams_made_second_round_next_year, by = c("team" = "off_team", "season" = "playoff_season"))

top_5_minutes_next_playoffs <- inner_join(top_5_minutes, player_playoff_data_7, by = c("player_name", "team", "playoff_season" = "season"))

# Making a df of the players who played in the 2nd round 

top_5_minutes_2nd_round <-  top_5_minutes_next_playoffs %>%
  group_by(player_name, team, playoff_season) %>%
  filter(sum(seconds) > 0)

top_5_minutes_2nd_round <- as.data.frame(top_5_minutes_2nd_round)

# Getting the distinct players for each team/season 
top_5_minutes_2nd_round <- top_5_minutes_2nd_round %>%
  distinct(player_name, team, playoff_season)

# The percent of top 5 minutes played players who played in those 2nd round series
round(nrow(top_5_minutes_2nd_round)/nrow(top_5_minutes)*100, 1)

## [1] 78.1

ANSWER 7:

Percent of +5.0 net rating teams making the 2nd round next year: 63.6%
Percent of top 5 minutes played players who played in those 2nd round series: 78.1%
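
The data dictionary's definition of net rating is not reproduced in this document. Assuming it is points scored per 100 possessions minus points allowed per 100 possessions, there are two ways to get a season-level figure: average the per-game net ratings, as the code above does, or compute the rating once from season totals. The two can differ slightly, which may matter for teams close to the +5.0 cutoff. The sketch below shows the difference on a made-up two-game season; all numbers and object names are hypothetical.

```r
# Hypothetical two-game season for one team
toy_season <- data.frame(
  points          = c(100, 120),
  possessions     = c(100, 110),
  points_allowed  = c(90, 121),
  def_possessions = c(100, 110)
)

# Option 1: mean of per-game net ratings
per_game_net <- with(toy_season,
  100 * points / possessions - 100 * points_allowed / def_possessions)
mean_of_games <- mean(per_game_net)

# Option 2: net rating computed from season totals
season_total_net <- with(toy_season,
  100 * sum(points) / sum(possessions) -
    100 * sum(points_allowed) / sum(def_possessions))

c(mean_of_games = mean_of_games, season_total = season_total_net)
```

Here the per-game average is about 4.55 while the season-total version is about 4.29, because the second approach gives more weight to games with more possessions.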

Part 2 – Playoffs Series Modeling

For this part, you will work to fit a model that predicts the winner and the number of games in a playoffs series between any given two teams.

This is an intentionally open ended question, and there are multiple approaches you could take. Here are a few notes and specifications:

Your final output must include the probability of each team winning the series. For example: “Team A has a 30% chance to win and team B has a 70% chance.” instead of “Team B will win.” You must also predict the number of games in the series. This can be probabilistic or a point estimate.
You may use any data provided in this project, but please do not bring in any external sources of data.
You can only use data available prior to the start of the series. For example, you can’t use a team’s stats from the 2016-17 season to predict a playoffs series from the 2015-16 season.
The best models are explainable and lead to actionable insights around team and roster construction. We’re more interested in your thought process and critical thinking than we are in specific modeling techniques. Using smart features is more important than using fancy mathematical machinery.
Include, as part of your answer:

A brief written overview of how your model works, targeted towards a decision maker in the front office without a strong statistical background.
What you view as the strengths and weaknesses of your model.
How you’d address the weaknesses if you had more time and/or more data.
Apply your model to the 2024 NBA playoffs (2023 season) and create a high quality visual (a table, a plot, or a plotly) showing the 16 teams’ (that made the first round) chances of advancing to each round.
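
To make the required output concrete, here is a minimal sketch of how a single-game win probability can be turned into a series win probability and a distribution over series length for a best-of-seven. It assumes a constant per-game win probability `p` for team A and ignores home-court effects, which is a simplification rather than the approach taken in this document. The function name `series_length_probs` and the value 0.6 are hypothetical.

```r
# P(team A wins the series in exactly n games), for n = 4..7,
# assuming a constant single-game win probability p for team A
series_length_probs <- function(p) {
  n_games <- 4:7
  # Team A must win game n and exactly 3 of the previous n - 1 games
  choose(n_games - 1, 3) * p^4 * (1 - p)^(n_games - 4)
}

p_a <- 0.6  # hypothetical single-game win probability for team A
a_wins_in <- series_length_probs(p_a)
b_wins_in <- series_length_probs(1 - p_a)

series_summary <- data.frame(games = 4:7, a_wins = a_wins_in, b_wins = b_wins_in)
series_summary

sum(a_wins_in)                      # P(team A wins the series)
sum(4:7 * (a_wins_in + b_wins_in))  # expected number of games
```

The two columns together sum to 1, so the table gives both the series winner probabilities and a probabilistic answer for the number of games.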

library(ggplot2)
library(teamcolors)
library(ggdark)
library(ggimage)
source("elo_funcs.r")

# Making reg_data_year dfs to contain regular season team data for each season from 2014-2023
reg_data_14 <- team_game_data %>%
  filter(season == 2014, gametype == 2) %>%
  arrange(nbagameid, gamedate)

reg_data_15 <- team_game_data %>%
  filter(season == 2015, gametype == 2) %>%
  arrange(nbagameid, gamedate)

reg_data_16 <- team_game_data %>%
  filter(season == 2016, gametype == 2) %>%
  arrange(nbagameid, gamedate)

reg_data_17 <- team_game_data %>%
  filter(season == 2017, gametype == 2) %>%
  arrange(nbagameid, gamedate)

reg_data_18 <- team_game_data %>%
  filter(season == 2018, gametype == 2) %>%
  arrange(nbagameid, gamedate)

reg_data_19 <- team_game_data %>%
  filter(season == 2019, gametype == 2) %>%
  arrange(nbagameid, gamedate)

reg_data_20 <- team_game_data %>%
  filter(season == 2020, gametype == 2) %>%
  arrange(nbagameid, gamedate)

reg_data_21 <- team_game_data %>%
  filter(season == 2021, gametype == 2) %>%
  arrange(nbagameid, gamedate)

reg_data_22 <- team_game_data %>%
  filter(season == 2022, gametype == 2) %>%
  arrange(nbagameid, gamedate)

reg_data_23 <- team_game_data %>%
  filter(season == 2023, gametype == 2) %>%
  arrange(nbagameid, gamedate)

# Making game_res dfs that keep only the columns needed from the reg_data dfs above

game_res_14 <- reg_data_14[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]

game_res_15 <- reg_data_15[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]

game_res_16 <- reg_data_16[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]

game_res_17 <- reg_data_17[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]

game_res_18 <- reg_data_18[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]

game_res_19 <- reg_data_19[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]

game_res_20 <- reg_data_20[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]

game_res_21 <- reg_data_21[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]

game_res_22 <- reg_data_22[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]

game_res_23 <- reg_data_23[,c("season", "gamedate","nbagameid","off_team", "off_team_name","def_team", "def_team_name", "off_win")]

# Getting the unique teams from each year (each season uses its own game results)

teams_14 <- unique(game_res_14[,c("off_team", "off_team_name" )])
teams_15 <- unique(game_res_15[,c("off_team", "off_team_name" )])
teams_16 <- unique(game_res_16[,c("off_team", "off_team_name" )])
teams_17 <- unique(game_res_17[,c("off_team", "off_team_name" )])
teams_18 <- unique(game_res_18[,c("off_team", "off_team_name" )])
teams_19 <- unique(game_res_19[,c("off_team", "off_team_name" )])
teams_20 <- unique(game_res_20[,c("off_team", "off_team_name" )])
teams_21 <- unique(game_res_21[,c("off_team", "off_team_name" )])
teams_22 <- unique(game_res_22[,c("off_team", "off_team_name" )])
teams_23 <- unique(game_res_23[,c("off_team", "off_team_name" )])

# Create a data frame of teams for each season with a starting Elo of 1500,
# then name the first column "teams" and the third column "elo"
team_db_14 <- cbind.data.frame(teams_14, rep(1500, nrow(teams_14)))
names(team_db_14)[c(1,3)] <- c("teams" ,"elo")

team_db_15 <- cbind.data.frame(teams_15, rep(1500, nrow(teams_15)))
names(team_db_15)[c(1,3)] <- c("teams" ,"elo")

team_db_16 <- cbind.data.frame(teams_16, rep(1500, nrow(teams_16)))
names(team_db_16)[c(1,3)] <- c("teams" ,"elo")

team_db_17 <- cbind.data.frame(teams_17, rep(1500, nrow(teams_17)))
names(team_db_17)[c(1,3)] <- c("teams" ,"elo")

team_db_18 <- cbind.data.frame(teams_18, rep(1500, nrow(teams_18)))
names(team_db_18)[c(1,3)] <- c("teams" ,"elo")

team_db_19 <- cbind.data.frame(teams_19, rep(1500, nrow(teams_19)))
names(team_db_19)[c(1,3)] <- c("teams" ,"elo")

team_db_20 <- cbind.data.frame(teams_20, rep(1500, nrow(teams_20)))
names(team_db_20)[c(1,3)] <- c("teams" ,"elo")

team_db_21 <- cbind.data.frame(teams_21, rep(1500, nrow(teams_21)))
names(team_db_21)[c(1,3)] <- c("teams" ,"elo")

team_db_22 <- cbind.data.frame(teams_22, rep(1500, nrow(teams_22)))
names(team_db_22)[c(1,3)] <- c("teams" ,"elo")

team_db_23 <- cbind.data.frame(teams_23, rep(1500, nrow(teams_23)))
names(team_db_23)[c(1,3)] <- c("teams" ,"elo")

# Define the list of team_db and game_res data frames
team_db_list <- list(team_db_14, team_db_15, team_db_16, team_db_17,
                     team_db_18, team_db_19, team_db_20, team_db_21,
                     team_db_22, team_db_23)
game_res_list <- list(game_res_14, game_res_15, game_res_16, game_res_17,
                      game_res_18, game_res_19, game_res_20, game_res_21,
                      game_res_22, game_res_23)
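
The per-season objects above follow the same pattern ten times over. As a hedged alternative, the sketch below builds the same kind of per-season lists in one pass with `split()` and `lapply()`. It runs on made-up data, and the names `toy_results`, `game_res_by_season` and `team_db_by_season` are hypothetical.

```r
# Hypothetical game results spanning two seasons
toy_results <- data.frame(
  season   = c(2014, 2014, 2015, 2015),
  off_team = c("AAA", "BBB", "AAA", "CCC"),
  def_team = c("BBB", "AAA", "CCC", "AAA"),
  off_win  = c(1, 0, 0, 1)
)

# One data frame of game results per season, named by season
game_res_by_season <- split(toy_results, toy_results$season)

# One team table per season, with every team starting at an Elo of 1500
team_db_by_season <- lapply(game_res_by_season, function(res) {
  data.frame(teams = sort(unique(res$off_team)), elo = 1500)
})

team_db_by_season[["2015"]]
```

Deriving each season's teams from that season's own rows also removes the risk of a copy-paste slip between seasons.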

# Loop through each pair of team_db and game_res data frames
for (i in seq_along(team_db_list)) {
  team_db <- team_db_list[[i]]
  game_res <- game_res_list[[i]]
  
  for (j in 1:nrow(game_res)) {
    # Extract match
    match <- game_res[j, ]
    
    # Extract team 1 Elo
    team1_elo <- team_db$elo[team_db$teams == match$off_team]
    # Extract team 2 Elo
    team2_elo <- team_db$elo[team_db$teams == match$def_team]
    
    # Calculate new Elo ratings
    new_elo <- elo.calc(wins.A = match$off_win, # Select game outcome
                        elo.A = team1_elo,     # Set Elo for team 1
                        elo.B = team2_elo,     # Set Elo for team 2
                        k = 50)                # Set update speed
    
    # Store the new Elo rating for the offensive team (team 1)
    team_db$elo[team_db$teams == match$off_team] <- new_elo[1, 1]
    # Store the new Elo rating for the defensive team (team 2)
    team_db$elo[team_db$teams == match$def_team] <- new_elo[1, 2]
  }
  
  # Assign the updated Elo ratings back to the correct team_db in the list
  team_db_list[[i]] <- team_db
}
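
The contents of `elo_funcs.r` are not shown here, so the exact behavior of `elo.calc` is not confirmed by this document. For orientation, the sketch below implements the standard Elo update: the expected score follows a logistic curve in the rating difference, and each rating moves by `k` times the gap between the actual and expected result. The function names `elo_expected` and `elo_update` are hypothetical and are not the functions sourced above.

```r
# Expected score for team A under the standard Elo formula
elo_expected <- function(elo_a, elo_b) {
  1 / (1 + 10^((elo_b - elo_a) / 400))
}

# Update both ratings after one game; a_won is 1 if team A won, else 0
elo_update <- function(elo_a, elo_b, a_won, k = 50) {
  exp_a <- elo_expected(elo_a, elo_b)
  c(a = elo_a + k * (a_won - exp_a),
    b = elo_b + k * ((1 - a_won) - (1 - exp_a)))
}

# Two evenly rated teams: the winner gains k / 2 points and the loser drops k / 2
updated <- elo_update(1500, 1500, a_won = 1)
updated
```

Note that the game results used above contain two rows per game (one per team on offense), so a loop over every row applies an update twice for each game. Whether that is intended is worth confirming against `elo_funcs.r`.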

# Pull each season's updated data frame back out of the list, sorted by Elo
team_db_14 <- team_db_list[[1]] %>%
  arrange(desc(elo))

team_db_15 <- team_db_list[[2]] %>%
  arrange(desc(elo))

team_db_16 <- team_db_list[[3]] %>%
  arrange(desc(elo))

team_db_17 <- team_db_list[[4]] %>%
  arrange(desc(elo))

team_db_18 <- team_db_list[[5]] %>%
  arrange(desc(elo))

team_db_19 <- team_db_list[[6]] %>%
  arrange(desc(elo))

team_db_20 <- team_db_list[[7]] %>%
  arrange(desc(elo))

team_db_21 <- team_db_list[[8]] %>%
  arrange(desc(elo))

team_db_22 <- team_db_list[[9]] %>%
  arrange(desc(elo))

team_db_23 <- team_db_list[[10]] %>%
  arrange(desc(elo))

# Define a list of team_db data frames
team_db_list <- list(team_db_14, team_db_15, team_db_16, team_db_17,
                     team_db_18, team_db_19, team_db_20, team_db_21,
                     team_db_22, team_db_23)

# Loop through each team_db data frame
for (i in seq_along(team_db_list)) {
  team_db <- team_db_list[[i]]
  
  # Initialize wins and losses vectors
  wins <- rep(0, nrow(team_db))
  losses <- rep(0, nrow(team_db))

  for (j in seq_len(nrow(team_db))) {
    team <- team_db$teams[j]

    # Calculate wins: Team is offensive team and won
    wins[j] <- sum(game_res_list[[i]]$off_team == team & game_res_list[[i]]$off_win == 1)

    # Calculate losses: Team is defensive team and lost
    losses[j] <- sum(game_res_list[[i]]$def_team == team & game_res_list[[i]]$off_win == 1)
  }

  # Add wins and losses to team_db
  team_db$wins <- wins
  team_db$losses <- losses

  # Calculate win percentage
  team_db$win_pct <- team_db$wins / (team_db$wins + team_db$losses)
  
  team_db_list[[i]] <- team_db
}
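
The same win/loss tally can be written without explicit loops. The following is a minimal dplyr sketch on toy data. The `game_res_toy` table is invented for illustration and is assumed to mirror the structure of the `game_res_*` data frames (one row per team per game, from the offensive team's perspective).

```r
library(dplyr)

# Toy data (hypothetical): each game appears twice, once per team on offense
game_res_toy <- tibble(
  off_team = c("BOS", "MIA", "BOS", "NYK", "MIA", "NYK", "BOS", "MIA"),
  def_team = c("MIA", "BOS", "NYK", "BOS", "NYK", "MIA", "MIA", "BOS"),
  off_win  = c(1, 0, 0, 1, 1, 0, 1, 0)
)

# Rows where the offensive team won identify both the winner and the loser
decided <- game_res_toy %>% filter(off_win == 1)

wins   <- decided %>% count(teams = off_team, name = "wins")
losses <- decided %>% count(teams = def_team, name = "losses")

records <- full_join(wins, losses, by = "teams") %>%
  mutate(across(c(wins, losses), ~ coalesce(.x, 0L)),  # teams with no wins or no losses
         win_pct = wins / (wins + losses)) %>%
  arrange(teams)

records
```

This mirrors the loop's logic: a win is a row where the team is `off_team` with `off_win == 1`, and a loss is a row where the team is `def_team` with `off_win == 1`.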

# Extract each season's data frame again, now with wins, losses and win_pct
team_db_14 <- as.data.frame(team_db_list[[1]]) %>% arrange(desc(elo))
team_db_15 <- as.data.frame(team_db_list[[2]]) %>% arrange(desc(elo))
team_db_16 <- as.data.frame(team_db_list[[3]]) %>% arrange(desc(elo))
team_db_17 <- as.data.frame(team_db_list[[4]]) %>% arrange(desc(elo))
team_db_18 <- as.data.frame(team_db_list[[5]]) %>% arrange(desc(elo))
team_db_19 <- as.data.frame(team_db_list[[6]]) %>% arrange(desc(elo))
team_db_20 <- as.data.frame(team_db_list[[7]]) %>% arrange(desc(elo))
team_db_21 <- as.data.frame(team_db_list[[8]]) %>% arrange(desc(elo))
team_db_22 <- as.data.frame(team_db_list[[9]]) %>% arrange(desc(elo))
team_db_23 <- as.data.frame(team_db_list[[10]]) %>% arrange(desc(elo))

# Define a list of team_db data frames
team_db_list <- list(team_db_14, team_db_15, team_db_16, team_db_17,
                     team_db_18, team_db_19, team_db_20, team_db_21,
                     team_db_22, team_db_23)

# Define lists of teams for Eastern and Western conferences
eastern <- c("BOS", "MIL", "PHI", "CLE", "BKN", 
             "NYK", "MIA", "ATL", "WAS", "TOR", 
             "CHI", "IND", "ORL", "CHA", "DET")

western <- c("DEN", "MEM", "SAC", "LAC", "PHX",
             "DAL", "NOP", "MIN", "GSW", "OKC", 
             "UTA", "POR", "LAL", "SAS", "HOU")

# Loop through each team_db data frame
for (i in seq_along(team_db_list)) {
  team_db <- team_db_list[[i]]

  # Calculate conference and conf_rank
  conference <- rep(NA, nrow(team_db))
  conf_rank <- rep(NA, nrow(team_db))
  conference[team_db$teams %in% eastern] <- "East"
  conference[team_db$teams %in% western] <- "West"
  conf_rank[team_db$teams %in% eastern] <- 16 - rank(team_db$win_pct[team_db$teams %in% eastern], ties.method = "random")
  conf_rank[team_db$teams %in% western] <- 16 - rank(team_db$win_pct[team_db$teams %in% western], ties.method = "random")
  team_db$conference <- conference
  team_db$conf_rank <- conf_rank
  
  # Manually correct conference ranks to match the actual standings where teams
  # finished with the same record (tie-breakers) or where play-in results changed
  # the seeding. Only playoff teams are corrected, so that the playoff matchups
  # are right; teams outside the top 8 in their conference may still have the
  # wrong rank.
  if (i == 1) {  # For the 2014 season
    team_db$conf_rank[team_db$teams == "BKN"] <- 8
    team_db$conf_rank[team_db$teams == "IND"] <- 9
    team_db$conf_rank[team_db$teams == "HOU"] <- 2
    team_db$conf_rank[team_db$teams == "LAC"] <- 3
    team_db$conf_rank[team_db$teams == "POR"] <- 4
    team_db$conf_rank[team_db$teams == "MEM"] <- 5
    team_db$conf_rank[team_db$teams == "SAS"] <- 6
  }
  
  if (i == 2) {  # For the 2015 season
    team_db$conf_rank[team_db$teams == "MIA"] <- 3
    team_db$conf_rank[team_db$teams == "ATL"] <- 4
    team_db$conf_rank[team_db$teams == "BOS"] <- 5
    team_db$conf_rank[team_db$teams == "CHA"] <- 6
    team_db$conf_rank[team_db$teams == "MEM"] <- 7
    team_db$conf_rank[team_db$teams == "DAL"] <- 6
  }
  
  if (i == 3) {  # For the 2016 season
    team_db$conf_rank[team_db$teams == "MIL"] <- 6
    team_db$conf_rank[team_db$teams == "IND"] <- 7
    team_db$conf_rank[team_db$teams == "LAC"] <- 4
    team_db$conf_rank[team_db$teams == "UTA"] <- 5
  }
  
  if (i == 4) {  # For the 2017 season
    team_db$conf_rank[team_db$teams == "MIA"] <- 6
    team_db$conf_rank[team_db$teams == "MIL"] <- 7
    team_db$conf_rank[team_db$teams == "OKC"] <- 4
    team_db$conf_rank[team_db$teams == "UTA"] <- 5
    team_db$conf_rank[team_db$teams == "NOP"] <- 6
    team_db$conf_rank[team_db$teams == "SAS"] <- 7
    team_db$conf_rank[team_db$teams == "MIN"] <- 8
  }
  
  if (i == 5) {  # For the 2018 season
    team_db$conf_rank[team_db$teams == "BKN"] <- 6
    team_db$conf_rank[team_db$teams == "ORL"] <- 7
    team_db$conf_rank[team_db$teams == "POR"] <- 3
    team_db$conf_rank[team_db$teams == "HOU"] <- 4
  }
  
  if (i == 6) {  # For the 2019 season
    team_db$conf_rank[team_db$teams == "HOU"] <- 4
    team_db$conf_rank[team_db$teams == "OKC"] <- 5
    team_db$conf_rank[team_db$teams == "UTA"] <- 6
  }
  
  if (i == 7) {  # For the 2020 season
    team_db$conf_rank[team_db$teams == "NYK"] <- 4
    team_db$conf_rank[team_db$teams == "ATL"] <- 5
    team_db$conf_rank[team_db$teams == "WAS"] <- 8
    team_db$conf_rank[team_db$teams == "IND"] <- 9
    team_db$conf_rank[team_db$teams == "DEN"] <- 3
    team_db$conf_rank[team_db$teams == "LAC"] <- 4
    team_db$conf_rank[team_db$teams == "DAL"] <- 5
    team_db$conf_rank[team_db$teams == "POR"] <- 6
    team_db$conf_rank[team_db$teams == "LAL"] <- 7
    team_db$conf_rank[team_db$teams == "MEM"] <- 8
  }
  
  if (i == 8) {  # For the 2021 season
    team_db$conf_rank[team_db$teams == "BOS"] <- 2
    team_db$conf_rank[team_db$teams == "MIL"] <- 3
    team_db$conf_rank[team_db$teams == "PHI"] <- 4
    team_db$conf_rank[team_db$teams == "ATL"] <- 8
    team_db$conf_rank[team_db$teams == "CLE"] <- 9
    team_db$conf_rank[team_db$teams == "CHA"] <- 10
    team_db$conf_rank[team_db$teams == "NOP"] <- 8
    team_db$conf_rank[team_db$teams == "LAC"] <- 9
  }

  if (i == 9) {  # For the 2022 season
    team_db$conf_rank[team_db$teams == "ATL"] <- 7
    team_db$conf_rank[team_db$teams == "MIA"] <- 8
    team_db$conf_rank[team_db$teams == "LAC"] <- 5
    team_db$conf_rank[team_db$teams == "GSW"] <- 6
    team_db$conf_rank[team_db$teams == "MIN"] <- 8
    team_db$conf_rank[team_db$teams == "NOP"] <- 9
  }
  
  if (i == 10) {  # For the 2023 season
    team_db$conf_rank[team_db$teams == "ORL"] <- 5
    team_db$conf_rank[team_db$teams == "IND"] <- 6
    team_db$conf_rank[team_db$teams == "PHI"] <- 7
    team_db$conf_rank[team_db$teams == "OKC"] <- 1
    team_db$conf_rank[team_db$teams == "DEN"] <- 2
    team_db$conf_rank[team_db$teams == "LAL"] <- 7
    team_db$conf_rank[team_db$teams == "NOP"] <- 8
  }
  

  
  team_db_list[[i]] <- team_db
  
  
}
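
The per-season corrections could also be stored as data rather than as a chain of `if` blocks. Below is a minimal sketch using `dplyr::rows_update()` on toy standings. The `team_db_toy` and `overrides` tables are hypothetical and only illustrate the idea; they are not the values used in the analysis.

```r
library(dplyr)

# Toy standings (hypothetical values) with ranks produced by a random tie-break
team_db_toy <- tibble(
  teams     = c("BOS", "MIL", "PHI", "MIA"),
  conf_rank = c(3, 2, 4, 1)
)

# Manual corrections expressed as a small lookup table
overrides <- tibble(
  teams     = c("BOS", "MIL"),
  conf_rank = c(2, 3)
)

# Overwrite conf_rank for the matching teams only
corrected <- rows_update(team_db_toy, overrides, by = "teams")
corrected
```

By default `rows_update()` raises an error when an override names a team that is not in the table, so a mistyped abbreviation is reported instead of being silently ignored, which is what happens with a logical-index assignment that matches no rows.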

# Extract each season's data frame again, now with conference and conf_rank
team_db_14 <- as.data.frame(team_db_list[[1]]) %>% arrange(desc(elo))
team_db_15 <- as.data.frame(team_db_list[[2]]) %>% arrange(desc(elo))
team_db_16 <- as.data.frame(team_db_list[[3]]) %>% arrange(desc(elo))
team_db_17 <- as.data.frame(team_db_list[[4]]) %>% arrange(desc(elo))
team_db_18 <- as.data.frame(team_db_list[[5]]) %>% arrange(desc(elo))
team_db_19 <- as.data.frame(team_db_list[[6]]) %>% arrange(desc(elo))
team_db_20 <- as.data.frame(team_db_list[[7]]) %>% arrange(desc(elo))
team_db_21 <- as.data.frame(team_db_list[[8]]) %>% arrange(desc(elo))
team_db_22 <- as.data.frame(team_db_list[[9]]) %>% arrange(desc(elo))
team_db_23 <- as.data.frame(team_db_list[[10]]) %>% arrange(desc(elo))

team_db2 <- team_db_list

# Function to calculate the probability of team A winning
elo.prob <- function(elo.A, elo.B) {
  return(1 / (1 + 10^((elo.B - elo.A) / 400)))
}

# Function to simulate a playoff series
sim_series <- function(team_1, team_2, team_db) {
  series_res <- data.frame(game_res = rep(NA, 7),
                           game_win_prob = rep(NA, 7),
                           game_sim_val = rep(NA, 7))
  
  stop <- FALSE
  i <- 0
  team_1_wins <- 0
  team_2_wins <- 0
  
  # Get Elo ratings
  team_1_elo <- team_db$elo[team_db$teams == team_1]
  team_2_elo <- team_db$elo[team_db$teams == team_2]
  
  while (!stop && i < 7) {
    i <- i + 1
    
    # Calculate win probability for team_1
    series_res$game_win_prob[i] <- elo.prob(team_1_elo, team_2_elo)
    
    # Simulate game outcome
    series_res$game_sim_val[i] <- runif(1, min = 0, max = 1)
    
    if (series_res$game_sim_val[i] <= series_res$game_win_prob[i]) {
      series_res$game_res[i] <- 1
      team_1_wins <- team_1_wins + 1
    } else {
      series_res$game_res[i] <- 0
      team_2_wins <- team_2_wins + 1
    }
    
    if (team_1_wins == 4 || team_2_wins == 4) {
      stop <- TRUE
    }
  }
  
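  # Determine winner and loser. Note: series_win_prob is the winner's
  # single-game win probability (it is the same in every game because the
  # Elo ratings are not updated within a series), not the probability of
  # winning the series.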
  if (team_1_wins == 4) {
    winner <- team_1
    loser <- team_2
    series_win_prob <- mean(series_res$game_win_prob[series_res$game_res == 1], na.rm = TRUE)
  } else {
    winner <- team_2
    loser <- team_1
    series_win_prob <- mean(1 - series_res$game_win_prob[series_res$game_res == 0], na.rm = TRUE)
  }
  
  num_games <- i
  
  return(list(winner = winner, 
              loser = loser,
              series_res = series_res,
              num_games = num_games,
              series_win_prob = series_win_prob))
}

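# Function to run the simulation
# Note: the seed is reset on every call, so every series is simulated with the
# same sequence of random draws.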
run_simulation <- function(team1, team2, team_db) {
  set.seed(123456)
  
  result <- sim_series(team_1 = team1, team_2 = team2, team_db = team_db)
  
  print(paste(result$winner, "is projected to beat", result$loser, 
              "with a", round(result$series_win_prob * 100, 0), 
              "% chance of winning in", result$num_games, 
              "games but", result$loser, "has a", 
              round((1 - result$series_win_prob) * 100, 0), 
              "% chance of beating", result$winner))
}
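
The percentage printed by `run_simulation()` is a single-game win probability. For reference, a per-game probability can be converted into a best-of-seven series probability in closed form. The sketch below assumes independent games with a constant win probability and no home-court adjustment; the function name `series_win_prob_sketch` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: probability of winning a series when each game is won
# independently with probability p and the first team to `wins_needed` wins
# takes the series.
series_win_prob_sketch <- function(p, wins_needed = 4) {
  # Number of losses the series winner can absorb: 0, 1, ..., wins_needed - 1
  losses <- 0:(wins_needed - 1)
  # The last game is always a win; the earlier games can come in any order
  sum(choose(wins_needed - 1 + losses, losses) * p^wins_needed * (1 - p)^losses)
}

# Per-game probability from the Elo gap (same formula as elo.prob)
p_game <- 1 / (1 + 10^((1500 - 1600) / 400))

series_win_prob_sketch(p_game)
series_win_prob_sketch(0.5)  # evenly matched teams: 0.5
```

For any per-game probability above 0.5, the series probability is larger than the per-game one, because a longer series gives the stronger team more chances to assert itself.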

2014 Playoffs

Round 1

# Round 1
run_simulation("BKN", "ATL", team_db_14)

## [1] "BKN is projected to beat ATL with a 54 % chance of winning in 6 games but ATL has a 46 % chance of beating BKN"

run_simulation("MIL", "CHI", team_db_14)

## [1] "CHI is projected to beat MIL with a 63 % chance of winning in 7 games but MIL has a 37 % chance of beating CHI"

run_simulation("BOS", "CLE", team_db_14)

## [1] "BOS is projected to beat CLE with a 59 % chance of winning in 6 games but CLE has a 41 % chance of beating BOS"

run_simulation("TOR", "WAS", team_db_14)

## [1] "TOR is projected to beat WAS with a 55 % chance of winning in 6 games but WAS has a 45 % chance of beating TOR"

run_simulation("NOP", "GSW", team_db_14)

## [1] "GSW is projected to beat NOP with a 74 % chance of winning in 4 games but NOP has a 26 % chance of beating GSW"

run_simulation("DAL", "HOU", team_db_14)

## [1] "HOU is projected to beat DAL with a 70 % chance of winning in 4 games but DAL has a 30 % chance of beating HOU"

run_simulation("SAS", "LAC", team_db_14)

## [1] "SAS is projected to beat LAC with a 52 % chance of winning in 6 games but LAC has a 48 % chance of beating SAS"

run_simulation("MEM", "POR", team_db_14)

## [1] "MEM is projected to beat POR with a 74 % chance of winning in 6 games but POR has a 26 % chance of beating MEM"

Round 2

# Round 2
run_simulation("TOR", "BKN", team_db_14)

## [1] "TOR is projected to beat BKN with a 55 % chance of winning in 6 games but BKN has a 45 % chance of beating TOR"

run_simulation("CHI", "BOS", team_db_14)

## [1] "BOS is projected to beat CHI with a 71 % chance of winning in 4 games but CHI has a 29 % chance of beating BOS"

run_simulation("GSW", "MEM", team_db_14)

## [1] "GSW is projected to beat MEM with a 74 % chance of winning in 6 games but MEM has a 26 % chance of beating GSW"

run_simulation("SAS", "HOU", team_db_14)

## [1] "SAS is projected to beat HOU with a 62 % chance of winning in 6 games but HOU has a 38 % chance of beating SAS"

Round 3

# Round 3
run_simulation("TOR", "BOS", team_db_14)

## [1] "BOS is projected to beat TOR with a 77 % chance of winning in 4 games but TOR has a 23 % chance of beating BOS"

run_simulation("GSW", "SAS", team_db_14)

## [1] "GSW is projected to beat SAS with a 50 % chance of winning in 6 games but SAS has a 50 % chance of beating GSW"

Finals

# Finals
run_simulation("GSW", "BOS", team_db_14)

## [1] "GSW is projected to beat BOS with a 60 % chance of winning in 6 games but BOS has a 40 % chance of beating GSW"

2015 Playoffs

Round 1

# Round 1
run_simulation("BOS", "ATL", team_db_15)

## [1] "BOS is projected to beat ATL with a 53 % chance of winning in 6 games but ATL has a 47 % chance of beating BOS"

run_simulation("DET", "CLE", team_db_15)

## [1] "DET is projected to beat CLE with a 59 % chance of winning in 6 games but CLE has a 41 % chance of beating DET"

run_simulation("CHA", "MIA", team_db_15)

## [1] "CHA is projected to beat MIA with a 56 % chance of winning in 6 games but MIA has a 44 % chance of beating CHA"

run_simulation("TOR", "IND", team_db_15)

## [1] "TOR is projected to beat IND with a 67 % chance of winning in 6 games but IND has a 33 % chance of beating TOR"

run_simulation("GSW", "HOU", team_db_15)

## [1] "GSW is projected to beat HOU with a 84 % chance of winning in 4 games but HOU has a 16 % chance of beating GSW"

run_simulation("DAL", "OKC", team_db_15)

## [1] "DAL is projected to beat OKC with a 57 % chance of winning in 6 games but OKC has a 43 % chance of beating DAL"

run_simulation("LAC", "POR", team_db_15)

## [1] "LAC is projected to beat POR with a 52 % chance of winning in 6 games but POR has a 48 % chance of beating LAC"

run_simulation("SAS", "MEM", team_db_15)

## [1] "SAS is projected to beat MEM with a 92 % chance of winning in 4 games but MEM has a 8 % chance of beating SAS"

Round 2

# Round 2
run_simulation("BOS", "DET", team_db_15)

## [1] "BOS is projected to beat DET with a 49 % chance of winning in 6 games but DET has a 51 % chance of beating BOS"

run_simulation("TOR", "CHA", team_db_15)

## [1] "TOR is projected to beat CHA with a 60 % chance of winning in 6 games but CHA has a 40 % chance of beating TOR"

run_simulation("GSW", "LAC", team_db_15)

## [1] "GSW is projected to beat LAC with a 76 % chance of winning in 5 games but LAC has a 24 % chance of beating GSW"

run_simulation("DAL", "SAS", team_db_15)

## [1] "SAS is projected to beat DAL with a 70 % chance of winning in 4 games but DAL has a 30 % chance of beating SAS"

Round 3

# Round 3
run_simulation("BOS", "TOR", team_db_15)

## [1] "TOR is projected to beat BOS with a 61 % chance of winning in 7 games but BOS has a 39 % chance of beating TOR"

run_simulation("GSW", "SAS", team_db_15)

## [1] "GSW is projected to beat SAS with a 64 % chance of winning in 6 games but SAS has a 36 % chance of beating GSW"

Finals

# Finals
run_simulation("GSW", "BOS", team_db_15)

## [1] "GSW is projected to beat BOS with a 80 % chance of winning in 5 games but BOS has a 20 % chance of beating GSW"

2016 Playoffs

Round 1

# Round 1
run_simulation("BOS", "CHI", team_db_16)

## [1] "BOS is projected to beat CHI with a 63 % chance of winning in 6 games but CHI has a 37 % chance of beating BOS"

run_simulation("IND", "CLE", team_db_16)

## [1] "IND is projected to beat CLE with a 73 % chance of winning in 6 games but CLE has a 27 % chance of beating IND"

run_simulation("TOR", "MIL", team_db_16)

## [1] "TOR is projected to beat MIL with a 71 % chance of winning in 6 games but MIL has a 29 % chance of beating TOR"

run_simulation("WAS", "ATL", team_db_16)

## [1] "WAS is projected to beat ATL with a 52 % chance of winning in 6 games but ATL has a 48 % chance of beating WAS"

run_simulation("POR", "GSW", team_db_16)

## [1] "GSW is projected to beat POR with a 70 % chance of winning in 4 games but POR has a 30 % chance of beating GSW"

run_simulation("HOU", "OKC", team_db_16)

## [1] "HOU is projected to beat OKC with a 55 % chance of winning in 6 games but OKC has a 45 % chance of beating HOU"

run_simulation("SAS", "MEM", team_db_16)

## [1] "SAS is projected to beat MEM with a 80 % chance of winning in 4 games but MEM has a 20 % chance of beating SAS"

run_simulation("UTA", "LAC", team_db_16)

## [1] "UTA is projected to beat LAC with a 53 % chance of winning in 6 games but LAC has a 47 % chance of beating UTA"

Round 2

# Round 2
run_simulation("BOS", "WAS", team_db_16)

## [1] "BOS is projected to beat WAS with a 66 % chance of winning in 6 games but WAS has a 34 % chance of beating BOS"

run_simulation("IND", "TOR", team_db_16)

## [1] "TOR is projected to beat IND with a 62 % chance of winning in 7 games but IND has a 38 % chance of beating TOR"

run_simulation("GSW", "UTA", team_db_16)

## [1] "GSW is projected to beat UTA with a 63 % chance of winning in 6 games but UTA has a 37 % chance of beating GSW"

run_simulation("HOU", "SAS", team_db_16)

## [1] "HOU is projected to beat SAS with a 53 % chance of winning in 6 games but SAS has a 47 % chance of beating HOU"

Round 3

# Round 3
run_simulation("TOR", "BOS", team_db_16)

## [1] "TOR is projected to beat BOS with a 55 % chance of winning in 6 games but BOS has a 45 % chance of beating TOR"

run_simulation("GSW", "HOU", team_db_16)

## [1] "GSW is projected to beat HOU with a 80 % chance of winning in 5 games but HOU has a 20 % chance of beating GSW"

Finals

# Finals
run_simulation("GSW", "TOR", team_db_16)

## [1] "GSW is projected to beat TOR with a 71 % chance of winning in 6 games but TOR has a 29 % chance of beating GSW"

2017 Playoffs

Round 1

# Round 1
run_simulation("BOS", "MIL", team_db_17)

## [1] "BOS is projected to beat MIL with a 53 % chance of winning in 6 games but MIL has a 47 % chance of beating BOS"

run_simulation("CLE", "IND", team_db_17)

## [1] "CLE is projected to beat IND with a 50 % chance of winning in 6 games but IND has a 50 % chance of beating CLE"

run_simulation("PHI", "MIA", team_db_17)

## [1] "PHI is projected to beat MIA with a 85 % chance of winning in 4 games but MIA has a 15 % chance of beating PHI"

run_simulation("TOR", "WAS", team_db_17)

## [1] "TOR is projected to beat WAS with a 86 % chance of winning in 4 games but WAS has a 14 % chance of beating TOR"

run_simulation("SAS", "GSW", team_db_17)

## [1] "SAS is projected to beat GSW with a 65 % chance of winning in 6 games but GSW has a 35 % chance of beating SAS"

run_simulation("HOU", "MIN", team_db_17)

## [1] "HOU is projected to beat MIN with a 72 % chance of winning in 6 games but MIN has a 28 % chance of beating HOU"

run_simulation("NOP", "POR", team_db_17)

## [1] "NOP is projected to beat POR with a 56 % chance of winning in 6 games but POR has a 44 % chance of beating NOP"

run_simulation("UTA", "OKC", team_db_17)

## [1] "UTA is projected to beat OKC with a 56 % chance of winning in 6 games but OKC has a 44 % chance of beating UTA"

Round 2

# Round 2
run_simulation("BOS", "PHI", team_db_17)

## [1] "PHI is projected to beat BOS with a 83 % chance of winning in 4 games but BOS has a 17 % chance of beating PHI"

run_simulation("TOR", "CLE", team_db_17)

## [1] "TOR is projected to beat CLE with a 59 % chance of winning in 6 games but CLE has a 41 % chance of beating TOR"

run_simulation("SAS", "NOP", team_db_17)

## [1] "NOP is projected to beat SAS with a 63 % chance of winning in 7 games but SAS has a 37 % chance of beating NOP"

run_simulation("HOU", "UTA", team_db_17)

## [1] "HOU is projected to beat UTA with a 58 % chance of winning in 6 games but UTA has a 42 % chance of beating HOU"

Round 3

# Round 3
run_simulation("TOR", "BOS", team_db_17)

## [1] "TOR is projected to beat BOS with a 67 % chance of winning in 6 games but BOS has a 33 % chance of beating TOR"

run_simulation("HOU", "NOP", team_db_17)

## [1] "HOU is projected to beat NOP with a 60 % chance of winning in 6 games but NOP has a 40 % chance of beating HOU"

Finals

# Finals
run_simulation("TOR", "HOU", team_db_17)

## [1] "HOU is projected to beat TOR with a 63 % chance of winning in 7 games but TOR has a 37 % chance of beating HOU"

2018 Playoffs

Round 1

# Round 1
run_simulation("BOS", "IND", team_db_18)

## [1] "BOS is projected to beat IND with a 64 % chance of winning in 6 games but IND has a 36 % chance of beating BOS"

run_simulation("MIL", "DET", team_db_18)

## [1] "MIL is projected to beat DET with a 72 % chance of winning in 6 games but DET has a 28 % chance of beating MIL"

run_simulation("PHI", "BKN", team_db_18)

## [1] "BKN is projected to beat PHI with a 67 % chance of winning in 4 games but PHI has a 33 % chance of beating BKN"

run_simulation("ORL", "TOR", team_db_18)

## [1] "ORL is projected to beat TOR with a 61 % chance of winning in 6 games but TOR has a 39 % chance of beating ORL"

run_simulation("SAS", "DEN", team_db_18)

## [1] "SAS is projected to beat DEN with a 54 % chance of winning in 6 games but DEN has a 46 % chance of beating SAS"

run_simulation("GSW", "LAC", team_db_18)

## [1] "GSW is projected to beat LAC with a 53 % chance of winning in 6 games but LAC has a 47 % chance of beating GSW"

run_simulation("HOU", "UTA", team_db_18)

## [1] "HOU is projected to beat UTA with a 63 % chance of winning in 6 games but UTA has a 37 % chance of beating HOU"

run_simulation("POR", "OKC", team_db_18)

## [1] "POR is projected to beat OKC with a 58 % chance of winning in 6 games but OKC has a 42 % chance of beating POR"

Round 2

# Round 2
run_simulation("MIL", "BOS", team_db_18)

## [1] "MIL is projected to beat BOS with a 56 % chance of winning in 6 games but BOS has a 44 % chance of beating MIL"

run_simulation("BKN", "ORL", team_db_18)

## [1] "ORL is projected to beat BKN with a 65 % chance of winning in 5 games but BKN has a 35 % chance of beating ORL"

run_simulation("GSW", "HOU", team_db_18)

## [1] "HOU is projected to beat GSW with a 62 % chance of winning in 7 games but GSW has a 38 % chance of beating HOU"

run_simulation("POR", "SAS", team_db_18)

## [1] "POR is projected to beat SAS with a 67 % chance of winning in 6 games but SAS has a 33 % chance of beating POR"

Round 3

# Round 3
run_simulation("ORL", "MIL", team_db_18)

## [1] "ORL is projected to beat MIL with a 62 % chance of winning in 6 games but MIL has a 38 % chance of beating ORL"

run_simulation("POR", "HOU", team_db_18)

## [1] "POR is projected to beat HOU with a 53 % chance of winning in 6 games but HOU has a 47 % chance of beating POR"

Finals

# Finals
run_simulation("POR", "ORL", team_db_18)

## [1] "POR is projected to beat ORL with a 51 % chance of winning in 6 games but ORL has a 49 % chance of beating POR"

2019 Playoffs

Round 1

# Round 1
run_simulation("BOS", "PHI", team_db_19)

## [1] "BOS is projected to beat PHI with a 58 % chance of winning in 6 games but PHI has a 42 % chance of beating BOS"

run_simulation("MIA", "IND", team_db_19)

## [1] "IND is projected to beat MIA with a 76 % chance of winning in 4 games but MIA has a 24 % chance of beating IND"

run_simulation("ORL", "MIL", team_db_19)

## [1] "ORL is projected to beat MIL with a 55 % chance of winning in 6 games but MIL has a 45 % chance of beating ORL"

run_simulation("TOR", "BKN", team_db_19)

## [1] "TOR is projected to beat BKN with a 74 % chance of winning in 6 games but BKN has a 26 % chance of beating TOR"

run_simulation("UTA", "DEN", team_db_19)

## [1] "UTA is projected to beat DEN with a 54 % chance of winning in 6 games but DEN has a 46 % chance of beating UTA"

run_simulation("HOU", "OKC", team_db_19)

## [1] "OKC is projected to beat HOU with a 66 % chance of winning in 4 games but HOU has a 34 % chance of beating OKC"

run_simulation("LAC", "DAL", team_db_19)

## [1] "LAC is projected to beat DAL with a 79 % chance of winning in 5 games but DAL has a 21 % chance of beating LAC"

run_simulation("LAL", "POR", team_db_19)

## [1] "POR is projected to beat LAL with a 66 % chance of winning in 5 games but LAL has a 34 % chance of beating POR"

Round 2

# Round 2
run_simulation("BOS", "TOR", team_db_19)

## [1] "TOR is projected to beat BOS with a 75 % chance of winning in 4 games but BOS has a 25 % chance of beating TOR"

run_simulation("IND", "ORL", team_db_19)

## [1] "IND is projected to beat ORL with a 74 % chance of winning in 6 games but ORL has a 26 % chance of beating IND"

run_simulation("UTA", "LAC", team_db_19)

## [1] "LAC is projected to beat UTA with a 77 % chance of winning in 4 games but UTA has a 23 % chance of beating LAC"

run_simulation("POR", "OKC", team_db_19)

## [1] "POR is projected to beat OKC with a 62 % chance of winning in 6 games but OKC has a 38 % chance of beating POR"

Round 3

# Round 3
run_simulation("TOR", "IND", team_db_19)

## [1] "TOR is projected to beat IND with a 67 % chance of winning in 6 games but IND has a 33 % chance of beating TOR"

run_simulation("LAC", "POR", team_db_19)

## [1] "LAC is projected to beat POR with a 54 % chance of winning in 6 games but POR has a 46 % chance of beating LAC"

Finals

# Finals
run_simulation("LAC", "TOR", team_db_19)

## [1] "TOR is projected to beat LAC with a 62 % chance of winning in 7 games but LAC has a 38 % chance of beating TOR"

2020 Playoffs

Round 1

# Round 1
run_simulation("ATL", "NYK", team_db_20)

## [1] "ATL is projected to beat NYK with a 57 % chance of winning in 6 games but NYK has a 43 % chance of beating ATL"

run_simulation("BKN", "BOS", team_db_20)

## [1] "BKN is projected to beat BOS with a 78 % chance of winning in 5 games but BOS has a 22 % chance of beating BKN"

run_simulation("MIA", "MIL", team_db_20)

## [1] "MIA is projected to beat MIL with a 53 % chance of winning in 6 games but MIL has a 47 % chance of beating MIA"

run_simulation("PHI", "WAS", team_db_20)

## [1] "PHI is projected to beat WAS with a 54 % chance of winning in 6 games but WAS has a 46 % chance of beating PHI"

run_simulation("DEN", "POR", team_db_20)

## [1] "POR is projected to beat DEN with a 63 % chance of winning in 7 games but DEN has a 37 % chance of beating POR"

run_simulation("DAL", "LAC", team_db_20)

## [1] "DAL is projected to beat LAC with a 60 % chance of winning in 6 games but LAC has a 40 % chance of beating DAL"

run_simulation("PHX", "LAL", team_db_20)

## [1] "PHX is projected to beat LAL with a 61 % chance of winning in 6 games but LAL has a 39 % chance of beating PHX"

run_simulation("UTA", "MEM", team_db_20)

## [1] "UTA is projected to beat MEM with a 67 % chance of winning in 6 games but MEM has a 33 % chance of beating UTA"

Round 2

# Round 2
run_simulation("ATL", "PHI", team_db_20)

## [1] "ATL is projected to beat PHI with a 63 % chance of winning in 6 games but PHI has a 37 % chance of beating ATL"

run_simulation("BKN", "MIA", team_db_20)

## [1] "BKN is projected to beat MIA with a 53 % chance of winning in 6 games but MIA has a 47 % chance of beating BKN"

run_simulation("DAL", "UTA", team_db_20)

## [1] "UTA is projected to beat DAL with a 61 % chance of winning in 7 games but DAL has a 39 % chance of beating UTA"

run_simulation("PHX", "POR", team_db_20)

## [1] "PHX is projected to beat POR with a 52 % chance of winning in 6 games but POR has a 48 % chance of beating PHX"

Round 3

# Round 3
run_simulation("ATL", "BKN", team_db_20)

## [1] "ATL is projected to beat BKN with a 57 % chance of winning in 6 games but BKN has a 43 % chance of beating ATL"

run_simulation("PHX", "UTA", team_db_20)

## [1] "PHX is projected to beat UTA with a 62 % chance of winning in 6 games but UTA has a 38 % chance of beating PHX"

Finals

# Finals
run_simulation("PHX", "ATL", team_db_20)

## [1] "PHX is projected to beat ATL with a 50 % chance of winning in 6 games but ATL has a 50 % chance of beating PHX"

2021 Playoffs

Round 1

# Round 1
run_simulation("BOS", "BKN", team_db_21)

## [1] "BOS is projected to beat BKN with a 66 % chance of winning in 6 games but BKN has a 34 % chance of beating BOS"

run_simulation("MIA", "ATL", team_db_21)

## [1] "MIA is projected to beat ATL with a 58 % chance of winning in 6 games but ATL has a 42 % chance of beating MIA"

run_simulation("MIL", "CHI", team_db_21)

## [1] "MIL is projected to beat CHI with a 69 % chance of winning in 6 games but CHI has a 31 % chance of beating MIL"

run_simulation("TOR", "PHI", team_db_21)

## [1] "TOR is projected to beat PHI with a 57 % chance of winning in 6 games but PHI has a 43 % chance of beating TOR"

run_simulation("DAL", "UTA", team_db_21)

## [1] "DAL is projected to beat UTA with a 78 % chance of winning in 5 games but UTA has a 22 % chance of beating DAL"

run_simulation("GSW", "DEN", team_db_21)

## [1] "GSW is projected to beat DEN with a 59 % chance of winning in 6 games but DEN has a 41 % chance of beating GSW"

run_simulation("MEM", "MIN", team_db_21)

## [1] "MEM is projected to beat MIN with a 60 % chance of winning in 6 games but MIN has a 40 % chance of beating MEM"

run_simulation("PHX", "NOP", team_db_21)

## [1] "PHX is projected to beat NOP with a 65 % chance of winning in 6 games but NOP has a 35 % chance of beating PHX"

Round 2

# Round 2
run_simulation("BOS", "MIL", team_db_21)

## [1] "BOS is projected to beat MIL with a 64 % chance of winning in 6 games but MIL has a 36 % chance of beating BOS"

run_simulation("TOR", "MIA", team_db_21)

## [1] "TOR is projected to beat MIA with a 53 % chance of winning in 6 games but MIA has a 47 % chance of beating TOR"

run_simulation("DAL", "PHX", team_db_21)

## [1] "DAL is projected to beat PHX with a 69 % chance of winning in 6 games but PHX has a 31 % chance of beating DAL"

run_simulation("MEM", "GSW", team_db_21)

## [1] "MEM is projected to beat GSW with a 56 % chance of winning in 6 games but GSW has a 44 % chance of beating MEM"

Round 3

# Round 3
run_simulation("BOS", "TOR", team_db_21)

## [1] "BOS is projected to beat TOR with a 55 % chance of winning in 6 games but TOR has a 45 % chance of beating BOS"

run_simulation("MEM", "DAL", team_db_21)

## [1] "DAL is projected to beat MEM with a 61 % chance of winning in 7 games but MEM has a 39 % chance of beating DAL"

Finals

# Finals
run_simulation("BOS", "DAL", team_db_21)

## [1] "BOS is projected to beat DAL with a 50 % chance of winning in 6 games but DAL has a 50 % chance of beating BOS"

2022 Playoffs

Round 1

# Round 1
run_simulation("BOS", "ATL", team_db_22)

## [1] "BOS is projected to beat ATL with a 76 % chance of winning in 5 games but ATL has a 24 % chance of beating BOS"

run_simulation("MIA", "MIL", team_db_22)

## [1] "MIL is projected to beat MIA with a 62 % chance of winning in 7 games but MIA has a 38 % chance of beating MIL"

run_simulation("CLE", "NYK", team_db_22)

## [1] "CLE is projected to beat NYK with a 59 % chance of winning in 6 games but NYK has a 41 % chance of beating CLE"

run_simulation("PHI", "BKN", team_db_22)

## [1] "PHI is projected to beat BKN with a 70 % chance of winning in 6 games but BKN has a 30 % chance of beating PHI"

run_simulation("DEN", "MIN", team_db_22)

## [1] "MIN is projected to beat DEN with a 68 % chance of winning in 4 games but DEN has a 32 % chance of beating MIN"

run_simulation("GSW", "SAC", team_db_22)

## [1] "GSW is projected to beat SAC with a 76 % chance of winning in 5 games but SAC has a 24 % chance of beating GSW"

run_simulation("LAL", "MEM", team_db_22)

## [1] "LAL is projected to beat MEM with a 65 % chance of winning in 6 games but MEM has a 35 % chance of beating LAL"

run_simulation("LAC", "PHX", team_db_22)

## [1] "LAC is projected to beat PHX with a 61 % chance of winning in 6 games but PHX has a 39 % chance of beating LAC"

Round 2

# Round 2
run_simulation("BOS", "PHI", team_db_22)

## [1] "BOS is projected to beat PHI with a 56 % chance of winning in 6 games but PHI has a 44 % chance of beating BOS"

run_simulation("MIL", "CLE", team_db_22)

## [1] "MIL is projected to beat CLE with a 54 % chance of winning in 6 games but CLE has a 46 % chance of beating MIL"

run_simulation("LAC", "MIN", team_db_22)

## [1] "LAC is projected to beat MIN with a 60 % chance of winning in 6 games but MIN has a 40 % chance of beating LAC"

run_simulation("LAL", "GSW", team_db_22)

## [1] "LAL is projected to beat GSW with a 59 % chance of winning in 6 games but GSW has a 41 % chance of beating LAL"

Round 3

# Round 3
run_simulation("MIL", "BOS", team_db_22)

## [1] "BOS is projected to beat MIL with a 63 % chance of winning in 7 games but MIL has a 37 % chance of beating BOS"

run_simulation("LAL", "LAC", team_db_22)

## [1] "LAL is projected to beat LAC with a 51 % chance of winning in 6 games but LAC has a 49 % chance of beating LAL"

Finals

# Finals
run_simulation("BOS", "LAL", team_db_22)

## [1] "BOS is projected to beat LAL with a 51 % chance of winning in 6 games but LAL has a 49 % chance of beating BOS"

2023 Playoffs

Round 1

# Round 1
run_simulation("BOS", "MIA", team_db_23)

## [1] "BOS is projected to beat MIA with a 67 % chance of winning in 6 games but MIA has a 33 % chance of beating BOS"

run_simulation("CLE", "ORL", team_db_23)

## [1] "CLE is projected to beat ORL with a 64 % chance of winning in 6 games but ORL has a 36 % chance of beating CLE"

run_simulation("IND", "MIL", team_db_23)

## [1] "IND is projected to beat MIL with a 77 % chance of winning in 5 games but MIL has a 23 % chance of beating IND"

run_simulation("NYK", "PHI", team_db_23)

## [1] "PHI is projected to beat NYK with a 65 % chance of winning in 5 games but NYK has a 35 % chance of beating PHI"

run_simulation("DAL", "LAC", team_db_23)

## [1] "DAL is projected to beat LAC with a 74 % chance of winning in 6 games but LAC has a 26 % chance of beating DAL"

run_simulation("LAL", "DEN", team_db_23)

## [1] "LAL is projected to beat DEN with a 77 % chance of winning in 5 games but DEN has a 23 % chance of beating LAL"

run_simulation("PHX", "MIN", team_db_23)

## [1] "MIN is projected to beat PHX with a 62 % chance of winning in 7 games but PHX has a 38 % chance of beating MIN"

run_simulation("OKC", "NOP", team_db_23)

## [1] "OKC is projected to beat NOP with a 65 % chance of winning in 6 games but NOP has a 35 % chance of beating OKC"

Round 2

# Round 2
run_simulation("BOS", "CLE", team_db_23)

## [1] "BOS is projected to beat CLE with a 67 % chance of winning in 6 games but CLE has a 33 % chance of beating BOS"

run_simulation("IND", "PHI", team_db_23)

## [1] "IND is projected to beat PHI with a 52 % chance of winning in 6 games but PHI has a 48 % chance of beating IND"

run_simulation("DAL", "OKC", team_db_23)

## [1] "DAL is projected to beat OKC with a 55 % chance of winning in 6 games but OKC has a 45 % chance of beating DAL"

run_simulation("MIN", "LAL", team_db_23)

## [1] "LAL is projected to beat MIN with a 62 % chance of winning in 7 games but MIN has a 38 % chance of beating LAL"

Round 3

# Round 3
run_simulation("IND", "BOS", team_db_23)

## [1] "IND is projected to beat BOS with a 54 % chance of winning in 6 games but BOS has a 46 % chance of beating IND"

run_simulation("LAL", "DAL", team_db_23)

## [1] "LAL is projected to beat DAL with a 52 % chance of winning in 6 games but DAL has a 48 % chance of beating LAL"

Finals

# Finals
run_simulation("LAL", "IND", team_db_23)

## [1] "LAL is projected to beat IND with a 61 % chance of winning in 6 games but IND has a 39 % chance of beating LAL"

Overview of Model - Simple

At a high level, the model predicts NBA playoff series outcomes for multiple years based on the preceding regular season (e.g., 2014-15 regular season data is used to predict the playoff series played in 2015, both of which belong to the 2014 season), and is designed to provide insightful and accurate forecasts by simulating multiple seasons of data. The model uses Elo ratings, a well-established method for assessing team strength, to calculate the probability of each team winning a game. By inputting the current Elo ratings of the teams in a series, we can simulate the outcome of each game and, ultimately, the series. The simulation considers series lengths between 4 and 7 games, ensuring realistic scenarios and consistent win probability calculations. The model’s output offers a comprehensive view of potential series outcomes, helping front office decision-makers understand the likely trajectories and prepare strategic plans accordingly. This user-friendly tool requires minimal statistical expertise, making it accessible for informed decision-making in the fast-paced environment of the NBA playoffs.

Overview of Model - Detailed

Data Preparation

For each season from 2014 to 2023, the game data is filtered to include only regular season games, so that each season’s playoffs are predicted only from the stats of that season’s regular season. These games are then ordered by game ID and date so that the Elo ratings are calculated correctly, since the order of the games matters. The refined data retains only essential columns: season, game date, game ID, offensive team, offensive team name, defensive team, defensive team name, and game outcome (whether the offensive team won).
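The filtering and ordering step could look roughly like the sketch below. The toy tibble stands in for `team_game_data`; the column names (`season`, `gametype`, `gamedate`, `nbagameid`, `off_team`, `def_team`, `off_win`) and the convention that `gametype == 2` marks a regular season game are assumptions for illustration, not confirmed by the data dictionary.

```r
library(dplyr)

# Toy stand-in for team_game_data; all column names here are assumptions
toy_games <- tibble(
  season    = c(2014, 2014, 2014, 2014),
  gametype  = c(2, 2, 4, 2),  # assumed coding: 2 = regular season, 4 = playoffs
  gamedate  = as.Date(c("2014-11-02", "2014-10-28", "2015-04-18", "2014-10-30")),
  nbagameid = c(3, 1, 9, 2),
  off_team  = c("BOS", "BOS", "BOS", "ATL"),
  def_team  = c("ATL", "NYK", "CLE", "NYK"),
  off_win   = c(1, 0, 0, 1)
)

# Keep one season's regular season games, in chronological order,
# with only the columns needed for the Elo update
reg_season_2014 <- toy_games %>%
  filter(season == 2014, gametype == 2) %>%
  arrange(gamedate, nbagameid) %>%
  select(season, gamedate, nbagameid, off_team, def_team, off_win)
```

Sorting before the Elo update matters because each game's rating change depends on the ratings left behind by all earlier games.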

Creating Elo Databases

To create team data frames, unique team identifiers and names are extracted for each season. Each team starts with an initial Elo rating of 1500.

Updating Elo Ratings

The Elo ratings are updated for each game through a loop that iterates through each season’s team and game data frames. For each game, the Elo ratings of the competing teams are retrieved and updated using the elo.calc function, which considers the game outcome, initial Elo ratings, and a k-factor that determines the adjustment magnitude. The updated ratings are then stored back in the team data frame.
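As a minimal sketch of that update loop, the block below writes the Elo arithmetic out by hand rather than calling `elo.calc`, so it runs without the `elo` package. The team table, the three games, and the helper name `elo_expected` are hypothetical; only the starting rating of 1500 and the k-factor of 50 come from the write-up.

```r
# Expected score for team A against team B (standard Elo formula)
elo_expected <- function(r_a, r_b) 1 / (1 + 10^((r_b - r_a) / 400))

# Hypothetical inputs: three teams starting at 1500 and three games in date order
team_db_sketch <- data.frame(teams = c("ATL", "BOS", "NYK"), elo = 1500)
games <- data.frame(
  off_team = c("BOS", "ATL", "BOS"),
  def_team = c("NYK", "NYK", "ATL"),
  off_win  = c(0, 1, 1)
)
k <- 50  # k-factor stated in the write-up

for (g in seq_len(nrow(games))) {
  a <- match(games$off_team[g], team_db_sketch$teams)
  b <- match(games$def_team[g], team_db_sketch$teams)
  # Rating change for team A; team B receives the opposite change
  delta <- k * (games$off_win[g] -
                  elo_expected(team_db_sketch$elo[a], team_db_sketch$elo[b]))
  team_db_sketch$elo[a] <- team_db_sketch$elo[a] + delta
  team_db_sketch$elo[b] <- team_db_sketch$elo[b] - delta
}

team_db_sketch
```

An explicit loop is hard to avoid here because each update reads the ratings written by the previous game. The total rating across teams stays constant, which makes a handy sanity check.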

Calculating Win Percentages

For each team, the number of wins and losses is calculated from the game results, and win percentages are computed as the ratio of wins to total games played.

Determining Conference Rankings

Teams are assigned to either the Eastern or Western conference and ranked within their conferences based on win percentages. Manual adjustments are made to correct the rankings for tie-breaker systems and play-in results, ensuring accurate playoff matchups.
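The win percentage and ranking steps can be sketched with a grouped summary. The toy results, the column names, and the conference labels below are illustrative assumptions rather than the project's actual tables.

```r
library(dplyr)

# Toy results: one row per team per game; names and values are illustrative only
toy_results <- tibble(
  team = c("BOS", "BOS", "BOS", "ATL", "ATL", "ATL",
           "DEN", "DEN", "DEN", "LAL", "LAL", "LAL"),
  conference = rep(c("East", "East", "West", "West"), each = 3),
  win = c(1, 1, 0,  1, 0, 0,  1, 1, 1,  0, 1, 1)
)

standings <- toy_results %>%
  group_by(conference, team) %>%
  summarise(wins = sum(win),
            losses = sum(1 - win),
            win_pct = mean(win),
            .groups = "drop") %>%
  group_by(conference) %>%
  mutate(conf_rank = min_rank(desc(win_pct))) %>%  # tied teams share a rank
  ungroup() %>%
  arrange(conference, conf_rank)
```

Because `min_rank()` gives tied teams the same rank, a ranking built this way still needs the tie-breaker and play-in corrections described above.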

Organizing and Displaying Results

Teams are sorted by their updated Elo ratings and win percentages for each season. The team databases are then printed for review.

Playoff Simulation Execution

With the updated Elo ratings and accurate conference rankings, playoff series are simulated by iterating through matchups and determining the likelihood of a team winning a series based on their Elo ratings.
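For reference, when the per-game win probability is held fixed for the whole series, the series-level probabilities can be computed exactly instead of simulated. The sketch below assumes a constant per-game probability `p` (no home-court effect and no rating updates within the series); the function names are hypothetical and are not part of the model code.

```r
# Probability that a team with per-game win probability p wins a best-of-seven:
# it must take its 4th win after exactly 0, 1, 2 or 3 losses
exact_series_win_prob <- function(p) {
  losses <- 0:3
  sum(choose(3 + losses, losses) * p^4 * (1 - p)^losses)
}

# Probability that the series lasts exactly n games (n = 4, ..., 7),
# counting wins by either team
series_length_prob <- function(p) {
  n <- 4:7
  probs <- choose(n - 1, 3) * (p^4 * (1 - p)^(n - 4) + (1 - p)^4 * p^(n - 4))
  setNames(probs, n)
}

exact_series_win_prob(0.6)  # about 0.71
series_length_prob(0.5)     # 0.125, 0.25, 0.3125, 0.3125 for 4 to 7 games
```

A 60% per-game favorite wins the series about 71% of the time, and for evenly matched teams six- and seven-game series are the most likely lengths. Both figures can serve as sanity checks on simulated output.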

Notes

The code uses the elo.calc function for updating Elo ratings, which follows the standard Elo rating formula:

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$

where $E_A$ is the expected score for team A, and $R_A$ and $R_B$ are the Elo ratings for teams A and B, respectively. The k-factor, set to 50, determines the sensitivity of the Elo rating adjustments after each game. This setup allows for simulating playoff series outcomes by considering the updated Elo ratings after each game, ensuring a dynamic and responsive simulation that reflects real-world changes in team performance throughout the playoffs.
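As a worked illustration with made-up ratings (not taken from the model output), suppose team A is rated 1600 and team B is rated 1500. Using the standard Elo update $R_A' = R_A + k\,(S_A - E_A)$, where $S_A$ is 1 for a win and 0 for a loss, and $k = 50$:

```latex
\begin{aligned}
E_A &= \frac{1}{1 + 10^{(1500 - 1600)/400}} = \frac{1}{1 + 10^{-0.25}} \approx 0.640 \\
\text{A wins:}\quad R_A' &\approx 1600 + 50\,(1 - 0.640) = 1618.0 \\
\text{A loses:}\quad R_A' &\approx 1600 + 50\,(0 - 0.640) = 1568.0
\end{aligned}
```

The favorite gains about 18 points for a win but drops about 32 for a loss, so ratings move more when a result is unexpected.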

Strengths & Weaknesses of Model

Strengths

* Uses a good sample size (the entire regular season leading up to the playoffs) to calculate playoff win percentage based on Elo
* Accounts for strength of schedule and quality of wins and losses (the amount Elo goes up and down depends on how good the opponent is: it goes up more for beating better teams and down more for losing to worse teams)
* Includes some randomness in its calculation that simulates the randomness that can occur in the playoffs

Weaknesses

1) Some variability depending on whether teams are team1 or team2 in the run_simulation function, which shouldn’t be the case
2) Relies heavily on a team’s regular season performance
3) Doesn’t account for specific stats (e.g., points, 3-point %, assists, etc.)
4) Doesn’t incorporate player stats
5) Seems to default to 4 and 6 game series (not many 5 or 7 game series), indicating that the length of series isn’t calculated correctly
6) Involves some complicated calculations that can be hard to interpret
7) Simplified Elo Calculations: The model uses a basic Elo rating formula with a fixed k-factor, which may not capture the full complexity of team dynamics and matchups
8) Lack of Contextual Factors: The model does not account for injuries, player trades, or other contextual factors that could significantly impact team performance
9) Static Initial Ratings: All teams start with the same initial Elo rating, which may not accurately reflect the starting strength of teams each season
10) Manual Adjustments: The need for manual adjustments to correct rankings for tie-breaker systems and play-in results can introduce human error and inconsistencies
11) Simplified Outcome Prediction: The model predicts outcomes based on Elo ratings alone, without incorporating additional statistical methods or machine learning techniques that could improve prediction accuracy
12) Inefficient Coding: A lot of code is repeated, especially across the different seasons, and could be more efficient

How to Address Weaknesses (#’s align with # of weaknesses above - not all weaknesses are directly addressed)

7) Enhanced Elo Calculations: Introduce a dynamic k-factor that adjusts based on the importance of the game (e.g., regular season vs. playoffs) or the margin of victory. This would make the Elo rating adjustments more sensitive to context
8) Contextual Data Integration: Incorporate additional data such as player injuries, trades, and other contextual factors. This could be achieved by integrating real-time sports analytics and news sources
9) Customized Initial Ratings: Use historical performance data to assign more accurate initial Elo ratings for each team at the start of each season, reflecting their true starting strength.
10) Automated Adjustments: Develop an automated system for handling tie-breaker systems and play-in results to reduce human error and ensure consistency
11) Advanced Prediction Methods: Incorporate machine learning techniques to enhance outcome predictions. Techniques such as regression analysis, decision trees, or neural networks could be used to improve the accuracy of series outcome predictions based on a wider array of variables
12) Take more time to find ways to make my code more efficient
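One possible way to act on item 12 is to store each round's matchups in a table and apply a single function across its rows, instead of writing one call per series. In the sketch below, `simulate_matchup` is a hypothetical stand-in for `run_simulation`, and the ratings and bracket are illustrative values rather than model output.

```r
# Hypothetical stand-in for run_simulation(): returns team1's per-game
# Elo win probability against team2
simulate_matchup <- function(team1, team2, ratings) {
  p <- 1 / (1 + 10^((ratings[[team2]] - ratings[[team1]]) / 400))
  data.frame(team1 = team1, team2 = team2, team1_game_prob = p)
}

# Illustrative ratings and first-round bracket (not model output)
ratings <- c(BOS = 1650, MIA = 1520, CLE = 1580, ORL = 1560)
bracket <- data.frame(
  team1 = c("BOS", "CLE"),
  team2 = c("MIA", "ORL"),
  round = 1
)

# One call per bracket row instead of one hand-written line per series
round_results <- do.call(
  rbind,
  Map(simulate_matchup, bracket$team1, bracket$team2,
      MoreArgs = list(ratings = ratings))
)
round_results$round <- bracket$round
```

The same pattern extends to seasons by adding a season column to the bracket table; `purrr::pmap()` would be the tidyverse equivalent of `Map()` here.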

2023 Plot of Playoff Teams Chances of Advancing to Next Round

# Load team colors database (the teamcolors data frame ships with the teamcolors package)
library(teamcolors)
temp <- teamcolors

# Renaming LA Clippers in team_db to Los Angeles Clippers to match temp df 
team_logos <- team_db %>%
  mutate(off_team_name = ifelse(off_team_name == "LA Clippers", "Los Angeles Clippers", off_team_name))
  
# Merge data and team colors
team_plot <- merge(team_logos, temp, by.x = "off_team_name", by.y = "name" , all.x  = TRUE)

library(dplyr)
library(ggplot2)
library(gganimate)
library(ggimage)  # provides geom_image(), used in the plot below
library(grid)
library(png)

# Function to simulate a playoff series
sim_series <- function(team_1, team_2, team_db) {
  series_res <- data.frame(game_res = rep(NA, 7),
                           game_win_prob = rep(NA, 7),
                           game_sim_val = rep(NA, 7))
  
  stop <- FALSE
  i <- 0
  team_1_wins <- 0
  team_2_wins <- 0
  
  # Get Elo ratings
  team_1_elo <- team_db$elo[team_db$teams == team_1]
  team_2_elo <- team_db$elo[team_db$teams == team_2]
  
  while (!stop && i < 7) {
    i <- i + 1
    
    # Calculate win probability for team_1
    series_res$game_win_prob[i] <- elo.prob(team_1_elo, team_2_elo)
    
    # Simulate game outcome
    series_res$game_sim_val[i] <- runif(1, min = 0, max = 1)
    
    if (series_res$game_sim_val[i] <= series_res$game_win_prob[i]) {
      series_res$game_res[i] <- 1
      team_1_wins <- team_1_wins + 1
    } else {
      series_res$game_res[i] <- 0
      team_2_wins <- team_2_wins + 1
    }
    
    if (team_1_wins == 4 || team_2_wins == 4) {
      stop <- TRUE
    }
  }
  
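  # Determine winner and loser
  # Note: series_win_prob is the winner's average per-game win probability
  # over the games it won, not the probability of winning the whole series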
  if (team_1_wins == 4) {
    winner <- team_1
    loser <- team_2
    series_win_prob <- mean(series_res$game_win_prob[series_res$game_res == 1], na.rm = TRUE)
  } else {
    winner <- team_2
    loser <- team_1
    series_win_prob <- mean(1 - series_res$game_win_prob[series_res$game_res == 0], na.rm = TRUE)
  }
  
  num_games <- i
  
  return(list(winner = winner, 
              loser = loser,
              series_res = series_res,
              num_games = num_games,
              series_win_prob = series_win_prob))
}

# Function to run the simulation and save the results
run_simulation <- function(team1, team2, team_db, round_num) {
  result <- sim_series(team_1 = team1, team_2 = team2, team_db = team_db)
  
  return(data.frame(
    team1 = team1,
    team2 = team2,
    winner = result$winner,
    loser = result$loser,
    win_prob = ifelse(result$winner == team1, result$series_win_prob, 1 - result$series_win_prob),
    round = round_num
  ))
}

# Simulate the playoffs and save the results
results <- list()

# Round 1
results[[1]] <- run_simulation("BOS", "MIA", team_db_23, 1)
results[[2]] <- run_simulation("CLE", "ORL", team_db_23, 1)
results[[3]] <- run_simulation("IND", "MIL", team_db_23, 1)
results[[4]] <- run_simulation("NYK", "PHI", team_db_23, 1)

results[[5]] <- run_simulation("DAL", "LAC", team_db_23, 1)
results[[6]] <- run_simulation("LAL", "DEN", team_db_23, 1)
results[[7]] <- run_simulation("PHX", "MIN", team_db_23, 1)
results[[8]] <- run_simulation("OKC", "NOP", team_db_23, 1)

# Round 2
results[[9]] <- run_simulation("BOS", "CLE", team_db_23, 2)
results[[10]] <- run_simulation("IND", "PHI", team_db_23, 2)

results[[11]] <- run_simulation("DAL", "OKC", team_db_23, 2)
results[[12]] <- run_simulation("MIN", "LAL", team_db_23, 2)

# Round 3
results[[13]] <- run_simulation("IND", "BOS", team_db_23, 3)
results[[14]] <- run_simulation("LAL", "DAL", team_db_23, 3)

# Finals
results[[15]] <- run_simulation("LAL", "IND", team_db_23, 4)

# Combine all results into a single data frame
playoff_results <- do.call(rbind, results)

# Merge playoff results with team logos
playoff_results <- playoff_results %>%
  left_join(team_plot, by = c("team1" = "teams")) %>%
  rename(team1_logo = logo) %>%
  left_join(team_plot, by = c("team2" = "teams")) %>%
  rename(team2_logo = logo)

# Prepare data for plotting
plot_data <- playoff_results %>%
  pivot_longer(cols = c(team1, team2), names_to = "team_type", values_to = "teams") %>%
  mutate(win_prob = ifelse(team_type == "team1", win_prob, 1 - win_prob),
         logo = ifelse(team_type == "team1", team1_logo, team2_logo)) %>%
  arrange(desc(win_prob))

# Plot for win% of that round (% chance of advancing to next round - or winning Finals for round 4)
# plot_data already holds one row per team per series with that team's own win_prob and logo,
# so select those columns directly instead of re-pairing winner/loser with team1/team2 values
plot_data_2 <- plot_data %>%
  select(teams, win_prob, round, logo) %>%
  distinct(teams, round, .keep_all = TRUE)

# Create the animated plot
p <- ggplot(plot_data_2, aes(x = teams, y = win_prob)) +
  geom_image(aes(image = logo), size = 0.1) +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  labs(x = "Teams", y = "Win Probability", title = "Win Probabilities by Round", subtitle = "Round: {closest_state}") +
  theme_minimal() +
  transition_states(round, transition_length = 2, state_length = 3) +
  enter_fade() +
  exit_fade()

# Render the animation (anim_save() can be called afterwards to write it to a file)
animate(p, fps = 10)

Part 3 – Finding Insights from Your Model

Find two teams that had a competitive window of 2 or more consecutive seasons making the playoffs and that under performed your model’s expectations for them, losing series they were expected to win. Why do you think that happened? Classify one of them as bad luck and one of them as relating to a cause not currently accounted for in your model. If given more time and data, how would you use what you found to improve your model?

ANSWER :

Bad Luck

One team that the model projected to go far in the playoffs for 2 consecutive years but that underperformed in real life was the Portland Trail Blazers during the 2018-19 & 2019-20 seasons. According to the model, the Trail Blazers were projected to win the NBA Finals in 2018-19, but they were swept by the Golden State Warriors in the Western Conference Finals (Round 3). They were projected to make it to the Western Conference Finals in 2019-20 but lost to the LA Lakers in the 1st Round.

I think the Trail Blazers’ lack of success in real life compared to the model was mostly due to bad luck, particularly injuries. In 2018, Damian Lillard suffered a separated rib in Game 2 of the Western Conference Finals and was not at 100% for the rest of the series. Lillard averaged 33 ppg on 46% FG & 48% 3P in the 1st Round but only 22.3 ppg on 37% FG and 37% 3P in the Conference Finals. It was a devastating blow for the Trail Blazers not to have their star player at full health against a great Warriors team. The Trail Blazers were also missing their starting center, Jusuf Nurkic, for that series (and the entire playoffs) due to a broken leg that took him out for the season. Nurkic was averaging more than 15 points & 10 rebounds on 51% FG, and he would have been a significant addition in that series.

In 2019, Damian Lillard got hurt again in the playoffs - this time it was his knee in Game 4 of the 1st Round. The Trail Blazers had won Game 1 and narrowly lost Game 3; a win there would have put them up 2-1, and if Lillard had not gone down in Game 4, they might have won that game as well to go up 3-1. They also lost Game 5 by only single digits without Lillard. With him in the lineup, they could easily have won that game and possibly taken the series 4-1 or clinched in 6 or 7 games.

Inaccuracy of Model

Another team that the model projected to go far in the playoffs for 2 consecutive years but that underperformed in real life was the Boston Celtics in 2014-15 & 2015-16. The model had the Celtics going to the Finals both seasons, but they lost in the 1st Round to the Cavs in 2014-15 and to the Hawks in 2015-16.

I believe that this discrepancy between the model and real life is due to how the model works. The model does not incorporate player stats and thus does not account for star power. In the NBA playoffs, it is important to have star players that can carry the load, especially offensively: as rotations become tighter, opposing defenses’ intensity rises, and game planning improves, it is incumbent on the team’s stars to shine through and deliver. The Boston Celtics during this run had only 1 All-Star caliber player, Isaiah Thomas, which typically isn’t enough to be a Finals contender. I think the model overrated their regular season play (Elo rating) since they were a well-rounded team with a great coach in Brad Stevens: they didn’t lose to many bad teams (which would have significantly dropped their Elo) and they had wins against really good teams (which significantly boosted their Elo).

Improving Model Based on Case Studies

Taking a look at the 2018-2019 Trail Blazers and the 2014-2015 Boston Celtics shows some of the weaknesses of the model mentioned earlier and makes it clear that the model does not account for injuries or the level of talent (star power) on a team. In the future, I would want to incorporate a team’s health into the playoff simulation since it is such an important factor in winning a championship. I would add a column that keeps track of player availability and use it as a component of the simulation. I would also want to incorporate a team’s level of talent and give more favorable odds to teams with more All-Stars and players with high stats, especially points. To win a championship, teams typically need at least 2 star players. Teams with good players and coaching may be able to have success in the regular season, such as the 60-win Atlanta Hawks in 2014-15, but not having at least 2 All-NBA caliber players drastically decreases a team’s chances of winning a ring. I might also incorporate a team’s playoff experience in the model because young teams without deep playoff runs rarely win the championship. It’s typically teams whose players have played in the Finals (or at least have had a couple of years making deep runs) that win. Take the Boston Celtics, for example; they lost in 2022 against the Warriors even though they had a more talented team and had over an 80% chance of winning according to ESPN’s BPI model. I think the experience of GSW’s main players like Curry and Draymond had a lot to do with it. They had been there and done that and were able to stay on course even after giving up Game 1 at home and being down 2-1 at the start of the series. Fast forward to this past Finals: the Celtics were able to beat the Mavericks, and the Celtics’ core players had more playoff and Finals experience than Dallas.
When going up against the best teams (players and coaches) in the league, who are specifically game-planning for you, and playing under the brightest of lights where the pressure is on, you need players who know what to do in high-pressure situations, which comes with experience.

Data Science Project

Daniel Baptist

07/15/24
