In this project, I use statistical analysis and a prediction model to identify the variables that contribute most to an NBA team winning the championship. I use both conventional variables that most people would agree affect a team’s championship chances, such as regular-season record and offensive efficiency, and more intangible variables that I believe have a subtler impact. These intangible variables may include how similar a team’s roster is to the previous year’s, or a mental hangover from winning the championship the year before. Variables of this kind are what set this project apart from other algorithms and models that try to predict the NBA champion. My main motivation came from gradually realizing that many intangible variables potentially have a large impact on a team’s chances of winning a championship.
To collect data, we look at the last 30 NBA champions:
Teams <- c("Rockets", "Rockets", "Bulls", "Bulls", "Bulls", "Spurs", "Lakers", "Lakers", "Lakers", "Spurs", "Pistons", "Spurs", "Heat", "Spurs", "Celtics", "Lakers", "Lakers", "Mavericks", "Heat", "Heat", "Spurs", "Warriors", "Cavaliers", "Warriors", "Warriors", "Raptors", "Lakers", "Bucks", "Warriors", "Nuggets")
Let’s also create a time variable called Year, running from 1994 to 2023.
Year <- seq(1994, 2023)
Let’s combine these two variables into a data frame:
main_data <- data.frame(
Year = Year,
Teams = Teams
)
Before we start adding predictors, let’s quickly see which NBA teams have won the most championships over the last 30 years:
unique_occurence <- unique(Teams)
color_palette <- rainbow(length(unique_occurence))
color_mapping <- setNames(color_palette, unique_occurence)
# Sort the counts first, then look up colors by the sorted team names so each bar gets its own team's color
team_counts <- sort(table(Teams), decreasing = TRUE)
barplot(team_counts, cex.names = 0.5, col = color_mapping[names(team_counts)], main = "NBA Championships over the Last 30 Years")
The Lakers, Spurs, and Warriors have had the most success over the last 30 years.
Now we add variables that potentially determine a team’s chances of winning an NBA championship. Remember that our goal is to capture factors that aren’t directly analytical. From my observations, one big determinant is playoff experience. It is a useful determinant because it indicates both how old the roster is (assuming little roster change) and how persistent the team has been (assuming it didn’t win the championship).
For each team, we look at how far it advanced in the playoffs in each of the previous four years. Reaching the 1st round counts for 1 pt, and each round won doubles the previous value, except for the last round (the finals), because the team that loses in the finals has still gained the maximum amount of experience that year. To summarize: the 1st round gives 1 pt, the 2nd round gives 2 pts, the 3rd round gives 4 pts, and making the finals gives 8 pts. We then give more weight to recent years: each year’s points are multiplied by (5 - years past), so last year’s points are multiplied by 4 and the oldest year’s by 1, and the weighted points are summed.
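The scoring rule above can be sketched as a small helper. This is only an illustration: `experience_score` and its argument are hypothetical names that are not part of the original analysis, and the input is assumed to list the playoff points (0, 1, 2, 4, or 8) for the previous four seasons, most recent first.

```r
# Hypothetical helper: weighted playoff experience over the previous four seasons.
# `points` holds the playoff points per season (0, 1, 2, 4, or 8), most recent first.
experience_score <- function(points) {
  stopifnot(length(points) == 4)
  weights <- 5 - seq_along(points)  # 4, 3, 2, 1: recent seasons weigh more
  sum(weights * points)
}

# Finals last year, 2nd round the year before, then 3rd round, then 2nd round
experience_score(c(8, 2, 4, 2))  # 4*8 + 3*2 + 2*4 + 1*2 = 48
```

The example reproduces the hand-written expression 4*8+3*2+2*4+2 used for the 2001 Lakers in the experience vector.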
experience <- c(4*2+2+1, 4*8+3*2+1, 4*2+3*2+2*8+8, 4*8+3*2+2*2+8, 4*8+3*8+2*2+2, 4*2 + 2*2+2, 4*2 + 3*4+2*2+1, 4*8+3*2+2*4+2, 4*8+3*8+2*2+4, 4*2+3*4+2+8, 4*4+3*2+1, 4*2+3*4+2+8, 4*4+3*2, 4*2+3*8+2*2+8, 2+1, 4*8+3+2, 4*8+3*8+2+1, 4+3*2+2+1, 4*8+3+2, 4*8+3*8+2+1, 4*8+3*4+2+2, 4+3*2, 4*8, 4*8+3*8+2+2, 4*8+3*8+2*8+1, 4*2+3*2+2*4+1, 0, 4*2+3*4+2+1, 2*8+8, 4+3*2+2*4+2)
boxplot(experience)
main_data <- data.frame(Year = Year, Teams = Teams, Experience = experience)
main_data
## Year Teams Experience
## 1 1994 Rockets 11
## 2 1995 Rockets 39
## 3 1996 Bulls 38
## 4 1997 Bulls 50
## 5 1998 Bulls 62
## 6 1999 Spurs 14
## 7 2000 Lakers 25
## 8 2001 Lakers 48
## 9 2002 Lakers 64
## 10 2003 Spurs 30
## 11 2004 Pistons 23
## 12 2005 Spurs 30
## 13 2006 Heat 22
## 14 2007 Spurs 44
## 15 2008 Celtics 3
## 16 2009 Lakers 37
## 17 2010 Lakers 59
## 18 2011 Mavericks 13
## 19 2012 Heat 37
## 20 2013 Heat 59
## 21 2014 Spurs 48
## 22 2015 Warriors 10
## 23 2016 Cavaliers 32
## 24 2017 Warriors 60
## 25 2018 Warriors 73
## 26 2019 Raptors 23
## 27 2020 Lakers 0
## 28 2021 Bucks 23
## 29 2022 Warriors 24
## 30 2023 Nuggets 20
The median is a more robust summary than the mean in this case, since there are some clear outliers:
median(experience)
## [1] 31
Next, let’s also consider each NBA team’s regular-season performance in that same year. Because some seasons were shortened, we will only look at win %.
Win_Percentage <- c(.707, .573, .878, .841, .756, .740, .817, .683, .707, .732, .659, .720, .634, .707, .805, .793, .695, .695, .697, .805, .756, .817, .890, .817, .707, .707, .732, .639, .646, .646)
main_data <- data.frame(Year = Year, Teams = Teams, Experience = experience, Win_Percentage = Win_Percentage)
main_data
## Year Teams Experience Win_Percentage
## 1 1994 Rockets 11 0.707
## 2 1995 Rockets 39 0.573
## 3 1996 Bulls 38 0.878
## 4 1997 Bulls 50 0.841
## 5 1998 Bulls 62 0.756
## 6 1999 Spurs 14 0.740
## 7 2000 Lakers 25 0.817
## 8 2001 Lakers 48 0.683
## 9 2002 Lakers 64 0.707
## 10 2003 Spurs 30 0.732
## 11 2004 Pistons 23 0.659
## 12 2005 Spurs 30 0.720
## 13 2006 Heat 22 0.634
## 14 2007 Spurs 44 0.707
## 15 2008 Celtics 3 0.805
## 16 2009 Lakers 37 0.793
## 17 2010 Lakers 59 0.695
## 18 2011 Mavericks 13 0.695
## 19 2012 Heat 37 0.697
## 20 2013 Heat 59 0.805
## 21 2014 Spurs 48 0.756
## 22 2015 Warriors 10 0.817
## 23 2016 Cavaliers 32 0.890
## 24 2017 Warriors 60 0.817
## 25 2018 Warriors 73 0.707
## 26 2019 Raptors 23 0.707
## 27 2020 Lakers 0 0.732
## 28 2021 Bucks 23 0.639
## 29 2022 Warriors 24 0.646
## 30 2023 Nuggets 20 0.646
Now that we’ve added a couple of factors to our data set, let’s see whether experience and win percentage matter more, less, or the same over time.
experience_over_time <- lm(experience ~ Year)
Win_Percentage_over_time <- lm(Win_Percentage ~ Year)
plot(Year, experience, main = "Experience over Time")
abline(experience_over_time)
plot(Year, Win_Percentage, main = "Win % over Time")
abline(Win_Percentage_over_time)
Both experience and win % still seem to matter, but there is a slight decline over time. We will not adjust experience and win % for this time trend, since it does not make intuitive sense that NBA teams with less experience and a lower win % will have more success over time.
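One way to put a number on the "slight decline" is to read the slope and its p-value from the fitted model. The sketch below is an assumption about how this could be checked, not part of the original analysis; it repeats the experience values from the table above so that it runs on its own.

```r
# Experience scores of the 30 champions (1994-2023), copied from the table above
Year <- 1994:2023
experience <- c(11, 39, 38, 50, 62, 14, 25, 48, 64, 30, 23, 30, 22, 44, 3,
                37, 59, 13, 37, 59, 48, 10, 32, 60, 73, 23, 0, 23, 24, 20)

fit <- lm(experience ~ Year)
trend <- coef(summary(fit))["Year", ]  # estimate, std. error, t value, p-value

slope <- unname(trend["Estimate"])     # change in experience score per year
p_value <- unname(trend["Pr(>|t|)"])   # a large p-value suggests the trend may be noise
print(round(c(slope = slope, p_value = p_value), 3))
```

The same lines work for win % by swapping in the Win_Percentage vector.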
Now to add more predictors. Star power also seems to be a big determinant of championship outcomes. The easiest way to measure this objectively is to check whether any players on the championship teams finished near the top of MVP voting earlier in their careers. How recently the player won the MVP or came close to winning it won’t matter, since they had already validated themselves as a star player at that point. Furthermore, it is harder for star players to win MVP when they join a team that already has star players. However, the player must have been a top 3 scorer on their team, since older former MVPs may no longer contribute as much. Finishing 4th in the MVP race counts for 1 pt, and that value doubles for each place higher (2 pts for 3rd, 4 pts for 2nd, 8 pts for 1st). Additionally, MVP finishes in the championship year itself are not counted, since MVP results are announced after the playoffs start, and our goal is to predict the NBA champion before the playoffs begin.
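The point scale for MVP finishes can be written as a one-line rule. The helper name `mvp_points` and the example finishes below are made up for illustration; the actual Star_power values were tallied by hand.

```r
# Hypothetical helper: points for one MVP-voting finish.
# 4th place = 1 pt, doubling for each place higher; 5th or worse = 0.
mvp_points <- function(place) {
  ifelse(place >= 1 & place <= 4, 2^(4 - place), 0)
}

# Made-up example: a player with past finishes of 1st, 3rd, and 6th
finishes <- c(1, 3, 6)
sum(mvp_points(finishes))  # 8 + 2 + 0 = 10
```

A team’s star power would then be the sum of these points over all qualifying seasons of its star players.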
Star_players <- c("Hakeem Olajuwwon", "Hakeem Olajuwwon", "Michael Jordan", "Michael Jordan", "Michael Jordan", "Tim Duncan and David Robinson", "Shaq O'Neal", "Shaq O'Neal", "Shaq O'Neal", "Tim Duncan", "None", "Tim Duncan", "Shaq O'Neal", "Tim Duncan", "Kevin Garnett", "Kobe Bryant", "Kobe Bryant", "Dirk Nowitzki", "Lebron James and Dwayne Wade", "Lebron James and Dwayne Wade", "Tim Duncan", "None","Lebron James", "Stephen Curry and Kevin Durant", "Stephen Curry and Kevin Durant", "Kawhi Leonard", "Lebron James and Anthony Davis", "Giannis Antetokounmpo", "Stephen Curry", "Nikola Jokic")
Star_power <- c(4+1, 8+4+1, 2+8+8+2+4+8+4, 2+8+8+2+4+8+4+8, 2+8+8+2+4+8+4+8+4, 1+4+8+4+2+2, 1+4+1, 1+4+1+8, 1+4+1+8+2, 8+4+2, 0, 8+4+2+8+4, 1+4+1+8+2+4, 8+4+2+8+4+1+4, 4+4+8, 2+1+2+8, 2+1+2+8+4, 8+2+2, 2+8+8+1+4+2, 2+8+8+1+4+2+8, 8+4+2+8+4+1+4+2, 0, 2+8+8+1+4+4+2+2, 8+8+8+4+4+4, 8+8+8+4+4+4, 2+4, 2+8+8+1+4+4+2+2+1+4+2, 1+8+8, 8+8+2, 8+8+4)
main_data <- data.frame(Year = Year, Teams = Teams, Experience = experience, Win_Percentage = Win_Percentage, Star_Players = Star_players, Star_power = Star_power)
main_data
## Year Teams Experience Win_Percentage Star_Players
## 1 1994 Rockets 11 0.707 Hakeem Olajuwwon
## 2 1995 Rockets 39 0.573 Hakeem Olajuwwon
## 3 1996 Bulls 38 0.878 Michael Jordan
## 4 1997 Bulls 50 0.841 Michael Jordan
## 5 1998 Bulls 62 0.756 Michael Jordan
## 6 1999 Spurs 14 0.740 Tim Duncan and David Robinson
## 7 2000 Lakers 25 0.817 Shaq O'Neal
## 8 2001 Lakers 48 0.683 Shaq O'Neal
## 9 2002 Lakers 64 0.707 Shaq O'Neal
## 10 2003 Spurs 30 0.732 Tim Duncan
## 11 2004 Pistons 23 0.659 None
## 12 2005 Spurs 30 0.720 Tim Duncan
## 13 2006 Heat 22 0.634 Shaq O'Neal
## 14 2007 Spurs 44 0.707 Tim Duncan
## 15 2008 Celtics 3 0.805 Kevin Garnett
## 16 2009 Lakers 37 0.793 Kobe Bryant
## 17 2010 Lakers 59 0.695 Kobe Bryant
## 18 2011 Mavericks 13 0.695 Dirk Nowitzki
## 19 2012 Heat 37 0.697 Lebron James and Dwayne Wade
## 20 2013 Heat 59 0.805 Lebron James and Dwayne Wade
## 21 2014 Spurs 48 0.756 Tim Duncan
## 22 2015 Warriors 10 0.817 None
## 23 2016 Cavaliers 32 0.890 Lebron James
## 24 2017 Warriors 60 0.817 Stephen Curry and Kevin Durant
## 25 2018 Warriors 73 0.707 Stephen Curry and Kevin Durant
## 26 2019 Raptors 23 0.707 Kawhi Leonard
## 27 2020 Lakers 0 0.732 Lebron James and Anthony Davis
## 28 2021 Bucks 23 0.639 Giannis Antetokounmpo
## 29 2022 Warriors 24 0.646 Stephen Curry
## 30 2023 Nuggets 20 0.646 Nikola Jokic
## Star_power
## 1 5
## 2 13
## 3 36
## 4 44
## 5 48
## 6 21
## 7 6
## 8 14
## 9 16
## 10 14
## 11 0
## 12 26
## 13 20
## 14 31
## 15 16
## 16 13
## 17 17
## 18 12
## 19 25
## 20 33
## 21 33
## 22 0
## 23 31
## 24 36
## 25 36
## 26 6
## 27 38
## 28 17
## 29 18
## 30 20
Now, let’s add another less analytical predictor. We want a predictor that measures the play style of the NBA teams, so we measure what share of a team’s made shots come from assists. This is an attempt to measure how efficient the team is offensively.
score_efficiency <- c(25.5/39.0, 25.1/38.5, 24.8/40.2, 26.1/40.0, 23.8/37.4, 22.0/34.8, 23.4/38.3, 23.0/37.9, 23.0/38.4, 19.8/35.5, 20.8/33.5, 21.6/35.6, 20.6/37.1, 22.1/36.6, 22.4/36.4, 23.3/40.3, 21.1/38.3, 23.8/38.7, 20.0/37.1, 23.0/38.4, 25.2/40.6, 27.4/41.6, 22.7/38.7, 30.4/43.1, 29.3/42.8, 25.4/42.2, 25.4/42.3, 25.5/44.7, 27.1/40.5, 28.9/43.6)
main_data <- data.frame(Year = Year, Teams = Teams, Experience = experience, Win_Percentage = Win_Percentage, Score_efficiency = score_efficiency, Star_Players = Star_players, Star_power = Star_power)
main_data
## Year Teams Experience Win_Percentage Score_efficiency
## 1 1994 Rockets 11 0.707 0.6538462
## 2 1995 Rockets 39 0.573 0.6519481
## 3 1996 Bulls 38 0.878 0.6169154
## 4 1997 Bulls 50 0.841 0.6525000
## 5 1998 Bulls 62 0.756 0.6363636
## 6 1999 Spurs 14 0.740 0.6321839
## 7 2000 Lakers 25 0.817 0.6109661
## 8 2001 Lakers 48 0.683 0.6068602
## 9 2002 Lakers 64 0.707 0.5989583
## 10 2003 Spurs 30 0.732 0.5577465
## 11 2004 Pistons 23 0.659 0.6208955
## 12 2005 Spurs 30 0.720 0.6067416
## 13 2006 Heat 22 0.634 0.5552561
## 14 2007 Spurs 44 0.707 0.6038251
## 15 2008 Celtics 3 0.805 0.6153846
## 16 2009 Lakers 37 0.793 0.5781638
## 17 2010 Lakers 59 0.695 0.5509138
## 18 2011 Mavericks 13 0.695 0.6149871
## 19 2012 Heat 37 0.697 0.5390836
## 20 2013 Heat 59 0.805 0.5989583
## 21 2014 Spurs 48 0.756 0.6206897
## 22 2015 Warriors 10 0.817 0.6586538
## 23 2016 Cavaliers 32 0.890 0.5865633
## 24 2017 Warriors 60 0.817 0.7053364
## 25 2018 Warriors 73 0.707 0.6845794
## 26 2019 Raptors 23 0.707 0.6018957
## 27 2020 Lakers 0 0.732 0.6004728
## 28 2021 Bucks 23 0.639 0.5704698
## 29 2022 Warriors 24 0.646 0.6691358
## 30 2023 Nuggets 20 0.646 0.6628440
## Star_Players Star_power
## 1 Hakeem Olajuwwon 5
## 2 Hakeem Olajuwwon 13
## 3 Michael Jordan 36
## 4 Michael Jordan 44
## 5 Michael Jordan 48
## 6 Tim Duncan and David Robinson 21
## 7 Shaq O'Neal 6
## 8 Shaq O'Neal 14
## 9 Shaq O'Neal 16
## 10 Tim Duncan 14
## 11 None 0
## 12 Tim Duncan 26
## 13 Shaq O'Neal 20
## 14 Tim Duncan 31
## 15 Kevin Garnett 16
## 16 Kobe Bryant 13
## 17 Kobe Bryant 17
## 18 Dirk Nowitzki 12
## 19 Lebron James and Dwayne Wade 25
## 20 Lebron James and Dwayne Wade 33
## 21 Tim Duncan 33
## 22 None 0
## 23 Lebron James 31
## 24 Stephen Curry and Kevin Durant 36
## 25 Stephen Curry and Kevin Durant 36
## 26 Kawhi Leonard 6
## 27 Lebron James and Anthony Davis 38
## 28 Giannis Antetokounmpo 17
## 29 Stephen Curry 18
## 30 Nikola Jokic 20
mean(score_efficiency)
## [1] 0.615438
On average, 61.5% of the NBA champions’ made field goals came from assists. We will have to measure the score efficiency of the current teams to see whether this is a useful determinant.
Before we start weighing these predictors, let’s add one more predictor: defense. Since teams play at different paces offensively, points or field goals allowed will not be an accurate measurement. Again, we will use a percentage: opponent’s field goal percentage.
defense <- c(.440, .453, .448, .436, .431, .402, .425, .438, .424, .427, .413, .426, .440, .443, .419, .447, .446, .450, .434, .440, .444, .428, .435, .435, .447, .449, .448, .456, .438, .478)
main_data <- data.frame(Year = Year, Teams = Teams, Experience = experience, Win_Percentage = Win_Percentage, Score_efficiency = score_efficiency, Star_Players = Star_players, Star_power = Star_power, Defense = defense)
main_data
## Year Teams Experience Win_Percentage Score_efficiency
## 1 1994 Rockets 11 0.707 0.6538462
## 2 1995 Rockets 39 0.573 0.6519481
## 3 1996 Bulls 38 0.878 0.6169154
## 4 1997 Bulls 50 0.841 0.6525000
## 5 1998 Bulls 62 0.756 0.6363636
## 6 1999 Spurs 14 0.740 0.6321839
## 7 2000 Lakers 25 0.817 0.6109661
## 8 2001 Lakers 48 0.683 0.6068602
## 9 2002 Lakers 64 0.707 0.5989583
## 10 2003 Spurs 30 0.732 0.5577465
## 11 2004 Pistons 23 0.659 0.6208955
## 12 2005 Spurs 30 0.720 0.6067416
## 13 2006 Heat 22 0.634 0.5552561
## 14 2007 Spurs 44 0.707 0.6038251
## 15 2008 Celtics 3 0.805 0.6153846
## 16 2009 Lakers 37 0.793 0.5781638
## 17 2010 Lakers 59 0.695 0.5509138
## 18 2011 Mavericks 13 0.695 0.6149871
## 19 2012 Heat 37 0.697 0.5390836
## 20 2013 Heat 59 0.805 0.5989583
## 21 2014 Spurs 48 0.756 0.6206897
## 22 2015 Warriors 10 0.817 0.6586538
## 23 2016 Cavaliers 32 0.890 0.5865633
## 24 2017 Warriors 60 0.817 0.7053364
## 25 2018 Warriors 73 0.707 0.6845794
## 26 2019 Raptors 23 0.707 0.6018957
## 27 2020 Lakers 0 0.732 0.6004728
## 28 2021 Bucks 23 0.639 0.5704698
## 29 2022 Warriors 24 0.646 0.6691358
## 30 2023 Nuggets 20 0.646 0.6628440
## Star_Players Star_power Defense
## 1 Hakeem Olajuwwon 5 0.440
## 2 Hakeem Olajuwwon 13 0.453
## 3 Michael Jordan 36 0.448
## 4 Michael Jordan 44 0.436
## 5 Michael Jordan 48 0.431
## 6 Tim Duncan and David Robinson 21 0.402
## 7 Shaq O'Neal 6 0.425
## 8 Shaq O'Neal 14 0.438
## 9 Shaq O'Neal 16 0.424
## 10 Tim Duncan 14 0.427
## 11 None 0 0.413
## 12 Tim Duncan 26 0.426
## 13 Shaq O'Neal 20 0.440
## 14 Tim Duncan 31 0.443
## 15 Kevin Garnett 16 0.419
## 16 Kobe Bryant 13 0.447
## 17 Kobe Bryant 17 0.446
## 18 Dirk Nowitzki 12 0.450
## 19 Lebron James and Dwayne Wade 25 0.434
## 20 Lebron James and Dwayne Wade 33 0.440
## 21 Tim Duncan 33 0.444
## 22 None 0 0.428
## 23 Lebron James 31 0.435
## 24 Stephen Curry and Kevin Durant 36 0.435
## 25 Stephen Curry and Kevin Durant 36 0.447
## 26 Kawhi Leonard 6 0.449
## 27 Lebron James and Anthony Davis 38 0.448
## 28 Giannis Antetokounmpo 17 0.456
## 29 Stephen Curry 18 0.438
## 30 Nikola Jokic 20 0.478
Besides the 1995 Rockets and two recent champions, the 2021 Milwaukee Bucks and the 2023 Denver Nuggets, every champion held opponents to a field goal percentage of .450 or less. Perhaps defense does not matter as much in recent years.
defense_over_time <- lm(defense ~ Year)
plot(Year, defense, main = "Opponent's FG% over Time")
abline(defense_over_time)
There is a clear increase in opponent’s FG% over time. Intuitively, it does not make sense for championship-winning teams’ defense to get worse. Thus, we will completely ignore this predictor in our model.
Let’s see what we have so far. We will use the median for star power and experience, since there are clear outliers in the data. However, we cannot just ignore these outliers, since similar extreme values can occur again. Thus, our solution is to weigh these factors a little less than the other factors.
median(experience)
## [1] 31
median(Star_power)
## [1] 19
Next, we have win percentage and score efficiency. Because these two variables have no clear outliers, unlike the previous two, we can use the mean to summarize them.
mean(score_efficiency)
## [1] 0.615438
mean(Win_Percentage)
## [1] 0.7333667
Now, let’s calculate the necessary statistics for some of the NBA teams in the 2023-2024 season. These were calculated during the All-Star break weekend, so the data will have to be updated when the regular season ends. For the sake of efficiency, we only consider the top 6 teams of each conference for now; when the regular season ends, we will consider all playoff teams. In each team’s vector, the first element is score efficiency, followed by win percentage, experience, and star power.
Timberwolves <- c(26.4/41.4, .709, 4*1+3*1, 0)
Thunder <- c(27.3/44.5, .685, 1, 0)
Clippers <- c(26.0/43.0, .679, 4*1+2*4+1*2, 2+2+4+2+4+8+4+4)
Nuggets <- c(28.6/43.4, .655, 4*4+3+2*4+2, 4+8+8+1)
Suns <- c(26.8/42.7, .600, 4*2+3*2+2*8, 4+4+4+8+1)
Pelicans <- c(27.1/42.9, .600, 2, 0)
Celtics <- c(26.2/43.5, .782, 4*3+3*8+2+4, 1)
Cavaliers <- c(27.3/42.8, .679, 4,0)
Bucks <- c(26.7/44.1, .625, 4+3*2+8*2+2, 8+8+1+2+2+1)
Knicks <- c(23.7/41.7, .600, 4*2+2, 0)
Sixers <- c(24.9/42.5, .593, 4*2+3*2+2*2+1, 4+4+8)
Pacers <- c(30.9/46.8, .554, 1, 0)
When we calculated score efficiency, we were doubtful whether the predictor had any use. A good way to check is to calculate the mean score efficiency of the 2023-2024 NBA teams and compare it to that of the NBA champions.
current_score_efficiency <- c(Thunder[1], Timberwolves[1], Clippers[1], Nuggets[1], Suns[1], Pelicans[1], Celtics[1], Cavaliers[1], Bucks[1], Knicks[1], Sixers[1], Pacers[1])
mean(current_score_efficiency)
## [1] 0.6195178
The average score efficiency of the 2024 teams (0.620) is not much different from the average score efficiency of the NBA champions (0.615). Therefore, we will exclude this variable as well. We only have three variables left that can help with our prediction model: win percentage, experience, and star power.
Now we need to figure out two things: how to weigh these variables in importance relative to one another, and how to compare the past champions’ values of these predictors with the current values for the 2024 NBA teams.
For the weights, we will weigh experience and star power less. This is because many of the current teams have 0 experience or star power, and giving too much credit to the teams that do have them may introduce too much bias. Therefore, win % takes 50% of the weight, and experience and star power take 25% each.
To compare the current teams with the past champions, we take the difference between each team’s value and the past champions’ typical value for each variable. A team that falls short is penalized by the full shortfall, while a team that exceeds the typical value gets only partial credit (half of the surplus), since having too many wins, too much star power, or too much experience (due to age) may not necessarily be a good thing.
To weigh these variables properly, we can use the median/mean values to put them on a common scale. If we take the median experience (31) as the baseline, we must rescale the median star power (19) and the mean win percentage (0.733) so that they are equivalent to 31. To do so, we divide the baseline value by the value we want to scale and multiply the result by each team’s unscaled star power or win percentage. We must also multiply win percentage by 2 afterwards, since we weigh it more heavily than the other two variables.
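As a quick check of the rescaling arithmetic, the sketch below computes the two scale factors from the baselines reported above. The variable names are hypothetical and chosen only for this illustration.

```r
# Baselines reported above for the 30 past champions
baseline_experience <- 31         # median(experience)
baseline_star_power <- 19         # median(Star_power)
baseline_win_pct    <- 0.7333667  # mean(Win_Percentage)

# Factors that put star power and win % on the experience scale
star_scale <- baseline_experience / baseline_star_power  # about 1.63
win_scale  <- baseline_experience / baseline_win_pct     # about 42.3

# A team sitting exactly at each baseline maps to 31 on every scale;
# win % is then doubled so that it carries half of the total weight.
c(star = baseline_star_power * star_scale,
  win = baseline_win_pct * win_scale,
  win_weighted = 2 * baseline_win_pct * win_scale)
```

With these factors, one point of star power is worth about 1.63 points of experience, and one percentage point of win % (0.01) is worth about 0.42 before doubling.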
# Credit score for one team.
# `team` is c(score efficiency, win percentage, experience, star power).
total_credit <- function(team) {
  # Differences are taken from the past champions' means; a surplus counts only half.
  win_diff <- team[2] - mean(Win_Percentage)
  if (win_diff > 0) {
    win_diff <- win_diff / 2
  }
  exp_diff <- team[3] - mean(experience)
  if (exp_diff > 0) {
    exp_diff <- exp_diff / 2
  }
  star_diff <- team[4] - mean(Star_power)
  if (star_diff > 0) {
    star_diff <- star_diff / 2
  }
  # Rescale win % and star power to the experience baseline (median 31);
  # win % is doubled so that it carries 50% of the weight.
  win_diff * median(experience) / mean(Win_Percentage) * 2 +
    exp_diff +
    star_diff * median(experience) / median(Star_power)
}
Let’s see how this works for each team:
total_credit(Timberwolves)
## [1] -64.17228
total_credit(Thunder)
## [1] -72.20128
total_credit(Clippers)
## [1] -17.69537
total_credit(Nuggets)
## [1] -12.47437
total_credit(Suns)
## [1] -16.12416
total_credit(Pelicans)
## [1] -78.38731
total_credit(Celtics)
## [1] -27.40826
total_credit(Cavaliers)
## [1] -69.70853
total_credit(Bucks)
## [1] -14.78693
total_credit(Knicks)
## [1] -70.38731
total_credit(Sixers)
## [1] -35.87384
total_credit(Pacers)
## [1] -83.27623
It makes sense that the results are all negative, since only one of these teams will win the championship. Because these values are differences between the current teams’ values and the championship teams’ values, the higher the output, the better the team’s chances of winning the championship. Next, we add 100 to each of these values so that every score is positive and can be turned into a share of the total.
100 + total_credit(Timberwolves)
## [1] 35.82772
100 + total_credit(Thunder)
## [1] 27.79872
100 + total_credit(Clippers)
## [1] 82.30463
100 + total_credit(Nuggets)
## [1] 87.52563
100 + total_credit(Suns)
## [1] 83.87584
100 + total_credit(Pelicans)
## [1] 21.61269
100 + total_credit(Celtics)
## [1] 72.59174
100 + total_credit(Cavaliers)
## [1] 30.29147
100 + total_credit(Bucks)
## [1] 85.21307
100 + total_credit(Knicks)
## [1] 29.61269
100 + total_credit(Sixers)
## [1] 64.12616
100 + total_credit(Pacers)
## [1] 16.72377
Now, we can divide each of these numbers by the total output of all teams.
# Stored under its own name so that the total_credit() function is not overwritten
total_output <- 100 + total_credit(Timberwolves) + 100 + total_credit(Thunder) + 100 + total_credit(Clippers) + 100 + total_credit(Nuggets) + 100 + total_credit(Suns) + 100 + total_credit(Pelicans) + 100 + total_credit(Celtics) + 100 + total_credit(Cavaliers) + 100 + total_credit(Bucks) + 100 + total_credit(Knicks) + 100 + total_credit(Sixers) + 100 + total_credit(Pacers)
Contenders <- c("Timberwolves", "Thunder", "Clippers", "Nuggets", "Suns", "Pelicans", "Celtics", "Cavaliers", "Bucks", "Knicks", "Sixers", "Pacers")
Chances <- c(35.82772/637.5041,27.79872/637.5041, 82.30463/637.5041, 87.52563/637.5041, 83.87584/637.5041, 21.61269/637.5041, 72.59174/637.5041, 30.29147/637.5041, 85.21307/637.5041, 29.61269/637.5041, 64.12616/637.5041, 16.72377/637.5041)
ContendersChances <- data.frame(Contenders = Contenders, Chances = Chances)
NewContendersChances <- ContendersChances[order(-ContendersChances$Chances), ]
NewContendersChances
## Contenders Chances
## 4 Nuggets 0.13729422
## 9 Bucks 0.13366670
## 5 Suns 0.13156910
## 3 Clippers 0.12910447
## 7 Celtics 0.11386866
## 11 Sixers 0.10058941
## 1 Timberwolves 0.05619998
## 8 Cavaliers 0.04751573
## 10 Knicks 0.04645098
## 2 Thunder 0.04360555
## 6 Pelicans 0.03390204
## 12 Pacers 0.02623320
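The same proportions can be obtained without typing each division by hand. This is a minimal sketch that starts from the shifted credit scores printed above; the vector name `credit` is an assumption made for the example.

```r
# Shifted credit scores (100 + total_credit) as printed above
credit <- c(Timberwolves = 35.82772, Thunder = 27.79872, Clippers = 82.30463,
            Nuggets = 87.52563, Suns = 83.87584, Pelicans = 21.61269,
            Celtics = 72.59174, Cavaliers = 30.29147, Bucks = 85.21307,
            Knicks = 29.61269, Sixers = 64.12616, Pacers = 16.72377)

chances <- credit / sum(credit)              # each team's share of the total
chances <- sort(chances, decreasing = TRUE)  # best chance first
round(chances, 3)
```

Dividing by `sum(credit)` guarantees that the shares add up to 1, even if the scores are updated at the end of the regular season.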
What we did was first collect data on the past 30 NBA champions that could potentially determine the outcome of an NBA championship. We took the mean/median of these values so that we could compare them with the current NBA teams. Variables whose values were too similar to those of an average team, or that seemed intuitively useless, were removed from consideration; these ended up being opponent’s field goal percentage and score efficiency (assists divided by field goals made). The variables that proved useful were win percentage, experience, and star power. These variables were measured using a systematic and consistent algorithm but were still subjective by construction.
We then took the mean or median of each of these three predictors across all of the NBA champions, depending on whether there were outliers. We then collected the same data for the current top 6 teams of each conference, recorded during the All-Star break.
We gave each team a “credit” value by measuring how close each of its predictive variables was to the mean/median of the previous NBA champions, with partial additional credit if a value was higher than the past mean/median. We then weighted these three values, giving a higher weight to win percentage and lower weights to experience and star power, since the latter variables contained some outliers.
Finally, we expressed the output as a proportion by adding up all the teams’ credit values and dividing each team’s credit by the total.
Based on our final model, which uses win percentage, experience, and star power as predictors, the Nuggets have the highest chance of winning the NBA championship at 13.7%, followed by the Bucks at 13.4% and the Suns at 13.2%.
There are many flaws and caveats in how I completed this project, but I will try to address the most important ones.
First, there is a good chance we missed other predictors that would have been more useful for our model than the ones we used. My original goal for the project was to use more intangible factors to predict the next NBA champion, but that turned out to be harder than I thought because of time constraints and the difficulty of finding similar statistical patterns among the NBA championship teams.
We also did not make use of time within our dataset, and gave the same weight to a team’s statistics from 30 years ago as to those from one year ago. Although we did plot some of the variables and removed them if they were changing over time, our model would have been more accurate if we had given more weight to teams that won the championship more recently, and if we had assessed the trends and used differentiation to predict the outcome instead of just taking the mean or median of the variables.
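A recency-weighted average is one simple way this could have been done. The sketch below is not part of the model above: the linear weights (1 for 1994 up to 30 for 2023) are an arbitrary assumption, and the win percentages are copied from the table so that the example runs on its own.

```r
# Win percentages of the 30 champions (1994-2023), copied from the table above
Year <- 1994:2023
Win_Percentage <- c(.707, .573, .878, .841, .756, .740, .817, .683, .707, .732,
                    .659, .720, .634, .707, .805, .793, .695, .695, .697, .805,
                    .756, .817, .890, .817, .707, .707, .732, .639, .646, .646)

# Assumption: weights grow linearly from 1 (1994) to 30 (2023)
recency_weights <- Year - min(Year) + 1

plain_mean <- mean(Win_Percentage)
recent_mean <- weighted.mean(Win_Percentage, w = recency_weights)
c(plain = plain_mean, recency_weighted = recent_mean)
```

Other weighting schemes, such as exponential decay, would give different baselines, so the choice of weights would be another subjective part of the model.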
Finally, we only considered the top 6 teams of each conference, when ideally we could have included more or fewer. Although history shows that only teams that finish in the top 3 of their conference have a realistic shot at winning the championship, our model should have included all playoff teams for completeness.
There are likely other errors and flaws in this project that went unnoticed as well. Additionally, there was a lot of subjectivity in creating the algorithms for the weights of experience, star power, etc. Realistically, because the game of basketball involves more than just statistics, it is very difficult to predict the future NBA champion mathematically without some bias and subjectivity. Even so, we were able to create a statistical model that includes more intangible variables that are difficult to quantify, such as star power and experience, and that distributes the chance of winning among the contending NBA teams.
ChatGPT
https://www.basketball-reference.com/
https://basketball.realgm.com/nba/teams/New-York-Knicks/20/Playoff-History