Part One: Introduction

In this project, I use statistical analysis and a simple prediction model to identify the variables that contribute most to an NBA team winning the championship. I use both conventional variables that most people would agree affect a team’s title chances, such as regular season record and offensive efficiency, and more intangible variables that I believe have a subtler impact. These intangible variables may include how similar a team’s roster is to the previous year’s, or a mental hangover from winning the championship the year before. These variables are what distinguish this project from other algorithms and models that try to predict the NBA champion. My motivation came from gradually realizing that many intangible variables can have a large impact on a team’s chances of winning a championship.

Part Two: Research and Preparation of the data frame

To collect data, we look at the last 30 NBA champions (1994 through 2023):

Teams <- c("Rockets", "Rockets", "Bulls", "Bulls", "Bulls", "Spurs", "Lakers", "Lakers", "Lakers", "Spurs", "Pistons", "Spurs", "Heat", "Spurs", "Celtics", "Lakers", "Lakers", "Mavericks", "Heat", "Heat", "Spurs", "Warriors", "Cavaliers", "Warriors", "Warriors", "Raptors", "Lakers", "Bucks", "Warriors", "Nuggets")

Let’s also create a time variable called Year, running from 1994 to 2023.

Year <- seq(1994, 2023)

Let’s combine these two variables into a data frame:

main_data <- data.frame(
  Year = Year,
  Teams = Teams
)

Before we start adding predictors, let’s quickly see which NBA teams have won the most championships in the last 30 years:

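unique_occurence <- unique(Teams)
color_palette <- rainbow(length(unique_occurence))
color_mapping <- setNames(color_palette, unique_occurence)

# Sort the title counts once, then look up colors by bar name so that each
# bar gets its own team's color (indexing by Teams would follow the
# chronological order of champions instead of the sorted bars).
title_counts <- sort(table(Teams), decreasing = TRUE)
barplot(title_counts, cex.names = 0.5, col = color_mapping[names(title_counts)], main = "NBA Championships in the Last 30 Years")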

The Lakers (6 titles), Spurs (5), and Warriors (4) have had the most success over the last 30 years.

Now we add variables that could influence a team’s chances of winning an NBA championship. Remember that our goal is to capture factors that are not purely analytical. From my observations, one big determinant is playoff experience. It is a useful determinant because it indicates how old the roster is (assuming little roster change) and how persistent the team has been (assuming it did not win the championship).

For each team, we look at how far it advanced in the playoffs in each of the previous four years. Reaching the 1st round counts for 1 pt, and each additional round reached doubles the value: the 2nd round gives 2 pts, the 3rd round gives 4 pts, and the finals give 8 pts. Winning the finals adds nothing further, because the team that loses in the finals has still gained the maximum amount of experience that year. We then give more weight to recent years: each year’s points are multiplied by (5 - years past), so last year’s points are multiplied by 4 and the oldest year’s by 1, and the four weighted values are summed.
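The scoring rule above can be sketched as a small helper. This is only an illustration: the function name `experience_score` and its input encoding (0 for missing the playoffs, 1 to 4 for the furthest round reached, most recent year first) are assumptions of this sketch, not part of the original analysis, which enters the sums by hand.

```r
# Hypothetical helper: weighted playoff experience over the previous four years.
# rounds_reached: furthest round reached in each of the last four seasons,
# most recent season first (0 = missed playoffs, 1 = 1st round, ..., 4 = finals).
experience_score <- function(rounds_reached) {
  stopifnot(length(rounds_reached) == 4, all(rounds_reached %in% 0:4))
  round_points <- c(0, 1, 2, 4, 8)   # points for rounds 0..4
  year_weights <- 4:1                # (5 - years past)
  sum(year_weights * round_points[rounds_reached + 1])
}

# A made-up team: finals last year, 2nd round the year before, then two misses.
experience_score(c(4, 2, 0, 0))
```

For the made-up team this works out to 4*8 + 3*2, the same style of sum used in the experience vector.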

experience <- c(4*2+2+1, 4*8+3*2+1, 4*2+3*2+2*8+8, 4*8+3*2+2*2+8, 4*8+3*8+2*2+2, 4*2 + 2*2+2, 4*2 + 3*4+2*2+1, 4*8+3*2+2*4+2, 4*8+3*8+2*2+4, 4*2+3*4+2+8, 4*4+3*2+1, 4*2+3*4+2+8, 4*4+3*2, 4*2+3*8+2*2+8, 2+1, 4*8+3+2, 4*8+3*8+2+1, 4+3*2+2+1, 4*8+3+2, 4*8+3*8+2+1, 4*8+3*4+2+2, 4+3*2, 4*8, 4*8+3*8+2+2, 4*8+3*8+2*8+1, 4*2+3*2+2*4+1, 0, 4*2+3*4+2+1, 2*8+8, 4+3*2+2*4+2)
boxplot(experience)

main_data <- data.frame(Year = Year, Teams = Teams, Experience = experience)
main_data
##    Year     Teams Experience
## 1  1994   Rockets         11
## 2  1995   Rockets         39
## 3  1996     Bulls         38
## 4  1997     Bulls         50
## 5  1998     Bulls         62
## 6  1999     Spurs         14
## 7  2000    Lakers         25
## 8  2001    Lakers         48
## 9  2002    Lakers         64
## 10 2003     Spurs         30
## 11 2004   Pistons         23
## 12 2005     Spurs         30
## 13 2006      Heat         22
## 14 2007     Spurs         44
## 15 2008   Celtics          3
## 16 2009    Lakers         37
## 17 2010    Lakers         59
## 18 2011 Mavericks         13
## 19 2012      Heat         37
## 20 2013      Heat         59
## 21 2014     Spurs         48
## 22 2015  Warriors         10
## 23 2016 Cavaliers         32
## 24 2017  Warriors         60
## 25 2018  Warriors         73
## 26 2019   Raptors         23
## 27 2020    Lakers          0
## 28 2021     Bucks         23
## 29 2022  Warriors         24
## 30 2023   Nuggets         20

The median is a more robust summary than the mean in this case, since the scores are spread widely (from 0 to 73) and a few very high values pull the mean upward:

median(experience)
## [1] 31
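As a small illustration of why the median is preferred when a few values are extreme, here is a sketch on made-up numbers (the vector `scores` is hypothetical and not taken from the champions data). It also shows `boxplot.stats()`, which lists the points a boxplot would draw as outliers.

```r
# Made-up scores, not taken from the champions data.
scores <- c(10, 12, 11, 13, 95)

mean(scores)    # pulled upward by the single large value
median(scores)  # stays near the bulk of the data

# boxplot.stats() lists points beyond 1.5 times the interquartile range.
boxplot.stats(scores)$out
```

Running `boxplot.stats(experience)$out` on the real data is one way to check which champions, if any, count as outliers under that rule.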

Next, let’s also consider each team’s regular season performance in its championship year. Because some seasons were shortened, we use win % rather than total wins.

Win_Percentage <- c(.707, .573, .878, .841, .756, .740, .817, .683, .707, .732, .659, .720, .634, .707, .805, .793, .695, .695, .697, .805, .756, .817, .890, .817, .707, .707, .732, .639, .646, .646)
main_data <- data.frame(Year = Year, Teams = Teams, Experience = experience, Win_Percentage = Win_Percentage)
main_data
##    Year     Teams Experience Win_Percentage
## 1  1994   Rockets         11          0.707
## 2  1995   Rockets         39          0.573
## 3  1996     Bulls         38          0.878
## 4  1997     Bulls         50          0.841
## 5  1998     Bulls         62          0.756
## 6  1999     Spurs         14          0.740
## 7  2000    Lakers         25          0.817
## 8  2001    Lakers         48          0.683
## 9  2002    Lakers         64          0.707
## 10 2003     Spurs         30          0.732
## 11 2004   Pistons         23          0.659
## 12 2005     Spurs         30          0.720
## 13 2006      Heat         22          0.634
## 14 2007     Spurs         44          0.707
## 15 2008   Celtics          3          0.805
## 16 2009    Lakers         37          0.793
## 17 2010    Lakers         59          0.695
## 18 2011 Mavericks         13          0.695
## 19 2012      Heat         37          0.697
## 20 2013      Heat         59          0.805
## 21 2014     Spurs         48          0.756
## 22 2015  Warriors         10          0.817
## 23 2016 Cavaliers         32          0.890
## 24 2017  Warriors         60          0.817
## 25 2018  Warriors         73          0.707
## 26 2019   Raptors         23          0.707
## 27 2020    Lakers          0          0.732
## 28 2021     Bucks         23          0.639
## 29 2022  Warriors         24          0.646
## 30 2023   Nuggets         20          0.646

Now that we’ve added a couple of factors to our data set, let’s see whether experience and win percentage have mattered more, less, or about the same over time.

experience_over_time <- lm(experience ~ Year)
Win_Percentage_over_time <- lm(Win_Percentage ~ Year)
plot(Year, experience, main = "Experience over Time")
abline(experience_over_time)

plot(Year, Win_Percentage, main = "Win % over Time")
abline(Win_Percentage_over_time)

Both experience and win % still seem to matter, but there is a slight decline over time. We will not adjust experience and win % for time, since it does not make intuitive sense that teams with less experience and a lower win % would become more successful over time.
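To put a number on a trend like the ones plotted above, one option is to read the slope and its p-value from the fitted model. The helper name `trend_summary` and the simulated data below are assumptions made for this sketch; they are not part of the original analysis.

```r
# Hypothetical helper: slope and p-value of a simple linear trend of y on x.
trend_summary <- function(y, x) {
  fit <- lm(y ~ x)
  coefs <- summary(fit)$coefficients
  c(slope = coefs["x", "Estimate"], p_value = coefs["x", "Pr(>|t|)"])
}

# Made-up example with a built-in downward trend plus some noise.
set.seed(1)
x <- 1:30
y <- 50 - 0.5 * x + rnorm(30, sd = 5)
trend_summary(y, x)
```

With the data in this report, `trend_summary(experience, Year)` and `trend_summary(Win_Percentage, Year)` would give the corresponding numbers for the two plots.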

Now to add more predictors. Star power also seems to be a big determinant of championship outcomes. The easiest way to measure this objectively is to check whether any players on the championship team placed highly in MVP voting earlier in their careers. How recently the player won or came close to winning MVP does not matter, since they have already validated themselves as a star player at that point. Furthermore, it is harder for star players to win MVP when they join a team that already has star players. However, the player must have been a top 3 scorer on their team, since former MVP candidates may no longer contribute as much. Finishing 4th in the MVP race counts for 1 pt, and the value doubles for each place higher (2 pts for 3rd, 4 pts for 2nd, 8 pts for 1st). MVP finishes in the championship year itself are not counted, since MVP results are announced after the playoffs start and our goal is to predict the champion before the playoffs begin.
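The point scheme for MVP finishes can be written as a short function. This is a sketch only: the name `mvp_points` and the example finishes are hypothetical, and the original analysis adds the points by hand.

```r
# Hypothetical helper: points for a vector of past MVP finishes (1 = won MVP).
# 1st = 8, 2nd = 4, 3rd = 2, 4th = 1, anything lower = 0.
mvp_points <- function(finishes) {
  ifelse(finishes >= 1 & finishes <= 4, 2^(4 - finishes), 0)
}

# A made-up player with finishes of 1st, 3rd, 4th and 7th in earlier seasons.
sum(mvp_points(c(1, 3, 4, 7)))
```

A team’s star power is then the sum of these points over its qualifying players.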

Star_players <- c("Hakeem Olajuwwon", "Hakeem Olajuwwon", "Michael Jordan", "Michael Jordan", "Michael Jordan", "Tim Duncan and David Robinson", "Shaq O'Neal", "Shaq O'Neal", "Shaq O'Neal", "Tim Duncan", "None", "Tim Duncan", "Shaq O'Neal", "Tim Duncan", "Kevin Garnett", "Kobe Bryant", "Kobe Bryant", "Dirk Nowitzki", "Lebron James and Dwayne Wade", "Lebron James and Dwayne Wade", "Tim Duncan", "None","Lebron James", "Stephen Curry and Kevin Durant", "Stephen Curry and Kevin Durant", "Kawhi Leonard", "Lebron James and Anthony Davis", "Giannis Antetokounmpo", "Stephen Curry", "Nikola Jokic")
Star_power <- c(4+1, 8+4+1, 2+8+8+2+4+8+4, 2+8+8+2+4+8+4+8, 2+8+8+2+4+8+4+8+4, 1+4+8+4+2+2, 1+4+1, 1+4+1+8, 1+4+1+8+2, 8+4+2, 0, 8+4+2+8+4, 1+4+1+8+2+4, 8+4+2+8+4+1+4, 4+4+8, 2+1+2+8, 2+1+2+8+4, 8+2+2, 2+8+8+1+4+2, 2+8+8+1+4+2+8, 8+4+2+8+4+1+4+2, 0, 2+8+8+1+4+4+2+2, 8+8+8+4+4+4, 8+8+8+4+4+4, 2+4, 2+8+8+1+4+4+2+2+1+4+2, 1+8+8, 8+8+2, 8+8+4)
main_data <- data.frame(Year = Year, Teams = Teams, Experience = experience, Win_Percentage = Win_Percentage, Star_Players = Star_players, Star_power = Star_power)
main_data
##    Year     Teams Experience Win_Percentage                   Star_Players
## 1  1994   Rockets         11          0.707               Hakeem Olajuwwon
## 2  1995   Rockets         39          0.573               Hakeem Olajuwwon
## 3  1996     Bulls         38          0.878                 Michael Jordan
## 4  1997     Bulls         50          0.841                 Michael Jordan
## 5  1998     Bulls         62          0.756                 Michael Jordan
## 6  1999     Spurs         14          0.740  Tim Duncan and David Robinson
## 7  2000    Lakers         25          0.817                    Shaq O'Neal
## 8  2001    Lakers         48          0.683                    Shaq O'Neal
## 9  2002    Lakers         64          0.707                    Shaq O'Neal
## 10 2003     Spurs         30          0.732                     Tim Duncan
## 11 2004   Pistons         23          0.659                           None
## 12 2005     Spurs         30          0.720                     Tim Duncan
## 13 2006      Heat         22          0.634                    Shaq O'Neal
## 14 2007     Spurs         44          0.707                     Tim Duncan
## 15 2008   Celtics          3          0.805                  Kevin Garnett
## 16 2009    Lakers         37          0.793                    Kobe Bryant
## 17 2010    Lakers         59          0.695                    Kobe Bryant
## 18 2011 Mavericks         13          0.695                  Dirk Nowitzki
## 19 2012      Heat         37          0.697   Lebron James and Dwayne Wade
## 20 2013      Heat         59          0.805   Lebron James and Dwayne Wade
## 21 2014     Spurs         48          0.756                     Tim Duncan
## 22 2015  Warriors         10          0.817                           None
## 23 2016 Cavaliers         32          0.890                   Lebron James
## 24 2017  Warriors         60          0.817 Stephen Curry and Kevin Durant
## 25 2018  Warriors         73          0.707 Stephen Curry and Kevin Durant
## 26 2019   Raptors         23          0.707                  Kawhi Leonard
## 27 2020    Lakers          0          0.732 Lebron James and Anthony Davis
## 28 2021     Bucks         23          0.639          Giannis Antetokounmpo
## 29 2022  Warriors         24          0.646                  Stephen Curry
## 30 2023   Nuggets         20          0.646                   Nikola Jokic
##    Star_power
## 1           5
## 2          13
## 3          36
## 4          44
## 5          48
## 6          21
## 7           6
## 8          14
## 9          16
## 10         14
## 11          0
## 12         26
## 13         20
## 14         31
## 15         16
## 16         13
## 17         17
## 18         12
## 19         25
## 20         33
## 21         33
## 22          0
## 23         31
## 24         36
## 25         36
## 26          6
## 27         38
## 28         17
## 29         18
## 30         20

Now, let’s add another less analytical predictor that measures a team’s style of play: the share of a team’s made shots that come from assists. We compute it as assists divided by field goals made. This is an attempt to measure how efficient the team is offensively.

score_efficiency <- c(25.5/39.0, 25.1/38.5, 24.8/40.2, 26.1/40.0, 23.8/37.4, 22.0/34.8, 23.4/38.3, 23.0/37.9, 23.0/38.4, 19.8/35.5, 20.8/33.5, 21.6/35.6, 20.6/37.1, 22.1/36.6, 22.4/36.4, 23.3/40.3, 21.1/38.3, 23.8/38.7, 20.0/37.1, 23.0/38.4, 25.2/40.6, 27.4/41.6, 22.7/38.7, 30.4/43.1, 29.3/42.8, 25.4/42.2, 25.4/42.3, 25.5/44.7, 27.1/40.5, 28.9/43.6)

main_data <- data.frame(Year = Year, Teams = Teams, Experience = experience, Win_Percentage = Win_Percentage, Score_efficiency = score_efficiency, Star_Players = Star_players, Star_power = Star_power)
main_data
##    Year     Teams Experience Win_Percentage Score_efficiency
## 1  1994   Rockets         11          0.707        0.6538462
## 2  1995   Rockets         39          0.573        0.6519481
## 3  1996     Bulls         38          0.878        0.6169154
## 4  1997     Bulls         50          0.841        0.6525000
## 5  1998     Bulls         62          0.756        0.6363636
## 6  1999     Spurs         14          0.740        0.6321839
## 7  2000    Lakers         25          0.817        0.6109661
## 8  2001    Lakers         48          0.683        0.6068602
## 9  2002    Lakers         64          0.707        0.5989583
## 10 2003     Spurs         30          0.732        0.5577465
## 11 2004   Pistons         23          0.659        0.6208955
## 12 2005     Spurs         30          0.720        0.6067416
## 13 2006      Heat         22          0.634        0.5552561
## 14 2007     Spurs         44          0.707        0.6038251
## 15 2008   Celtics          3          0.805        0.6153846
## 16 2009    Lakers         37          0.793        0.5781638
## 17 2010    Lakers         59          0.695        0.5509138
## 18 2011 Mavericks         13          0.695        0.6149871
## 19 2012      Heat         37          0.697        0.5390836
## 20 2013      Heat         59          0.805        0.5989583
## 21 2014     Spurs         48          0.756        0.6206897
## 22 2015  Warriors         10          0.817        0.6586538
## 23 2016 Cavaliers         32          0.890        0.5865633
## 24 2017  Warriors         60          0.817        0.7053364
## 25 2018  Warriors         73          0.707        0.6845794
## 26 2019   Raptors         23          0.707        0.6018957
## 27 2020    Lakers          0          0.732        0.6004728
## 28 2021     Bucks         23          0.639        0.5704698
## 29 2022  Warriors         24          0.646        0.6691358
## 30 2023   Nuggets         20          0.646        0.6628440
##                      Star_Players Star_power
## 1                Hakeem Olajuwwon          5
## 2                Hakeem Olajuwwon         13
## 3                  Michael Jordan         36
## 4                  Michael Jordan         44
## 5                  Michael Jordan         48
## 6   Tim Duncan and David Robinson         21
## 7                     Shaq O'Neal          6
## 8                     Shaq O'Neal         14
## 9                     Shaq O'Neal         16
## 10                     Tim Duncan         14
## 11                           None          0
## 12                     Tim Duncan         26
## 13                    Shaq O'Neal         20
## 14                     Tim Duncan         31
## 15                  Kevin Garnett         16
## 16                    Kobe Bryant         13
## 17                    Kobe Bryant         17
## 18                  Dirk Nowitzki         12
## 19   Lebron James and Dwayne Wade         25
## 20   Lebron James and Dwayne Wade         33
## 21                     Tim Duncan         33
## 22                           None          0
## 23                   Lebron James         31
## 24 Stephen Curry and Kevin Durant         36
## 25 Stephen Curry and Kevin Durant         36
## 26                  Kawhi Leonard          6
## 27 Lebron James and Anthony Davis         38
## 28          Giannis Antetokounmpo         17
## 29                  Stephen Curry         18
## 30                   Nikola Jokic         20
mean(score_efficiency)
## [1] 0.615438

On average, 61.5% of the field goals made by NBA champions were assisted. We will have to compare this with the current teams’ values to see whether this measurement is a useful determinant.

Before we start weighing these predictors, let’s add one more predictor: defense. Since teams play at different paces, points or field goals allowed would not be an accurate measurement. Again, we measure with a percentage: opponent’s field goal percentage.

defense <- c(.440, .453, .448, .436, .431, .402, .425, .438, .424, .427, .413, .426, .440, .443, .419, .447, .446, .450, .434, .440, .444, .428, .435, .435, .447, .449, .448, .456, .438, .478)
main_data <- data.frame(Year = Year, Teams = Teams, Experience = experience, Win_Percentage = Win_Percentage, Score_efficiency = score_efficiency, Star_Players = Star_players, Star_power = Star_power, Defense = defense)
main_data
##    Year     Teams Experience Win_Percentage Score_efficiency
## 1  1994   Rockets         11          0.707        0.6538462
## 2  1995   Rockets         39          0.573        0.6519481
## 3  1996     Bulls         38          0.878        0.6169154
## 4  1997     Bulls         50          0.841        0.6525000
## 5  1998     Bulls         62          0.756        0.6363636
## 6  1999     Spurs         14          0.740        0.6321839
## 7  2000    Lakers         25          0.817        0.6109661
## 8  2001    Lakers         48          0.683        0.6068602
## 9  2002    Lakers         64          0.707        0.5989583
## 10 2003     Spurs         30          0.732        0.5577465
## 11 2004   Pistons         23          0.659        0.6208955
## 12 2005     Spurs         30          0.720        0.6067416
## 13 2006      Heat         22          0.634        0.5552561
## 14 2007     Spurs         44          0.707        0.6038251
## 15 2008   Celtics          3          0.805        0.6153846
## 16 2009    Lakers         37          0.793        0.5781638
## 17 2010    Lakers         59          0.695        0.5509138
## 18 2011 Mavericks         13          0.695        0.6149871
## 19 2012      Heat         37          0.697        0.5390836
## 20 2013      Heat         59          0.805        0.5989583
## 21 2014     Spurs         48          0.756        0.6206897
## 22 2015  Warriors         10          0.817        0.6586538
## 23 2016 Cavaliers         32          0.890        0.5865633
## 24 2017  Warriors         60          0.817        0.7053364
## 25 2018  Warriors         73          0.707        0.6845794
## 26 2019   Raptors         23          0.707        0.6018957
## 27 2020    Lakers          0          0.732        0.6004728
## 28 2021     Bucks         23          0.639        0.5704698
## 29 2022  Warriors         24          0.646        0.6691358
## 30 2023   Nuggets         20          0.646        0.6628440
##                      Star_Players Star_power Defense
## 1                Hakeem Olajuwwon          5   0.440
## 2                Hakeem Olajuwwon         13   0.453
## 3                  Michael Jordan         36   0.448
## 4                  Michael Jordan         44   0.436
## 5                  Michael Jordan         48   0.431
## 6   Tim Duncan and David Robinson         21   0.402
## 7                     Shaq O'Neal          6   0.425
## 8                     Shaq O'Neal         14   0.438
## 9                     Shaq O'Neal         16   0.424
## 10                     Tim Duncan         14   0.427
## 11                           None          0   0.413
## 12                     Tim Duncan         26   0.426
## 13                    Shaq O'Neal         20   0.440
## 14                     Tim Duncan         31   0.443
## 15                  Kevin Garnett         16   0.419
## 16                    Kobe Bryant         13   0.447
## 17                    Kobe Bryant         17   0.446
## 18                  Dirk Nowitzki         12   0.450
## 19   Lebron James and Dwayne Wade         25   0.434
## 20   Lebron James and Dwayne Wade         33   0.440
## 21                     Tim Duncan         33   0.444
## 22                           None          0   0.428
## 23                   Lebron James         31   0.435
## 24 Stephen Curry and Kevin Durant         36   0.435
## 25 Stephen Curry and Kevin Durant         36   0.447
## 26                  Kawhi Leonard          6   0.449
## 27 Lebron James and Anthony Davis         38   0.448
## 28          Giannis Antetokounmpo         17   0.456
## 29                  Stephen Curry         18   0.438
## 30                   Nikola Jokic         20   0.478

Apart from the 1995 Rockets (.453) and two recent champions, the 2021 Milwaukee Bucks (.456) and the 2023 Denver Nuggets (.478), every champion held opponents to a field goal percentage of .450 or less. Perhaps defense has not mattered as much in recent years.

defense_over_time <- lm(defense ~ Year)
plot(Year, defense, main = "Opponent's FG% over Time")
abline(defense_over_time)

There is a clear increase in opponent’s FG% allowed over time. Intuitively, it does not make sense for championship-winning teams’ defenses to get worse over time. Thus, we will leave this predictor out of our model.

Let’s see what we have so far. We will use the median for star power and experience, since both contain some extreme values. However, we cannot simply ignore these “outliers”, since similar cases could occur again. Our solution is to weigh these two factors a little less than the others.

median(experience)
## [1] 31
median(Star_power)
## [1] 19

Next, we have win percentage and score efficiency. Because these two variables show no comparable extreme values, we can use the mean to summarize them.

mean(score_efficiency)
## [1] 0.615438
mean(Win_Percentage)
## [1] 0.7333667

Now, let’s calculate the necessary statistics for some NBA teams in the 2023-2024 season. These values were recorded during the NBA’s midseason break weekend, so the data will have to be updated when the regular season ends. For the sake of efficiency, we only consider the top 6 teams of each conference for now; when the regular season ends, we will consider all playoff teams. For each team, the first element is score efficiency, then win percentage, then experience, then star power.

Timberwolves <- c(26.4/41.4, .709, 4*1+3*1, 0)
Thunder <- c(27.3/44.5, .685, 1, 0)
Clippers <- c(26.0/43.0, .679, 4*1+2*4+1*2, 2+2+4+2+4+8+4+4)
Nuggets <- c(28.6/43.4, .655, 4*4+3+2*4+2, 4+8+8+1)
Suns <- c(26.8/42.7, .600, 4*2+3*2+2*8, 4+4+4+8+1)
Pelicans <- c(27.1/42.9, .600, 2, 0)
Celtics <- c(26.2/43.5, .782, 4*3+3*8+2+4, 1)
Cavaliers <- c(27.3/42.8, .679, 4,0)
Bucks <- c(26.7/44.1, .625, 4+3*2+8*2+2, 8+8+1+2+2+1)
Knicks <- c(23.7/41.7, .600, 4*2+2, 0)
Sixers <- c(24.9/42.5, .593, 4*2+3*2+2*2+1, 4+4+8)
Pacers <- c(30.9/46.8, .554, 1, 0)
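Because each team is stored as a bare vector, the meaning of each position has to be remembered. One possible alternative, sketched here with a hypothetical `make_team` constructor that is not part of the original code, is to name the elements while keeping the same order, so positional indexing still works.

```r
# Hypothetical constructor: same four values, in the same order, but named.
make_team <- function(assists, field_goals, win_pct, experience, star_power) {
  c(score_efficiency = assists / field_goals,
    win_pct = win_pct,
    experience = experience,
    star_power = star_power)
}

# The Timberwolves entry from above, rebuilt with names.
timberwolves_named <- make_team(26.4, 41.4, .709, 4*1 + 3*1, 0)
timberwolves_named[["experience"]]  # same value as timberwolves_named[3]
```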

When we calculated score efficiency, we were doubtful whether the predictor had any use. A good way to check is to calculate the mean score efficiency for the 2023-2024 NBA teams and compare it to that of the NBA champions.

current_score_efficiency <- c(Thunder[1], Timberwolves[1], Clippers[1], Nuggets[1], Suns[1], Pelicans[1], Celtics[1], Cavaliers[1], Bucks[1], Knicks[1], Sixers[1], Pacers[1])
mean(current_score_efficiency)
## [1] 0.6195178

The average score efficiency of the 2024 teams (0.620) is not much different from that of the NBA champions (0.615). Therefore, we will exclude this variable as well. We have only three variables left for our prediction model: win percentage, experience, and star power.

Part Three: Creating the Model

Now we need to figure out two things: how to weigh these variables in importance relative to one another, and how to compare the champions’ predictor values with the current values of the 2024 NBA teams.

For the weights, we will weigh experience and star power less. Many of the current teams have 0 experience or star power, and giving too much credit to the teams that do have them may bias the model. Therefore, win % takes 50% of the weight, and experience and star power take 25% each.

To compare the current teams with the past champions, we measure how far each team’s win percentage, experience, and star power fall from the champions’ typical values. A team below the typical value loses the full difference, while a team above it gets only partial credit (half of the surplus), since having too many wins, too much star power, or too much experience (due to age) may not necessarily be a good thing. Note that the function below takes these differences from the champions’ means, and uses the medians only for scaling.

To weigh these variables properly, we use the median/mean values to put them on a common scale. Taking the median experience (31) as the baseline, we rescale star power and win percentage so that the median star power (19) and the mean win percentage (0.733) each correspond to 31. To do so, we divide the baseline by the typical value of the variable we want to scale and multiply each team’s unscaled value by the result. We then multiply the win percentage term by 2, since we weigh it more heavily than the other two variables.
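The scaling factors described above can be worked out directly. The variable names in this sketch are illustrative, and the three summary values are copied from the output earlier in this report.

```r
# Summary values taken from the output earlier in this report.
baseline_experience <- 31        # median(experience)
median_star_power   <- 19        # median(Star_power)
mean_win_pct        <- 0.7333667 # mean(Win_Percentage)

# Factor that puts star power on the experience scale.
star_scale <- baseline_experience / median_star_power
# Factor for win percentage, doubled so that it carries half of the weight.
win_scale <- 2 * baseline_experience / mean_win_pct

round(c(star_scale = star_scale, win_scale = win_scale), 2)
```

At the champions’ typical values, the three scaled terms come to 62 for win percentage and 31 each for experience and star power, which is the 50/25/25 split.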

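total_credit <- function(team) {
  # team = c(score efficiency, win percentage, experience, star power)
  # Each difference is taken from the champions' mean; a surplus above the
  # mean only counts for half.
  win_diff <- team[2] - mean(Win_Percentage)
  if (win_diff > 0) {
    win_diff <- win_diff / 2
  }

  exp_diff <- team[3] - mean(experience)
  if (exp_diff > 0) {
    exp_diff <- exp_diff / 2
  }

  star_diff <- team[4] - mean(Star_power)
  if (star_diff > 0) {
    star_diff <- star_diff / 2
  }

  # Rescale win percentage and star power to the experience baseline
  # (median 31), and double the win percentage term so it carries half
  # of the total weight.
  win_diff * median(experience) / mean(Win_Percentage) * 2 + exp_diff + star_diff * median(experience) / median(Star_power)
}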

Let’s first see what this gives for each team:

total_credit(Timberwolves)
## [1] -64.17228
total_credit(Thunder)
## [1] -72.20128
total_credit(Clippers)
## [1] -17.69537
total_credit(Nuggets)
## [1] -12.47437
total_credit(Suns)
## [1] -16.12416
total_credit(Pelicans)
## [1] -78.38731
total_credit(Celtics)
## [1] -27.40826
total_credit(Cavaliers)
## [1] -69.70853
total_credit(Bucks)
## [1] -14.78693
total_credit(Knicks)
## [1] -70.38731
total_credit(Sixers)
## [1] -35.87384
total_credit(Pacers)
## [1] -83.27623
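As a check by hand, the Timberwolves value can be rebuilt term by term. The sketch below uses the champions’ summary values from earlier in this report (the variable names are illustrative), so it runs without the full data.

```r
# Champion summary values, taken from the data and output above.
mean_win_pct      <- 0.7333667  # mean(Win_Percentage)
mean_experience   <- 1021 / 30  # mean(experience): the 30 scores sum to 1021
mean_star_power   <- 21.5       # mean(Star_power): the 30 scores sum to 645
median_experience <- 31
median_star_power <- 19

# Timberwolves inputs: win % .709, experience 7, star power 0.
# All three differences are negative, so none of them is halved.
win_term  <- (0.709 - mean_win_pct) * median_experience / mean_win_pct * 2
exp_term  <- 7 - mean_experience
star_term <- (0 - mean_star_power) * median_experience / median_star_power

round(c(win = win_term, experience = exp_term, star = star_term,
        total = win_term + exp_term + star_term), 2)
```

The total agrees with the -64.17 printed for the Timberwolves, and the breakdown shows that most of the deficit comes from star power and experience rather than from win percentage.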

All of the results are negative, which makes sense: each value measures how far a current team falls short of the typical championship team, and only one of these teams will win the championship. The higher (less negative) the value, the better the team’s chances. To turn these values into proportions, we first add 100 to each of them, which makes every score positive without changing the order of the teams.

100 + total_credit(Timberwolves)
## [1] 35.82772
100 + total_credit(Thunder)
## [1] 27.79872
100 + total_credit(Clippers)
## [1] 82.30463
100 + total_credit(Nuggets)
## [1] 87.52563
100 + total_credit(Suns)
## [1] 83.87584
100 + total_credit(Pelicans)
## [1] 21.61269
100 + total_credit(Celtics)
## [1] 72.59174
100 + total_credit(Cavaliers)
## [1] 30.29147
100 + total_credit(Bucks)
## [1] 85.21307
100 + total_credit(Knicks)
## [1] 29.61269
100 + total_credit(Sixers)
## [1] 64.12616
100 + total_credit(Pacers)
## [1] 16.72377

Now, we can divide each of these numbers by the sum over all teams to get each team’s share.

# Store the sum under a new name so that the total_credit() function is not overwritten.
# This sum (637.5041) is the denominator used for Chances below.
credit_sum <- 100 + total_credit(Timberwolves) + 100 + total_credit(Thunder) + 100 + total_credit(Clippers) + 100 + total_credit(Nuggets) + 100 + total_credit(Suns) + 100 + total_credit(Pelicans) + 100 + total_credit(Celtics) + 100 + total_credit(Cavaliers) + 100 + total_credit(Bucks) + 100 + total_credit(Knicks) + 100 + total_credit(Sixers) + 100 + total_credit(Pacers)
Contenders <- c("Timberwolves", "Thunder", "Clippers", "Nuggets", "Suns", "Pelicans", "Celtics", "Cavaliers", "Bucks", "Knicks", "Sixers", "Pacers")
Chances <- c(35.82772/637.5041,27.79872/637.5041, 82.30463/637.5041, 87.52563/637.5041, 83.87584/637.5041, 21.61269/637.5041, 72.59174/637.5041, 30.29147/637.5041, 85.21307/637.5041, 29.61269/637.5041, 64.12616/637.5041, 16.72377/637.5041)
ContendersChances <- data.frame(Contenders = Contenders, Chances = Chances)
NewContendersChances <- ContendersChances[order(-ContendersChances$Chances), ]
NewContendersChances
##      Contenders    Chances
## 4       Nuggets 0.13729422
## 9         Bucks 0.13366670
## 5          Suns 0.13156910
## 3      Clippers 0.12910447
## 7       Celtics 0.11386866
## 11       Sixers 0.10058941
## 1  Timberwolves 0.05619998
## 8     Cavaliers 0.04751573
## 10       Knicks 0.04645098
## 2       Thunder 0.04360555
## 6      Pelicans 0.03390204
## 12       Pacers 0.02623320
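The same proportions can be computed without typing the total by hand. This sketch starts from the shifted credit values printed above; the names `shifted_credit` and `chances` are illustrative.

```r
# Shifted credit values (100 + total_credit), copied from the output above.
shifted_credit <- c(Timberwolves = 35.82772, Thunder = 27.79872, Clippers = 82.30463,
                    Nuggets = 87.52563, Suns = 83.87584, Pelicans = 21.61269,
                    Celtics = 72.59174, Cavaliers = 30.29147, Bucks = 85.21307,
                    Knicks = 29.61269, Sixers = 64.12616, Pacers = 16.72377)

# Each team's share of the total, sorted from most to least likely.
chances <- sort(shifted_credit / sum(shifted_credit), decreasing = TRUE)
round(chances, 3)
```

Dividing by `sum(shifted_credit)` guarantees that the shares add up to 1 even if the inputs are updated later.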

Part Four: Summary/Conclusion

We first collected data on the past 30 NBA champions that could potentially explain what makes a championship team. We took the mean/median of these values so that we could compare them to the current NBA teams. Variables whose values were too similar to those of an average team, or that seemed intuitively unhelpful, were removed from consideration; these ended up being opponent’s field goal percentage and score efficiency (assists divided by field goals made). The variables we kept were win percentage, experience, and star power. These variables were measured using a systematic and consistent algorithm but were still subjective by construction.

We then summarized these three predictive variables across all of the NBA champions using the mean or median, depending on whether there were outliers. Next, we collected the same data for the current top 6 teams of each conference, recorded during the midseason break.

We gave each team a “credit” value by measuring how close its predictive variables were to the typical values of the previous champions, with partial additional credit if a value was higher than the champions’ typical value. We then weighted the three values, giving a higher weight to win percentage and lower weights to experience and star power, since the latter variables contained some outliers.

Finally, we converted the output into proportions by adding up all the teams’ credit values and dividing each team’s credit by the total.

Based on our final model, which uses win percentage, experience, and star power as predictors, the Nuggets have the highest chance to win the NBA championship at 13.7%, followed by the Bucks at 13.4% and the Suns at 13.2%.

Part Five: Acknowledgements

There are many flaws in how I completed this project that could be mentioned, but I will try to address the most important ones.

First, there is a good chance we missed other predictors that would have been more useful for our model than the ones we used. My original goal was to use more intangible factors to predict the next NBA champion, but that turned out to be harder than I expected because of time constraints and the difficulty of finding similar statistical patterns among the championship teams.

We also did not make use of time within our dataset, and gave the same weight to a team’s statistics from 30 years ago as to those from one year ago. Although we did plot some of the variables over time and removed one that was clearly changing (opponent’s FG%), our model would have been more accurate if we had given more weight to recent champions, and if we had assessed the trends and used their rates of change to predict the outcome instead of just taking the mean or median of each variable.

Finally, we only considered the top 6 teams of each conference, when ideally we would have included every playoff team. Although history suggests that teams finishing in the top 3 of their conference have the most realistic shot at the championship, our model should have included all playoff teams for completeness.

There may be other errors or flaws in this project that went unnoticed as well. Additionally, there was a lot of subjectivity in creating the algorithms and weights for experience, star power, etc. Realistically, because the game of basketball involves more than just statistics, it is very difficult to predict the future NBA champion mathematically without some bias and subjectivity. With that in mind, we were still able to create a statistical model that includes more intangible variables that are difficult to quantify, such as star power and experience, and that distributes the chance of winning across the contending NBA teams.
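One way the recency weighting mentioned above could look is sketched below. The helper `recency_weighted_mean` and its linear weights are assumptions of this sketch; the model in this report does not use them.

```r
# Hypothetical helper: mean of `values` with linearly increasing weights,
# so the most recent year counts the most. The linear weighting is an
# assumption of this sketch.
recency_weighted_mean <- function(values, years) {
  weights <- years - min(years) + 1
  weighted.mean(values, weights)
}

# Made-up example: the latest value (30) pulls the result above the plain mean of 20.
recency_weighted_mean(c(10, 20, 30), c(2021, 2022, 2023))
```

Applied to this report’s data, `recency_weighted_mean(Win_Percentage, Year)` would replace the plain mean of win percentage as the champions’ reference value.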

Sources/References:

ChatGPT

https://www.basketball-reference.com/

https://basketball.realgm.com/nba/teams/New-York-Knicks/20/Playoff-History