Introduction

How much advantage does the home team have in the NBA and how did COVID-19 affect home-court advantage?

Using data from the last 11 years of NBA games, I present a visualization of home-court advantage along with follow-up analysis of my findings. This report is being finalized as the 2021-2022 playoffs begin, so this year’s data (’21-’22 season) only includes regular season games. I’ve also chosen to split the ’19-’20 season into two groups: before the COVID outbreak and the “COVID bubble”.

The “COVID Bubble” was the NBA’s solution to finishing the season while maintaining player safety in the wake of COVID-19’s initial outbreak. The top 22 teams finished the season on a neutral court, essentially removing any home arena advantages.

The measure that I’ll be using to quantify “Home-Court Advantage” is point differential at home minus point differential when playing away (point differential is simply the difference between points scored in a game and points allowed in a game). Grouping by teams and analyzing the difference in home and away point differential allows us to account for a team’s overall success in a season and isolate their relative home-court advantage.
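As a minimal sketch of this measure, the base R snippet below computes it on a made-up three-team schedule. The scores are hypothetical; the column names follow the raw data, and `homediff` and `awaydiff` follow the variables listed in the output later in this report.

```r
# Toy game log (hypothetical scores; column names mirror the raw data).
games <- data.frame(
  Visitor.Neutral = c("Team A", "Team A", "Team B", "Team B", "Team C", "Team C"),
  PTS             = c(100, 98, 95, 110, 90, 102),
  Home.Neutral    = c("Team B", "Team C", "Team A", "Team C", "Team A", "Team B"),
  PTS.1           = c(104, 97, 101, 108, 99, 105),
  stringsAsFactors = FALSE
)

# Point differential from each side's perspective.
games$homediff <- games$PTS.1 - games$PTS  # home points minus visitor points
games$awaydiff <- -games$homediff          # the same game, seen by the visitor

# Average differential per team at home (hdiff) and on the road (adiff).
hdiff <- tapply(games$homediff, games$Home.Neutral, mean)
adiff <- tapply(games$awaydiff, games$Visitor.Neutral, mean)

# Home-court advantage = hdiff - adiff, matched by team name.
teams <- sort(unique(c(names(hdiff), names(adiff))))
home_court_adv <- hdiff[teams] - adiff[teams]
print(home_court_adv)
```

In this toy schedule Team C has a negative home differential but is even worse on the road, so its home-court advantage is still positive. That is the sense in which the measure accounts for overall team quality.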

First I present a plot of each team’s home-court advantage over the last decade along with league-wide averages represented by the blue and red crossbars:

As the plot above and the summary below show, teams typically perform better at home than away, with an average home-court advantage of 5.5 points. Outside of the COVID-affected seasons, it is rare for a team’s home-court advantage to drop below 0 (which would indicate that the team played better on the road).

In the space that follows, I’ll continue exploring the questions that surround home-court advantage including a description of the data used, more summary statistics, and some analysis of what I have found.

Data

The data was pulled from basketballreference.com. I gathered game results from the last 11 seasons of NBA games and compiled them. Here is a sample of the raw data, which includes the teams that played (and which was the home team), the date, attendance, and score:

  Weekday Month Day Year Start..ET. Visitor.Neutral PTS Home.Neutral           PTS.1 X         X.1 Attend. Notes date       Season
1 Tue     Oct   26  2010 7:30p      Miami Heat       80 Boston Celtics            88 Box Score       18624       2010-10-26 ’10-’11
2 Tue     Oct   26  2010 10:00p     Phoenix Suns     92 Portland Trail Blazers   106 Box Score       20603       2010-10-26 ’10-’11
3 Tue     Oct   26  2010 10:30p     Houston Rockets 110 Los Angeles Lakers       112 Box Score       18997       2010-10-26 ’10-’11
4 Wed     Oct   27  2010 7:00p      Boston Celtics   87 Cleveland Cavaliers       95 Box Score       20562       2010-10-27 ’10-’11
5 Wed     Oct   27  2010 7:00p      New York Knicks  98 Toronto Raptors           93 Box Score       18722       2010-10-27 ’10-’11
6 Wed     Oct   27  2010 7:00p      Miami Heat       97 Philadelphia 76ers        87 Box Score       20389       2010-10-27 ’10-’11

First, some minor adjustments were made to clean the data (for example, the Charlotte team changed its name in 2014), and the date fields were combined into a single variable. I then added labels for COVID and the season based on these dates, and grouped the data by season and by team for plotting and analysis. As the tidied sample below shows, each row summarizes one team’s performance during one season, with its average home point differential (hdiff) and average away point differential (adiff). The statistic plotted above is the difference between the two, hdiff - adiff.
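The cleaning steps described here could look roughly like the base R sketch below. The rows are invented, the cutoff dates are my assumption (the 2019-20 season was suspended on 2020-03-11 and the 2020-21 season opened on 2020-12-22), and the "Bubble" and "COVID" labels are hypothetical stand-ins next to the "Pre-COVID" label that appears in the tidied data.

```r
# Toy rows in the shape of the raw data (hypothetical values).
raw <- data.frame(
  Month = c("Oct", "Mar", "Aug", "Dec"),
  Day   = c(30, 10, 1, 25),
  Year  = c(2013, 2020, 2020, 2020),
  Home.Neutral = c("Charlotte Bobcats", "Atlanta Hawks",
                   "Los Angeles Lakers", "Boston Celtics"),
  stringsAsFactors = FALSE
)

# Harmonize the Charlotte franchise name across seasons.
raw$Home.Neutral[raw$Home.Neutral == "Charlotte Bobcats"] <- "Charlotte Hornets"

# Combine Month/Day/Year into a single Date.
# match(..., month.abb) avoids locale-dependent month parsing.
raw$date <- as.Date(sprintf("%d-%02d-%02d",
                            raw$Year, match(raw$Month, month.abb), raw$Day))

# Assumed cutoffs; the exact rules used for the report are not shown.
suspended   <- as.Date("2020-03-11")  # 2019-20 season suspended
next_season <- as.Date("2020-12-22")  # 2020-21 season opener

# Label each game by period (later seasons would need further labels).
raw$covid <- ifelse(raw$date <= suspended, "Pre-COVID",
                    ifelse(raw$date < next_season, "Bubble", "COVID"))
print(raw[, c("Home.Neutral", "date", "covid")])
```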

To use the first example, the 2010-2011 Atlanta Hawks were on average 1.06 points worse than their opponents at home and 1.49 points worse than their opponents when away. Overall, their home-away differential was 0.43 (that is, -1.06 - (-1.49)), which the plot above shows was the lowest in the league that year.

  Team          Season              hdiff             adiff covid
1 Atlanta Hawks ’10-’11 -1.06382978723404 -1.48936170212766 Pre-COVID
2 Atlanta Hawks ’11-’12  4.55555555555556 0.972222222222222 Pre-COVID
3 Atlanta Hawks ’12-’13  1.54545454545455              -1.5 Pre-COVID
4 Atlanta Hawks ’13-’14  1.72727272727273 -2.71111111111111 Pre-COVID
5 Atlanta Hawks ’14-’15  7.40816326530612  1.61224489795918 Pre-COVID
6 Atlanta Hawks ’15-’16  5.93478260869565 0.282608695652174 Pre-COVID

Analysis

Based on the main plot, it appears that the average home-court advantage was lower during the season and a half most affected by COVID. In particular, the 2020-2021 season shows a clearly lower home-court advantage. Average attendance during the 2020-2021 season was 2,321 people, compared with roughly 17,500-18,000 in the other seasons shown. The ’19-’20 “Bubble” shows a lower mean as well, and considerably more variability, since it covers far fewer games and teams. Finally, the most recent season’s average is a bit low as well, despite attendance beginning to return to pre-pandemic levels (an average of 16,920 people for 2021-2022). However, this season’s data does not include the playoffs, where teams are seeded and given more home games based on regular-season success, which may explain the lower mean advantage.

To quantify this apparent difference, we can estimate a linear combination of the season means, grouping seasons into those with COVID-19 restrictions and those without (2021-22 is counted as without, since attendance was similar to pre-COVID levels). The equation below shows that average home-court advantage was higher in non-COVID years by about 2.424 points. Again, this estimate is conservative, since the mean for 2021-22 (\(\mu_{21}\)) will likely increase once the playoffs are included.

Linear Combination: \[ \gamma = \text{non-COVID averages} - \text{COVID averages}\]

\[ \gamma = \frac{1}{10}(\mu_{11} + \mu_{12} + \mu_{13} + \mu_{14} + \mu_{15}+ \mu_{16} + \mu_{17} + \mu_{18} + \mu_{19}+ \mu_{21}) - \frac{1}{2}(\mu_{bub} + \mu_{20})\]

\[ \gamma = \frac{1}{10}(6.42 + 5.77 + 6.53 + 5.17 + 4.75 + 5.94 + 4.71 + 5.54 + 4.47 + 3.44) - \frac{1}{2}(3.54 + 2.16)\]

\[ \gamma = 2.424\]
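The arithmetic can be reproduced in a few lines of R. This is only a sketch of the point estimate, using the season means copied from the equation; the variable names are mine.

```r
# Season means of home-court advantage, copied from the equation above.
non_covid <- c(6.42, 5.77, 6.53, 5.17, 4.75, 5.94, 4.71, 5.54, 4.47, 3.44)
covid     <- c(3.54, 2.16)  # the bubble and the COVID-restricted season

# Equal weights within each group: 1/10 for non-COVID means, -1/2 for COVID means.
weights <- c(rep(1 / length(non_covid), length(non_covid)),
             rep(-1 / length(covid), length(covid)))

# The weights sum to zero, so gamma is a contrast between the two groups.
gamma_hat <- sum(weights * c(non_covid, covid))
print(round(gamma_hat, 3))  # 2.424
```

Because each season mean gets equal weight within its group, the much smaller bubble sample counts as much as a full season in the COVID average.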

Having home court has clearly been an advantage in the NBA over the past 12 years. Of the 13,813 games played since 2010, the home team won 8,020 of those games. This is a win proportion of 58.1%. If we look specifically at games played before the disruptions of COVID-19, the proportion is even higher at 58.9%.

Attendance and home-court advantage

Do teams win more often at home when attendance is higher?

## # A tibble: 13,813 x 18
## # Groups:   Home.Neutral, Season [352]
##    Weekday Month Day   Year  Start..ET. Visitor.Neutral   PTS Home.Neutral PTS.1
##    <chr>   <chr> <chr> <chr> <chr>      <chr>           <int> <chr>        <int>
##  1 Tue     Oct   26    2010  7:30p      Miami Heat         80 Boston Celt~    88
##  2 Tue     Oct   26    2010  10:00p     Phoenix Suns       92 Portland Tr~   106
##  3 Tue     Oct   26    2010  10:30p     Houston Rockets   110 Los Angeles~   112
##  4 Wed     Oct   27    2010  7:00p      Boston Celtics     87 Cleveland C~    95
##  5 Wed     Oct   27    2010  7:00p      New York Knicks    98 Toronto Rap~    93
##  6 Wed     Oct   27    2010  7:00p      Miami Heat         97 Philadelphi~    87
##  7 Wed     Oct   27    2010  7:00p      Detroit Pistons    98 Brooklyn Ne~   101
##  8 Wed     Oct   27    2010  8:00p      Chicago Bulls      95 Oklahoma Ci~   106
##  9 Wed     Oct   27    2010  8:00p      Milwaukee Bucks    91 New Orleans~    95
## 10 Wed     Oct   27    2010  8:00p      Sacramento Kin~   117 Minnesota T~   116
## # ... with 13,803 more rows, and 9 more variables: X <chr>, X.1 <chr>,
## #   Attend. <dbl>, Notes <chr>, date <chr>, Season <chr>, homediff <int>,
## #   awaydiff <int>, homeWin <chr>
## # A tibble: 352 x 3
## # Groups:   Home.Neutral [30]
##    Home.Neutral  Season                     avg_attend
##    <chr>         <chr>                           <dbl>
##  1 Atlanta Hawks "'10-'11 "                     16136.
##  2 Atlanta Hawks "'11-'12 "                     15542 
##  3 Atlanta Hawks "'12-'13 "                     15338.
##  4 Atlanta Hawks "'13-'14 "                     14640.
##  5 Atlanta Hawks "'14-'15 "                     17570.
##  6 Atlanta Hawks "'15-'16 "                     17070.
##  7 Atlanta Hawks "'17-'18 "                     14409 
##  8 Atlanta Hawks "'18-'19 "                     15328.
##  9 Atlanta Hawks "'19-'20 \nBefore \nCOVID"     16043.
## 10 Atlanta Hawks "'20-'21 "                      4336.
## # ... with 342 more rows

To investigate this question, I will be looking at the average attendance when a team wins at home vs when they lose at home. Though I have all of the data available, I will use random sampling and two-sample t-tests to infer whether there is a difference for each team. The purpose of this inference is strictly practice, since we have access to the entire dataset.

We will test whether the average attendance is higher for home wins than for home losses. The data from the “COVID Bubble” is not included in this inference since these games were not played on a home court so the ‘home team’ designation was not meaningful for those games.

\[H_0: \mu_{win} = \mu_{loss} \quad \text{vs.} \quad H_A: \mu_{win} > \mu_{loss}\]

Because we are testing across every team, we will set a significance level of \(\alpha = 0.00167\). This significance level was chosen using a Bonferroni correction for multiple testing (0.05/30).
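For reference, the per-test level works out as follows. The second part is an illustration under the assumption that the 30 tests are independent, which the Bonferroni bound itself does not require.

```r
n_tests     <- 30    # one test per team
family_rate <- 0.05  # desired family-wise error rate

# Bonferroni: split the family-wise rate evenly across the tests.
alpha <- family_rate / n_tests
print(round(alpha, 5))  # 0.00167

# If the tests were independent (an assumption), the chance of at least one
# false rejection at this per-test level stays just under 0.05.
fwer <- 1 - (1 - alpha)^n_tests
print(round(fwer, 4))
```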

For each team, I drew a stratified random sample, taking 10 home games from each season for a sample size of 110. As this is about 12% of games, I chose to sample with replacement.
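A stratified draw of this kind can be sketched in base R as below. The game log is simulated for a single hypothetical team (11 seasons of 41 home games), and the "W"/"L" coding of `homeWin` is an assumption, so this shows only the sampling mechanics and not the actual sample used here.

```r
set.seed(1)  # reproducible draw

# Simulated home-game log for one team: 11 seasons of 41 home games.
seasons <- paste0("S", 1:11)
home_games <- data.frame(
  Season  = rep(seasons, each = 41),
  Attend. = round(rnorm(11 * 41, mean = 17500, sd = 1500)),
  homeWin = sample(c("W", "L"), 11 * 41, replace = TRUE),
  stringsAsFactors = FALSE
)

# Stratify by season: draw 10 row indices per season, with replacement.
rows_by_season <- split(seq_len(nrow(home_games)), home_games$Season)
picked <- unlist(lapply(rows_by_season, function(idx) {
  idx[sample.int(length(idx), 10, replace = TRUE)]
}))
game_sample <- home_games[picked, ]

print(table(game_sample$Season))  # 10 games from every season
print(nrow(game_sample))          # 110
```

Sampling within each season keeps low-attendance seasons (such as 2020-21) from being over- or under-represented by chance.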

From the difference in sample means and the standard deviation of each sample, I calculated the t-statistic and found the p-value for all 30 teams with the R loop below.

p_values <- c()
  
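# Two-sample t-tests (unequal variance, Welch) for all 30 teams.
# test_set is assumed to hold two summary rows per team (mean, sd, count),
# which is why the loop steps through rows 1..60 in pairs.
for (i in seq(1, 60, 2)) {
  # Difference of means for this team (row i minus row i + 1)
  diff <- as.numeric(test_set[i, 'mean'] - test_set[i + 1, 'mean'])
  sd1 <- as.numeric(test_set[i, 'sd'])
  sd2 <- as.numeric(test_set[i + 1, 'sd'])
  n1 <- as.numeric(test_set[i, 'count'])
  n2 <- as.numeric(test_set[i + 1, 'count'])

  # Standard error of the difference in means
  s.err <- sqrt(sd1^2 / n1 + sd2^2 / n2)

  # Welch-Satterthwaite degrees of freedom for comparing two means
  df <- (sd1^2 / n1 + sd2^2 / n2)^2 /
    (sd1^4 / (n1^2 * (n1 - 1)) + sd2^4 / (n2^2 * (n2 - 1)))

  t <- diff / s.err

  # One-sided p-value. pt(t, df) is the lower-tail probability, which matches
  # H_A only if row i is the home-loss summary and row i + 1 the home-win summary.
  p_values <- c(p_values, pt(t, df))
}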
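As a cross-check on the hand calculation, R's built-in `t.test` can run the same Welch test directly from raw samples. The sketch below uses simulated attendance values rather than the real data, and it orders the difference as wins minus losses, so the one-sided p-value is the upper tail.

```r
set.seed(42)  # reproducible toy data

# Simulated attendance for one team's home wins and home losses.
win_att  <- rnorm(70, mean = 18000, sd = 1200)
loss_att <- rnorm(40, mean = 17600, sd = 1500)

# Manual Welch statistic with diff = win - loss.
v1 <- var(win_att) / length(win_att)
v2 <- var(loss_att) / length(loss_att)
t_stat <- (mean(win_att) - mean(loss_att)) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom.
df <- (v1 + v2)^2 /
  (v1^2 / (length(win_att) - 1) + v2^2 / (length(loss_att) - 1))

# H_A: mean(win) > mean(loss), so the p-value is the upper-tail probability.
p_manual <- pt(t_stat, df, lower.tail = FALSE)

# Built-in equivalent: Welch two-sample t-test, one-sided.
p_builtin <- t.test(win_att, loss_att,
                    alternative = "greater", var.equal = FALSE)$p.value

print(c(manual = p_manual, builtin = p_builtin))
```

The two p-values should agree up to floating-point error. Which tail is correct depends on the order of subtraction.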

Below are the results of comparing the average attendance for wins vs losses for each of the 30 NBA teams over the last 12 years.

   Teams                 p-values  Teams                  p-values
1  Atlanta Hawks         0.31651   Miami Heat             0.25122
2  Boston Celtics        0.12507   Milwaukee Bucks        0.15163
3  Brooklyn Nets         0.48802   Minnesota Timberwolves 0.47415
4  Charlotte Hornets     0.84485   New Orleans Pelicans   0.89516
5  Chicago Bulls         0.50868   New York Knicks        0.92907
6  Cleveland Cavaliers   0.00069   Oklahoma City Thunder  0.00254
7  Dallas Mavericks      0.05844   Orlando Magic          0.13343
8  Denver Nuggets        0.04329   Philadelphia 76ers     0.37855
9  Detroit Pistons       0.27037   Phoenix Suns           0.88609
10 Golden State Warriors 0.48168   Portland Trail Blazers 0.4444
11 Houston Rockets       0.00016   Sacramento Kings       0.74022
12 Indiana Pacers        0.37235   San Antonio Spurs      0.00068
13 Los Angeles Clippers  0.35169   Toronto Raptors        0.13939
14 Los Angeles Lakers    0.28052   Utah Jazz              0.33691
15 Memphis Grizzlies     0.46055   Washington Wizards     0.69049

From the table above, we can conclude that there is significant evidence that average attendance is higher when the team wins at home for Cleveland, Houston, and San Antonio. These three teams’ samples gave p-values below our significance level, so we can reject the null hypothesis that the means are equal. Oklahoma City (p = 0.00254) is also quite close, which is notable considering how conservative a Bonferroni correction can be.
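The decision rule can be reproduced from the reported p-values. The snippet below copies them from the table and applies the 0.05/30 threshold; the `p.adjust` call is an equivalent way to express the same Bonferroni correction and is included only as a check.

```r
# p-values copied from the table above.
p <- c(
  "Atlanta Hawks" = 0.31651, "Boston Celtics" = 0.12507,
  "Brooklyn Nets" = 0.48802, "Charlotte Hornets" = 0.84485,
  "Chicago Bulls" = 0.50868, "Cleveland Cavaliers" = 0.00069,
  "Dallas Mavericks" = 0.05844, "Denver Nuggets" = 0.04329,
  "Detroit Pistons" = 0.27037, "Golden State Warriors" = 0.48168,
  "Houston Rockets" = 0.00016, "Indiana Pacers" = 0.37235,
  "Los Angeles Clippers" = 0.35169, "Los Angeles Lakers" = 0.28052,
  "Memphis Grizzlies" = 0.46055, "Miami Heat" = 0.25122,
  "Milwaukee Bucks" = 0.15163, "Minnesota Timberwolves" = 0.47415,
  "New Orleans Pelicans" = 0.89516, "New York Knicks" = 0.92907,
  "Oklahoma City Thunder" = 0.00254, "Orlando Magic" = 0.13343,
  "Philadelphia 76ers" = 0.37855, "Phoenix Suns" = 0.88609,
  "Portland Trail Blazers" = 0.4444, "Sacramento Kings" = 0.74022,
  "San Antonio Spurs" = 0.00068, "Toronto Raptors" = 0.13939,
  "Utah Jazz" = 0.33691, "Washington Wizards" = 0.69049
)

# Compare each raw p-value with the Bonferroni-corrected level.
alpha <- 0.05 / length(p)
significant <- names(p)[p < alpha]
print(significant)

# Same decision via adjusted p-values compared with 0.05.
adjusted <- p.adjust(p, method = "bonferroni")
print(names(p)[adjusted < 0.05])
```

Oklahoma City's adjusted p-value is 0.00254 × 30 ≈ 0.076, which is why it falls just short under this correction.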

Having access to all games and their attendance, I wanted to follow up this inference by displaying the attendance numbers for all home games, separated by wins and losses. You can see the distribution of attendance for each team along with the averages (represented with black crossbars). Most teams do show a higher average attendance for wins at home.

Conclusion

The effects of home-court advantage are difficult to interpret because of the number of variables that could impact the team’s performance. Though most teams appear to win more games when the attendance is higher, it is unclear how much bigger crowds help the team or if better teams simply draw bigger crowds. For many teams, attendance is largely determined by who the away team is. For example, it may be hard to sell tickets to a game when the worst team in the league is coming to town, whereas we can expect the arena to be sold out if the home team is facing a good team with star talent. Some of this variation is already baked into attendance numbers since arenas adjust ticket prices according to demand, but much of the variation due to variables other than crowd size is unaccounted for.

Though there is a clear positive impact for teams playing at home, it’s hard to know if this results from the crowd, the familiar court, or sleeping at home rather than a hotel. The “COVID-19 Bubble” is an interesting case because teams were suddenly on a neutral court. Unfortunately, the small sample size (some teams played only 4 ‘home’ games) and the extenuating circumstances of the pandemic make it difficult to assess how much teams missed their home crowd. The main objective of this project was to build an interactive plot that helped visualize and analyze home-court success across the league.