NBA Referees

Author

Ben Coyle

Introduction

I am a big NBA fan, and whenever there is a close game, the losing team often blames the refs. I was curious to see whether there has been any bias in reffing this season, or at least any correlations in the officiating data. I ended up making five graphs.

First, I wanted to see if the amount of experience a ref had made a difference in which refs were Crew and which refs were Chief.

Second, I wanted to see if refs called more fouls on the road team than on the home team.

Third, I wanted to see how often the home team won under each ref.

Fourth, I wanted to see if there was a big disparity in fouls called in a game among refs.

And last, I wanted to see the diversity in the gender of refs.

library(tidyverse)

library(knitr)     
library(readr)     
library(stringr)   
library(ggplot2)

refstats <- read.csv("https://myxavier-my.sharepoint.com/:x:/g/personal/coyleb2_xavier_edu/EbAFXKKGRYBKoFjMtinm9skB4MrSppjCqSmglgELMfRdNg?download=1")

Role vs Experience

average_experience <- refstats %>%
  group_by(ROLE) %>%
  summarise(average_experience = mean(EXPERIENCE..YEARS., na.rm = TRUE))

ggplot(average_experience, aes(x = ROLE, y = average_experience, fill = ROLE)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Average Experience Years by NBA Referee Role", x = "Referee Role", y = "Average Experience Years") +
  theme_minimal()

Looking at the graph above, the average Chief referee had close to 19 years of experience, whereas the average Crew referee had about 12. Although the disparity did not surprise me, I was surprised by how long people stay refs. Being a ref for 19 years is a long time.
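An average alone can hide how many refs sit behind each bar. As a minimal sketch, the summary could be extended with a count and a standard deviation per role. The toy data frame below is made up for illustration and only borrows the column names ROLE and EXPERIENCE..YEARS. from refstats; the names toy_refs, n_refs, and sd_experience are my own.

```r
library(dplyr)

# Made-up illustrative data; not the real referee stats
toy_refs <- data.frame(
  ROLE = c("Chief", "Chief", "Crew", "Crew", "Crew"),
  EXPERIENCE..YEARS. = c(20, 18, 10, 12, NA)
)

role_summary <- toy_refs %>%
  group_by(ROLE) %>%
  summarise(
    # Count only the refs that actually have an experience value
    n_refs = sum(!is.na(EXPERIENCE..YEARS.)),
    average_experience = mean(EXPERIENCE..YEARS., na.rm = TRUE),
    sd_experience = sd(EXPERIENCE..YEARS., na.rm = TRUE)
  )

print(role_summary)
```

Running the same pipeline on refstats instead of toy_refs would show whether the gap between roles is large compared to the spread within each role.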

Home vs Away Foul Differential

ggplot(refstats, aes(x = REFEREE, y = FOUL.DIFFERENTIAL..Against.Road.Team.....Against.Home.Team., fill = ROLE)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Foul Differential (Against Road Team) - (Against Home Team) by Referee",
       x = "Referee",
       y = "Foul Differential",
       fill = "Role") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  # facet_wrap() has to be added to the plot with `+`; called on its own it only prints a ggproto object
  facet_wrap(~ROLE, scales = "free_y")

For this graph, being at zero means that the ref calls as many fouls on the home team as on the road team. Above zero means the ref calls more fouls on the road team, and below zero means more on the home team. Although fouls do not have to be even, it does seem that refs are calling more fouls on the road team.
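To back up the "it does seem" with numbers, one option is to compute the average differential and the share of refs who sit above zero. This is only a sketch on made-up values; toy_refs, mean_diff, and share_positive are names I introduced, and only the differential column name comes from refstats.

```r
# Made-up illustrative data; not the real referee stats
toy_refs <- data.frame(
  REFEREE = c("Ref A", "Ref B", "Ref C", "Ref D"),
  FOUL.DIFFERENTIAL..Against.Road.Team.....Against.Home.Team. = c(0.6, -0.2, 0.4, 0.0)
)

foul_diff <- toy_refs$FOUL.DIFFERENTIAL..Against.Road.Team.....Against.Home.Team.

# Average differential: positive means more fouls on the road team overall
mean_diff <- mean(foul_diff, na.rm = TRUE)

# Share of refs whose differential is strictly above zero
share_positive <- mean(foul_diff > 0, na.rm = TRUE)

print(mean_diff)
print(share_positive)
```

On the real data, a mean_diff above zero together with a share_positive well above one half would support the reading of the graph.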

Home vs Away Win Percentage

# TODO: this chunk still maps y to the foul differential; swap in the
# win-percentage column (and update the title and y label) to match the text below.
ggplot(refstats, aes(x = REFEREE, y = FOUL.DIFFERENTIAL..Against.Road.Team.....Against.Home.Team., fill = ROLE)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Foul Differential (Against Road Team) - (Against Home Team) by Referee",
       x = "Referee",
       y = "Foul Differential",
       fill = "Role") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  # facet_wrap() has to be added to the plot with `+`
  facet_wrap(~ROLE, scales = "free_y")

For this graph, being at 0.5 means that the road team wins just as often as the home team. Above 0.5 means the home team wins more, and below 0.5 means the road team wins more. With home court advantage, the home team has won around 59% of the time since 2004. Although a couple of refs seem really high or really low, this is about what I was expecting.
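The paragraph above is about home-team win percentage, so here is a minimal sketch of what such a plot could look like. It assumes a win-percentage column stored as a proportion between 0 and 1; the column name HOME.TEAM.WIN. is a guess on my part and is not confirmed anywhere above, and the toy data are made up.

```r
library(ggplot2)

# Made-up illustrative data; HOME.TEAM.WIN. is an assumed column name
toy_refs <- data.frame(
  REFEREE = c("Ref A", "Ref B", "Ref C", "Ref D"),
  ROLE = c("Chief", "Chief", "Crew", "Crew"),
  HOME.TEAM.WIN. = c(0.62, 0.48, 0.55, 0.70)
)

win_plot <- ggplot(toy_refs, aes(x = REFEREE, y = HOME.TEAM.WIN., fill = ROLE)) +
  geom_col() +
  # Dashed reference line at 0.5: home and road teams win equally often
  geom_hline(yintercept = 0.5, linetype = "dashed") +
  labs(title = "Home Team Win Percentage by Referee",
       x = "Referee",
       y = "Home Team Win Percentage",
       fill = "Role") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(win_plot)
```

The dashed line makes it easy to see which refs sit above or below an even split.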

Called Fouls by Ref

ggplot(refstats, aes(x = ROLE, y = CALLED.FOULS.PER.GAME, fill = ROLE)) +
  geom_boxplot() +
  labs(title = "Box Plot of Called Fouls per Game by Referee Role",
       x = "Referee Role",
       y = "Called Fouls per Game") +
  theme_minimal()

For this graph, I wanted to see if there was a big disparity in fouls called. Outside of a couple of outliers, most refs call around 40 fouls a game on average.
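To see which refs the outlier points belong to, one approach is base R's boxplot.stats(), which applies the usual 1.5 times IQR rule. This is a sketch on made-up numbers; toy_refs, fouls_stats, and outlier_refs are names I introduced. Note that boxplot.stats() works from hinges, which can differ slightly from the quartiles geom_boxplot() computes, so the flagged refs may not match the plot exactly.

```r
# Made-up illustrative data; not the real referee stats
toy_refs <- data.frame(
  REFEREE = paste("Ref", LETTERS[1:8]),
  CALLED.FOULS.PER.GAME = c(39, 40, 40, 41, 40, 39, 41, 50)
)

# $stats holds min whisker, lower hinge, median, upper hinge, max whisker;
# $out holds the values beyond 1.5 times the hinge spread
fouls_stats <- boxplot.stats(toy_refs$CALLED.FOULS.PER.GAME)

# Keep only the rows whose value was flagged as an outlier
outlier_refs <- toy_refs[toy_refs$CALLED.FOULS.PER.GAME %in% fouls_stats$out, ]

print(fouls_stats$stats[3])  # median fouls per game
print(outlier_refs)
```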

Ref by Gender

ggplot(refstats, aes(x = GENDER, y = EXPERIENCE..YEARS., color = GENDER)) +
  geom_point() +
  labs(title = "Scatter Plot of Gender vs Years of Experience",
       x = "Gender",
       y = "Years of Experience") +
  theme_minimal()

write.csv(refstats, "referee_data.csv", row.names = FALSE)

This surprised me: there are only four female refs in the NBA. It does make sense that they do not have much experience yet, and I expect this number to grow over the years.
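The count can also be read straight from the data instead of from the dots on the plot. Here is a minimal sketch, assuming GENDER is coded as "M" and "F" (the actual coding in refstats is not shown above); the toy data are made up, and n_refs and median_experience are names I introduced.

```r
library(dplyr)

# Made-up illustrative data; the "M"/"F" coding is an assumption
toy_refs <- data.frame(
  GENDER = c("M", "M", "F", "M"),
  EXPERIENCE..YEARS. = c(15, 8, 3, 20)
)

gender_summary <- toy_refs %>%
  group_by(GENDER) %>%
  summarise(
    n_refs = n(),
    median_experience = median(EXPERIENCE..YEARS., na.rm = TRUE)
  )

print(gender_summary)
```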

Conclusion

Based on my results, I learned that no two refs are alike. Although I do not believe they are rigging games, every ref calls a game differently. Factors such as experience and the home crowd may play a role. Overall, I had never really thought about officiating before, so this was a cool project to do.