I am a big NBA fan, and whenever there is a close game, the losing team often blames the refs. I was curious to see whether there has been any bias in officiating this season, or at least any correlations in the officiating data. I ended up making five different graphs.
First, I wanted to see if the amount of experience a ref had made a difference in which refs were Crew and which refs were Chief.
Second, I wanted to see if refs called more fouls on the road team compared to the home team.
Third, I wanted to see how often the home team won under each ref.
Fourth, I wanted to see if there was a big disparity in fouls called in a game among refs.
And last, I wanted to see the diversity in the gender of refs.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.3 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.3 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Average years of experience for each referee role (Chief vs Crew)
average_experience <- refstats %>%
  group_by(ROLE) %>%
  summarise(average_experience = mean(EXPERIENCE..YEARS., na.rm = TRUE))

ggplot(average_experience, aes(x = ROLE, y = average_experience, fill = ROLE)) +
  geom_col(position = "dodge") +
  labs(title = "Average Experience Years by NBA Referee Role", x = "Referee Role", y = "Average Experience Years") +
  theme_minimal()
Looking at the graph above, the average Chief referee had close to 19 years of experience, whereas the average Crew referee had about 12. Although the disparity did not surprise me, I was surprised by how long people stay refs. Being a ref for 19 years is a long time.
Home vs Away Foul Differential
# Foul differential per referee, one panel per role.
# facet_wrap() must be joined to the plot with `+`; on its own line it only prints the facet object.
ggplot(refstats, aes(x = REFEREE, y = FOUL.DIFFERENTIAL..Against.Road.Team.....Against.Home.Team., fill = ROLE)) +
  geom_col(position = "dodge") +
  labs(title = "Foul Differential (Against Road Team) - (Against Home Team) by Referee", x = "Referee", y = "Foul Differential", fill = "Role") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  facet_wrap(~ROLE, scales = "free_y")
For this graph, being at zero means that the ref is calling as many fouls on the home team as on the road team. Above zero means the ref is calling more fouls on the road team, and below zero means more on the home team. Although fouls do not have to be even, it does seem that refs are calling more fouls on the road team.
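To put a number on "it does seem that refs are calling more fouls on the road team", one option is to count how many refs sit above zero and run a one-sample t-test on the differentials. This is only a sketch: the data frame `toy_refs` and its `foul_diff` values are made up for illustration, and with the real data the vector would be `refstats$FOUL.DIFFERENTIAL..Against.Road.Team.....Against.Home.Team.`.

```r
# Toy stand-in for refstats; these numbers are invented for illustration only.
toy_refs <- data.frame(
  REFEREE = c("Ref A", "Ref B", "Ref C", "Ref D", "Ref E", "Ref F"),
  foul_diff = c(0.8, -0.3, 1.2, 0.4, -0.1, 0.6)
)

# Share of refs who call more fouls on the road team than on the home team
share_above_zero <- mean(toy_refs$foul_diff > 0)

# One-sample t-test: is the average differential different from zero?
diff_test <- t.test(toy_refs$foul_diff, mu = 0)

print(share_above_zero)
print(diff_test$estimate)
print(diff_test$p.value)
```

A large share above zero together with a small p-value would support the impression from the bar chart. A large p-value would suggest the tilt could just be noise.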
Home vs Away Win Percentage
# Home team win percentage per referee, one panel per role.
# NOTE: the column name HOME.TEAM.WIN. is an assumption; run names(refstats) to find the actual home win percentage column.
ggplot(refstats, aes(x = REFEREE, y = HOME.TEAM.WIN., fill = ROLE)) +
  geom_col(position = "dodge") +
  labs(title = "Home Team Win Percentage by Referee", x = "Referee", y = "Home Team Win Percentage", fill = "Role") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  facet_wrap(~ROLE, scales = "free_y")
For this graph, being at 0.5 means that the road team is winning just as often as the home team. Above 0.5 means the home team wins more, and below 0.5 means the road team wins more. Because of home court advantage, the home team has won around 59% of the time since 2004. Although a couple of refs seem really high or really low, this is about what I was expecting.
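A ref who looks "really high or really low" may simply have worked fewer games. One way to check this is an exact binomial test of each ref's home wins against the 59% baseline mentioned above. The sketch below uses an invented data frame; the `games` and `home_wins` columns are hypothetical and may not exist under those names in `refstats`.

```r
# Toy data: games officiated and home-team wins per ref (invented numbers).
toy_refs <- data.frame(
  REFEREE = c("Ref A", "Ref B", "Ref C"),
  games = c(60, 55, 20),
  home_wins = c(36, 40, 8)
)

# League-wide home win rate quoted in the post
baseline <- 0.59

# Observed home win percentage for each ref
toy_refs$home_win_pct <- toy_refs$home_wins / toy_refs$games

# Exact binomial test per ref against the baseline
toy_refs$p_value <- mapply(
  function(wins, n) binom.test(wins, n, p = baseline)$p.value,
  toy_refs$home_wins,
  toy_refs$games
)

print(toy_refs)
```

With only a handful of games, a win percentage far from 0.59 can still come with a large p-value, which is why the extreme bars in the chart are not necessarily evidence of bias.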
Called Fouls by Ref
# Distribution of called fouls per game, split by referee role
ggplot(refstats, aes(x = ROLE, y = CALLED.FOULS.PER.GAME, fill = ROLE)) +
  geom_boxplot() +
  labs(title = "Box Plot of Called Fouls per Game by Referee Role", x = "Referee Role", y = "Called Fouls per Game") +
  theme_minimal()
For this graph, I wanted to see if there was a big disparity in fouls called. Outside of a couple of outliers, most refs call around 40 fouls a game on average.
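The numbers behind a box plot can also be pulled out directly, which makes the "around 40 fouls a game" reading easier to verify. Here is a minimal base R sketch on invented data; the column names match the ones used in the plot, but the values do not come from `refstats`.

```r
# Toy data with the same column names as the plot above (values are invented).
toy_refs <- data.frame(
  ROLE = c("Chief", "Chief", "Chief", "Crew", "Crew", "Crew"),
  CALLED.FOULS.PER.GAME = c(39.5, 40.2, 41.0, 38.7, 40.5, 44.1)
)

# Median and interquartile range of called fouls per game for each role
medians <- tapply(toy_refs$CALLED.FOULS.PER.GAME, toy_refs$ROLE, median)
iqrs <- tapply(toy_refs$CALLED.FOULS.PER.GAME, toy_refs$ROLE, IQR)

print(medians)
print(iqrs)
```

The median is the line in the middle of each box, and the interquartile range is the height of the box, so a small range means the refs in that role call games at a similar rate.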
Ref by Gender
# Years of experience for each referee, grouped by gender
ggplot(refstats, aes(x = GENDER, y = EXPERIENCE..YEARS., color = GENDER)) +
  geom_point() +
  labs(title = "Scatter Plot of Gender vs Years of Experience", x = "Gender", y = "Years of Experience") +
  theme_minimal()
This surprised me: there are only four female refs in the NBA. It makes sense that they do not have much experience yet, and I expect this number to grow over the years.
Conclusion
Based on my results, I learned that every ref is different. Although I do not believe they are rigging games, every ref calls a game differently. Experience and the home crowd may both be factors. Overall, I had never really thought about officiating before, so this was a cool project to do.