Final Project

Refereeing is a sore subject for fans of almost any sport. Much of this comes down to the strong emotions of being a fan: bad calls seem egregious when they go against the team you support, but mundane (and sometimes entertaining) when they go against your opponents.

Sometimes, however, referees are accused of outright bias, which is a more serious matter.

In the English Premier League (EPL), fans of the London-based team Arsenal have long claimed that the referee Mike Dean is a fan of their rivals Tottenham. Let’s dive into some numbers from the last 10 years of football in England:

eplData <- read.csv("https://raw.githubusercontent.com/mkollontai/R_Bridge_FinalProject/master/EPL_09_19_Stats.csv")
#Rename some of the column names for clarity
names(eplData)[names(eplData) == "FTAG"] <- "AwayGoalsFinal"
names(eplData)[names(eplData) == "FTHG"] <- "HomeGoalsFinal"
names(eplData)[names(eplData) == "FTR"] <- "Final Victor"
names(eplData)[names(eplData) == "HTAG"] <- "AwayGoalsHalf"
names(eplData)[names(eplData) == "HTHG"] <- "HomeGoalsHalf"
names(eplData)[names(eplData) == "HTR"] <- "Halftime Victor"

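# Points earned by the home team: 1 for a draw ("D"), 3 for a home win ("H"), 0 for an away win
HomePtsFxn <- function(x) {
  if (x == "D") {
    1
  } else if (x == "H") {
    3
  } else {
    0
  }
}

# Points earned by the away team: 1 for a draw ("D"), 0 for a home win ("H"), 3 for an away win
AwayPtsFxn <- function(x) {
  if (x == "D") {
    1
  } else if (x == "H") {
    0
  } else {
    3
  }
}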
eplData$HomePoints <- mapply(HomePtsFxn,eplData$`Final Victor`)
eplData$AwayPoints <- mapply(AwayPtsFxn,eplData$`Final Victor`)

#Select the classic "Top 6" of the EPL since their quality is more consistent and leads to more predictable results

#Removed Liverpool since very few of their games were refereed by Mike Dean

teams <- c("Arsenal", "Chelsea", "Man City", "Man United", "Tottenham")

#Create dataframes containing subsets of the data associated with each of the 5 teams selected above, separated into home and away games.
#Separate dataframes for overall data and data for games refereed by Mike Dean

for (i in teams)
{
  assign(paste(i,"HomeData",sep = ""),eplData[which(eplData$HomeTeam == i),])
  assign(paste(i,"AwayData",sep = ""),eplData[which(eplData$AwayTeam == i),])
  assign(paste(i,"HomeDeanData",sep = ""), eplData[which((eplData$HomeTeam == i) & (eplData$Referee == 'M Dean')),])
  assign(paste(i,"AwayDeanData",sep = ""), eplData[which((eplData$AwayTeam == i) & (eplData$Referee == 'M Dean')),])
}

Now let us compare the average points per game of the five selected teams overall and in games refereed by Mike Dean.

##                 Arsenal   Chelsea  Man City Man United Tottenham
## HomePts       2.2315789 2.2526316 2.4315789  2.2684211 2.1000000
## HomeDeanPts   1.5882353 2.0625000 2.3684211  1.7692308 1.9130435
## AwayPts       1.5526316 1.6947368 1.8421053  1.7263158 1.6368421
## AwayDeanPts   1.1764706 1.8260870 1.6500000  1.7083333 1.3333333
## HomeDeanRatio 0.7117092 0.9155958 0.9740260  0.7799393 0.9109731
## AwayDeanRatio 0.7577268 1.0775047 0.8957143  0.9895833 0.8145766

By comparing each team’s overall average points per game with its average in games refereed by Mike Dean, we can make a very basic determination of his effect on the teams’ results.
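
The comparison described above can be sketched as follows. This is a minimal illustration, not the code that produced the table: `toy` is a made-up stand-in for `eplData`, "Other Ref" is a placeholder referee name, and `home_ratio` is a hypothetical helper introduced here. Only the column names `HomeTeam`, `Referee` and `HomePoints` come from the code earlier in this report.

```r
# Toy stand-in for eplData; these rows are invented for illustration only.
toy <- data.frame(
  HomeTeam   = c("Arsenal", "Arsenal", "Arsenal", "Chelsea"),
  AwayTeam   = c("Chelsea", "Tottenham", "Man City", "Arsenal"),
  Referee    = c("M Dean", "Other Ref", "M Dean", "M Dean"),
  HomePoints = c(0, 3, 1, 3),
  AwayPoints = c(3, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a team's average home points per game overall,
# its average under one referee, and the ratio of the two.
home_ratio <- function(data, team, referee = "M Dean") {
  home <- data[data$HomeTeam == team, ]        # all home games of the team
  dean <- home[home$Referee == referee, ]      # only those with the referee
  overall_ppg <- mean(home$HomePoints)
  dean_ppg    <- mean(dean$HomePoints)
  c(HomePts       = overall_ppg,
    HomeDeanPts   = dean_ppg,
    HomeDeanRatio = dean_ppg / overall_ppg)
}

arsenal_home <- home_ratio(toy, "Arsenal")
print(arsenal_home)
```

The same arithmetic applied to the table above gives Arsenal’s home ratio: 1.5882353 / 2.2315789 ≈ 0.712. An away version would filter on `AwayTeam` and average `AwayPoints` instead.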

Below is a graph of each team’s ratio of points per game under Mike Dean to its overall average over the last 10 years. A ratio below 1 means the team fared worse than usual in his matches and a ratio above 1 means it fared better, which may be due partially to his refereeing.
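
One way such a graph could be drawn is sketched below with base R graphics. The ratio values are copied from the table above; the layout, labels and the names `team_names` and `ratios` are assumptions for illustration, not necessarily how the original figure was produced.

```r
# Ratios copied from the HomeDeanRatio and AwayDeanRatio rows of the table.
team_names <- c("Arsenal", "Chelsea", "Man City", "Man United", "Tottenham")
ratios <- rbind(
  Home = c(0.7117092, 0.9155958, 0.9740260, 0.7799393, 0.9109731),
  Away = c(0.7577268, 1.0775047, 0.8957143, 0.9895833, 0.8145766)
)
colnames(ratios) <- team_names

# Grouped bars: one home bar and one away bar per team.
barplot(ratios,
        beside = TRUE,
        ylim = c(0, 1.2),
        legend.text = rownames(ratios),
        args.legend = list(x = "topright"),
        ylab = "Points per game with Dean / overall",
        main = "Points ratio when Mike Dean referees")

# Dashed reference line at 1: no difference from the overall average.
abline(h = 1, lty = 2)
```

Bars below the dashed line mark teams that collected fewer points per game under Mike Dean than they did overall.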

Conclusion:

* With Mike Dean refereeing, the 5 analyzed top-6 EPL teams generally collect fewer points per game than their overall average (Liverpool omitted due to low game count with Mike Dean)

* Though nine of the ten home and away ratios fall below 1 (Chelsea away, at about 1.08, is the exception), Arsenal’s drop is the largest: roughly 29% fewer points per game at home and 24% fewer away

* Mike Dean’s refereeing of Arsenal matches deserves further scrutiny, and the claims made by Arsenal fans are not baseless