This document builds an Elo margin model for predicting the margin of an AFL match. The model is a statistical tool that combines historical results with the Elo rating system to estimate how much one team is favored over the other, not just in terms of winning or losing but also in terms of the expected point difference. Install the fitzRoy and elo packages if you have not already, then load them.
The data is sourced from the fitzRoy package using fetch_results_afltables, which returns the result of every game in the 2023 season.
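A minimal sketch of that setup step, assuming the only packages needed are fitzRoy and elo (inferred from the fitzRoy:: and elo:: calls used later). The helper name missing_packages is hypothetical and only reports what is missing rather than installing anything:

```r
# Packages this walkthrough appears to rely on (an assumption based on the
# fitzRoy:: and elo:: calls used later in the document).
required_pkgs <- c("fitzRoy", "elo")

# Hypothetical helper: return the subset of `pkgs` that is not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required_pkgs)

if (length(to_install) > 0) {
  # Tell the reader what to install instead of installing silently.
  message("Run install.packages() for: ", paste(to_install, collapse = ", "))
} else {
  library(fitzRoy)
  library(elo)
}
```

Reporting the missing packages instead of installing them automatically keeps the script free of side effects; call install.packages() yourself on whatever is listed.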
afl2023 <- fitzRoy::fetch_results_afltables(season = 2023)
head(afl2023)
## # A tibble: 6 × 16
## Game Date Round Home.Team Home.Goals Home.Behinds Home.Points Away.Team
## <dbl> <date> <chr> <chr> <int> <int> <int> <chr>
## 1 16191 2023-03-16 R1 Richmond 8 10 58 Carlton
## 2 16192 2023-03-17 R1 Geelong 16 7 103 Collingw…
## 3 16193 2023-03-18 R1 North Me… 12 15 87 West Coa…
## 4 16194 2023-03-18 R1 Port Ade… 18 18 126 Brisbane…
## 5 16195 2023-03-18 R1 Melbourne 17 13 115 Footscray
## 6 16196 2023-03-18 R1 Gold Coa… 9 7 61 Sydney
## # ℹ 8 more variables: Away.Goals <int>, Away.Behinds <int>, Away.Points <int>,
## # Venue <chr>, Margin <int>, Season <dbl>, Round.Type <chr>,
## # Round.Number <int>
Each row is one match, with separate columns for the home and away teams.
A new column, Result, will be created to represent the outcome of the game for the home team: a win = 1, a draw = 0.5 and a loss = 0. The data will then be split into training data, which covers the regular season, and testing data, which covers the finals matches.
afl2023$Result <- ifelse(afl2023$Margin > 0, 1,
                         ifelse(afl2023$Margin < 0, 0, 0.5))
train_data <- afl2023[afl2023$Season < 2023 | (afl2023$Season == 2023 & afl2023$Round.Type == "Regular"), ]
test_data <- afl2023[afl2023$Season == 2023 & afl2023$Round.Type == "Finals", ]
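The Result coding can be checked on a few made-up margins. This is a small sketch, not part of the original analysis; the toy values are invented, and the sign()-based one-liner is an equivalent alternative to the nested ifelse():

```r
# Toy margins (home score minus away score); made-up values for illustration.
toy_margin <- c(24, -5, 0, 61)

# Nested ifelse(), as used for the Result column.
result_ifelse <- ifelse(toy_margin > 0, 1,
                        ifelse(toy_margin < 0, 0, 0.5))

# Equivalent one-liner: sign() gives -1, 0 or 1, rescaled here to 0, 0.5 or 1.
result_sign <- (sign(toy_margin) + 1) / 2

print(result_ifelse)

# Both codings agree on every toy match.
stopifnot(isTRUE(all.equal(result_ifelse, result_sign)))
```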
An initial Elo rating of 1650 and a k value of 30 will be assigned. The final ratings from the training data are then stored in a new data frame with one row per team.
#Running the Elo model over the training matches, starting every team at 1650
elo_model_train <- elo::elo.run(initial.elos = 1650,
                                formula = Result ~ Home.Team + Away.Team,
                                k = 30,
                                data = train_data)
final_elos_train <- as.data.frame(elo::final.elos(elo_model_train))
#Renaming the column
colnames(final_elos_train) <- c("Elo")
#Team names are assigned to a new column
final_elos_train$Team <- rownames(final_elos_train)
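To make the roles of the initial rating and k concrete, here is a sketch of a single rating update using the textbook Elo formulas in base R. It is assumed, not confirmed by this document, that elo.run applies the same update by default; the function names elo_expected and elo_update are hypothetical:

```r
# Expected score of team A against team B under the standard Elo formula.
elo_expected <- function(rating_a, rating_b) {
  1 / (1 + 10^((rating_b - rating_a) / 400))
}

# One rating update for a single match.
# `result` is the home outcome coded as in the Result column (1, 0.5 or 0).
elo_update <- function(home, away, result, k = 30) {
  expected_home <- elo_expected(home, away)
  change <- k * (result - expected_home)
  # Whatever the home team gains, the away team loses.
  c(home = home + change, away = away - change)
}

# Two teams starting at 1650; the home team wins.
elo_update(1650, 1650, result = 1)
# home = 1665, away = 1635
```

With equal ratings the expected score is 0.5, so a win moves each team by k * 0.5 = 15 points. A larger k makes ratings react faster to recent results.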
Now we will calculate the predicted margin of each finals match.
#Merging the home and away Elo ratings of each team onto the match data
add_elo_diff <- function(data, elo_df) {
  data <- merge(data, elo_df, by.x = "Home.Team", by.y = "Team", all.x = TRUE)
  colnames(data)[which(colnames(data) == "Elo")] <- "Home.Elo"
  data <- merge(data, elo_df, by.x = "Away.Team", by.y = "Team", all.x = TRUE)
  colnames(data)[which(colnames(data) == "Elo")] <- "Away.Elo"
  #Calculates the Elo rating difference between the home and away team
  data$EloDifference <- data$Home.Elo - data$Away.Elo
  return(data)
}
#Adding the Elo ratings to the train and test datasets
train_data <- add_elo_diff(train_data, final_elos_train)
test_data <- add_elo_diff(test_data, final_elos_train)
#Fitting a linear regression with Margin as the response variable and EloDifference as the predictor
model <- lm(Margin ~ EloDifference, data = train_data)
#Generating predicted margins
test_data$Predicted_Margin <- predict(model, test_data)
#Calculates the error
errors <- test_data$Margin - test_data$Predicted_Margin
# MAE
mae <- mean(abs(errors))
# MSE
mse <- mean(errors^2)
# RMSE
rmse <- sqrt(mse)
print(paste("MAE:", round(mae, 2)))
## [1] "MAE: 17.62"
print(paste("MSE:", round(mse, 2)))
## [1] "MSE: 488.98"
print(paste("RMSE:", round(rmse, 2)))
## [1] "RMSE: 22.11"
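From this, we obtain predicted margins for the finals matches along with error metrics for the model. These are error measures rather than accuracy scores, so lower is better: an MAE of 17.62 means the predicted margin missed the actual margin by about 17.6 points on average. This suggests the model does a reasonable job of predicting the margin of the finals matches, although the finals are a small test set.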
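One way to judge whether such error values are low is to compare them with a naive baseline. The sketch below does this on made-up data (not real AFL results), using the same model form, Margin ~ EloDifference. The baseline comparison and the helper name margin_errors are additions for illustration, not part of the original analysis:

```r
# Made-up training and test data: Elo differences and margins.
toy_train <- data.frame(
  EloDifference = c(-80, -40, -10, 0, 20, 50, 90),
  Margin        = c(-35, -12, -20, 6, 15, 10, 48)
)
toy_test <- data.frame(
  EloDifference = c(-30, 10, 60),
  Margin        = c(-25, 14, 9)
)

# Same model form as above: Margin regressed on EloDifference.
toy_model <- lm(Margin ~ EloDifference, data = toy_train)

# Hypothetical helper: MAE, MSE and RMSE for a vector of predictions.
margin_errors <- function(actual, predicted) {
  errors <- actual - predicted
  mse <- mean(errors^2)
  c(MAE = mean(abs(errors)), MSE = mse, RMSE = sqrt(mse))
}

model_scores <- margin_errors(toy_test$Margin, predict(toy_model, toy_test))

# Naive baseline: always predict the mean training margin.
baseline_pred <- rep(mean(toy_train$Margin), nrow(toy_test))
baseline_scores <- margin_errors(toy_test$Margin, baseline_pred)

# Side-by-side comparison; the model is only useful if it beats the baseline.
print(round(rbind(model = model_scores, baseline = baseline_scores), 2))
```

Applied to the real data, the same comparison would use train_data and test_data in place of the toy data frames.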