Elo ratings can be used in many sports to track the evolution of a player or a
team. The Elo method measures the relative strength of a
player or team compared with its opponents. In this document I have
applied the method to rugby union using a data set available on
Kaggle (https://www.kaggle.com/datasets/lylebegbie/international-rugby-union-results-from-18712022).
As with any data set, we first need to load and inspect the data.
# Loading the data set
RU_data <- read.csv("results.csv")
# Observing the first 6 rows (the default for head())
head(RU_data)
## date home_team away_team home_score away_score
## 1 1871-03-27 Scotland England 1 0
## 2 1872-02-05 England Scotland 2 1
## 3 1873-03-03 Scotland England 0 0
## 4 1874-02-23 England Scotland 1 0
## 5 1875-02-15 England Ireland 2 0
## 6 1875-03-08 Scotland England 0 0
## competition stadium
## 1 1871 Scotland versus England rugby union match Raeburn Place
## 2 1871-72 Home Nations rugby union matches The Oval
## 3 1872-73 Home Nations rugby union matches West of Scotland F.C.
## 4 1873-74 Home Nations rugby union matches The Oval
## 5 1874–75 Home Nations rugby union matches The Oval
## 6 1874-75 Home Nations rugby union matches Raeburn Place
## city country neutral world_cup
## 1 Edinburgh Scotland False False
## 2 London England False False
## 3 Glasgow Scotland False False
## 4 London England False False
## 5 London England False False
## 6 Edinburgh Scotland False False
To transform and rearrange the data, we will load and use the
tidyverse package. The next step is to remove all
matches played before 2002 and focus on the last 20 years. To
make this possible, the date column is converted from character
to Date.
An extra column is then added to the data set.
It records the result from the home team’s point of view (win = 1 / draw =
0.5 / loss = 0).
# Loading the package
library(tidyverse)
# Transform the date column
RU_data$date <- as.Date(RU_data$date)
# Remove the data older than 2002
RU_data <- RU_data%>%
filter(date > '2002-01-01')
# Adding the extra column
RU_data <- RU_data%>%
mutate(H_Score = ifelse(home_score > away_score, 1,
ifelse(home_score < away_score , 0 ,0.5)))
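The win/draw/loss encoding above can also be written without nested ifelse() calls. The following is a minimal sketch in base R on a small made-up score table; the toy_scores data frame and its values are illustrative only and are not taken from the Kaggle file.

```r
# Hypothetical scores, for illustration only (not from results.csv)
toy_scores <- data.frame(
  home_score = c(20, 13, 9),
  away_score = c(15, 13, 30)
)

# sign() returns 1, 0 or -1 for a home win, a draw or a home loss;
# adding 1 and halving maps these values to 1, 0.5 and 0
toy_scores$H_Score <- (sign(toy_scores$home_score - toy_scores$away_score) + 1) / 2

print(toy_scores$H_Score)
```

Either form gives the same column; the nested ifelse() version is arguably easier to read for newcomers, while the sign() version is more compact.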
Using the elo package, a new data frame is
created to evaluate each team’s performance.
# Loading the library
library(elo)
# Creating the new data frame
# (the argument is spelled initial.elos; 1500 is the starting rating seen in the output)
elo <- elo::elo.run(formula = H_Score ~ home_team + away_team,
initial.elos = 1500,
k = 50,
data = RU_data) %>%
as.data.frame()
# Looking at the new data frame
head(elo)
## team.A team.B p.A wins.A update.A update.B elo.A elo.B
## 1 Scotland England 0.5000000 0 -25.00000 25.00000 1475.000 1525.000
## 2 France Italy 0.5000000 1 25.00000 -25.00000 1525.000 1475.000
## 3 Ireland Wales 0.5000000 1 25.00000 -25.00000 1525.000 1475.000
## 4 Italy Scotland 0.5000000 0 -25.00000 25.00000 1450.000 1500.000
## 5 Wales France 0.4285369 0 -21.42684 21.42684 1453.573 1546.427
## 6 England Ireland 0.5000000 1 25.00000 -25.00000 1550.000 1500.000
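The p.A, update.A and elo.A columns are consistent with the standard Elo formulas: the expected score of team A is 1 / (1 + 10^((elo.B - elo.A) / 400)), and its rating changes by k times (result minus expected score). The base R sketch below assumes the usual scale of 400 and the k = 50 used above; the function names expected_score and elo_update are illustrative and do not belong to any package. It reproduces row 5 of the output, where Wales (1475 before the match) lost at home to France (1525 before the match).

```r
# Expected score of team A against team B (standard Elo logistic curve, scale 400)
expected_score <- function(elo_a, elo_b) {
  1 / (1 + 10^((elo_b - elo_a) / 400))
}

# Rating change for team A; team B receives the opposite change
elo_update <- function(elo_a, elo_b, result_a, k = 50) {
  k * (result_a - expected_score(elo_a, elo_b))
}

# Row 5 of the output: Wales (1475) at home to France (1525), Wales lost
p_wales <- expected_score(1475, 1525)
delta   <- elo_update(1475, 1525, result_a = 0)

round(p_wales, 7)        # 0.4285369, the p.A value in row 5
round(delta, 5)          # -21.42684, the update.A value in row 5
round(1475 + delta, 3)   # 1453.573, the elo.A value in row 5
```

Because the two teams exchange the same number of points, France gains what Wales loses, which is why update.B is the mirror image of update.A.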
To follow the evolution of the Elo score clearly, one team is selected: here, the French team. A few steps are needed first. A unique match identifier is created in both data frames (so that they can later be matched on it). Then the matches are filtered so that only those involving France, and only France’s score, are kept.
# Transforming the row numbers as identifier
elo <-rownames_to_column(elo)
RU_data <-rownames_to_column(RU_data)
# Keep only the matches played by France (at home, then away)
Home <- elo%>%
filter(team.A == "France")
Away <- elo%>%
filter(team.B == "France")
# Select the columns related to the Team
Home <- Home%>%
select(rowname, team.A, elo.A)
Away <- Away%>%
select(rowname, team.B, elo.B)
# Renaming the columns
Home <- Home %>% rename(team = team.A, elo = elo.A)
Away <- Away %>% rename(team = team.B, elo = elo.B)
# Binding the two data frames
France_elo <- bind_rows(Away, Home)
# Checking the data
head(France_elo)
## rowname team elo
## 1 5 France 1546.427
## 2 10 France 1589.901
## 3 19 France 1578.786
## 4 22 France 1548.212
## 5 25 France 1521.944
## 6 44 France 1530.969
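The home/away filtering above can be wrapped in a small helper so that any team’s history is extracted the same way. The following is a sketch in base R: team_history is a hypothetical helper name, and toy_elo is a hand-made data frame that mimics rows 1, 2 and 5 of the elo output rather than the full result.

```r
# Hand-made data frame with the same column layout as the elo output (illustrative)
toy_elo <- data.frame(
  rowname = c("1", "2", "5"),
  team.A  = c("Scotland", "France", "Wales"),
  team.B  = c("England", "Italy", "France"),
  elo.A   = c(1475, 1525, 1453.573),
  elo.B   = c(1525, 1475, 1546.427),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per match played by `team`, with its post-match rating
team_history <- function(elo_df, team) {
  home <- elo_df[elo_df$team.A == team, c("rowname", "team.A", "elo.A")]
  away <- elo_df[elo_df$team.B == team, c("rowname", "team.B", "elo.B")]
  names(home) <- c("rowname", "team", "elo")
  names(away) <- c("rowname", "team", "elo")
  out <- rbind(home, away)
  # Restore chronological order using the numeric match identifier
  out <- out[order(as.integer(out$rowname)), ]
  rownames(out) <- NULL
  out
}

france_toy <- team_history(toy_elo, "France")
print(france_toy)
```

Sorting by the match identifier keeps home and away matches interleaved in playing order, which is convenient when the rows are inspected directly rather than plotted.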
Adding the date to the new data set:
# Look up each match date through the shared row identifier
France_elo$Date <- RU_data$date[match(France_elo$rowname, RU_data$rowname)]
Plotting France’s rugby union Elo rating from 2002 to 2022:
ggplot(France_elo, aes(x=Date, y=elo))+
geom_line()+
theme_bw()+
ggtitle("French Elo rating score 2002-2022")
The Elo rating of the French national rugby union team was at its highest around 2007-2008 and at its lowest between 2015 and 2020. Since 2020 it has risen markedly, which suggests that the team’s performances have improved significantly over the last two years.
This page is simply an introduction to the Kaggle data
set and to computing Elo ratings in R. We invite you to look
more deeply into the data, as it contains much more interesting
information. Similarly, the elo package used here and the PlayerRatings package have many
more features to explore, so feel free to use them and develop your skills.
Our next page covers Elo rating prediction using the same data
set. We invite you to visit it if interested: https://rpubs.com/Patault_M/959669