To build the Elo rankings, I searched the web for historical Serie A results and got my hands on several data frames going back to the 1999/2000 season. I have already cleaned and tidied them a bit with SQL, so we’ll jump straight into R.

install.packages("tinytex")

Load tidyverse and get the data into R

library("tidyverse")
## -- Attaching packages ------------------------------------------------------------------------------------------------------------------------------------------------------------------ tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## -- Conflicts --------------------------------------------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
serie_a_2020 <- read.csv("serie_a_2019_2020.csv")
serie_a_2019 <- read.csv("serie_a_2018_2019.csv")
serie_a_2018 <- read.csv("serie_a_2017_2018.csv")
serie_a_2017 <- read.csv("serie_a_2016_2017.csv")
serie_a_2016 <- read.csv("serie_a_2015_2016.csv")
serie_a_2015 <- read.csv("serie_a_2014_2015.csv")
serie_a_2014 <- read.csv("serie_a_2013_2014.csv")
serie_a_2013 <- read.csv("serie_a_2012_2013.csv")
serie_a_2012 <- read.csv("serie_a_2011_2012.csv")
serie_a_2011 <- read.csv("serie_a_2010_2011.csv")
serie_a_2010 <- read.csv("serie_a_2009_2010.csv")
serie_a_2009 <- read.csv("serie_a_2008_2009.csv")
serie_a_2008 <- read.csv("serie_a_2007_2008.csv")
serie_a_2007 <- read.csv("serie_a_2006_2007.csv")
serie_a_2006 <- read.csv("serie_a_2005_2006.csv")
serie_a_2005 <- read.csv("serie_a_2004_2005.csv")
serie_a_2004 <- read.csv("serie_a_2003_2004.csv")
serie_a_2003 <- read.csv("serie_a_2002_2003.csv")
serie_a_2002 <- read.csv("serie_a_2001_2002.csv")
serie_a_2001 <- read.csv("serie_a_2000_2001.csv")
serie_a_2000 <- read.csv("serie_a_1999_2000.csv")

Now select the variables of interest from each data frame; we’ll need them later

serie_a_2000<- serie_a_2000 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2001<- serie_a_2001 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2002<- serie_a_2002 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2003<- serie_a_2003 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2004<- serie_a_2004 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2005<- serie_a_2005 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2006<- serie_a_2006 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2007<- serie_a_2007 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2008<- serie_a_2008 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2009<- serie_a_2009 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2010<- serie_a_2010 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2011<- serie_a_2011 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2012<- serie_a_2012 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2013<- serie_a_2013 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2014<- serie_a_2014 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2015<- serie_a_2015 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2016<- serie_a_2016 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2017<- serie_a_2017 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2018<- serie_a_2018 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2019<- serie_a_2019 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)
serie_a_2020<- serie_a_2020 %>%
  select(Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR, HTHG, HTAG, HTR)

Now we can merge the data frames into one (we can use rbind because they all have the same number of columns with the same names)

serie_a_total <- rbind(serie_a_2000, serie_a_2001, serie_a_2002,
                       serie_a_2003,serie_a_2004, serie_a_2005, serie_a_2006,
                       serie_a_2007, serie_a_2008, serie_a_2009, serie_a_2010,
                       serie_a_2011, serie_a_2012, serie_a_2013, serie_a_2014,
                       serie_a_2015, serie_a_2016, serie_a_2017, serie_a_2018,
                       serie_a_2019, serie_a_2020)
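The read, select and bind steps above repeat the same three operations 21 times. As a sketch of a more compact alternative (not the code used for the results in this post), the same table could be built by looping over the file names. The helper name load_season is hypothetical, and the two tiny CSV files written to a temporary folder only stand in for the real season files so that the example runs on its own.

```r
# Columns kept from every season file (same list as in the select calls above)
keep_cols <- c("Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG", "FTR",
               "HTHG", "HTAG", "HTR")

# Hypothetical helper: read one season file and keep the columns of interest
load_season <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)[, keep_cols]
}

# Stand-in data (made up): two one-match "seasons" with an extra column to drop
demo_dir <- tempfile("serie_a_demo")
dir.create(demo_dir)
demo_match <- data.frame(Date = "01/01/00", HomeTeam = "TeamA", AwayTeam = "TeamB",
                         FTHG = 2, FTAG = 1, FTR = "H",
                         HTHG = 1, HTAG = 0, HTR = "H", Extra = "x")
write.csv(demo_match, file.path(demo_dir, "serie_a_1999_2000.csv"), row.names = FALSE)
write.csv(demo_match, file.path(demo_dir, "serie_a_2000_2001.csv"), row.names = FALSE)

# File names follow the pattern used above: serie_a_<start>_<end>.csv
start_years <- 1999:2000   # would be 1999:2019 for the real data
files <- file.path(demo_dir,
                   sprintf("serie_a_%d_%d.csv", start_years, start_years + 1))

# Read every season and stack them in chronological order
serie_a_demo <- do.call(rbind, lapply(files, load_season))
print(dim(serie_a_demo))
```

Keeping the files in season order matters here, because the rating loop later processes the rows from top to bottom.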

Let’s rename some variables, as I still have trouble interpreting the abbreviations

serie_a_total <- serie_a_total %>%
  rename(FullTimeHomeTeamGoals= FTHG,
         FullTimeAwayTeamGoals= FTAG,
         FullTimeResult= FTR,
         HalfTimeHomeTeamGoals= HTHG,
         HalfTimeAwayTeamGoals= HTAG,
         HalfTimeResult= HTR)

Now, the first thing to do is create another data frame to store each team’s Elo rating, which we will update after every single match.

serie_a_teams <- data.frame(team = unique(c(serie_a_total$HomeTeam, serie_a_total$AwayTeam)))

Ok, ready to move on! Before we begin playing with Elo ratings, we need to assign an initial Elo value to all of the Serie A Teams we have. We can set this value to 1200.

serie_a_teams<- serie_a_teams %>%
   mutate(elo = 1200)

For each football game played, we’ll create a variable showing who won. We’ll set it to 1 if the home team won, 0 if the away team won, and 0.5 for a draw.

serie_a_total<- serie_a_total %>%
  mutate(GameResult = if_else(FullTimeHomeTeamGoals>FullTimeAwayTeamGoals,
  1,
  if_else(FullTimeHomeTeamGoals == FullTimeAwayTeamGoals, 0.5, 0)))

Now we load the most important package for our Elo rating system (run install.packages("elo") first if you don’t have it yet)

library(elo)
## Warning: package 'elo' was built under R version 4.0.5

Now the difficult part: writing our program. It won’t be too difficult, and thanks to Edouard Mathieu and the instructions of the elo CRAN package, everyone can follow along. We’ll loop over every single game, get the pre-match ratings, and update them according to our saved historical results.
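To make the update step concrete, here is a sketch of the standard Elo formulas in base R: the expected score of team A is 1 / (1 + 10^((elo_b - elo_a) / 400)), and each rating moves by k times the difference between the actual and the expected result. This should mirror what elo.calc does with its default settings, but the function name elo_update is made up for illustration and is not part of the elo package.

```r
# Hypothetical helper: one Elo update for a single match.
# result_a is 1 (A wins), 0.5 (draw) or 0 (B wins).
elo_update <- function(result_a, elo_a, elo_b, k = 32) {
  # Expected score of team A given the rating gap
  expected_a <- 1 / (1 + 10^((elo_b - elo_a) / 400))
  # Points won by A are lost by B, so the total stays constant
  change <- k * (result_a - expected_a)
  c(elo_a = elo_a + change, elo_b = elo_b - change)
}

# First match between two unrated teams, home win:
# expected score is 0.5, so 16 points change hands
first_match <- elo_update(result_a = 1, elo_a = 1200, elo_b = 1200)
print(first_match)   # elo_a = 1216, elo_b = 1184

# A draw between equally rated teams changes nothing
draw_match <- elo_update(result_a = 0.5, elo_a = 1200, elo_b = 1200)
```

With k = 32, the largest possible swing in a single match is 32 points, reached only when a heavy favourite loses.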

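for (i in seq_len(nrow(serie_a_total))) {
  # Current match (rows are processed from top to bottom, oldest season first)
  match <- serie_a_total[i, ]

  # Pre-match ratings of the home (A) and away (B) teams
  teamA_elo <- subset(serie_a_teams, team == match$HomeTeam)$elo
  teamB_elo <- subset(serie_a_teams, team == match$AwayTeam)$elo

  # Updated ratings; GameResult is 1, 0.5 or 0 from the home team's point of view
  new_elo <- elo.calc(wins.A = match$GameResult,
                      elo.A = teamA_elo,
                      elo.B = teamB_elo,
                      k = 32)
  teamA_new_elo <- new_elo[1, 1]
  teamB_new_elo <- new_elo[1, 2]

  # Write the new ratings back, leaving every other team untouched
  serie_a_teams <- serie_a_teams %>%
    mutate(elo = if_else(team == match$HomeTeam, teamA_new_elo,
                 if_else(team == match$AwayTeam, teamB_new_elo, elo)))
}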

Let’s wait for R to run the code and then check it out!

serie_a_teams %>%
arrange(-elo)
##          team      elo
## 1    Juventus 1551.086
## 2       Inter 1464.413
## 3        Roma 1444.377
## 4       Lazio 1436.622
## 5    Atalanta 1412.872
## 6      Napoli 1369.907
## 7       Milan 1327.346
## 8      Torino 1303.022
## 9    Cagliari 1288.759
## 10    Bologna 1274.334
## 11      Parma 1255.946
## 12   Sassuolo 1242.973
## 13    Udinese 1236.299
## 14      Carpi 1232.347
## 15  Sampdoria 1221.050
## 16     Empoli 1207.038
## 17    Perugia 1206.563
## 18 Fiorentina 1199.820
## 19    Crotone 1191.961
## 20    Catania 1189.770
## 21      Lecce 1185.787
## 22       Spal 1176.548
## 23    Reggina 1173.585
## 24     Novara 1170.369
## 25    Vicenza 1168.511
## 26     Verona 1163.226
## 27      Siena 1155.915
## 28      Genoa 1150.361
## 29   Piacenza 1144.016
## 30    Brescia 1143.050
## 31     Modena 1140.293
## 32       Como 1129.224
## 33    Palermo 1128.835
## 34     Ascoli 1125.048
## 35     Chievo 1115.004
## 36  Frosinone 1112.577
## 37  Benevento 1109.516
## 38       Bari 1087.791
## 39    Treviso 1081.600
## 40     Cesena 1075.150
## 41    Livorno 1057.786
## 42    Messina 1057.052
## 43     Ancona 1044.090
## 44    Pescara 1028.919
## 45    Venezia 1019.244

Now we can select only the teams playing in the current 2021/22 Serie A season

serie_a_teams_2021 <- serie_a_teams %>%
  filter(team %in% c("Roma", "Milan","Napoli", "Inter", "Udinese","Bologna",
                     "Lazio", "Fiorentina", "Sassuolo", "Atalanta", "Torino",
                     "Empoli", "Genoa", "Venezia", "Sampdoria", "Juventus",
                     "Cagliari", "Spezia","Verona", "Salernitana")) %>%

arrange(-elo)

print.data.frame(serie_a_teams_2021)
##          team      elo
## 1    Juventus 1551.086
## 2       Inter 1464.413
## 3        Roma 1444.377
## 4       Lazio 1436.622
## 5    Atalanta 1412.872
## 6      Napoli 1369.907
## 7       Milan 1327.346
## 8      Torino 1303.022
## 9    Cagliari 1288.759
## 10    Bologna 1274.334
## 11   Sassuolo 1242.973
## 12    Udinese 1236.299
## 13  Sampdoria 1221.050
## 14     Empoli 1207.038
## 15 Fiorentina 1199.820
## 16     Verona 1163.226
## 17      Genoa 1150.361
## 18    Venezia 1019.244
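The filter lists 20 teams but only 18 rows come back. A quick way to see which names found no match is setdiff. In this sketch the vector rated_teams is simply retyped from the output above rather than taken from serie_a_teams, so that the snippet runs on its own; the other rated teams are left out because they are not in the 2021/22 list anyway.

```r
# The 20 names passed to filter() above
teams_2021 <- c("Roma", "Milan", "Napoli", "Inter", "Udinese", "Bologna",
                "Lazio", "Fiorentina", "Sassuolo", "Atalanta", "Torino",
                "Empoli", "Genoa", "Venezia", "Sampdoria", "Juventus",
                "Cagliari", "Spezia", "Verona", "Salernitana")

# The 18 teams that came back with a rating (copied from the printed output)
rated_teams <- c("Juventus", "Inter", "Roma", "Lazio", "Atalanta", "Napoli",
                 "Milan", "Torino", "Cagliari", "Bologna", "Sassuolo",
                 "Udinese", "Sampdoria", "Empoli", "Fiorentina", "Verona",
                 "Genoa", "Venezia")

# Names requested but not present among the rated teams
missing_teams <- setdiff(teams_2021, rated_teams)
print(missing_teams)   # Spezia and Salernitana
```

Neither club appears in the 1999/2000 to 2019/20 results, so neither has a rating yet. One option (an assumption, not something done in this post) would be to give them the starting value of 1200.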

Roma is only in the third row… well, I guess I won’t get married that soon, will I?

Let’s calculate probabilities for an individual game, like the first one of the season: AS ROMA VS Fiorentina. We’ll use the elo.prob function

Roma <- subset(serie_a_teams_2021, team == "Roma")$elo

Fiorentina <- subset(serie_a_teams_2021, team == "Fiorentina")$elo

elo.prob(Roma, Fiorentina)
## [1] 0.8034168

About an 80% chance of winning…not bad
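As a cross-check, the value returned by elo.prob can be reproduced by hand from the two ratings printed above, using the standard Elo expected-score formula. Note that this number treats a draw as half a win, so it is an expected score rather than a pure win probability. The variable names below are only for illustration.

```r
# Ratings copied from the table above
roma_elo <- 1444.377
fiorentina_elo <- 1199.820

# Expected score of Roma: 1 / (1 + 10^((opponent - own) / 400))
expected_roma <- 1 / (1 + 10^((fiorentina_elo - roma_elo) / 400))
print(round(expected_roma, 3))   # 0.803
```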

(AS ROMA actually won that match 3-1, in a very balanced match with one red card for each team)

Latest update: since I decided to make this bet, AS ROMA has, in order: signed Jose Mourinho as our coach + spent nearly 50.0000 pounds for a single player (the most expensive deal in our history) + won the first four games….should I look for a suit already?