Load Packages

These are the packages we will use for this analysis. If you have never used them before, install them first with ‘install.packages’; otherwise, just load them with the ‘library’ function.

#If not already installed, remove the hashtags and install first
#install.packages("fitzRoy")
#install.packages("dplyr")
#install.packages("tidyverse")

#If already installed, skip the install step
library(fitzRoy)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.4     ✔ stringr   1.5.0
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
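
If you are not sure which of these packages are already installed, a quick check can tell you before you decide whether to uncomment the install lines. The sketch below uses base R's ‘requireNamespace’; the helper name ‘missing_packages’ is made up for this example and is not part of any package.

```r
# Hypothetical helper: return the names of packages that are not yet installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("fitzRoy", "dplyr", "tidyverse"))
to_install

# Install only what is missing (uncomment to run)
# if (length(to_install) > 0) install.packages(to_install)
```

An empty result means everything is already installed and you can go straight to the ‘library’ calls.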

Data Import

Let's import some data to use. We will source it from the ‘fitzRoy’ package, which gives access to many datasets for the AFL and AFLW competitions. The ‘fetch_results_afltables’ function returns the result of every game in the selected season. As well as match results, ‘fitzRoy’ can also provide player data, player statistics and ladder positions for different seasons.

afl2022 <- fitzRoy::fetch_results_afltables(season = 2022)

View dataset

Let's have a quick look at our dataset.

head(afl2022)
## # A tibble: 6 × 16
##    Game Date       Round Home.Team Home.Goals Home.Behinds Home.Points Away.Team
##   <dbl> <date>     <chr> <chr>          <int>        <int>       <int> <chr>    
## 1 15984 2022-03-16 R1    Melbourne         14           13          97 Footscray
## 2 15985 2022-03-17 R1    Carlton           14           17         101 Richmond 
## 3 15986 2022-03-18 R1    St Kilda          12           13          85 Collingw…
## 4 15987 2022-03-19 R1    Geelong           20           18         138 Essendon 
## 5 15988 2022-03-19 R1    GWS               13           14          92 Sydney   
## 6 15989 2022-03-19 R1    Brisbane…         11           14          80 Port Ade…
## # ℹ 8 more variables: Away.Goals <int>, Away.Behinds <int>, Away.Points <int>,
## #   Venue <chr>, Margin <int>, Season <dbl>, Round.Type <chr>,
## #   Round.Number <int>

As shown above, the dataset has 16 variables, covering the game ID, date, round, home and away teams, goals, behinds, total points, venue, margin, season and round type.

Filter the dataset

Many of these variables are not necessary for our analysis; we will pick out the ones we need when we calculate the ratings. For now, let's filter the dataset to keep only regular-season games and drop the finals series. Let's also create a new column, ‘Result’, which records the outcome of each game from the home team's point of view: a win = 1, a draw = 0.5 and a loss = 0. If the home team loses, the result is 0; the away team does not get a separate row for its result.

afl2022 <- afl2022 %>% 
  filter(Round.Type == "Regular") %>%            #"Regular" omits finals series
  mutate(Result = ifelse(Home.Points > Away.Points, 1,          #Home Wins (1)
                         ifelse(Home.Points < Away.Points, 0,   #Away wins (0)
                                0.5)))                          #Draw (0.5)
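
To check that the encoding behaves as expected, the same rule can be tried on a few made-up scores. This is only a sketch: the ‘toy_games’ data frame is invented for illustration, and ‘case_when’ from dplyr is used as a flatter alternative to the nested ‘ifelse’ calls.

```r
library(dplyr)

# Made-up scores for illustration only (not real match data)
toy_games <- data.frame(
  Home.Points = c(97, 80, 75),
  Away.Points = c(71, 80, 102)
)

toy_games <- toy_games %>%
  mutate(Result = case_when(
    Home.Points > Away.Points ~ 1,    # home win
    Home.Points < Away.Points ~ 0,    # away win
    TRUE                      ~ 0.5   # draw
  ))

toy_games$Result
```

The three rows cover a home win, a draw and an away win, so each branch of the rule is exercised once.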

Load Elo Package

An Elo rating measures the strength of a player or team. Each time two players or teams go head to head, both ratings are updated based on the result and on how strong the opponent was. For example, a strong team like Geelong would have a much higher Elo rating than a weaker team like Gold Coast. If Geelong won, their rating would rise only slightly, but if Gold Coast won, their rating would rise significantly because they beat a much stronger team. We will use the ‘elo’ function from the ‘PlayerRatings’ package to calculate the Elo rating of every team after each round of the season.
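
To make the update rule concrete, here is a minimal sketch of the standard Elo formula written by hand. It is not the package's own code. The starting rating of 2200 and the K-factor of 27 are assumptions, chosen because they reproduce the first-round ratings (2213.5 and 2186.5) in the table printed later in this analysis; the function names are made up for this example.

```r
# Expected score of team A against team B under the standard Elo formula
expected_score <- function(rating_a, rating_b) {
  1 / (1 + 10^((rating_b - rating_a) / 400))
}

# Update both ratings after one game.
# result: 1 = home win, 0.5 = draw, 0 = away win
# k = 27 is an assumed K-factor (see the note above)
elo_update <- function(home, away, result, k = 27) {
  change <- k * (result - expected_score(home, away))
  c(home = home + change, away = away - change)
}

# Two evenly rated teams: the winner gains 13.5 points, the loser drops 13.5
elo_update(2200, 2200, result = 1)

# An upset moves the ratings further than an expected win
elo_update(2300, 2100, result = 1)  # favourite wins at home: small change
elo_update(2300, 2100, result = 0)  # underdog wins away: large change
```

With these made-up ratings the favourite gains only about 6.5 points for winning, while an upset shifts both ratings by about 20.5 points.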

#Install the PlayerRatings package if never used before, just remove the hashtag
#install.packages("PlayerRatings")

#Skip install if previously used
library(PlayerRatings)

Create the elo ratings for each round

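Using the ‘elo’ function we can generate the Elo ratings of every team for all matches across the season. The function expects four columns in this order: the time period (here the round number), the two teams, and the result for the first-named team. Setting the ‘history’ argument to TRUE makes the function also return each team's rating after every round, rather than only the final ratings, which is what we need to see how ratings change over the season.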

elo_rating <- PlayerRatings::elo(afl2022[c("Round.Number", "Home.Team", "Away.Team",
                                           "Result")], history = TRUE)

Save the ratings into a dataframe

Now that the Elo ratings have been created, we need to save them in a dataframe. Converting the history produces more columns than we need, so we keep only the first 23, which hold the ratings after each of the 23 rounds (the other columns are not needed for this analysis). Now let's view what this dataframe looks like.

ratings <- elo_rating$history %>% as.data.frame()

ratings <- ratings[, 1:23]

head(ratings)
##                1.Rating 2.Rating 3.Rating 4.Rating 5.Rating 6.Rating 7.Rating
## Adelaide         2186.5 2174.047 2187.547 2173.079 2187.629 2201.199 2185.865
## Brisbane Lions   2213.5 2225.953 2238.446 2224.018 2236.431 2248.083 2261.592
## Carlton          2213.5 2225.953 2239.453 2223.871 2234.545 2221.090 2231.866
## Collingwood      2213.5 2225.953 2211.446 2195.981 2183.568 2196.362 2209.107
## Essendon         2186.5 2174.047 2162.589 2177.057 2165.371 2152.577 2139.980
## Footscray        2186.5 2174.047 2189.589 2175.970 2189.430 2175.860 2188.457
##                8.Rating 9.Rating 10.Rating 11.Rating 12.Rating 13.Rating
## Adelaide       2174.142 2164.292  2153.529  2142.740  2154.530  2154.530
## Brisbane Lions 2270.460 2280.310  2262.394  2272.010  2257.796  2271.188
## Carlton        2243.589 2253.801  2266.366  2250.197  2250.197  2258.929
## Collingwood    2194.768 2180.472  2196.778  2212.946  2224.666  2240.124
## Essendon       2155.418 2144.347  2133.527  2121.961  2121.961  2113.229
## Footscray      2174.279 2188.574  2202.414  2212.349  2199.729  2199.729
##                14.Rating 15.Rating 16.Rating 17.Rating 18.Rating 19.Rating
## Adelaide        2143.046  2154.202  2145.177  2130.761  2122.347  2113.729
## Brisbane Lions  2271.188  2257.251  2269.413  2250.911  2260.273  2271.148
## Carlton         2244.082  2258.861  2243.480  2251.247  2239.192  2248.637
## Collingwood     2240.124  2250.170  2261.887  2268.445  2276.859  2286.274
## Essendon        2131.503  2115.980  2134.240  2152.742  2168.350  2158.934
## Footscray       2211.740  2222.710  2210.548  2197.637  2211.768  2227.901
##                20.Rating 21.Rating 22.Rating 23.Rating
## Adelaide        2132.223  2142.564  2152.562  2140.221
## Brisbane Lions  2255.030  2267.565  2278.856  2265.276
## Carlton         2230.144  2217.609  2205.944  2195.802
## Collingwood     2296.183  2308.973  2294.233  2304.374
## Essendon        2168.921  2153.295  2140.428  2130.763
## Footscray       2217.547  2205.455  2216.046  2227.041
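
Selecting by position works here because the rating columns come first, but it relies on knowing there are exactly 23 rounds. A sketch of an alternative is to select by name, since the rating columns all end in “.Rating”. The data frame below is a made-up stand-in for the real one, and its “Games” columns are only an assumption about what the extra columns might look like.

```r
# Toy stand-in for the converted history: 2 teams, 3 rounds, made-up values.
# The "Games" columns imitate the extra, non-rating columns.
toy_history <- data.frame(
  `1.Rating` = c(2213.5, 2186.5),
  `2.Rating` = c(2226.0, 2174.0),
  `3.Rating` = c(2238.4, 2187.5),
  `1.Games`  = c(1, 1),
  `2.Games`  = c(2, 2),
  `3.Games`  = c(3, 3),
  row.names   = c("Team A", "Team B"),
  check.names = FALSE
)

# Keep every column whose name ends in ".Rating", however many rounds there are
rating_cols <- grep("\\.Rating$", names(toy_history), value = TRUE)
toy_ratings <- toy_history[, rating_cols]

names(toy_ratings)
```

The same ‘grep’ call would pick up the right columns for a season with a different number of rounds.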

Restructure the data

There are a few things we should change to get the dataset into a tidy structure. First, the teams are currently stored as the row names, but we need them in their own column. Next, we will pivot the data so it contains only 3 columns: Team, Round and Elo. Each team will then have one row per round, showing its Elo rating after that round.

ratings <- ratings %>% 
  mutate(Team = rownames(.)) %>% 
  select(Team, everything())

ratings_long <- ratings %>% 
  pivot_longer(cols = -Team, 
               names_to = "Round", 
               values_to = "Elo")
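
To see what the reshaping does, here is the same idea on a tiny made-up table with two teams and two rounds. This sketch uses ‘rownames_to_column’ from the tibble package as an alternative way to move the row names into a column; the ‘toy_’ objects and their values are invented for illustration.

```r
library(tibble)
library(tidyr)

# Made-up wide table: teams as row names, one column per round
toy_ratings <- data.frame(
  `1.Rating` = c(2213.5, 2186.5),
  `2.Rating` = c(2226.0, 2174.0),
  row.names   = c("Team A", "Team B"),
  check.names = FALSE
)

# Step 1: move the row names into a proper "Team" column
toy_wide <- rownames_to_column(toy_ratings, var = "Team")

# Step 2: one row per team per round
toy_long <- pivot_longer(toy_wide,
                         cols = -Team,
                         names_to = "Round",
                         values_to = "Elo")
toy_long
```

Two teams and two rounds give four rows, one for each team and round combination.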

Tidy the data

Let's get the data ready to plot neatly. At the moment the round labels are long (for example “1.Rating” for round 1), so they won't display well on the ggplot. They are also stored as text, so they sort alphabetically (1, 10, 11, ..., 2, 20, ...) rather than numerically. So let's shorten each label to just the round number, such as “1”, and set the order to run from round 1 to round 23.

ratings_long <- ratings_long %>% 
  mutate(Round = sub("\\.Rating$", "", Round)) %>%             #"1.Rating" becomes "1"
  mutate(Round = factor(Round, levels = as.character(1:23)))   #Order rounds 1 to 23

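The ordering problem is easy to reproduce on a handful of labels. The sketch below uses a shortened, made-up set of round labels and shows two ways around it: the factor approach used above, and converting the round to an integer instead.

```r
# Round labels as they come out of the pivot (a shortened, made-up set)
round_labels <- c("1.Rating", "2.Rating", "10.Rating", "11.Rating")

# Strip the suffix, leaving the round number as text
round_text <- sub("\\.Rating$", "", round_labels)

# Text sorts character by character, so "10" and "11" come before "2"
sort(round_text)

# Option 1: a factor with explicit levels keeps the intended order
round_factor <- factor(round_text, levels = as.character(1:23))
sort(round_factor)

# Option 2: convert to integers so the rounds sort as numbers
round_number <- as.integer(round_text)
sort(round_number)
```

Either option puts the rounds in order from 1 upwards; the factor version treats rounds as categories, while the integer version treats them as numbers.
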
Plot the data

Using ‘ggplot’ we can plot all of these Elo ratings on a graph. With the ‘geom_line’ function and ‘group = Team’, each team's ratings are connected into a line that shows how they change over the season.

ratings_long %>% 
  ggplot(aes(x = Round, y = Elo, group = Team)) +
  geom_line()

Make the graph look nice

Finally, let's make the graph look nicer by giving each team a different colour, adding titles (plot, x-axis and y-axis) and applying a cleaner look with ‘theme_minimal’.

ratings_long %>% 
  ggplot(aes(x = Round, y = Elo, group = Team, col = Team)) +
  geom_line(linewidth = 1) +          #'linewidth' replaces 'size' for lines in ggplot2 3.4.0+
  labs(title = "Elo Ratings 2022", 
       x = "Round", 
       y = "Elo Rating", 
       color = "Team") +
  theme_minimal()