Here are the packages that we will be using for this analysis. If you have never used one of these packages before, install it first with ‘install.packages’; otherwise, just use the ‘library’ function to load it.
#If not already installed, install first (remove the hashtags)
#install.packages("fitzRoy")
#install.packages("dplyr")
#install.packages("tidyverse")
#If already installed, skip the install step and just load the packages
library(fitzRoy)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 3.4.4 ✔ stringr 1.5.0
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
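If you are not sure which of these packages are already installed, a small helper can check for you. The sketch below is one possible approach and not part of the workflow above: ‘missing_packages’ is a made-up helper name, and it relies on the base R function ‘requireNamespace’ to test whether each package is available.

```r
# Hypothetical helper: returns the names of the packages that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("fitzRoy", "dplyr", "tidyverse")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install only what is missing
# if (length(to_install) > 0) install.packages(to_install)
```

The install line is left commented out so that nothing is downloaded unless you choose to run it.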
Let's import some data to use. We will be sourcing our data through the ‘fitzRoy’ package, which provides access to many datasets for the AFL and AFLW competitions. The ‘fetch_results_afltables’ function will give us the result of every game in the selected season. As well as match results, the ‘fitzRoy’ package can also provide player data, player statistics and ladder positions for different seasons.
afl2022 <- fitzRoy::fetch_results_afltables(season = 2022)
Let's have a quick look at our dataset.
head(afl2022)
## # A tibble: 6 × 16
##    Game Date       Round Home.Team Home.Goals Home.Behinds Home.Points Away.Team
##   <dbl> <date>     <chr> <chr>          <int>        <int>       <int> <chr>
## 1 15984 2022-03-16 R1    Melbourne         14           13          97 Footscray
## 2 15985 2022-03-17 R1    Carlton           14           17         101 Richmond
## 3 15986 2022-03-18 R1    St Kilda          12           13          85 Collingw…
## 4 15987 2022-03-19 R1    Geelong           20           18         138 Essendon
## 5 15988 2022-03-19 R1    GWS               13           14          92 Sydney
## 6 15989 2022-03-19 R1    Brisbane…         11           14          80 Port Ade…
## # ℹ 8 more variables: Away.Goals <int>, Away.Behinds <int>, Away.Points <int>,
## #   Venue <chr>, Margin <int>, Season <dbl>, Round.Type <chr>,
## #   Round.Number <int>
As seen above, the dataset has 16 variables, including the date, round, home team, away team, goals, behinds, total points, venue and margin.
Our analysis does not need most of these variables (we will pick out the columns we need when we calculate the ratings), and it does not need the finals games either, so let's filter the data down to the regular season. Let's also create a new column, ‘Result’, for the outcome of each game, where a win = 1, a draw = 0.5 and a loss = 0. This is from the home team's point of view, so if the home team loses the result is 0; the away team does not have a separate row for its result.
afl2022 <- afl2022 %>%
  filter(Round.Type == "Regular") %>%                  #"Regular" omits finals series
  mutate(Result = ifelse(Home.Points > Away.Points, 1, #Home wins (1)
                  ifelse(Home.Points < Away.Points, 0, #Away wins (0)
                         0.5)))                        #Draw (0.5)
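To check that the ‘Result’ coding behaves as described, here is a small self-contained sketch on three made-up games (the scores are invented for illustration and are not real results). It uses the same nested ‘ifelse’ logic in base R.

```r
# Three made-up games: a home win, an away win and a draw
toy <- data.frame(Home.Points = c(97, 80, 75),
                  Away.Points = c(71, 92, 75))

toy$Result <- ifelse(toy$Home.Points > toy$Away.Points, 1,  # Home wins (1)
              ifelse(toy$Home.Points < toy$Away.Points, 0,  # Away wins (0)
                     0.5))                                  # Draw (0.5)
print(toy)
```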
An Elo rating is used to measure the strength of a player or team. Each time two players or teams go head to head, their Elo ratings change based on the result, and the size of the change depends on how expected that result was. For example, a strong team like Geelong would have a much higher Elo rating than a weaker team like Gold Coast. If Geelong were to win, their Elo rating would not rise by much, but if Gold Coast were to win, their Elo rating would rise significantly, since Geelong are a strong team. We will be using the ‘elo’ function from the PlayerRatings package to give the Elo rating of each team after each round of the season.
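To make the idea concrete, here is a minimal sketch of the standard Elo update written in base R. The function names ‘elo_expected’ and ‘elo_update’ are made up for this illustration, and the K factor of 27 and starting rating of 2200 are assumptions: they are consistent with the round 1 ratings shown further down, but check the documentation of whichever package you use.

```r
# Expected score for team A against team B (a value between 0 and 1)
elo_expected <- function(rating_a, rating_b) {
  1 / (1 + 10^((rating_b - rating_a) / 400))
}

# New rating for team A; result is 1 for a win, 0.5 for a draw, 0 for a loss
elo_update <- function(rating_a, rating_b, result, k = 27) {
  rating_a + k * (result - elo_expected(rating_a, rating_b))
}

# Two teams start level at an assumed 2200, so the winner gains k / 2
elo_update(2200, 2200, result = 1)   # 2213.5

# An upset is worth more than an expected win (made-up ratings)
underdog_gain  <- elo_update(2100, 2300, result = 1) - 2100
favourite_gain <- elo_update(2300, 2100, result = 1) - 2300
underdog_gain > favourite_gain       # TRUE
```

The winner's gain is always matched by the loser's drop, so the total rating across the two teams stays the same after each game.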
#Install the PlayerRatings package if never used before, just remove the hashtag
#install.packages("PlayerRatings")
#Skip install if previously used
library(PlayerRatings)
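Using the ‘elo’ function, we need to generate the Elo ratings of every team for all matches across the season. The function expects four columns: the time period (here the round number), the two teams, and the result from the first team's point of view. The history argument controls whether the rating of each team after every round is stored, rather than only the final ratings, and since we want to follow the ratings through the season we set it to TRUE.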
elo_rating <- PlayerRatings::elo(afl2022[c("Round.Number", "Home.Team", "Away.Team",
                                           "Result")], history = TRUE)
Now that the Elo ratings have been created, we need to save them in a dataframe. Before we look at this data, we keep only the first 23 columns, which hold the Elo ratings after each of the 23 rounds (the other columns are not needed for this analysis). Now let's view what this dataframe looks like.
ratings <- elo_rating$history %>% as.data.frame()
ratings <- ratings[, 1:23]
head(ratings)
##                1.Rating 2.Rating 3.Rating 4.Rating 5.Rating 6.Rating 7.Rating
## Adelaide         2186.5 2174.047 2187.547 2173.079 2187.629 2201.199 2185.865
## Brisbane Lions   2213.5 2225.953 2238.446 2224.018 2236.431 2248.083 2261.592
## Carlton          2213.5 2225.953 2239.453 2223.871 2234.545 2221.090 2231.866
## Collingwood      2213.5 2225.953 2211.446 2195.981 2183.568 2196.362 2209.107
## Essendon         2186.5 2174.047 2162.589 2177.057 2165.371 2152.577 2139.980
## Footscray        2186.5 2174.047 2189.589 2175.970 2189.430 2175.860 2188.457
##                8.Rating 9.Rating 10.Rating 11.Rating 12.Rating 13.Rating
## Adelaide       2174.142 2164.292  2153.529  2142.740  2154.530  2154.530
## Brisbane Lions 2270.460 2280.310  2262.394  2272.010  2257.796  2271.188
## Carlton        2243.589 2253.801  2266.366  2250.197  2250.197  2258.929
## Collingwood    2194.768 2180.472  2196.778  2212.946  2224.666  2240.124
## Essendon       2155.418 2144.347  2133.527  2121.961  2121.961  2113.229
## Footscray      2174.279 2188.574  2202.414  2212.349  2199.729  2199.729
##                14.Rating 15.Rating 16.Rating 17.Rating 18.Rating 19.Rating
## Adelaide        2143.046  2154.202  2145.177  2130.761  2122.347  2113.729
## Brisbane Lions  2271.188  2257.251  2269.413  2250.911  2260.273  2271.148
## Carlton         2244.082  2258.861  2243.480  2251.247  2239.192  2248.637
## Collingwood     2240.124  2250.170  2261.887  2268.445  2276.859  2286.274
## Essendon        2131.503  2115.980  2134.240  2152.742  2168.350  2158.934
## Footscray       2211.740  2222.710  2210.548  2197.637  2211.768  2227.901
##                20.Rating 21.Rating 22.Rating 23.Rating
## Adelaide        2132.223  2142.564  2152.562  2140.221
## Brisbane Lions  2255.030  2267.565  2278.856  2265.276
## Carlton         2230.144  2217.609  2205.944  2195.802
## Collingwood     2296.183  2308.973  2294.233  2304.374
## Essendon        2168.921  2153.295  2140.428  2130.763
## Footscray       2217.547  2205.455  2216.046  2227.041
There are a few things that we should change to get the dataset into a nice structure. First, the teams are currently stored as the row names, but we need them in their own column. Next, we will pivot the data so it contains only three columns: Team, Round and Elo. Each team will then have one row per round, showing its Elo rating after that particular round.
ratings <- ratings %>%
  mutate(Team = rownames(.)) %>%
  select(Team, everything())
ratings_long <- ratings %>%
  pivot_longer(cols = -Team,
               names_to = "Round",
               values_to = "Elo")
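The reshaping step is easier to follow on a small example. The sketch below builds a toy data frame from two teams and two rounds of the ratings shown earlier, then applies the same two steps in one pipeline. It assumes the dplyr and tidyr packages are installed (both are attached by the tidyverse), and the names ‘toy_wide’ and ‘toy_long’ are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Toy wide data: teams as row names, one column per round
toy_wide <- data.frame(`1.Rating` = c(2186.5, 2213.5),
                       `2.Rating` = c(2174.047, 2225.953),
                       row.names = c("Adelaide", "Carlton"),
                       check.names = FALSE)

toy_long <- toy_wide %>%
  mutate(Team = rownames(.)) %>%   # Move the row names into a column
  select(Team, everything()) %>%   # Put Team first
  pivot_longer(cols = -Team,       # Stack every column except Team
               names_to = "Round",
               values_to = "Elo")

print(toy_long)   # One row per team per round: 4 rows and 3 columns
```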
Let's get the data ready to plot neatly. At the moment the round names are long (for example “1.Rating” for round 1), so they won't show correctly on the ggplot; let's shorten them to just the round number, such as “1”. The rounds are also ordered as text, going through all the rounds starting with 1, then those starting with 2, and so on, so let's change Round to a factor whose levels run from round 1 to round 23.
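The ordering problem comes from the round labels being text. A small base R sketch on a few made-up labels shows the effect and the fix; here ‘as.character(1:23)’ is simply shorthand for writing out "1" to "23".

```r
rounds <- c("1", "2", "10", "23")

# Sorted as text, "10" comes before "2"
sort(rounds)

# As a factor with explicit levels, the rounds keep their numeric order
rounds_factor <- factor(rounds, levels = as.character(1:23))
sort(rounds_factor)

# Position of each round within the levels
as.integer(rounds_factor)
```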
ratings_long <- ratings_long %>%
  mutate(Round = sub("\\.Rating", "", Round)) %>%            #"1.Rating" becomes "1"
  mutate(Round = factor(Round, levels = as.character(1:23))) #Order rounds 1 to 23
Using ‘ggplot’ we can plot all of these Elo ratings on a graph. If we use the ‘geom_line’ function, each team's Elo ratings will be connected into a line, showing how its rating changed over the season.
ratings_long %>%
  ggplot(aes(x = Round, y = Elo, group = Team)) +
  geom_line()
Finally, let's make the graph look nicer by giving each team its own colour, adding titles (plot, x-axis and y-axis) and giving it a sharper look with ‘theme_minimal’.
ratings_long %>%
  ggplot(aes(x = Round, y = Elo, group = Team, col = Team)) +
  geom_line(linewidth = 1) + #linewidth replaces size for lines since ggplot2 3.4.0
  labs(title = "Elo Ratings 2022",
       x = "Round",
       y = "Elo Rating",
       color = "Team") +
  theme_minimal()