AFLTables.com provides historical data on every VFL/AFL game ever played, all the way back to the league's inception in 1897.
The data file can be found here: https://afltables.com/afl/stats/biglists/bg3.txt.
In this rpub, I will visualise the change in average match-day scores from 1897 to the present. The approach herein will hopefully encourage further analyses.
First, let's load our dependencies:
library(readr)
library(tidyr)
library(dplyr)
library(anytime)
library(ggplot2)
library(plotly)
library(scales)
library(reshape)
library(data.table)
library(knitr)
library(kableExtra)
Read URL to R, specifying columns and delimiters
#read in URL
url <- 'https://afltables.com/afl/stats/biglists/bg3.txt'
#specify columns
AllAFLData <- read_table(url, col_names = c("ID", "Date", "Round", "HomeTeam",
"HomeScore", "AwayTeam", "AwayScore",
"Venue"),
col_types = NULL, skip = 2)
#Convert to dataframe
AllAFLData <- data.frame(as.list(AllAFLData))
AllAFLData <- data.frame(AllAFLData, stringsAsFactors = FALSE)
We should now have a dataframe with 8 variables and 15,407 rows
| ID | Date | Round | HomeTeam | HomeScore | AwayTeam | AwayScore | Venue |
|---|---|---|---|---|---|---|---|
| 1 | 8-May-1897 | R1 | Fitzroy | 6.13.49 | Carlton | 2.4.16 | Brunswick St |
| 2 | 8-May-1897 | R1 | Collingwood | 5.11.41 | St Kilda | 2.4.16 | Victoria Park |
| 3 | 8-May-1897 | R1 | Geelong | 3.6.24 | Essendon | 7.5.47 | Corio Oval |
| 4 | 8-May-1897 | R1 | South Melbourne | 3.9.27 | Melbourne | 6.8.44 | Lake Oval |
| 5 | 15-May-1897 | R2 | South Melbourne | 6.4.40 | Carlton | 5.6.36 | Lake Oval |
| 6 | 15-May-1897 | R2 | Essendon | 4.6.30 | Collingwood | 8.2.50 | East Melbourne |
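Each score string in the table packs three numbers in the form goals.behinds.total, where a goal is worth 6 points and a behind 1 point. As a minimal base-R sketch of how one such string decodes (the helper name `parse_score` is hypothetical and is not part of the pipeline in this post):

```r
# Sketch: decode a "goals.behinds.total" score string.
# parse_score is an illustrative helper, not a function from any package.
parse_score <- function(score) {
  parts <- as.integer(strsplit(score, ".", fixed = TRUE)[[1]])
  list(goals = parts[1], behinds = parts[2], total = parts[3])
}

# Fitzroy's score from game 1 in the table
fitzroy <- parse_score("6.13.49")

# A goal is worth 6 points and a behind 1 point, so this should be TRUE
fitzroy$goals * 6 + fitzroy$behinds == fitzroy$total
```

The same check could be run over a whole column to spot malformed rows before any aggregation.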
Change variable types
#Change variable types
AllAFLData$HomeScore <- as.character(AllAFLData$HomeScore)
AllAFLData$HomeTeam <- as.character(AllAFLData$HomeTeam)
AllAFLData$AwayTeam <- as.character(AllAFLData$AwayTeam)
AllAFLData$Date <- as.character(AllAFLData$Date)
Extract out ‘Year’ from our date column
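The idea is easiest to see on a couple of date strings first. This is a minimal base-R sketch using a regular expression, not the separate() approach used in this post; the vector name `dates` is illustrative only:

```r
# Two date strings in the same day-month-year format as the source file
dates <- c("8-May-1897", "15-May-1897")

# Drop everything up to and including the last hyphen, leaving only the year
years <- as.integer(sub(".*-", "", dates))
years
```

Splitting into separate Day, Month and Year columns, as done here, keeps the other date parts available for later analyses.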
#Duplicate date column, and separate date into day, month and year columns
AllAFLData$Date2 = AllAFLData$Date
AllAFLData <- separate(AllAFLData, Date2,
into = c("Day", "Month", "Year"),
sep = "[-]")
AllAFLData$Day <- as.numeric(as.character(AllAFLData$Day))
AllAFLData$Year <- as.numeric(as.character(AllAFLData$Year))
Separate ‘score’ column(s) into goals, behinds and total
#Separate score column into goals, behinds, and total
AllAFLData <- separate(AllAFLData, HomeScore,
into = c("HomeGoals", "HomeBehinds",
"HomeTotal"),
sep = "[.]")
AllAFLData <- separate(AllAFLData, AwayScore,
into = c("AwayGoals", "AwayBehinds",
"AwayTotal"),
sep = "[.]")
Change score variables from character to numeric
## ID Date Round HomeTeam HomeGoals HomeBehinds
## "numeric" "character" "factor" "character" "character" "character"
## HomeTotal AwayTeam AwayGoals AwayBehinds AwayTotal Venue
## "character" "character" "character" "character" "character" "factor"
## Day Month Year
## "numeric" "character" "numeric"
#change newly created character columns to numeric
cols.num <- c("HomeGoals","HomeBehinds", "HomeTotal",
"AwayGoals", "AwayBehinds", "AwayTotal")
AllAFLData[cols.num] <- sapply(AllAFLData[cols.num],as.numeric)
#confirm change
sapply(AllAFLData, class)
## ID Date Round HomeTeam HomeGoals HomeBehinds
## "numeric" "character" "factor" "character" "numeric" "numeric"
## HomeTotal AwayTeam AwayGoals AwayBehinds AwayTotal Venue
## "numeric" "character" "numeric" "numeric" "numeric" "factor"
## Day Month Year
## "numeric" "character" "numeric"
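The class check confirms the column types, but as.numeric() turns any value it cannot parse into NA, so it can be worth counting missing values after a conversion like this. A minimal sketch on made-up values (the vector `raw_scores` and its "n/a" entry are assumptions for illustration, not values from the AFL data):

```r
# "n/a" is a made-up bad value used only to show the failure mode
raw_scores <- c("49", "16", "n/a")

# as.numeric() warns and returns NA for anything it cannot parse
converted <- suppressWarnings(as.numeric(raw_scores))

# Number of values that failed to convert
sum(is.na(converted))
```

Applied to the real table, something like `colSums(is.na(AllAFLData[cols.num]))` would give the same count per score column.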
Aggregate teams who have merged
#change names and merge teams
AllAFLData[AllAFLData=="South Melbourne"] <- "South Melb/Syd"
AllAFLData[AllAFLData=="Sydney"] <- "South Melb/Syd"
AllAFLData[AllAFLData=="Fitzroy"] <- "Fitzroy/Brisbane"
AllAFLData[AllAFLData=="Brisbane Bears"] <- "Fitzroy/Brisbane"
AllAFLData[AllAFLData=="Brisbane Lions"] <- "Fitzroy/Brisbane"
AllAFLData[AllAFLData=="Kangaroos"] <- "North Melbourne"
AllAFLData[AllAFLData=="Footscray"] <- "Western Bulldogs"
AllAFLData[AllAFLData=="Western Bulldog"] <- "Western Bulldogs"
AllAFLData[AllAFLData=="GW Sydney"] <- "GWS"
Finally, create an ‘era’ variable which breaks down games by the decade they occurred in. We will use this variable as a slicer for the visualisation.
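The decade label can also be derived arithmetically rather than by enumerating each range. The following is a sketch of that alternative, not the code used in this post; `decade_label` is a hypothetical helper name:

```r
# Sketch: integer-divide the year by 10 to drop the final digit,
# multiply back up to get the decade start, then append "s".
decade_label <- function(year) paste0((year %/% 10) * 10, "s")

decade_label(c(1897, 1905, 2019))
```

One difference to keep in mind: this version labels any year, whereas the nested ifelse() approach returns 0 for years outside 1897 to 2019.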
AllAFLData$Era <- ifelse(AllAFLData$Year>=1897 & AllAFLData$Year<=1899,"1890s",
ifelse(AllAFLData$Year>=1900 & AllAFLData$Year<=1909,"1900s",
ifelse(AllAFLData$Year>=1910 & AllAFLData$Year<=1919,"1910s",
ifelse(AllAFLData$Year>=1920 & AllAFLData$Year<=1929,"1920s",
ifelse(AllAFLData$Year>=1930 & AllAFLData$Year<=1939,"1930s",
ifelse(AllAFLData$Year>=1940 & AllAFLData$Year<=1949,"1940s",
ifelse(AllAFLData$Year>=1950 & AllAFLData$Year<=1959,"1950s",
ifelse(AllAFLData$Year>=1960 & AllAFLData$Year<=1969,"1960s",
ifelse(AllAFLData$Year>=1970 & AllAFLData$Year<=1979,"1970s",
ifelse(AllAFLData$Year>=1980 & AllAFLData$Year<=1989,"1980s",
ifelse(AllAFLData$Year>=1990 & AllAFLData$Year<=1999,"1990s",
ifelse(AllAFLData$Year>=2000 & AllAFLData$Year<=2009,"2000s",
ifelse(AllAFLData$Year>=2010 & AllAFLData$Year<=2019,"2010s", 0)))))))))))))
Let’s begin building a table summarising the total number of games and total score of each team, per decade
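The underlying idea is to stack home and away appearances so that each team has one row per game played, then aggregate by era and team. A minimal base-R sketch on toy data follows; the scores are made up for illustration, and the names `games`, `long` and `summary_tbl` are assumptions rather than objects from this post:

```r
# Toy fixture list; scores are invented for illustration only
games <- data.frame(
  Era       = c("1890s", "1890s", "1890s"),
  HomeTeam  = c("Fitzroy", "Collingwood", "Carlton"),
  HomeTotal = c(49, 41, 30),
  AwayTeam  = c("Carlton", "Fitzroy", "Collingwood"),
  AwayTotal = c(16, 20, 25),
  stringsAsFactors = FALSE
)

# Stack home and away appearances: one row per team per game
long <- rbind(
  data.frame(Era = games$Era, Team = games$HomeTeam, Score = games$HomeTotal),
  data.frame(Era = games$Era, Team = games$AwayTeam, Score = games$AwayTotal)
)

# Total score and number of games per era and team
totals <- aggregate(Score ~ Era + Team, data = long, FUN = sum)
games_played <- aggregate(Score ~ Era + Team, data = long, FUN = length)
names(totals)[3] <- "TotalScore"
names(games_played)[3] <- "TotalGames"

# Join the two summaries and compute the average score per game
summary_tbl <- merge(totals, games_played, by = c("Era", "Team"))
summary_tbl$AverageScore <- round(summary_tbl$TotalScore / summary_tbl$TotalGames, 2)
summary_tbl
```

The dplyr pipeline in this post follows the same logic but keeps home and away totals as separate columns before combining them.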
#Total score per team, per decade for home games
EraScore <- AllAFLData %>%
gather(HomeAway, Team, HomeTeam) %>%
group_by(Era, Team) %>%
summarise(HomeTotal = sum(HomeTotal))
#Total score per team, per decade for away games
EraScore <- AllAFLData %>%
gather(HomeAway, Team, AwayTeam) %>%
group_by(Era, Team) %>%
summarise(AwayTotal = sum(AwayTotal)) %>%
left_join(EraScore,
by = c("Team" = "Team",
"Era" = "Era"))
#Total number of home & away games per team
EraScore <- AllAFLData %>%
gather(HomeAway, Team, HomeTeam, AwayTeam) %>%
group_by(Era, Team) %>%
summarise(GamesHome = sum(HomeAway == "HomeTeam"),
GamesAway = sum(HomeAway == "AwayTeam")) %>%
left_join(EraScore,
by = c("Team" = "Team",
"Era" = "Era"))
Combine home and away scores and games to get absolute totals. Determine the average score per team, per game, per decade
#Total games per team, per decade
EraScore$TotalGames <- (EraScore$GamesHome +
EraScore$GamesAway)
#Total score per team, per decade
EraScore$TotalScore <- (EraScore$HomeTotal +
EraScore$AwayTotal)
#Average score per team, per decade
EraScore$AverageScore <- (EraScore$TotalScore /
EraScore$TotalGames)
#round to 2 decimal places
EraScore[,'AverageScore']=round(EraScore[,'AverageScore'],2)
#Re-order variables
EraScore <- EraScore[,c(1,2,3,4,7,5,6,8,9)]
View our newly developed table
| Era | Team | GamesHome | GamesAway | TotalGames | AwayTotal | HomeTotal | TotalScore | AverageScore |
|---|---|---|---|---|---|---|---|---|
| 1890s | Carlton | 22 | 26 | 48 | 632 | 578 | 1210 | 25.21 |
| 1890s | Collingwood | 26 | 26 | 52 | 1087 | 1126 | 2213 | 42.56 |
| 1890s | Essendon | 27 | 25 | 52 | 979 | 1529 | 2508 | 48.23 |
| 1890s | Fitzroy/Brisbane | 28 | 23 | 51 | 903 | 1187 | 2090 | 40.98 |
| 1890s | Geelong | 27 | 24 | 51 | 1083 | 1534 | 2617 | 51.31 |
| 1890s | Melbourne | 22 | 29 | 51 | 960 | 1058 | 2018 | 39.57 |
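As a worked check of the arithmetic, the Carlton row above can be reproduced by hand. The sketch below only re-uses the figures shown in the table; the variable names are illustrative:

```r
# Figures taken from the 1890s Carlton row of the table above
carlton <- list(GamesHome = 22, GamesAway = 26, AwayTotal = 632, HomeTotal = 578)

total_games <- carlton$GamesHome + carlton$GamesAway   # 48
total_score <- carlton$HomeTotal + carlton$AwayTotal   # 1210

# Average score per game, rounded to 2 decimal places
average_score <- round(total_score / total_games, 2)
average_score
```

This matches the AverageScore of 25.21 reported for Carlton in the 1890s.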
Create a minimalistic theme for our animated plot
#minimalistic theme for visualisation
#the theme and the y-axis scale are bundled in a list, because a scale
#cannot be added to a theme object or passed as a theme() argument
PlotlyTheme <- list(
  theme(
    panel.background = element_blank(),
    panel.grid.major = element_line(color='light grey'),
    plot.title = element_text(size=8, hjust=0.5),
    axis.title.x = element_blank(),
    axis.text = element_text(size=6),
    axis.text.x = element_text(angle = 45, hjust = 1, colour="#606060"),
    axis.text.y = element_text(hjust = 1, colour="#606060"),
    axis.title = element_text(size=8),
    axis.line.x = element_line(color="black", size = 0.5),
    axis.line.y = element_line(color="black", size = 0.5),
    legend.key = element_rect(fill = "white")
  ),
  scale_y_continuous(labels = comma)
)
Visualise the average score per match, per team, across each decade
#animated plot, summarising every team's average score per decade
animatedscore <- EraScore %>%
ggplot(aes(x = Team, y = AverageScore)) +
geom_point(aes(frame = Era), colour="#3599B8") +
stat_summary(aes(y = AverageScore, group = 1, frame = Era),
fun=mean, colour="#3599B8", geom = "line", group = 1) +
labs(title = "Average score per game, per decade for all AFL teams",
y = "Average score") +
PlotlyTheme
#convert to an interactive plotly chart; the frame aesthetic drives the animation
ggplotly(animatedscore)
This interactive visualisation shows that the average score per team, per game rose incrementally with each passing decade up until the 1980s. Scoring has increased nearly three-fold since the 1890s. From 1990 to the present, there has been a slight downward trend in overall scoring. This is probably attributable to more sophisticated defensive tactics as the game has matured into a professional competition.