AFLTables.com provides historic data on every VFL/AFL game ever played, all the way back to the leagues inception in 1897.
The web-link can be found here: https://afltables.com/afl/stats/biglists/bg3.txt.
In this rpub, I will clean, extract features, aggregate, and visualise some of this data. The approach herein will hopefuly encourage further analyses
First, lets load-in our dependencies :
library(readr)
library(tidyr)
library(dplyr)
library(anytime)
library(ggplot2)
library(scales)
library(reshape)
library(data.table)
library(knitr)
library(kableExtra)Read URL to R, specifying columns and delimiters
#read in URL
url <- 'https://afltables.com/afl/stats/biglists/bg3.txt'
#specify columns
AllAFLData <- read_table(url, col_names = c("ID", "Date", "Round", "HomeTeam",
"HomeScore", "AwayTeam", "AwayScore",
"Venue"),
col_types = NULL, skip = 2)
#Convert to dataframe
AllAFLData <- data.frame(as.list(AllAFLData))
AllAFLData <- data.frame(AllAFLData, stringsAsFactors = FALSE) We should now have a dataframe with 8 variables, and 15,407 rows
| ID | Date | Round | HomeTeam | HomeScore | AwayTeam | AwayScore | Venue |
|---|---|---|---|---|---|---|---|
| 1 | 8-May-1897 | R1 | Fitzroy | 6.13.49 | Carlton | 2.4.16 | Brunswick St |
| 2 | 8-May-1897 | R1 | Collingwood | 5.11.41 | St Kilda | 2.4.16 | Victoria Park |
| 3 | 8-May-1897 | R1 | Geelong | 3.6.24 | Essendon | 7.5.47 | Corio Oval |
| 4 | 8-May-1897 | R1 | South Melbourne | 3.9.27 | Melbourne | 6.8.44 | Lake Oval |
| 5 | 15-May-1897 | R2 | South Melbourne | 6.4.40 | Carlton | 5.6.36 | Lake Oval |
| 6 | 15-May-1897 | R2 | Essendon | 4.6.30 | Collingwood | 8.2.50 | East Melbourne |
Change variable types
#Change variable types
AllAFLData$HomeScore <- as.character(AllAFLData$HomeScore)
AllAFLData$HomeTeam <- as.character(AllAFLData$HomeTeam)
AllAFLData$AwayTeam <- as.character(AllAFLData$AwayTeam)
AllAFLData$Date <- as.character(AllAFLData$Date)Extract out ‘Year’ from our date column
#Duplicate date column, and separate date into day, month and year columns
AllAFLData$Date2 = AllAFLData$Date
AllAFLData <- separate(AllAFLData, Date2,
into = c("Day", "Month", "Year"),
sep = "[-]")
AllAFLData$Day <- as.numeric(as.character(AllAFLData$Day))
AllAFLData$Year <- as.numeric(as.character(AllAFLData$Year))Separate ‘score’ column(s) into goals, behinds and total
#Separate score column into goals, behinds, and total
AllAFLData <- separate(AllAFLData, HomeScore,
into = c("HomeGoals", "HomeBehinds",
"HomeTotal"),
sep = "[.]")
AllAFLData <- separate(AllAFLData, AwayScore,
into = c("AwayGoals", "AwayBehinds",
"AwayTotal"),
sep = "[.]")Change score variables from character to numeric
## ID Date Round HomeTeam HomeGoals HomeBehinds
## "numeric" "character" "factor" "character" "character" "character"
## HomeTotal AwayTeam AwayGoals AwayBehinds AwayTotal Venue
## "character" "character" "character" "character" "character" "factor"
## Day Month Year
## "numeric" "character" "numeric"
#change newly created numeric columns to numeric
cols.num <- c("HomeGoals","HomeBehinds", "HomeTotal",
"AwayGoals", "AwayBehinds", "AwayTotal")
AllAFLData[cols.num] <- sapply(AllAFLData[cols.num],as.numeric)
#confirm change
sapply(AllAFLData, class)## ID Date Round HomeTeam HomeGoals HomeBehinds
## "numeric" "character" "factor" "character" "numeric" "numeric"
## HomeTotal AwayTeam AwayGoals AwayBehinds AwayTotal Venue
## "numeric" "character" "numeric" "numeric" "numeric" "factor"
## Day Month Year
## "numeric" "character" "numeric"
Check the number of unique elements for factors we may wish to concatenate
#print unique elements in venue, team and round
Venue <- unique(AllAFLData$Venue)
HomeTeam <- unique(AllAFLData$HomeTeam)
Round <- unique(AllAFLData$Round)## [1] Brunswick St Victoria Park Corio Oval
## [4] Lake Oval East Melbourne Junction Oval
## [7] M.C.G. Princes Park S.C.G.
## [10] Punt Rd Windy Hill Glenferrie Oval
## [13] Arden St Western Oval Olympic Park
## [16] Kardinia Park Yarraville Oval Toorak Park
## [19] Euroa North Hobart Yallourn
## [22] Albury Brisbane Exhibition Moorabbin Oval
## [25] Coburg Oval Waverley Park Gabba
## [28] Subiaco Carrara W.A.C.A.
## [31] Football Park Bruce Stadium Manuka Oval
## [34] Docklands York Park Stadium Australia
## [37] Marrara Oval Cazaly's Stadium Adelaide Oval
## [40] Bellerive Oval Blacktown Sydney Showground
## [43] Wellington Traeger Park Jiangwan Stadium
## [46] Eureka Stadium Perth Stadium
## 47 Levels: Adelaide Oval Albury Arden St Bellerive Oval ... York Park
## [1] "Fitzroy" "Collingwood" "Geelong"
## [4] "South Melbourne" "Essendon" "St Kilda"
## [7] "Melbourne" "Carlton" "Richmond"
## [10] "University" "Hawthorn" "North Melbourne"
## [13] "Footscray" "Sydney" "West Coast"
## [16] "Brisbane Bears" "Adelaide" "Fremantle"
## [19] "Western Bulldog" "Brisbane Lions" "Port Adelaide"
## [22] "Kangaroos" "Gold Coast" "GW Sydney"
## [1] R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14 SF R15 R16
## [18] R17 GF PF R18 R19 R20 R21 R22 QF EF R23 R24
## 29 Levels: EF GF PF QF R1 R10 R11 R12 R13 R14 R15 R16 R17 R18 R19 ... SF
Determine which state in Australia games have been played, based upon venue
#aggregate stadiums into 'state' variable
AllAFLData$VenueState[
AllAFLData$Venue == "Brunswick St" |
AllAFLData$Venue == "Victoria Park" |
AllAFLData$Venue == "Corio Oval" |
AllAFLData$Venue == "Lake Oval" |
AllAFLData$Venue == "East Melbourne" |
AllAFLData$Venue == "Junction Oval" |
AllAFLData$Venue == "M.C.G." |
AllAFLData$Venue == "Princes Park" |
AllAFLData$Venue == "Punt Rd" |
AllAFLData$Venue == "Windy Hill" |
AllAFLData$Venue == "Glenferrie Oval" |
AllAFLData$Venue == "Arden St" |
AllAFLData$Venue == "Western Oval" |
AllAFLData$Venue == "Olympic Park" |
AllAFLData$Venue == "Kardinia Park" |
AllAFLData$Venue == "Yarraville Oval" |
AllAFLData$Venue == "Toorak Park" |
AllAFLData$Venue == "Euroa" |
AllAFLData$Venue == "Yallourn" |
AllAFLData$Venue == "Albury" |
AllAFLData$Venue == "Moorabbin Oval" |
AllAFLData$Venue == "Coburg Oval" |
AllAFLData$Venue == "Waverley Park" |
AllAFLData$Venue == "Docklands" |
AllAFLData$Venue == "Eureka Stadium"] <-
"VIC"
AllAFLData$VenueState[
AllAFLData$Venue == "North Hobart" |
AllAFLData$Venue == "York Park" |
AllAFLData$Venue == "Bellerive Oval"] <-
"TAS"
AllAFLData$VenueState[
AllAFLData$Venue == "S.C.G." |
AllAFLData$Venue == "Bruce Stadium" |
AllAFLData$Venue == "Manuka Oval" |
AllAFLData$Venue == "Stadium Australia" |
AllAFLData$Venue == "Blacktown" |
AllAFLData$Venue == "Sydney Showground"] <-
"NSW"
AllAFLData$VenueState[
AllAFLData$Venue == "Brisbane Exhibition" |
AllAFLData$Venue == "Gabba" |
AllAFLData$Venue == "Carrara" |
AllAFLData$Venue == "Cazaly's Stadium"] <-
"QLD"
AllAFLData$VenueState[
AllAFLData$Venue == "Subiaco" |
AllAFLData$Venue == "W.A.C.A." |
AllAFLData$Venue == "Perth Stadium"] <-
"WA"
AllAFLData$VenueState[
AllAFLData$Venue == "Football Park" |
AllAFLData$Venue == "Adelaide Oval"] <-
"SA"
AllAFLData$VenueState[
AllAFLData$Venue == "Marrara Oval" |
AllAFLData$Venue == "Traeger Park"] <-
"NT"
AllAFLData$VenueState[
AllAFLData$Venue == "Wellington"] <-
"NZ"
AllAFLData$VenueState[
AllAFLData$Venue == "Jiangwan Stadium"] <-
"China"Aggregate teams who have merged
#change names and merge teams
AllAFLData[AllAFLData=="South Melbourne"] <- "South Melbourne/Sydney"
AllAFLData[AllAFLData=="Sydney"] <- "South Melbourne/Sydney"
AllAFLData[AllAFLData=="Fitzroy"] <- "Fitzroy/Brisbane"
AllAFLData[AllAFLData=="Brisbane Bears"] <- "Fitzroy/Brisbane"
AllAFLData[AllAFLData=="Brisbane Lions"] <- "Fitzroy/Brisbane"
AllAFLData[AllAFLData=="Kangaroos"] <- "North Melbourne"
AllAFLData[AllAFLData=="Footscray"] <- "Western Bulldogs"
AllAFLData[AllAFLData=="Western Bulldog"] <- "Western Bulldogs"
AllAFLData[AllAFLData=="GW Sydney"] <- "Greater Western Sydney"Create variable reflecting transition from VFL to AFL
#Competition Type
AllAFLData$Competiton[
AllAFLData$Year >= 1990] <-
"AFL"
AllAFLData$Competiton[
AllAFLData$Year < 1990] <-
"VFL"Calculate winning margin for all games
#calculate difference in scores for every game ever played
AllAFLData$ScoreMargin <- AllAFLData$HomeTotal - AllAFLData$AwayTotalCreate a column specifying which team won each game
#derive column that specifies whether the home or away team won
AllAFLData$Winner <- ifelse(AllAFLData$ScoreMargin > 0, "Home",
ifelse(AllAFLData$ScoreMargin == 0, "Draw",
ifelse(AllAFLData$ScoreMargin < 0, "Away", 0)))
#specify the team that won each game
AllAFLData$WinningTeam <- ifelse(AllAFLData$Winner == "Home", AllAFLData$HomeTeam,
ifelse(AllAFLData$Winner == "Away", AllAFLData$AwayTeam, "Draw"))Finally, create an ‘era’ variable which will be useful for later visualisations
AllAFLData$Era <- ifelse(AllAFLData$Year>=1897 & AllAFLData$Year<=1899,"1890s",
ifelse(AllAFLData$Year>=1900 & AllAFLData$Year<=1909,"1900s",
ifelse(AllAFLData$Year>=1910 & AllAFLData$Year<=1919,"1910s",
ifelse(AllAFLData$Year>=1920 & AllAFLData$Year<=1929,"1920s",
ifelse(AllAFLData$Year>=1930 & AllAFLData$Year<=1939,"1930s",
ifelse(AllAFLData$Year>=1940 & AllAFLData$Year<=1949,"1940s",
ifelse(AllAFLData$Year>=1950 & AllAFLData$Year<=1959,"1950s",
ifelse(AllAFLData$Year>=1960 & AllAFLData$Year<=1969,"1960s",
ifelse(AllAFLData$Year>=1970 & AllAFLData$Year<=1979,"1970s",
ifelse(AllAFLData$Year>=1980 & AllAFLData$Year<=1989,"1980s",
ifelse(AllAFLData$Year>=1990 & AllAFLData$Year<=1999,"1990s",
ifelse(AllAFLData$Year>=2000 & AllAFLData$Year<=2009,"2000s",
ifelse(AllAFLData$Year>=2010 & AllAFLData$Year<=2019,"2010s", 0)))))))))))))The 1897 and 1924 AFL seasons did not have a Grand Final. In 1924 the AFL grand final was decided by a round robin series, for which Essendon prevailed. The below code corrects the data to reflect the season champions in these years
Let’s begin building a table summarising descriptive statistics, by team
Determine total number of games played by each team, and how many wins
#identify total number of home games, away games, and wins
Games <- AllAFLData %>%
gather(key, Team, HomeTeam, AwayTeam, WinningTeam) %>%
group_by(Team) %>%
summarise(GamesHome = sum(key == "HomeTeam"),
GamesAway = sum(key == "AwayTeam"),
Wins = sum(key == "WinningTeam"))
#Identify total number of games
Games$TotalGames <- (Games$GamesHome +
Games$GamesAway)
#identify percentage of games won
Games$WinPercent <- (Games$Wins / Games$TotalGames * 100)
#round to 2 decimal places
Games[,'WinPercent']=round(Games[,'WinPercent'], 2) Determine the year each team joined the competition
#pull out first (i.e. earliest) instance of each team playing
TeamFirstYear <- AllAFLData[match(unique(AllAFLData$HomeTeam),
AllAFLData$HomeTeam),]
#filter out to just home team and year
TeamFirstYear <- select(TeamFirstYear, HomeTeam, Year)
#join to winners
Games <- left_join(Games, TeamFirstYear,
by = c("Team" = "HomeTeam"))
#remove this table
rm(TeamFirstYear)Determine how many finals each team has contested, and how many wins
#filter data to only finals games
AllAFLData_Finals <- AllAFLData[ which( AllAFLData$Round == "EF" |
AllAFLData$Round == "GF" |
AllAFLData$Round == "PF" |
AllAFLData$Round == "QF") , ]
#Aggregate finals
Finals <- AllAFLData_Finals %>%
gather(key, Team, HomeTeam, AwayTeam, WinningTeam) %>%
group_by(Team) %>%
summarise(FinalsHome = sum(key == "HomeTeam"),
FinalsAway = sum(key == "AwayTeam"),
FinalsWins = sum(key == "WinningTeam"))
#join to 'games dataframe
Games <- left_join(Games, Finals,
by = c("Team" = "Team"))
#Calculate total number of finals played
Games$TotalFinals <- (Games$FinalsHome +
Games$FinalsAway)
#Calculate teams winning percentage in finals
Games$FinalsWinPercent <- (Games$FinalsWins / Games$TotalFinals * 100)
#Round to 2 decimal places
Games[,'FinalsWinPercent']=round(Games[,'FinalsWinPercent'], 2)
#remove remaining tables
rm(Finals, AllAFLData_Finals)Determine how many grand finals each team has contested, and how many wins
#filter data to only finals games
AllAFLData_GF <- AllAFLData[ which( AllAFLData$Round == "GF" ) , ]
#Aggregate finals
GF <- AllAFLData_GF %>%
gather(key, Team, HomeTeam, AwayTeam, WinningTeam) %>%
group_by(Team) %>%
summarise(GFsHome = sum(key == "HomeTeam"),
GFsAway = sum(key == "AwayTeam"),
GFsWins = sum(key == "WinningTeam"))
#join to 'games dataframe
Games <- left_join(Games, GF,
by = c("Team" = "Team"))
#Calculate total number of finals played
Games$TotalGFs <- (Games$GFsHome +
Games$GFsAway)
#Calculate teams winning percentage in finals
Games$GFsWinPercent <- (Games$GFsWins / Games$TotalGFs * 100)
#round to two decimal places
Games[,'GFsWinPercent']=round(Games[,'GFsWinPercent'],2)
#convert missing team NA values to 0
Games[is.na(Games)] <- 0
#we will keep GF tables for later usage Determine all teams average winning margin
#absolute values
AllAFLData$ScoreMargin <- abs(AllAFLData$ScoreMargin)
#get sum of column by category of another column with TOP N, using DPLYR
AverageMargin <- AllAFLData %>%
group_by(WinningTeam) %>%
summarise(AverageWinningMargin = mean(ScoreMargin))
Games <- left_join(Games, AverageMargin,
by = c("Team" = "WinningTeam"))
#Round to 2 decimal places
Games[,'AverageWinningMargin']=round(Games[,'AverageWinningMargin'],2)
#remove remaining tables
rm(AverageMargin)Remove Draw from our ‘Games’ dataframe, and re-order variables
#Remove Draw
Games <- Games[ !(Games$Team %in% c("Draw")), ]
#re-order variables
Games <- Games[,c(1,7,2,3,5,4,6,8,9,11,10,12,13, 14, 16, 15, 17, 18)]Behold, our new table, summarising every teams historic winning record
| Team | Year | GamesHome | GamesAway | TotalGames | Wins | WinPercent | FinalsHome | FinalsAway | TotalFinals | FinalsWins | FinalsWinPercent | GFsHome | GFsAway | TotalGFs | GFsWins | GFsWinPercent | AverageWinningMargin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Adelaide | 1991 | 324 | 325 | 649 | 350 | 53.93 | 13 | 13 | 26 | 13 | 50.00 | 2 | 1 | 3 | 2 | 66.67 | 40.49 |
| Carlton | 1897 | 1240 | 1251 | 2491 | 1423 | 57.13 | 37 | 38 | 75 | 36 | 48.00 | 16 | 13 | 29 | 16 | 55.17 | 32.32 |
| Collingwood | 1897 | 1258 | 1275 | 2533 | 1530 | 60.40 | 46 | 60 | 106 | 46 | 43.40 | 15 | 29 | 44 | 15 | 34.09 | 33.28 |
| Essendon | 1897 | 1234 | 1222 | 2456 | 1374 | 55.94 | 41 | 37 | 78 | 41 | 52.56 | 16 | 15 | 31 | 16 | 51.61 | 33.07 |
| Fitzroy/Brisbane | 1897 | 1336 | 1319 | 2655 | 1167 | 43.95 | 30 | 21 | 51 | 30 | 58.82 | 11 | 6 | 17 | 11 | 64.71 | 30.99 |
| Fremantle | 1995 | 270 | 273 | 543 | 240 | 44.20 | 5 | 6 | 11 | 5 | 45.45 | 0 | 1 | 1 | 0 | 0.00 | 32.05 |
| Geelong | 1897 | 1210 | 1219 | 2429 | 1318 | 54.26 | 33 | 38 | 71 | 33 | 46.48 | 9 | 8 | 17 | 9 | 52.94 | 34.82 |
| Gold Coast | 2011 | 88 | 88 | 176 | 44 | 25.00 | 0 | 0 | 0 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0.00 | 29.55 |
| Greater Western Sydney | 2012 | 80 | 81 | 161 | 66 | 40.99 | 2 | 3 | 5 | 2 | 40.00 | 0 | 0 | 0 | 0 | 0.00 | 39.23 |
| Hawthorn | 1925 | 996 | 980 | 1976 | 972 | 49.19 | 37 | 24 | 61 | 37 | 60.66 | 13 | 6 | 19 | 13 | 68.42 | 36.42 |
| Melbourne | 1897 | 1208 | 1191 | 2399 | 1085 | 45.23 | 32 | 17 | 49 | 31 | 63.27 | 13 | 6 | 19 | 12 | 63.16 | 30.81 |
| North Melbourne | 1925 | 978 | 986 | 1964 | 874 | 44.50 | 27 | 28 | 55 | 26 | 47.27 | 5 | 5 | 10 | 4 | 40.00 | 30.53 |
| Port Adelaide | 1997 | 253 | 254 | 507 | 260 | 51.28 | 8 | 9 | 17 | 8 | 47.06 | 1 | 1 | 2 | 1 | 50.00 | 35.93 |
| Richmond | 1908 | 1136 | 1119 | 2255 | 1145 | 50.78 | 26 | 23 | 49 | 26 | 53.06 | 11 | 11 | 22 | 11 | 50.00 | 31.54 |
| South Melbourne/Sydney | 1897 | 1210 | 1220 | 2430 | 1160 | 47.74 | 25 | 31 | 56 | 25 | 44.64 | 5 | 13 | 18 | 5 | 27.78 | 30.58 |
| St Kilda | 1897 | 1182 | 1190 | 2372 | 919 | 38.74 | 15 | 18 | 33 | 14 | 42.42 | 2 | 6 | 8 | 1 | 12.50 | 31.01 |
| University | 1908 | 63 | 63 | 126 | 27 | 21.43 | 0 | 0 | 0 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0.00 | 23.67 |
| West Coast | 1987 | 378 | 376 | 754 | 425 | 56.37 | 21 | 18 | 39 | 20 | 51.28 | 4 | 3 | 7 | 4 | 57.14 | 38.52 |
| Western Bulldogs | 1925 | 963 | 975 | 1938 | 870 | 44.89 | 8 | 22 | 30 | 8 | 26.67 | 2 | 1 | 3 | 2 | 66.67 | 29.05 |
Create a minimalistic theme for our bar plots
BarPlotTheme <-
theme(
panel.background = element_blank(), #blank, white background
plot.title = element_text(size=10, hjust=0.5),
axis.text = element_text(size=8),
axis.text.x = element_text(angle = 45, hjust = 1, colour="#606060"),
axis.text.y = element_text(hjust = 1, colour="#606060"),
axis.title = element_text(size=10),
axis.line.x = element_line(color="black", size = 0.5),
axis.line.y = element_line(color="black", size = 0.5),
legend.key = element_rect(fill = "white")
)Visualise percentage of games won historically by all teams
#remove decimal places for tidier visualisation
Games[,'WinPercent']=round(Games[,'WinPercent'],0)
#plot
WinPercent <-
ggplot(data=Games, aes(x=reorder(Team, -WinPercent), y= WinPercent)) +
# -win percent denotes put in descending order
geom_bar(stat ="identity", fill="#31713A")+ #stat type and colour of bars
geom_text(data=Games, aes(x = Team, y = WinPercent, label = paste0(WinPercent,"%")), #data to be visualised
colour="white", size=2.5, position = position_stack(vjust = .5)) +
labs(title = " Winning percentage of all AFL/VFL teams", x = "Teams", y = "Percent") + # title, and x & y labels
scale_y_continuous(label = comma) +
BarPlotTheme # add themePerhaps unfortunately, Collingwood have won a higher proportion of games than any other team in over 120 years of competition
Visualise percentage of finals games won historically by all teams
#remove decimal places for tidier visualisation
Games[,'FinalsWinPercent']=round(Games[,'FinalsWinPercent'],0)
#plot
FinalsWinPercent <-
ggplot(data=Games, aes(x=reorder(Team, -FinalsWinPercent), y = FinalsWinPercent)) +
geom_bar(stat ="identity", fill="#D1A93D")+
geom_text(data=Games, aes(x = Team, y = FinalsWinPercent, label = paste0(FinalsWinPercent,"%")),
colour="white", size=2.5, position = position_stack(vjust = .5)) +
labs(title = "Finals Winning percentage of all AFL/VFL teams", x = "Teams", y = "Percent") +
scale_y_continuous(label = comma) +
BarPlotThemeWhen it comes to playing finals, Collingwoods record isn’t so impressive sitting 14th overall. Melbourne, a team who have experienced little success in recent years are statistically the best performing team in finals games
Visualise percentage of Grand Finals won historically by all teams
#remove decimal places for tidier visualisation
Games[,'GFsWinPercent']=round(Games[,'GFsWinPercent'],0)
#plot
GFWinPercent <-
ggplot(data=Games, aes(x=reorder(Team, -GFsWinPercent), y= GFsWinPercent)) +
geom_bar(stat ="identity", fill="#7bc1d6") +
geom_text(data=Games, aes(x = Team, y = GFsWinPercent, label = paste0(GFsWinPercent,"%")),
colour="white", size=2.5, position = position_stack(vjust = .5)) +
labs(title = "Grand Final Winning percentage of all AFL/VFL teams", x = "Teams", y = "Percent") +
scale_y_continuous(label = comma) +
BarPlotThemeLooking just at Grand Final performances, five teams sit clear as the best performing teams; Hawthorn, Adelaide, Melbourne, Western Bulldogs and Fitzroy/Brisbane. Of these teams, Hawthorn and Melbourne have contested in significantly more Grand Finals with 19 and 18 appearances respectively.
We can also visualise Grand Final performance in a slightly different way - examining total games, wins and losses to get a better depiction of just how frequently each team has contested a grand final
#grand final wins and losses
GFWL <- subset(Games, select = c(1,15,16))
GFWL <- as.data.frame(GFWL)
GFWL$GFLosses <- (GFWL$TotalGFs - GFWL$GFsWins)
GFWL <- subset(GFWL, select = -c(2))
GFWL <- rename(GFWL, c("GFsWins" = "Wins",
"GFLosses"="Losses"))
#melt 'wins' and 'losses' into a single column for visualisation
GFWL <- melt(GFWL, id.vars = 'Team')
#rename variables
GFWL <- rename(GFWL, c("variable" = "GrandFinals",
"value"="Games"))A table summarising grand final wins and losses
| Team | GrandFinals | Games |
|---|---|---|
| Adelaide | Wins | 2 |
| Carlton | Wins | 16 |
| Collingwood | Wins | 15 |
| Essendon | Wins | 16 |
| Fitzroy/Brisbane | Wins | 11 |
| Fremantle | Wins | 0 |
Visualise this data with a stacked bar chart
#Create stacked bar-chart
GFWL_plot <-
ggplot(GFWL, aes(fill=GrandFinals, y=Games, x=reorder(Team, -Games))) +
geom_bar( stat="identity", position = position_stack(reverse = T)) +
scale_fill_manual("legend", values = c("Wins" = "#31713A", "Losses" = "#C0433A"))+
labs(title = "Grand Final wins (green) and losses (red)", x = "Teams", y = "Number of Grand Finals")+
geom_text(data=GFWL, aes(x = Team, y = Games, label = Games),
colour="white", size= 2, position = position_stack(vjust = .5, reverse = T)) +
theme(legend.position = "none") +
scale_y_continuous(label = comma) +
BarPlotThemeThis is interesting. It shows Collingwood have contested far more Grand Finals than any other team, yet sit third on the all time number of Grand Final victories.
Similarly, South Melbourne/Sydney and St Kilda stand-out as teams to have caused much heart-break to their fans. In fact, South Melbourne/Sydney have contested one less Grand Final than Melbourne, yet won 7 less in total.
Of course, with 120 + years of history, none of this matters much to your average supporter if a team has not had recent success
Below we will create one last data table, that breaks down teams success by decade, and visualise this data for some of the aforementioned teams
#Aggregate each teams winning record, by era from 1897 to the present
Era <- AllAFLData %>%
gather(key, Team, HomeTeam, AwayTeam, WinningTeam) %>%
group_by(Era, Team) %>%
summarise(GamesHome = sum(key == "HomeTeam"),
GamesAway = sum(key == "AwayTeam"),
Wins = sum(key == "WinningTeam"))
#Calculate total number of games played in each era
Era$TotalGames <- (Era$GamesHome +
Era$GamesAway)
#Calculate teams winning percentage in each era
Era$WinPercent <- (Era$Wins / Era$TotalGames * 100)
#round to two decimal places
Era[,'WinPercent']=round(Era[,'WinPercent'],2)
#Let's break down grand final wins by era
EraGF <- AllAFLData_GF %>%
gather(key, Team, WinningTeam) %>%
group_by(Era, Team) %>%
summarise(GFWins = sum(key == "WinningTeam"))
#left join to 'era' table
Era <- left_join(Era, EraGF,
by = c("Era" = "Era",
"Team" = "Team"))
#delete 'draws'
Era <- subset(Era, Team != "Draw")
#turn all NAs to O
Era[is.na(Era)] <- 0
#change newly created integer columns to numeric
cols.num.era <- c("GamesHome","GamesAway", "Wins",
"TotalGames")
Era[cols.num.era] <- sapply(Era[cols.num.era],as.numeric)A table summarising results by team, by decade
| Era | Team | GamesHome | GamesAway | Wins | TotalGames | WinPercent | GFWins |
|---|---|---|---|---|---|---|---|
| 1890s | Carlton | 22 | 26 | 9 | 48 | 18.75 | 0 |
| 1890s | Collingwood | 26 | 26 | 35 | 52 | 67.31 | 0 |
| 1890s | Essendon | 27 | 25 | 37 | 52 | 71.15 | 1 |
| 1890s | Fitzroy/Brisbane | 28 | 23 | 34 | 51 | 66.67 | 2 |
| 1890s | Geelong | 27 | 24 | 36 | 51 | 70.59 | 0 |
| 1890s | Melbourne | 22 | 29 | 24 | 51 | 47.06 | 0 |
Plot win percentage per decade for Collingwood, Hawthorn and Melbourne, with number of grand final victories per decade annotated above each bar
#re-code the same plot
Era_Col_Plot <- Era %>% #specify data
subset(Team == "Collingwood") %>% #subset to desired team
ggplot(aes(x=Era, y= WinPercent)) + # -win percent denotes put in descending order
geom_bar(stat ="identity", fill="black")+
geom_text(aes(x = Era, y = WinPercent, label=WinPercent),
colour="white", size=3, position = position_stack(vjust = .5)) +
geom_text(aes(label=GFWins), colour="black", size=3,
position=position_dodge(width=0.9), vjust=-0.25) +
labs(title = "Percentage of wins per decade for Collingwood FC", x = "Era", y = "Win Percent")+
BarPlotTheme
Era_Haw_Plot <- Era %>%
subset(Team == "Hawthorn") %>%
ggplot(aes(x=Era, y= WinPercent)) +
geom_bar(stat ="identity", fill="#56251b")+
geom_text(aes(x = Era, y = WinPercent, label=WinPercent),
colour="gold", size=3, position = position_stack(vjust = .5)) +
geom_text(aes(label=GFWins), colour="#56251b", size=3,
position=position_dodge(width=0.9), vjust=-0.25) +
labs(title = "Percentage of wins per decade for Hawthorn FC", x = "Era", y = "Win Percent") +
scale_y_continuous(label = comma) +
BarPlotTheme
Era_Melb_Plot <- Era %>%
subset(Team == "Melbourne") %>%
ggplot(aes(x=Era, y= WinPercent)) +
geom_bar(stat ="identity", fill="#221b38")+
geom_text(aes(x = Era, y = WinPercent, label=WinPercent),
colour="#ea1b29", size=3, position = position_stack(vjust = .5)) +
geom_text(aes(label=GFWins), colour="#221b38", size=3,
position=position_dodge(width=0.9), vjust=-0.25) +
labs(title = "Percentage of wins per decade for Melbourne FC", x = "Era", y = "Win Percent") +
scale_y_continuous(label = comma) +
BarPlotThemeCollingwood won 13/15 of their Grand Finals before 1959. This is despite relatively successful period in the 1970s which did not produce a Grand Final win (despite three appearances)
Hawthorn, to the contrary were one of the least successful teams up until the 1970s, with only a single premiership to their name. Since that time, they have clearly been the most successful team, adding 12 premierships to their trophy cabinet, including three in the present decade
In stark contrast to Hawthorn, the Melbourne Football club have been the worst performed team since the 1970s. Melbourne’s last premiership was in 1964, and since that time, their win-rate has been low, ranging from 33% to 45% of games won per decade. Melbourne haven’t had a 10 year span of winning more games than they have lost since the 1960s