The Data

AFLTables.com provides historic data on every VFL/AFL game ever played, all the way back to the leagues inception in 1897.

The web-link can be found here: https://afltables.com/afl/stats/biglists/bg3.txt.

In this rpub, I will clean, extract features, aggregate, and visualise some of this data. The approach herein will hopefuly encourage further analyses

First, lets load-in our dependencies :

Read URL to R, specifying columns and delimiters

We should now have a dataframe with 8 variables, and 15,407 rows

ID Date Round HomeTeam HomeScore AwayTeam AwayScore Venue
1 8-May-1897 R1 Fitzroy 6.13.49 Carlton 2.4.16 Brunswick St
2 8-May-1897 R1 Collingwood 5.11.41 St Kilda 2.4.16 Victoria Park
3 8-May-1897 R1 Geelong 3.6.24 Essendon 7.5.47 Corio Oval
4 8-May-1897 R1 South Melbourne 3.9.27 Melbourne 6.8.44 Lake Oval
5 15-May-1897 R2 South Melbourne 6.4.40 Carlton 5.6.36 Lake Oval
6 15-May-1897 R2 Essendon 4.6.30 Collingwood 8.2.50 East Melbourne

Cleaning and Feature Extraction

Change variable types

Extract out ‘Year’ from our date column

Separate ‘score’ column(s) into goals, behinds and total

Change score variables from character to numeric

##          ID        Date       Round    HomeTeam   HomeGoals HomeBehinds 
##   "numeric" "character"    "factor" "character" "character" "character" 
##   HomeTotal    AwayTeam   AwayGoals AwayBehinds   AwayTotal       Venue 
## "character" "character" "character" "character" "character"    "factor" 
##         Day       Month        Year 
##   "numeric" "character"   "numeric"
##          ID        Date       Round    HomeTeam   HomeGoals HomeBehinds 
##   "numeric" "character"    "factor" "character"   "numeric"   "numeric" 
##   HomeTotal    AwayTeam   AwayGoals AwayBehinds   AwayTotal       Venue 
##   "numeric" "character"   "numeric"   "numeric"   "numeric"    "factor" 
##         Day       Month        Year 
##   "numeric" "character"   "numeric"

Check the number of unique elements for factors we may wish to concatenate

##  [1] Brunswick St        Victoria Park       Corio Oval         
##  [4] Lake Oval           East Melbourne      Junction Oval      
##  [7] M.C.G.              Princes Park        S.C.G.             
## [10] Punt Rd             Windy Hill          Glenferrie Oval    
## [13] Arden St            Western Oval        Olympic Park       
## [16] Kardinia Park       Yarraville Oval     Toorak Park        
## [19] Euroa               North Hobart        Yallourn           
## [22] Albury              Brisbane Exhibition Moorabbin Oval     
## [25] Coburg Oval         Waverley Park       Gabba              
## [28] Subiaco             Carrara             W.A.C.A.           
## [31] Football Park       Bruce Stadium       Manuka Oval        
## [34] Docklands           York Park           Stadium Australia  
## [37] Marrara Oval        Cazaly's Stadium    Adelaide Oval      
## [40] Bellerive Oval      Blacktown           Sydney Showground  
## [43] Wellington          Traeger Park        Jiangwan Stadium   
## [46] Eureka Stadium      Perth Stadium      
## 47 Levels: Adelaide Oval Albury Arden St Bellerive Oval ... York Park
##  [1] "Fitzroy"         "Collingwood"     "Geelong"        
##  [4] "South Melbourne" "Essendon"        "St Kilda"       
##  [7] "Melbourne"       "Carlton"         "Richmond"       
## [10] "University"      "Hawthorn"        "North Melbourne"
## [13] "Footscray"       "Sydney"          "West Coast"     
## [16] "Brisbane Bears"  "Adelaide"        "Fremantle"      
## [19] "Western Bulldog" "Brisbane Lions"  "Port Adelaide"  
## [22] "Kangaroos"       "Gold Coast"      "GW Sydney"
##  [1] R1  R2  R3  R4  R5  R6  R7  R8  R9  R10 R11 R12 R13 R14 SF  R15 R16
## [18] R17 GF  PF  R18 R19 R20 R21 R22 QF  EF  R23 R24
## 29 Levels: EF GF PF QF R1 R10 R11 R12 R13 R14 R15 R16 R17 R18 R19 ... SF

Determine which state in Australia games have been played, based upon venue

#aggregate stadiums into 'state' variable
AllAFLData$VenueState[
  AllAFLData$Venue ==  "Brunswick St" | 
    AllAFLData$Venue == "Victoria Park" | 
    AllAFLData$Venue ==  "Corio Oval" | 
    AllAFLData$Venue ==  "Lake Oval" | 
    AllAFLData$Venue ==  "East Melbourne" | 
    AllAFLData$Venue ==  "Junction Oval" | 
    AllAFLData$Venue ==  "M.C.G." | 
    AllAFLData$Venue ==  "Princes Park" | 
    AllAFLData$Venue ==  "Punt Rd" | 
    AllAFLData$Venue ==  "Windy Hill" | 
    AllAFLData$Venue ==  "Glenferrie Oval" | 
    AllAFLData$Venue ==  "Arden St" |   
    AllAFLData$Venue ==  "Western Oval" | 
    AllAFLData$Venue ==  "Olympic Park" | 
    AllAFLData$Venue ==  "Kardinia Park" | 
    AllAFLData$Venue ==  "Yarraville Oval" | 
    AllAFLData$Venue ==  "Toorak Park" | 
    AllAFLData$Venue ==  "Euroa" | 
    AllAFLData$Venue ==  "Yallourn" | 
    AllAFLData$Venue ==  "Albury" | 
    AllAFLData$Venue ==  "Moorabbin Oval" | 
    AllAFLData$Venue ==  "Coburg Oval" | 
    AllAFLData$Venue ==  "Waverley Park" | 
    AllAFLData$Venue ==  "Docklands" | 
    AllAFLData$Venue ==  "Eureka Stadium"] <- 
  "VIC"

AllAFLData$VenueState[
  AllAFLData$Venue ==  "North Hobart" | 
    AllAFLData$Venue == "York Park" | 
    AllAFLData$Venue ==  "Bellerive Oval"] <- 
  "TAS"

AllAFLData$VenueState[
  AllAFLData$Venue ==  "S.C.G." | 
    AllAFLData$Venue == "Bruce Stadium" | 
    AllAFLData$Venue ==  "Manuka Oval" | 
    AllAFLData$Venue ==  "Stadium Australia" |
    AllAFLData$Venue ==  "Blacktown" |
    AllAFLData$Venue ==  "Sydney Showground"] <- 
  "NSW"

AllAFLData$VenueState[
  AllAFLData$Venue ==  "Brisbane Exhibition" | 
    AllAFLData$Venue == "Gabba" | 
    AllAFLData$Venue ==  "Carrara" | 
    AllAFLData$Venue ==  "Cazaly's Stadium"] <- 
  "QLD"

AllAFLData$VenueState[
  AllAFLData$Venue ==  "Subiaco" | 
    AllAFLData$Venue == "W.A.C.A." |
  AllAFLData$Venue == "Perth Stadium"] <- 
  "WA"

AllAFLData$VenueState[
  AllAFLData$Venue ==  "Football Park" | 
    AllAFLData$Venue == "Adelaide Oval"] <- 
  "SA"

AllAFLData$VenueState[
  AllAFLData$Venue ==  "Marrara Oval" | 
    AllAFLData$Venue == "Traeger Park"] <- 
  "NT"

AllAFLData$VenueState[
  AllAFLData$Venue ==  "Wellington"] <- 
  "NZ"

AllAFLData$VenueState[
  AllAFLData$Venue ==  "Jiangwan Stadium"] <- 
  "China"

Aggregate teams who have merged

Create variable reflecting transition from VFL to AFL

Calculate winning margin for all games

Create a column specifying which team won each game

Finally, create an ‘era’ variable which will be useful for later visualisations

The 1897 and 1924 AFL seasons did not have a Grand Final. In 1924 the AFL grand final was decided by a round robin series, for which Essendon prevailed. The below code corrects the data to reflect the season champions in these years

Aggregation

Let’s begin building a table summarising descriptive statistics, by team

Determine total number of games played by each team, and how many wins

Determine the year each team joined the competition

Determine how many finals each team has contested, and how many wins

Determine how many grand finals each team has contested, and how many wins

Determine all teams average winning margin

Remove Draw from our ‘Games’ dataframe, and re-order variables

Behold, our new table, summarising every teams historic winning record

Team Year GamesHome GamesAway TotalGames Wins WinPercent FinalsHome FinalsAway TotalFinals FinalsWins FinalsWinPercent GFsHome GFsAway TotalGFs GFsWins GFsWinPercent AverageWinningMargin
Adelaide 1991 324 325 649 350 53.93 13 13 26 13 50.00 2 1 3 2 66.67 40.49
Carlton 1897 1240 1251 2491 1423 57.13 37 38 75 36 48.00 16 13 29 16 55.17 32.32
Collingwood 1897 1258 1275 2533 1530 60.40 46 60 106 46 43.40 15 29 44 15 34.09 33.28
Essendon 1897 1234 1222 2456 1374 55.94 41 37 78 41 52.56 16 15 31 16 51.61 33.07
Fitzroy/Brisbane 1897 1336 1319 2655 1167 43.95 30 21 51 30 58.82 11 6 17 11 64.71 30.99
Fremantle 1995 270 273 543 240 44.20 5 6 11 5 45.45 0 1 1 0 0.00 32.05
Geelong 1897 1210 1219 2429 1318 54.26 33 38 71 33 46.48 9 8 17 9 52.94 34.82
Gold Coast 2011 88 88 176 44 25.00 0 0 0 0 0.00 0 0 0 0 0.00 29.55
Greater Western Sydney 2012 80 81 161 66 40.99 2 3 5 2 40.00 0 0 0 0 0.00 39.23
Hawthorn 1925 996 980 1976 972 49.19 37 24 61 37 60.66 13 6 19 13 68.42 36.42
Melbourne 1897 1208 1191 2399 1085 45.23 32 17 49 31 63.27 13 6 19 12 63.16 30.81
North Melbourne 1925 978 986 1964 874 44.50 27 28 55 26 47.27 5 5 10 4 40.00 30.53
Port Adelaide 1997 253 254 507 260 51.28 8 9 17 8 47.06 1 1 2 1 50.00 35.93
Richmond 1908 1136 1119 2255 1145 50.78 26 23 49 26 53.06 11 11 22 11 50.00 31.54
South Melbourne/Sydney 1897 1210 1220 2430 1160 47.74 25 31 56 25 44.64 5 13 18 5 27.78 30.58
St Kilda 1897 1182 1190 2372 919 38.74 15 18 33 14 42.42 2 6 8 1 12.50 31.01
University 1908 63 63 126 27 21.43 0 0 0 0 0.00 0 0 0 0 0.00 23.67
West Coast 1987 378 376 754 425 56.37 21 18 39 20 51.28 4 3 7 4 57.14 38.52
Western Bulldogs 1925 963 975 1938 870 44.89 8 22 30 8 26.67 2 1 3 2 66.67 29.05

Visualising the data

Create a minimalistic theme for our bar plots

Visualise percentage of games won historically by all teams

Perhaps unfortunately, Collingwood have won a higher proportion of games than any other team in over 120 years of competition

Visualise percentage of finals games won historically by all teams

When it comes to playing finals, Collingwoods record isn’t so impressive sitting 14th overall. Melbourne, a team who have experienced little success in recent years are statistically the best performing team in finals games

Visualise percentage of Grand Finals won historically by all teams

Looking just at Grand Final performances, five teams sit clear as the best performing teams; Hawthorn, Adelaide, Melbourne, Western Bulldogs and Fitzroy/Brisbane. Of these teams, Hawthorn and Melbourne have contested in significantly more Grand Finals with 19 and 18 appearances respectively.

Additional visualisations and aggregations

We can also visualise Grand Final performance in a slightly different way - examining total games, wins and losses to get a better depiction of just how frequently each team has contested a grand final

A table summarising grand final wins and losses

Team GrandFinals Games
Adelaide Wins 2
Carlton Wins 16
Collingwood Wins 15
Essendon Wins 16
Fitzroy/Brisbane Wins 11
Fremantle Wins 0

Visualise this data with a stacked bar chart

This is interesting. It shows Collingwood have contested far more Grand Finals than any other team, yet sit third on the all time number of Grand Final victories.

Similarly, South Melbourne/Sydney and St Kilda stand-out as teams to have caused much heart-break to their fans. In fact, South Melbourne/Sydney have contested one less Grand Final than Melbourne, yet won 7 less in total.

Of course, with 120 + years of history, none of this matters much to your average supporter if a team has not had recent success

Below we will create one last data table, that breaks down teams success by decade, and visualise this data for some of the aforementioned teams

A table summarising results by team, by decade

Era Team GamesHome GamesAway Wins TotalGames WinPercent GFWins
1890s Carlton 22 26 9 48 18.75 0
1890s Collingwood 26 26 35 52 67.31 0
1890s Essendon 27 25 37 52 71.15 1
1890s Fitzroy/Brisbane 28 23 34 51 66.67 2
1890s Geelong 27 24 36 51 70.59 0
1890s Melbourne 22 29 24 51 47.06 0

Plot win percentage per decade for Collingwood, Hawthorn and Melbourne, with number of grand final victories per decade annotated above each bar

#re-code the same plot 
Era_Col_Plot <- Era %>% #specify data
  subset(Team == "Collingwood") %>% #subset to desired team 
  ggplot(aes(x=Era, y= WinPercent)) + # -win percent denotes put in descending order
  geom_bar(stat ="identity", fill="black")+
  geom_text(aes(x = Era, y = WinPercent, label=WinPercent),
            colour="white", size=3, position = position_stack(vjust = .5)) +
  geom_text(aes(label=GFWins), colour="black", size=3, 
            position=position_dodge(width=0.9), vjust=-0.25) +
  labs(title = "Percentage of wins per decade for Collingwood FC", x = "Era", y = "Win Percent")+
  BarPlotTheme

Era_Haw_Plot <- Era %>% 
  subset(Team == "Hawthorn") %>% 
  ggplot(aes(x=Era, y= WinPercent)) + 
  geom_bar(stat ="identity", fill="#56251b")+
  geom_text(aes(x = Era, y = WinPercent, label=WinPercent),
            colour="gold", size=3, position = position_stack(vjust = .5)) +
  geom_text(aes(label=GFWins), colour="#56251b", size=3, 
            position=position_dodge(width=0.9), vjust=-0.25) +
  labs(title = "Percentage of wins per decade for Hawthorn FC", x = "Era", y = "Win Percent") +
  scale_y_continuous(label = comma) +
  BarPlotTheme

Era_Melb_Plot <- Era %>% 
  subset(Team == "Melbourne") %>% 
  ggplot(aes(x=Era, y= WinPercent)) + 
  geom_bar(stat ="identity", fill="#221b38")+
  geom_text(aes(x = Era, y = WinPercent, label=WinPercent),
            colour="#ea1b29", size=3, position = position_stack(vjust = .5)) +
  geom_text(aes(label=GFWins), colour="#221b38", size=3, 
            position=position_dodge(width=0.9), vjust=-0.25) +
  labs(title = "Percentage of wins per decade for Melbourne FC", x = "Era", y = "Win Percent") +
  scale_y_continuous(label = comma) +
  BarPlotTheme

Collingwood won 13/15 of their Grand Finals before 1959. This is despite relatively successful period in the 1970s which did not produce a Grand Final win (despite three appearances)

Hawthorn, to the contrary were one of the least successful teams up until the 1970s, with only a single premiership to their name. Since that time, they have clearly been the most successful team, adding 12 premierships to their trophy cabinet, including three in the present decade

In stark contrast to Hawthorn, the Melbourne Football club have been the worst performed team since the 1970s. Melbourne’s last premiership was in 1964, and since that time, their win-rate has been low, ranging from 33% to 45% of games won per decade. Melbourne haven’t had a 10 year span of winning more games than they have lost since the 1960s