The main factors leading to Brownlow Medal votes

There is a perception within the AFL community that the game's highest individual honour - the Brownlow Medal - is a “midfielder's award”. True, most winners in recent memory have been midfielders, but is the supposed bias towards ball-winners over, say, goal-scorers real, or is it a myth when it comes to the AFL’s marquee prize?

This tutorial takes a deep dive into 2022 individual player data, looking at correlations and building a number of linear regression models to assess which statistics really contribute to the chase for Charlie.

Getting the data

Within R, we will use the fitzRoy package to access all the data that we need. The function used below scrapes its data from AFL Tables. If you have not already installed the package, run install.packages("fitzRoy") once, then load it along with dplyr, which provides the %>% pipe and the data-manipulation functions used below:

library(fitzRoy)
library(dplyr)

We want a data frame, data, that holds every player's statistics for each game of the 2022 AFL season. We also want to keep only the regular season, as the Brownlow is judged purely on regular season performance:

# Scraping data from fitzRoy package
data <- fitzRoy::fetch_player_stats_afltables(season = 2022)

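# Convert Round and Date to proper types, storing the results back in data
# (any non-numeric round labels, e.g. for finals, become NA)
data$Round <- suppressWarnings(as.numeric(data$Round))
data$Date <- as.Date(data$Date)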

# Filter for only regular season games
data <- data %>%
  filter(Date < '2022-08-30')

# Creating a readable Player label from first name and surname (ID remains the unique key)
data3 <- data %>%
  mutate(Player = 
           paste(First.name, Surname, sep = "-"))

# Obtaining selected season stats by player
player_season_data <- data3 %>%
  group_by(Player, ID) %>% 
  summarise(total_clear=sum(Clearances),
            total_kicks=sum(Kicks),
            total_handballs=sum(Handballs),
            total_disposals=total_kicks+total_handballs,
            total_contposs=sum(Contested.Possessions),
            total_marks=sum(Marks), 
            total_1p=sum(One.Percenters), 
            total_tackles=sum(Tackles),
            total_uncontposs=sum(Uncontested.Possessions), 
            total_I50=sum(Inside.50s), 
            total_goals=sum(Goals),
            brownlow=sum(Brownlow.Votes),
            .groups = "drop")

We now have a tidy summary, player_season_data, containing each player's aggregate key statistics for the 2022 regular season.
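A quick sanity check on the aggregation is to rank players by total votes and confirm that the top of the list matches the real count (the 2022 medal was won by Patrick Cripps). The sketch below uses a small made-up data frame so that it runs on its own. The names toy_season_data and top_vote_getters are hypothetical, not part of fitzRoy, and the players and numbers are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for player_season_data: made-up players and numbers,
# used only so this sketch runs without downloading anything.
toy_season_data <- data.frame(
  Player = c("Player-A", "Player-B", "Player-C", "Player-D"),
  total_disposals = c(620, 410, 550, 300),
  total_goals = c(12, 55, 8, 3),
  brownlow = c(25, 14, 19, 0)
)

# Rank players by Brownlow votes, highest first, and keep the top n rows
top_vote_getters <- function(df, n = 10) {
  df %>%
    arrange(desc(brownlow)) %>%
    head(n)
}

top3 <- top_vote_getters(toy_season_data, n = 3)
print(top3)
```

With the real data, top_vote_getters(player_season_data) should work the same way, assuming the brownlow column created in the summary above.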

Correlation between statistics and Brownlow Votes

Before we look at modelling, it is interesting to look at the correlation between certain statistics and Brownlow votes. We can do this by following the steps below:

# New data frame with just relevant columns (removing player id, kicks and handballs but keeping total disposals)
player_season_data_corr <- player_season_data[,-c(1:2,4:5)]

# Loading the psych package (run install.packages("psych") first if needed) and creating a correlation matrix using pairs.panels
library(psych)
pairs.panels(player_season_data_corr,
             scale = TRUE,
             smoother = TRUE)

Correlation Matrix comparing key stat categories

Here, we can see some key correlations, specifically between Brownlow votes (final column) and the other statistics. For example, there is a relatively strong positive correlation between Brownlow votes and disposals (0.64), clearances (0.68), contested possessions (0.71) and inside 50s (0.69). We also see a positive correlation between Brownlow votes and goals (0.37) and marks (0.40), but it is clearly weaker than for the other categories.
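If you would rather read the correlations with Brownlow votes as numbers than pick them off the plot, base R's cor() can produce them directly. The following is a minimal sketch on made-up data. The names toy_corr_data and vote_correlations are hypothetical, and the values are invented purely so the example runs on its own.

```r
# Hypothetical stand-in for player_season_data_corr (invented values)
toy_corr_data <- data.frame(
  total_disposals = c(100, 200, 300, 400, 500, 600),
  total_goals = c(30, 5, 20, 10, 25, 15),
  brownlow = c(0, 2, 5, 9, 12, 20)
)

# Pearson correlation of every numeric column with the target column,
# sorted from strongest positive to weakest (or most negative)
vote_correlations <- function(df, target = "brownlow") {
  cors <- cor(df, use = "complete.obs")[, target]
  cors <- cors[names(cors) != target]  # drop the target's correlation with itself
  sort(cors, decreasing = TRUE)
}

vote_cors <- vote_correlations(toy_corr_data)
print(round(vote_cors, 2))
```

Applied to the real data, vote_correlations(player_season_data_corr) should return the same values shown in the final column of the plot, assuming every column is numeric.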

Modelling

Now that we have some key statistics, we can build a number of linear regression models to understand which key stats, alone or in combination, best explain Brownlow votes. For the purposes of this tutorial I have created 4 models, but feel free to build more with larger numbers of variables (being careful not to overfit).

# Building the 4 models
model1 <- lm(brownlow ~ total_disposals, player_season_data)

model2 <- lm(brownlow ~ total_disposals + total_goals, player_season_data)

model3 <- lm(brownlow ~ total_goals, player_season_data)

model4 <- lm(brownlow ~ total_disposals + total_tackles, player_season_data)

# Getting a summary and comparing the models
summary(model1)
summary(model2)
summary(model3)
summary(model4)

As we can see, the model with total disposals alone has an adjusted R^2 of 40.7%. Adding other variables such as goals or tackles increases the adjusted R^2 only modestly (to 45.6% and 41.4% respectively). If we use goals scored as the sole predictor of Brownlow votes, we get an adjusted R^2 of 13.2%, substantially less than if we use disposals alone.
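Rather than reading the adjusted R^2 out of each summary() printout, the values can be collected into one small table. The sketch below shows one way to do that, fitted on made-up data so that it runs on its own. The names toy_model_data and compare_adj_r2 are hypothetical, and the numbers are invented and will not reproduce the figures quoted above.

```r
# Hypothetical stand-in for player_season_data (invented values)
toy_model_data <- data.frame(
  total_disposals = c(100, 200, 300, 400, 500, 600, 350, 450),
  total_goals = c(30, 5, 20, 10, 25, 15, 40, 2),
  brownlow = c(0, 2, 5, 9, 12, 20, 10, 8)
)

# Fit a named list of candidate models
toy_models <- list(
  disposals = lm(brownlow ~ total_disposals, data = toy_model_data),
  disposals_goals = lm(brownlow ~ total_disposals + total_goals, data = toy_model_data),
  goals = lm(brownlow ~ total_goals, data = toy_model_data)
)

# Pull the adjusted R^2 from each model's summary into one data frame
compare_adj_r2 <- function(models) {
  adj_r2 <- sapply(models, function(m) summary(m)$adj.r.squared)
  data.frame(model = names(models), adj_r2 = round(adj_r2, 3), row.names = NULL)
}

r2_table <- compare_adj_r2(toy_models)
print(r2_table)
```

The same helper can be pointed at the real models with compare_adj_r2(list(model1 = model1, model2 = model2, model3 = model3, model4 = model4)). Adjusted R^2 penalises extra predictors, which makes it a fairer basis than plain R^2 for comparing models with different numbers of variables.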

Conclusion

Based on our analysis above for 2022, disposals explain substantially more of the variation in Brownlow votes than goals scored. The statistics that midfielders ‘rack up’ in comparison to their forward and back teammates (possessions, clearances, inside 50s etc.) show a clear link with Brownlow votes, while forwards’ statistics (like goals scored) show a much weaker one. Keep in mind that these are associations within a single season, not proof of what causes the umpires to award votes.

I encourage you to try this for yourself with different seasons and variations!