Despite being the most popular sport in the world, Soccer is still struggling to captivate Australian audiences when it comes to their only professional competitions, the A-leagues. The A-leagues come in the form of the A-league Women and A-league Men competitions, and runs between October and May.
In that time, some major sporting events occur and may be a reason why the A-leagues struggle to attract the interest that other sports do in Australia. Our following analysis looks to understand when Australians are interested in the A-leagues, and when they aren’t, as well as a the reason why this may be.
The data we are using today has been sourced from Google Trends. You are able to download the same data to your desktop simply by searching ‘A-leagues’ into the Google Trends search bar and downloading the data as a CSV file. In the excel spreadsheet, make sure to delete any unnecessary rows so that the first row is the first date point.
# Load Data, convert to Date variable and filter for only past 10 years of data
library(tidyverse)
library(tsibble)
library(lubridate)
library(fpp3)
aleague<- read.csv("aleague.csv") # Change to whatever your file is saved as
aleague<- aleague |>
mutate(date = yearmonth(ym)) |>
as_tsibble(index = date) |>
slice(130:250) # Filters for past 10 years only (10/2014 - 10/2024)
aleague<- aleague |>
select(date,Interest) # Necessary Cleaning
aleague # Display table
## # A tsibble: 121 x 2 [1M]
## date Interest
## <mth> <int>
## 1 2014 Oct 69
## 2 2014 Nov 63
## 3 2014 Dec 55
## 4 2015 Jan 30
## 5 2015 Feb 48
## 6 2015 Mar 48
## 7 2015 Apr 49
## 8 2015 May 63
## 9 2015 Jun 14
## 10 2015 Jul 13
## # ℹ 111 more rows
As we can see here, we have a neat data frame that shows us the date in one column, and the level on interest shown in the A-leagues for this time.
This shows us the overall change in interest towards the A-leagues over the last 10 years
# Time Series Plot
aleague |>
autoplot()
## Plot variable not specified, automatically selected `.vars = Interest`
This shows us how interest changed in every year from 2014-2024.
# Seasonal Pattern
aleague |> gg_season(Interest)
We can see there is no interest in the off-season (June-September) and then varied interest over the course of the season (October-May)
This plot allows you to see the interest value of each month over the last 10 years to get a more detailed understanding of how interest changes over time.
# Time series plot
library(timetk)
aleague$date<- as.Date(aleague$date)
aleague |>
plot_time_series(.date_var = date,
.value = Interest)
This shows some of the diagnostics of the seasonal patterns that occur in the A-leagues interest levels. It also interactive.
# Seasonal Diagnostics
aleague |>
plot_seasonal_diagnostics(.date_var = date,
.value = Interest)
The A-league has anomaly seasons such as when COVID interrupted several seasons as well as when the league was suspended for a month in January 2015 to accommodate for the hosting of the 2015 Asian Cup.
# Data Anomalies
aleague |>
plot_anomaly_diagnostics(.date_var = date,
.value = Interest)
## frequency = 12 observations per 1 year
## trend = 60 observations per 5 years
This shows us how interest is trending overtime.
# Trend
aleague |>
plot_stl_diagnostics(.date_var = date,
.value = Interest,
.facet_scales = "free",
.feature_set = c("trend"))
## frequency = 12 observations per 1 year
## trend = 60 observations per 5 years
The A-leagues is trending upwards which is positive for them.
This identifies seasonal patterns in the A-leagues interest and is vital for understanding when interest peaks and drops over the course of a season.
# Decomposed Series
aleague |>
plot_stl_diagnostics(.date_var = date,
.value = Interest,
.facet_scales = "free",
.feature_set = c("season"))
## frequency = 12 observations per 1 year
## trend = 60 observations per 5 years
This is a data frame we were able to create from data sourced from the ‘Austadiums’ website.
It shows us the most recent attendance stats for major sporting events over Summer.
# A-league attendance compared to other sports
attendance<-data.frame(
Sport = c('Aleague Men', 'Aleague Women', 'BBL', 'International Cricket', 'Australian Open Tennis', 'Melbourne Grand Prix', 'AFL Round 1', 'NRL Round 1'),
Avg_Event_Attendance = c(8589, 2248, 21505, 25446, 72912, 113013, 45933, 28038),
Events = c(167,137,41,19,15,4,9,8),
Event_Type = c('Match', 'Match', 'Match', 'Day', 'Day', 'Day', 'Match', 'Match'),
Event_Length_Weeks = c(27,26,6,5,2,1,1,1)
)
attendance$Total_Attendance = attendance$Avg_Event_Attendance*attendance$Events
attendance$Weekly_Attendance = attendance$Total_Attendance/attendance$Event_Length_Weeks
# Plot attendance
# By single events per comp
attendance<- attendance |>
mutate(color = ifelse(Sport == "Aleague Women" | Sport == "Aleague Men", "Aleague", "Other"))
ggplot(
attendance,aes(x = reorder(Sport,Avg_Event_Attendance), y = Avg_Event_Attendance, fill = color)) +
geom_col(color = 'black')+
scale_fill_manual(values = c("Aleague" = "red", "Other" = "navy"))+
labs(title = 'Attendance of Sporting Events over A-leagues season', x = 'Sporting Event', y = 'Average Event Attendance')+
theme(axis.text.x = element_text(angle = 15, hjust = 1))
This shows us how much the A-leagues struggle when one of these events is also on at the same time.
The good thing for the A-leagues is that these sports are not on for that long compared to the A-leagues season.
# Add weeks
ggplot(
attendance,aes(x = reorder(Sport,Avg_Event_Attendance), y = Avg_Event_Attendance, fill = color)) +
geom_col(color = 'black')+
scale_fill_manual(values = c("Aleague" = "red", "Other" = "navy"))+
labs(title = 'Attendance of Sporting Events over A-leagues season', x = 'Sporting Event', y = 'Average Event Attendance')+
geom_text(aes(label = paste(Event_Length_Weeks, "Weeks")), vjust = -0.5)+
theme_classic()+
theme(axis.text.x = element_text(angle = 15, hjust = 1))
The next plot shows us how we can expect attendances to be for any single week where on of these sports is on:
# Average Weeky Attendance
ggplot(
attendance,aes(x = reorder(Sport,Weekly_Attendance), y = Weekly_Attendance, fill = color)) +
geom_col(color = 'black')+
scale_fill_manual(values = c("Aleague" = "red", "Other" = "navy"))+
labs(title = 'Weekly Attendance of Summer Sporting Events', subtitle = "source: austadiums", x = 'Sporting Event', y = 'Average Weekly Attendance')+
theme_classic()+
theme(axis.text.x = element_text(angle = 15, hjust = 1))+
scale_y_continuous(labels = scales::comma)+
theme(legend.title = element_blank())
Once again, the A-leagues are lacking.
This may help us explain why we see interest in the A-league changing over the course of a season. When these bigger events are on, it likely can explain why interest drops.
We have also forecasted the next 3 years and predicted what interest will be like towards the A-leagues for the next 3 years.
# Forecasting
# Holts winter seasonality - multiplicative
aleague<- read.csv("aleague.csv")
aleague<- aleague |>
mutate(date = yearmonth(ym)) |>
as_tsibble(index = date) |>
slice(130:250)
aleague<- aleague |>
select(date,Interest)
fit <- aleague |>
model(
multiplicative = ETS(Interest ~ error("M") + trend("A") + season("M"))
)
forecast <- fit|> forecast(h = "3 years")
augment <- augment(fit)
forecast |>
autoplot(aleague, level = NULL) # With time series plot
forecast |>
autoplot() # Forecasting on its own