As someone who runs and exercises a lot in my free time, I wanted to understand people’s interest in exercise and whether it has grown over the years. My time series is taken from Google Trends and gives the monthly indexed search volume for the term “Exercise” in GB from 2011 to 2025.
First, we plot the time series and model it using simple linear regression.
The linear regression model shows a slightly positive trend, meaning the popularity of the term “exercise” has increased over time. The search volume also appears to follow a seasonal pattern, and it spiked during the COVID-19 pandemic. A straight line cannot capture seasonal patterns or one-off spikes, so linear regression is not a good model for this series.
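As a minimal sketch of the linear trend fit described above, the following uses a synthetic monthly series as a stand-in for the Google Trends data. The names demo_ts, demo_df and trend_fit, and all the numbers used to generate the series, are assumptions made for illustration only.

```r
# Synthetic monthly series standing in for the Google Trends data (assumption):
# a small upward trend plus a yearly cycle and random noise.
set.seed(1)
n <- 180  # 15 years of monthly observations, 2011 to 2025
t_idx <- seq_len(n)
demo_ts <- ts(40 + 0.05 * t_idx + 8 * cos(2 * pi * t_idx / 12) + rnorm(n, sd = 2),
              start = c(2011, 1), frequency = 12)

# Regress the series on time. time() is measured in years,
# so the slope is the average change in search volume per year.
demo_df <- data.frame(y = as.numeric(demo_ts), t = as.numeric(time(demo_ts)))
trend_fit <- lm(y ~ t, data = demo_df)
slope_per_year <- unname(coef(trend_fit)["t"])

# The residuals of the straight-line fit still contain the yearly cycle:
# their autocorrelation at lag 12 stays high.
resid_acf12 <- acf(residuals(trend_fit), lag.max = 12, plot = FALSE)$acf[13]

print(c(slope_per_year = slope_per_year, resid_acf12 = resid_acf12))
```

Calling plot(demo_ts) followed by lines(demo_df$t, fitted(trend_fit), col = "red") would draw the fitted line over the series. A large lag-12 autocorrelation in the residuals is one way to see that the straight line has left the seasonal pattern unexplained.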
Next, we can model the time series using R's decompose() function. This splits the series into trend, seasonal, and residual components so that each can be explored separately.
The trend component shows a gradual increase in search volume for exercise from 2011 to 2020, followed by a spike in 2020 and 2021. Search volume then drops back to pre-2020 levels from 2022 to 2024 and rises slightly in 2025.
The gradual increase in search volume from 2011 to 2020 can be explained by the increased availability of mobile phones and technology. As mobile phones became more accessible, people were able to use technology in their everyday lives for non-work purposes.
The huge spike corresponds to the COVID-19 pandemic, when people were stuck at home and so had more time to consider exercise.
The search volume returned to pre-COVID levels in 2022, after COVID-19 restrictions were loosened in GB.
The seasonal component shows search volume for exercise declining over the course of the year. Search volume is high in January and falls steeply between April and August and again between November and December. A likely explanation for the January peak is that many people try to lose weight after the holiday season and make New Year's resolutions that month. As the year goes on, people pay less attention to exercise, so search volume falls.
The residual component shows large residuals in 2020 and 2021. This is expected, as the COVID-19 pandemic happened during this period and the model cannot capture such “unexpected circumstances” unless we add an additional component to model the effects of the pandemic. If we ignore 2020 and 2021, the residuals look random.
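The decomposition steps discussed above can be sketched as follows. This is an illustration on a synthetic series, not the analysis of the real data: demo_ts, demo_decom and demo_resid are assumed names, and the generated values are made up.

```r
# Synthetic monthly series standing in for gb_exercise (assumption).
set.seed(2)
n <- 180
t_idx <- seq_len(n)
demo_ts <- ts(40 + 0.05 * t_idx + 8 * cos(2 * pi * t_idx / 12) + rnorm(n, sd = 2),
              start = c(2011, 1), frequency = 12)

# Classical additive decomposition: observed = trend + seasonal + random.
demo_decom <- decompose(demo_ts, type = "additive")

# Twelve seasonal indices, one per calendar month. The series starts in
# January, so the first index belongs to January. They are centred on zero.
seasonal_idx <- setNames(demo_decom$figure, month.abb)

# The centred moving average leaves NA at both ends of the trend, so the
# residual series has NA there too. Drop them before running any tests.
demo_resid <- na.omit(demo_decom$random)

# Check that the three components add back up to the observed series.
recon <- demo_decom$trend + demo_decom$seasonal + demo_decom$random
max_recon_error <- max(abs(recon - demo_ts), na.rm = TRUE)

print(round(seasonal_idx, 2))
```

With monthly data, decompose() uses a 12-month centred moving average, so six values are lost at each end of the trend and residual components. This is why the residual series is passed through na.omit() before testing.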
We can now test the properties of the residuals. Ideally, the residuals should be stationary and behave like white noise.
The augmented Dickey-Fuller (ADF) test is used to test whether a time series is stationary. The p-value for the ADF test is less than 5%, so we reject H0 and conclude that the residuals are stationary.
#adf test
#H0: Time series is not stationary
tseries::adf.test(exercise_decom_resid) #p-value = 0.01 < 0.05
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Warning in tseries::adf.test(exercise_decom_resid): p-value smaller than
## printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: exercise_decom_resid
## Dickey-Fuller = -8.2438, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
Another test for stationarity is the KPSS test. Unlike the ADF test, its null hypothesis is that the series is stationary. The p-value is greater than 5%, so we fail to reject H0, which is consistent with the residuals being stationary.
#kpss test
#H0: Time series is stationary
feasts::unitroot_kpss(exercise_decom_resid) #p-value = 0.1 > 0.05
## kpss_stat kpss_pvalue
## 0.01911093 0.10000000
The Ljung-Box test is used to test whether a time series is white noise. The p-value for the Ljung-Box test is less than 5%, so we reject H0 and conclude that the residuals are not white noise.
#Ljung box test
#H0: Time series is white noise
Box.test(exercise_decom_resid, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: exercise_decom_resid
## X-squared = 45.014, df = 1, p-value = 1.957e-11
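The call above uses the default lag = 1 of Box.test(), so only the lag-1 autocorrelation is tested (df = 1 in the output). For monthly data it may be worth testing several lags jointly. The sketch below does this on two synthetic series, which are assumptions standing in for residuals: a white noise series and an AR(1) series.

```r
set.seed(3)
# Synthetic residual-like series (assumptions, not the real residuals):
# pure white noise, and an AR(1) process with clear autocorrelation.
wn  <- rnorm(168)
ar1 <- as.numeric(arima.sim(model = list(ar = 0.7), n = 168))

# Ljung-Box test over the first 12 lags jointly, instead of the default single lag.
lb_wn  <- Box.test(wn,  lag = 12, type = "Ljung-Box")
lb_ar1 <- Box.test(ar1, lag = 12, type = "Ljung-Box")

# Sample autocorrelations at lags 1 to 12 show where the dependence sits.
ar1_acf <- acf(ar1, lag.max = 12, plot = FALSE)$acf[-1]

print(c(white_noise = lb_wn$p.value, ar1 = lb_ar1$p.value))
print(round(ar1_acf, 2))
```

A small p-value for the AR(1) series means the null hypothesis of no autocorrelation up to lag 12 is rejected. Looking at the individual autocorrelations afterwards helps to tell whether the dependence is short-lived or seasonal.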
I took a subset of the time series to model only post-COVID data (from January 2022 onwards), hoping the residuals would follow a white noise pattern. However, there is still autocorrelation in the residuals.
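exercise_subset <- window(gb_exercise, start = c(2022,1))
exercise_subset_decom <- decompose(exercise_subset, type = "additive")
plot(exercise_subset_decom)
#assumed definition of the subset residuals: the random component with the NA values at both ends removed
exercise_subset_resid <- na.omit(exercise_subset_decom$random)
tseries::adf.test(exercise_subset_resid)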
## Warning in tseries::adf.test(exercise_subset_resid): p-value smaller than
## printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: exercise_subset_resid
## Dickey-Fuller = -8.2438, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
Box.test(exercise_subset_resid, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: exercise_subset_resid
## X-squared = 45.014, df = 1, p-value = 1.957e-11
Meta Prophet is a time series forecasting tool based on an additive model. The package allows us to model yearly, weekly, and daily seasonality as well as holiday effects. It is open-source software released by Meta (formerly Facebook). We run Prophet on the time series to predict search volume for the next 5 years.
First, we model the complete time series using Prophet without subsetting it.
## Loading required package: Rcpp
## Loading required package: rlang
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
#Load the packages used below (zoo provides as.yearmon)
library(prophet)
library(zoo)
#Defining the time series for Prophet
ds <- as.Date(as.yearmon(time(gb_exercise))) #first day of each month as a Date
y <- as.numeric(gb_exercise)
exercise_df <- data.frame(ds, y)
#disable weekly and daily seasonality as time series data is expressed in months
prophet_m <- prophet(exercise_df, weekly.seasonality = FALSE, daily.seasonality = FALSE)
# 5 years is equivalent to 60 periods
prophet_time <- make_future_dataframe(prophet_m, periods = 60, freq = "month")
# make 5 years of predictions using the Prophet model
prophet_pred <- predict(prophet_m, prophet_time)
plot(prophet_m, prophet_pred, xlab = "Time", ylab = "Search volume") +
ggplot2::ggtitle("Time series modelled using Prophet")
It looks like the model predicts a reduction in search volume for exercise over the next 5 years. I am sceptical of this prediction, as the COVID-19 spike certainly distorts the data. To remove the impact of COVID-19, I applied Prophet separately to the pre-COVID and post-COVID time frames to see what trends are obtained.
exercise_subset <- window(gb_exercise, end = c(2019,12))
#subsets the time series for pre-2020 data
Prophet predicts that search volume will decrease over time. The decrease is driven by a reduction in search volume from 2017 to 2019. I would expect search volume to increase given increased awareness of the importance of exercise, but the data point to a decrease.
exercise_subset <- window(gb_exercise, start = c(2022,1))
#subsets the time series for data from January 2022 onwards
We see an increase in predicted search volume over the next 5 years. The increasing trend is driven by the increase in search volume from 2023 to 2025.
ARIMA is another widely used time series forecasting method, available in R through the forecast package. It combines autoregression, differencing, and moving average terms to model a time series. We apply ARIMA to the post-COVID data.
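library(forecast)
exercise_subset <- window(gb_exercise, start = c(2022,1))
#forecast() on a raw monthly ts fits an ETS model by default, so fit the ARIMA model explicitly
fit_arima <- forecast::auto.arima(exercise_subset)
forecast_arima <- forecast::forecast(fit_arima, h = 60)
time_arima <- seq(from = 2026, by = 1/12, length.out = 60)
plot(forecast_arima, main = "ARIMA forecast")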
ARIMA also predicts an increase in search volume for exercise in the next 5 years. However, the increase is less steep compared to Prophet’s forecast.
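To make the three ingredients of ARIMA more concrete, here is a minimal sketch using base R's arima() on a synthetic series. The series, the object names and the chosen order c(1, 1, 0) are assumptions for illustration. They are not the model selected for the real data.

```r
set.seed(4)
# Synthetic monthly series (assumption): the cumulative sum of an AR(1)
# process, so its first difference follows an AR(1) model.
ar_part <- as.numeric(arima.sim(model = list(ar = 0.5), n = 120))
demo_ts <- ts(50 + cumsum(ar_part), start = c(2016, 1), frequency = 12)

# order = c(p, d, q):
#   p = number of autoregressive terms,
#   d = number of times the series is differenced,
#   q = number of moving-average terms.
fit <- arima(demo_ts, order = c(1, 1, 0))

# Forecast 12 months ahead. predict() returns point forecasts and standard errors.
fc <- predict(fit, n.ahead = 12)
lower <- fc$pred - 1.96 * fc$se  # approximate 95% interval
upper <- fc$pred + 1.96 * fc$se

print(coef(fit))
print(round(cbind(forecast = fc$pred, lower = lower, upper = upper), 1))
```

The standard errors grow with the forecast horizon, which is why the interval widens the further ahead the model predicts. With only a few years of post-COVID data, a 5-year forecast interval can be expected to be wide.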
To test the modelled time series (from the Prophet and ARIMA methods), we can compare the forecasts with actual data from January to March 2026.
gb_exercise_26 <- c(54, 57, 47)
#Data points extracted directly from google trends instead of loading a csv file
gb_exercise_26 <- ts(gb_exercise_26, start = c(2026,1), frequency = 12)
gb_exercise_26 <- ts(c(gb_exercise, gb_exercise_26), start = c(2011,1), frequency = 12)
plot(gb_exercise_26, main = "Web search volume for exercise in GB", type = "l")
## Warning in check_tzones(e1, e2): 'tzone' attributes are inconsistent
plot(forecast_arima, main = "ARIMA forecast")
lines(gb_exercise_26, col="red")
lines(gb_exercise)
legend("topleft", legend=c("2011-2025 ts", "2026 ts", "ARIMA"), fill=c("black", "red", "deepskyblue"), cex = 0.6)
ARIMA is under-predicting search volume. In reality, more people in GB are searching the term “Exercise” than expected.
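The size of the under-prediction can also be put into numbers with a few standard error measures. In the sketch below, the actual values are the three 2026 data points given above, while forecast_demo holds hypothetical placeholder forecasts. They are not the values produced by the models in this post.

```r
# Actual values for January to March 2026, as given above.
actual <- c(54, 57, 47)

# Hypothetical point forecasts for the same months (placeholders only,
# NOT the output of the Prophet or ARIMA models).
forecast_demo <- c(50, 49, 48)

# Positive error means the forecast was too low (under-prediction).
errors <- actual - forecast_demo

mean_error <- mean(errors)                 # bias
mae  <- mean(abs(errors))                  # mean absolute error
rmse <- sqrt(mean(errors^2))               # root mean squared error
mape <- mean(abs(errors) / actual) * 100   # mean absolute percentage error

print(round(c(ME = mean_error, MAE = mae, RMSE = rmse, MAPE = mape), 2))
```

With the real objects, the first three point forecasts could be taken from forecast_arima$mean for ARIMA and from the yhat column of the Prophet prediction data frame for the matching dates. A positive mean error would confirm under-prediction. Three months is a short comparison window, so the measures should be read as indicative only.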
In summary, both the Prophet and ARIMA forecasts under-predict the actual search volume in early 2026, which suggests that the popularity of exercise has grown more than expected this year.