This project analyses Google Trends data for the search term “arsenal” to see how public interest in the football club changes over time. The dataset contains monthly search popularity values, which makes it possible to examine trends and fluctuations in interest. In this project I analyse these values and use Meta’s Prophet model to forecast future search interest.
The dataset comes from Google Trends and contains search interest values for the term “arsenal”. Google Trends reports popularity on a scale from 0 to 100, where 100 represents the highest level of search interest in the selected period. The dataset has two columns: ds, the date of each observation, and y, the search interest value. The data begins in January 2004, giving a long time series that can be used to explore trends and generate forecasts.
## # A tibble: 6 × 2
##   ds             y
##   <date>     <dbl>
## 1 2004-01-01     7
## 2 2004-02-01     8
## 3 2004-03-01     9
## 4 2004-04-01    10
## 5 2004-05-01     8
## 6 2004-06-01     5
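To make the 0 to 100 scale concrete, the sketch below rescales a set of made-up raw volumes so that the largest value in the window becomes 100. It is only an illustration of the final rescaling step: the `raw_volume` numbers are hypothetical, and Google's actual calculation works from sampled relative search shares rather than raw counts.

```r
# Hypothetical raw search volumes for six months (made-up numbers).
raw_volume <- c(350, 400, 450, 500, 400, 250)

# Rescale so that the largest value in the selected window becomes 100.
interest_index <- round(100 * raw_volume / max(raw_volume))
interest_index
# 70 80 90 100 80 50

# Rescaling a shorter window changes the index for the same months,
# because the maximum is now taken over different observations.
first_three <- raw_volume[1:3]
window_index <- round(100 * first_three / max(first_three))
window_index
# 78 89 100
```

The second result shows why index values are only comparable within one selected period: the same month scores 70 in the six-month window and 78 in the three-month window.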
library(ggplot2)  # provides ggplot(), aes(), geom_line() and labs()

# Line plot of monthly search interest over time
ggplot(arsenal_trends, aes(x = ds, y = y)) +
  geom_line() +
  labs(
    title = "Google Search Interest for Arsenal",
    x = "Date",
    y = "Search Interest"
  )

The plot shows how search interest in Arsenal has changed over time. There are clear fluctuations across the series, with periods of higher search activity and periods of lower interest. These changes may reflect football events such as title challenges, transfer windows and manager changes, or other moments that attracted increased media attention.
##
## Call:
## lm(formula = y ~ ds, data = arsenal_trends)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.771 -3.120 -0.594 1.935 64.080
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -40.402472 3.724999 -10.85 <2e-16 ***
## ds 0.003782 0.000224 16.89 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.585 on 265 degrees of freedom
## Multiple R-squared: 0.5184, Adjusted R-squared: 0.5165
## F-statistic: 285.2 on 1 and 265 DF, p-value: < 2.2e-16
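The linear regression results suggest that search interest in Arsenal has increased over time. The coefficient for the date variable is positive, which shows an upward trend in the series. Because ds is a date, the slope of 0.003782 is measured in index points per day, which is roughly 1.4 points per year. The very small p-value (below 2e-16) suggests that this trend is statistically significant. The adjusted R-squared of about 0.52 indicates that a straight line explains roughly half of the variation, so much of the month-to-month movement is not captured by the trend alone.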
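The regression above can be reproduced in outline as follows. This is a minimal sketch on synthetic data, not the project's dataset: `demo_trends`, `true_slope_per_day` and the noise level are assumptions made for illustration. It shows how a date predictor is handled by `lm()` and how the per-day slope converts to a per-year figure.

```r
# Synthetic monthly series standing in for arsenal_trends (illustrative only).
set.seed(1)
ds <- seq(as.Date("2004-01-01"), by = "month", length.out = 120)
true_slope_per_day <- 0.004
y <- 5 + true_slope_per_day * as.numeric(ds - ds[1]) + rnorm(length(ds), sd = 2)
demo_trends <- data.frame(ds = ds, y = y)

# lm() treats a Date predictor as a number of days,
# so the coefficient on ds is the change in y per day.
trend_fit <- lm(y ~ ds, data = demo_trends)
slope_per_day <- unname(coef(trend_fit)["ds"])

# Convert to a per-year change for easier interpretation.
slope_per_year <- slope_per_day * 365.25
slope_per_year
```

Applying the same conversion to the slope reported in the output gives 0.003782 × 365.25 ≈ 1.38 index points per year.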
library(prophet)  # provides prophet() and make_future_dataframe()

# Fit Prophet to the full series (expects columns ds and y)
arsenal_prophet_model <- prophet(arsenal_trends)

# Extend the date range by 24 monthly steps beyond the observed data
future_dates <- make_future_dataframe(
  arsenal_prophet_model,
  periods = 24,
  freq = "month"
)

# Predict over the historical and future dates, then plot the result
arsenal_forecast <- predict(arsenal_prophet_model, future_dates)
plot(arsenal_prophet_model, arsenal_forecast)

Prophet is a forecasting model developed by Meta for time series data. It works by separating the data into components such as trend and seasonal patterns, and then uses these to predict future values.

The forecast plot shows both the historical data and the predicted values. The results show an overall increase in search interest over time, along with fluctuations throughout the year. These within-year fluctuations suggest that search interest rises and falls with football-related events.

However, the model has some limitations. It assumes that past patterns will continue into the future, which may not always be the case. For example, events such as major transfers or important matches could cause changes in search interest that are difficult to predict.
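The idea of splitting a series into trend and seasonal components can be illustrated with base R. The sketch below uses `stl()` on a synthetic monthly series. This is a classical additive decomposition used as an analogy, not Prophet's own fitting procedure, and the series itself is made up for illustration.

```r
# Synthetic monthly series (illustrative only): linear trend + yearly cycle + noise.
set.seed(42)
n_months <- 120
month_index <- seq_len(n_months)
trend_part <- 10 + 0.1 * month_index
seasonal_part <- 3 * sin(2 * pi * month_index / 12)
y <- trend_part + seasonal_part + rnorm(n_months, sd = 0.5)

# frequency = 12 tells stl() that the seasonal cycle repeats every 12 months.
y_ts <- ts(y, start = c(2004, 1), frequency = 12)

# Additive decomposition: y = seasonal + trend + remainder.
decomp <- stl(y_ts, s.window = "periodic")
components <- decomp$time.series

head(components)
# plot(decomp)  # draws the data and the three components in stacked panels
```

The three columns add back up to the original series, which is the same additive idea behind the trend and seasonality terms that Prophet estimates.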
# Keep only observations from January 2018 onwards
recent_arsenal_trends <- subset(arsenal_trends, ds >= as.Date("2018-01-01"))

# Fit a second Prophet model to the recent window
recent_prophet_model <- prophet(recent_arsenal_trends)

# Extend the date range by 24 monthly steps beyond the observed data
recent_future <- make_future_dataframe(
  recent_prophet_model,
  periods = 24,
  freq = "month"
)

# Predict over the recent and future dates, then plot the result
recent_forecast <- predict(recent_prophet_model, recent_future)
plot(recent_prophet_model, recent_forecast)

The second forecast uses only observations from January 2018 onwards. This focuses the model on Arsenal’s recent popularity patterns, so it may produce different results from the model fitted to the full dataset.
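One way to see how the two models differ is to line up their predictions by date. The sketch below does this with mock tables, not real forecasts: `full_demo`, `recent_demo` and all numbers in them are made up, and only the ds and yhat column names are taken from Prophet's forecast output.

```r
# Mock forecast tables (made-up numbers) using the ds and yhat column names
# that appear in Prophet forecast output.
future_ds <- seq(as.Date("2030-01-01"), by = "month", length.out = 3)
full_demo <- data.frame(ds = future_ds, yhat = c(60, 62, 61))
recent_demo <- data.frame(ds = future_ds, yhat = c(70, 75, 73))

# Join on the date and label each yhat column by the model it came from.
comparison <- merge(full_demo, recent_demo, by = "ds",
                    suffixes = c("_full", "_recent"))

# Positive values mean the recent-window model predicts higher interest.
comparison$difference <- comparison$yhat_recent - comparison$yhat_full

comparison
```

The same join could in principle be applied to the ds and yhat columns of arsenal_forecast and recent_forecast, restricted to the dates the two forecasts share.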
This project analysed Google Trends data for the search term “arsenal” to understand how public interest has changed over time. The data was plotted to identify patterns, and a linear regression model was used to examine the overall trend. The Prophet model was then applied to generate forecasts of future search interest. The results suggest that search interest has increased over time, although there are fluctuations throughout the series. While the model provides useful information, the forecasts should be interpreted with care because search behaviour can be influenced by unpredictable football events. Overall, this analysis shows how time series methods can be used to study real-world data and generate forecasts, while also showing the limitations of relying on historical patterns alone.
Google Trends data: https://trends.google.com/trends/