How to identify trends & seasonality in time series data in R

This article walks through how to identify whether your time series data has trend or seasonality. First, let's define trend and seasonality.

What is trend? Trend in time series data is a pattern in which observations increase or decrease over time. Trends can change over time. If you work for an e-commerce site and want to know if sales are improving, the trend is what you would look at.

What is seasonality? Seasonality is a pattern that repeats at regular intervals, such as every week or every year. Seasonality in time series data makes forecasting more complex. One reason is that seasonality makes it difficult to understand trends in your data. For your e-commerce site, if you want to understand if sales are improving, periods of high shopping during Christmas can obscure how well or poorly the business is doing.

Before we dig in, let’s review three core concepts when talking about forecasting in time series data.

base: the best estimate of the level (in this case sales) without seasonality or other fluctuations
trend: the rate of increase (e.g. for monthly sales data, this would be the monthly rate of increase)
seasonal: the expected change based on the season that is either higher or lower than the mean level

What we want to do is estimate the base level of our data without trend and without seasonality. Then we determine the rate of increase from trend (e.g. the monthly rate of increase) and the effect of seasonality (how much sales rise or fall from the mean level depending on the season).

Essentially, when we build a forecast model, we break the time series pattern down into its individual components (base, trend, and seasonality) and use these to build our forecast equation.
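To make the three components concrete, here is a minimal base-R sketch that builds a synthetic monthly series as base + trend + seasonal + noise and then splits it apart again with `decompose()`. All numbers (the base of 1000, the rate of 10 per month, the seasonal pattern) are made up for illustration and are not taken from the retail data.

```r
# Synthetic monthly sales; every number here is an assumption for illustration.
set.seed(42)
n_months <- 48
t <- seq_len(n_months)

base <- 1000        # level without trend or seasonality
trend_rate <- 10    # increase per month
# Seasonal effect per calendar month (sums to zero, peak in November)
seasonal_pattern <- c(-50, -40, -20, 0, 10, 20, 10, 0, -10, 20, 80, -20)

seasonal <- rep(seasonal_pattern, length.out = n_months)
noise <- rnorm(n_months, sd = 5)

# Additive forecast equation: base + trend + seasonal (+ noise)
sales <- base + trend_rate * t + seasonal + noise
sales_ts <- ts(sales, start = c(2020, 1), frequency = 12)

# Classical additive decomposition recovers the components
parts <- decompose(sales_ts, type = "additive")
round(parts$figure, 1)   # estimated seasonal effect for each calendar month
plot(parts)              # panels: observed, trend, seasonal, random
```

The `trend` element of the result should rise by roughly `trend_rate` per month, and the `figure` element should resemble `seasonal_pattern`.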

Set up environment

# Install pacman if needed
if (!require("pacman")) install.packages("pacman")

## Loading required package: pacman

# load packages
pacman::p_load(pacman,
  tidyverse, openxlsx, forecast, modeltime, parsnip, rsample, timetk, xts, tidyquant, feasts, prophet)

Data Import

#Read CSV file
retail <- read_csv("https://raw.githubusercontent.com/PacktPublishing/Forecasting-Time-Series-Data-with-Facebook-Prophet/main/data/online_retail.csv")

## Rows: 1104 Columns: 2

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (1): total sales
## date (1): date

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#Check results

glimpse(retail)

## Rows: 1,104
## Columns: 2
## $ date          <date> 2009-12-01, 2009-12-02, 2009-12-03, 2009-12-04, 2009-12…
## $ `total sales` <dbl> 3106.0000, 3254.0000, 2951.0000, 2529.0000, 2644.5418, 1…

retail  <- retail %>% 
  rename(sales = `total sales`)

#Check results
names(retail)

## [1] "date"  "sales"

How to identify if your data has trend and/or seasonality

This step is mostly about visualizing your data. Spotting trends and seasonality can sometimes be easy. Does your data look like it’s moving up, down or flat? Are there a lot of peaks and valleys? However, sometimes it’s not clear what’s going on in the data, but there are quite a few tools to help make this task easier. We will cover a few common methods.

Method 1. Line plot with a smooth line

This is about plotting your data with a smooth line. This can be done using either:

Moving Average
Lowess Smoothing

Both methods essentially fit a smooth line to the data points.

ggplot(data = retail, aes(x = date, y = sales)) +
  geom_line() +
  geom_ma(ma_fun = SMA, n = 4, color = "orange") +
  geom_ma(ma_fun = SMA, n = 5, color = "lightblue") +
  theme_classic()

#From timetk package - visualize sales
retail %>%
  plot_time_series(date, sales, .interactive = TRUE, .smooth = TRUE)

#On a log scale instead
retail %>%
  plot_time_series(date, log(sales), .smooth = TRUE, .smooth_color = "#FFA500", .title = "Ecommerce Sales")

In both plots, the smooth line makes it much easier to see the sales trend in the data. We can see that there has been an upward trend in sales.
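For readers who want to see what the two smoothers compute without any plotting packages, here is a minimal base-R sketch on a made-up upward-trending series. The window length of 7 and the LOWESS span `f = 0.3` are arbitrary choices for illustration, and the moving average here is centered, which is an assumption of this sketch rather than the setting used in the plots above.

```r
# Made-up daily series with an upward trend plus noise (illustration only)
set.seed(1)
days <- 1:200
y <- 100 + 0.5 * days + rnorm(200, sd = 15)

# Centered 7-point simple moving average.
# The first and last 3 values are NA because the window is incomplete there.
ma7 <- stats::filter(y, rep(1 / 7, 7), sides = 2)

# LOWESS: locally weighted regression.
# f is the fraction of the data used for each local fit (larger = smoother).
lo <- lowess(days, y, f = 0.3)

plot(days, y, type = "l", col = "grey60", xlab = "Day", ylab = "Value")
lines(days, ma7, col = "orange", lwd = 2)   # moving average
lines(lo, col = "blue", lwd = 2)            # LOWESS curve
```

A wider window or a larger `f` gives a smoother line that follows the trend more closely and reacts less to short-term swings.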

Method 2. Components Analysis

What if there was a way to clearly view base, trend, and seasonality at the same time? We can leverage Facebook's Prophet package for visualization.

Prophet has a seasonality mode setting. Additive is the default and is used if no seasonality mode is specified.

df <- retail %>% 
  rename(ds= date,
         y = sales)

# Model fit
retail_model <- prophet(df, seasonality.mode = 'additive')

## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.

#Predict on historical data
#If we don't pass a future dataframe to predict(), predictions are made
#on the historical data, which is what we want.
retail_forecast <- predict(retail_model)

#Show components plot
prophet_plot_components(retail_model, retail_forecast)

#Forecast plot
plot(retail_model, retail_forecast)

There are 3 components: Trend, Weekly, Yearly. If you have a dataset that has at least two years of data, Prophet will automatically detect and isolate yearly seasonality components.

Trend shows that demand has been decreasing since 2010.

The yearly seasonality is somewhat wavy but stable, with peak demand in November and December (around 2000 higher in sales).

Another concept to understand with trend and seasonality is additive or multiplicative models.

Additive or multiplicative

Additive: the additive decomposition is the most appropriate if the magnitude of the seasonal fluctuations, or the variation around the trend-cycle, does not vary with the level of the time series.

Multiplicative: when the variation in the seasonal pattern, or the variation around the trend-cycle, appears to be proportional to the level of the time series, a multiplicative decomposition is more appropriate.
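One rough way to check which case applies is to compare the size of the seasonal swing early and late in the series. The sketch below uses R's built-in `AirPassengers` dataset (monthly airline passengers, 1949 to 1960) purely as an illustration; it is not the retail data, and measuring the swing as max minus min within each year is a simplification chosen for this example.

```r
# Built-in monthly series, 12 complete years (1949-1960)
ap <- as.numeric(AirPassengers)
year <- rep(1949:1960, each = 12)

# Seasonal swing per year: max minus min within each calendar year
swing_raw <- tapply(ap, year, function(x) max(x) - min(x))
swing_log <- tapply(log(ap), year, function(x) max(x) - min(x))

# If the raw swing grows with the level but the log swing stays roughly
# constant, the seasonality is closer to multiplicative than additive.
round(swing_raw[c("1949", "1960")], 1)
round(swing_log[c("1949", "1960")], 2)

# Classical multiplicative decomposition: seasonal factors are ratios around 1
dec <- decompose(AirPassengers, type = "multiplicative")
round(dec$figure, 2)
```

Taking logs turns a multiplicative pattern into an additive one, which is why the swing on the log scale is much more stable from year to year.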

For this exercise, we refit the same retail sales data using multiplicative seasonality.

#Rename columns
df <- retail %>% 
  rename(ds= date,
         y = sales)

# Model fit
retail_model_m <- prophet(df, seasonality.mode = 'multiplicative')

## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.

#Predict on historical data
retail_forecast_m <- predict(retail_model_m)

#Show components plot
prophet_plot_components(retail_model_m, retail_forecast_m)

#Forecast plot
plot(retail_model_m, retail_forecast_m)

There is a trend in this dataset. There are also day-of-week patterns: sales are higher on Mondays and Thursdays and lowest on Saturdays and Sundays. The yearly seasonality is somewhat wavy but stable, with peak demand in November and December (around 90% higher in sales). The multiplicative model is a better fit here since it models the magnitude of seasonality relative to the level of the series.

Method 3. Autocorrelation plots

Autocorrelation plots can also reveal trend and seasonality.

df <- as_tsibble(retail)

## Using `date` as index variable.

df %>%
  ACF(sales) %>%
  autoplot() +
  labs(title="Autocorrelation plot of retail data")

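How to interpret an autocorrelation plot when it looks like vertical lines on a screen: each vertical line shows the correlation between the series and a copy of itself shifted by that many lags. When data have a trend, the autocorrelations at small lags tend to be large and positive and decrease slowly as the lag increases. When data are seasonal, the autocorrelations are larger at the seasonal lags (multiples of the seasonal period) than at other lags (Hyndman & Athanasopoulos, 2021).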
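To see what trend and seasonality each look like in an autocorrelation plot, it can help to try two synthetic series where the answer is known. The sketch below uses base R's `acf()` rather than the `ACF()` function used above, and both series are made up: one has only a trend, the other only a 7-day cycle.

```r
set.seed(7)
n <- 210
t <- seq_len(n)

# Series 1: upward trend plus noise, no seasonality
trending <- 0.5 * t + rnorm(n, sd = 5)
# Series 2: 7-day cycle plus noise, no trend
weekly <- 10 * sin(2 * pi * t / 7) + rnorm(n, sd = 2)

acf_trend <- acf(trending, lag.max = 28, plot = FALSE)
acf_weekly <- acf(weekly, lag.max = 28, plot = FALSE)

# The first element of $acf is lag 0, so lag k sits at index k + 1
acf_trend$acf[2]    # lag 1 of the trending series
acf_weekly$acf[8]   # lag 7 of the weekly series (one full cycle)
acf_weekly$acf[4]   # lag 3 of the weekly series (off-cycle)

par(mfrow = c(1, 2))
plot(acf_trend, main = "Trend only")
plot(acf_weekly, main = "Weekly seasonality only")
```

The trending series shows tall bars that shrink slowly, while the weekly series shows bars that spike at lags 7, 14, 21, and 28.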

Final Summary

We walked through how to use line plots, component plots, and autocorrelation plots to identify trend and seasonality in time series data.

References

Winston, W. L. (2014) Marketing Analytics: Data-driven techniques with Microsoft Excel. Wiley. pp. 225-234.
Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3. Accessed on 11-27-2022.
https://www.bts.gov/content/us-passenger-miles. Accessed on 06-24-2023.

Common time series methods

1. naive
2. average
3. exponential smoothing
4. simple regression
5. ARIMA
6. neural networks

Which forecasting method to use, based on whether the data has seasonality and/or trend:

Pattern	Forecasting method
No seasonality, no trend	averages
No seasonality, with trend	averages, exponential smoothing (e.g. Holt), linear regression
Seasonality, no trend	exponential smoothing (e.g. Winters)
Seasonality, with trend	exponential smoothing (e.g. Holt-Winters)
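The exponential smoothing rows of this table can be sketched with base R's `HoltWinters()` function, which lets you switch the trend and seasonal components on or off. The example uses the built-in `co2` dataset (monthly CO2 readings with both trend and seasonality) only to show the syntax; fitting the simpler models to it is for comparison, not a recommendation.

```r
# Seasonality with trend: full Holt-Winters (level + trend + seasonal)
fit_hw <- HoltWinters(co2)

# No seasonality, with trend: Holt's method (seasonal component switched off)
fit_holt <- HoltWinters(co2, gamma = FALSE)

# No seasonality, no trend: simple exponential smoothing (level only)
fit_ses <- HoltWinters(co2, beta = FALSE, gamma = FALSE)

# Sum of squared one-step-ahead errors for each model (lower is better)
c(holt_winters = fit_hw$SSE, holt = fit_holt$SSE, ses = fit_ses$SSE)

# Forecast the next 12 months with the full model
fc <- predict(fit_hw, n.ahead = 12)
plot(fit_hw, fc)
```

Setting `beta = FALSE` while leaving the seasonal component on gives the remaining case, seasonality with no trend.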