Agenda

  • Introduction to time series analysis and forecasting
  • Time series objects - introduction to the time series classes and their attributes
  • Descriptive analysis of time series
  • Linear regression-based forecasting models
  • The ARIMA family of models

Today, we will mainly focus on methods for analyzing and forecasting regular time-series data with seasonality patterns

Quick poll

  • Used R?
  • Feel comfortable with linear regression?
  • Familiar with the forecast package?
  • Used ggplot2 or plotly?

Assumptions

  • Some background in R
  • Basic knowledge in probability
  • Familiar with linear regression

Why R?

  • Open source and free
  • Statistical programming language
  • A vast amount of packages for time series analysis
  • The forecast package (and soon the fable package)

Goals

By the end of this workshop, you probably won’t become an expert in time series analysis and forecasting, but you will be able to:

  • Explore time series data with some basic tools
  • Use descriptive statistics to identify seasonal and correlation patterns
  • Build basic forecasting models

Admin

Workshop material

All of today's slides, code, and R Markdown files are available on GitHub

Downloading the workshop material from the terminal:

git clone https://github.com/RamiKrispin/Time-Series-Workshop.git

Required packages

install.packages(c("forecast", "plotly", "ggplot2", "dplyr", "UKgrid", "fpp2", "shiny", "tsibble", "dygraphs"))

# We will use the dev version of TSstudio, as some of 
# the features we will use today are not yet on CRAN (will be by end of Oct)
devtools::install_github("RamiKrispin/TSstudio")

# This package is in early development mode; 
# we will use it for the linear regression shiny example
devtools::install_github("RamiKrispin/forecastML")

Introduction to time series analysis

Time series analysis is commonly used in many fields of science, such as economics, finance, physics, engineering, and astronomy. The use of time series analysis to understand past events and to predict future ones did not start with the introduction of stochastic processes during the past century. Ancient civilizations such as the Greeks, Romans, and Mayans researched and learned how to utilize cyclical events such as weather and astronomy to predict future events.

Time series analysis - the art of extracting meaningful insights from time-series data to learn about past events and to predict future events.

This process includes the following steps:

  • Data collection - pulling the raw data from a database, API, flat files, etc.
  • Data prep - cleaning, reformatting (dates, classes, etc.), aggregating
  • Descriptive analysis - using statistical methods and data visualization tools to extract insights and learn about the series components and patterns
  • Predictive analysis - leveraging the insights learned in the descriptive process and applying a predictive model

Generally, in R this process will look like this:

Of course, there are more great packages that could be part of this process, such as zoo, xts, bsts, forecastHybrid, prophet, etc.

The TSstudio package

The TSstudio package provides a set of functions for time series analysis, including interactive data visualization tools based on the plotly package engine, with support for multiple time series objects such as ts, xts, and zoo. The following diagram demonstrates the workflow of the TSstudio package:

Time series data

Time series data - a sequence of values, each associated with a unique point in time, that can be divided into the following two groups:

  • Regular time series - is a sequence of observations which were captured at equally spaced time intervals (e.g., every month, week, day, hour, etc.)
  • Irregular time series - or unevenly spaced time series, is a sequence of observations which were not captured on equally spaced time intervals (for example rainy days, earthquakes, clinical trials, etc.)

Note: typically, the term time series data refers to regular time-series data. Therefore, unless stated otherwise, throughout the workshop the term time series (or series) refers to regular time-series data

Examples of time series data

Applications

With time series analysis, you can answer questions such as:

  • Approximately how many vehicles are going to be sold in the US in the next 12 months?
  • What will be the estimated demand for natural gas in the US in the next five years?
  • Generally, what will be the demand for electricity in the UK during the next 24 hours?

Time series objects

There are multiple classes in R for time-series data, the most common types are:

  • The ts class for regular time-series data, and the mts class for multiple time series objects; these are the most common classes for time series data
  • The xts and zoo classes for both regular and irregular time series data, mainly popular in the financial field
  • The tsibble class, a tidy format for time series data, supporting both regular and irregular time-series data

The attributes of a time series object

A typical time series object should have the following attributes:

  • A vector or matrix of sequential observations
  • Index or timestamp
  • Frequency units
  • Cycle units

The frequency of the series represents the units of the cycle. For example, for a monthly series, the frequency units are the months of the year, and the cycle units are the years. Similarly, for a daily series, the frequency units could be the days of the year, and the cycle units are again the years.

The stats package provides a set of functions for handling and extracting information from a ts object. The frequency and cycle functions, as their names imply, return the frequency and the cycle of the object, respectively. Let's load the USgas series from the TSstudio package and apply those functions:

library(TSstudio)
data(USgas)

class(USgas)
## [1] "ts"
is.ts(USgas)
## [1] TRUE
frequency(USgas)
## [1] 12
cycle(USgas)
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2000   1   2   3   4   5   6   7   8   9  10  11  12
## 2001   1   2   3   4   5   6   7   8   9  10  11  12
## 2002   1   2   3   4   5   6   7   8   9  10  11  12
## 2003   1   2   3   4   5   6   7   8   9  10  11  12
## 2004   1   2   3   4   5   6   7   8   9  10  11  12
## 2005   1   2   3   4   5   6   7   8   9  10  11  12
## 2006   1   2   3   4   5   6   7   8   9  10  11  12
## 2007   1   2   3   4   5   6   7   8   9  10  11  12
## 2008   1   2   3   4   5   6   7   8   9  10  11  12
## 2009   1   2   3   4   5   6   7   8   9  10  11  12
## 2010   1   2   3   4   5   6   7   8   9  10  11  12
## 2011   1   2   3   4   5   6   7   8   9  10  11  12
## 2012   1   2   3   4   5   6   7   8   9  10  11  12
## 2013   1   2   3   4   5   6   7   8   9  10  11  12
## 2014   1   2   3   4   5   6   7   8   9  10  11  12
## 2015   1   2   3   4   5   6   7   8   9  10  11  12
## 2016   1   2   3   4   5   6   7   8   9  10  11  12
## 2017   1   2   3   4   5   6   7   8   9  10  11  12
## 2018   1   2   3   4   5   6   7   8   9  10  11  12
## 2019   1   2   3   4   5   6   7

The time function returns the series index or timestamp:

head(time(USgas))
## [1] 2000.0000 2000.0833 2000.1667 2000.2500 2000.3333 2000.4167

The deltat function returns the length of the series' time interval (which is equivalent to 1/frequency):

deltat(USgas)
## [1] 0.083333333

Similarly, the start and end functions return the starting and ending time of the series, respectively:

start(USgas)
## [1] 2000    1
end(USgas)
## [1] 2019    7

Here, the left number represents the cycle units, and the right number represents the frequency units of the series. The tsp function returns both the start and end of the series and its frequency:

tsp(USgas)
## [1] 2000.0 2019.5   12.0

Last but not least, the ts_info function from the TSstudio package returns a concise summary of the series:

ts_info(USgas)
##  The USgas series is a ts object with 1 variable and 235 observations
##  Frequency: 12 
##  Start time: 2000 1 
##  End time: 2019 7

Creating a ts object

The ts function allows you to create a ts object from a single vector, and an mts object from multiple vectors (or a matrix). Given the start (or end) and frequency of the series, the function generates the object's index. In the following example we will load the US_indicators dataset from the TSstudio package and convert it to a ts object. US_indicators is a data.frame with the monthly vehicle sales and unemployment rate in the US since 1976:

data(US_indicators)

head(US_indicators)
##         Date Vehicle Sales Unemployment Rate
## 1 1976-01-31         885.2               8.8
## 2 1976-02-29         994.7               8.7
## 3 1976-03-31        1243.6               8.1
## 4 1976-04-30        1191.2               7.4
## 5 1976-05-31        1203.2               6.8
## 6 1976-06-30        1254.7               8.0
mts_obj <- ts(data = US_indicators[, c("Vehicle Sales", "Unemployment Rate")], 
              start = c(1976, 1),
              frequency = 12)

ts_info(mts_obj)
##  The mts_obj series is a mts object with 2 variables and 524 observations
##  Frequency: 12 
##  Start time: 1976 1 
##  End time: 2019 8

How to define the start and frequency arguments?

| Series type | Cycle units | Frequency units | Frequency | Example |
|---|---|---|---|---|
| Quarterly | Years | Quarter of the year | 4 | ts(x, start = c(2019, 2), frequency = 4) |
| Monthly | Years | Month of the year | 12 | ts(x, start = c(2019, 1), frequency = 12) |
| Weekly | Years | Week of the year | 52 | ts(x, start = c(2019, 13), frequency = 52) |
| Daily | Years | Day of the year | 365 | ts(x, start = c(2019, 290), frequency = 365) |

What if you have more granular time series data such as half-hour, 15 or five minutes intervals?

Me when I needed to work with daily time series using a ts object:

The disadvantages of the ts object

The ts object was designed to work with monthly, quarterly, or yearly series that have only two time components (e.g., year and month). Yet, more granular (high-frequency) series may have more than two time components. A common example is a daily series, which has the following time attributes:

  • Year
  • Month
  • Day of the year
  • Day of the week

Moving to the hourly or minute level adds even more components, such as the hour, minute, etc.

The zoo and xts classes, and now the tsibble class, provide a solution for this issue.

The tsibble class

“The tsibble package provides a data infrastructure for tidy temporal data with wrangling tools…”

In other words, the tsibble object allows you to work with a data-frame-like object (i.e., a tbl object) that has a time-awareness attribute. The key characteristics of this class:

  • It has a date/time object as an index
  • It uses a key to store multiple time series (see the sketch below)
  • It is a tbl object - you can apply the usual tools for reformatting, cleaning, or modifying a tbl object, such as the dplyr functions
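
For illustration, a minimal sketch of a tsibble that stores two monthly series with a key column (the series names and values here are made up for the example):

library(dplyr)
library(tsibble)

set.seed(1234)

# A toy data frame with two monthly series identified by a "city" column
sales_df <- data.frame(
  month = rep(seq(as.Date("2019-01-01"), by = "month", length.out = 12), times = 2),
  city  = rep(c("Seattle", "Portland"), each = 12),
  sales = rpois(24, lambda = 100)
)

# index marks the time column; key identifies each individual series
sales_tsbl <- sales_df %>%
  mutate(month = yearmonth(month)) %>%
  as_tsibble(index = month, key = city)

sales_tsbl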

The reaction of me and my colleagues when the tsibble package was released:

Creating a tsibble object

library(UKgrid)

data(UKgrid)

class(UKgrid)
## [1] "data.frame"
head(UKgrid)
##             TIMESTAMP    ND I014_ND   TSD I014_TSD ENGLAND_WALES_DEMAND
## 1 2011-01-01 00:00:00 34606   34677 35648    35685                31058
## 2 2011-01-01 00:30:00 35092   35142 36089    36142                31460
## 3 2011-01-01 01:00:00 34725   34761 36256    36234                31109
## 4 2011-01-01 01:30:00 33649   33698 35628    35675                30174
## 5 2011-01-01 02:00:00 32644   32698 34752    34805                29253
## 6 2011-01-01 02:30:00 32092   32112 34134    34102                28688
##   EMBEDDED_WIND_GENERATION EMBEDDED_WIND_CAPACITY
## 1                      484                   1730
## 2                      520                   1730
## 3                      520                   1730
## 4                      512                   1730
## 5                      512                   1730
## 6                      464                   1730
##   EMBEDDED_SOLAR_GENERATION EMBEDDED_SOLAR_CAPACITY NON_BM_STOR
## 1                         0                      79           0
## 2                         0                      79           0
## 3                         0                      79           0
## 4                         0                      79           0
## 5                         0                      79           0
## 6                         0                      79           0
##   PUMP_STORAGE_PUMPING I014_PUMP_STORAGE_PUMPING FRENCH_FLOW BRITNED_FLOW
## 1                   60                        67        1939            0
## 2                   16                        20        1939            0
## 3                  549                       558        1989            0
## 4                  998                       997        1991            0
## 5                 1126                      1127        1992            0
## 6                 1061                      1066        1992            0
##   MOYLE_FLOW EAST_WEST_FLOW I014_FRENCH_FLOW I014_BRITNED_FLOW
## 1       -382              0             1922                 0
## 2       -381              0             1922                 0
## 3       -382              0             1974                 0
## 4       -381              0             1975                 0
## 5       -382              0             1975                 0
## 6       -381              0             1975                 0
##   I014_MOYLE_FLOW I014_EAST_WEST_FLOW
## 1            -382                   0
## 2            -381                   0
## 3            -382                   0
## 4            -381                   0
## 5            -382                   0
## 6            -381                   0
library(dplyr)
library(tsibble)
data(UKgrid)
uk_grid <- UKgrid %>% 
  dplyr::select(time = TIMESTAMP, 
                net_demand = ND) %>%
  tsibble::as_tsibble(index = time)
  

head(uk_grid)
## # A tsibble: 6 x 2 [30m] <UTC>
##   time                net_demand
##   <dttm>                   <int>
## 1 2011-01-01 00:00:00      34606
## 2 2011-01-01 00:30:00      35092
## 3 2011-01-01 01:00:00      34725
## 4 2011-01-01 01:30:00      33649
## 5 2011-01-01 02:00:00      32644
## 6 2011-01-01 02:30:00      32092
class(uk_grid)
## [1] "tbl_ts"     "tbl_df"     "tbl"        "data.frame"
index(uk_grid)
## time
tsibble::interval(uk_grid)
## 30m

Descriptive analysis of time series

Like most common fields of statistics and machine learning, the goal of the descriptive analysis is to reveal meaningful insights about the series with the use of descriptive statistics and data visualization tools.

Plotting time series object

The plot and plot.ts functions are R's built-in functions for plotting time series objects:

data("USVSales")

plot.ts(USVSales, 
        main = "US Monthly Total Vehicle Sales",
        ylab = "Thousands of units",
        xlab = "Date")

Alternatively, the ts_plot function from the TSstudio package provides an interactive visualization for time series objects (ts, xts, zoo, tsibble, etc.). This function uses the plotly package plotting engine:

ts_plot(USVSales, 
        title = "US Monthly Total Vehicle Sales",
        Ytitle = "Thousands of units",
        Xtitle = "Date",
        slider = TRUE)

The main advantage of interactive data visualization tools is that they allow you to zoom in on the data with a click of a button. This is super useful when working with data, and in particular with time-series data.

The dygraphs package is another great tool for visualizing time series data:

library(dygraphs)

dygraph(USVSales, 
        main = "US Monthly Total Vehicle Sales",
        ylab = "Thousands of units",
        xlab = "Date") %>% 
  dyRangeSelector()

The time series components

Time series data, typically, would have two types of patterns:

Structural patterns:

  • Trend - defines the general growth of the series and its rate (e.g., linear, exponential, etc.)
  • Cycle - derived from the broad definition of a cycle in macroeconomics. A cycle can be described as a sequence of repeatable events over time, where a cycle starts at a local minimum of the series and ends at the next local minimum, which is also the starting point of the following cycle.
  • Seasonal - defines the variation of the series that is related to the frequency units of the series

Nonstructural patterns

The irregular component - any other patterns that are not captured by the trend, cycle, and seasonal components, for example structural changes, non-seasonal holiday effects, etc.

Together, the structural and non-structural patterns compose the series, which can be formalized by the following expressions:

  • \(Y_t = T_t + C_t + S_t + I_t\), when the series has an additive structure, or

  • \(Y_t = T_t \times C_t \times S_t \times I_t\), when the series has a multiplicative structure

Applying a log transformation to a multiplicative series will transform it into an additive structure:

\(log(Y_t) = log(T_t) + log(C_t) + log(S_t) + log(I_t)\)

We typically either ignore the cycle or embed it with the trend component, therefore:

\[Y_t = T_t + S_t + I_t\]
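
For example, a quick sketch using the AirPassengers series (from base R's datasets package), which is commonly treated as having a multiplicative structure:

# The seasonal swings of AirPassengers grow with the level of the series (multiplicative);
# after the log transformation the seasonal swings are roughly constant (additive)
data(AirPassengers)

par(mfrow = c(2, 1))
plot.ts(AirPassengers, main = "Original series (multiplicative)")
plot.ts(log(AirPassengers), main = "Log-transformed series (additive)")
par(mfrow = c(1, 1))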

Decomposition of time series object

The decompose function from the stats package decomposes a time series into seasonal, trend, and irregular components using moving averages. The ts_decompose function from the TSstudio package provides an interactive wrapper for the decompose function:

ts_decompose(USgas)

Seasonal analysis

Seasonality is one of the most dominant components in time series data (when it exists), and it is derived from the frequency units of the series (e.g., the month of the year for monthly time series data)

Furthermore, as funny as it may sound, most of the seasonal patterns in nature are related to two astrophysical phenomena:

  • The orbit of Earth around the Sun (also known as the orbital period of Earth), which is defined as 365 days
  • The rotation of Earth (or solar day) with a length of 86,400 seconds or 24 hours

For instance, the seasonality patterns of natural phenomena such as weather (temperature, rain, and snowfall), sunrise and sunset times, or the tide level are dictated directly by the orbital period and the solar day of Earth.

Seasonal types:

  • Single seasonal pattern: Whenever there is only one dominant seasonal pattern in the series
  • Multiple seasonal patterns: If more than one dominant seasonal pattern exists in the series

Data visualization helps to identify seasonal patterns in the series:

library(TSstudio)

data(USgas)

USgas_df <- data.frame(year = floor(time(USgas)),
                       month = cycle(USgas),
                       USgas = as.numeric(USgas))

# Setting the month abbreviation and transforming it to a factor
USgas_df$month <- factor(month.abb[USgas_df$month], levels = month.abb)

library(ggplot2)

ggplot(USgas_df, aes(x = USgas)) +
  geom_density(aes(fill = month)) +
  ggtitle("USgas - Kernel Density Estimates by Month") +
  facet_grid(rows = vars(as.factor(month)))

Note that this plot still reflects the trend of the series; let's detrend the series and replot it:

USgas_df <- data.frame(year = floor(time(USgas)),
                       month = cycle(USgas),
                       USgas = as.numeric(USgas - decompose(USgas)$trend))

# Setting the month abbreviation and transforming it to a factor
USgas_df$month <- factor(month.abb[USgas_df$month], levels = month.abb)

library(ggplot2)

ggplot(USgas_df, aes(x = USgas)) +
  geom_density(aes(fill = month)) +
  ggtitle("USgas - Kernel Density Estimates by Month") +
  facet_grid(rows = vars(as.factor(month)))

The ts_seasonal function is a customized function for seasonal plots of low-frequency time series data (e.g., daily, monthly, quarterly):

ts_seasonal(USgas, type = "normal")
ts_seasonal(USgas, type = "cycle")
ts_seasonal(USgas, type = "box")

When the type argument is set to "all", the three plots are combined. Let's again detrend the series and look at its seasonal component:

USgas_decompose <- USgas - decompose(USgas)$trend
ts_seasonal(USgas_decompose, type = "all")

Similarly, we can use a heatmap, surface, or polar plots:

ts_heatmap(USgas, color = "Reds")
ts_heatmap(USVSales, color = "Reds")
ts_surface(USgas)
ts_polar(USgas)
library(UKgrid)
UKgrid_df <- extract_grid(type = "data.frame",
                              columns = "ND",
                              aggregate = "hourly",
                              na.rm = TRUE)
ts_quantile(UKgrid_df)
ts_quantile(UKgrid_df, period = "weekdays", n = 2)
ts_quantile(UKgrid_df, period = "monthly", n = 2)

Correlation Analysis

Due to the continuous and chronologically ordered nature of time series data, there is a likelihood that there will be some degree of correlation between the series observations. For instance, the temperature in the next hour is not a random event since, in most cases, it has a strong relationship with the current temperature or the temperatures that have occurred during the past 24 hours.

In the context of forecasting and time series, we love correlated time series data!

The acf and pacf functions from the stats package plot the autocorrelation and partial autocorrelation of the series with its lags:

UKgrid_daily <- extract_grid(type = "ts", aggregate = "daily")

acf(UKgrid_daily)

pacf(UKgrid_daily)

The ts_cor function from the TSstudio package returns a more detailed plot of the ACF and PACF of the series:

ts_cor(UKgrid_daily, lag.max = 365 * 2)
ts_cor(UKgrid_daily, lag.max = 365 * 2, seasonal_lags = 7)

A more intuitive way to visualize correlation is the lag plot, which plots the series against its past lags:

ts_lags(USgas)
ts_lags(USgas, lags = c(12, 24, 36))

Forecasting with linear regression

The primary usage of the linear regression model is to quantify the relationship between the dependent variable Y (also known as the response variable) and the independent variable(s) X (also known as the predictor, driver, or regressor variables) in a linear manner.

  • In the case of a single independent variable:

\[Y_{i} = \beta_{0} + \beta_{1}\times X_{1,i} + \epsilon_{i}\]

  • For n independent variables:

\[Y_{i} = \beta_{0} + \beta_{1}\times X_{1,i} + \beta_{2}\times X_{2,i} + ...+ \beta_{n}\times X_{n,i} + \epsilon_{i}\]

  • i represents the observation index, i = 1,…, N
  • \(Y_{i}\) the i-th observation of the dependent variable
  • \(X_{j,i}\) the i-th value of the j-th independent variable, where j = 1,…, n
  • \(\beta_{0}\) the value of the constant term (or intercept)
  • \(\beta_{j}\) the corresponding parameter (or coefficient) of the j-th independent variable, and
  • \(\epsilon_{i}\) defines the error term, which is nothing but the information that was not captured by the independent variables for the i-th observation

Big misconception

The term linear, in the context of regression, refers to the model coefficients, which must follow a linear structure (as this allows us to construct a linear combination of the independent variables). The independent variables themselves, on the other hand, can enter the model in either a linear or non-linear form.
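
For instance, a minimal sketch (with simulated data) of a model that is linear in its coefficients even though one of its regressors enters non-linearly:

set.seed(1234)

# Simulated data where y depends on x and on exp(x)
x <- runif(200, min = 0, max = 2)
y <- 2 + 3 * x + 0.5 * exp(x) + rnorm(200, sd = 0.2)

# Still a linear regression: the coefficients enter linearly,
# even though exp(x) is a non-linear transformation of the regressor
fit <- lm(y ~ x + I(exp(x)))
summary(fit)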

The Ordinary Least Squares (OLS) method is a simple optimization method based on basic linear algebra and calculus, or matrix calculus (this section is for general knowledge; if you are not familiar with matrix calculus, you can skip it). The goal of OLS is to identify the coefficients that minimize the residual sum of squares. If the residual of the i-th observation is defined as:

\[\hat{\epsilon_{i}} = Y_{i} - \hat{Y_{i}}\]

Then we can set the cost function by the following expression:

\[\sum_{i=1}^N \hat{\epsilon}_{i}^2 = (Y_{1} - \hat{Y}_{1})^2 + (Y_{2} - \hat{Y}_{2})^2 + ... + (Y_{N} - \hat{Y}_{N})^2\]

Before applying the OLS method to minimize the residual sum of squares, for simplicity, we will rewrite the cost function in matrix form:

\(\mathbf{Y} = \left[\begin{array}{c} Y_{1} \\ Y_{2} \\ \vdots \\ Y_{N} \end{array}\right]\), \(\mathbf{X} = \left[\begin{array}{cccc} 1 & X_{1,1} & \cdots & X_{1,n} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{N,1} & \cdots & X_{N,n} \end{array}\right]\), \(\mathbf{\beta} = \left[\begin{array}{c} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{n} \end{array}\right]\), \(\mathbf{\epsilon} = Y - X\beta = \left[\begin{array}{c} \epsilon_{1} \\ \epsilon_{2} \\ \vdots \\ \epsilon_{N} \end{array}\right]\)

where this set of matrices represents the following:

  • Y - a vector (or \(N\times 1\) matrix), representing the dependent variable with N observations
  • X - an \(N\times (n+1)\) matrix, representing the corresponding n independent variables and a column of 1's for the intercept component (\(\beta_{0}\))
  • \(\beta\) - a vector or \((n+1)\times 1\) matrix, representing the model coefficients
  • \(\epsilon\) - a vector or \(N\times 1\) matrix, representing the corresponding error term, i.e., the difference between \(Y\) and \(X\beta\)

Note: the residual term \(\hat{\epsilon_{i}}\) should not be confused with the error term \(\epsilon_{i}\). While the first represents the difference between the series Y and its estimate \(\hat{Y}\), the second (error term) represents the difference between the series and its expected value

Let’s set the cost function using the matrix form as we defined above: \[\sum {\epsilon^2} = {\epsilon}^T{\epsilon}\]

We will now start to expand this expression by using the formula of \(\epsilon\) as we outlined above:

\[{\epsilon}^T{\epsilon} = (Y-X\beta)^T(Y-X\beta)\]

Next, we will multiply the two components (\(\epsilon^T\) and \(\epsilon\)) and open the brackets:

\[\epsilon ^{T}\epsilon = Y^TY-2Y^TX\beta+\beta^TX^TX\beta\]

Since our goal is to find the \(\beta\) that minimizes this equation, we will differentiate the equation with respect to \(\beta\) and then set it to zero:

\[ \frac{\partial\epsilon ^{T}\epsilon}{\partial \beta}=\frac{\partial (Y^TY-2Y^TX\beta+\beta^TX^TX\beta)}{\partial\beta} = 0\]

Solving this equation will yield the following output:

\[X^TX\beta=X^TY\] Manipulating this equation allows us to extract the \(\hat{\beta}\) matrix, the estimate of the coefficient matrix \(\beta\):

\[\hat\beta=(X^TX)^{-1}X^TY\]

Note that we changed the notation of \(\beta\) to \(\hat\beta\) in the final output, as it represents the estimate of the true value of \(\beta\).
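
A minimal sketch of this result in R, computing \(\hat\beta\) with the closed-form expression on simulated data and comparing it to lm:

set.seed(1234)

# Simulate a simple linear model with two independent variables
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1.5 + 2 * x1 - 0.7 * x2 + rnorm(n)

# Design matrix with a column of 1's for the intercept
X <- cbind(1, x1, x2)

# The OLS closed-form solution: beta_hat = (X'X)^-1 X'y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta_hat

# Should match the coefficients returned by lm
coef(lm(y ~ x1 + x2))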

The key properties of the OLS coefficients estimation:

  • The main feature of the OLS coefficients estimation method is the unbiasedness of the coefficients estimation with respect to the actual values. In other words, for any given \(\hat\beta_{i}\), \(E(\hat\beta_{i}) = \beta_{i}\)
  • The sample regression line will always go through the mean of X and Y
  • The mean of \(\hat{Y}\) is equal to \(\overline{Y}\), the mean of the dependent variable
  • The mean of the residuals vector \(\hat\epsilon\) is equal to zero, or \(\frac{\sum_{i = 1}^{N}\hat\epsilon_{i}}{N} = 0\)

The OLS assumptions

  • The model coefficients must follow a linear structure (e.g., \(Y = \hat\beta_{0} + \hat\beta_{1} e^{X}\) is a linear model, but \(Y = \hat\beta_{0} + X_{1}^{\hat\beta_{1}}\) is not)
  • There is no perfect collinearity between the independent variables, \(X_{1}, X_{2},...,X_{n}\). In other words, none of the independent variables is a linear combination of any of the other independent variables
  • All the independent variables must have a non-zero variance (i.e., they are not constant)
  • The error term \(\epsilon\), conditional on the matrix of independent variables X, is independent and identically distributed (i.i.d.) with mean 0 and constant variance \(\sigma^2\)
  • Both the dependent and independent variables are drawn from the population in a random sample. This assumption does not hold when regressing time series data, as typically the observations have some degree of correlation; therefore, it is relaxed when regressing time series data.

Transforming time series problem into linear regression problem

We want to represent the different components of the series in a linear model formation:

\(Y_{t} = T_{t} + S_{t} + C_{t} + I_{t}\), when the series has an additive structure, or

\(Y_{t} = T_{t} \times S_{t} \times C_{t} \times I_{t}\), when the series has a multiplicative structure

Feature creation is the name of the game here…
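
For example, a minimal sketch using the tslm function from the forecast package, which creates the trend and seasonal (frequency-unit) features from the ts object for you:

library(forecast)
library(TSstudio)

data(USgas)

# tslm builds trend and seasonal dummy features from the ts index
md <- tslm(USgas ~ trend + season)
summary(md)

# Forecast the next 12 months and plot the result
fc <- forecast(md, h = 12)
plot(fc)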

Shiny example

To launch the shiny example, please run the following code:

source("https://raw.githubusercontent.com/RamiKrispin/Time-Series-Workshop/master/Linear%20regression%20model%20-%20shiny%20example.R")

shinyApp(ui = ui, server = server)

ARIMA

Brief Overview

  • White Noise and Random Walks
  • Autoregressive (AR) models
  • Moving average (MA) models
  • Autoregressive moving average (ARMA) models
  • Using ACF & PACF for model ID

Goals

Leave this room with a shallow understanding of

  • Stationarity
  • Differencing
  • ARMA Identification
| Model | ACF | PACF |
|---|---|---|
| AR(p) | Tails off slowly | Cuts off after lag p |
| MA(q) | Cuts off after lag q | Tails off slowly |
| ARMA(p,q) | Tails off slowly | Tails off slowly |
  • ARIMA
  • Seasonal ARIMA
  • Linear Regression with ARIMA errors

Gaussian white noise

We often assume so-called Gaussian white noise, whereby

\[ w_t \sim \text{N}(0,\sigma^2) \]

and the following apply as well

  • autocovariance: \[ \gamma_k = \begin{cases} \sigma^2 & \text{if } k = 0 \\ 0 & \text{if } k \geq 1 \end{cases} \]
  • autocorrelation: \[ \rho_k = \begin{cases} 1 & \text{if } k = 0 \\ 0 & \text{if } k \geq 1 \end{cases} \]
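
A minimal sketch of simulating Gaussian white noise and checking that its sample ACF matches these properties:

set.seed(1234)

# Simulate 500 observations of Gaussian white noise with sigma = 1
wn <- rnorm(500, mean = 0, sd = 1)

plot.ts(wn, main = "Gaussian white noise", ylab = "w_t")

# The sample ACF should be ~1 at lag 0 and ~0 at all other lags
acf(wn)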

Random walk (RW)

A time series \(\{x_t\}\) is a random walk if

  1. \(x_t = x_{t-1} + w_t\)
  2. \(w_t\) is white noise

Biased random walk

A biased random walk (or random walk with drift) is written as

\[ x_t = x_{t-1} + u + w_t \]

where \(u\) is the bias (drift) per time step and \(w_t\) is white noise

Differencing a biased random walk

First-differencing a biased random walk yields a constant mean (level) \(u\) plus white noise

\[ \begin{align} x_t &= x_{t-1} + u + w_t \\ x_t - x_{t-1} &= x_{t-1} - x_{t-1} + u + w_t \\ x_t - x_{t-1} &= u + w_t \\ \Delta x_t &= u + w_t \end{align} \]
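
A minimal sketch of this result, simulating a biased random walk and recovering the drift plus white noise by first-differencing:

set.seed(1234)

# Simulate a biased random walk: x_t = x_{t-1} + u + w_t
u  <- 0.5
wt <- rnorm(500)
xt <- cumsum(u + wt)

par(mfrow = c(2, 1))
plot.ts(xt, main = "Biased random walk", ylab = "x_t")

# The first difference should look like white noise around the drift u
plot.ts(diff(xt), main = "First-differenced series", ylab = "diff(x_t)")
abline(h = u, col = "red")
par(mfrow = c(1, 1))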


Autoregressive (AR) models

Autoregressive models treat the current state of nature as a function of its past state(s)

An autoregressive model of order p, or AR(p), is defined as

\[ x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + w_t \]

where we assume

  1. \(w_t\) is white noise
  2. \(\phi_p \neq 0\) for an order-p process

Examples of AR(p) models

  • AR(1)
    • \(x_t = 0.5 x_{t-1} + w_t\)
  • AR(1) with \(\phi_1 = 1\) (random walk)
    • \(x_t = x_{t-1} + w_t\)
  • AR(2)
    • \(x_t = -0.2 x_{t-1} + 0.4 x_{t-2} + w_t\)
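
A minimal sketch of simulating the stationary examples above with arima.sim (the random walk has to be built manually, since arima.sim requires a stationary AR component):

set.seed(1234)

# AR(1) with phi_1 = 0.5
ar1 <- arima.sim(model = list(ar = 0.5), n = 500)

# AR(2) with phi_1 = -0.2 and phi_2 = 0.4
ar2 <- arima.sim(model = list(ar = c(-0.2, 0.4)), n = 500)

# A random walk (AR(1) with phi_1 = 1) built manually
rw <- cumsum(rnorm(500))

par(mfrow = c(3, 1))
plot.ts(ar1, main = "AR(1), phi = 0.5")
plot.ts(ar2, main = "AR(2), phi = (-0.2, 0.4)")
plot.ts(rw,  main = "Random walk")
par(mfrow = c(1, 1))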

Stationary AR(p) models

Stationarity means ‘not changing in time’ in the context of time-series models. More generally, a time series is ‘stationary’ if all of its statistical properties are constant in time.

Specifically, stationary time series have the following properties

  1. no systematic change in the mean or variance
  2. no systematic trend
  3. no periodic variations or seasonality

We seek a means for identifying whether our AR(p) models are also stationary

We can write out an AR(p) model using the backshift operator

\[ x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + w_t \\ \Downarrow \\ \begin{align} x_t - \phi_1 x_{t-1} - \phi_2 x_{t-2} - \dots - \phi_p x_{t-p} &= w_t \\ (1 - \phi_1 \mathbf{B} - \phi_2 \mathbf{B}^2 - \dots - \phi_p \mathbf{B}^p) x_t &= w_t \\ \phi_p (\mathbf{B}) x_t &= w_t \\ \end{align} \]

If we treat \(\mathbf{B}\) as a number (or numbers), we can write out the characteristic equation as

\[ \phi_p (\mathbf{B}) x_t = w_t \\ \Downarrow \\ \phi_p (\mathbf{B}) = 0 \]

To be stationary, all roots of the characteristic equation must exceed 1 in absolute value

Example: Consider the AR(1) model from earlier

\[ \begin{align} x_t &= 0.5 x_{t-1} + w_t \\ x_t - 0.5 x_{t-1} &= w_t \\ (1 - 0.5 \mathbf{B})x_t &= w_t \\ \Downarrow \\ 1 - 0.5 \mathbf{B} &= 0 \\ -0.5 \mathbf{B} &= -1 \\ \mathbf{B} &= 2 \\ \end{align} \]

This model is stationary because \(\mathbf{B} > 1\)

What about this AR(2) model from earlier?

\[ \begin{align} x_t &= -0.2 x_{t-1} + 0.4 x_{t-2} + w_t \\ x_t + 0.2 x_{t-1} - 0.4 x_{t-2} &= w_t \\ (1 + 0.2 \mathbf{B} - 0.4 \mathbf{B}^2)x_t &= w_t \\ \Downarrow \\ 1 + 0.2 \mathbf{B} - 0.4 \mathbf{B}^2 &= 0 \\ \Downarrow \\ \mathbf{B} \approx -1.35 ~ \text{and}& ~ \mathbf{B} \approx 1.85 \end{align} \]

This model is also stationary because both roots exceed 1 in absolute value (\(|-1.35| > 1\) and \(1.85 > 1\))
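
We can verify these roots in R with the polyroot function, which takes the polynomial coefficients in increasing order (a small sketch):

# AR(1): 1 - 0.5B = 0
polyroot(c(1, -0.5))
## [1] 2+0i

# AR(2): 1 + 0.2B - 0.4B^2 = 0
polyroot(c(1, 0.2, -0.4))

# Stationarity requires all roots to exceed 1 in absolute value
abs(polyroot(c(1, 0.2, -0.4)))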

What about random walks?

Consider our random walk model

\[ \begin{align} x_t &= x_{t-1} + w_t \\ x_t - x_{t-1} &= w_t \\ (1 - 1 \mathbf{B})x_t &= w_t \\ \Downarrow \\ 1 - 1 \mathbf{B} &= 0 \\ -1 \mathbf{B} &= -1 \\ \mathbf{B} &= 1 \\ \end{align} \]

Random walks are not stationary because \(\mathbf{B} = 1 \ngtr 1\)

Coefficients of AR(1) models

Same value, but different sign

Both positive, but different magnitude


ACF & PACF for AR(p) models

Autocorrelation function (ACF)

Recall that the autocorrelation function (\(\rho_k\)) measures the correlation between \(\{x_t\}\) and a shifted version of itself \(\{x_{t+k}\}\)

ACF for AR(1) models

ACF oscillates for model with \(-\phi\)

For model with large \(\phi\), ACF has longer tail

Partial autocorrelation function (PACF)

Recall that the partial autocorrelation function (\(\phi_k\)) measures the correlation between \(\{x_t\}\) and a shifted version of itself \(\{x_{t+k}\}\), with the linear dependence of \(\{x_{t-1},x_{t-2},\dots,x_{t-k-1}\}\) removed

PACF for AR(p) models

Do you see the link between the order p and lag k?

Using ACF & PACF for model ID (1)

| Model | ACF | PACF |
|---|---|---|
| AR(p) | Tails off slowly | Cuts off after lag p |

Moving average (MA) models

Moving average models are most commonly used for forecasting a future state

A moving average model of order q, or MA(q), is defined as

\[ x_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \dots + \theta_q w_{t-q} \]

where \(w_t\) is white noise

Each of the \(x_t\) is a sum of the most recent error terms

Thus, all MA processes are stationary because they are finite sums of stationary WN processes

Examples of MA(q) models
Compare MA(1) & MA(2) with similar structure
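
A minimal sketch of simulating comparable MA(1) and MA(2) series with arima.sim (the coefficients are chosen arbitrarily for illustration):

set.seed(1234)

# MA(1) with theta_1 = 0.7
ma1 <- arima.sim(model = list(ma = 0.7), n = 500)

# MA(2) with theta_1 = 0.7 and theta_2 = 0.2
ma2 <- arima.sim(model = list(ma = c(0.7, 0.2)), n = 500)

par(mfrow = c(2, 1))
plot.ts(ma1, main = "MA(1), theta = 0.7")
plot.ts(ma2, main = "MA(2), theta = (0.7, 0.2)")
par(mfrow = c(1, 1))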

AR(p) model as an MA(\(\infty\)) model

It is possible to write an AR(p) model as an MA(\(\infty\)) model

For example, consider an AR(1) model

\[ \begin{align} x_t &= \phi x_{t-1} + w_t \\ x_t &= \phi (\phi x_{t-2} + w_{t-1}) + w_t \\ x_t &= \phi^2 x_{t-2} + \phi w_{t-1} + w_t \\ x_t &= \phi^3 x_{t-3} + \phi^2 w_{t-2} + \phi w_{t-1} + w_t \\ & \Downarrow \\ x_t &= w_t + \phi w_{t-1}+ \phi^2 w_{t-2} + \dots + \phi^k w_{t-k} + \phi^{k+1} x_{t-k-1} \end{align} \]

If our AR(1) model is stationary, then

\[ \lvert \phi \rvert < 1 ~ \Rightarrow ~ \lim_{k \to \infty} \phi^{k+1} = 0 \]

so

\[ \begin{align} x_t &= w_t + \phi w_{t-1}+ \phi^2 w_{t-2} + \dots + \phi^k w_{t-k} + \phi^{k+1} x_{t-k-1} \\ & \Downarrow \\ x_t &= w_t + \phi w_{t-1}+ \phi^2 w_{t-2} + \dots + \phi^k w_{t-k} \end{align} \]

Invertible MA(q) models

An MA(q) process is invertible if it can be written as a stationary autoregressive process of infinite order without an error term

For example, consider an MA(1) model

\[ \begin{align} x_t &= w_t + \theta w_{t-1} \\ & \Downarrow \\ w_t &= x_t - \theta w_{t-1} \\ w_t &= x_t - \theta (x_{t-1} - \theta w_{t-2}) \\ w_t &= x_t - \theta x_{t-1} - \theta^2 w_{t-2} \\ & ~~\vdots \\ w_t &= x_t - \theta x_{t-1} + \dots + (-\theta)^k x_{t-k} + (-\theta)^{k+1} w_{t-k-1} \\ \end{align} \]

If we constrain \(\lvert \theta \rvert < 1\), then

\[ \lim_{k \to \infty} (-\theta)^{k+1} w_{t-k-1} = 0 \]

and

\[ \begin{align} w_t &= x_t - \theta x_{t-1} + \dots + (-\theta)^k x_{t-k} + (-\theta)^{k+1} w_{t-k-1} \\ & \Downarrow \\ w_t &= x_t - \theta x_{t-1} + \dots + (-\theta)^k x_{t-k} \\ w_t &= x_t + \sum_{k=1}^\infty(-\theta)^k x_{t-k} \end{align} \]

Q: Why do we care if an MA(q) model is invertible?

A: It helps us identify the model’s parameters

For example, these MA(1) models are equivalent:
  • \(x_t = w_t + \frac{1}{5} w_{t-1}, ~\text{with} ~w_t \sim ~\text{N}(0,25)\)
  • \(x_t = w_t + 5 w_{t-1}, ~\text{with} ~w_t \sim ~\text{N}(0,1)\)

But we select the first model because it is invertible (its coefficient is less than 1 in absolute value).

ACF & PACF for MA(q) models

ACF for MA(q) models

Do you see the link between the order q and lag k?

Using ACF & PACF for model ID (2)

| Model | ACF | PACF |
|---|---|---|
| AR(p) | Tails off slowly | Cuts off after lag p |
| MA(q) | Cuts off after lag q | Tails off slowly |

Autoregressive moving average models (ARMA)

An autoregressive moving average, or ARMA(p,q), model is written as

\[ x_t = \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \dots + \theta_q w_{t-q} \]

We can write an ARMA(p,q) model using the backshift operator

\[ \phi_p (\mathbf{B}) x_t= \theta_q (\mathbf{B}) w_t \]

ARMA models are stationary if all roots of \(\phi_p (\mathbf{B}) = 0\) exceed 1 in absolute value

ARMA models are invertible if all roots of \(\theta_q (\mathbf{B}) = 0\) exceed 1 in absolute value

Examples of ARMA(p,q) models

ACF for ARMA(p,q) models

PACF for ARMA(p,q) models

Using ACF & PACF for model ID (3)

| Model | ACF | PACF |
|---|---|---|
| AR(p) | Tails off slowly | Cuts off after lag p |
| MA(q) | Cuts off after lag q | Tails off slowly |
| ARMA(p,q) | Tails off slowly | Tails off slowly |

Autoregressive integrated moving average (ARIMA) models

If the data do not appear stationary, differencing can help

This leads to the class of autoregressive integrated moving average (ARIMA) models

ARIMA models are indexed with orders (p,d,q) where d indicates the order of differencing

ARIMA(p,d,q) models

For \(d > 0\), \(\{x_t\}\) is an ARIMA(p,d,q) process if \((1-\mathbf{B})^d x_t\) is an ARMA(p,q) process

For example, if \(\{x_t\}\) is an ARIMA(1,1,0) process then \(\Delta \{x_t\}\) is an ARMA(1,0) = AR(1) process
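
A minimal sketch of this relationship, simulating an ARIMA(1,1,0) series (with an arbitrary \(\phi_1 = 0.6\)) and fitting an AR(1) model to its first difference:

set.seed(1234)

# Simulate an ARIMA(1,1,0) process: the differenced series is AR(1)
x <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.6), n = 500)

# Fit an AR(1) to the first difference of x
arima(diff(x), order = c(1, 0, 0))

# Or fit the ARIMA(1,1,0) directly to x
arima(x, order = c(1, 1, 0))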


Automating the Box-Jenkins method

Box-Jenkins Method

  • Model form selection
    • Evaluate stationarity
    • Selection of the differencing level (d) – to fix stationarity problems
    • Selection of the AR level (p)
    • Selection of the MA level (q)
  • Parameter estimation
  • Model checking

Fortunately, the Box-Jenkins method is automated with the forecast package function forecast::auto.arima.

forecast::auto.arima

Let’s look at some examples of auto.arima in action using simulated data.
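
The original simulations are not reproduced here; as a minimal sketch of the idea, assuming a simulated ARMA(2,1) series with arbitrary coefficients:

library(forecast)

set.seed(1234)

# Simulate an ARMA(2,1) series with known coefficients
y <- arima.sim(model = list(ar = c(0.7, -0.2), ma = 0.4), n = 500)

# Let auto.arima search for the order and estimate the coefficients
fit <- auto.arima(y)
summary(fit)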

But auto.arima isn’t a cure-all. Let’s try a different simulation with a more complicated model. We’ll see that auto.arima, like any automated procedure, should be used with caution.

Of course, this is simulated data. To be fair to auto.arima, I should run multiple distinct simulations and see how often the auto.arima function correctly identifies the model order and how well it estimates the model coefficients.

However, keep in mind, when it comes to working with real time series data one only gets to observe a single realized state of the world. Tread carefully.


Seasonal ARIMA (SARIMA)

A seasonal ARIMA model is denoted ARIMA(p,d,q)(P,D,Q)[m], where \(m\) is the number of observations per year.

Recall that seasonality violates stationarity. Let’s first try to remove the seasonality by taking the first seasonal difference (\(m=4\) so the seasonal diff is on the 4th lag).

Still non-stationary. Let’s also take the first difference
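
The original quarterly series isn't shown here; a sketch of both differencing steps, using the quarterly UKgas series from base R's datasets package as a stand-in:

# Using the quarterly UKgas series (from the datasets package) for illustration
data(UKgas)

# Seasonal difference: x_t - x_{t-4} (lag = frequency of the series)
seas_diff <- diff(UKgas, lag = 4)

# First difference of the seasonally differenced series
both_diff <- diff(seas_diff, lag = 1)

par(mfrow = c(2, 1))
plot.ts(seas_diff, main = "Seasonal difference")
plot.ts(both_diff, main = "Seasonal + first difference")
par(mfrow = c(1, 1))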

auto.arima with Seasonal data
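
A minimal sketch, again using the quarterly UKgas series as a stand-in for the original example:

library(forecast)

# auto.arima selects the seasonal orders (P,D,Q) as well as (p,d,q)
fit <- auto.arima(UKgas)
summary(fit)

# Forecast the next 8 quarters
plot(forecast(fit, h = 8))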


Dynamic Linear Regression (Linear Models with ARIMA errors)

The problem is that forecasting beyond the observed data requires forecasting the regressors xreg.

Instead of forecasting the expected outcome with a confidence interval, let’s simulate 30 income forecasts, estimate the expected consumption forecasts, and layer them on top of each other.
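
The original code isn't reproduced here; a minimal sketch of a regression with ARIMA errors, assuming the uschange dataset from the fpp2 package (quarterly US consumption and income growth) as a stand-in, and naively holding income at its historical mean over the forecast horizon:

library(forecast)
library(fpp2)

# Regression of consumption growth on income growth, with ARIMA errors
fit <- auto.arima(uschange[, "Consumption"],
                  xreg = uschange[, "Income"])
summary(fit)

# Forecasting consumption requires future values of the regressor;
# here we naively assume income grows at its historical mean for 8 quarters
income_future <- rep(mean(uschange[, "Income"]), 8)
fc <- forecast(fit, xreg = income_future)
plot(fc)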

A Rant About Maximum Likelihood Estimation

ARIMA estimation relies on Maximum Likelihood Estimation (MLE). This matters because it means that, as you get new data, the parameters are re-estimated using the old series plus the new observations.

Below is an example where we split simulated ARIMA data into \(4\) even parts.

The true model is ARIMA(2,0,1). You see below that the model fitted to each expanding series has not only a changing order (p,d,q), but also changing coefficients.
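
The original simulation isn't reproduced here; a minimal sketch of the same experiment, with arbitrary ARIMA(2,0,1) coefficients chosen for illustration:

library(forecast)

set.seed(1234)

# Simulate from the "true" ARIMA(2,0,1) model
y <- arima.sim(model = list(ar = c(0.6, -0.3), ma = 0.4), n = 400)

# Refit auto.arima on an expanding window: first quarter of the data, first half, etc.
for (k in 1:4) {
  fit <- auto.arima(y[1:(k * 100)])
  cat("Observations:", k * 100, "\n")
  print(fit)
}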

This is largely due to how forecast::auto.arima searches for and determines the best-fitting model.

But even when we impose the correct model order, ARIMA(2,0,1), the coefficient estimates will differ, though they certainly overlap within their standard errors.