Introduction to time series

What is a time series?

At a very basic level, a time series is a set of observations taken sequentially in time. It differs from non-temporal data in that each data point has a place in an ordering and is typically related to the data points before and after it by some process.

A time series can be represented as an ordered set

\[ \{ x_1,x_2,x_3,\dots,x_n \} \]

For example, \[ \{ 10,31,27,42,53,15 \} \]
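In R, a sequence like this can be stored as a `ts` object, which keeps the observations in order and attaches a time index. The sketch below uses the six example values; the start time of 1 and frequency of 1 are assumptions (they are the defaults), not something the example specifies.

```r
## store the example observations as a time series object
x <- ts(c(10, 31, 27, 42, 53, 15), start = 1, frequency = 1)
## the time index t = 1, ..., n that accompanies the values
tt <- time(x)
## order matters: reversing the values gives a different series
x_rev <- ts(rev(as.vector(x)))
```

Printing `x` shows the values together with their start, end, and frequency.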

Examples of time series

data(WWWusage)
par(mai = c(0.9,0.9,0.1,0.1), omi = c(0,0,0,0))
plot.ts(WWWusage, ylab = "", las = 1, col = "blue", lwd = 2)
Number of users connected to the internet

data(lynx)
par(mai = c(0.9,0.9,0.1,0.1), omi = c(0,0,0,0))
plot.ts(lynx, ylab = "", las = 1, col = "blue", lwd = 2)
Number of lynx trapped in Canada from 1821-1934

Classification of time series | By some index set

Interval across real time; \(x(t)\)

  • begin/end: \(t \in [1.1,2.5]\)

Discrete time; \(x_t\)

  • Equally spaced: \(t = \{1,2,3,4,5\}\)
  • Equally spaced w/ missing value: \(t = \{1,2,4,5,6\}\)
  • Unequally spaced: \(t = \{2,3,4,6,9\}\)
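The three discrete index sets can be told apart by looking at the differences between successive time points. The sketch below is one possible way to do that; `is_equally_spaced()` is a hypothetical helper written for this illustration, and the observed values in `obs` are made up.

```r
## hypothetical helper: TRUE if all gaps between time points are identical
## (exact comparison is fine for integer times; real-valued times would need a tolerance)
is_equally_spaced <- function(tt) {
  length(unique(diff(tt))) == 1
}
t_equal   <- c(1, 2, 3, 4, 5)
t_missing <- c(1, 2, 4, 5, 6)
t_unequal <- c(2, 3, 4, 6, 9)
is_equally_spaced(t_equal)
is_equally_spaced(t_missing)
is_equally_spaced(t_unequal)
## a missing value is commonly stored as NA on the full, regular grid
obs <- c(10, 31, 42, 53, 15)          # made-up values at t = 1, 2, 4, 5, 6
x <- ts(rep(NA_real_, 6), start = 1)  # regular grid t = 1, ..., 6
x[t_missing] <- obs
```

Storing the gap as `NA` keeps the series on a regular grid, which is what `ts` objects assume.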

Classification of time series | By the underlying process

Discrete (eg, total # of fish caught per trawl)

Continuous (eg, salinity, temperature)

Classification of time series | By the number of values recorded

Univariate/scalar (eg, total # of fish caught)

Multivariate/vector (eg, # of each spp of fish caught)
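As a small illustration of the univariate/multivariate distinction, a `ts` object in R can hold either a single vector or a matrix with one column per variable. The species names and counts below are made up for the example.

```r
## univariate: one value per time step (made-up total catch)
total <- ts(c(12, 7, 15, 9))
## multivariate: one column per species (made-up counts)
by_spp <- ts(cbind(cod = c(5, 2, 8, 4), hake = c(7, 5, 7, 5)))
## is.mts() reports whether an object is a multivariate time series
is.mts(total)
is.mts(by_spp)
```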

Classification of time series | By the type of values recorded

Integer (eg, # of fish in 5 min trawl = 2413)

Rational (eg, fraction of unclipped fish = 47/951)

Real (eg, fish mass = 10.2 g)

Complex (eg, \(\cos(2\pi \cdot 2.43) + i \sin(2\pi \cdot 2.43)\))

Statistical analyses of time series

Most statistical analyses are concerned with estimating properties of a population from a sample. For example, we use fish caught in a seine to infer the mean size of fish in a lake. Time series analysis, however, presents a different situation:

  • Although we could vary the length of an observed time series, it is often impossible to make multiple observations at a given point in time

For example, one can’t observe today’s closing price of Microsoft stock more than once. Thus, conventional statistical procedures based on large-sample estimates are inappropriate.

if (!require("quantmod")) {
    install.packages("quantmod")
    library(quantmod)
}
start <- as.Date("2016-01-01")
end <- as.Date("2016-10-01")
getSymbols("MSFT", src = "yahoo", from = start, to = end)
## [1] "MSFT"
plot(MSFT[, "MSFT.Close"], main = "MSFT")

We use a time series model to analyze time series data.

What is a time series model?

A time series model for \(\{x_t\}\) is a specification of the joint distributions of a sequence of random variables \(\{X_t\}\), of which \(\{x_t\}\) is thought to be a realization.

Here is a plot of many realizations from a time series model.

These lines represent the distribution of possible realizations. However, we have only one realization.

Two simple and classic time series models

White noise: \(x_t \sim N(0,1)\)

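## `ww` is not defined in the source; assumed here: 50 realizations of 100 time steps each
set.seed(123)
ww <- matrix(rnorm(100 * 50), nrow = 100, ncol = 50)
par(mai = c(0.9,0.9,0.1,0.1), omi = c(0,0,0,0))
matplot(ww, type="l", lty="solid",  las = 1,
        ylab = expression(italic(x[t])), xlab = "Time",
        col = gray(0.5, 0.4))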

Random walk: \(x_t = x_{t-1} + w_t,~\text{with}~w_t \sim N(0,1)\)

par(mai = c(0.9,0.9,0.1,0.1), omi = c(0,0,0,0))
matplot(apply(ww, 2, cumsum), type="l", lty="solid",  las = 1,
        ylab = expression(italic(x[t])), xlab = "Time",
        col = gray(0.5, 0.4))
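The plotting code above builds each random walk with `cumsum()`. The sketch below checks that this matches the recursion \(x_t = x_{t-1} + w_t\) written out as a loop. The seed, the series length of 100, and the starting value \(x_0 = 0\) are assumptions made for the illustration.

```r
set.seed(123)
nn <- 100
w <- rnorm(nn)       # white noise errors w_t
x <- numeric(nn)
x[1] <- w[1]         # assumes x_0 = 0, so x_1 = w_1
for (t in 2:nn) {
  x[t] <- x[t - 1] + w[t]
}
## the loop and the cumulative sum give the same series
all.equal(x, cumsum(w))
```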

Classical decomposition

Model time series \(\{x_t\}\) as a combination of

  1. trend (\(m_t\))
  2. seasonal component (\(s_t\))
  3. remainder (\(e_t\))

\(x_t = m_t + s_t + e_t\)
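As a quick check of the additive form, the three components returned by R's `decompose()` should add back up to the data wherever the trend is defined. This is only a sketch: it uses the built-in `AirPassengers` data, and `decompose()` relies on its own moving-average estimate of the trend.

```r
dd <- decompose(AirPassengers)   # additive decomposition is the default
recon <- dd$trend + dd$seasonal + dd$random
## the trend is NA at both ends, so compare only the defined values
ok <- !is.na(recon)
all.equal(as.vector(recon[ok]), as.vector(AirPassengers[ok]))
```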

Classical decomposition | 1. The trend (\(m_t\))

We need a way to extract the so-called signal. One common method is via “linear filters”:

\[ m_t = \sum_{i=-\infty}^{\infty} \lambda_i x_{t+i} \]

For example, a moving average

\[ m_t = \sum_{i=-a}^{a} \frac{1}{2a + 1} x_{t+i} \]

If \(a = 1\), then

\[ m_t = \frac{1}{3}(x_{t-1} + x_t + x_{t+1}) \]
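A moving average of this form can be computed in R with `stats::filter()`. The sketch below applies the \(a = 1\) case to the six example values from earlier. The variable names are chosen for the illustration, and `sides = 2` requests a window centered on \(t\).

```r
x <- c(10, 31, 27, 42, 53, 15)
a <- 1
## equal weights 1 / (2a + 1) for each of the 2a + 1 points
lambda <- rep(1 / (2 * a + 1), 2 * a + 1)
m <- stats::filter(x, filter = lambda, sides = 2)
m
## the second value by hand: (x_1 + x_2 + x_3) / 3
(x[1] + x[2] + x[3]) / 3
```

The first and last values of `m` are `NA` because the window would extend beyond the observed data.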

Example of linear filtering

Here is a time series.

Monthly airline passengers from 1949-1960

A linear filter with \(a=1\) (a 3-point average) closely tracks the data.

Monthly airline passengers from 1949-1960

As we increase the length of data that is averaged from 1 on each side (\(a=1\), 3 points) to 4 on each side (\(a=4\), 9 points), the trend line becomes smoother.

Monthly airline passengers from 1949-1960

When we increase up to 13 points on each side (\(a=13\), 27 points), the trend line is very smooth.

Monthly airline passengers from 1949-1960
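The smooths in this example can be sketched with a small wrapper around `stats::filter()`. Here `ma_trend()` is a hypothetical helper, not code from the original slides, and the "roughness" measure (standard deviation of first differences) is just one simple way to compare how smooth the lines are.

```r
## hypothetical helper: centered moving average with `a` points on each side
ma_trend <- function(x, a) {
  stats::filter(x, filter = rep(1 / (2 * a + 1), 2 * a + 1), sides = 2)
}
xx <- AirPassengers
m1  <- ma_trend(xx, 1)    # 3-point window
m4  <- ma_trend(xx, 4)    # 9-point window
m13 <- ma_trend(xx, 13)   # 27-point window
## roughness: sd of first differences (smaller means smoother)
rough <- sapply(list(m1, m4, m13), function(m) sd(diff(m), na.rm = TRUE))
## overlay the three trend lines on the data
plot.ts(xx, las = 1, ylab = "", col = "gray")
lines(m1, col = "blue")
lines(m4, col = "darkorange")
lines(m13, col = "darkred")
```

Wider windows lose more points at each end of the series, which is the price of the smoother line.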

Classical decomposition | 2. Seasonal effect (\(s_t\))

Once we have an estimate of the trend \(m_t\), we can estimate \(s_t\) simply by subtraction:

\[ s_t = x_t - m_t \]

Computed with \(\lambda = 1/9\) (a 9-point moving average), this difference captures the seasonal effect, but it also includes the remainder \(e_t\). Instead, we can estimate the mean seasonal effect (\(s_t\)) by averaging the detrended values for each month across years.

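xx <- AirPassengers  # assumed: `xx` is not defined in the source, but the example uses the airline data
seas_2 <- decompose(xx)$seasonal
par(mai = c(0.9,0.9,0.1,0.1), omi = c(0,0,0,0))
plot.ts(seas_2, las = 1, ylab = "")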
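The code above takes the mean seasonal effect from `decompose()`. As a rough sketch of what a mean seasonal effect is, it can also be computed by hand: detrend with the 9-point moving average, then average the detrended values by calendar month. All variable names here are chosen for the illustration, and the result need not match `decompose()` exactly because that function uses a different trend estimate.

```r
xx <- AirPassengers
## trend from a 9-point moving average (lambda = 1/9)
mm <- stats::filter(xx, filter = rep(1 / 9, 9), sides = 2)
## detrended series: seasonal effect plus remainder
detr <- xx - mm
## average the detrended values within each calendar month
seas_mean <- tapply(as.vector(detr), as.vector(cycle(xx)), mean, na.rm = TRUE)
## center so that the 12 monthly effects sum to zero
seas_mean <- as.vector(seas_mean) - mean(seas_mean)
## repeat across years to get a series as long as the data
seas_ts <- ts(rep(seas_mean, length.out = length(xx)),
              start = start(xx), frequency = frequency(xx))
```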

Classical decomposition | 3. Remainder (\(e_t\))

Now we can estimate \(e_t\) via subtraction:

\[ e_t = x_t - m_t - s_t \]

ee <- decompose(xx)$random
par(mai = c(0.9,0.9,0.1,0.1), omi = c(0,0,0,0))
plot.ts(ee, las = 1, ylab = "")

Log-transformed data

Let’s repeat the decomposition with the log of the airline data.

lx <- log(AirPassengers)
par(mai = c(0.9,0.9,0.1,0.1), omi = c(0,0,0,0))
plot.ts(lx, las = 1, ylab = "")
Log monthly airline passengers from 1949-1960
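One common motivation for the log transform, not stated explicitly here, is that the seasonal swings in the raw data grow with the level of the series, which an additive model handles poorly. The sketch below compares the within-year range (maximum minus minimum) in the first and last years for the raw and logged data. The helper `rng()` and the choice of range as a summary are assumptions made for the illustration.

```r
## within-year range (max - min), computed for each calendar year
rng <- function(v) diff(range(v))
raw_rng <- aggregate(AirPassengers, nfrequency = 1, FUN = rng)
log_rng <- aggregate(log(AirPassengers), nfrequency = 1, FUN = rng)
## growth in the seasonal swing from the first year to the last
raw_ratio <- raw_rng[length(raw_rng)] / raw_rng[1]
log_ratio <- log_rng[length(log_rng)] / log_rng[1]
c(raw = raw_ratio, log = log_ratio)
```

A ratio closer to 1 on the log scale suggests the additive decomposition is a better fit there.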

The trend (\(m_t\))

Seasonal effect (\(s_t\)) with error (\(e_t\))

Mean seasonal effect (\(s_t\))

Remainder (\(e_t\))

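## assumed (not defined in the source): `pp` = 9-point moving-average trend of `lx`, `seas_2` = its mean seasonal effect
pp <- stats::filter(lx, filter = rep(1/9, 9), sides = 2)
seas_2 <- decompose(lx)$seasonal
le <- lx - pp - seas_2
par(mai = c(0.9,0.9,0.1,0.1), omi = c(0,0,0,0))
plot.ts(le, las = 1, ylab = "")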