At a very basic level, a time series is a set of observations taken sequentially in time. It differs from non-temporal data because each data point has an order and is, typically, related to the data points before and after it by some process.
A time series can be represented as a set
\[ \{ x_1,x_2,x_3,\dots,x_n \} \]
For example, \[ \{ 10,31,27,42,53,15 \} \]
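In R, an ordered set like this can be stored as a `ts` object, which attaches a time index to each observation. The following is a minimal sketch using the example values; the default index (starting at 1, one observation per unit of time) is an assumption, since no dates are given for these values.

```r
# store the example observations as a time series object
x <- ts(c(10, 31, 27, 42, 53, 15))

# the time index that R attaches to the observations (assumed default: 1, 2, ..., 6)
time(x)

# the order matters: x[2] is the observation at time 2
x[2]
```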
data(WWWusage)
par(mai = c(0.9,0.9,0.1,0.1), omi = c(0,0,0,0))
plot.ts(WWWusage, ylab = "", las = 1, col = "blue", lwd = 2)
Number of users connected to the internet
data(lynx)
par(mai = c(0.9,0.9,0.1,0.1), omi = c(0,0,0,0))
plot.ts(lynx, ylab = "", las = 1, col = "blue", lwd = 2)
Number of lynx trapped in Canada from 1821-1934
Interval across real time; \(x(t)\)
Discrete time; \(x_t\)
Discrete (eg, total # of fish caught per trawl)
Continuous (eg, salinity, temperature)
Univariate/scalar (eg, total # of fish caught)
Multivariate/vector (eg, # of each spp of fish caught)
Integer (eg, # of fish in 5 min trawl = 2413)
Rational (eg, fraction of unclipped fish = 47/951)
Real (eg, fish mass = 10.2 g)
Complex (eg, cos(2π2.43) + i sin(2π2.43))
Most statistical analyses are concerned with estimating properties of a population from a sample. For example, we use fish caught in a seine to infer the mean size of fish in a lake. Time series analysis, however, presents a different situation: we usually cannot make more than one observation at a given point in time.
For example, one can’t observe today’s closing price of Microsoft stock more than once. Thus, conventional statistical procedures, based on large-sample estimates, are inappropriate.
if (!require("quantmod")) {
install.packages("quantmod")
library(quantmod)
}
start <- as.Date("2016-01-01")
end <- as.Date("2016-10-01")
getSymbols("MSFT", src = "yahoo", from = start, to = end)
## [1] "MSFT"
plot(MSFT[, "MSFT.Close"], main = "MSFT")
We use a time series model to analyze time series data.
A time series model for \(\{x_t\}\) is a specification of the joint distributions of a sequence of random variables \(\{X_t\}\), of which \(\{x_t\}\) is thought to be a realization.
Here is a plot of many realizations from a time series model.
These lines represent the distribution of possible realizations. However, we have only one realization.
White noise: \(x_t \sim N(0,1)\)
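# simulate realizations of white noise, one per column; the series length (nn)
# and number of realizations (nr) are assumed values, not given in the source
nn <- 50; nr <- 100
ww <- matrix(rnorm(nn * nr), nrow = nn, ncol = nr)
par(mai = c(0.9,0.9,0.1,0.1), omi = c(0,0,0,0))
matplot(ww, type="l", lty="solid", las = 1,
ylab = expression(italic(x[t])), xlab = "Time",
col = gray(0.5, 0.4))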
Random walk: \(x_t = x_{t-1} + w_t,~\text{with}~w_t \sim N(0,1)\)
par(mai = c(0.9,0.9,0.1,0.1), omi = c(0,0,0,0))
matplot(apply(ww, 2, cumsum), type="l", lty="solid", las = 1,
ylab = expression(italic(x[t])), xlab = "Time",
col = gray(0.5, 0.4))
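The plotting code above builds random walks with `cumsum()`. As a small check of why that works, the sketch below writes out the recursion \(x_t = x_{t-1} + w_t\) explicitly and compares it with the cumulative sum of the same white noise. The seed, the series length, and the starting value \(x_0 = 0\) are assumptions made for illustration.

```r
set.seed(123)    # arbitrary seed so the sketch is reproducible
nn <- 50         # assumed series length
w <- rnorm(nn)   # white noise, w_t ~ N(0, 1)

# explicit recursion x_t = x_{t-1} + w_t, starting from x_0 = 0 so that x_1 = w_1
x <- numeric(nn)
x[1] <- w[1]
for (t in 2:nn) {
  x[t] <- x[t - 1] + w[t]
}

# the recursion and the cumulative sum give the same series
all.equal(x, cumsum(w))
```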
\(x_t = m_t + s_t + e_t\)
We need a way to extract the so-called signal. One common method is to use “linear filters”.
\[ m_t = \sum_{i=-\infty}^{\infty} \lambda_i x_{t+i} \]
For example, a moving average
\[ m_t = \sum_{i=-a}^{a} \frac{1}{2a + 1} x_{t+i} \]
If \(a = 1\), then
\[ m_t = \frac{1}{3}(x_{t-1} + x_t + x_{t+1}) \]
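To make the \(a = 1\) case concrete, here is a minimal sketch that applies the 3-point moving average to the small example series from the start of the section, using `stats::filter()` with equal weights. The variable names are illustrative only.

```r
# example series from the start of the section
x <- c(10, 31, 27, 42, 53, 15)

# moving average with a = 1: weights of 1 / (2a + 1) on x_{t-1}, x_t, x_{t+1}
a <- 1
wts <- rep(1 / (2 * a + 1), 2 * a + 1)

# sides = 2 centers the window on t; the first and last values are NA
# because the window runs off the ends of the series
m <- stats::filter(x, wts, sides = 2)
m

# same value by hand for t = 3: (31 + 27 + 42) / 3
(x[2] + x[3] + x[4]) / 3
```

The two `NA` values at the ends show the cost of a centered filter: each additional point on either side removes one more estimate of the trend from each end of the series.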
Here is a time series.
Monthly airline passengers from 1949-1960
A linear filter with \(a=1\) (a 3-point window, \(\lambda = 1/3\)) closely tracks the data.
Monthly airline passengers from 1949-1960
As we increase the length of data that is averaged from 1 on each side (\(a=1\)) to 4 on each side (\(a=4\), \(\lambda = 1/9\)), the trend line is smoother.
Monthly airline passengers from 1949-1960
When we increase up to 13 points on each side (\(a=13\), \(\lambda = 1/27\)), the trend line is very smooth.
Monthly airline passengers from 1949-1960
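The comparison of window lengths can be sketched as follows, assuming the series is the built-in `AirPassengers` data and that the trend is computed with `stats::filter()` using equal weights. The helper `ma_trend()` and the colors are hypothetical choices for illustration, not part of the original material.

```r
xx <- AirPassengers

# hypothetical helper: moving-average trend with `a` points on each side,
# i.e., a window of 2a + 1 points with equal weights 1 / (2a + 1)
ma_trend <- function(x, a) {
  stats::filter(x, rep(1 / (2 * a + 1), 2 * a + 1), sides = 2)
}

# 1, 4, and 13 points on each side (windows of 3, 9, and 27 points)
half_widths <- c(1, 4, 13)
trends <- lapply(half_widths, function(a) ma_trend(xx, a))

# overlay the three trend estimates on the data
cols <- c("blue", "darkorange", "darkred")
plot.ts(xx, las = 1, ylab = "", col = "gray50")
for (i in seq_along(trends)) {
  lines(trends[[i]], col = cols[i], lwd = 2)
}
legend("topleft", legend = paste("a =", half_widths),
       col = cols, lwd = 2, bty = "n")
```

Wider windows give smoother trends, but each one also loses `a` values at both ends of the series.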
Once we have an estimate of the trend \(m_t\), we can estimate \(s_t\) simply by subtraction:
\[ s_t = x_t - m_t \]
This is the seasonal effect (\(s_t\)) when the trend comes from the filter with \(\lambda = 1/9\), but this estimate of \(s_t\) includes the remainder \(e_t\) as well. Instead, we can estimate the mean seasonal effect (\(s_t\)), averaged over all years for each month.
xx <- AirPassengers  # assumed: xx holds the monthly airline passenger series
seas_2 <- decompose(xx)$seasonal
par(mai = c(0.9,0.9,0.1,0.1), omi = c(0,0,0,0))
plot.ts(seas_2, las = 1, ylab = "")
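To show what a mean seasonal effect involves, the sketch below reproduces by hand the additive seasonal figure that `decompose()` returns. It assumes the series is `AirPassengers`. Note that `decompose()` estimates its own trend with a centered 2x12 moving average for monthly data rather than a 9-point filter, so the same weights are used here.

```r
xx <- AirPassengers

# centered 2x12 moving average: half weight on the two end months
wts <- c(0.5, rep(1, 11), 0.5) / 12
trend <- stats::filter(xx, wts, sides = 2)

# detrended series: seasonal effect plus remainder
detr <- xx - trend

# average the detrended values for each calendar month across all years
seas_mean <- tapply(as.numeric(detr), as.integer(cycle(xx)), mean, na.rm = TRUE)

# center the 12 monthly effects so they sum to zero
seas_mean <- seas_mean - mean(seas_mean)

# compare with the seasonal figure from decompose()
all.equal(as.numeric(seas_mean), as.numeric(decompose(xx)$figure))
```

Averaging within each month removes most of the remainder \(e_t\), which is why this estimate is preferred over the raw difference \(x_t - m_t\).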
Now we can estimate \(e_t\) via subtraction:
\[ e_t = x_t - m_t - s_t \]
ee <- decompose(xx)$random
par(mai = c(0.9,0.9,0.1,0.1), omi = c(0,0,0,0))
plot.ts(ee, las = 1, ylab = "")
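As a quick check of the subtraction, the following sketch computes the remainder by hand from the trend and seasonal components of `decompose()` and compares it with the `random` component. It assumes the series is `AirPassengers` and the default additive decomposition.

```r
xx <- AirPassengers
dd <- decompose(xx)   # additive decomposition by default

# remainder by subtraction: e_t = x_t - m_t - s_t
ee_manual <- xx - dd$trend - dd$seasonal

# matches the remainder that decompose() reports
all.equal(as.numeric(ee_manual), as.numeric(dd$random))

# the remainder is NA wherever the centered trend estimate is NA
sum(is.na(ee_manual))
```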
Let’s repeat the decomposition with the log of the airline data.
lx <- log(AirPassengers)
par(mai = c(0.9,0.9,0.1,0.1), omi = c(0,0,0,0))
plot.ts(lx, las = 1, ylab = "")
Log monthly airline passengers from 1949-1960
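# assumed: trend (9-point window, lambda = 1/9) and seasonal effect are recomputed on the log scale
pp <- stats::filter(lx, rep(1/9, 9), sides = 2)
seas_2 <- decompose(lx)$seasonal
le <- lx - pp - seas_2
par(mai = c(0.9,0.9,0.1,0.1), omi = c(0,0,0,0))
plot.ts(le, las = 1, ylab = "")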