Visit my website for more like this! I would love to hear your feedback (seriously).
library(TSA, quietly=TRUE, warn.conflicts=FALSE)
require(knitr)
library(ggplot2)
Heavily borrowed from the textbook: Time Series Analysis and Its Applications
The next few examples are representations of common types of time series. We then define some terms and theory that are useful to understand before moving on to real analysis.
Consider some white noise, smoothed by a moving average using filter(). This smoothed representation eliminates some of the faster oscillations and leaves us with a more representative trend.
set.seed(1122)
d<-rnorm(500, 0, 1) # 500 draws from N(0, 1)
D<-filter(d, sides=2, rep(1/3, 3))
par(mfrow=c(2,1))
plot.ts(d, main='White Noise')
plot.ts(D, main='Moving Average')
Create a prediction of the current value \(x_t\) as a function of the previous values \(x_{t-1}\) and \(x_{t-2}\); the recursive filter below generates \[x_t = x_{t-1} - 0.9 x_{t-2} + w_t\] Autoregressive models, and other similar generalizations, can be used as an underlying model for many kinds of time series data.
# Add an extra 50 values to absorb the boundary effect
d = rnorm(550, 0, 1)
D = filter(d, filter=c(1,-0.9), method='recursive')[-(1:50)]
par(mfrow=c(2,1))
plot.ts(d, main='White Noise')
plot.ts(D, main='Autoregression')
A random walk with drift takes the form \[x_t = \delta + x_{t-1} + w_t\] When the drift parameter \(\delta = 0\), the value of the time series at time \(t\) is the value of the series at time \(t - 1\) plus a completely random movement determined by white noise. Here we plot two walks, with the dashed line marking the deterministic trend \(0.2t\),
Black: drift = 0.2
Red: drift = 0
set.seed(154)
d = rnorm(200, 0, 1); x = cumsum(d)
D = d + 0.2; Dsum = cumsum(D)
plot.ts(Dsum, ylim=c(-5,55), main='Random Walk', ylab='y')
lines(x, col='red'); lines(0.2*(1:200), lty='dashed')
Most time series are composed of an underlying signal with some constant periodic variation, and a random error (noise) term. Generally, we are presented with data that show the signal obscured by noise. The purpose of many time series models is to decompose the time series to understand the underlying trend.
# A simple cosine wave
cs = 3*cos(2*pi*1:500/50 + 0.6*pi)
# Some random noise
noise = rnorm(500, 0, 1)
The ratio of the amplitude of the signal to that of the error is called the signal-to-noise ratio (SNR); the larger the SNR, the easier it is to detect the signal. With amplitude 3 and noise standard deviations of 1 and 5, the SNR drops from 3 in the second panel to 0.6 in the third: we can easily understand the signal in the second panel, but would have a hard time confidently explaining the third.
par(mfrow=c(3,1), mar=c(3,2,2,1), cex.main=1.5)
plot.ts(cs, main=expression(3*cos(2*pi*t/50 + 0.6*pi)))
plot.ts(cs+noise, main=expression(3*cos(2*pi*t/50 + 0.6*pi) + N(0, 1)))
plot.ts(cs+noise*5, main=expression(3*cos(2*pi*t/50 + 0.6*pi) + N(0, 25)))
These simple additive models are some of the most common, and take the form
\[x_t = s_t + v_t\]
where \(s_t\) denotes an unknown signal, and \(v_t\) denotes a white noise or correlated error term.
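As a sketch of recovering an unknown signal from such a model, we can smooth the noisy series from above with lowess() (the smoother and its span f=0.05 are assumptions here, not the only reasonable choice) and compare the estimate with the true cosine:
# Estimate s_t from x_t = s_t + v_t by local smoothing
x_obs = cs + noise
fit = lowess(x_obs, f=0.05) # smoothing span is an assumed tuning choice
par(mfrow=c(1,1))
plot.ts(x_obs, main='lowess estimate (red) vs true signal (dashed)')
lines(fit$y, col='red', lwd=2) # smoothed estimate of the signal
lines(cs, lty='dashed') # true cosine signal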
Since correlation is such an essential component of time series analysis, the best descriptive statistics are expressed in terms of covariance and correlation.
The autocovariance function \(\gamma(s,t) = \mathrm{cov}(x_s, x_t)\) measures the linear dependence between values of the series at times \(s\) and \(t\). The autocorrelation function (ACF) is its normalized version: \[\rho(s,t) = \frac{\gamma(s,t)}{\sqrt{\gamma(s,s)\,\gamma(t,t)}}\] This function measures the cross-correlation of a signal with itself. Simply, it is the similarity between observations as a function of the time lag between them.
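As a quick sketch of estimating this in practice, base R's acf() computes and plots the sample ACF; here it is applied to the autoregressive series D simulated above (the choice of series is just for illustration):
# Sample ACF of the autoregression simulated earlier
acf(D, lag.max=30, main='Sample ACF of the autoregression')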
Cross-correlation is a measure of similarity between two time series as a function of a time lag applied to one of them.
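Base R's ccf() estimates this; as a sketch, consider two series constructed here purely for illustration, where s2 is s1 delayed by five steps plus noise, so the sample CCF spikes near lag -5:
set.seed(42) # seed chosen arbitrarily for this illustration
s1 = rnorm(200)
s2 = c(rep(0, 5), s1[1:195]) + rnorm(200, 0, 0.5) # s1 delayed by 5 steps, plus noise
ccf(s1, s2, lag.max=20, main='Sample CCF: spike near lag -5')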
A stationary process is a stochastic process whose joint probability distribution does not change when shifted in time. Consequently, the mean and variance, if present, do not change over time or follow any trends. In time series analysis, we often have to transform raw data to a stationary process to satisfy the assumptions of time series models and functions. This definition of stationarity is known as strict stationarity, and is generally too strong for most modeling applications. Thus most analyses utilize a milder version called weak stationarity.
Weak stationarity only requires the mean to remain constant with respect to time and the covariance between two observations to depend only on the lag between them. From now on, we will refer to weak stationarity as simply, stationary.
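A common way to transform a non-stationary series toward stationarity is differencing; as a quick sketch, taking first differences of the drifted random walk Dsum from above recovers white noise plus the constant drift:
# First differences: Dsum[t] - Dsum[t-1] = 0.2 + white noise
par(mfrow=c(2,1))
plot.ts(Dsum, main='Random walk with drift (non-stationary)')
plot.ts(diff(Dsum), main='First differences (stationary)')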
See the next chapter, here