Time Series Analysis 2


require(astsa, quietly=TRUE, warn.conflicts=FALSE)
require(knitr)
## Loading required package: knitr
library(ggplot2)

Data Sources:

Heavily borrowed from:

1.0 Stationarity

In general, time series data must be stationary to satisfy the assumptions of most time series analysis models.

  • Stationarity: a stationary process is a stochastic process whose joint probability distribution does not change when shifted in time. Consequently, the mean and variance, if they exist, do not change over time or follow any trend. In practice, we often have to transform raw data into a stationary process to satisfy the assumptions of time series models and functions. This definition is known as strict stationarity, and it is generally too strong for most modeling applications. Most analyses therefore use a milder version called weak stationarity.

  • Weak stationarity only requires that the mean is constant over time and that the covariance between two observations depends only on the lag between them, not on the time t itself. From here on, "stationary" means weakly stationary.
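The difference between a stationary and a non-stationary process can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the original text: Gaussian white noise stands in for a stationary process, a random walk (its cumulative sum) stands in for a non-stationary one, and the number of paths and steps is arbitrary. It uses base R only.

```r
set.seed(1)
n_paths <- 2000   # number of simulated realizations (arbitrary choice)
n_steps <- 200    # length of each realization (arbitrary choice)

# Each column is one realization of Gaussian white noise
noise <- matrix(rnorm(n_paths * n_steps), nrow = n_steps, ncol = n_paths)

# A random walk is the cumulative sum of white noise, column by column
walk <- apply(noise, 2, cumsum)

# Variance across realizations at an early and a late time point
var_noise <- c(t10 = var(noise[10, ]), t200 = var(noise[200, ]))
var_walk  <- c(t10 = var(walk[10, ]),  t200 = var(walk[200, ]))

print(round(var_noise, 2))  # roughly the same at both time points
print(round(var_walk, 2))   # much larger at the later time point
```

The white noise has about the same variance at every time point, while the variance of the random walk grows with t, so the random walk cannot be weakly stationary.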

Many stationary series have recognizable ACF and PACF patterns, which makes these plots excellent tools for identifying the type of ARIMA model to use. However, in raw form, most data are not stationary. A continuous downward trend, for example, violates stationarity because the mean is not constant in t. If the mean and variance are changing, prediction becomes very difficult!

If we remove the trend that causes these violations, we can start to understand the factors that are generating the other seasonal or periodic components.

1.1 How to Transform Data to Stationary

To make data stationary, we must remove the trend. The most important step in this process is therefore correctly identifying and modeling the trend within the time series. We do this with curve fitting techniques such as linear regression, polynomial regression, or more complex methods.

We can see that the global temperature data from TSA1 is not stationary, as there is a clear increasing trend over time. This is one of the easier kinds of non-stationarity to work with: the process shows stationary behavior around a clear linear trend.

\[x_t = \mu_t + y_t\]

where \(x_t\) are the observations, \(\mu_t\) is the trend, and \(y_t\) is a stationary random process.

In this case, stationarity can be achieved by fitting a reasonable linear estimate of the trend component, subtracting that estimate, and working with the residuals (the leftover error). However, in many cases the trend is not linear and has to be fit with alternative methods.
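To make the detrending step concrete, here is a minimal sketch on simulated data rather than on the temperature series. Everything in it is an assumption for illustration: the trend coefficients, the AR(1) noise, and the variable names are made up. It builds a series as a linear trend plus stationary noise, fits the trend with `lm`, and keeps the residuals.

```r
set.seed(42)
n <- 200
t_idx <- 1:n

mu <- 0.5 + 0.02 * t_idx   # linear trend (made-up coefficients)
# Stationary AR(1) noise around the trend (made-up parameters)
y <- as.numeric(arima.sim(list(ar = 0.5), n = n, sd = 0.2))
x <- mu + y                # observed series: trend plus stationary noise

trend_fit <- lm(x ~ t_idx)    # estimate the trend by least squares
y_hat <- resid(trend_fit)     # detrended series, an estimate of y

print(coef(trend_fit))        # slope estimate, to compare with 0.02
print(cor(y, y_hat))          # how closely the residuals track the true noise
```

Because the true trend is known in a simulation, the fitted slope and the residuals can be checked against the values used to generate the data, which is not possible with real observations.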

An alternative to curve fitting is differencing. A non-stationary time series can often be made stationary by taking first (or higher-order) differences. The first difference is the series at time t minus the series at time t - 1. If the slope of the mean is also changing with time (for example, a quadratic trend), we can apply the second difference, which is the first difference of the first difference.
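A small worked example shows why the order of differencing matters. The two noise-free series below are hypothetical and chosen only so the arithmetic is easy to follow; `diff` is the base R differencing function.

```r
t_idx <- 1:10

linear    <- 5 + 2 * t_idx                # mean changes at a constant rate
quadratic <- 1 + 2 * t_idx + 3 * t_idx^2  # slope of the mean also changes

d1_linear    <- diff(linear)                      # x_t - x_{t-1}
d1_quadratic <- diff(quadratic)                   # first difference
d2_quadratic <- diff(quadratic, differences = 2)  # difference of the difference

print(d1_linear)     # constant: the linear trend is gone
print(d1_quadratic)  # still increasing, so a trend remains
print(d2_quadratic)  # constant: the quadratic trend is gone
```

One difference is enough for the linear series, while the quadratic series needs two. Each round of differencing also shortens the series by one observation.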

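fit = lm(gtemp~time(gtemp), na.action=NULL) # regress gtemp on time; na.action=NULL retains the time series attributes
par(mfrow=c(2, 1)) # stack the two plots vertically
plot(resid(fit), type='o', main="Detrended") # residuals after removing the fitted linear trend
plot(diff(gtemp), type='o', main='First Difference') # x_t - x_{t-1}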


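To verify that our data has indeed become stationary, we can consult an ACF plot. An ACF that decays slowly indicates that a trend remains, while the ACF of a stationary series dies out quickly.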

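par(mfrow=c(3, 1)) # stack the three plots vertically
acf(gtemp, lag.max=48, main='gtemp') # raw series
acf(resid(fit), lag.max=48, main='detrended') # residuals from the linear fit
acf(diff(gtemp), lag.max=48, main='first differences') # differenced series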

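The same ACF check can also be done numerically instead of by eye. The sketch below is an illustration on a simulated random walk with drift, not on gtemp, and the helper `share_outside` is a hypothetical name introduced here. It reports the share of sample autocorrelations that fall outside the approximate 95% white-noise bounds (the dashed lines on an ACF plot).

```r
# Hypothetical helper: share of lags 1..max_lag whose sample autocorrelation
# lies outside the approximate 95% bounds for white noise
share_outside <- function(series, max_lag = 20) {
  r <- acf(series, lag.max = max_lag, plot = FALSE)$acf[-1]  # drop lag 0
  bound <- 1.96 / sqrt(length(series))
  mean(abs(r) > bound)
}

set.seed(7)
x <- cumsum(0.1 + rnorm(300))  # simulated random walk with drift (made-up values)

raw_share  <- share_outside(x)        # slowly decaying ACF: most lags outside
diff_share <- share_outside(diff(x))  # after differencing: few lags outside

print(raw_share)
print(diff_share)
```

A share close to 1 for the raw series and close to 0 for the differenced series is the numerical counterpart of a slowly decaying ACF versus one that stays inside the bounds.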

An advantage of differencing over detrending is that no parameters are estimated in the differencing operation. The flip side is that differencing does not yield an estimate of the trend, which is a disadvantage if you want to use one later. If the goal is simply to coerce the data to stationarity, differencing may be more appropriate.

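Note: two related techniques are not explored here. Fractional differencing extends the difference operator to non-integer orders and is typically used for long-memory series. Separately, non-linear transformations such as the log transformation can help stabilize a variance that changes with the level of the series.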

We will see that understanding how to detrend a time series gives us insight into how we will model it later. For example, first differencing eliminates almost all autocorrelation in the global temperature data. This suggests that the series is close to a random walk with drift.
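To see why uncorrelated first differences point toward a random walk with drift, it helps to simulate one. This is a sketch with made-up values (the drift of 0.2 and unit-variance Gaussian noise are assumptions, not estimates from the temperature data). In such a model each observation is the previous one plus a constant drift plus white noise, so differencing returns the drift plus the noise.

```r
set.seed(123)
n <- 500
drift <- 0.2             # made-up drift per time step
e <- rnorm(n)            # white-noise innovations
x <- cumsum(drift + e)   # random walk with drift: x_t = drift + x_{t-1} + e_t

dx <- diff(x)            # first differences: drift + e_t for t = 2, ..., n

drift_hat <- mean(dx)                               # sample mean estimates the drift
lag1 <- acf(dx, lag.max = 1, plot = FALSE)$acf[2]   # lag-1 autocorrelation of dx

print(drift_hat)
print(lag1)
```

The mean of the differences estimates the drift, and their autocorrelation is close to zero, which is the pattern described above for the differenced temperature series.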