Seasonal ARIMA Models

Visit my website for more like this! I would love to hear your feedback.

Data Sources:

Textbook: Time Series Analysis and It’s Application
Wikipedia
Online course at Penn State

Included in this tutorial

Seasonal ARIMA models
Model Selection and Forecasting

## Loading required package: knitr
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Loading required package: timeDate
## This is forecast 5.4

Seasonality refers to a regular pattern of changes that repeat for S time periods, where S defines the number of timer periods until the pattern repeats again.

In a seasonal ARIMA model, AR and MA terms predict x_t using data values and errors at times with lags that are multiples of S.

With monthly data and an annual trend (S = 12), a seasonal first order autoregressive model would use x_{t - 12} to predict x_t. For example, if we were selling ice cream, we might predict August sales using last years August sales. Similarly, you could use the past two Augusts and include x_{t - 24}.

Seasonality, of course, usually causes the time series to be nonstationary. Seasonal differencing is defined as a difference between a value and a value with lag that is a multiple of S.

Thus, seasonal differencing removes a seasonal trend and can also get rid of a seasonal random walk (another type of nonstationarity).

However, if an overall trend is present in the data, we may also need non-seasonal differencing. Often a first difference will detrend the data.

It is important to note that non-seasonal behavior will still matter in seasonal models. That is, it is likely that short run non-seasonal components will still contribute to the model. For ice cream sames, it could be that the sames from last August and this July produce the best model.

The seasonal ARIMA Model

This model incorporates both seasonal and non-seasonal factors in a multiplicative model.

\[ARIMA(p, d, q)*(P, D, Q)S\]

where the capital P, D, and Q are the seasonal components of the AR, differencing, and MA components.

Example 1: ARIMA(0, 0, 1) x (0, 0, 1)₁₂

The model includes seasonal and non-seasonal MA(1) terms, no differencing and no AR terms, and the seasonal period is S = 12. Thus this model has MA terms at lags 1, 12, and 13. Also there is a non-zero autocorrelation at lag 11.

arima.m<-arima.sim(list(order = c(0,0,12), ma = c(0.7,rep(0,10),0.9)), n = 200)
acf(arima.m)

plot of chunk unnamed-chunk-2

Example 2: ARIMA(1, 0, 0) x (1, 0, 0)₁₂

A seasonal AR(1) model has significant lags at 1, 12 and 13, with an increasing trend before lag 12, sharply cut off at lag 13.

ar_pacf<-ARMAacf (ar = c(.6,0,0,0,0,0,0,0,0,0,0,.5,-.30),lag.max=30,pacf=T)
plot(ar_pacf, type='h')

plot of chunk unnamed-chunk-3

How to identify a seasonal model

Do a time series plot of the data. Examine it for global trends and seasonality.
Do any necessary differencing…
- if there is seasonality and no trend take a difference of lag S. For example, take a 12th difference for monthly data with seasonality.
- If there is a linear trend and no obvious seasonality, take a first difference. If there is a curved trend, consider a transformation of the data before differencing.
- If there is both trend and seasonality, apply both a non-seasonal and seasonal difference to the data, as two successive operations. For example:

diff1 <- diff(x, 1)
diff1_and_12 <- diff(diff1, 12)

* If there is no obvious trend or seasonality, do not take any differences.

Examine the ACF and PACF of the differenced data (if necessary). We can begin to make some basic guesses about the most appropriate model at this time.
- non-seasonal terms: Examine the early labs(1, 2, 3, …) to judge non-seasonal terms. Spikes in the ACF (at low lags) indicate non-seasonal MA terms. Spikes in the PAC (at low lags) indicated possible non-seasonal AR terms.
- Seasonal terms: Examine the patterns across lags that are multiples of S. For example, for monthly data, look at lags 12, 24, 36 (probably wont need to look much past the first two or three seasonal multiples). Judge the ACF and PACF at the seasonal lags in the same way you do for the earlier lags.
Estimate the model(s) that might be reasonable for the data based on the previous steps.
Examine the residuals (with ACF, Box-Pierce, and other means) to see if the model seems good. Compare AIC or BIC values to determine the best of several models.

Full example

Time series plot of 144 consecutive months on the colorado river

source(url("http://lib.stat.cmu.edu/general/tsa2/Rcode/itall.R"))

##   itall has been installed

dat<-(read.csv('colorado_river.csv'))

plot(dat[,3], type='o')

plot of chunk unnamed-chunk-5

Without having some experience with the data, it is difficult to identify seasonality trends here. If this was your job your would probably know that river flow is indeed seasonal, with perhaps higher flows in late spring and early summer, due to snow runoff.

With this in mind, we could aggregate the data by month to better understand this trend.

require(ggplot2)

ggplot(dat, aes(x=month, y=flow/1000))+
  stat_summary(geom = 'line', fun.y='mean')+ # take the mean of each month
  scale_x_discrete(breaks=seq(1,12,1), labels=seq(1,12,1))+
  theme_bw()+ # add a little style
  facet_wrap(~year) # visualize year by year

## Warning: closing unused connection 5
## (http://lib.stat.cmu.edu/general/tsa2/Rcode/itall.R)

plot of chunk unnamed-chunk-6

Further, looking back at the first plot, it is difficult to see if there is a global trend, if any, it is slight.

Since we hypothesize that there is seasonality, we can take the seasonal difference (create a variable that gives the 12TH differences), then look at the ACF and PACF.

data<-ts(dat[1:72,][,3])

diff_12 <- diff(data, 12)
acf2(diff_12, 48)

plot of chunk unnamed-chunk-7

Fitting the model

Seasonal behavior: we see that for both the ACF and PACF we have significant autocorrelation at seasonal (12, 24, 36) lags. The ACF has a cluster around 12, and not much else besides a tapering pattern throughout. Further, the PACF also has spikes on two multiples of S, AR(2)

Non-seasonal behavior: The PACF shows a clear spike at lag 1, and not much else till lag 11. The ACF also has a tapering pattern in the early lags (and seasonal lags). This is indicative of an AR(1) component.

Let’s try out the model… ARIMA (1,0, 0) x (2, 1, 0)₁₂

data<-ts(data, freq=12)

mod1<-sarima(data, 1,0,0,2,1,0,12)

## initial  value 10.165417 
## iter   2 value 9.844064
## iter   3 value 9.816870
## iter   4 value 9.779773
## iter   5 value 9.769839
## iter   6 value 9.767267
## iter   7 value 9.766853
## iter   8 value 9.766852
## iter   9 value 9.766852
## iter   9 value 9.766852
## iter   9 value 9.766852
## final  value 9.766852 
## converged
## initial  value 9.722784 
## iter   2 value 9.719603
## iter   3 value 9.711661
## iter   4 value 9.709954
## iter   5 value 9.709280
## iter   6 value 9.709266
## iter   7 value 9.709265
## iter   7 value 9.709265
## iter   7 value 9.709265
## final  value 9.709265 
## converged

plot of chunk unnamed-chunk-8

# Using standard R functions
mod2<-Arima(data,order=c(1, 0, 0),
            seasonal=list(order=c(2, 1, 0), period=12))
mod2

## Series: data 
## ARIMA(1,0,0)(2,1,0)[12]                    
## 
## Coefficients:
##         ar1    sar1    sar2
##       0.281  -0.792  -0.193
## s.e.  0.134   0.132   0.168
## 
## sigma^2 estimated as 2.38e+08:  log likelihood=-667.8
## AIC=1344   AICc=1344   BIC=1352

We can see from the diagnostics that this is indeed a decent fit. What doesn’t look good is the residuals vs. the fitted values, which reveals strong non-constant variance. We’ve got three choices for what to do about the non-constant variance: (1) ignore it, (2) go back to step 1 and try a variance stabilizing transformation like log or square root, or (3) use an ARCH model that includes a component for changing variances. We’ll get to ARCH models later.

In the second plot below, we see that we can easily show how close our predictions were to our observed data.

plot(fitted(mod2), mod2$residuals)

plot of chunk unnamed-chunk-9

plot(mod2$x, col='red')
lines(fitted(mod2), col='blue')

plot of chunk unnamed-chunk-9

Forecasting

Now that we have a reasonable prediction, we can forecast the model, say 24 months into the future.

sarima.for(data, 24, 1,0,0,2,1,0,12)

plot of chunk unnamed-chunk-10

## $pred
##      Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 7  19280  20376  33259  31600  90175 135176  74347  32455  24462  26000
## 8  16644  22494  32056  29896  86926 106695  52426  29368  21679  24036
##      Nov    Dec
## 7  22918  17067
## 8  20628  16596
## 
## $se
##     Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
## 7 17238 17892 17941 17945 17945 17945 17945 17945 17945 17945 17945 17945
## 8 18307 18334 18337 18337 18337 18337 18337 18337 18337 18337 18337 18337

#Using native R commands

predict(mod2, n.ahead=24)

## $pred
##      Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 7  20255  21600  34566  32927  91480 136741  75871  33820  25826  27353
## 8  18181  24105  33671  31515  88541 108213  53968  30981  23294  25654
##      Nov    Dec
## 7  24271  18410
## 8  22245  18220
## 
## $se
##     Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
## 7 15434 16030 16076 16079 16080 16080 16080 16080 16080 16080 16080 16080
## 8 16396 16420 16422 16422 16422 16422 16422 16422 16422 16422 16422 16422