Visit my website for more like this! I would love to hear your feedback.
Online course at Penn State
## Loading required package: knitr
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Loading required package: timeDate
## This is forecast 5.4
Seasonality refers to a regular pattern of changes that repeat for S time periods, where S defines the number of timer periods until the pattern repeats again.
In a seasonal ARIMA model, AR and MA terms predict xt using data values and errors at times with lags that are multiples of S.
Seasonality, of course, usually causes the time series to be nonstationary. Seasonal differencing is defined as a difference between a value and a value with lag that is a multiple of S.
Thus, seasonal differencing removes a seasonal trend and can also get rid of a seasonal random walk (another type of nonstationarity).
However, if an overall trend is present in the data, we may also need non-seasonal differencing. Often a first difference will detrend the data.
It is important to note that non-seasonal behavior will still matter in seasonal models. That is, it is likely that short run non-seasonal components will still contribute to the model. For ice cream sames, it could be that the sames from last August and this July produce the best model.
This model incorporates both seasonal and non-seasonal factors in a multiplicative model.
\[ARIMA(p, d, q)*(P, D, Q)S\]
where the capital P, D, and Q are the seasonal components of the AR, differencing, and MA components.
The model includes seasonal and non-seasonal MA(1) terms, no differencing and no AR terms, and the seasonal period is S = 12. Thus this model has MA terms at lags 1, 12, and 13. Also there is a non-zero autocorrelation at lag 11.
arima.m<-arima.sim(list(order = c(0,0,12), ma = c(0.7,rep(0,10),0.9)), n = 200)
acf(arima.m)
A seasonal AR(1) model has significant lags at 1, 12 and 13, with an increasing trend before lag 12, sharply cut off at lag 13.
ar_pacf<-ARMAacf (ar = c(.6,0,0,0,0,0,0,0,0,0,0,.5,-.30),lag.max=30,pacf=T)
plot(ar_pacf, type='h')
Do a time series plot of the data. Examine it for global trends and seasonality.
diff1 <- diff(x, 1)
diff1_and_12 <- diff(diff1, 12)
* If there is no obvious trend or seasonality, do not take any differences.
Examine the ACF and PACF of the differenced data (if necessary). We can begin to make some basic guesses about the most appropriate model at this time.
Estimate the model(s) that might be reasonable for the data based on the previous steps.
Examine the residuals (with ACF, Box-Pierce, and other means) to see if the model seems good. Compare AIC or BIC values to determine the best of several models.
Time series plot of 144 consecutive months on the colorado river
source(url("http://lib.stat.cmu.edu/general/tsa2/Rcode/itall.R"))
## itall has been installed
dat<-(read.csv('colorado_river.csv'))
plot(dat[,3], type='o')
Without having some experience with the data, it is difficult to identify seasonality trends here. If this was your job your would probably know that river flow is indeed seasonal, with perhaps higher flows in late spring and early summer, due to snow runoff.
With this in mind, we could aggregate the data by month to better understand this trend.
require(ggplot2)
ggplot(dat, aes(x=month, y=flow/1000))+
stat_summary(geom = 'line', fun.y='mean')+ # take the mean of each month
scale_x_discrete(breaks=seq(1,12,1), labels=seq(1,12,1))+
theme_bw()+ # add a little style
facet_wrap(~year) # visualize year by year
## Warning: closing unused connection 5
## (http://lib.stat.cmu.edu/general/tsa2/Rcode/itall.R)
Further, looking back at the first plot, it is difficult to see if there is a global trend, if any, it is slight.
Since we hypothesize that there is seasonality, we can take the seasonal difference (create a variable that gives the 12TH differences), then look at the ACF and PACF.
data<-ts(dat[1:72,][,3])
diff_12 <- diff(data, 12)
acf2(diff_12, 48)
Seasonal behavior: we see that for both the ACF and PACF we have significant autocorrelation at seasonal (12, 24, 36) lags. The ACF has a cluster around 12, and not much else besides a tapering pattern throughout. Further, the PACF also has spikes on two multiples of S, AR(2)
Non-seasonal behavior: The PACF shows a clear spike at lag 1, and not much else till lag 11. The ACF also has a tapering pattern in the early lags (and seasonal lags). This is indicative of an AR(1) component.
Let’s try out the model… ARIMA (1,0, 0) x (2, 1, 0)12
data<-ts(data, freq=12)
mod1<-sarima(data, 1,0,0,2,1,0,12)
## initial value 10.165417
## iter 2 value 9.844064
## iter 3 value 9.816870
## iter 4 value 9.779773
## iter 5 value 9.769839
## iter 6 value 9.767267
## iter 7 value 9.766853
## iter 8 value 9.766852
## iter 9 value 9.766852
## iter 9 value 9.766852
## iter 9 value 9.766852
## final value 9.766852
## converged
## initial value 9.722784
## iter 2 value 9.719603
## iter 3 value 9.711661
## iter 4 value 9.709954
## iter 5 value 9.709280
## iter 6 value 9.709266
## iter 7 value 9.709265
## iter 7 value 9.709265
## iter 7 value 9.709265
## final value 9.709265
## converged
# Using standard R functions
mod2<-Arima(data,order=c(1, 0, 0),
seasonal=list(order=c(2, 1, 0), period=12))
mod2
## Series: data
## ARIMA(1,0,0)(2,1,0)[12]
##
## Coefficients:
## ar1 sar1 sar2
## 0.281 -0.792 -0.193
## s.e. 0.134 0.132 0.168
##
## sigma^2 estimated as 2.38e+08: log likelihood=-667.8
## AIC=1344 AICc=1344 BIC=1352
We can see from the diagnostics that this is indeed a decent fit. What doesn’t look good is the residuals vs. the fitted values, which reveals strong non-constant variance. We’ve got three choices for what to do about the non-constant variance: (1) ignore it, (2) go back to step 1 and try a variance stabilizing transformation like log or square root, or (3) use an ARCH model that includes a component for changing variances. We’ll get to ARCH models later.
In the second plot below, we see that we can easily show how close our predictions were to our observed data.
plot(fitted(mod2), mod2$residuals)
plot(mod2$x, col='red')
lines(fitted(mod2), col='blue')
Now that we have a reasonable prediction, we can forecast the model, say 24 months into the future.
sarima.for(data, 24, 1,0,0,2,1,0,12)
## $pred
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 7 19280 20376 33259 31600 90175 135176 74347 32455 24462 26000
## 8 16644 22494 32056 29896 86926 106695 52426 29368 21679 24036
## Nov Dec
## 7 22918 17067
## 8 20628 16596
##
## $se
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 7 17238 17892 17941 17945 17945 17945 17945 17945 17945 17945 17945 17945
## 8 18307 18334 18337 18337 18337 18337 18337 18337 18337 18337 18337 18337
#Using native R commands
predict(mod2, n.ahead=24)
## $pred
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 7 20255 21600 34566 32927 91480 136741 75871 33820 25826 27353
## 8 18181 24105 33671 31515 88541 108213 53968 30981 23294 25654
## Nov Dec
## 7 24271 18410
## 8 22245 18220
##
## $se
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 7 15434 16030 16076 16079 16080 16080 16080 16080 16080 16080 16080 16080
## 8 16396 16420 16422 16422 16422 16422 16422 16422 16422 16422 16422 16422