Here are some libraries that we need to use some of the functions we need to talk about the topic of the day namely, seasonal ARIMA.
## Warning: package 'forecast' was built under R version 3.3.3
## Warning: package 'quantmod' was built under R version 3.3.3
## Loading required package: xts
## Warning: package 'xts' was built under R version 3.3.3
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.3.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: TTR
## Warning: package 'TTR' was built under R version 3.3.3
## Version 0.4-0 included new data defaults. See ?getSymbols.
## Warning: package 'tseries' was built under R version 3.3.3
## Warning: package 'timeSeries' was built under R version 3.3.3
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.3.3
##
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
##
## time<-
We followed 5 basic steps to find an ARIMA model that takes into account seasonal trend in time series data. First, we need to plot the data and make some general observations.
data("woolyrnq")
plot(woolyrnq)
It’s sort of dificult to tell if there is seasonal variation, so we should decompose this time series data into its trend components using the ‘TScomp’ function.
decomp1 <- decompose(woolyrnq)
plot(decomp1)
There is really obviously seasonal variation so we’ll have to include it in the ARIMA model using the new parameters we learned in class (P, D, Q) and S. It looks like our seasonal trend time span is 1 year so we’ll use lag 1.
Following the set of steps we learned, now we need to figure out how many differences to take. We should try to figure out the number of seasonal differences first denoted ‘D’ because this might eliminate the need to include a ‘d’ that isn’t 0.
diffseas <- diff(woolyrnq, lag=2)
plot.ts(diffseas)
adf.test(diffseas)
## Warning in adf.test(diffseas): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diffseas
## Dickey-Fuller = -5.4213, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
Our p-value is below 0.05 so we can safely say that with order 1 differences we have a stationary time series.
From here we can conlude that we should use D=1 and d=0 since the seasonal differences took care of the overall trend.
We learned that to choose the values for a ‘p’ and ‘q’ we should use the ‘Acf’ and ‘Pacf’ functions. These will give us autocorrelation plots with spikes at low lags for the value of ‘q’ we want to use and spikes at low lags for the value of ‘p’ we want to use respectively.
Acf(diffseas)
Pacf(diffseas)
These look kind of funky, but I think it means we should use p=2 and q=2? I guess we’ll find out. Since the spikes on multiples of S will show us which P and Q to use, let’s give 1 for each of those a shot!
Now that we chose values for our parameters, let’s let R build and plot the model for us.
mod1 <- arima(woolyrnq, order = c(2,0,2), season = list( order = c( 1,1,1), period=2))
summary(mod1)
##
## Call:
## arima(x = woolyrnq, order = c(2, 0, 2), seasonal = list(order = c(1, 1, 1),
## period = 2))
##
## Coefficients:
## ar1 ar2 ma1 ma2 sar1 sma1
## 0.5505 -0.0133 0.3025 -0.6975 -0.9831 0.7882
## s.e. 0.2913 0.1305 0.2800 0.2788 0.0152 0.0782
##
## sigma^2 estimated as 160912: log likelihood = -871.2, aic = 1756.4
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -11.15169 397.7543 287.4847 -0.6250771 5.297079 0.5118937
## ACF1
## Training set 0.0005418477
We can test whether this model represents the observed data using a Box-Ljung test.
Box.test(mod1$residuals, type = "Ljung")
##
## Box-Ljung test
##
## data: mod1$residuals
## X-squared = 3.5827e-05, df = 1, p-value = 0.9952
Our p-value is huge so this arima does a really cruddy job of representing the observed data.
We can try to improve it by adjusting the values we select for the ARIMA parameters.
newmod <- arima(woolyrnq, order = c(1,0,1), season = list( order = c( 2,1,2), period=2))
summary(newmod)
##
## Call:
## arima(x = woolyrnq, order = c(1, 0, 1), seasonal = list(order = c(2, 1, 2),
## period = 2))
##
## Coefficients:
## ar1 ma1 sar1 sar2 sma1 sma2
## 0.9545 -0.1132 -0.9878 -0.0068 -0.2364 -0.7635
## s.e. 0.0367 0.1028 0.1455 0.1367 0.1178 0.1113
##
## sigma^2 estimated as 159097: log likelihood = -871.13, aic = 1756.26
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -33.46866 395.5042 287.5235 -1.077802 5.309838 0.5119628
## ACF1
## Training set -0.004721302
Box.test(newmod$residuals, type = "Ljung")
##
## Box-Ljung test
##
## data: newmod$residuals
## X-squared = 0.00272, df = 1, p-value = 0.9584
Wow this one was also sort of crappy.
Well, that’s all folks.