Learning Log 2/26

Here are some libraries that we need to use some of the functions we need to talk about the topic of the day namely, seasonal ARIMA.

## Warning: package 'forecast' was built under R version 3.3.3

## Warning: package 'quantmod' was built under R version 3.3.3

## Loading required package: xts

## Warning: package 'xts' was built under R version 3.3.3

## Loading required package: zoo

## Warning: package 'zoo' was built under R version 3.3.3

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: TTR

## Warning: package 'TTR' was built under R version 3.3.3

## Version 0.4-0 included new data defaults. See ?getSymbols.

## Warning: package 'tseries' was built under R version 3.3.3

## Warning: package 'timeSeries' was built under R version 3.3.3

## Loading required package: timeDate

## Warning: package 'timeDate' was built under R version 3.3.3

## 
## Attaching package: 'timeSeries'

## The following object is masked from 'package:zoo':
## 
##     time<-

We followed 5 basic steps to find an ARIMA model that takes into account seasonal trend in time series data. First, we need to plot the data and make some general observations.

data("woolyrnq")
plot(woolyrnq)

It’s sort of dificult to tell if there is seasonal variation, so we should decompose this time series data into its trend components using the ‘TScomp’ function.

decomp1 <- decompose(woolyrnq)
plot(decomp1)

There is really obviously seasonal variation so we’ll have to include it in the ARIMA model using the new parameters we learned in class (P, D, Q) and S. It looks like our seasonal trend time span is 1 year so we’ll use lag 1.

Building the Model

Step 2: D and d

Following the set of steps we learned, now we need to figure out how many differences to take. We should try to figure out the number of seasonal differences first denoted ‘D’ because this might eliminate the need to include a ‘d’ that isn’t 0.

diffseas <- diff(woolyrnq, lag=2)
plot.ts(diffseas)

adf.test(diffseas)

## Warning in adf.test(diffseas): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  diffseas
## Dickey-Fuller = -5.4213, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

Our p-value is below 0.05 so we can safely say that with order 1 differences we have a stationary time series.

From here we can conlude that we should use D=1 and d=0 since the seasonal differences took care of the overall trend.

Step 3: P’s and Q’s

We learned that to choose the values for a ‘p’ and ‘q’ we should use the ‘Acf’ and ‘Pacf’ functions. These will give us autocorrelation plots with spikes at low lags for the value of ‘q’ we want to use and spikes at low lags for the value of ‘p’ we want to use respectively.

Acf(diffseas)

Pacf(diffseas)

These look kind of funky, but I think it means we should use p=2 and q=2? I guess we’ll find out. Since the spikes on multiples of S will show us which P and Q to use, let’s give 1 for each of those a shot!

The Resulting Model

Now that we chose values for our parameters, let’s let R build and plot the model for us.

mod1 <- arima(woolyrnq, order = c(2,0,2), season = list( order = c( 1,1,1), period=2))
summary(mod1)

## 
## Call:
## arima(x = woolyrnq, order = c(2, 0, 2), seasonal = list(order = c(1, 1, 1), 
##     period = 2))
## 
## Coefficients:
##          ar1      ar2     ma1      ma2     sar1    sma1
##       0.5505  -0.0133  0.3025  -0.6975  -0.9831  0.7882
## s.e.  0.2913   0.1305  0.2800   0.2788   0.0152  0.0782
## 
## sigma^2 estimated as 160912:  log likelihood = -871.2,  aic = 1756.4
## 
## Training set error measures:
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -11.15169 397.7543 287.4847 -0.6250771 5.297079 0.5118937
##                      ACF1
## Training set 0.0005418477

Diagnostics

We can test whether this model represents the observed data using a Box-Ljung test.

Box.test(mod1$residuals, type = "Ljung")

## 
##  Box-Ljung test
## 
## data:  mod1$residuals
## X-squared = 3.5827e-05, df = 1, p-value = 0.9952

Our p-value is huge so this arima does a really cruddy job of representing the observed data.

We can try to improve it by adjusting the values we select for the ARIMA parameters.

newmod <- arima(woolyrnq, order = c(1,0,1), season = list( order = c( 2,1,2), period=2))
summary(newmod)

## 
## Call:
## arima(x = woolyrnq, order = c(1, 0, 1), seasonal = list(order = c(2, 1, 2), 
##     period = 2))
## 
## Coefficients:
##          ar1      ma1     sar1     sar2     sma1     sma2
##       0.9545  -0.1132  -0.9878  -0.0068  -0.2364  -0.7635
## s.e.  0.0367   0.1028   0.1455   0.1367   0.1178   0.1113
## 
## sigma^2 estimated as 159097:  log likelihood = -871.13,  aic = 1756.26
## 
## Training set error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -33.46866 395.5042 287.5235 -1.077802 5.309838 0.5119628
##                      ACF1
## Training set -0.004721302

Box.test(newmod$residuals, type = "Ljung")

## 
##  Box-Ljung test
## 
## data:  newmod$residuals
## X-squared = 0.00272, df = 1, p-value = 0.9584

Wow this one was also sort of crappy.

Well, that’s all folks.