In class we talked about seasonal ARIMA models, decomposing a time series into its component parts, and building our own ARIMA model instead of relying on the auto.arima command.

The data set I will use is AirPassengers, the monthly airline passenger series that ships with R.

library(forecast)
library(quantmod)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tseries)
library(timeSeries)
## Loading required package: timeDate
## 
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
## 
##     time<-
library(xts)
plot(AirPassengers)

The first thing to notice in this plot is that the size of the seasonal swings grows as the level of the series rises, so we will want a log transformation to stabilize the variance. There is also an obvious seasonal component, so we decompose the logged series to isolate it.
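A quick way to check that the spread really does grow with the level is to compare the standard deviation within each calendar year before and after taking logs. This is only a sketch using base R; the names sd_raw, sd_log, growth_raw, and growth_log are mine and were not part of the class example.

```r
# Within-year standard deviation, one value per calendar year (1949-1960)
sd_raw <- aggregate(AirPassengers, nfrequency = 1, FUN = sd)
sd_log <- aggregate(log(AirPassengers), nfrequency = 1, FUN = sd)

# How much the spread grows from the first year to the last year
growth_raw <- as.numeric(sd_raw[12] / sd_raw[1])
growth_log <- as.numeric(sd_log[12] / sd_log[1])

print(round(cbind(raw = sd_raw, log = sd_log), 3))
print(c(raw = growth_raw, log = growth_log))
```

If the log transformation is doing its job, the growth ratio for the logged series should be much closer to 1 than the ratio for the raw series.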

dc <- decompose(log(AirPassengers))
plot(dc)

Now that the series is split into trend, seasonal, and random components, we can see the overall upward trend and a consistent seasonal pattern. We will need to include a seasonal part in the ARIMA model we build.
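To make the decomposition less of a black box, here is a small check that the three pieces add back up to the logged series. It assumes the default additive decomposition; the names lx, rebuilt, and max_gap are my own.

```r
# Additive decomposition of the logged series (base R 'stats' only)
lx <- log(AirPassengers)
dc <- decompose(lx)

# trend + seasonal + random should reproduce the logged series wherever
# the centred moving-average trend is defined (it is NA at both ends)
rebuilt <- dc$trend + dc$seasonal + dc$random
ok <- !is.na(rebuilt)
max_gap <- max(abs(rebuilt[ok] - lx[ok]))
print(max_gap)

# The seasonal figure: one effect per month, centred on zero
print(round(dc$figure, 3))
```

Because we decomposed the logged series, each monthly effect in dc$figure is on the log scale, so exp() of it can be read as a rough multiplicative factor for that month.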

Next we need to choose the order of differencing. Because the data are monthly, we take a seasonal difference with a lag of 12, which subtracts from each observation the value from the same month one year earlier.

diffs <- diff(log(AirPassengers), lag=12)
plot(diffs)
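As a sanity check on what diff(..., lag=12) is doing, the same series can be built by hand. This is a sketch; the names lx, n, and manual are mine.

```r
lx <- log(AirPassengers)
diffs <- diff(lx, lag = 12)

# By hand: each value minus the value from the same month one year earlier
n <- length(lx)
manual <- lx[13:n] - lx[1:(n - 12)]

print(all.equal(as.numeric(diffs), manual))
print(length(diffs))  # the first 12 months have nothing to subtract
print(start(diffs))   # so the differenced series starts one year later
```

Since log(a) - log(b) = log(a / b), each value of diffs is the log of the year-over-year ratio, which for small values is roughly the year-over-year growth rate.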

adf.test(diffs)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diffs
## Dickey-Fuller = -3.1899, Lag order = 5, p-value = 0.09265
## alternative hypothesis: stationary

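The p-value of 0.093 is below 0.10 but above the usual 0.05 cutoff, so the test rejects a unit root only at the 10% level. The seasonally differenced series looks roughly stationary in the plot, so we will go ahead without non-seasonal differencing, keeping in mind that this is a borderline call.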
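The ADF test is only one way to judge the differencing order. As a cross-check, the forecast package has helper functions that suggest the number of seasonal and ordinary differences. This is a sketch that assumes forecast is installed (it is loaded at the top of these notes); D_hat and d_hat are my own names, and the suggestions depend on which test is chosen, so they should be treated as a second opinion rather than the answer.

```r
library(forecast)  # assumed installed; already loaded earlier in these notes

lx <- log(AirPassengers)

# Suggested number of seasonal differences (D), using the default test
D_hat <- nsdiffs(lx)

# Suggested number of ordinary differences (d) after one seasonal
# difference, here using the KPSS test instead of ADF
d_hat <- ndiffs(diff(lx, lag = 12), test = "kpss")

print(c(D = D_hat, d = d_hat))
```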

Now we need to choose the AR and MA orders: p and q for the non-seasonal part, and P and Q for the seasonal part.

acf(diffs)

pacf(diffs)

The ACF shows only one spike after lag 8, so we will use a seasonal MA order Q of 1 and a non-seasonal MA order q of at least 4.

The PACF shows two spikes right away, so we will use a non-seasonal AR order p of at least 1, and a seasonal AR order P of 0 since there is no spike at lag 12.
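One thing to watch when reading these plots: for a monthly ts object, acf and pacf label the x-axis in years, so lag 12 shows up at 1.0. The sketch below pulls out the lags (converted to months) that fall outside the approximate 95% bounds, which can be easier than reading them off the plot. The names a, p, acf_months, pacf_months, bound, sig_acf, and sig_pacf are mine.

```r
diffs <- diff(log(AirPassengers), lag = 12)

# Go out to 36 months so the seasonal lags 12, 24, and 36 are included
a <- acf(diffs, lag.max = 36, plot = FALSE)
p <- pacf(diffs, lag.max = 36, plot = FALSE)

# Lags are stored in years for a monthly series, so convert to months
acf_months  <- round(a$lag[, 1, 1] * 12)
pacf_months <- round(p$lag[, 1, 1] * 12)

# Approximate 95% bounds (the dashed lines on the default plots)
bound <- qnorm(0.975) / sqrt(length(diffs))

# Lags whose (partial) autocorrelation falls outside the bounds;
# lag 0 of the ACF is always 1, so it is dropped
sig_acf  <- acf_months[abs(a$acf[, 1, 1]) > bound & acf_months > 0]
sig_pacf <- pacf_months[abs(p$acf[, 1, 1]) > bound]

print(sig_acf)
print(sig_pacf)
```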

Putting these choices together, we fit an ARIMA(1,0,4)(0,1,1) model with a seasonal period of 12 to the logged series.

m1 <- arima(log(AirPassengers), order = c(1, 0, 4), seasonal = list(order = c(0, 1, 1), period = 12))
summary(m1)
## 
## Call:
## arima(x = log(AirPassengers), order = c(1, 0, 4), seasonal = list(order = c(0, 
##     1, 1), period = 12))
## 
## Coefficients:
##          ar1      ma1     ma2      ma3      ma4     sma1
##       0.9981  -0.3776  0.0565  -0.1641  -0.0615  -0.5815
## s.e.  0.0029   0.0933  0.1022   0.1013   0.0976   0.0803
## 
## sigma^2 estimated as 0.001302:  log likelihood = 247.39,  aic = -480.77
## 
## Training set error measures:
##                       ME       RMSE        MAE        MPE      MAPE
## Training set 0.002452187 0.03457656 0.02584615 0.04772145 0.4685735
##                  MASE          ACF1
## Training set 0.285323 -0.0007259697
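It can help to write out the model these coefficients belong to. The notation here is my own: y_t is the logged passenger count, B is the backshift operator (B y_t = y_{t-1}), and the MA terms carry plus signs to match the convention in R's arima documentation.

$$(1 - \phi_1 B)(1 - B^{12})\, y_t = (1 + \theta_1 B + \theta_2 B^2 + \theta_3 B^3 + \theta_4 B^4)(1 + \Theta_1 B^{12})\, \varepsilon_t$$

Reading the estimates off the output: \phi_1 = 0.9981 (ar1), \theta_1 = -0.3776, \theta_2 = 0.0565, \theta_3 = -0.1641, \theta_4 = -0.0615 (ma1 to ma4), and \Theta_1 = -0.5815 (sma1). The ar1 estimate is within one standard error of 1, so the factor (1 - \phi_1 B) behaves almost like a first difference (1 - B). That fits with the borderline ADF result above and suggests a model with non-seasonal differencing may be worth trying as well.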

This output gives us the coefficients with their standard errors, sigma-squared, the log likelihood, and the AIC, followed by error measures on the training set.
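Two natural follow-ups, sketched here with base R only: check whether the residuals look like white noise, and forecast the next year on the original passenger scale. The choice of 24 lags for the Ljung-Box test and the names lb, fc, and fc_passengers are my own assumptions, not something we set in class.

```r
m1 <- arima(log(AirPassengers), order = c(1, 0, 4),
            seasonal = list(order = c(0, 1, 1), period = 12))

# Ljung-Box test on the residuals; fitdf = 6 accounts for the six
# estimated coefficients (ar1, ma1 to ma4, sma1)
lb <- Box.test(residuals(m1), lag = 24, type = "Ljung-Box", fitdf = 6)
print(lb)

# Forecast 12 months ahead on the log scale, then undo the log
fc <- predict(m1, n.ahead = 12)
fc_passengers <- exp(fc$pred)
print(round(fc_passengers))
```

A large p-value from the Ljung-Box test would mean there is no strong evidence of leftover autocorrelation in the residuals. Note that exp() of a log-scale forecast is closer to a median than a mean forecast on the original scale.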