library(quantmod)
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tseries)
library(timeSeries)
## Loading required package: timeDate
## 
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
## 
##     time<-
library(forecast)
library(xts)

In this class we learned about ARIMA models: autoregressive integrated moving average models. The first part of this is autoregressive models. This means that we predict a point based on a number of previous points before it, each with a different constant in front of it. The number of points is the number that gives us the best fit for prediction, which we label p. An example of how to create only this part of our arima model in R is with the following code. The seed is just a random number to start on to create arbitrary data. The arima.sim function is set up to use the seed to create the data set. The one in the vector is so that our arima model uses the autoregressive model, which is the first of three parts in our full arima model.

set.seed(2115)
fakedata <- arima.sim(list(order=c(1,0,0),ar=.9) , n=200)
plot(fakedata)

The plot just shows the fake data that we have created. Now we can create and arima model based on it to predict the next data points in time.

mod1 <- auto.arima(fakedata)
mod1  
## Series: fakedata 
## ARIMA(1,0,0) with zero mean 
## 
## Coefficients:
##          ar1
##       0.9266
## s.e.  0.0285
## 
## sigma^2 estimated as 1.004:  log likelihood=-284.66
## AIC=573.32   AICc=573.38   BIC=579.92

The model says that our coefficient in front of our previous term is 0.9266 with a standard error of .0285. It should also be noted that the mean is zero. This is good because in these models our mean should always be zero and we should have constant variance. Now let’s use the forecast function to predict points with our model. The h=10 means that we are predicting 10 points into the futre.

mycast1 <- forecast(mod1, h=10)
plot(mycast1) 

The blue line contains the predicted points, while the dark blue area contains an 80% confidence interval for the points, which makes sense because it gets wider as time goes on. It is harder to predict point the farther out you try to predict them. The light blue area is a 95% confidence interval. Now let’s try the moving average part of the arima model. This part predicts points based on the previous q error terms. Where q is the number of error terms that provides the best fit. Also, each error term is given a constant in front of it. Notice that this time our 1 is in the third part of our vector now. This is because the moving average is the third part of our full arima model.

set.seed(2.718)
ma.data <- arima.sim(list(order=c(0,0,1), ma=.9), n = 200)
mod2 <- auto.arima(ma.data)
mod2
## Series: ma.data 
## ARIMA(1,0,2) with zero mean 
## 
## Coefficients:
##           ar1    ma1     ma2
##       -0.8354  1.723  0.7496
## s.e.   0.2317  0.237  0.2105
## 
## sigma^2 estimated as 1.143:  log likelihood=-296.35
## AIC=600.71   AICc=600.91   BIC=613.9

This time our arima is modeled on the previous two error terms as well as our previous term as the ma terms are moving average terms. Once again we can use this to predict into the future.

plot(forecast(mod2, h=10))

This time our data jumps up and down a lot around a center that is fairly constant mean, so our predictions are fairly centered around that mean with wide confidence intervals. The last part of our model, representing the d in arima(p,d,q), is difference level. A difference level of one means that instead of using previous terms, it uses the differences in the previous terms. If it were level two it would use the differences of those differences and the levels progress in the same way from there. So now that we understand the full arima models, let’s use some real data to create arima models. We’ll use data on the johnson and johnson company sales.

library(astsa)
## 
## Attaching package: 'astsa'
## The following object is masked from 'package:forecast':
## 
##     gas
jj.mod <- auto.arima(jj)
jj.mod
## Series: jj 
## ARIMA(1,1,2)(0,1,0)[4] 
## 
## Coefficients:
##           ar1      ma1      ma2
##       -0.7921  -0.0970  -0.3945
## s.e.   0.1396   0.1802   0.1580
## 
## sigma^2 estimated as 0.1834:  log likelihood=-44.07
## AIC=96.14   AICc=96.68   BIC=105.61

We can see that the best fit for this model is to use first level differences for one previous term and two previous errors. The second vector you see means that we have seasonal variation. We will cover this later but for now we will ignore it. Now lets make a prediction using this model.

jjcast <- forecast(jj.mod, h=10)
plot(jjcast)

We can see that this is a vary accurate model because even the terms farther out have small confidence intervals. Let’s try one more data set. This time we’ll use gold prices.

gomod <- auto.arima(gold)
gomod
## Series: gold 
## ARIMA(1,1,2) with drift 
## 
## Coefficients:
## Warning in sqrt(diag(x$var.coef)): NaNs produced
##           ar1      ma1      ma2   drift
##       -0.1628  -0.1744  -0.0552  0.0713
## s.e.      NaN      NaN      NaN  0.1151
## 
## sigma^2 estimated as 32.47:  log likelihood=-3410.82
## AIC=6831.64   AICc=6831.7   BIC=6856.69

Coincidentally, we’ve found the same type of arima function using the previous term and the two previous error terms on first level differences. This time we do not have seasonal variation, but we do have drift, meaning the data drifts upward. Let’s try and predict 20 points into the future this time.

gocast <- forecast(gomod, h=20)
plot(gocast)

Our prediction line is fairly constant but our intervals grow substantially larger over time, which is to be expected.