Intro to ARIMA

On Tuesday in class, we went over ARIMA or autoregressive integrated moving average models.

Autoregressive Model of Order P

We learned that the autoregressive model of order p is AR(p). Where \(Y_t\) depends on \(Y_t-1_\), \(Y_t-2_\), \(Y_t-3_\) to \(Y_t-p_\) \[Y_t = x1Y_t-1_ + x2Y_t-2_ +xpY_t-p_ +E_t\] where E_t is iid and normal, x1…..xp are constants, \(Y_t\) has mean 0 and constant variance. p is the number of previous y observations.

Lets look at an ARIMA model in R.

library(quantmod)
## Warning: package 'quantmod' was built under R version 3.4.4
## Loading required package: xts
## Warning: package 'xts' was built under R version 3.4.3
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.3
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
## Warning: package 'TTR' was built under R version 3.4.3
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tseries)
## Warning: package 'tseries' was built under R version 3.4.4
library(timeSeries)
## Warning: package 'timeSeries' was built under R version 3.4.4
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.4.3
## 
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
## 
##     time<-
library(forecast)
## Warning: package 'forecast' was built under R version 3.4.4
library(xts)

par(mfrow=c(2,1))
# phi = .9
plot(arima.sim(list(order=c(1,0,0),ar=.9) , n=200), ylab = "y", main = (expression(AR(1) ~~~ phi == +.9)))
abline(0,0)
# phi = -.9
plot(arima.sim(list(order=c(1,0,0),ar= -.9) , n=200), ylab = "y", main = (expression(AR(1) ~~~ phi == -.9)))
abline(0,0)

The .9 model has less bumps because the -.9 has negative correlation so it is bumping back and forth more than .9 which has positive correlation.

We can tell what the best ARIMA is by using the auto.arima command. An example of this is located below.

set.seed(2115)
fakedata <- arima.sim(list(order=c(1,0,0),ar=.9) , n=200)
plot(fakedata)

mod1 <- auto.arima(fakedata)
mod1  
## Series: fakedata 
## ARIMA(1,0,0) with zero mean 
## 
## Coefficients:
##          ar1
##       0.9266
## s.e.  0.0285
## 
## sigma^2 estimated as 1.004:  log likelihood=-284.66
## AIC=573.32   AICc=573.38   BIC=579.92

Now that we have a model we can forecast our future points. To forecast for 1 day into the future using this model, we use the command forecast(modelname, h=number_of_days).

mycast1 <- forecast(mod1, h=1)
plot(mycast1) 

Changing h means changing the number of days so lets change h to 10 or 10days.

plot(forecast(mod1, h=10))

We can see the more days we forcast the less correct the model is.

Moving Average Models

Now that we have a clear understanding of the autoregressive model of order p, we can find the moving average models, MA(q). The same requirements above applys to this model.

An example of MA(1) is yt=x1*Et-1 +Et. This means E(yt) = E(xEt-1)+E(Et)=x1E(Et-1)+E(Et)=0 And E(yt given Et-1)=E(x1Et-1 given Et-1)+E(Et given Et-1)=x1E(Et-1 given Et-1)+0= x1 Et-1

Autoregressive Moving Average Model

Now that we know the steps above we can combined the AR and the MA into ARMA. Where an ARMA (0,1) is a model with MA(1) and vice versa. ARMA means that there is a linear regression of Yt and Et. It will have to be normal again, centered at 0 and the xs have to be constants.

Now that we understand the background information, we can input a ARMA code to see how ma affects the moving model and see how it forcasts.

set.seed(1234)
arma.data <- arima.sim(list(order=c(1,0,1), ma=.9, ar = .9), n = 200)
mod3 <- auto.arima(arma.data)
mod3
## Series: arma.data 
## ARIMA(4,0,4) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4     ma1     ma2      ma3      ma4
##       1.1663  -0.5789  0.8841  -0.5949  0.7172  0.0696  -0.4784  -0.6420
## s.e.  0.3502   0.5705  0.3427   0.1475  0.3347  0.2136   0.2604   0.1482
##        mean
##       1.989
## s.e.  0.372
## 
## sigma^2 estimated as 0.9662:  log likelihood=-278.04
## AIC=576.09   AICc=577.25   BIC=609.07
plot(forecast(mod3, h=1)) # 1 day into the future

forecast(mod3, h=1)
##     Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## 201       2.299482 1.039752 3.559213 0.3728907 4.226074
plot(forecast(mod3, h=10)) # 10 days into the future

As we can see this model is more correct than the last plots we did at day 10.

Autoregressive Integrated Moving Average Models

If the time series doesn’t have constant variance and mean we take differences between consecutive observations. To do this we take d=1 use first order diffs calc y2-y1, y3-y2. If that doesn’t work we take the differences of these, (y3-y2)-(y2-y1). This is the intergated piece or the I in ARIMA. auto.arima will chose the best p, d, q but if you want to test by hand use the adf.test function to see if variance is constant.

Now we can put the data into a finished ARIMA model.

plot(WWWusage)

mod4 <- auto.arima(WWWusage)
mod4
## Series: WWWusage 
## ARIMA(1,1,1) 
## 
## Coefficients:
##          ar1     ma1
##       0.6504  0.5256
## s.e.  0.0842  0.0896
## 
## sigma^2 estimated as 9.995:  log likelihood=-254.15
## AIC=514.3   AICc=514.55   BIC=522.08
mycast <- forecast(mod4,h=20)
plot(mycast)

As we can see the prediction interval increases as we forecast more into the future, showing that those predictions 20 years out are more likely to be not correct.

Examples

Now that we know how to use ARIMA lets do some examples.

Example 1.

Using data from Yahoo Finance from the year 2012 and 2015 we will forecast some future points.

getSymbols('TECHM.NS', from='2012-01-01', to='2015-01-01')
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
## 
## This message is shown once per session and may be disabled by setting 
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## 
## WARNING: There have been significant changes to Yahoo Finance data.
## Please see the Warning section of '?getSymbols.yahoo' for details.
## 
## This message is shown once per session and may be disabled by setting
## options("getSymbols.yahoo.warning"=FALSE).
## Warning: TECHM.NS contains missing values. Some functions will not work if
## objects contain missing values in the middle of the series. Consider using
## na.omit(), na.approx(), na.fill(), etc to remove or replace them.
## [1] "TECHM.NS"

With this data we will now build a model based on close price series.

stock_prices <- TECHM.NS[,4]
mod5 <- auto.arima(stock_prices)
mycast5 <- forecast(mod5, h=10)
plot(mycast5)
mod5 <- auto.arima(stock_prices)
mycast5 <- forecast(mod5, h=10)
plot(mycast5)

mod6 <- auto.arima(log(stock_prices))
mycast6 <- forecast(mod6, h=10)
plot(mycast6)

par(mfrow=c(2,1))
plot(mycast5)
plot(mycast6)

Example 2

data(gas)
mod7 <- auto.arima(gas)
mycast7 <- forecast(mod7, h=10)
plot(mycast7)

mod8 <- auto.arima(log(gas))
mycast8 <- forecast(mod8, h=10)
plot(mycast8)

Example 3

data(gold)
mod9 <- auto.arima(gold)
mycast9 <- forecast(mod9, h=10)
plot(mycast9)

mod10 <- auto.arima(log(gold))
mycast10 <- forecast(mod10, h=10)
plot(mycast10)

Example 3

library(astsa)
## Warning: package 'astsa' was built under R version 3.4.3
## 
## Attaching package: 'astsa'
## The following object is masked _by_ '.GlobalEnv':
## 
##     gas
## The following object is masked from 'package:forecast':
## 
##     gas
data(globtemp)
mod11 <- auto.arima(globtemp)
mycast11 <- forecast(mod11, h=10)
plot(mycast11)

mod12 <- auto.arima(log(globtemp))
## Warning in log(globtemp): NaNs produced
mycast12 <- forecast(mod12, h=10)
plot(mycast12)