#install.packages(c("quantmod", "tseries", "timeSeries", "forecast", "xts"))

library(quantmod)
## Warning: package 'quantmod' was built under R version 3.4.4
## Loading required package: xts
## Warning: package 'xts' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.4
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tseries)
## Warning: package 'tseries' was built under R version 3.4.4
library(timeSeries)
## Warning: package 'timeSeries' was built under R version 3.4.4
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.4.4
## 
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
## 
##     time<-
library(forecast)
## Warning: package 'forecast' was built under R version 3.4.4
library(xts)

ARIMA

ARIMA models are also called Box-Jenkins models (This is chapters 9-12) ARIMA is autoregressive integrated moving average.

Autoregressive models (AR)

We’ll start with just the AR part: autoregressive models which is just regressing a time series on itself. This means you should let y_t depend on y_(t-1) and depending on the number of time periods, maybe up to y_(t-2),….,y_(t-p).
This is denoted as AR(p) where the order p is the number of time periods you are basing the model off of previous. It can ve written as \[ y_t = {\phi_1} {y_t-1} + {\phi_2} * y_(t-2) + ... + {\phi_p} * y_(t-p) + {\epsilon_t} \] , where y_t is stationary, the error term is iid normally distributed with mean 0 and vairance sigma_2. \(\phi_i\) are constants where \(\phi_p \neq 0\).
the expected value of \(\y_t\) = 0.
example: AR(1) create 2 models: \(\phi_1\) =.9, \(\phi_1\) = -.9 looks like
\(y_t = .9 y_[t_-1] + \epsilon_t\) which implies positive correlation between \(\y_t\) and y_t-1
\(y_t = -.9 y_[t-1] + \epsilon_t\) which implies negative correlation between \(\y_t\) and y_t-1, this also then implies positive correlation with y_t and y_t-2.

We can create a simulation with 200 values with a time series and components phi_1 =.9 and also a phi_1 = -.9. p = 1 which is the first in order=c(1,0,0).

par(mfrow=c(2,1))
# phi = .9
plot(arima.sim(list(order=c(1,0,0),ar=.9) , n=200), ylab = "y", main = (expression(AR(1) ~~~ phi == +.9)))
abline(0,0)
# phi = -.9
plot(arima.sim(list(order=c(1,0,0),ar= -.9) , n=200), ylab = "y", main = (expression(AR(1) ~~~ phi == -.9)))
abline(0,0)

to fit the model, use auto.arima when creating the random data.

set.seed(2115)
fakedata <- arima.sim(list(order=c(1,0,0),ar=.9) , n=200)
plot(fakedata)

mod1 <- auto.arima(fakedata)
mod1  
## Series: fakedata 
## ARIMA(1,0,0) with zero mean 
## 
## Coefficients:
##          ar1
##       0.9266
## s.e.  0.0285
## 
## sigma^2 estimated as 1.004:  log likelihood=-284.66
## AIC=573.32   AICc=573.38   BIC=579.92

The coefficient should be close to .9 but not exact.
Now we can forcast. First we can start just with only 1 day into the future (time period 201)

mycast1 <- forecast(mod1, h=1) # includes prediction intervals
mycast1 
##     Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## 201      -6.557273 -7.841365 -5.273181 -8.521123 -4.593423

We can look at the data a little more by graphing it:

plot(mycast1)

Now to change how many periods, just change h = new value.

plot(forecast(mod1, h=10)) # 10 days into the future

The bright blue line indicates point predictions and the prediction intervals are corresponding to the 80% and 95% confidence intervals. These are wide because predictions are hard and there’s a lot of uncertainty.

Moving Average models (MA)

Now we can go onto the MA part of ARIMA.
y_t is the linear combination of \(\epsilon_[t-1]\), … , \(\epsilon_[t-q]\).
MA(q) where q is the number of past time periods of epsilon we will depend on. This is written as

\(y_t = \mu + \epsilon_t + \theta_1 * \epsilon_(t-1) + \theta_2* \epsilon_(t-2) + ... + \theta_q * \epsilon_t-q\)

where we assume theta_q neq 0 and the errors are iid with mean 0 and variance sigma^2

\(y_t = {\mu} + {\epsilon_t} + {\theta_1} * {\epsilon_t-1}\)
\(E(y_t) = {\mu} + {\theta_1} E(\epsilon_(t-1)) + E(\epsilon_t) = \mu +0 + \theta_1 * 0 = \mu\)
E(y_t given $_t-1) = + 1 E((t-1)) + E(e_t) = +0 + 1 * (t-1) $

This is why it’s called MOVING average, because the average is depending on \(\epsilon_(t-1)\)

Wecan show this on R. Let’s again use the coefficient = +/- .9 and using a moving average of order 1, let \(\theta_(t-1)\) = +/- .9

par(mfrow = c(2,1))
plot(arima.sim(list(order=c(0,0,1), ma=.9), n = 200) , ylab = "y", main = (expression(MA(1) ~~~ theta ==  +.9)))
plot(arima.sim(list(order=c(0,0,1), ma=-.9), n = 200) , ylab = "y", main = (expression(MA(1) ~~~ theta ==  -.9)))

creating the model with theta = .9

set.seed(2.718)
ma.data <- arima.sim(list(order=c(0,0,1), ma=.9), n = 200)
mod2 <- auto.arima(ma.data)
mod2
## Series: ma.data 
## ARIMA(1,0,2) with zero mean 
## 
## Coefficients:
##           ar1    ma1     ma2
##       -0.8354  1.723  0.7496
## s.e.   0.2317  0.237  0.2105
## 
## sigma^2 estimated as 1.143:  log likelihood=-296.35
## AIC=600.71   AICc=600.91   BIC=613.9

Here we can see the coefficents of the random sample data.
phi = -.8354
theta = 1.723

forecast 1 day into the future

plot(forecast(mod2, h=1)) # 1 day into the future

forecast(mod2, h=1)
##     Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
## 201      0.3159447 -1.054301 1.686191 -1.779666 2.411555

we can forecast 10 days into the future…

plot(forecast(mod2, h=10)) 

Autoregressive Moving Average Models (ARMA ( p,q) )

now we’re going to combine these 2 concepts from above. p and q are same as above.

\(y_t = \phi_1 * y_(t-1) + \phi_2 * y_(t-2) + ... + \phi_p * y_(t-p) + \epsilon_t + \mu + \theta_1 * \epsilon_t-1 + \theta_2 * \epsilon_(t-2) + ... + \theta_q *\epsilon_(t-q)\)
again assume phi_p and theta_q neq 0 and that errors are normally distributed with mean = 0 and variance = sigma^2

example: ARMA(1,1) = \(\phi_1 y_(t-1) + \epsilon_t + \theta_1 \epsilon_t-1 + \mu\)
example: ARMA(1,2) = \(\phi_1 y_(t-1) + \epsilon_t + \theta_1 \epsilon_(t-1) + \theta_2 * \epsilon_(t-2) + \mu\)

all that’s left is about integrate portion of ARIMA.

Full ARIMA(p,d,q)

3 numbers that specify how it is used : p, q, and the number of differences we’re going to take which is d.
It’s important to know why do we use the differences? We want data with 0 mean and constant variance over time. If we don’t see this in our data, we will take differences in consecutive measurements.
Example: try d = 1 which will take first order differences Z_1 = (y_2 - y_1), Z_2 = (y_3 - y_2), Z_3 = (y_4 -y_3), etc
if this does not help our , then we try :
d=2 which will take second order differences (Z_2 - Z_1), then (Z_3 - Z-2), etc
and we can keep going….
test for 0 mean and constant varaince using adf.test(), or use auto.arima (which R will check this for you)

plot(WWWusage)

mod4 <- auto.arima(WWWusage)
mod4
## Series: WWWusage 
## ARIMA(1,1,1) 
## 
## Coefficients:
##          ar1     ma1
##       0.6504  0.5256
## s.e.  0.0842  0.0896
## 
## sigma^2 estimated as 9.995:  log likelihood=-254.15
## AIC=514.3   AICc=514.55   BIC=522.08
mycast <- forecast(mod4,h=20)
plot(mycast)

Though we don’t see constant mean and variance in our data, if we take difference of 1 then we can get those assumptions fulfilled. Thus we can forcast using auto.arima() and forcast 20 periods into the future. R uses p=1, d=1, and q=1.

We can also check out other data of stock prices.

getSymbols('TECHM.NS', from='2012-01-01', to='2015-01-01')
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
## 
## This message is shown once per session and may be disabled by setting 
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## 
## WARNING: There have been significant changes to Yahoo Finance data.
## Please see the Warning section of '?getSymbols.yahoo' for details.
## 
## This message is shown once per session and may be disabled by setting
## options("getSymbols.yahoo.warning"=FALSE).
## Warning: TECHM.NS contains missing values. Some functions will not work if
## objects contain missing values in the middle of the series. Consider using
## na.omit(), na.approx(), na.fill(), etc to remove or replace them.
## [1] "TECHM.NS"
stock_prices <- TECHM.NS[,4]

then create the model ARIMA using the auto.arima command.

mod5 <- auto.arima(stock_prices)
mycast5 <- forecast(mod5, h=10)
plot(mycast5)

We can also try transforming the stock prices still like we would for a normal linear regression model.

mod6 <- auto.arima(log(stock_prices))
mycast6 <- forecast(mod6, h=10)
plot(mycast6)

We plot them to look at the difference.

par(mfrow=c(2,1))
plot(mycast5)
plot(mycast6)

R automatically creates p,d, and q when using this command. The log requires p=2, whereas the normal stock requires p=0.

Now we will try with jj data.

data("AirPassengers")
library(astsa); data(globtemp)
## 
## Attaching package: 'astsa'
## The following object is masked from 'package:forecast':
## 
##     gas
data(jj)
jj
##           Qtr1      Qtr2      Qtr3      Qtr4
## 1960  0.710000  0.630000  0.850000  0.440000
## 1961  0.610000  0.690000  0.920000  0.550000
## 1962  0.720000  0.770000  0.920000  0.600000
## 1963  0.830000  0.800000  1.000000  0.770000
## 1964  0.920000  1.000000  1.240000  1.000000
## 1965  1.160000  1.300000  1.450000  1.250000
## 1966  1.260000  1.380000  1.860000  1.560000
## 1967  1.530000  1.590000  1.830000  1.860000
## 1968  1.530000  2.070000  2.340000  2.250000
## 1969  2.160000  2.430000  2.700000  2.250000
## 1970  2.790000  3.420000  3.690000  3.600000
## 1971  3.600000  4.320000  4.320000  4.050000
## 1972  4.860000  5.040000  5.040000  4.410000
## 1973  5.580000  5.850000  6.570000  5.310000
## 1974  6.030000  6.390000  6.930000  5.850000
## 1975  6.930000  7.740000  7.830000  6.120000
## 1976  7.740000  8.910000  8.280000  6.840000
## 1977  9.540000 10.260000  9.540000  8.729999
## 1978 11.880000 12.060000 12.150000  8.910000
## 1979 14.040000 12.960000 14.850000  9.990000
## 1980 16.200000 14.670000 16.020000 11.610000
data(jj)
jjmod <- auto.arima(jj)
mycastjj <- forecast(jjmod, h=100)
plot(mycastjj)

This looks like there might be some seasonal/monthy differences we should consider when plotting another model.