In class, we discussed the components that make up the ARIMA model, or the Autoregressive Integrated Moving Average models. There are three components:
Autoregressive
Moving Average
Integration
I am going to discuss each of these components in further detail when taking a look at all of the different models that these create. We can have an Autoregressive Model(AR), a Moving Average Model(MA), an Autoregressive Moving Average Model(ARMA), and an Autoregressive Integrated Moving Average Model(ARIMA).
First, we discussed the autoregressive model of order p, which is denoted as AR(p). For an autoregressive model, the response \(y_t\) depends on \(y_{t-1},y_{t-2},...,y_{t-p}\). The model has the following equation: \[y_t= \phi_1 y_{t-1} + \phi_2 y_{t-2} +...+ \phi_p y_{t-p} + \epsilon_t\] where:
-\(\phi_p \neq 0\), and \(\phi_1,...,\phi_p\) are constants
-\(y_t\) has a mean of 0, with a constant variance and no seasonality
We did an example in class for an Autoregressive Model of order p=1, which is denoted as AR(1). To create all of the models in this learning log, we need to install and call the following packages : quantmod, tseries, timeSeries, forecast, and xts.
## install.packages(c("quantmod", "tseries", "timeSeries", "forecast", "xts"))
library(quantmod)
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: TTR
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tseries)
library(timeSeries)
## Loading required package: timeDate
## Warning in as.POSIXlt.POSIXct(Sys.time()): unknown timezone 'zone/tz/2018c.
## 1.0/zoneinfo/America/Chicago'
##
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
##
## time<-
library(forecast)
library(xts)
Now, we are going to look at two different AR(1) models, with different values for \(\phi\). The equation for an AR(1) is \(y_t=\phi_1 y_{t-1} +\epsilon_t\). In order to simulate the data, we use the arima.sim function and specify that it is of the first order with order=c(1,0,0), and we can specify the value of \(\phi\) with ar. For the models we chose, we used +0.9 and -0.9 for values of \(\phi\). If \(\phi = +0.9\), then the equation is going to be \(y_t=0.9 y_{t-1} +\epsilon_t\), but if \(\phi = -0.9\), then the equation is \(y_t=-0.9 y_{t-1} +\epsilon_t\). A positive \(\phi\) means that two consecutive measurements are positively correlated, while a negative \(\phi\) means that two consecutive measurements are negatively correlated. Let’s plot the models with different \(\phi\) values and see what they look like:
########## Autoregressive Models
# Let's look at 2 ar(1) models
par(mfrow=c(2,1))
# phi = .9
plot(arima.sim(list(order=c(1,0,0),ar=.9) , n=200), ylab = "y", main = (expression(AR(1) ~~~ phi == +.9)))
abline(0,0)
# phi = -.9
plot(arima.sim(list(order=c(1,0,0),ar= -.9) , n=200), ylab = "y", main = (expression(AR(1) ~~~ phi == -.9)))
abline(0,0)
Looking at the two graphs, it is clear that there is a lot more choppiness and spikes in the model where \(\phi=-0.9\). This is due to the fact that there is switching from positive to negative occurring. Now we want to be able to fit a model with the simulated data. We can take a look at the model with \(\phi=0.9\) for the sake of this learning log. To create the model, we use the auto.arima function on our simulated data set. This function acts similarly to the lm function that we are familiar with for linear models.
# Let's fit a model using simulated ar(1) data
set.seed(2115)
fakedata <- arima.sim(list(order=c(1,0,0),ar=.9) , n=200)
plot(fakedata)
mod1 <- auto.arima(fakedata)
mod1
## Series: fakedata
## ARIMA(1,0,0) with zero mean
##
## Coefficients:
## ar1
## 0.9266
## s.e. 0.0285
##
## sigma^2 estimated as 1.004: log likelihood=-284.66
## AIC=573.32 AICc=573.38 BIC=579.92
In our output we are given the coefficient 0.9266, which means that our final equation for our model is: \(y_t=0.9266 y_{t-1} +\epsilon_t\). Also, the output mentions that there is a mean of zero, and that the estimated variance is 1.004. So, our model’s \(\epsilon_t\) follows a normal distribution with \(\mu=0\) and \(\sigma^2=1.004\)
Now that we have a model, we can use this to predict what future values might be for the response. To forecast response values, we use the forecast function, and can specify the number of time periods to forecast by changing the value of h. We are going to look at h=1, a forecast one time period into the future, as well as h=10, a forecast with 10 time periods into the future. We are going to plot these forecasts so that we can get a visual of what to expect.
# Now let's forecast using the model we just created
# 1 day into the future
mycast1 <- forecast(mod1, h=1) # includes prediction intervals
plot(mycast1)
plot(forecast(mod1, h=10)) # 10 days into the future
The dark blue line represents the point predictions for the forecast. You can note that there is a dark blue shaded area around the line, which represents the 80% confidence interval. Outside of that is a lighter blue shaded area, which represents the 95% confidence interval.
Now we want to explore the second component of our ARIMA models: Moving Average models. A Moving Average model of order q is denoted as MA(q). For a moving average model, our response \(y_t\) depends on \(\epsilon_{t-1},\epsilon_{t-2},...,\epsilon_{t-q}\).
The model has the following equation: \[y_t= \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} +...+ \theta_q \epsilon_{t-q} + \epsilon_t\] where:
-\(\theta_q \neq 0\) and \(\theta_1,...,\theta_q\) are constant
Like with our Autoregressive model, we are going to simulate a fake data set, and look at different values for the coefficient, which is \(\theta\) for this model. We are going to use \(\theta=+0.9\) and \(\theta=-0.9\), which can be changed with specifying the value of ma. In the autoregressive model, the order was c(1,0,0), but for the moving average model, we are going to use c(0,0,1) to indicate that it is a first order moving average model. The equation for a MA(1) is \(y_t=\theta_1 \epsilon_{t-1} +\epsilon_t\) We are again going to plot each simulation:
########### Moving Average Models
par(mfrow = c(2,1))
plot(arima.sim(list(order=c(0,0,1), ma=.9), n = 200) , ylab = "y", main = (expression(MA(1) ~~~ theta == +.9)))
plot(arima.sim(list(order=c(0,0,1), ma=-.9), n = 200) , ylab = "y", main = (expression(MA(1) ~~~ theta == -.9)))
Now we are going to fit a model using the simulated MA(1) data. Again, we use the auto.arima function to do so.
set.seed(2.718)
ma.data <- arima.sim(list(order=c(0,0,1), ma=.9), n = 200)
mod2 <- auto.arima(ma.data)
mod2
## Series: ma.data
## ARIMA(1,0,2) with zero mean
##
## Coefficients:
## ar1 ma1 ma2
## -0.8354 1.723 0.7496
## s.e. 0.2317 0.237 0.2105
##
## sigma^2 estimated as 1.143: log likelihood=-296.35
## AIC=600.71 AICc=600.91 BIC=613.9
The resulting model of fit has an autoregressive component of order 1, and moving average component of order 2, which is seen in ARIMA(1,0,2). The first number is the autoregressive component, the last one is the moving average component, and the middle number is the integrated component, which will be discussed later. We are given the following equation for the model after adding the coefficients: \(y_t= -0.8354y_{t-1} + 1.723\epsilon_{t-1} + 0.7496\epsilon_{t-2} +\epsilon_t\) Now we want to create a forecast for 1 day into the future and 10 days into the future. Like before, we use the forecast function on the model, and specify the number of days by changing the value of h.
# Now let's forecast using the model we just created
plot(forecast(mod2, h=1)) # 1 day into the future
forecast(mod2, h=1)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 201 0.3159447 -1.054301 1.686191 -1.779666 2.411555
plot(forecast(mod2, h=10)) # 10 days into the future
Our third model that we can create combines the previous two models we’ve looked at: the Autoregressive model, and the Moving Average model. It is denoted as ARMA with orders p and q, which are from the autoregressive and moving average models respectively. The model has the following equation: \[y_t= \phi_1 y_{t-1} + \phi_2 y_{t-2} +...+ \phi_p y_{t-p} +\theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} +...+ \theta_q \epsilon_{t-q} + \epsilon_t\] where: - the error term \(\epsilon_t\) is independent and identically distributed and normally distributed with a mean of 0 and variance of \(\sigma^2\)
-\(\theta_q \neq 0\),$ _p 0$ and \(\theta\) and \(\phi\) are all constants
If there is order zero of one of the model types , then it simply does not include that component in the model. For example, ARMA(0,1) is the same thing as a first order moving average model (MA(1)), and ARMA(1,0) is the same thing as a first order autoregressive model (AR(1)).
Here is an example of how to create an ARMA model of fit. In this case, both ma and ar are specified since both ar included.
########### Autoregressive Moving Average Models (ARMA)
# Let's fit a model using simulated MA(1) data
set.seed(1234)
arma.data <- arima.sim(list(order=c(1,0,1), ma=.9, ar = .9), n = 200)
mod3 <- auto.arima(arma.data)
mod3
## Series: arma.data
## ARIMA(4,0,4) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 ma4
## 1.1663 -0.5789 0.8841 -0.5949 0.7172 0.0696 -0.4784 -0.6420
## s.e. 0.3502 0.5705 0.3427 0.1475 0.3347 0.2136 0.2604 0.1482
## mean
## 1.989
## s.e. 0.372
##
## sigma^2 estimated as 0.9662: log likelihood=-278.04
## AIC=576.09 AICc=577.25 BIC=609.07
As you can see, there are four theta terms and four phi terms, meaning that p=4 and q=4. What’s different with this model is that the mean is no longer zero. We can now use this model to forecast into the future, just like in previous models. We are again going to look at the forecast one day into the future as well as 10 days into the future.
# Now let's forecast using the model we just created
plot(forecast(mod3, h=1)) # 1 day into the future
forecast(mod3, h=1)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 201 2.299482 1.039752 3.559213 0.3728907 4.226074
plot(forecast(mod3, h=10)) # 10 days into the future
Our final model type incorporates all of the components. It is like ARMA, except now we have the integrated factor. For this, we want a time series with constant variance and a mean of zero (also known as a stationary time series). If we don’t have this, we then take the difference between consecutive observations. Each time that the difference is taken is an order of integration. For example, for a first order (denoted d=1), we calculate \(y_2-y_1\), \(y_3-y_2\), \(y_4-y_3\)… where each of these values is called \(z_1\), \(z_2\), \(z_3\)…,respectively. The second order of integration, or d=2 would then be \(z_2-z_1\), \(z_3-z_2\)…
For an example of using the ARIMA model, we can use a real data set as opposed to a simulated dataset. We can use WWWusage, which shows the internet usage per minute (number of users connected to the Internet through a server every minute). We can plot the data, create the ARIMA model, and then create a forecast for 20 days out.
plot(WWWusage)
mod4 <- auto.arima(WWWusage)
mod4
## Series: WWWusage
## ARIMA(1,1,1)
##
## Coefficients:
## ar1 ma1
## 0.6504 0.5256
## s.e. 0.0842 0.0896
##
## sigma^2 estimated as 9.995: log likelihood=-254.15
## AIC=514.3 AICc=514.55 BIC=522.08
mycast <- forecast(mod4,h=20)
plot(mycast)
We can conduct another example, but this time it is pulling from a Yahoo finance data set. This data set shows the changing of stock prices over time.
############# Money time
# Pull data from Yahoo finance
getSymbols('TECHM.NS', from='2012-01-01', to='2015-01-01')
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
##
## This message is shown once per session and may be disabled by setting
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
##
## WARNING: There have been significant changes to Yahoo Finance data.
## Please see the Warning section of '?getSymbols.yahoo' for details.
##
## This message is shown once per session and may be disabled by setting
## options("getSymbols.yahoo.warning"=FALSE).
## Warning: TECHM.NS contains missing values. Some functions will not work if
## objects contain missing values in the middle of the series. Consider using
## na.omit(), na.approx(), na.fill(), etc to remove or replace them.
## [1] "TECHM.NS"
We are going to look at the closing prices for the stock over time, which are found in the 4th column of the data set. We can then create an ARIMA model that forecasts 10 days into the future. Like previous models we have done, we can also use transformations to try and improve the model. In this example, we used the log of the stock prices as well.
# Select the relevant close price series
stock_prices <- TECHM.NS[,4]
mod5 <- auto.arima(stock_prices)
mycast5 <- forecast(mod5, h=10)
plot(mycast5)
mod6 <- auto.arima(log(stock_prices))
mycast6 <- forecast(mod6, h=10)
plot(mycast6)
par(mfrow=c(2,1))
plot(mycast5)
plot(mycast6)
For some more examples, you can look to the AirPassengers, globtemp, jj, gas, and gold data sets.
data("AirPassengers")
library(astsa); data(globtemp)
##
## Attaching package: 'astsa'
## The following object is masked from 'package:forecast':
##
## gas
data(jj)
mod7 <- auto.arima(AirPassengers)
mycast7 <- forecast(mod7, h=10)
plot(mycast7)
forecast(mycast7,h=1)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1961 445.6349 430.8903 460.3794 423.085 468.1847
mod8 <- auto.arima(jj)
mycast8 <- forecast(mod8, h=10)
plot(mycast8)
forecast(mycast8,h=5)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 1981 Q1 17.78046 17.23158 18.32934 16.94102 18.61990
## 1981 Q2 16.25397 15.70173 16.80621 15.40939 17.09855
## 1981 Q3 17.60119 17.00261 18.19976 16.68575 18.51663
## 1981 Q4 13.19339 12.58713 13.79965 12.26620 14.12058
## 1982 Q1 19.36210 18.39633 20.32788 17.88507 20.83913
mod9 <- auto.arima(gas)
mycast9 <- forecast(mod9, h=50)
plot(mycast9)
forecast(mycast9,h=10)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2010.481 189.3976 178.4237 200.3715 172.6145 206.1807
## 2010.500 187.5015 171.5810 203.4219 163.1533 211.8496
## 2010.519 184.9345 164.8505 205.0184 154.2187 215.6502
## 2010.538 184.9345 160.5483 209.3207 147.6390 222.2299
## 2010.558 184.9345 156.8986 212.9703 142.0574 227.8116
## 2010.577 184.9345 153.6722 216.1967 137.1230 232.7460
## 2010.596 184.9345 150.7489 219.1200 132.6522 237.2167
## 2010.615 184.9345 148.0567 221.8123 128.5348 241.3342
## 2010.635 184.9345 145.5480 224.3209 124.6981 245.1709
## 2010.654 184.9345 143.1898 226.6791 121.0916 248.7774
mod10 <- auto.arima(gold)
mycast10 <- forecast(mod10, h=50)
plot(mycast10)
forecast(mycast9,h=10)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2010.481 189.3976 178.4237 200.3715 172.6145 206.1807
## 2010.500 187.5015 171.5810 203.4219 163.1533 211.8496
## 2010.519 184.9345 164.8505 205.0184 154.2187 215.6502
## 2010.538 184.9345 160.5483 209.3207 147.6390 222.2299
## 2010.558 184.9345 156.8986 212.9703 142.0574 227.8116
## 2010.577 184.9345 153.6722 216.1967 137.1230 232.7460
## 2010.596 184.9345 150.7489 219.1200 132.6522 237.2167
## 2010.615 184.9345 148.0567 221.8123 128.5348 241.3342
## 2010.635 184.9345 145.5480 224.3209 124.6981 245.1709
## 2010.654 184.9345 143.1898 226.6791 121.0916 248.7774