Today in class we practiced creating a few new types of time series models. The basic idea is that these models depend on previous observations. Sorry about all the warnings that are about to pop up. This will be ugly.
library(quantmod)
## Warning: package 'quantmod' was built under R version 3.3.3
## Loading required package: xts
## Warning: package 'xts' was built under R version 3.3.3
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.3.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: TTR
## Warning: package 'TTR' was built under R version 3.3.3
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tseries)
## Warning: package 'tseries' was built under R version 3.3.3
library(timeSeries)
## Warning: package 'timeSeries' was built under R version 3.3.3
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.3.3
##
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
##
## time<-
library(forecast)
## Warning: package 'forecast' was built under R version 3.3.3
library(xts)
An autoregressive model of order p is denoted AR(p), where p is the number of previous observations of y that the model relies on. Thus, we can say that \(y_t\) depends on \(y_{t-1}, \ldots, y_{t-p}\), and the general equation is \(y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \epsilon_t\).
We assume that our time series has no upward or downward trend and no seasonality, that our error terms are iid N(0,\(\sigma^2\)), and that \(y_t\) has a mean of 0 and constant variance.
e.g. AR(1): \(y_t = \phi y_{t-1} + \epsilon_t\). If \(\phi = 0.9\), then two consecutive measurements are positively correlated, and if \(\phi = -0.9\), then two consecutive measurements are negatively correlated.
par(mfrow=c(2,1))
plot(arima.sim(list(order=c(1,0,0),ar=.9) , n=200), ylab = "y", main = (expression(AR(1) ~~~ phi == +.9)))
abline(0,0)
plot(arima.sim(list(order=c(1,0,0),ar= -.9) , n=200), ylab = "y", main = (expression(AR(1) ~~~ phi == -.9)))
abline(0,0)
With that set of code, we just simulated two AR(1) series. Notice that the negatively correlated series (\(\phi = -0.9\)) jumps around a lot more, since consecutive points tend to flip sign.
Now, let’s simulate 200 observations from an AR(1) model with \(\phi = 0.9\) and fit a model to them.
set.seed(2115)
fakedata <- arima.sim(list(order=c(1,0,0),ar=.9) , n=200)
plot(fakedata)
mod1 <- auto.arima(fakedata)
mod1
## Series: fakedata
## ARIMA(1,0,0) with zero mean
##
## Coefficients:
## ar1
## 0.9266
## s.e. 0.0285
##
## sigma^2 estimated as 1.004: log likelihood=-284.66
## AIC=573.32 AICc=573.38 BIC=579.92
So we can write \(y_t = 0.9266\, y_{t-1} + \epsilon_t\), where the error term is iid N(0, 1.004).
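If we’d rather pull those numbers out programmatically than read them off the printout, the usual accessors for arima-style fits apply (a small aside of my own):
coef(mod1)     # the estimated ar1 coefficient, about 0.9266
mod1$sigma2    # the estimated sigma^2, about 1.004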
Now we can forecast one day into the future (h=1) and 10 days into the future (h=10) :
(mycast1 <- forecast(mod1, h=1))
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 201 -6.557273 -7.841365 -5.273181 -8.521123 -4.593423
plot(mycast1)
plot(forecast(mod1, h=10))
So we created a point prediction and two prediction intervals: one with 80% confidence and one with 95% confidence. The blue dot is the point prediction and the shaded blue region shows the prediction intervals. If we go 10 days into the future, you can see on the second graph the darker blue shading for the 80% prediction interval and the lighter blue for the 95% interval. There’s a huge amount of fluctuation, so our prediction intervals are rather wide.
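As a quick sanity check (my own addition, not from class): for an AR(1), the one-step-ahead forecast variance is just \(\hat\sigma^2\), so the 95% bounds should be the point forecast plus or minus about 1.96 standard deviations.
# one-step-ahead 95% interval by hand: point forecast +/- z * sigma
-6.557273 + c(-1, 1) * qnorm(0.975) * sqrt(1.004)
# gives roughly -8.52 and -4.59, matching the Lo 95 / Hi 95 columns above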
We say that in a moving average model, \(y_t\) depends on \(\epsilon_{t-1}, \ldots, \epsilon_{t-q}\). It is denoted MA(q), where q is the number of previous error terms that \(y_t\) depends on.
Thus our general equation is \(y_t = \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q} + \epsilon_t\) where \(\epsilon_t\) is iid N(0, \(\sigma^2\)).
e.g. for MA(1) \(y_t = \theta_1 \epsilon_{t-1} + \epsilon_t\)
Can we find the expected value of \(y_t\) given \(\epsilon_{t-1}\)? Yes: it’s \(\theta_1 \epsilon_{t-1}\), because the expected value of \(\epsilon_t\) is 0.
All of that means that the \(y_t\) values will hop around 0, but the conditional mean shifts as each new error term arrives: the \(\theta_1 \epsilon_{t-1}\) term changes over time, so the “average” is moving. That’s where the name comes from.
par(mfrow = c(2,1))
plot(arima.sim(list(order=c(0,0,1), ma=.9), n = 200) , ylab = "y", main = (expression(MA(1) ~~~ theta == +.9)))
plot(arima.sim(list(order=c(0,0,1), ma=-.9), n = 200) , ylab = "y", main = (expression(MA(1) ~~~ theta == -.9)))
Just for fun, we can compare the two simulated MA(1) paths above.
So now we can fit a model to the simulated MA(1) data:
set.seed(2.718)
ma.data <- arima.sim(list(order=c(0,0,1), ma=.9), n = 200)
mod2 <- auto.arima(ma.data)
mod2
## Series: ma.data
## ARIMA(1,0,2) with zero mean
##
## Coefficients:
## ar1 ma1 ma2
## -0.8354 1.723 0.7496
## s.e. 0.2317 0.237 0.2105
##
## sigma^2 estimated as 1.143: log likelihood=-296.35
## AIC=600.71 AICc=600.91 BIC=613.9
plot(forecast(mod2, h=1))
forecast(mod2, h=1)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 201 0.3159447 -1.054301 1.686191 -1.779666 2.411555
plot(forecast(mod2, h=10))
Again, we get the blue dot for the point prediction, dark blue shading for the 80% prediction interval, and lighter blue for the 95% prediction interval.
We can have an autoregressive moving average model (ARMA)! Thus \(y_t\) will depend on both \(y_{t-1}, \ldots, y_{t-p}\) and \(\epsilon_{t-1}, \ldots, \epsilon_{t-q}\).
Thus, we get a model that looks like \(y_t = \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q} + \epsilon_t\), where the error terms are iid N(0, \(\sigma^2\)) and the coefficients are nonzero constants.
Note: ARMA(0,1) = MA(1) and ARMA(1,0) = AR(1)
set.seed(1234)
arma.data <- arima.sim(list(order=c(1,0,1), ma=.9, ar = .9), n = 200)
mod3 <- auto.arima(arma.data)
mod3
## Series: arma.data
## ARIMA(4,0,4) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 ma4
## 1.1663 -0.5789 0.8841 -0.5949 0.7172 0.0696 -0.4784 -0.6420
## s.e. 0.3502 0.5705 0.3427 0.1475 0.3347 0.2136 0.2604 0.1482
## mean
## 1.989
## s.e. 0.372
##
## sigma^2 estimated as 0.9662: log likelihood=-278.04
## AIC=576.09 AICc=577.25 BIC=609.07
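Interestingly, ‘auto.arima’ landed on ARIMA(4,0,4) even though we simulated from ARMA(1,1). If we already know the true order, we can fit it directly with the ‘Arima’ function from the forecast package (a sketch of my own, not something we did in class):
# fit the order we actually simulated from and compare
mod3b <- Arima(arma.data, order = c(1, 0, 1))
mod3b    # the ar1 and ma1 estimates should land near the true values of 0.9
Comparing the AICc values of the two fits tells us whether auto.arima’s extra parameters actually earned their keep.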
What if our time series already has constant variance and mean 0? Then it is called a stationary time series, and we can model it directly. If these conditions are not met, we take differences between consecutive observations.
If d=1, we use first order differences. That means that \(y_2 - y_1\), \(y_3 - y_2\), etc. become \(z_1\), \(z_2\), etc., and we use those for the model. If these differences do not produce a stationary time series, we take differences of the z values.
If d=2, we use second order differences. That means we use \(z_2 - z_1\), \(z_3 - z_2\), etc. instead, and hopefully this second round of differencing produces a stationary time series.
Thus we end up with an ARIMA(p,d,q) model; the “integrated” part refers to the order of differencing, d. After differencing, our assumptions of mean 0 and constant variance should hold, and we model the differenced series with the AR(p) and MA(q) pieces. Note: the ‘adf.test’ function from the tseries package runs an augmented Dickey-Fuller test for stationarity, so we don’t have to judge it by eye.
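For example (a sketch of my own using ‘adf.test’ from tseries, which we loaded earlier; exact p-values depend on the seed): a random walk is not stationary, but its first differences are.
# ADF test: the null hypothesis is NON-stationarity,
# so a small p-value is evidence the series is stationary
set.seed(321)
rw <- cumsum(rnorm(200))    # a random walk, which is not stationary
adf.test(rw)                # large p-value: cannot reject non-stationarity
adf.test(diff(rw))          # first differences: small p-value, stationary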
The WWWusage data set records the number of users connected to the Internet through a server, measured once per minute (100 observations).
Notice in the code that we’re forecasting 20 time periods (minutes) out and we use ‘auto.arima’ to automatically create the ARIMA model.
plot(WWWusage)
mod4 <- auto.arima(WWWusage)
mod4
## Series: WWWusage
## ARIMA(1,1,1)
##
## Coefficients:
## ar1 ma1
## 0.6504 0.5256
## s.e. 0.0842 0.0896
##
## sigma^2 estimated as 9.995: log likelihood=-254.15
## AIC=514.3 AICc=514.55 BIC=522.08
mycast <- forecast(mod4,h=20)
plot(mycast)
Now, using Yahoo Finance data, we can look at daily stock prices (open, high, low, close) for the three years from 2012 to 2015.
getSymbols('TECHM.NS', from='2012-01-01', to='2015-01-01')
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
##
## This message is shown once per session and may be disabled by setting
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
##
## WARNING: There have been significant changes to Yahoo Finance data.
## Please see the Warning section of '?getSymbols.yahoo' for details.
##
## This message is shown once per session and may be disabled by setting
## options("getSymbols.yahoo.warning"=FALSE).
## Warning: TECHM.NS contains missing values. Some functions will not work if
## objects contain missing values in the middle of the series. Consider using
## na.omit(), na.approx(), na.fill(), etc to remove or replace them.
## [1] "TECHM.NS"
stock_prices <- na.omit(TECHM.NS[,4])   # column 4 is the closing price; drop the NAs flagged in the warning above
mod5 <- auto.arima(stock_prices)
mycast5 <- forecast(mod5, h=10)
plot(mycast5)
Note: drift = an overall upward or downward trend; in ARIMA output it means the differenced series has a nonzero mean.
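To see where drift comes from, here is a small simulation of my own (not from class): a random walk whose steps average +0.2, so the series trends upward.
set.seed(99)
drift.data <- cumsum(rnorm(200, mean = 0.2))   # steps average +0.2, giving an upward trend
auto.arima(drift.data)   # typically reports ARIMA(0,1,0) with drift near 0.2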
library(astsa)
## Warning: package 'astsa' was built under R version 3.3.3
##
## Attaching package: 'astsa'
## The following object is masked from 'package:forecast':
##
## gas
data(globtemp)
plot(globtemp)
This data set contains global mean land-ocean temperature deviations (relative to the 1951-1980 average), in degrees centigrade, for the years 1880 to 2015. Using ‘auto.arima’ gives us
tempmod<-auto.arima(globtemp)
tempmod
## Series: globtemp
## ARIMA(1,1,1) with drift
##
## Coefficients:
## ar1 ma1 drift
## 0.3549 -0.7663 0.0072
## s.e. 0.1314 0.0874 0.0032
##
## sigma^2 estimated as 0.01011: log likelihood=119.88
## AIC=-231.76 AICc=-231.46 BIC=-220.14
From the output we know that, because d = 1, the model is fit to the first differences \(z_t = y_t - y_{t-1}\), so it looks something like \[z_t = (0.3549)z_{t-1} + (-0.7663)\epsilon_{t-1} + \epsilon_t\] plus a drift of about 0.0072 degrees per year.
To forecast five years we can use the ‘forecast’ function like this:
letshaveaforecast<-forecast(tempmod,h=5)
plot(letshaveaforecast)
We put h=5 in the input to show that we were forecasting the next 5 periods. The output plot shows us the point predictions using a blue line. The shaded regions represent the 80% prediction interval (in the darker shade) and the 95% prediction interval (in the lighter shade).
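If we want the numbers rather than the picture, the forecast object stores them directly (these component names come from the forecast package):
letshaveaforecast$mean    # the point predictions for the next 5 years
letshaveaforecast$lower   # lower bounds: one column for 80%, one for 95%
letshaveaforecast$upper   # upper bounds, same layout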