ARIMA
Today in class we learned about ARIMA AR stands for autoregressive MA stands for moving average and I stands for integrated
This model is used to forecast for the future. Throughout this Learning Log, I will adress more statistical terms and functions. #
##install.packages(c("quantmod", "tseries", "timeSeries", "forecast", "xts"))
library(quantmod)
## Warning: package 'quantmod' was built under R version 3.4.4
## Loading required package: xts
## Warning: package 'xts' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.4
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: TTR
## Warning: package 'TTR' was built under R version 3.4.3
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(timeSeries)
## Warning: package 'timeSeries' was built under R version 3.4.4
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.4.4
##
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
##
## time<-
library(forecast)
## Warning: package 'forecast' was built under R version 3.4.4
library(xts)
Autoregressive Models
par(mfrow=c(2,1))
plot(arima.sim(list(order=c(1,0,0),ar=.9) , n=200), ylab = "y", main = (expression(AR(1) ~~~ phi == +.9)))
abline(0,0)
plot(arima.sim(list(order=c(1,0,0),ar= -.9) , n=200), ylab = "y", main = (expression(AR(1) ~~~ phi == -.9)))
abline(0,0)
Right now, we are simulating random points.
set.seed(2115)
fakedata <- arima.sim(list(order=c(1,0,0),ar=.9) , n=200)
plot(fakedata)
mod1 <- auto.arima(fakedata)
mod1
## Series: fakedata
## ARIMA(1,0,0) with zero mean
##
## Coefficients:
## ar1
## 0.9266
## s.e. 0.0285
##
## sigma^2 estimated as 1.004: log likelihood=-284.66
## AIC=573.32 AICc=573.38 BIC=579.92
We see the output from the data It tells us that p is equal to .1 the mean is 0
Then we see coefficients - ar1 parameter is estimated to be .9266 the standard error is .0285
The model it is estimating would be
\(y_{t} = .966*(\)y_{t-1}$) + \(y_{b}\) end \(e_{t}\) ~ N(0,1.004)
P is the number of previous y observations that we depend on
mycast1 <- forecast(mod1, h=1) # includes prediction intervals
plot(mycast1)
plot(forecast(mod1, h=10)) # 10 days into the future
Lower and upper bounds for the prediction intervals
Moving Average Models
par(mfrow = c(2,1))
plot(arima.sim(list(order=c(0,0,1), ma=.9), n = 200) , ylab = "y", main = (expression(MA(1) ~~~ theta == +.9)))
plot(arima.sim(list(order=c(0,0,1), ma=-.9), n = 200) , ylab = "y", main = (expression(MA(1) ~~~ theta == -.9)))
set.seed(2.718)
ma.data <- arima.sim(list(order=c(0,0,1), ma=.9), n = 200)
mod2 <- auto.arima(ma.data)
mod2
## Series: ma.data
## ARIMA(1,0,2) with zero mean
##
## Coefficients:
## ar1 ma1 ma2
## -0.8354 1.723 0.7496
## s.e. 0.2317 0.237 0.2105
##
## sigma^2 estimated as 1.143: log likelihood=-296.35
## AIC=600.71 AICc=600.91 BIC=613.9
The seed number is just random (makes sure everyone in the class gets the same random number) We are simulating data here. q is our moving average number, so here we are setting q = 1.
Simulated data does not necessarily mean that the data will look the same as the data it was simulated from.
We can see from the output, we have an ARIMA model with p = 1 and q = 2… This is something that can happen.
plot(forecast(mod2, h=1)) # 1 day into the future
forecast(mod2, h=1)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 201 0.3159447 -1.054301 1.686191 -1.779666 2.411555
plot(forecast(mod2, h=10)) # 10 days into the future
The blue dot is for the point prediction. The blue around it is the prediction interval. The first graph is 1 day into the future and the second one is ten days into the future.
Autoregressive Moving Average Models (ARMA)
set.seed(1234)
arma.data <- arima.sim(list(order=c(1,0,1), ma=.9, ar = .9), n = 200)
mod3 <- auto.arima(arma.data)
mod3
## Series: arma.data
## ARIMA(4,0,4) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 ma4
## 1.1663 -0.5789 0.8841 -0.5949 0.7172 0.0696 -0.4784 -0.6420
## s.e. 0.3502 0.5705 0.3427 0.1475 0.3347 0.2136 0.2604 0.1482
## mean
## 1.989
## s.e. 0.372
##
## sigma^2 estimated as 0.9662: log likelihood=-278.04
## AIC=576.09 AICc=577.25 BIC=609.07
Simulate some data p = 1 q= 1
Simulated 200 observations from this
Very complicated output
Sample size of 200 may not be big enough
plot(forecast(mod3, h=1)) # 1 day into the future
forecast(mod3, h=1)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 201 2.299482 1.039752 3.559213 0.3728907 4.226074
plot(forecast(mod3, h=10)) # 10 days into the future
Use the forecase command and say how far out we want to forecast.
Autoregressive Integrated Moving Average Models (ARIMA)
plot(WWWusage)
mod4 <- auto.arima(WWWusage)
mod4
## Series: WWWusage
## ARIMA(1,1,1)
##
## Coefficients:
## ar1 ma1
## 0.6504 0.5256
## s.e. 0.0842 0.0896
##
## sigma^2 estimated as 9.995: log likelihood=-254.15
## AIC=514.3 AICc=514.55 BIC=522.08
mycast <- forecast(mod4,h=20)
plot(mycast)
Time series of the number of users connected to the internet every minute Basically number of users over time
auto.arima(time series) is how you fit the model For this data, we get ARIMA(1,1,1)
In this example, we are going 20 time periods out.
Money time
getSymbols('TECHM.NS', from='2012-01-01', to='2015-01-01')
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
##
## This message is shown once per session and may be disabled by setting
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
##
## WARNING: There have been significant changes to Yahoo Finance data.
## Please see the Warning section of '?getSymbols.yahoo' for details.
##
## This message is shown once per session and may be disabled by setting
## options("getSymbols.yahoo.warning"=FALSE).
## Warning: TECHM.NS contains missing values. Some functions will not work if
## objects contain missing values in the middle of the series. Consider using
## na.omit(), na.approx(), na.fill(), etc to remove or replace them.
## [1] "TECHM.NS"
stock_prices <- TECHM.NS[,4]
mod5 <- auto.arima(stock_prices)
mycast5 <- forecast(mod5, h=10)
plot(mycast5)
mod6 <- auto.arima(log(stock_prices))
mycast6 <- forecast(mod6, h=10)
plot(mycast6)
par(mfrow=c(2,1))
plot(mycast5)
plot(mycast6)
We see an upward trend which does make sense.
Here in this example, we are forecasting 10 days out.
The blue is the forecast.
data("AirPassengers")
data(globtemp)
## Warning in data(globtemp): data set 'globtemp' not found
data(jj)
## Warning in data(jj): data set 'jj' not found
I am going to work with these libraries above. I am going to use this data to create a model…
##myMod <- arima(globtemp, c = (1,0,0))
##myMod
Now, I am going to use my model to forecast into the future.
##mycast1 <- forecast(myMod, h=1) # includes prediction intervals
##plot(mycast1)
##plot(forecast(myMod, h=10)) # 10 days into the future
We can see the blue on the plot are the forecasted data points. The first one is for the next day and the next plot is the forecasted data for the next 10 days.
##mod2 <- arima(globtemp , order = c(0,0,1))
##mod2
Here is my moving average model.
##plot(forecast(mod2, h=1)) # 1 day into the future
##forecast(mod2, h=1)
##plot(forecast(mod2, h=10)) # 10 days into the future
##mod2 <- arima(globtemp , order = c(1,1,1))
##mod2
Here is my moving average model.
##plot(forecast(mod2, h=1)) # 1 day into the future
##forecast(mod2, h=1)
##plot(forecast(mod2, h=10)) # 10 days into the future