Today in class we learned about autoregressive integrated moving average (ARIMA) models. These models have three main parts, which are described in the following three sections. Each of the three components can be combined to create different types of time series models.
While multiple different models can be formed, there are certain assumptions that apply to all ARIMA models. First, all of the error terms, \({\epsilon_t}\), are normally distributed with a mean of zero and variance of \({\sigma^2}\). Also, all \({\phi_t}\) and \({\theta_t}\) are constants. In addition, the time series has a mean of zero and a constant variance, which means that the time series is “stationary”. Lastly, we do not want there to be seasonality in our model.
First, we learned about autoregressive models, which is what the “AR” stands for in “ARIMA”. As the name implies, an autogreressive model is a time series model that regresses upon itself. In other words, the forecast produced from an autoregressive model is based on past observations. More specifically, for a model of order p, the time series at time t depends on the last p observations. In general, \({\phi_t}\) is used for the general coefficients in this type of model.
Second, we learned about moving average models, which is what the “MA” in “ARIMA” stands for. In this type of model, our prediction, \(y_t\), is dependent on the residuals from previous time periods. Specifically, a moving average model of order q is based on the residuals from the last q time periods. In general, \({\theta_t}\) is used for the general coefficients in this type of model.
Note that for this model the long term mean for \(y_t\) is zero. However, if we are given some value for \({\epsilon_{t-x}}\), which is a previous residual, the expectation of \(y_t\) is dependent on that value.
Finally, we talked about the integrated component of an ARIMA model, which accounts for the “I” in “ARIMA”. As we mentioned, we need our time series to be stationary. However, if our time series is not stationary we can take the differences of consecutive observations and create a better model. For first order differences (d=1) we take the differences of consecutive observations from our data. If those differences are stationary we will use those to create our time series. However, if those do not have a mean of zero and constant variance, we may have to use second order differences (d=2) where we take the differences of the first order differences. Overall though, we want to use the smallest order of differences possible.
Now we will look at the globtemp data in the astasa library to create an ARIMA model.
Before we begin, we need to install the packages that will allow us to use the auto.arima function. We use the coding below to do so.
library(quantmod)
## Warning: package 'quantmod' was built under R version 3.4.4
## Loading required package: xts
## Warning: package 'xts' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: TTR
## Warning: package 'TTR' was built under R version 3.4.3
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tseries)
## Warning: package 'tseries' was built under R version 3.4.4
library(timeSeries)
## Warning: package 'timeSeries' was built under R version 3.4.2
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.4.3
## Warning in as.POSIXlt.POSIXct(Sys.time()): unknown timezone 'zone/tz/2018c.
## 1.0/zoneinfo/America/Chicago'
##
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
##
## time<-
library(forecast)
## Warning: package 'forecast' was built under R version 3.4.4
library(xts)
Now, let’s call and attach our data. As mentioned, we will use the globtemp data in the astasa library.
library(astsa)
## Warning: package 'astsa' was built under R version 3.4.3
##
## Attaching package: 'astsa'
## The following object is masked from 'package:forecast':
##
## gas
data(globtemp)
To get a better understanding of our data, let’s plot the data by using the plot command and inputting the name of our data.
plot(globtemp)
From the plot we see that the temperature fluctuates over time, but overall it looks like there’s an upward trend in our data, especially since the year 1965.
Now that we know what our data looks like, let’s make an ARIMA model. We can use the auto.arima command to create an ARIMA model from the data we input. R automatically creates a model with the best values for p, d, and q.
mymod <- auto.arima(globtemp)
mymod
## Series: globtemp
## ARIMA(1,1,1) with drift
##
## Coefficients:
## ar1 ma1 drift
## 0.3549 -0.7663 0.0072
## s.e. 0.1314 0.0874 0.0032
##
## sigma^2 estimated as 0.01011: log likelihood=119.88
## AIC=-231.76 AICc=-231.46 BIC=-220.14
The output tells us that our autoregression aspect of our model is of the first order. This means that our time series at time t is dependent on the most recent previous observation. Also, the integrated portion of our model is of the first order. This means that our model depends on the difference of previous observations to get a time series with mean zero and constant variance. Lastly, the moving average aspect is also of the first order, which means that our model depnds on the residual from the most recent previous time period.
Our output also tells us the coefficients for our model. We see that our coefficient for our past observation is 0.3549. Since this value is positive, we would expect consecutive observations to be positively correlated, all else held constant. Also, our coefficient for our past residual is -0.7663. Therefore we would expect our observation to have an inverse relationship with the past residual, all else held constant.
Lastly, our output lists the “drift” of our model, which is the overall trend. In this case it is an upward slope of 0.0072 degrees per year.
Now, let’s use our model to create forecasts. First, let’s forecast the temperature for one year in the future. We use the forecast command, inputting the name of our model and the number of years in advance that we want to forecast. Since we want to predict for one year in the future, we use h=1.
myfcast1 <- forecast(mymod, h=1)
myfcast1
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 0.8031838 0.6743294 0.9320383 0.606118 1.00025
The output first gives us a point estimate for the temperature one year in the future. Then it gives us the upper and lower bounds for an 80% prediction interval and the uppwer and lower bounds for a 95% prediciton interval.
From our specific output, we predict that the temperature for one year in advance will be 0.8031838 degrees. Also, we are 80% confident that the temperature in one year will be between 0.6743294 degrees and 0.9320383 degrees. Finally, we are 95% confident that the temerature in one year will be between 0.606118 degrees and 1.00025 degrees.
Now we can plot our forecast to see how it compares ot the time series. Use the plot command, inputting the name of our forecast.
plot(myfcast1)
Our plot illustrates the time series with the balck line, and it plots our point prediction by the blue dot.
Now, let’s create a forecast for 10 years in the future. We will repeat the steps that we used to create our last forecast, but now we will use h=10 instead of h=1.
myfcast10 <- forecast(mymod, h=10)
myfcast10
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 0.8031838 0.6743294 0.9320383 0.6061180 1.000250
## 2017 0.7841177 0.6346035 0.9336320 0.5554555 1.012780
## 2018 0.7819961 0.6219774 0.9420147 0.5372687 1.026723
## 2019 0.7858872 0.6181356 0.9536388 0.5293333 1.042441
## 2020 0.7919120 0.6174348 0.9663893 0.5250721 1.058752
## 2021 0.7986940 0.6179620 0.9794260 0.5222882 1.075100
## 2022 0.8057447 0.6190423 0.9924470 0.5202080 1.091281
## 2023 0.8128907 0.6204288 1.0053525 0.5185456 1.107236
## 2024 0.8200705 0.6220253 1.0181156 0.5171866 1.122954
## 2025 0.8272623 0.6237901 1.0307345 0.5160785 1.138446
Our output gives us the point estimates and lower and upper bounds to 80% and 95% confidence intervals for the next 10 years. These values can be interpretted in the same way that they were previously.
Finally, let’s plot our 10-year forecast.
plot(myfcast10)
On the plot, the blue line represents the point predictions for the next 10 years. The dark blue shaded area represents the 80% confidence interval, and the light blue area represents the 95% confidence interval. We see that the areas of the confidence intervals get wider as the year increases; this is becasue it is more challenging to make predictions further into the future.