This learning log focuses on forecasting. ARIMA stands for autoregressive integrated moving average, and ARIMA models are what will be used to forecast.

Autoregressive Model

We use an autoregressive model when we have reason to think that the previous points of a time series have an effect on the next point. It is an attempt to fit a linear model to the next point using the previous ones.
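
As a minimal sketch of that idea (not part of the original analysis), the code below simulates an AR(1) series with base R's arima.sim() and then recovers the coefficient with an ordinary lm() regression of each point on the one before it. The value phi = 0.7, the seed, and the series length are assumptions chosen only for illustration.

```r
# Illustrative only: phi, the seed, and n are assumed values, not taken from the log's data.
set.seed(42)
phi <- 0.7

# Simulate an AR(1) process: y[t] = phi * y[t-1] + e[t]
y <- as.numeric(arima.sim(list(ar = phi), n = 2000))
n <- length(y)

# Pair each point with the one before it
current  <- y[-1]
previous <- y[-n]

# Fit a linear model to the next point using the previous one
fit <- lm(current ~ previous)

# The slope is the estimate of phi
phi_hat <- unname(coef(fit)["previous"])
print(phi_hat)
```

The estimated slope should land close to the phi used in the simulation, which is the sense in which an autoregressive model is "a linear model on the previous points".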

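We can fit this kind of model using the auto.arima() function from the forecast package on our time series data. auto.arima() chooses the ARIMA order automatically, and here it selects a model with one autoregressive term.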

library(forecast)  # provides auto.arima() and forecast()
(arima1 <- auto.arima(TimeseriesData))
## Series: TimeseriesData 
## ARIMA(1,1,0) 
## 
## Coefficients:
##          ar1
##       0.2359
## s.e.  0.0243
## 
## sigma^2 estimated as 33161:  log likelihood=-10974.67
## AIC=21953.35   AICc=21953.35   BIC=21964.17

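The ar1 coefficient explains how the linear model uses the previous value in addition to the error term. Because the selected model is ARIMA(1,1,0), the series is differenced once before the autoregressive term is applied, so the coefficient of 0.2359 describes a positive correlation between each change in the series and the change before it.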
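
Written out, and assuming the usual parameterization with no drift term (none appears in the output above), the fitted ARIMA(1,1,0) model corresponds to the equation below. The symbols are notation introduced here: y_t is the value of TimeseriesData at time t and ε_t is the error term.

```latex
y_t - y_{t-1} = 0.2359\,(y_{t-1} - y_{t-2}) + \varepsilon_t,
\qquad \operatorname{Var}(\varepsilon_t) = \sigma^2 \approx 33161
```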

Moving Average Model

The moving average method models the next point as a linear combination of one or more previous error terms (residuals).

We can simulate data from this kind of model using the arima.sim() function, with an argument that says we want a moving average. One of the key pieces in the code is the ma= portion, which specifies the theta value, the coefficient on the previous error term in the moving average model. I am simulating time series data here, once with theta = 0.9 and once with theta = 0.1.

plot(arima.sim(list(order=c(0,0,1), ma=.9), n = 200), main = "Simulated MA(1) series, theta = 0.9", ylab = "Value")

plot(arima.sim(list(order=c(0,0,1), ma=.1), n = 200), main = "Simulated MA(1) series, theta = 0.1", ylab = "Value")
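
To make the role of theta more concrete, here is a rough sketch that builds an MA(1) series by hand from random errors instead of calling arima.sim(), and compares its lag-1 autocorrelation with the textbook value theta / (1 + theta^2). The seed, the series length, and the variable names are assumptions made for illustration.

```r
# Illustrative only: theta, the seed, and n are assumed values.
set.seed(1)
theta <- 0.9
n <- 5000

# White-noise errors, with one extra value so the first point has a "previous" error
e <- rnorm(n + 1)

# MA(1): y[t] = e[t] + theta * e[t-1]
y <- e[-1] + theta * e[-(n + 1)]

# Sample lag-1 autocorrelation versus the theoretical value theta / (1 + theta^2)
acf_lag1 <- acf(y, lag.max = 1, plot = FALSE)$acf[2]
theory   <- theta / (1 + theta^2)
print(c(sample = acf_lag1, theory = theory))
```

With theta = 0.9 neighbouring points share most of an error term and are noticeably correlated, while with theta = 0.1 the same formula gives a lag-1 autocorrelation of only about 0.1, so the series is close to plain noise.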

ARIMA Forecasting

The full ARIMA model combines the moving average and autoregressive parts with an additional "difference" term. The differencing is there to make the series stationary, so that its mean (and ideally its variance) stays roughly constant over time.
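
A small sketch of what the difference term does, using only base R: a trending series is built from stationary increments, diff() recovers those increments, and diffinv() puts the original series back together. The drift of 0.5, the seed, and the length are assumed values for illustration, not properties of TimeseriesData.

```r
# Illustrative only: the drift, seed, and length are assumed values.
set.seed(7)
steps <- rnorm(500, mean = 0.5)   # stationary increments
y <- cumsum(steps)                # trending series (random walk with drift)

# First difference: d[t] = y[t] - y[t-1]
d <- diff(y)

# The differences are the original increments (all but the first one)
print(all.equal(d, steps[-1]))

# diffinv() undoes the differencing when given the starting value
y_back <- diffinv(d, xi = y[1])
print(all.equal(y_back, y))
```

This is the "I" in ARIMA: the autoregressive and moving average parts are fitted to the differenced series, and the forecasts are integrated back to the original scale.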

It is super simple to implement forecasting with a complete model. The code below shows how to project using forecast(), whose arguments are the model and h=, the number of points you wish to forecast.

forecast(arima1,h=12)
##      Point Forecast      Lo 80      Hi 80      Lo 95      Hi 95
## 1659      -336.5657  -569.9383 -103.19316  -693.4783   20.34681
## 1660      -416.1958  -787.2095  -45.18211  -983.6122  151.22066
## 1661      -434.9801  -912.9962   43.03592 -1166.0426  296.08233
## 1662      -439.4113 -1006.1595  127.33702 -1306.1779  427.35543
## 1663      -440.4565 -1084.1556  203.24250 -1424.9093  543.99618
## 1664      -440.7031 -1153.1618  271.75557 -1530.3146  648.90841
## 1665      -440.7613 -1215.9199  334.39732 -1626.2640  744.74149
## 1666      -440.7750 -1273.9317  392.38173 -1714.9783  833.42825
## 1667      -440.7782 -1328.1512  446.59473 -1797.8981  916.34160
## 1668      -440.7790 -1379.2414  497.68343 -1876.0334  994.47544
## 1669      -440.7792 -1427.6899  546.13155 -1950.1289 1068.57053
## 1670      -440.7792 -1473.8687  592.31024 -2020.7532 1139.19477

The output from this chunk is super easy to understand. Each row has a point forecast (the projected mean) along with the lower and upper bounds of the 80% and 95% prediction intervals. Let's look at this differently with a plot that includes the estimates and intervals.

plot(forecast(arima1,h=20))

Something is obviously going wrong here, but you can still see the range the forecast provides: the dark blue line is the projection, and the prediction intervals are shown in lighter shades of blue.
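
One way to sanity-check the forecast table is to compare it with the fitted coefficients. The sketch below copies a few numbers from the forecast(arima1,h=12) output and the model summary, then checks two things: that each step-to-step change in the point forecast shrinks by roughly the ar1 value, and that the width of the first interval matches sqrt(sigma^2). It assumes the intervals are based on normal quantiles, which the numbers appear consistent with.

```r
# Values copied from the forecast table and model summary in this log
point  <- c(-336.5657, -416.1958, -434.9801, -439.4113, -440.4565)
lo80   <- -569.9383
hi80   <- -103.19316
lo95   <- -693.4783
hi95   <- 20.34681
ar1    <- 0.2359
sigma2 <- 33161

# Changes between consecutive point forecasts, and how fast they shrink
steps  <- diff(point)
ratios <- steps[-1] / steps[-length(steps)]
print(round(ratios, 4))

# Interval half-width divided by the normal quantile gives the one-step standard error
# (assumes normal-based intervals)
se80 <- (hi80 - lo80) / 2 / qnorm(0.90)
se95 <- (hi95 - lo95) / 2 / qnorm(0.975)
print(c(se80 = se80, se95 = se95, sqrt_sigma2 = sqrt(sigma2)))
```

Because ar1 is less than 1, the changes die out geometrically and the point forecast levels off at a constant, which is the expected behaviour of an ARIMA(1,1,0) model without drift. A flat projection inside steadily widening intervals may therefore be part of why the plot looks off.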