We’ll use an example dataset of the number of births per month in New York City from January 1946 to December 1959. (Available at: http://robjhyndman.com/tsdldata/data/nybirths.dat)
# Loading Libraries
library(TTR)
# Loading Data
births <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
# Convert to Time Series
birthstimeseries <- ts(births, frequency=12, start=c(1946,1))
# Display Output
birthstimeseries
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1946 26.663 23.598 26.931 24.740 25.806 24.364 24.477 23.901 23.175 23.227 21.672 21.870
## 1947 21.439 21.089 23.709 21.669 21.752 20.761 23.479 23.824 23.105 23.110 21.759 22.073
## 1948 21.937 20.035 23.590 21.672 22.222 22.123 23.950 23.504 22.238 23.142 21.059 21.573
## 1949 21.548 20.000 22.424 20.615 21.761 22.874 24.104 23.748 23.262 22.907 21.519 22.025
## 1950 22.604 20.894 24.677 23.673 25.320 23.583 24.671 24.454 24.122 24.252 22.084 22.991
## 1951 23.287 23.049 25.076 24.037 24.430 24.667 26.451 25.618 25.014 25.110 22.964 23.981
## 1952 23.798 22.270 24.775 22.646 23.988 24.737 26.276 25.816 25.210 25.199 23.162 24.707
## 1953 24.364 22.644 25.565 24.062 25.431 24.635 27.009 26.606 26.268 26.462 25.246 25.180
## 1954 24.657 23.304 26.982 26.199 27.210 26.122 26.706 26.878 26.152 26.379 24.712 25.688
## 1955 24.990 24.239 26.721 23.475 24.767 26.219 28.361 28.599 27.914 27.784 25.693 26.881
## 1956 26.217 24.218 27.914 26.975 28.527 27.139 28.982 28.169 28.056 29.136 26.291 26.987
## 1957 26.589 24.848 27.543 26.896 28.878 27.390 28.065 28.141 29.048 28.484 26.634 27.735
## 1958 27.132 24.924 28.963 26.589 27.931 28.009 29.229 28.759 28.405 27.945 25.912 26.619
## 1959 26.076 25.286 27.660 25.951 26.398 25.565 28.865 30.000 29.261 29.012 26.992 27.897
# Get a Sense of the Data with a Plot
plot(birthstimeseries)
What different components are there in this time series?
# Creating Decomposed Dataset
birthstimeseriescomponents <- decompose(birthstimeseries)
# Looking at the Different Components
plot(birthstimeseriescomponents)
# What if we Want to Know What Month Most Babies Are Born in?
round(birthstimeseriescomponents$seasonal, digits = 2)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1946 -0.68 -2.08 0.86 -0.80 0.25 -0.15 1.46 1.16 0.69 0.78 -1.11 -0.38
## 1947 -0.68 -2.08 0.86 -0.80 0.25 -0.15 1.46 1.16 0.69 0.78 -1.11 -0.38
## 1948 -0.68 -2.08 0.86 -0.80 0.25 -0.15 1.46 1.16 0.69 0.78 -1.11 -0.38
## 1949 -0.68 -2.08 0.86 -0.80 0.25 -0.15 1.46 1.16 0.69 0.78 -1.11 -0.38
## 1950 -0.68 -2.08 0.86 -0.80 0.25 -0.15 1.46 1.16 0.69 0.78 -1.11 -0.38
## 1951 -0.68 -2.08 0.86 -0.80 0.25 -0.15 1.46 1.16 0.69 0.78 -1.11 -0.38
## 1952 -0.68 -2.08 0.86 -0.80 0.25 -0.15 1.46 1.16 0.69 0.78 -1.11 -0.38
## 1953 -0.68 -2.08 0.86 -0.80 0.25 -0.15 1.46 1.16 0.69 0.78 -1.11 -0.38
## 1954 -0.68 -2.08 0.86 -0.80 0.25 -0.15 1.46 1.16 0.69 0.78 -1.11 -0.38
## 1955 -0.68 -2.08 0.86 -0.80 0.25 -0.15 1.46 1.16 0.69 0.78 -1.11 -0.38
## 1956 -0.68 -2.08 0.86 -0.80 0.25 -0.15 1.46 1.16 0.69 0.78 -1.11 -0.38
## 1957 -0.68 -2.08 0.86 -0.80 0.25 -0.15 1.46 1.16 0.69 0.78 -1.11 -0.38
## 1958 -0.68 -2.08 0.86 -0.80 0.25 -0.15 1.46 1.16 0.69 0.78 -1.11 -0.38
## 1959 -0.68 -2.08 0.86 -0.80 0.25 -0.15 1.46 1.16 0.69 0.78 -1.11 -0.38
Looks like July (although the rest of late summer/early fall is rather high as well)
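To confirm that programmatically, note that the seasonal figure repeats every 12 months, so we only need to look at one year of it. This snippet is a small addition to the original code and should return “July”, matching the table above:
# Finding the Month with the Largest Seasonal Effect
month.name[which.max(birthstimeseriescomponents$seasonal[1:12])]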
The first method we’re going to look at is the autoregressive model, or \(AR(p)\)
An \(AR(p)\) model estimates the current value of the series from its previous lags
\(AR(1)\) uses just the previous lag, \(AR(2)\) uses the previous two lags, and so on
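In equation form, an \(AR(p)\) model estimates the coefficients \(\phi_i\) in
\[Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t\]
where the \(Y_{t-i}\) are the previous values of the series and \(\varepsilon_t\) is white noise.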
Let’s try forecasting with just the first lag (\(AR(1)\))
# Loading Libraries
library(forecast)
# Fitting AR(1) to Model
fit <- Arima(birthstimeseries,
order=c(1,0,0),
seasonal = c(0, 0, 0))
# Original Plot in Red, Fitted in Blue
plot(fit$x,col="red")
lines(fitted(fit),col="blue")
# Absolute Sum of Residuals
sum(abs(fit$residuals))
## [1] 191.1206
Not terrible: we’re overestimating at the beginning and underestimating at the end. Also, if you look closely, our fitted series keeps increasing where the actual series turns down, so the prediction gaps are larger than they appear…
Let’s try an \(AR(2)\) model
# Fitting AR(2) to Model
fit <- Arima(birthstimeseries,
order=c(2,0,0),
seasonal = c(0, 0, 0)
)
# Original Plot in Red, Fitted in Blue
plot(fit$x,col="red")
lines(fitted(fit),col="blue")
# Absolute Sum of Residuals
sum(abs(fit$residuals))
## [1] 164.1701
Looks like our error went down. In fact, in-sample error will keep decreasing as more parameters are added to the model. We’ll talk about how to avoid overfitting in a later section.
The autoregressive model applies coefficients to the previous values in order to predict the next value. The moving average (\(MA(q)\)) model instead applies coefficients to the difference between the previous value and the mean of the series. For simplicity’s sake, we’ll define the residual as the difference between the observed value and the mean of the series
Like \(AR(p)\), \(MA(q)\) can be applied to just the previous residual, the past two, past three, etc.
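In equation form, an \(MA(q)\) model estimates the coefficients \(\theta_i\) in
\[Y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}\]
where \(\mu\) is the mean of the series and the \(\varepsilon_{t-i}\) are the previous residuals.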
Let’s see how well the \(MA(1)\) fits the time series
# Fitting MA(1) to Model
fit <- Arima(birthstimeseries,
order=c(0,0,1),
seasonal = c(0, 0, 0)
)
# Original Plot in Red, Fitted in Blue
plot(fit$x,col="red")
lines(fitted(fit),col="blue")
# Absolute Sum of Residuals
sum(abs(fit$residuals))
## [1] 256.213
Wow, that was a much higher error compared to the \(AR(1)\) and \(AR(2)\). Let’s see if the \(MA(2)\) has any better luck…
# Fitting MA(2) to Model
fit <- Arima(birthstimeseries,
order=c(0,0,2),
seasonal = c(0, 0, 0)
)
# Original Plot in Red, Fitted in Blue
plot(fit$x,col="red")
lines(fitted(fit),col="blue")
# Absolute Sum of Residuals
sum(abs(fit$residuals))
## [1] 207.5665
Better, but still not nearly as good as the \(AR(2)\)
You may be thinking, “But wait, if \(MA(q)\) applies coefficients to the residuals from the mean, and our trend increases, won’t that warp our predictions?”
Exactly! This is where differencing (or integration) comes into play
The above methods (AR and MA) only work if the time series is stationary. (Weakly) stationary series are defined as having a constant mean, a constant variance, and an autocovariance that depends only on the lag.
You can test if a time series is stationary or not using an Augmented Dickey-Fuller (http://www.statisticshowto.com/adf-augmented-dickey-fuller-test/) or KPSS (http://www.statisticshowto.com/kpss-test/) test.
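As a quick sketch (not run in the original talk), both tests are available in the tseries package; note that their null hypotheses point in opposite directions:
# Checking Stationarity of the Raw Series (sketch, output not shown)
library(tseries)
adf.test(birthstimeseries)   # ADF: null hypothesis is a unit root (non-stationary)
kpss.test(birthstimeseries)  # KPSS: null hypothesis is stationarity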
Let’s try differencing our model:
# Loading Libraries
library(tseries)
# Using the KPSS Test to Check How Many Differences Are Needed (ndiffs comes from the forecast package)
ndiffs(birthstimeseries, test = "kpss")
## [1] 1
# Applying First Difference to Time Series
birthstimeseries1 <- diff(birthstimeseries, differences=1)
# Plotting Time Series
plot(birthstimeseries1)
Yes, this is much more stationary
Each of the things we learned (AR, MA, Integrating) can also be done seasonally
A seasonal AR(p) model applies a coefficient to last period’s value in the same season. For example, if it was March 2008, a seasonal AR(1) model would apply some coefficient to March 2007 in order to predict March 2008
Likewise, a seasonal MA(Q) model applies a coefficient to last period’s residual for the same season.
It may also be necessary to remove seasonal non-stationarity from a time series. An example would be if Black Friday sales get bigger and bigger every year, but sales in other months stay the same; in that case, we would have seasonal non-stationarity. To check this, use nsdiffs instead of ndiffs.
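For this series, that check looks like the following (a small addition to the original code; output not shown):
# Checking Whether Seasonal Differencing is Needed
nsdiffs(birthstimeseries)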
# Fitting SAR(1) to Model
fit <- Arima(birthstimeseries,
order=c(0,0,0),
seasonal = c(1, 0, 0)
)
# Original Plot in Red, Fitted in Blue
plot(fit$x,col="red")
lines(fitted(fit),col="blue")
# Absolute Sum of Residuals
sum(abs(fit$residuals))
## [1] 151.8576
The seasonal AR(1) has the lowest error yet! Let’s see what the error on the seasonal MA(1) is.
# Fitting SMA(1) to Model
fit <- Arima(birthstimeseries,
order=c(0,0,0),
seasonal = c(0, 0, 1)
)
# Original Plot in Red, Fitted in Blue
plot(fit$x,col="red")
lines(fitted(fit),col="blue")
# Absolute Sum of Residuals
sum(abs(fit$residuals))
## [1] 228.1871
Second worst, behind only the nonseasonal MA(1)
As you might have guessed by now, all 6 of these functions can be combined in a single equation.
We call this process (somewhat unimaginatively) the Autoregressive Integrated Moving Average, or ARIMA(p, d, q). Here p is the order of the autoregressive component, d is the number of times the time series is differenced, and q is the order of the moving average component. When seasonality is needed, we can expand the notation to ARIMA(p, d, q)(P, D, Q)[S]: P is the order of the seasonal autoregressive component, D is the amount of seasonal differencing, Q is the order of the seasonal moving average component, and S is the length of the seasonal period, e.g. S = 12 for monthly data.
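To see how the notation maps onto code, here is an illustration with an arbitrarily chosen specification (not a recommended model): the order argument holds (p, d, q), seasonal holds (P, D, Q), and S is taken from the frequency of the series (12 here).
# Illustration Only: Mapping the Notation onto Arima's Arguments
fit_manual <- Arima(birthstimeseries,
                    order    = c(1, 1, 1),   # p, d, q
                    seasonal = c(1, 1, 1))   # P, D, Q; the period S comes from frequency(birthstimeseries)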
To estimate our ARIMA model, we can use the auto.arima function from the forecast package to determine the form of our equation and estimate our parameters. We can also do this process by hand, but that might be a talk for another day.
# Fitting ARIMA to Model
fit <- auto.arima(birthstimeseries, approximation = FALSE)
# Original Plot in Red, Fitted in Blue
plot(fit$x,col="red")
lines(fitted(fit),col="blue")
# Absolute Sum of Residuals
sum(abs(fit$residuals))
## [1] 73.68962
Not surprisingly, this is the model that minimizes our average error. But there are still some questions.
What form did the model take? And is auto.arima overfitting the data? Let’s answer the first question, and then talk about model selection.
# Fitting ARIMA to Model
fit <- auto.arima(birthstimeseries, approximation = FALSE)
# Retreiving Model
arimaorder(fit)
## [1] 3 1 2 2 1 1 12
In our notation, we have an ARIMA(3, 1, 2)(2, 1, 1)[12].
Models are selected through the process of minimizing an information criterion.
AIC was the first one created and stands for the Akaike information criterion. Using this criterion, the best model minimizes \(AIC = 2k - 2\ln(L)\), where k is the number of parameters and L is the maximized value of the likelihood function for the model.
BIC calculates the penalty term differently than AIC (BIC uses \(k\ln(n)\) rather than \(2k\), where n is the sample size). Statisticians have argued about which is better, usually along the lines that AIC/AICc is more practical while BIC can, in theory, identify the “true model.”
AICc is a further correction to AIC. When the sample size is small, AIC will select too many parameters and overfit the data. AICc assumes the model is univariate, linear in its parameters, and has normally distributed residuals, and uses the following formula instead: \(AICc = AIC + \frac{2k^2 + 2k}{n - k - 1}\), where k is the number of parameters and n is the sample size of the time series. AICc adds a greater penalty for smaller sample sizes but, as \(n\to\infty\), AICc converges to AIC. (AICc is auto.arima’s default criterion.)
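In R, these values are stored on the fitted model object returned by Arima/auto.arima, so we can inspect them directly (output not shown):
# Inspecting the Information Criteria of the Fitted Model
fit$aic
fit$aicc
fit$bic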
Most forecasts aren’t great. Even if you have selected the best model you can for this series, it may still not be that accurate… A thorough discussion of different measures of forecast accuracy is presented by Rob J Hyndman in his paper found here: https://robjhyndman.com/papers/mase.pdf.
Using the Mean Absolute Scaled Error (MASE) we can test the ability of a forecast to predict out-of-sample data. MASE is equal to \(\mathrm{mean}\left(\left|\frac{e_t}{\frac{1}{n-1}\sum_{i=2}^{n}|Y_i-Y_{i-1}|}\right|\right)\), where \(e_t\) is the forecast error at time \(t\) and the denominator is the mean absolute error of the naive (one-lag) forecast.
For our series we can withhold two periods worth of data (2 years), estimate our model based on the remaining years, and check how close we get to the actual numbers.
Our comparison is a random walk, i.e. a lagged version of our time series. If the calculated MASE is less than 1, then our model predicted those values better than the random walk; if it is greater than 1, our model performed worse than the random walk.
# Load Libraries
library(forecast)
library(zoo)
# Fit ARIMA For All But the Most Recent Two Years
fitknown <- auto.arima(as.ts(head(as.zoo(birthstimeseries), -24)))
# Forecast Known Values (Most Recent Two Years)
fcastknown <- forecast(fitknown)
# Calculating MASE
mean(abs((tail(as.numeric(birthstimeseries), 23) - tail(as.numeric(fcastknown$mean), 23)) /
((1/23)*(sum(abs(tail(as.numeric(birthstimeseries), 23) - tail(head(lag(as.numeric(birthstimeseries)), -1), 23)))))
))
## [1] 0.9441733
Nice, the model we fit predicted the most recent two years better than the lagged (naive) method. Hopefully that bodes well for predicting future values
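As a sanity check (a small addition, not part of the original code), the forecast package’s accuracy() function also reports MASE, along with several other measures; note that it computes the scaling from the training data, so its number will not necessarily match the hand-rolled calculation above.
# A Built-in Alternative: accuracy() Reports MASE Directly (output not shown)
accuracy(fcastknown, window(birthstimeseries, start = c(1958, 1)))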
In addition to just using an ARIMA model, if you have information about another variable that is recorded at the same frequency as your time series object, you can include it in the forecast as a regressor. Of course, this needs to be a variable whose future values you know; otherwise the estimated relationship between it and your time series variable is moot.
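A minimal sketch of this, using the xreg argument of auto.arima and forecast; the regressor below is purely hypothetical, so substitute a real covariate measured at the same monthly frequency:
# Sketch: Adding an External Regressor (hypothetical data, illustration only)
hypothetical_x      <- rnorm(length(birthstimeseries))  # stand-in for a real monthly covariate
hypothetical_future <- rnorm(24)                        # its (known!) values for the next 24 months
fitx  <- auto.arima(birthstimeseries, xreg = hypothetical_x)
fcstx <- forecast(fitx, xreg = hypothetical_future)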
Machine learning can also be incorporated into forecasts. Unfortunately, these models tend to work about as well as traditional statistical models and are more computationally expensive. Below is an example of using neural networks in time series if you are interested, but that too will likely be a topic for another time.
https://www.r-bloggers.com/time-series-forecasting-with-recurrent-neural-networks/
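If you want to stay inside the forecast package, nnetar() fits a simple feed-forward neural network on lagged (and seasonally lagged) values; this is a lighter-weight sketch than the recurrent networks covered in the link above, and the output is not shown here:
# Sketch: Neural Network Autoregression with the forecast Package
nnfit <- nnetar(birthstimeseries)
plot(forecast(nnfit, h = 24))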