Textbook: Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: Principles and Practice, 2nd edition, OTexts: Melbourne, Australia. https://otexts.com/fpp2/. Accessed on October 2, 2020.
Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers, and 1,000 random numbers.
The differences among these figures arise because, as the sample size increases (from 36 to 360 to 1,000 random numbers), the sample autocorrelations get closer to zero. As depicted, the ACF spikes for the smallest sample are taller than those for the larger samples. For a white noise series, each autocorrelation is expected to be close to zero. If one or more large spikes lie outside the bounds, or if substantially more than 5% of spikes lie outside them, then the series is probably not white noise. On that basis, all three figures are consistent with white noise.
The critical values are defined as \(\pm \frac{1.96}{\sqrt{T}}\), where T is the length of the time series. As T grows, the absolute value of the critical value shrinks, so the bounds are widest for the smallest sample. With a small sample, a sample autocorrelation must therefore be farther from zero before it is judged significantly different from zero, which is why the wider bounds accompany the taller spikes.
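As a quick check, the bounds for each sample size can be computed directly, and white noise of each length can be simulated and its ACF plotted. This is a minimal sketch (the seed and the plot arrangement are arbitrary choices, not part of the original exercise):
# 95% bounds for the ACF of white noise: +/- 1.96/sqrt(T)
sapply(c(36, 360, 1000), function(T) 1.96/sqrt(T))  # roughly 0.33, 0.10, 0.06
set.seed(1)
gridExtra::grid.arrange(ggAcf(ts(rnorm(36))),    # widest bounds, tallest spikes
                        ggAcf(ts(rnorm(360))),
                        ggAcf(ts(rnorm(1000))),  # narrowest bounds
                        ncol = 3)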
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
ggtsdisplay(ibmclose, main = "Daily Closing IBM Stock Price", ylab = "Closing Price", xlab = "Time in Days")
From the time series plot, it is evident that there are trends in this series. For example, there is a downward trend from around day 210 to day 270. The ACF decreases slowly as the lags increase, which is characteristic of a trended series, and there is no "scalloped" shape, suggesting no seasonality. In the partial autocorrelation function (PACF), the first lag is nearly one and all later partial autocorrelations are close to zero, so the correlation at longer lags is explained almost entirely by the lag-1 relationship. Altogether this confirms that the series is non-stationary and should be differenced to make it stationary.
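A quick way to confirm how much differencing is needed is to ask ndiffs and to inspect the differenced series; a sketch, not part of the original code:
ndiffs(ibmclose)  # number of first differences suggested by a unit root test
ggtsdisplay(diff(ibmclose), main = "Differenced Daily Closing IBM Stock Price",
            ylab = "Change in Closing Price", xlab = "Time in Days")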
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
usnetelec
ggtsdisplay(usnetelec, main = "Annual US Net Electricity Generation", ylab = "Billion kWh", xlab = "Year")
This time series has an upward trend, which is visible in both the time series plot and the ACF. Moreover, the PACF shows that the first lag is nearly one while all later partial autocorrelations are close to zero, so the correlation at longer lags is explained by the lag-1 relationship. This confirms that it is a non-stationary time series.
# Plot a series before and after its Box-Cox transformation
# (assumes the forecast, ggplot2, and gridExtra packages are loaded)
bct = function(ts, title){
  a = autoplot(ts) + labs(title = sprintf("Before: %s", title))
  lambda = BoxCox.lambda(ts)  # automatically select lambda
  at = autoplot(BoxCox(ts, lambda)) + labs(title = sprintf("Transformed: %s", title),
                                           subtitle = sprintf("lambda = %0.2f", lambda))
  gridExtra::grid.arrange(a, at)
}
bct(usnetelec, title = "Annual US Net Electricity Generation")
The transformation does not produce any noticeable change that would aid interpretation, probably because the series has no seasonal variation. Nonetheless, a unit root test was conducted to determine more objectively whether differencing is required. The results suggest that two differences are required to make the series stationary. The test statistic is much bigger than the 1% critical value, indicating that the null hypothesis is rejected; that is, the data are not stationary.
tst = BoxCox(usnetelec, BoxCox.lambda(usnetelec))
sprintf("The number of differences required for time series to be made stationary is %0.0f.",
        ndiffs(tst, test = "kpss"))
## [1] "The number of differences required for time series to be made stationary is 2."
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 1.4583
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Now the time series is differenced, and the unit root test is applied again. This time, the test statistic is smaller than the 1% critical value and well within the range expected for stationary data. Moreover, the first-order difference brings all the autocorrelations within the limits, which the second-order difference does not. Thus, the differenced Box-Cox transformed data are stationary.
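The KPSS output below can presumably be reproduced with the urca package, whose summary output matches the format shown; a sketch:
library(urca)
summary(ur.kpss(diff(tst, differences = 1)))  # KPSS test on the differenced, transformed series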
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.4315
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
ggtsdisplay(diff(tst, differences = 1), main = "Annual US Net Electricity Generation",
            ylab = "Billion kWh", xlab = "Year")
It is also interesting to note that since the time series has no seasonality and only an upward trend, it does not strictly require a Box-Cox transformation. Applying the unit root test to the first difference of the untransformed series, the test statistic is smaller than the 1% critical value and well within the range expected for stationary data. Thus, the differenced data are stationary.
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.1585
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
ggtsdisplay(diff(usnetelec, differences = 1), main = "Annual US Net Electricity Generation",
            ylab = "Billion kWh", xlab = "Year")
usgdp
It is evident that this time series has an upward trend, confirming that it is non-stationary. The plot above fits a transformation with \(\lambda=0.37\), and it is apparent that the curvature of the graph is reduced. A unit root test was then conducted, and the results suggest that one difference is required to make the series stationary.
tst = BoxCox(usgdp, BoxCox.lambda(usgdp))
sprintf("The number of differences required for time series to be made stationary is %0.0f.",
        ndiffs(tst, test = "kpss"))
## [1] "The number of differences required for time series to be made stationary is 1."
With the test statistic smaller than the 1% critical value, the differenced Box-Cox transformed data are now stationary.
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.2013
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
mcopper
This time series shows slight seasonal variation, as well as periods of both upward and downward trends over the years, confirming that it is non-stationary. The plot above fits a transformation with \(\lambda=0.19\), which makes the seasonal variation noticeably more stable. ndiffs suggests that one difference is required to make the series stationary.
tst = BoxCox(mcopper, BoxCox.lambda(mcopper))
sprintf("The number of differences required for time series to be made stationary is %0.0f.",
        ndiffs(tst, test = "kpss"))
## [1] "The number of differences required for time series to be made stationary is 1."
The differenced data appear stationary, particularly after 1964. The test statistic is also smaller than the 1% critical value, so the transformed data are stationary after differencing.
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 6 lags.
##
## Value of test-statistic is: 0.0573
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
ggtsdisplay(diff(tst, differences = 1), main = "Monthly Copper Prices", ylab = "Price", xlab = "Year")
enplanements
ggtsdisplay(enplanements, main = "Monthly US Domestic Enplanements", ylab = "Millions", xlab = "Year")
This time series has seasonal variation and an upward trend, as depicted in the time series plot. There is an unusually large drop in the series in 2002. Nonetheless, it is clearly a non-stationary time series. The plot above fits a transformation with \(\lambda=-0.23\), which stabilizes the variation in the data. Furthermore, the results suggest that one difference is required to make the series stationary.
tst = BoxCox(enplanements, BoxCox.lambda(enplanements))
sprintf("The number of differences required for time series to be made stationary is %0.0f.",
        ndiffs(tst, test = "kpss"))
## [1] "The number of differences required for time series to be made stationary is 1."
Because the series is seasonal, a seasonal difference is taken first, followed by an additional first difference. The plots reveal that most of the autocorrelations are within the 95% limits. The test statistic is also smaller than the 1% critical value and well within the range expected for stationary data. Note that the unusual drop in 2002 is still present after differencing. It can be concluded that differencing the Box-Cox transformed data made it stationary.
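The double differencing described here can be written as below; a sketch, where the lag of 12 reflects the monthly seasonality and enp_diff is just an illustrative name:
enp_diff = diff(diff(tst, lag = 12), differences = 1)  # seasonal difference, then a first difference
ggtsdisplay(enp_diff, main = "Doubly Differenced Monthly US Domestic Enplanements", xlab = "Year")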
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0424
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
visitors
ggtsdisplay(visitors, main = "Monthly Australian Short-Term Overseas Visitors", ylab = "# of Visitors",
            xlab = "Year")
This time series has seasonal variation and an upward trend, as depicted in the time series plot, so it is non-stationary. The plot above fits a transformation with \(\lambda=0.28\), which stabilizes the variation in the data. The results suggest that one difference is required to make the series stationary.
tst = BoxCox(visitors, BoxCox.lambda(visitors))
sprintf("The number of differences required for time series to be made stationary is %0.0f.",
        ndiffs(tst, test = "kpss"))
## [1] "The number of differences required for time series to be made stationary is 1."
Because of the seasonal variation, a seasonal difference is taken first, followed by an additional first difference. The plots reveal that most of the autocorrelations are within the 95% limits. The test statistic is also smaller than the 1% critical value, so differencing made the Box-Cox transformed data stationary.
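The newts series plotted further below is assumed to be this doubly differenced, Box-Cox transformed series, constructed along these lines (a sketch):
newts = diff(diff(tst, lag = 12), differences = 1)  # seasonal difference (lag 12), then a first difference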
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.0158
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
ggtsdisplay(newts, main = "Monthly Australian Short-Term Overseas Visitors",
            ylab = "# of Visitors", xlab = "Year")
For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
retaildata = readxl::read_excel("retail.xlsx", skip = 1)
myts = ts(retaildata[, 13], frequency = 12, start = c(1982,4))
ggtsdisplay(myts, main = "New South Wales - Department Stores", xlab = "Year", ylab = "Sales")
This time series has seasonal variation and an upward trend, as depicted in the time series plot, so it is non-stationary. The optimal Box-Cox transformation has \(\lambda=0.21\). The transformation stabilizes the seasonal variability across the years: the data before 1990, with smaller variability, are stretched, while the data after 2000, with larger variability, are compressed. The results suggest that one difference is required to make the series stationary.
tst = BoxCox(myts, BoxCox.lambda(myts))
sprintf("The number of differences required for time series to be made stationary is %0.0f.",
        ndiffs(tst, test = "kpss"))
## [1] "The number of differences required for time series to be made stationary is 1."
Due to the seasonal variation, a seasonal difference is taken first, followed by an additional first difference. The plots reveal that most of the autocorrelations are within the 95% limits. A unit root test gives a test statistic smaller than the 1% critical value. Therefore, it can be concluded that differencing the Box-Cox transformed data has made it stationary.
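As with the previous series, the seasonal and first differences of the transformed retail data can be taken and inspected; a sketch (myts_diff is an illustrative name):
myts_diff = diff(diff(tst, lag = 12), differences = 1)
ggtsdisplay(myts_diff, main = "New South Wales - Department Stores (differenced)", xlab = "Year")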
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0119
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Use R to simulate and plot some data from simple ARIMA models.
For an AR(1) model: \(-1<\phi_1<1\). When \(\phi_1<0\), \(y_t\) tends to oscillate around the mean and appears much more sawtooth-like; when \(\phi_1=0\), \(y_t\) is equivalent to white noise; and when \(\phi_1>0\), \(y_t\) shows higher positive autocorrelation.
phi = c(-0.5, 0, 0.5)
for (i in 1:3){
y = ts(numeric(100))
e = rnorm(100)
for(j in 2:100){
y[j] = phi[i]*y[j-1] + e[j]
}
p = autoplot(y) + labs(title = sprintf("phi = %0.1f", phi[i]))
acf = ggAcf(y) + labs(title = sprintf("phi = %0.1f", phi[i]))
gridExtra::grid.arrange(p,acf, ncol = 2)
}
For an MA(1) model: \(-1<\theta_1<1\).
# Simulate an MA(1) process y_t = theta * e_{t-1} + e_t with error standard deviation sigma
ma.1 = function(theta, sigma, n){
  y = ts(numeric(n))
  e = rnorm(n, mean = 0, sd = sigma)  # note: rnorm(n, sigma) would set the mean, not the sd
  for(i in 2:n)
    y[i] = theta*e[i-1] + e[i]
  return(y)
}
Similar to the AR(1) model, the behaviour of an MA(1) model changes with \(\theta_1\) over \(-1<\theta_1<1\). With \(|\theta_1|<1\), the most recent observations carry more weight than observations from the more distant past, making the process invertible. Thus, when \(\theta_1\neq 0\), \(y_t\) tends to oscillate around the mean and shows some autocorrelation, and when \(\theta_1=0\), \(y_t\) is equivalent to white noise.
theta = c(-0.5, 0, 0.5)
sigma = 1
n = 100
for (i in 1:3){
y = ma.1(theta[i], sigma, n)
p = autoplot(y) + labs(title = sprintf("theta = %0.1f", theta[i]))
acf = ggAcf(y) + labs(title = sprintf("theta = %0.1f", theta[i]))
gridExtra::grid.arrange(p,acf, ncol = 2)
}
set.seed(525)
phi = 0.6
theta = 0.6
sigma = 1
y1 = ts(numeric(100))
e = rnorm(100, mean = 0, sd = sigma)  # 100 errors with sd = sigma (the second argument of rnorm is the mean)
for(i in 2:100)
y1[i] = phi*y1[i-1] + theta*e[i-1] + e[i]
p1 = autoplot(y1) + labs(y = "y", title = expression(paste("ARMA(1,1): ", phi[1], "= 0.6, ", theta[1], "= 0.6")))
acf1 = ggAcf(y1) + labs(y = "y", title = expression(paste("ARMA(1,1): ", phi[1], "= 0.6, ", theta[1], "= 0.6")))
gridExtra::grid.arrange(p1, acf1, ncol = 2)
Because \(|\phi_1| = 0.6 < 1\) and \(|\theta_1| = 0.6 < 1\), this process is both stationary and invertible.
set.seed(300)
phi_1 = -0.8
phi_2 = 0.3
sigma = 1
y2 = ts(numeric(100))
e = rnorm(100, mean = 0, sd = sigma)  # errors with sd = sigma
for(i in 3:100)
y2[i] = phi_1*y2[i-1] + phi_2*y2[i-2] + e[i]
p2 = autoplot(y2) + labs(y = "y", title = expression(paste("AR(2): ", phi[1], "= -0.8, ", phi[2], "= 0.3")))
acf2 = ggAcf(y2) + labs(y = "y", title = expression(paste("AR(2): ", phi[1], "= -0.8, ", phi[2], "= 0.3")))
gridExtra::grid.arrange(p2, acf2, ncol = 2)
The chosen parameter values result in a non-stationary series. For stationarity, an AR(2) model requires \(-1<\phi_2<1\), \(\phi_1+\phi_2<1\), and \(\phi_2-\phi_1<1\). Thus, when simulating higher-order AR(p) models, care must be taken to choose a set of coefficients that yields a stationary model.
An AR process is stationary only when its coefficients satisfy these constraints, and with the chosen parameters non-stationary data were generated. More generally, an ARMA(p,q) process is stationary if its autoregressive lag polynomial has all roots outside the unit circle. Comparing the two simulations above, the ARMA(1,1) data fluctuate around the mean and appear stationary, and the ACF of the ARMA(1,1) process shows geometric decay, being significant only at the first few lags. The AR(2) series, in contrast, has an amplitude that grows exponentially over time, since \(\phi_2-\phi_1=1.1>1\) violates the stationarity conditions. Its ACF alternates between positive and negative values because \(\phi_1\) is negative, and decays geometrically at every other lag (due to the \(\phi_2\) parameter).
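The violated constraint can be checked directly (a small verification, not part of the original code):
phi_1 = -0.8; phi_2 = 0.3
c(abs(phi_2) < 1,      # TRUE
  phi_1 + phi_2 < 1,   # TRUE
  phi_2 - phi_1 < 1)   # FALSE: 0.3 - (-0.8) = 1.1, so the simulated AR(2) is non-stationary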
Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
This time series has an upward trend, visible in both the time series plot and the ACF. Moreover, the PACF shows that the first lag is nearly one while all later partial autocorrelations are close to zero. This confirms that it is a non-stationary time series. Because there is no seasonal variation, a Box-Cox transformation is not required.
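The plots described here were presumably produced along these lines (a sketch; the labels are illustrative):
ggtsdisplay(wmurders, main = "Annual US Female Murder Rate",
            ylab = "Murders per 100,000 standard population", xlab = "Year")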
sprintf("The number of differences required for time series to be made stationary is %0.0f.",
ndiffs(wmurders, test = "kpss"))## [1] "The number of differences required for time series to be made stationary is 2."
Furthermore, a unit root test gave a test statistic smaller than the 1% critical value, well within the range expected for stationary data. Thus, differencing has made the data stationary.
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.0458
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Ignoring one significant spike in each plot that is just outside the limits and not in the first few lags: the ACF decays exponentially, and the PACF has a significant spike at lag 1 but none beyond it, suggesting p = 1. The ndiffs function suggested second-order differencing, so d = 2. Lastly, the ACF shows significant spikes up to lag 2, suggesting q = 2.
In an ARIMA(p,d,q) model, p is the order of the autoregressive part, d is the degree of differencing, and q is the order of the moving average part. Altogether, a possible model is ARIMA(1,2,2).
If c = 0 and d = 2, the long-term forecasts will follow a straight line. If c \(\neq\) 0 and d = 2, the long-term forecasts will follow a quadratic trend. Quadratic or higher-order trends are considered dangerous for forecasting. Since d = 2 here, a constant will be omitted from the model.
\[ (1 - \phi_1 B)(1 - B)^2 y_t = (1 + \theta_1 B + \theta_2 B^2) \epsilon_t \]
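The output below comes from fitting this model and checking its residuals, presumably along these lines (a sketch):
fit = Arima(wmurders, order = c(1, 2, 2))  # ARIMA(1,2,2); with d = 2 no constant is included
summary(fit)
checkresiduals(fit)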
## Series: wmurders
## ARIMA(1,2,2)
##
## Coefficients:
## ar1 ma1 ma2
## -0.7677 -0.2812 -0.4977
## s.e. 0.2350 0.2879 0.2762
##
## sigma^2 estimated as 0.04552: log likelihood=7.38
## AIC=-6.75 AICc=-5.92 BIC=1.13
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,2)
## Q* = 9.6215, df = 7, p-value = 0.2111
##
## Model df: 3. Total lags used: 10
The ACF plot of the residuals from the ARIMA(1,2,2) model shows that all autocorrelations are within the threshold limits, indicating that the residuals behave like white noise. The residual time plot highlights a few large positive and negative residuals after 1975 and again near the end of the series, after 2000. The histogram suggests that the residuals are not strongly skewed, with a mean near zero. Lastly, based on the Ljung-Box test, the residuals are not distinguishable from a white noise series (\(Q^*=9.62\), df = 7, p-value > 0.05).
Thus, the model is
\[ (1 + 0.77 B)(1 - B)^2 y_t = (1 - 0.28 B - 0.50 B^2) \epsilon_t \]
Using R, the forecast three steps ahead is:
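A sketch of the forecast call that produces the table below, along with its plot (discussed further down):
fc = forecast(fit, h = 3)  # three-year-ahead forecasts from the ARIMA(1,2,2) model
fc
autoplot(fc)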
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.534015 2.260584 2.807446 2.115838 2.952192
## 2006 2.404157 2.026793 2.781521 1.827029 2.981286
## 2007 2.331482 1.829669 2.833296 1.564025 3.098940
By “hand”, the forecast three steps ahead is derived as follows:
\[\begin{equation} \begin{aligned} (1 - \phi_1B)(1 - B)^2 y_t = (1 + \theta_1 B + \theta_2 B^2) \epsilon_t \\ (1 - 2B + B^2 - \phi_1 B + 2\phi_1 B^2 - \phi_1 B^3)y_t = (1 + \theta_1 B + \theta_2 B^2) \epsilon_t \\ y_t - 2y_{t-1} + y_{t-2} - \phi_1y_{t-1} + 2 \phi_1 y_{t-2} - \phi_1 y_{t-3} = \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} \\ \end{aligned} \end{equation}\]
Transposing and factoring, the equation becomes:
\[ y_t = (2 + \phi_1)y_{t-1} - (1 + 2\phi_1)y_{t-2} + \phi_1y_{t-3} + \epsilon_t + \theta_1\epsilon_{t-1} + \theta_2 \epsilon_{t-2} \]
And with \(\phi_1 = -0.7677\), \(\theta_1 = -0.2812\), \(\theta_2 = -0.4977\), and \(\epsilon_t = 0\), the formula and calculations for the next three year forecast are:
t = length(wmurders)
# fit is the ARIMA(1,2,2) model fitted above; e holds its residuals, and future errors are set to zero
e = fit$residuals
fc1 = (2-0.7677)*wmurders[t] - (1-2*0.7677)*wmurders[t-1] - 0.7677*wmurders[t-2] - 0.2812*e[t] - 0.4977*e[t-1]
fc2 = (2-0.7677)*fc1 - (1-2*0.7677)*wmurders[t] - 0.7677*wmurders[t-1] - 0.2812*0 - 0.4977*e[t]
fc3 = (2-0.7677)*fc2 - (1-2*0.7677)*fc1 - 0.7677*wmurders[t] - 0.2812*0 - 0.4977*0
sprintf("Next three years forecast are %s.", paste(round(c(fc1, fc2, fc3),3), collapse=", "))
## [1] "Next three years forecast are 2.534, 2.404, 2.331."
With differences of less than 0.001, these values are very close to those calculated by R.
The plot shows that the forecasts for the next three years follow the apparent decreasing trend.
Does auto.arima() give the same model you have chosen? If not, which model do you think is better?
Across different parameter settings, auto.arima() selects ARIMA(1,2,1), which is not the same model chosen above.
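A sketch of the auto.arima() call behind the output below:
fit2 = auto.arima(wmurders)  # automatic model selection; reported to choose ARIMA(1,2,1)
fit2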
## Series: wmurders
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
sprintf("The RMSE of ARIMA(1,2,2) is %0.3f, while the RMSE for the auto ARIMA, i.e. ARIMA(1,2,1), is %0.3f.",
accuracy(fit)[2],accuracy(fit2)[2])## [1] "The RMSE of ARIMA(1,2,2) is 0.203, while the RMSE for the auto ARIMA, i.e. ARIMA(1,2,1), is 0.207."
sprintf("The AIC of ARIMA(1,2,2) is %0.3f, while the AIC for the auto ARIMA, i.e. ARIMA(1,2,1), is %0.3f.",
fit[["aic"]],fit2[["aic"]])## [1] "The AIC of ARIMA(1,2,2) is -6.753, while the AIC for the auto ARIMA, i.e. ARIMA(1,2,1), is -6.880."
ARIMA(1,2,2) has the smaller RMSE of the two models. RMSE measures how far predictions are from the actual values, giving extra weight to large errors because the differences are squared before being summed, so the lower RMSE indicates a slightly better fit for ARIMA(1,2,2). However, its difference from ARIMA(1,2,1) is only 0.004, which is hard to call meaningful. Consider the AIC instead, where ARIMA(1,2,1) has the slightly lower value. With AIC, it is not the magnitude of the value that matters but the difference between the models' AICs, since the magnitude is a function of the number of observations. A common rule of thumb in the statistical literature is that a difference greater than 10 means there is effectively no support for the model with the larger AIC, while a difference of less than 2 means the models are essentially indistinguishable. Here the difference is only 0.127 < 2, suggesting there is no real difference between the two models.
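The ARIMA(1,2,1) forecasts and residual diagnostics shown below were presumably produced along these lines (a sketch):
forecast(fit2, h = 3)   # three-year-ahead forecasts from ARIMA(1,2,1)
checkresiduals(fit2)    # residual plots and the Ljung-Box test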
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.470660 2.194836 2.746484 2.048824 2.892496
## 2006 2.363106 1.986351 2.739862 1.786908 2.939304
## 2007 2.252833 1.765391 2.740276 1.507354 2.998313
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
##
## Model df: 2. Total lags used: 10
The forecast and plots above are for ARIMA(1,2,1). Using this model, the forecast for the next three years was obtained, and compared with ARIMA(1,2,2) the differences are quite small. There are also a time plot, an ACF plot, and a histogram of the residuals (with an overlaid normal distribution for comparison). The residual time plot highlights a few large positive and negative residuals after 1975 and again near the end of the series, after 2000. The histogram suggests that the residuals are not strongly skewed, with a mean near zero, and there are no significant correlations in the residual series. Lastly, based on the Ljung-Box test, the residuals are not distinguishable from a white noise series (\(Q^*=12.42\), df = 8, p-value > 0.05). Overall, this model also seems likely to make good predictions.
However, based on the smaller error produced by ARIMA(1,2,2), it is concluded to be the better model of the two.