if (!require("fma")) install.packages("fma")
if (!require("fpp")) install.packages("fpp")
library(fma)
library(fpp)
The figure below shows the ACFs for 36 random numbers, 360 random numbers, and 1,000 random numbers.
Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.
Explain the differences among these figures. Do they all indicate the data are white noise?
The ARIMA(0, 0, 0) model with all zero components represents a white noise model with independent and identically distributed data. The `arima.sim()` function, which uses `rand.gen = rnorm` by default, can be used to simulate data from a variety of time series models. This means that `arima.sim(model = list(order = c(0, 0, 0)), n = 1)` and `rnorm(1)` are two equivalent ways of simulating white noise.
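As a quick illustration (a sketch; whether the draws match exactly depends on `arima.sim()` consuming no burn-in values when \(p=q=0\), which is assumed here rather than verified), the two approaches can be compared under a common seed:

set.seed(123)
x1 <- arima.sim(model = list(order = c(0, 0, 0)), n = 5)  # white noise via arima.sim()
set.seed(123)
x2 <- rnorm(5)                                            # white noise via rnorm()
all.equal(as.numeric(x1), x2)                             # expected TRUE if no burn-in draws are consumed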
Box.test(arima.sim(model = list(order = c(0, 0, 0)), n = 36), type="Ljung")
##
## Box-Ljung test
##
## data: arima.sim(model = list(order = c(0, 0, 0)), n = 36)
## X-squared = 1.9546, df = 1, p-value = 0.1621
Box.test(arima.sim(model = list(order = c(0, 0, 0)), n = 360), type="Ljung")
##
## Box-Ljung test
##
## data: arima.sim(model = list(order = c(0, 0, 0)), n = 360)
## X-squared = 1.5324, df = 1, p-value = 0.2158
Box.test(arima.sim(model = list(order = c(0, 0, 0)), n = 1000), type="Ljung")
##
## Box-Ljung test
##
## data: arima.sim(model = list(order = c(0, 0, 0)), n = 1000)
## X-squared = 0.11712, df = 1, p-value = 0.7322
The differences between the three figures are the number of lags displayed and the distances of the boundary limits (dashed blue lines) from zero, both of which are related to the sample size. All three ACF plots indicate the data are white noise since at least 95% of the spikes fall inside the boundaries. To further demonstrate this, Box-Ljung tests are performed on white noise data of each sample size to show that the autocorrelation in white noise (regardless of sample size) is not significantly different from zero. Although both `arima.sim(model = list(order = c(0, 0, 0)), n = 1)` and `rnorm(1)` can be used to simulate a white noise model, the former was used to facilitate understanding of the problem.
Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
The boundary limits (dashed blue lines) drawn on the ACF plots depend on the assumed sampling distribution of the autocorrelations of white noise (approximately normal) and on the confidence level used, which is 95% by default. With \(\alpha = 1 - 0.95 = 0.05\), the limits are computed from the normal critical value and the sample size, as shown below.
alpha <- 1 - 0.95
for (n in c(36, 360, 1000)) {
  plotLags <- floor(10 * log10(n / 1))                     # default number of lags plotted for one series
  lower <- qnorm(alpha / 2, lower.tail=TRUE) / sqrt(n)     # lower boundary limit
  upper <- qnorm(alpha / 2, lower.tail=FALSE) / sqrt(n)    # upper boundary limit
  width <- upper - lower
  print(data.frame(n, plotLags, lower, upper, width))
}
## n plotLags lower upper width
## 1 36 15 -0.3266607 0.3266607 0.6533213
## n plotLags lower upper width
## 1 360 25 -0.1032992 0.1032992 0.2065983
## n plotLags lower upper width
## 1 1000 30 -0.0619795 0.0619795 0.123959
The number of lags displayed and the distances of the boundary limits (dashed blue lines) from zero both depend on the sample size. The number of lags displayed on the plot is \(\min { \left[ \left\lfloor 10\log _{ 10 }{ \left( { n }/{ m } \right) } \right\rfloor ,n-1 \right] }\), where \(n\) is the number of observations and \(m\) the number of series. The distance of the boundary limits from zero is \(\pm { z_{{\alpha}/{2}} }/{ \sqrt { n } }\), where \(n\) is the number of observations and \(z_{{\alpha}/{2}}\) is the critical value of the standard normal distribution for \(\alpha=1-0.95=0.05\), the value implied by the plot's default 95% confidence level. The resulting effects on the plots are a number of lags that increases with the base-10 log of the sample size and boundary limits that shrink in proportion to the inverse square root of the sample size.
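As a visual cross-check (a sketch with an arbitrary seed and sample size), the limits drawn by `acf()` can be overlaid with the \(\pm z_{\alpha/2}/\sqrt{n}\) formula:

set.seed(42)
n <- 360
acf(rnorm(n), main="ACF of simulated white noise (n = 360)")
abline(h = c(-1, 1) * qnorm(0.975) / sqrt(n), col="red", lty=3)  # should coincide with the dashed limits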
A classic example of a non-stationary series is the daily closing IBM stock prices (data set `ibmclose`). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows the series is non-stationary and should be differenced.
In general, a stationary time series has no predictable patterns in the long term. A time series with trend, seasonality, or heteroskedasticity is not stationary. Transformations such as logarithms can help stabilize the variance of a time series, and differencing can help stabilize its mean by removing changes in level (trend and seasonality). A time series with aperiodic cyclic behavior can still be stationary because the timing of the cycles is not predictable in the long term. The `tsdisplay()` function plots a time series along with its ACF and either its PACF, histogram, scatterplot, or spectrum in the lower right corner. The ACF plot shows the autocorrelation between each observation and its lagged values. The PACF plot shows the partial autocorrelations, that is, the correlation between the current observation and each lagged observation after removing the effect of the intermediate lags. These plots can be used together to spot nonstationary characteristics that indicate a need for differencing, summarized in the table below.
Plot | Characteristic | Description | Stationary Series | Nonstationary Series | Removal |
---|---|---|---|---|---|
Time | Trend | Long Upward or Downward Periods | Absent | Present | Difference |
Time | Seasonality | Annual, Monthly, etc. Patterns | Absent | Present | Difference |
Time | Heteroskedasticity | Changing Variance | Absent | Present | Transform |
Time | Cyclic Behavior | Aperiodic/Unpredictable | Irrelevant | Irrelevant | Irrelevant |
ACF | Slow Decay | Spikes Die Out Only Slowly | Absent | Present | Difference |
PACF | Large Spike at Lag 1 | First Partial Autocorrelation Near One | Absent | Present | Difference |
tsdisplay(ibmclose, plot.type = "partial")
Using these plots together shows characteristics that indicate a need for differencing. The time plot displays long periods of apparent upward and downward movement in the data. Seasonality and changing variance are not readily apparent. There is some cyclic behavior, but its unpredictability makes it irrelevant to the question of stationarity. The ACF plot shows spikes that decay to zero only slowly, and the PACF plot shows a single large spike at lag 1 with no significant spikes beyond it. Together these indicate that although successive observations are highly autocorrelated, there are no significant partial autocorrelations beyond the immediately preceding observation, the classic signature of a series that should be differenced.
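The visual assessment can also be cross-checked numerically. The `ndiffs()` function in the forecast package (loaded as a dependency above) estimates the number of differences required using a unit root test; the expectation noted in the comment is an assumption rather than verified output.

ndiffs(ibmclose)  # unit-root-based estimate of d; expected to suggest one difference for this series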
Consider the number of women murdered each year (per 100,000 standard population) in the United States (data set `wmurders`).
By studying appropriate graphs of the series in R, find an appropriate \(\textrm{ARIMA}(p,d,q)\) model for these data.
A time series with trend, seasonality, or heteroskedasticity is not stationary. Transformations such as logarithms can help stabilize the variance of a time series, and differencing can help stabilize its mean by removing changes in level (trend and seasonality). Trend and seasonality in the time plot indicate a need for differencing. The `tsdisplay()` function plots a time series along with its ACF and either its PACF, histogram, scatterplot, or spectrum in the lower right corner. The ACF and PACF plots are helpful in determining the value of \(p\) or \(q\) when \(p\) and \(q\) are not both positive.
Significant Spikes | \(\textrm{ARIMA}(p,d,0)\) | \(\textrm{ARIMA}(0,d,q)\) |
---|---|---|
ACF | Exponential Decay or Sinusoidal | At, but not beyond, Lag \(q\) |
PACF | At, but not beyond, Lag \(p\) | Exponential Decay or Sinusoidal |
plot(wmurders)
tsdisplay(diff(wmurders), plot.type = "partial")
The time plot displays long periods of apparent upward and downward movement in the data, which supports the need for differencing. After applying first-order differencing, the `tsdisplay()` function shows a time plot with no apparent trend. The ACF and PACF plots each show one significant spike. Although these lone significant spikes are in the first few lags, one out of twenty lags (5%) appearing significant is within probabilistic expectations at a 95% confidence level, so they are ignored. The appropriate model is an \(\textrm{ARIMA}(0,1,0)\) model.
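As a hedged cross-check on this choice, a unit root test and a portmanteau test on the differenced series can be run; the expectations in the comments are assumptions rather than verified output.

ndiffs(wmurders)                                # unit-root-based estimate of d; expected to agree with d = 1
Box.test(diff(wmurders), lag=10, type="Ljung")  # differenced series expected to be close to white noise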
Should you include a constant in the model? Explain.
The higher the value of \(d\), the more rapidly the prediction intervals increase in size. The constant \(c\) has important effects on the long-term forecasts obtained from ARIMA models. Below are the effects of combining different values of \(d\) and \(c\).
Long-term Forecast | \(c=0\) | \(c\neq0\) |
---|---|---|
\(d=0\) | Goes to Zero | Goes to the Mean of the Data |
\(d=1\) | Goes to a Non-zero Constant | Follows a Straight Line |
\(d=2\) | Follows a Straight Line | Follows a Quadratic Trend |
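A minimal sketch of the \(d=1\) row of this table, contrasting long-term forecasts with and without a constant (the object names are illustrative; the second fit is the model adopted below):

fit_nodrift <- Arima(wmurders, order=c(0,1,0), include.constant=FALSE)  # random walk: c = 0
fit_drift   <- Arima(wmurders, order=c(0,1,0), include.constant=TRUE)   # random walk with drift: c != 0
par(mfrow=c(1,2))
plot(forecast(fit_nodrift, h=10), main="c = 0: forecasts go to a constant")
plot(forecast(fit_drift,   h=10), main="c != 0: forecasts follow a straight line")
par(mfrow=c(1,1))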
plot(wmurders)
The \(\textrm{ARIMA}(0,1,0)\) model chosen has \(d=1\). The `wmurders` data have trends and a non-zero mean. An \(\textrm{ARIMA}(0,1,0)\) model with \(c=0\) represents a random walk, and an \(\textrm{ARIMA}(0,1,0)\) model with \(c\neq0\) represents a random walk with drift. Adding drift to these data seems like the most reasonable choice, so a constant should be included in the model.
Write this model in terms of the backshift operator.
The backward shift operator \(B\) is a useful notational device when working with time series lags. A seasonal difference is represented by \(\nabla_m=1-B^m\), where \(m\) is the seasonal period. Ordinary differencing is represented by \(\nabla^d=(1-B)^d\), where \(d\) is the differencing order. A seasonal difference of order \(d\) can be written \(\nabla^d_m=(1-B^m)^d\).
plot(diff(wmurders))
For the non-seasonal `wmurders` data, \(m=1\) and \(d=1\), so \(\nabla^d_m y_t=\nabla^1_1 y_t=(1-B^1)^1 y_t=(1-B) y_t=y_t-y_{t-1}\). Including the drift constant and the error term, the \(\textrm{ARIMA}(0,1,0)\) model with drift is written \((1-B)y_t = c + e_t\).
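The differencing identity can be checked numerically against the `diff()` function:

# (1 - B) y_t = y_t - y_{t-1} should equal the first difference returned by diff()
all.equal(as.numeric(diff(wmurders)),
          as.numeric(wmurders[-1] - wmurders[-length(wmurders)]))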
Fit the model using R and examine the residuals. Is the model satisfactory?
The `arima()` and `Arima()` functions are both useful for specifying an ARIMA model. The difference is that unlike `arima()`, the `Arima()` function allows for a constant \(c\) when \(d\neq0\), allows the fitted model to be applied to new data, and works well with the `forecast()` function. Setting `include.constant=TRUE` in `Arima()` sets `include.mean=TRUE` if \(d=0\) and `include.drift=TRUE` if \(d=1\). Setting `include.constant=FALSE` sets both `include.mean` and `include.drift` to `FALSE`.
(fit <- Arima(wmurders, order=c(0,1,0), include.constant=TRUE))
## Series: wmurders
## ARIMA(0,1,0) with drift
##
## Coefficients:
## drift
## 0.0030
## s.e. 0.0291
##
## sigma^2 estimated as 0.04649: log likelihood=6.73
## AIC=-9.47 AICc=-9.23 BIC=-5.49
Acf(residuals(fit))
Box.test(residuals(fit), lag=24, fitdf=4, type="Ljung")
##
## Box-Ljung test
##
## data: residuals(fit)
## X-squared = 23.329, df = 20, p-value = 0.2729
The \(\textrm{ARIMA}(0,1,0)\) model is satisfactory. The ACF plot of the residuals shows one of the twenty lags (5%) as significant, which is within probabilistic expectations at a 95% confidence level. The Box-Ljung test also shows that the residuals are not significantly different from white noise.
Forecast three times ahead. Check your forecasts by hand to make sure you know how they have been calculated.
Forecasts are calculated by first expanding the autoregressive (AR) and moving average (MA) parts of equation \(8.6.e.1\) with the appropriate values of \(p\) and \(q\), then applying the backshift operator with the appropriate value of \(d\). The equation is then rearranged so that it is written in terms of \(y_t\). This establishes the equation needed for the forecast at horizon \(k\), where every \(t\) is replaced by \(T+k\), written \(T+k|T\) for any period beyond the observed data. Errors with a time index beyond \(T\) are replaced by zero, their expected value, while errors at observed times are replaced by the corresponding residuals. \[\left( 1-\sum _{ i=1 }^{ p }{ \phi _{ i }B^{ i } } \right) \left( 1-B \right)^d y_{ t }=c+\left( 1+\sum _{ i=1 }^{ q }{ \theta _{ i }B^{ i } } \right) e_{ t } \tag{8.6.e.1}\]
(fcast <- forecast(fit, h=3))
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.592345 2.316033 2.868658 2.169762 3.014929
## 2006 2.595308 2.204543 2.986073 1.997684 3.192931
## 2007 2.598270 2.119683 3.076858 1.866334 3.330207
c <- fcast$model$coef[1] # drift
yt <- wmurders[55] # last obs
et <- 0 # expected white noise
ytPlus1 <- c + yt + et
ytPlus2 <- c + ytPlus1
ytPlus3 <- c + ytPlus2
data.frame(ytPlus1, ytPlus2, ytPlus3, row.names="forecast")
## ytPlus1 ytPlus2 ytPlus3
## forecast 2.592345 2.595308 2.59827
Given the \(\textrm{ARIMA}(0,1,0)\) model where \(p=q=0\) and \(d=1\), the AR and MA sums in equation \(8.6.e.1\) are empty, so: \[\left( 1-B \right) y_{ t }=c+e_{ t }\Rightarrow y_{ t }-y_{ t-1 }=c+e_{ t }\Rightarrow y_{ t }=c+y_{ t-1 }+e_{ t }\] Replacing each \(t\) with \(T+k\), written \(T+k|T\) for periods beyond the observed data, and replacing the future errors with their expected value of zero, gives: \[y_{ T+1|T }=c+y_{ T },\quad\quad y_{ T+2|T }=c+y_{ T+1|T },\quad\quad y_{ T+3|T }=c+y_{ T+2|T }\] The three-step-ahead forecasts match the manual calculations when the same value of \(c\) is used and the future errors are set to their expected value of zero.
Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
The time plot is a line graph that plots each observed value against the time of the observation, with a single line connecting the observations across the entire period. When plotting forecast results, it displays the forecasts as well as prediction intervals.
fcast <- forecast(fit, h=3)
plot(fcast)
The time plot of the ARIMA model shows forecasts that resemble those of a random walk with drift. This is because, as stated in 8.6.b above, an \(\textrm{ARIMA}(0,1,0)\) model with \(c\neq0\) represents a random walk with drift.
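To illustrate this equivalence (a sketch; the two estimation routines differ slightly, so the point forecasts are expected to be essentially, not exactly, identical), the ARIMA forecasts can be set next to the `rwf()` random-walk-with-drift forecasts:

rwf_fcast <- rwf(wmurders, h=3, drift=TRUE)                  # random walk with drift forecasts
cbind(arima_drift = fcast$mean, rw_drift = rwf_fcast$mean)   # point forecasts side by side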
Does `auto.arima()` give the same model you have chosen? If not, which model do you think is better?
The `auto.arima()` function chooses an ARIMA model automatically. It uses a variation of the Hyndman-Khandakar algorithm, which combines unit root tests, minimization of the AICc, and MLE to obtain an ARIMA model. The function takes some shortcuts in order to speed up the computation and will not always yield the best model. Setting `stepwise` and `approximation` to `FALSE` prevents the function from taking shortcuts.
auto.arima(wmurders, stepwise=FALSE, approximation=FALSE)
## Series: wmurders
## ARIMA(0,2,3)
##
## Coefficients:
## ma1 ma2 ma3
## -1.0154 0.4324 -0.3217
## s.e. 0.1282 0.2278 0.1737
##
## sigma^2 estimated as 0.04475: log likelihood=7.77
## AIC=-7.54 AICc=-6.7 BIC=0.35
fit
## Series: wmurders
## ARIMA(0,1,0) with drift
##
## Coefficients:
## drift
## 0.0030
## s.e. 0.0291
##
## sigma^2 estimated as 0.04649: log likelihood=6.73
## AIC=-9.47 AICc=-9.23 BIC=-5.49
The `auto.arima()` function suggests an \(\textrm{ARIMA}(0,2,3)\) model. Compared to the manually chosen \(\textrm{ARIMA}(0,1,0)\) model, the `auto.arima()` model has a lower \(\sigma^2\) and a higher log likelihood, but a higher AIC, AICc, and BIC. These information criteria cannot be compared directly, however, because the two models use different orders of differencing and are therefore fitted to different data. `auto.arima()` chooses \(d\) with unit root tests and then minimizes the AICc only among models with that order of differencing, which is why it can return a model whose AICc is higher than that of a model with a different \(d\). A rough like-for-like comparison is sketched below.
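Because the information criteria are not directly comparable here, one alternative is to compare in-sample (one-step) accuracy measures, which are on the scale of the original data; a sketch with illustrative object names:

fit_auto   <- auto.arima(wmurders, stepwise=FALSE, approximation=FALSE)  # ARIMA(0,2,3) from above
fit_manual <- Arima(wmurders, order=c(0,1,0), include.constant=TRUE)     # manually chosen model
rbind(auto.arima = accuracy(fit_auto)["Training set", ],
      manual     = accuracy(fit_manual)["Training set", ])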
Consider the total net generation of electricity (in billion kilowatt hours) by the U.S. electric industry (monthly for the period 1985-1996); data set `usmelec`. In general there are two peaks per year: in mid-summer and mid-winter.
Examine the 12-month moving average of this series to see what kind of trend is involved.
A moving average smoother is helpful in examining what kind of trend is involved in a series. Moving average models should not be confused with moving average smoothing: a moving average model is used for forecasting future values, while moving average smoothing is used for estimating the trend-cycle of past values. The `ma()` function computes a simple moving average smoother of a given time series.
plot(usmelec, col="grey")
lines(ma(usmelec, order=12), col="red")
The 12-month moving average smoother fits these data well, as can be seen from the smoothness of the moving average line, which removes the seasonal fluctuations. The smoothed line shows that the data have an apparent upward trend.
Do the data need transforming? If so, find a suitable transformation.
Heteroskedasticity refers to the circumstance in which the variability of a variable is unequal across the range of values of a second variable. Box-Cox transformations can help to stabilize the variance of a time series. Some typical Box-Cox transformations are: \[-2\Rightarrow \frac { 1 }{ y_t^{ 2 } }, \quad -1\Rightarrow \frac { 1 }{ y_t }, \quad -0.5\Rightarrow \frac { 1 }{ \sqrt { y_t } }, \quad 0\Rightarrow \log { \left( y_t \right) }, \quad 0.5\Rightarrow \sqrt { y_t }, \quad 1\Rightarrow y_t, \quad 2\Rightarrow y_t^{ 2 }\]
(lambda <- BoxCox.lambda(usmelec))
## [1] -0.4772402
usmelecBC <- BoxCox(usmelec, round(lambda, 1))
par(mfrow=c(2, 1))
plot(usmelec)
plot(usmelecBC)
The Box-Cox procedure suggests \(\lambda=-0.4772402\). Rounding to \(\lambda=-0.5\) gives a more interpretable transformation of the form \({ 1 }/{ \sqrt { y_t } }\). This Box-Cox transformation stabilizes the variance and makes the series look relatively homoskedastic.
Are the data stationary? If not, find an appropriate differencing which yields stationary data.
In general, a stationary time series has no predictable patterns in the long term. A time series with trend, seasonality, or heteroskedasticity is not stationary. Transformations can help stabilize the variance of a time series, and differencing can help stabilize its mean by removing changes in level (trend and seasonality). A time series with aperiodic cyclic behavior can still be stationary because the timing of the cycles is not predictable in the long term. Applying the `diff()` function to a nonstationary series returns a suitably lagged and iterated differenced time series.
par(mfrow=c(2, 2))
diff1 <- diff(usmelecBC)
plot(diff1, ylab=expression(nabla^1*1/sqrt(y[t])))
lines(decompose(diff1)$trend, col="blue")
diff2 <- diff(usmelecBC, differences=2)
plot(diff2, ylab=expression(nabla^2*1/sqrt(y[t])))
lines(decompose(diff2)$trend, col="blue")
diff3 <- diff(usmelecBC, lag=12)
plot(diff3, ylab=expression(nabla[12]*1/sqrt(y[t])))
lines(decompose(diff3)$seasonal, col="red")
diff4 <- diff(diff(usmelecBC, lag=12))
plot(diff4, ylab=expression(nabla[12]*nabla^1*1/sqrt(y[t])))
The top-left plot shows first-order differencing of the transformed data to remove trend. Occasionally a differenced series will not appear stationary and it may be necessary to difference a second time to obtain a stationary series. In practice, it is almost never necessary to go beyond second-order differences. The top-right plot shows second-order differencing of the transformed data to further remove trend. Since both the first-order and second-order differencing result in a series with mean near zero, only first-order differencing is necessary. The bottom-left plot shows seasonal differencing of the transformed data to remove seasonality. The bottom-right plot shows both first-order and seasonal differencing applied to the transformed data. Applying both removes the trend and seasonality, which yields stationary data.
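These conclusions can be cross-checked with unit-root-based estimates of the required differencing orders (a sketch; the expectations in the comments are assumptions rather than verified output):

nsdiffs(usmelecBC)                 # seasonal differences suggested for the transformed series
ndiffs(diff(usmelecBC, lag=12))    # non-seasonal differences suggested after seasonal differencing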
Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values?
The `tsdisplay()` function plots a time series along with its ACF and either its PACF, histogram, scatterplot, or spectrum in the lower right corner. The ACF and PACF plots are helpful in determining the value of \(p\) or \(q\) when \(p\) and \(q\) are not both positive.
Significant Spikes | \(\textrm{ARIMA}(p,d,0)\) | \(\textrm{ARIMA}(0,d,q)\) |
---|---|---|
ACF | Exponential Decay or Sinusoidal | At, but not beyond, Lag \(q\) |
PACF | At, but not beyond, Lag \(p\) | Exponential Decay or Sinusoidal |
tsdisplay(diff4, plot.type = "partial")
fit1 <- Arima(usmelec, order=c(0,1,3), seasonal=c(0,1,3), lambda=lambda)
fit1$aic
## [1] -4256.02
fit2 <- Arima(usmelec, order=c(2,1,0), seasonal=c(2,1,0), lambda=lambda)
fit2$aic
## [1] -4155.429
fit3 <- Arima(usmelec, order=c(2,1,3), seasonal=c(2,1,3), lambda=lambda)
fit3$aic
## [1] -4250.112
The significant spikes in the ACF through lag 3 suggest a non-seasonal MA(3) component, and the significant spike at lag 12 in the ACF suggests a seasonal MA component; one possible model is therefore an \(\textrm{ARIMA}(0,1,3)(0,1,3)_{12}\) model. The significant spikes in the PACF through lag 2 suggest a non-seasonal AR(2) component, and the significant spike at lag 12 in the PACF suggests a seasonal AR component; another possible model is therefore an \(\textrm{ARIMA}(2,1,0)(2,1,0)_{12}\) model. Combining both yields an \(\textrm{ARIMA}(2,1,3)(2,1,3)_{12}\) model. After fitting each model, the \(\textrm{ARIMA}(0,1,3)(0,1,3)_{12}\) model has the lowest AIC.
Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.
White noise residuals are independent and identically distributed, so their autocorrelations are not significantly different from zero. ACF plots, PACF plots, and the Box-Ljung test are helpful in assessing whether residuals follow a white noise pattern.
tsdisplay(residuals(fit1))
Box.test(residuals(fit1), lag=24, fitdf=4, type="Ljung")
##
## Box-Ljung test
##
## data: residuals(fit1)
## X-squared = 31.49, df = 20, p-value = 0.04905
tsdisplay(residuals(fit3))
Box.test(residuals(fit3), lag=24, fitdf=4, type="Ljung")
##
## Box-Ljung test
##
## data: residuals(fit3)
## X-squared = 29.553, df = 20, p-value = 0.07742
The residuals of the \(\textrm{ARIMA}(0,1,3)(0,1,3)_{12}\) model appear to display the characteristics of white noise in the ACF plot, with only one of the twenty lags (5%) being significant, which is within probabilistic expectations at a 95% confidence level. The Box-Ljung test contradicts this conclusion, however, by showing that the autocorrelations of the residuals for the \(\textrm{ARIMA}(0,1,3)(0,1,3)_{12}\) model are significantly different from zero. The runner-up model with the second-lowest AIC is therefore tested. The ACF plot of the residuals of the \(\textrm{ARIMA}(2,1,3)(2,1,3)_{12}\) model shows more than one in twenty lags (over 5%) as significant, which would suggest the residuals are not white noise. The Box-Ljung test contradicts this conclusion as well by showing that the autocorrelations of the residuals for the \(\textrm{ARIMA}(2,1,3)(2,1,3)_{12}\) model are not significantly different from zero. The \(\textrm{ARIMA}(2,1,3)(2,1,3)_{12}\) model passes the required checks and is therefore suitable for forecasting. This example shows the importance of using visual diagnostics in conjunction with portmanteau tests.
Forecast the next 15 years of generation of electricity by the U.S. electric industry. Get the latest figures from here to check on the accuracy of your forecasts.
For these monthly data, a 15-year prediction consists of \((12)(15)=180\) months. When forecasting, a series can be transformed with a calculated value of \(\lambda\), modeled, and then forecast with that \(\lambda\) specified in the `forecast()` function. The `forecast()` function performs much better, however, when the calculated \(\lambda\) is built into the model with the `Arima()` function and the fitted model is then forecast.
lambda <- BoxCox.lambda(usmelec)
fit <- Arima(usmelec, order=c(2,1,3), seasonal=c(2,1,3), lambda=lambda)
fcast <- forecast(fit, h=12*15)
plot(fcast, ylab="kwh in Billions", main="15-year Prediction of Electricity Generation")
# DOWNLOAD ACTUAL DATA
actual <- read.csv(paste0("https://raw.githubusercontent.com/jzuniga123/",
"SPS/master/DATA%20624/electricity-overview.csv"), stringsAsFactors=FALSE)
# CLEAN DOWNLOADED DATA
actual <- head(actual, -1) # remove last row with descriptive text
actual_0 <- as.integer(unlist(strsplit(as.character(head(actual["Month"], 1)), "-")))
actual_N <- as.integer(unlist(strsplit(as.character(tail(actual["Month"], 1)), "-")))
usmelec_test <- ts(actual[2], start=actual_0, end=actual_N, frequency=12)
# PLOT PREDICTIONS AGAINST ACTUAL
buffer <- 5
usmelec_N <- tail(time(usmelec), 1)
fcast_0 <- c(floor(usmelec_N), as.numeric(substr(usmelec_N, 5, 7)) * 12 + 1)
periods <- length(ts(actual[2], start=fcast_0, end=actual_N, frequency=12))
plot(forecast(fit, h=periods+buffer), include=buffer, main="Prediction Validation")
lines(window(usmelec_test, start=fcast_0), col="red")
legend("topleft", lty=1, col=c(1,4,2), c("Training Data", "Forecast Values", "Actual Data"))
The forecasts in the first plot appear to project the trend, seasonality, and heteroskedastic attributes of the time series accurately. The prediction intervals widen considerably after the tenth year of the prediction, but the point estimates seem reasonable. For reproducibility, the downloaded validation dataset has been placed on GitHub. It requires some cleaning, but when finally plotted against the predictions, the forecast shows an astonishing level of accuracy: the point estimates of the forecasts and the actual data follow each other closely.
How many years of forecasts do you think are sufficiently accurate to be usable?
The `accuracy()` function is helpful for obtaining summary measures of forecast accuracy: Mean Error (ME), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Percentage Error (MPE), Mean Absolute Percentage Error (MAPE), Mean Absolute Scaled Error (MASE), and the autocorrelation of errors at lag 1 (ACF1).
accuracy(forecast(fit3, h=periods), window(usmelec_test, start=c(2010, 11)))
## ME RMSE MAE MPE MAPE MASE
## Training set -0.3673897 7.079241 5.137940 -0.1572162 1.975946 0.5643384
## Test set -4.6585379 10.249873 8.952395 -1.4997715 2.653519 0.9833086
## ACF1 Theil's U
## Training set -0.03110326 NA
## Test set 0.43041838 0.3281338
These predictions are highly accurate by every measure. The predictability of the trend, seasonality, and heteroskedastic attributes of the time series makes for a more accurate forecast. With such long and stable patterns in the data, predictions up to a decade ahead would not be unreasonable. After a decade the prediction intervals widen considerably, so any decisions based on later predictions would be unwise.
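One way to make this judgment more concrete (a sketch that reuses the validation objects created above; all names follow that code) is to compute the mean absolute percentage error separately for each forecast year:

h_all   <- length(window(usmelec_test, start=fcast_0))          # months of actual data available
fc_mean <- forecast(fit, h=h_all)$mean                          # point forecasts over that span
actuals <- as.numeric(window(usmelec_test, start=fcast_0))
ape     <- 100 * abs(actuals - as.numeric(fc_mean)) / actuals   # absolute percentage errors
round(tapply(ape, (seq_along(ape) - 1) %/% 12 + 1, mean), 2)    # MAPE by forecast year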