Figure 8.31
shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
Time series data are considered white noise if fewer than 5% of the ACF spikes lie outside the 95% limits (the blue dashed lines). By that criterion, all three plots indicate that the data are white noise.
As the sample size \(T\) becomes larger, the critical values (at \(\pm 1.96/\sqrt{T}\)) move closer to zero, and the sample autocorrelations themselves also drop closer to zero.
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
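A minimal sketch of the plotting call, assuming the fpp2 package (which makes ibmclose available) and forecast's ggtsdisplay(), which shows the time plot, ACF and PACF in one display:
library(fpp2)         # loads the data sets and the forecast package
ggtsdisplay(ibmclose) # time plot, ACF and PACF together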
[Plot: daily closing IBM stock price, with its ACF and PACF]
The data are clearly non-stationary: the time plot shows the level wandering with no fixed mean, and the ACF is “decaying”, or decreasing, very slowly, remaining well above the significance range. Also, note that the PACF plot has a significant spike only at lag 1, meaning that all the higher-order autocorrelations are effectively explained by the lag-1 autocorrelation. All of this indicates the series should be differenced.
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
Note
If the test statistic is much bigger than the 1% critical value reported by the ur.kpss()
function, we reject the null hypothesis that the data are stationary; the data are non-stationary, and we may need to difference them to make them stationary.
usnetelec
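The output below presumably comes from calls like these (a sketch; the original code chunk is not shown, assuming the urca package for ur.kpss() and the forecast package for ndiffs()):
library(urca)
usnetelec %>% ur.kpss() %>% summary()
ndiffs(usnetelec)   # presumably the source of the "[1] 1" below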
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 1.464
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
## [1] 1
usgdp
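Presumably the same pattern, where the two bare numbers below are the ndiffs() and nsdiffs() results (an assumption; the original calls are not shown):
usgdp %>% ur.kpss() %>% summary()
ndiffs(usgdp)    # presumably "[1] 2": two first differences needed
nsdiffs(usgdp)   # presumably "[1] 0": no seasonal difference needed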
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 4.6556
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
## [1] 2
## [1] 0
mcopper
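A sketch of the presumed calls:
mcopper %>% ur.kpss() %>% summary()
ndiffs(mcopper)   # presumably "[1] 1"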
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 6 lags.
##
## Value of test-statistic is: 5.01
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
## [1] 1
Let’s try transforming the data.
mcopper_lambda <- BoxCox.lambda(mcopper)
mcopper %>% BoxCox(mcopper_lambda) %>% diff() %>% autoplot()
enplanements
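Again, a sketch of the presumed calls, reading the two bare numbers as ndiffs() and nsdiffs():
enplanements %>% ur.kpss() %>% summary()
ndiffs(enplanements)    # presumably "[1] 1": one first difference
nsdiffs(enplanements)   # presumably "[1] 1": one seasonal difference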
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 4.4423
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
## [1] 1
## [1] 1
Sometimes it is necessary to take both a seasonal difference and a first difference to obtain stationary data. In the pipeline below, diff(lag = 12)
takes the seasonal difference and the second diff()
takes the first difference.
enp_lambda <- BoxCox.lambda(enplanements)
enplanements %>% BoxCox(enp_lambda) %>% diff(lag = 12) %>% diff() %>% autoplot()
visitors
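The same presumed calls for visitors:
visitors %>% ur.kpss() %>% summary()
ndiffs(visitors)    # presumably "[1] 1"
nsdiffs(visitors)   # presumably "[1] 1"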
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 4.6025
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
## [1] 1
## [1] 1
Same situation as with enplanements: a Box-Cox transformation, then a seasonal difference and a first difference.
vis_lambda <- BoxCox.lambda(visitors)
visitors %>% BoxCox(vis_lambda) %>% diff(lag = 12) %>% diff() %>% autoplot()
For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349399C"], frequency=12, start=c(1982,4)) #clothing
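The diagnostics below presumably come from the same calls applied to the retail series:
myts %>% ur.kpss() %>% summary()
ndiffs(myts)    # presumably "[1] 1": one first difference
nsdiffs(myts)   # presumably "[1] 1": one seasonal difference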
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 6.0164
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
## [1] 1
## [1] 1
Use R to simulate and plot some data from simple ARIMA models.
How does the plot change as you change \(\phi_1\)?
# Simulate AR(1) processes y_t = phi*y_{t-1} + e_t for several values of phi,
# using the same error series e so the plots are comparable
y <- ts(numeric(100))
y2 <- ts(numeric(100))
y3 <- ts(numeric(100))
y4 <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100){
  y[i]  <- 0.6*y[i-1] + e[i]   # phi = 0.6
  y2[i] <- 0.1*y2[i-1] + e[i]  # phi = 0.1
  y3[i] <- 0.8*y3[i-1] + e[i]  # phi = 0.8
  y4[i] <- 1*y4[i-1] + e[i]    # phi = 1: a random walk
}
gridExtra::grid.arrange(
autoplot(y2) + ggtitle("phi = 0.1"),
autoplot(y) + ggtitle("phi = 0.6"),
autoplot(y3) + ggtitle("phi = 0.8"),
autoplot(y4) + ggtitle("phi = 1"), nrow = 2
)
As \(\phi_1\) gets smaller, the plot looks more like stationary white noise. As \(\phi_1\) increases towards 1, the series wanders further from its mean and its ACF decays more slowly, approaching non-stationary (random walk) behaviour. This confirms that the AR(1) process is stationary if and only if \(|\phi_1| < 1\), that is, \(-1 < \phi_1 < 1\).
How does the plot change as you change \(\theta_1\)?
# Simulate MA(1) processes y_t = theta*e_{t-1} + e_t for several values of theta
y_ma1 <- ts(numeric(100))
y2_ma1 <- ts(numeric(100))
y3_ma1 <- ts(numeric(100))
y4_ma1 <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100){
  y_ma1[i]  <- 0.6*e[i-1] + e[i]  # theta = 0.6
  y2_ma1[i] <- 0.1*e[i-1] + e[i]  # theta = 0.1
  y3_ma1[i] <- 0.8*e[i-1] + e[i]  # theta = 0.8
  y4_ma1[i] <- 1*e[i-1] + e[i]    # theta = 1
}
gridExtra::grid.arrange(
  autoplot(y2_ma1) + ggtitle("theta = 0.1"),
  autoplot(y_ma1) + ggtitle("theta = 0.6"),
  autoplot(y3_ma1) + ggtitle("theta = 0.8"),
  autoplot(y4_ma1) + ggtitle("theta = 1"), nrow = 2
)
The plots remain stationary for every value of \(\theta_1\): an MA(1) process is a finite weighted sum of white-noise terms, so it is stationary regardless of \(\theta_1\).
# ARMA(1,1): y_t = 0.6*y_{t-1} + 0.6*e_{t-1} + e_t
y_ARMA <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y_ARMA[i] <- 0.6*y_ARMA[i-1] + 0.6*e[i-1] + e[i]
# AR(2): y_t = -0.8*y_{t-1} + 0.3*y_{t-2} + e_t
y_AR2 <- ts(numeric(100))
e <- rnorm(100)
for(i in 3:100)
  y_AR2[i] <- (-0.8*y_AR2[i-1]) + (0.3*y_AR2[i-2]) + e[i]
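No plotting code survives at this point; a sketch of how the two simulated series might be displayed, following the pattern used above:
gridExtra::grid.arrange(
  autoplot(y_ARMA) + ggtitle("ARMA(1,1)"),
  autoplot(y_AR2) + ggtitle("AR(2)"), nrow = 2
)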
ARMA(1,1) - The plot remains stationary.
AR(2) - As time increases, so does the variance; the series oscillates with ever-growing amplitude, making it non-stationary. This is expected, since the coefficients violate the AR(2) stationarity condition \(\phi_2 - \phi_1 < 1\) (here \(\phi_2 - \phi_1 = 0.3 - (-0.8) = 1.1\)).
Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
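The "[1] 2" below is presumably the output of ndiffs(), indicating that two differences are needed:
ndiffs(wmurders)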
## [1] 2
wm_lambda <- BoxCox.lambda(wmurders)
wmurders %>% BoxCox(wm_lambda) %>% diff() %>% diff() %>% ggtsdisplay()
Based on the ACF and PACF of the twice-differenced series, an appropriate model for these data is ARIMA(1,2,1).
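A sketch of the fitting call that presumably produced the summary below, using Arima() from the forecast package:
fit <- Arima(wmurders, order = c(1, 2, 1))
summary(fit)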
## Series: wmurders
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
## ACF1
## Training set 0.02176343
According to the text, including a constant when \(d = 2\) would induce a quadratic trend in the long-term forecasts, which is dangerous; a constant is only considered when \(d \le 1\). Since this model has \(d = 2\), no, I would not include a constant.
\[(1 - \phi_1 B)(1 - B)^{2} y_{t} = (1 + \theta_1 B)\varepsilon_{t}\]
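The Ljung-Box output below presumably comes from checkresiduals() applied to the fitted model:
checkresiduals(fit)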
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
##
## Model df: 2. Total lags used: 10
Yes. The ACF plot of the residuals from the model shows that all autocorrelations are within the threshold limits, indicating that the residuals behave like white noise. The portmanteau test returns a large p-value (0.1335), which is also consistent with the residuals being white noise.
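The three forecasts below presumably come from a call like:
forecast(fit, h = 3)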
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.470660 2.194836 2.746484 2.048824 2.892496
## 2006 2.363106 1.986351 2.739862 1.786908 2.939304
## 2007 2.252833 1.765391 2.740276 1.507354 2.998313
Check your forecasts by hand to make sure that you know how they have been calculated.
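Expanding the backshift form gives a recursion that can be iterated by hand. Since \((1 - \phi_1 B)(1 - B)^{2} = 1 - (2 + \phi_1)B + (1 + 2\phi_1)B^{2} - \phi_1 B^{3}\), the model can be written
\[y_{t} = (2 + \phi_1) y_{t-1} - (1 + 2\phi_1) y_{t-2} + \phi_1 y_{t-3} + \varepsilon_{t} + \theta_1 \varepsilon_{t-1}\]
Setting future errors to zero and substituting the estimates \(\hat{\phi}_1 = -0.2434\) and \(\hat{\theta}_1 = -0.8261\) gives the one-step forecast
\[\hat{y}_{T+1|T} = (2 + \hat{\phi}_1) y_{T} - (1 + 2\hat{\phi}_1) y_{T-1} + \hat{\phi}_1 y_{T-2} + \hat{\theta}_1 e_{T},\]
with later horizons obtained by replacing unknown observations with their forecasts and unknown errors with zero.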
Does auto.arima() give the same model you have chosen? If not, which model do you think is better?
## Series: wmurders
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
## ACF1
## Training set 0.02176343
Yes, the same ARIMA(1,2,1) model was chosen by auto.arima().