library(fpp2)
library(mlbench)
library(corrplot)
library(ggplot2)
require(gridExtra)
library(car)
library(caret)
library(tidyverse)
library(DT)
library(urca)
Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
a ) Explain the differences among these figures. Do they all indicate that the data are white noise?
The difference between the figures is caused by the length of the time series, which directly affects the acceptable autocorrelation range. Since all of the autocorrelations in the provided range of lags are within the margins, this indicates that the data are white noise.
The critical values are at different distances from the mean of 0 because of distance depends on the size of the time series. Since the length of each series are significantly different, there will be different acceptable ranges for white noise, defined by the following equation:
\(\pm 2 / \sqrt{T}\)
where t is the length of the time series
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
A stationary time series will look the same at any time interval; while it can have cyclic behavior, there should not be seasonality or a trend.
The following time series plot shows: * A sharp drop in daily closing IBM stock price, indicating a downwards trend * Higher variability between time 200 and time 300 (can be addressed with a log transformation)
autoplot(ibmclose)
The ACF and PACF help determine how the series should be differenced. * The ACF plot shows strong autocorrelation at all lags * The PACF plot only shows a strong autocorrelation at lag 1, because it removes the effects of correlation between lags.
ggtsdisplay(ibmclose %>% log())
This suggests that it might only be neccessary to difference once, using the previous lag. We see that this addresses most of the autocorrelation, since:
ggtsdisplay(ibmclose %>% log() %>% diff(1))
Box.test(ibmclose %>% log() %>% diff(1),lag=10, fitdf=0, type="Lj")
##
## Box-Ljung test
##
## data: ibmclose %>% log() %>% diff(1)
## X-squared = 15.014, df = 10, p-value = 0.1316
ibmclose %>% log() %>% diff(1) %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.3932
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
usnetelecThe plot for usnetelec shows an upward trend, which must be addressed to obtain stationary data.
autoplot(usnetelec)
The ACF and PACF plots tell us that most of the autocorrelation comes from the previous time series value. We can difference at lag = 1 to correct this.
ggtsdisplay(usnetelec)
We see that this approach (as well as applying a square root transformation) does make the time series stationary (but still autocorrelated), since:
ggtsdisplay(usnetelec %>% sqrt() %>% diff(1))
Box.test(usnetelec %>% sqrt() %>% diff(1),lag=10, fitdf=0, type="Lj")
##
## Box-Ljung test
##
## data: usnetelec %>% sqrt() %>% diff(1)
## X-squared = 8.2901, df = 10, p-value = 0.6005
usnetelec %>% sqrt() %>% diff(1) %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.4656
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
usgdpThe plot for usgdp shows an upward trend, which must be addressed to obtain stationary data. There does not seem to be significant seasonality.
ggseasonplot(usgdp, polar = TRUE)
The ACF and PACF plots tell us that most of the autocorrelation comes from the previous time series value. We can difference at lag = 1 to correct this.
ggtsdisplay(usgdp)
We see that taking a difference at lag 1 and performing a BoxCox transform does not address all the autocorrelation, since:
ggtsdisplay(usgdp %>% BoxCox(BoxCox.lambda(usgdp)) %>% diff(1))
Box.test(usgdp %>% BoxCox(BoxCox.lambda(usgdp)) %>% diff(1),lag=24, fitdf=0, type="Lj")
##
## Box-Ljung test
##
## data: usgdp %>% BoxCox(BoxCox.lambda(usgdp)) %>% diff(1)
## X-squared = 65.525, df = 24, p-value = 1.019e-05
usgdp %>% BoxCox(BoxCox.lambda(usgdp)) %>% diff(1) %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.2013
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
mcopperThe plot for usgdp shows an upward trend, which must be addressed to obtain stationary data. There does not seem to be significant seasonality. There is some changing variability than can be addressed with a BoxCox transform.
autoplot(mcopper)
The ACF and PACF plots tell us that most of the autocorrelation comes from the previous time series value. We can difference at lag = 1 to attempt to correct this.
ggtsdisplay(mcopper)
We see that taking a difference at lag 1 and performing a log transform does make the time series stationary (but still autocorrelated), since:
ggtsdisplay(mcopper %>% log() %>% diff(1))
Box.test(mcopper %>% log() %>% diff(1),lag=24, fitdf=0, type="Lj")
##
## Box-Ljung test
##
## data: mcopper %>% log() %>% diff(1)
## X-squared = 92.239, df = 24, p-value = 6.121e-10
mcopper %>% log() %>% diff(1) %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 6 lags.
##
## Value of test-statistic is: 0.0425
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
enplanementsThe time series plot for enplanements shows strong seasonality, an upwards trend, and changing variability.
autoplot(enplanements)
The ACF and PACF plots show that there is autocorrection with previous values and on a seasonal basis. We can address all of the issues with this time series by taking a BoxCox transform, and applying seasonal differencing.
ggtsdisplay(enplanements)
We see that performing a BoxCox transform, seasonal differencing, and taking a difference at lag 1 does make the time series stationary (but still autocorrelated), since:
ggtsdisplay(enplanements %>% BoxCox(BoxCox.lambda(enplanements)) %>% diff(12) %>% diff(1))
Box.test(enplanements %>% BoxCox(BoxCox.lambda(enplanements)) %>% diff(12) %>% diff(1),lag=24, fitdf=0, type="Lj")
##
## Box-Ljung test
##
## data: enplanements %>% BoxCox(BoxCox.lambda(enplanements)) %>% diff(12) %>% diff(1)
## X-squared = 71.176, df = 24, p-value = 1.449e-06
enplanements %>% BoxCox(BoxCox.lambda(enplanements)) %>% diff(12) %>% diff(1) %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0424
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
visitorsThe time series plot for visitors shows strong seasonality, an upwards trend, and changing variability.
autoplot(visitors)
The ACF and PACF plots show that there is autocorrection with previous values and on a seasonal basis. We can address all of the issues with this time series by taking a BoxCox transform, and applying seasonal differencing.
ggtsdisplay(visitors)
We see that performing a BoxCox transform, seasonal differencing, and taking a difference at lags 12 and 1 does make the time series stationary (but still autocorrelated), since:
ggtsdisplay(visitors %>% BoxCox(BoxCox.lambda(visitors)) %>% diff(12) %>% diff(1))
Box.test(visitors %>% BoxCox(BoxCox.lambda(visitors)) %>% diff(12) %>% diff(1),lag=24, fitdf=0, type="Lj")
##
## Box-Ljung test
##
## data: visitors %>% BoxCox(BoxCox.lambda(visitors)) %>% diff(12) %>% diff(1)
## X-squared = 121.61, df = 24, p-value = 4.996e-15
visitors %>% BoxCox(BoxCox.lambda(visitors)) %>% diff(12) %>% diff(1) %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.0158
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
Taking a look at this plot, a BoxCox transform will be needed because of the changing variance. Seasonal differencing may also be needed, because this data is highly seasonal. The upwards trend will also require differencing to address.
temp = tempfile(fileext = ".xlsx")
dataURL <- "https://otexts.com/fpp2/extrafiles/retail.xlsx"
download.file(dataURL, destfile=temp, mode='wb')
retaildata <- readxl::read_excel(temp, skip=1)
myts <- ts(retaildata[,"A3349396W"], frequency=12, start=c(1982,4))
autoplot(myts)
The season plot shows annual seasonality.
ggseasonplot(myts, polar = TRUE)
Stationary data for this time series can be obtained applying a boxcox transform, seasonal differencing, and differencing at lag = 12. This produces a unit root test statistic within the range for stationary data.
myts %>% BoxCox(BoxCox.lambda(myts)) %>% diff(12) %>% diff(1) %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0153
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Use R to simulate and plot some data from simple ARIMA models.
y <- ts(numeric(100))
e <- rnorm(100)
As ϕ1 increases, the time plot becomes smoother. Lower values of ϕ1 result in more variability in the time series.
ar <- function(theta){
y <- ts(numeric(100))
e <- rnorm(100, sd=1)
for(i in 2:100){
y[i] <- theta*y[i-1] + e[i]}
return(y)
}
autoplot(cbind(e, ar(.01), ar(.3), ar(.6), ar(1)), facet = TRUE)
ma <- function(theta){
y <- ts(numeric(100))
e <- rnorm(100, sd=1)
for(i in 2:100){
y[i] <- theta*e[i-1] + e[i]}
return(y)
}
ma(0.6)
## Time Series:
## Start = 1
## End = 100
## Frequency = 1
## [1] 0.000000000 0.236962064 1.242637625 2.523305866 0.756278998
## [6] -1.188786417 -0.578167562 -0.088556015 0.683273253 -0.878297005
## [11] -0.115862117 0.024131211 -1.529967045 -0.211418017 -0.979190138
## [16] -1.704152349 -2.155149502 -1.558129093 2.012170883 2.938886201
## [21] -0.007911785 -1.449904832 -0.729058568 0.217969325 -1.655822229
## [26] -1.735729022 -0.094535656 -0.722319882 0.714426625 -0.703521386
## [31] -1.306252322 -0.160689808 0.777373892 0.439307659 -0.789766660
## [36] -0.643401010 0.660516826 1.577144698 1.881252564 0.846440878
## [41] -0.238520107 -0.213970957 -1.966961995 -1.124381114 -0.289979771
## [46] 0.745553909 -0.082962328 -0.257025746 -0.062334851 -0.859938678
## [51] 0.141240832 2.310235839 0.312121147 -0.425397676 2.807714894
## [56] 1.648913904 0.093834838 2.356358662 1.313778644 0.019745935
## [61] 0.340484591 -0.780234680 0.036273736 2.350327649 2.517932019
## [66] 1.827398650 1.045854961 1.965722867 1.229988846 1.921444728
## [71] 2.106309807 1.575457799 2.906997855 1.651702210 -0.205027516
## [76] -0.951904176 0.981228445 0.699639591 -0.866895172 -1.940582463
## [81] -0.441880942 0.025813051 -1.290290517 -2.051324382 -0.120942776
## [86] 1.632996583 2.538634011 0.099299228 -2.093742587 -0.999229422
## [91] 0.451253549 0.764119180 -1.276635801 -1.669179655 1.200757767
## [96] 1.258707288 0.864186130 0.414292707 1.046640461 0.702123215
As θ1 changes, the pattern of the time series remains consistant. The scale of the time series values increases with θ1.
autoplot(cbind(e, ma(.1), ma(.6), ma(1), ma(3)), facet = TRUE)
y <- ts(numeric(100))
e <- rnorm(100, sd=1)
for(i in 2:100){
y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]
}
autoplot(y) +
ggtitle('ARMA(1,1)')
y2 <- ts(numeric(100))
e <- rnorm(100, sd=1)
for(i in 3:100){
y2[i] <- (-0.8)*y2[i-1] + 0.3*y2[i-2] + e[i]
}
autoplot(y2) +
ggtitle('AR(2)')
The AR(2) series ends up oscillating around a mean of 0 with increasing variance over time. ARMA(1,1) is the better model, although it also has high autocorrelation.
The reason AR(2) is such a poor model is because it does not follow the following constraint: ϕ2 − ϕ1 < 1
ggAcf(y) + ggtitle('ARMA(1,1)')
ggAcf(y2) + ggtitle('AR(2)')
wmurders, the number of women murdered each year (per 100,000 standard population) in the United States. By studying appropriate graphs of the series in R, find an appropriate ARIMA(p, d, q) model for these data.autoplot(wmurders)
Differencing at lag = 1 will make the time series stationary:
wmurders %>% diff() %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.4697
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The PAcf plot, which has high autocorrelation at lag = 1 suggests a AR(1) model would be sufficient.
ggtsdisplay(wmurders)
A constant should not be included in this model, because we differenced the function. When d > 0, the Arima function will set the constant equal to 0.
\[(1-\phi_1B)(1-B)y_t = c + \epsilon_t\]
The residuals for an ARIMA(1, 1, 0) model are checked below. The p-value is not significant and autocorrelations for all lags are acceptable, therefore this model is satisfactory.
fit <- Arima(wmurders, c(1,1,0))
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,0)
## Q* = 13.281, df = 9, p-value = 0.1503
##
## Model df: 1. Total lags used: 10
fit
## Series: wmurders
## ARIMA(1,1,0)
##
## Coefficients:
## ar1
## -0.0841
## s.e. 0.1346
##
## sigma^2 estimated as 0.04616: log likelihood=6.92
## AIC=-9.85 AICc=-9.61 BIC=-5.87
forecast(fit, h=3)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.595506 2.320177 2.870835 2.174427 3.016585
## 2006 2.594992 2.221624 2.968359 2.023976 3.166007
## 2007 2.595035 2.143387 3.046682 1.904300 3.285770
f1 = wmurders[55] + (-0.0841 * (wmurders[55] - wmurders[54])) - 0.0000025
f2 = f1 + (-0.0841 * (f1 - wmurders[55]))
f3 = f2 + (-0.0841 * (f2 - f1))
print(paste('forecast 1:', f1))
## [1] "forecast 1: 2.5955066804"
print(paste('forecast 2:', f2))
## [1] "forecast 2: 2.59499167887836"
print(paste('forecast 3:', f3))
## [1] "forecast 3: 2.59503499050633"
for(i in 2:51){
theta = 1
test <- wmurders %>% diff()
test[i] <- test[i - 1] + theta*e[i-1] + e[i]}
test
## Time Series:
## Start = 1951
## End = 2004
## Frequency = 1
## [1] -0.066051 0.010941 -0.078785 0.034196 -0.096699 0.145162 -0.055508
## [8] 0.093885 0.081643 0.081254 0.001387 0.048453 -0.047440 0.139087
## [15] 0.123834 0.300088 0.288182 -0.074626 0.104202 0.184722 0.343738
## [22] -0.065378 0.565260 0.000579 -0.094974 -0.412076 0.130620 -0.096573
## [29] 0.182569 0.166352 -0.139356 -0.082214 -0.359198 0.040682 -0.006997
## [36] 0.204758 0.044996 0.035790 -0.066893 0.074951 0.296692 -0.300084
## [43] 0.247114 -0.398660 -0.076133 -0.358104 -0.275239 -0.109680 -0.105621
## [50] -0.229222 -2.275403 -0.506770 -0.135470 -0.072844
autoplot(forecast(fit, h=3))
Auto.arima does not give the same model as my selection. My model projects a constant rate of women’s deaths, while the auto arima model projects decreasing rates over the next three years.
fit_auto <- auto.arima(wmurders, stepwise=FALSE, approximation=FALSE)
autoplot(forecast(fit_auto,3))
Let’s compare the values of AICc (Akaike’s Information Criterion) between the two models. The manual ARIMA model had an AICc of -9.61 while auto.arima() produced an AICc of -6.39. Good models are obtained by minimizing AICc - the manual ARIMA model has the smaller AICc.
fit
## Series: wmurders
## ARIMA(1,1,0)
##
## Coefficients:
## ar1
## -0.0841
## s.e. 0.1346
##
## sigma^2 estimated as 0.04616: log likelihood=6.92
## AIC=-9.85 AICc=-9.61 BIC=-5.87
fit_auto
## Series: wmurders
## ARIMA(0,2,3)
##
## Coefficients:
## ma1 ma2 ma3
## -1.0154 0.4324 -0.3217
## s.e. 0.1282 0.2278 0.1737
##
## sigma^2 estimated as 0.04475: log likelihood=7.77
## AIC=-7.54 AICc=-6.7 BIC=0.35