Exercises 8.1, 8.2, 8.3, 8.5, 8.6 and 8.7 from the Hyndman online Forecasting book.
Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
A time series is white noise if the variables are independent and identically distributed with a mean of zero. This means that all variables have the same variance (sigma^2) and each value has a zero correlation with all other values in the series.The figures indicate that the data is white noise (residuals correlated mean=0,variance=constant).
If the series is a weakly stationary time series (with mean ) with auto-covariances then a law of large numbers holds. The reason why critical values are different is that critical values differ depending upon the amount of data.
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
ggtsdisplay(ibmclose)
The ACF plot indicates time series (IBM Closing Prices) is not stationary level because of very high autocorrelation values. By looking at the autocorrelation function (ACF) and partial autocorrelation (PACF) plots of the differenced series, we can verify the numbers of AR and/or MA terms that are needed.The ACF plot is merely a bar chart of the coefficients of correlation between a time series and lags of itself.The ACF graph in ibmclose data shows the observed correlation is well above the critical values.In stationary level series,The ACF will drop to zero quickly.The PACF plot is a plot of the partial correlation coefficients between the series and lags of itself.
ibmclose.diff <- diff(ibmclose)
ggtsdisplay(ibmclose.diff)
In stationary level series(differenced),The ACF dropped to zero quickly.However, we still get some extreme values that greater than critical value (0.10).Our threshold is really important in this case because If we think that threshold for critical values needs to be 0.15, we can assume that there are no extreme values.
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
The trend in timeseries looks nearly linear.I wil check stationary status to determine the order of differencing.
ggtsdisplay(usnetelec, main = "National Net Electricity-RAW SERIES")
Box.test(usnetelec, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: usnetelec
## X-squared = 52.549, df = 1, p-value = 4.196e-13
As we seet hat the p-value is low (p-value = 4.196e-13), the Null hypothesis is REJECTED (Autocorrelation exists).
usnetelec_lambda <- round(BoxCox.lambda(usnetelec),5)
usnetelec_BC <- BoxCox(usnetelec, usnetelec_lambda)
ggtsdisplay(usnetelec_BC, main = paste("National Net Electricity- BoxCox lambda = ",
usnetelec_lambda))
Time series data with Box-Cox transformation still gives non-stationary data (ADF plot)
Box.test(usnetelec_BC, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: usnetelec_BC
## X-squared = 52.191, df = 1, p-value = 5.035e-13
Because the p-value is still low (p-value = 5.035e-13), the Null hypothesis is REJECTED (Autocorrelation exists).
usnetelec_BC_ndiffs <- ndiffs(usnetelec_BC)
ggtsdisplay(diff(usnetelec_BC), main=paste("National Net Electricity - BoxCox lambda = ",
usnetelec_lambda," - first difference"))
ggtsdisplay(diff(diff(usnetelec_BC)), main=paste("National Net Electricity - BoxCox lambda = ",
usnetelec_lambda," - second difference"))
I’m going to test two different lag of orders such as I(1),and (2)
usnetelec_BC %>% diff() -> usnetelec_BC_d1
ggtsdisplay(usnetelec_BC_d1, main = paste("US Net Electricity - BoxCox lambda = ",
usnetelec_lambda, "first difference"))
#Ljung-box test:
Box.test(usnetelec_BC_d1, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: usnetelec_BC_d1
## X-squared = 2.6132, df = 1, p-value = 0.106
I noticed that the p-value is greater than p-value threshold (0.05), we FAIL TO REJECT the null hypothesis, which states the data are independent (i.e., no serial correlation.)So, first-differenced usnetelec
data can be thought of as a white noise series.
kpss.test(usnetelec_BC_d1, "Level", lshort = F)
## Warning in kpss.test(usnetelec_BC_d1, "Level", lshort = F): p-value greater than
## printed p-value
##
## KPSS Test for Level Stationarity
##
## data: usnetelec_BC_d1
## KPSS Level = 0.33954, Truncation lag parameter = 10, p-value = 0.1
kpss.test(usnetelec_BC_d1, "Trend", lshort = F)
## Warning in kpss.test(usnetelec_BC_d1, "Trend", lshort = F): p-value greater than
## printed p-value
##
## KPSS Test for Trend Stationarity
##
## data: usnetelec_BC_d1
## KPSS Trend = 0.074231, Truncation lag parameter = 10, p-value = 0.1
usnetelec_BC_d1 %>% ur.kpss(type="mu", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 10 lags.
##
## Value of test-statistic is: 0.3395
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
usnetelec_BC_d1 %>% ur.kpss(type="tau", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: tau with 10 lags.
##
## Value of test-statistic is: 0.0742
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.119 0.146 0.176 0.216
The first differenced series ,order(1), data with Box-Cox transformation pass on Both KPSS tests which states that second-differencing is not required.
The trending timeseries usgdp
looks nearly linear.I will check stationary status to determine the order of differencing.
ggtsdisplay(usgdp, main = "National GDP-RAW SERIES")
Box.test(usgdp, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: usgdp
## X-squared = 233.07, df = 1, p-value < 2.2e-16
As we seet hat the p-value is low (p-value < 2.2e-16), the Null hypothesis is REJECTED
usgdp_lambda <- round(BoxCox.lambda(usgdp),5)
usgdp_BC <- BoxCox(usgdp, usgdp_lambda)
ggtsdisplay(usgdp_BC, main = paste("National GDP- BoxCox lambda = ",
usgdp_lambda))
Time series data with Box-Cox transformation still gives non-stationary data (ADF plot).The graph of the Box-Cox transformed data shows strong autocorrelation.
Box.test(usgdp_BC, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: usgdp_BC
## X-squared = 233.76, df = 1, p-value < 2.2e-16
Because the p-value is still low (p-value < 2.2e-16), the Null hypothesis is REJECTED .(Autocorrelation exists)
usgdp_BC_ndiffs <- ndiffs(usgdp_BC)
ggtsdisplay(diff(usgdp_BC), main=paste("National GDP- BoxCox lambda = ",
usgdp_lambda," - first difference"))
ggtsdisplay(diff(diff(usgdp_BC)), main=paste("National GDP - BoxCox lambda = ",
usgdp_lambda," - second difference"))
I’m going to test two different orders (1) and (2)
usgdp_BC %>% diff() -> usgdp_BC_d1
ggtsdisplay(usgdp_BC_d1, main = paste("National GDP - BoxCox lambda = ",
usgdp_lambda, "first difference"))
#Ljung-box test:
Box.test(usgdp_BC_d1, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: usgdp_BC_d1
## X-squared = 23.874, df = 1, p-value = 1.029e-06
Because the p-value is still low (p-value < 1.029e-06), the Null hypothesis is REJECTED (Autocorrelation exists).
usgdp_BC %>% diff() %>% diff() -> usgdp_BC_d2
ggtsdisplay(usgdp_BC_d2, main = paste("National GDP - BoxCox lambda = ",
usgdp_lambda, "second difference"))
#Ljung-box test:
Box.test(usgdp_BC_d2, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: usgdp_BC_d2
## X-squared = 41.715, df = 1, p-value = 1.056e-10
The test states that the p-value is less than p-value threshold (0.05), we REJECT the null hypothesis (Autocorrelation exists).
kpss.test(usgdp_BC_d1, "Level", lshort = F)
## Warning in kpss.test(usgdp_BC_d1, "Level", lshort = F): p-value greater than
## printed p-value
##
## KPSS Test for Level Stationarity
##
## data: usgdp_BC_d1
## KPSS Level = 0.23949, Truncation lag parameter = 14, p-value = 0.1
kpss.test(usgdp_BC_d1, "Trend", lshort = F)
## Warning in kpss.test(usgdp_BC_d1, "Trend", lshort = F): p-value greater than
## printed p-value
##
## KPSS Test for Trend Stationarity
##
## data: usgdp_BC_d1
## KPSS Trend = 0.034928, Truncation lag parameter = 14, p-value = 0.1
usgdp_BC_d1 %>% ur.kpss(type="mu", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 14 lags.
##
## Value of test-statistic is: 0.2395
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
usgdp_BC_d1 %>% ur.kpss(type="tau", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: tau with 14 lags.
##
## Value of test-statistic is: 0.0349
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.119 0.146 0.176 0.216
kpss.test(usgdp_BC_d2, "Level", lshort = F)
## Warning in kpss.test(usgdp_BC_d2, "Level", lshort = F): p-value greater than
## printed p-value
##
## KPSS Test for Level Stationarity
##
## data: usgdp_BC_d2
## KPSS Level = 0.035084, Truncation lag parameter = 14, p-value = 0.1
kpss.test(usgdp_BC_d2, "Trend", lshort = F)
## Warning in kpss.test(usgdp_BC_d2, "Trend", lshort = F): p-value greater than
## printed p-value
##
## KPSS Test for Trend Stationarity
##
## data: usgdp_BC_d2
## KPSS Trend = 0.034844, Truncation lag parameter = 14, p-value = 0.1
usgdp_BC_d2 %>% ur.kpss(type="mu", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 14 lags.
##
## Value of test-statistic is: 0.0351
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
usgdp_BC_d2 %>% ur.kpss(type="tau", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: tau with 14 lags.
##
## Value of test-statistic is: 0.0348
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.119 0.146 0.176 0.216
The KPSS test indicates that one differencing order(1) was sufficient to make the Box-Cox transformed data stationary.
The trendin timeseries ‘mcopper’ looks nearly linear.I will check stationary status to determine the order of differencing.
ggtsdisplay(mcopper, main = "Monthly Grade A Copper Prices-RAW SERIES")
Box.test(mcopper, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: mcopper
## X-squared = 539.54, df = 1, p-value < 2.2e-16
As we seet hat the p-value is low (p-value < 2.2e-16), the Null hypothesis is REJECTED (Autocorrelation exists).
mcopper_lambda <- round(BoxCox.lambda(mcopper),5)
mcopper_BC <- BoxCox(mcopper, mcopper_lambda)
ggtsdisplay(mcopper_BC, main = paste("Monthly Grade A Copper Prices- BoxCox lambda = ",
mcopper_lambda))
Time series data with Box-Cox transformation still gives non-stationary data(ADF plot)
Box.test(mcopper_BC, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: mcopper_BC
## X-squared = 550.42, df = 1, p-value < 2.2e-16
Because the p-value is still low (p-value < 2.2e-16), the Null hypothesis is REJECTED (Autocorrelation exists).
mcopper_BC_ndiffs <- ndiffs(mcopper_BC)
ggtsdisplay(diff(mcopper_BC), main=paste("Monthly Grade A Copper Prices - BoxCox lambda = ",
mcopper_lambda," - first difference"))
ggtsdisplay(diff(diff(mcopper_BC)), main=paste("Monthly Grade A Copper Prices - BoxCox lambda = ",
mcopper_lambda," - second difference"))
I’m going to test two different differencing : orders (1) and (2)
mcopper_BC %>% diff() -> mcopper_BC_d1
ggtsdisplay(mcopper_BC_d1, main = paste("Monthly Grade A Copper Prices - BoxCox lambda = ",
mcopper_lambda, "first difference"))
#Ljung-box test:
Box.test(mcopper_BC_d1, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: mcopper_BC_d1
## X-squared = 57.518, df = 1, p-value = 3.353e-14
The p-value is still low (p-value = 3.353e-14), the Null hypothesis is REJECTED (Autocorrelation exists).
kpss.test(mcopper_BC_d1, "Level", lshort = F)
## Warning in kpss.test(mcopper_BC_d1, "Level", lshort = F): p-value greater than
## printed p-value
##
## KPSS Test for Level Stationarity
##
## data: mcopper_BC_d1
## KPSS Level = 0.062912, Truncation lag parameter = 18, p-value = 0.1
kpss.test(mcopper_BC_d1, "Trend", lshort = F)
## Warning in kpss.test(mcopper_BC_d1, "Trend", lshort = F): p-value greater than
## printed p-value
##
## KPSS Test for Trend Stationarity
##
## data: mcopper_BC_d1
## KPSS Trend = 0.057247, Truncation lag parameter = 18, p-value = 0.1
mcopper_BC_d1 %>% ur.kpss(type="mu", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 18 lags.
##
## Value of test-statistic is: 0.0629
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
mcopper_BC_d1 %>% ur.kpss(type="tau", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: tau with 18 lags.
##
## Value of test-statistic is: 0.0572
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.119 0.146 0.176 0.216
mcopper_BC %>% diff() %>% diff() -> mcopper_BC_d2
kpss.test(mcopper_BC_d2, "Level", lshort = F)
## Warning in kpss.test(mcopper_BC_d2, "Level", lshort = F): p-value greater than
## printed p-value
##
## KPSS Test for Level Stationarity
##
## data: mcopper_BC_d2
## KPSS Level = 0.024273, Truncation lag parameter = 18, p-value = 0.1
kpss.test(mcopper_BC_d2, "Trend", lshort = F)
## Warning in kpss.test(mcopper_BC_d2, "Trend", lshort = F): p-value greater than
## printed p-value
##
## KPSS Test for Trend Stationarity
##
## data: mcopper_BC_d2
## KPSS Trend = 0.019782, Truncation lag parameter = 18, p-value = 0.1
mcopper_BC_d2 %>% ur.kpss(type="mu", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 18 lags.
##
## Value of test-statistic is: 0.0243
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
mcopper_BC_d2 %>% ur.kpss(type="tau", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: tau with 18 lags.
##
## Value of test-statistic is: 0.0198
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.119 0.146 0.176 0.216
The KPSS test indicates that one differencing ,order(1), should be used to make the mcopper
data stationary
The graph of the enplanements raw data shows strong autocorrelation as well as seasonality.I will check stationary status to determine the order of differencing.
ggtsdisplay(enplanements, main = "enplanements -RAW SERIES")
Box.test(enplanements, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: enplanements
## X-squared = 249.22, df = 1, p-value < 2.2e-16
As we seet hat the p-value is low (p-value < 2.2e-16), the Null hypothesis is REJECTED (Autocorrelation exists).
enplanements_lambda <- round(BoxCox.lambda(enplanements),5)
enplanements_BC <- BoxCox(enplanements, enplanements_lambda)
ggtsdisplay(enplanements_BC, main = paste("enplanements- BoxCox lambda = ",
enplanements_lambda))
Time series data with Box-Cox transformation still gives non-stationary data(ADF plot).
Box.test(enplanements_BC, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: enplanements_BC
## X-squared = 253.7, df = 1, p-value < 2.2e-16
Because the p-value is still low (p-value < 2.2e-16), the Null hypothesis is REJECTED (Autocorrelation exists).
enplanements_BC_ndiffs <- ndiffs(enplanements_BC)
ggtsdisplay(diff(enplanements_BC), main=paste("enplanements - BoxCox lambda = ",
enplanements_lambda," - first difference"))
ggtsdisplay(diff(diff(enplanements_BC)), main=paste("enplanements - BoxCox lambda = ",
enplanements_lambda," - second difference"))
The graph of the Box-Cox transformed data still shows strong autocorrelation and seasonality I’m going to test two different differencing : orders (1) and (2)
enplanements_BC %>% diff() -> enplanements_BC_d1
ggtsdisplay(enplanements_BC_d1, main = paste("enplanements - BoxCox lambda = ",
enplanements_lambda, "first difference"))
#Ljung-box test:
Box.test(enplanements_BC_d1, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: enplanements_BC_d1
## X-squared = 16.55, df = 1, p-value = 4.739e-05
The p-value is still low (p-value = 4.739e-05), the Null hypothesis is REJECTED (Autocorrelation exists).
enplanements_BC_nsdiffs <- nsdiffs(enplanements_BC)
print(paste("Number of SEASONAL differences : enplanements_BC:", enplanements_BC_nsdiffs))
## [1] "Number of SEASONAL differences : enplanements_BC: 1"
enplanements_BC_ndiffs <- ndiffs(enplanements_BC)
print(paste("Number of differences : enplanements_BC:",
enplanements_BC_ndiffs))
## [1] "Number of differences : enplanements_BC: 1"
ggtsdisplay(diff(enplanements_BC),
main=paste("enplanements - BoxCox lambda = ",
enplanements_lambda," - first difference"))
ggtsdisplay(diff(enplanements,lag=3),
main=paste("enplanements - BoxCox lambda = ",
enplanements_lambda," - quarterly seasonal diff"))
ggtsdisplay(diff(enplanements,lag=12),
main=paste("enplanements - BoxCox lambda = ",
enplanements_lambda," - annual seasonal diff"))
ggtsdisplay(diff(diff(enplanements,lag=3),lag=1),
main=paste("enplanements - BoxCox lambda = ",
enplanements_lambda," - quarterly seasonal diff + first diff"))
ggtsdisplay(diff(diff(enplanements,lag=12),lag=1),
main=paste("enplanements - BoxCox lambda = ",
enplanements_lambda," - annual seasonal diff + first diff"))
The result of Box-Cox transformation with annual seasonal differencing, looks like to provide the best result.
enplanements_BC %>% diff(lag=12) %>% diff(lag=1) -> enplanements_BC_s1_d1
kpss.test(enplanements_BC_s1_d1, "Level", lshort = F)
## Warning in kpss.test(enplanements_BC_s1_d1, "Level", lshort = F): p-value
## greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: enplanements_BC_s1_d1
## KPSS Level = 0.074682, Truncation lag parameter = 15, p-value = 0.1
kpss.test(enplanements_BC_s1_d1, "Trend", lshort = F)
## Warning in kpss.test(enplanements_BC_s1_d1, "Trend", lshort = F): p-value
## greater than printed p-value
##
## KPSS Test for Trend Stationarity
##
## data: enplanements_BC_s1_d1
## KPSS Trend = 0.03852, Truncation lag parameter = 15, p-value = 0.1
enplanements_BC_s1_d1 %>% ur.kpss(type="mu", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 15 lags.
##
## Value of test-statistic is: 0.0747
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
enplanements_BC_s1_d1 %>% ur.kpss(type="tau", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: tau with 15 lags.
##
## Value of test-statistic is: 0.0385
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.119 0.146 0.176 0.216
enplanements_BC %>% diff(lag=12) %>% diff(lag=2) -> enplanements_BC_s1_d2
kpss.test(enplanements_BC_s1_d2, "Level", lshort = F)
## Warning in kpss.test(enplanements_BC_s1_d2, "Level", lshort = F): p-value
## greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: enplanements_BC_s1_d2
## KPSS Level = 0.10006, Truncation lag parameter = 15, p-value = 0.1
kpss.test(enplanements_BC_s1_d2, "Trend", lshort = F)
## Warning in kpss.test(enplanements_BC_s1_d2, "Trend", lshort = F): p-value
## greater than printed p-value
##
## KPSS Test for Trend Stationarity
##
## data: enplanements_BC_s1_d2
## KPSS Trend = 0.040659, Truncation lag parameter = 15, p-value = 0.1
enplanements_BC_s1_d2 %>% ur.kpss(type="mu", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 15 lags.
##
## Value of test-statistic is: 0.1001
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
enplanements_BC_s1_d2 %>% ur.kpss(type="tau", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: tau with 15 lags.
##
## Value of test-statistic is: 0.0407
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.119 0.146 0.176 0.216
The KPSS test indicates that annual seasonality and first differences,order(1), should be used to make the enplanements
data stationary
The graph of the visitors raw data shows strong autocorrelation as well as seasonality.I will check stationary status to determine the order of differencing.
ggtsdisplay(visitors, main = "visitors -RAW SERIES")
Box.test(visitors, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: visitors
## X-squared = 195.64, df = 1, p-value < 2.2e-16
As we seet hat the p-value is low (p-value < 2.2e-16), the Null hypothesis is REJECTED (Autocorrelation exists).
visitors_lambda <- round(BoxCox.lambda(visitors),5)
visitors_BC <- BoxCox(visitors, visitors_lambda)
ggtsdisplay(visitors_BC, main = paste("visitors- BoxCox lambda = ",
visitors_lambda))
Time series data with Box-Cox transformation still gives non-stationary data(ADF plot).
Box.test(visitors_BC, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: visitors_BC
## X-squared = 205.74, df = 1, p-value < 2.2e-16
Because the p-value is still low (p-value < 2.2e-16), the Null hypothesis is REJECTED (Autocorrelation exists).
visitors_BC_ndiffs <- ndiffs(visitors_BC)
ggtsdisplay(diff(visitors_BC), main=paste("visitors - BoxCox lambda = ",
visitors_lambda," - first difference"))
ggtsdisplay(diff(diff(visitors_BC)), main=paste("visitors - BoxCox lambda = ",
visitors_lambda," - second difference"))
The graph of the Box-Cox transformed data still shows strong autocorrelation and seasonality .I’m going to test two different differencing : orders (1) and (2)
visitors_BC %>% diff() -> visitors_BC_d1
ggtsdisplay(visitors_BC_d1, main = paste("visitors - BoxCox lambda = ",
visitors_lambda, "first difference"))
#Ljung-box test:
Box.test(visitors_BC_d1, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: visitors_BC_d1
## X-squared = 21.922, df = 1, p-value = 2.839e-06
The p-value is still low (p-value = 9.987e-05), the Null hypothesis is REJECTED (Autocorrelation exists).
visitors_BC_nsdiffs <- nsdiffs(visitors_BC)
print(paste("Number of SEASONAL differences for visitors_BC:", visitors_BC_nsdiffs))
## [1] "Number of SEASONAL differences for visitors_BC: 1"
visitors_BC_ndiffs <- ndiffs(visitors_BC)
print(paste("Number of differences suggested visitors_BC:",
visitors_BC_ndiffs))
## [1] "Number of differences suggested visitors_BC: 1"
ggtsdisplay(diff(visitors_BC),
main=paste("visitors - BoxCox lambda = ",
visitors_lambda," - first difference"))
ggtsdisplay(diff(visitors,lag=3),
main=paste("visitors - BoxCox lambda = ",
visitors_lambda," - quarterly seasonal diff"))
ggtsdisplay(diff(visitors,lag=12),
main=paste("visitors - BoxCox lambda = ",
visitors_lambda," - annual seasonal diff"))
ggtsdisplay(diff(diff(enplanements,lag=3),lag=1),
main=paste("visitors - BoxCox lambda = ",
visitors_lambda," - quarterly seasonal diff + first diff"))
ggtsdisplay(diff(diff(visitors,lag=12),lag=1),
main=paste("visitors - BoxCox lambda = ",
visitors_lambda," - annual seasonal diff + first diff"))
The result of Box-Cox transformation with annual seasonal differencing, looks like to provide the best result.Howevers some lags (1,12) are exremly high.
visitors_BC %>% diff(lag=12) %>% diff(lag=1) -> visitors_BC_s1_d1
kpss.test(visitors_BC_s1_d1, "Level", lshort = F)
## Warning in kpss.test(visitors_BC_s1_d1, "Level", lshort = F): p-value greater
## than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: visitors_BC_s1_d1
## KPSS Level = 0.032281, Truncation lag parameter = 14, p-value = 0.1
kpss.test(visitors_BC_s1_d1, "Trend", lshort = F)
## Warning in kpss.test(visitors_BC_s1_d1, "Trend", lshort = F): p-value greater
## than printed p-value
##
## KPSS Test for Trend Stationarity
##
## data: visitors_BC_s1_d1
## KPSS Trend = 0.032061, Truncation lag parameter = 14, p-value = 0.1
visitors_BC_s1_d1 %>% ur.kpss(type="mu", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 14 lags.
##
## Value of test-statistic is: 0.0323
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
visitors_BC_s1_d1 %>% ur.kpss(type="tau", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: tau with 14 lags.
##
## Value of test-statistic is: 0.0321
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.119 0.146 0.176 0.216
visitors_BC %>% diff(lag=12) %>% diff(lag=2) -> visitors_BC_s1_d2
kpss.test(visitors_BC_s1_d2, "Level", lshort = F)
## Warning in kpss.test(visitors_BC_s1_d2, "Level", lshort = F): p-value greater
## than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: visitors_BC_s1_d2
## KPSS Level = 0.047141, Truncation lag parameter = 14, p-value = 0.1
kpss.test(visitors_BC_s1_d2, "Trend", lshort = F)
## Warning in kpss.test(visitors_BC_s1_d2, "Trend", lshort = F): p-value greater
## than printed p-value
##
## KPSS Test for Trend Stationarity
##
## data: visitors_BC_s1_d2
## KPSS Trend = 0.035599, Truncation lag parameter = 14, p-value = 0.1
visitors_BC_s1_d2 %>% ur.kpss(type="mu", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 14 lags.
##
## Value of test-statistic is: 0.0471
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
visitors_BC_s1_d2 %>% ur.kpss(type="tau", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: tau with 14 lags.
##
## Value of test-statistic is: 0.0356
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.119 0.146 0.176 0.216
The KPSS test shows that annual seasonality with Box Cox transformation and first differences,order(1), should be used to make the visitors
data stationary
Use R to simulate and plot some data from simple ARIMA models.
Use the following R code to generate data from an AR(1) model with \(\phi_1=0.6\) and \(\sigma^2=1\). The process starts with \(y_1=0\).
AR1 <- function(phi)
{
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- phi*y[i-1] + e[i]
return(y)
}
phi_1 <- 0.60
ggtsdisplay(AR1(phi_1),main=paste("AR(1) series, phi_1=",phi_1))
Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?
set.seed(12345678)
results = numeric()
phi_seq <- seq(from=-1,to=1,by = 0.1)
for (phi_1 in phi_seq) {
AR1_data <- AR1(phi_1)
AR1_stats <- c(Phi_1 = phi_1, summary(AR1_data),StDev=sd(AR1_data),SdDiff=sd(diff(AR1_data)))
AR1_main <- paste("AR(1) series, phi_1=",phi_1)
ggtsdisplay(AR1_data,main=AR1_main)
results <- rbind(results,AR1_stats)
}
results
## Phi_1 Min. 1st Qu. Median Mean 3rd Qu.
## AR1_stats -1.0 -14.062104 -9.6874291 0.809199676 -0.019024279 9.9235214
## AR1_stats -0.9 -4.443402 -1.1535149 0.140064912 0.095010441 1.1571075
## AR1_stats -0.8 -3.199613 -0.6358272 -0.016910471 0.042523719 0.7985269
## AR1_stats -0.7 -3.859066 -0.6950367 0.119200314 -0.033535258 0.7557429
## AR1_stats -0.6 -1.886435 -0.4584182 0.028648389 0.221188607 0.8664897
## AR1_stats -0.5 -2.766594 -0.8245263 0.031009976 0.042509026 0.9132246
## AR1_stats -0.4 -2.990606 -0.8940255 -0.064001374 -0.101253074 0.5582459
## AR1_stats -0.3 -2.819765 -0.5961379 0.041449779 0.039690417 0.6627684
## AR1_stats -0.2 -2.930321 -0.7490779 0.040460473 -0.021635716 0.8507102
## AR1_stats -0.1 -2.013560 -0.6376682 0.007097826 -0.005197204 0.6251735
## AR1_stats 0.0 -2.419691 -0.4837971 0.175597239 0.185282167 0.9042786
## AR1_stats 0.1 -2.534696 -0.6939627 0.119884378 0.091004011 0.6623034
## AR1_stats 0.2 -2.270066 -0.6670993 0.182851309 0.119082644 0.7244989
## AR1_stats 0.3 -2.228579 -0.8331157 -0.119490306 -0.146005580 0.5389209
## AR1_stats 0.4 -3.097211 -0.7838577 0.058539522 0.032670534 0.7902225
## AR1_stats 0.5 -2.180207 -0.4394042 0.010093219 0.137006940 0.8185993
## AR1_stats 0.6 -3.235804 -0.9358224 -0.123176998 -0.232467066 0.5582484
## AR1_stats 0.7 -3.114780 -0.9351570 0.076418353 0.186031378 1.4078848
## AR1_stats 0.8 -4.604911 -1.5648592 -0.634706373 -0.701113205 0.4273913
## AR1_stats 0.9 -4.107645 -0.8585411 0.263087693 0.126812428 1.4168184
## AR1_stats 1.0 -5.992400 -2.4326263 -1.277668824 -0.822016997 0.4688218
## Max. StDev SdDiff
## AR1_stats 13.653510 9.5012316 19.0611000
## AR1_stats 5.128250 1.8824653 3.6133126
## AR1_stats 3.590143 1.3208697 2.4699870
## AR1_stats 3.165380 1.2467545 2.2241655
## AR1_stats 2.957920 1.0117494 1.6856881
## AR1_stats 3.286173 1.1927434 2.0849159
## AR1_stats 2.437646 1.0867314 1.8239736
## AR1_stats 2.298705 1.0318553 1.5735992
## AR1_stats 2.029409 1.0736376 1.7381326
## AR1_stats 2.704765 0.9346972 1.3756997
## AR1_stats 2.877745 0.9946989 1.5105604
## AR1_stats 2.896166 1.0347156 1.3928242
## AR1_stats 2.289869 1.0006973 1.2175162
## AR1_stats 2.065666 0.9489532 1.0876201
## AR1_stats 2.682038 1.1278613 1.2356588
## AR1_stats 2.618272 1.0019356 1.0561522
## AR1_stats 3.129357 1.2708693 1.0924445
## AR1_stats 3.664113 1.5523935 1.1768035
## AR1_stats 2.065503 1.4541925 0.9896369
## AR1_stats 3.365740 1.6728732 1.0653087
## AR1_stats 6.131975 2.5611879 0.9524565
Write your own code to generate data from an MA(1) model with \(\theta_1=0.6\) and \(\sigma^2=1\).
MA1 <- function(theta_1)
{
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] + theta_1*e[i-1]
return(y)
}
theta_1 <- 0.60
ggtsdisplay(MA1(theta_1),main=paste("MA(1) series, theta_1=",theta_1))
Produce a time plot for the series. How does the plot change as you change \(\theta_1\) ?
set.seed(12345678)
results = numeric()
theta_seq <- seq(from=-1,to=1,by = 0.1)
for (theta_1 in theta_seq) {
MA1_data <- MA1(theta_1)
MA1_stats <- c(theta_1 = theta_1, summary(MA1_data),StDev=sd(MA1_data),SdDiff=sd(diff(MA1_data)))
MA1_main <- paste("MA(1) series, theta_1=",theta_1)
ggtsdisplay(MA1_data,main=MA1_main)
#print(MA1_stats)
MA1_stats_results <- rbind(results,MA1_stats)
}
results
## numeric(0)
Generate data from an ARMA(1,1) model with \(\phi_1=0.6\), \(\theta_1=0.6\), and \(\sigma^2=1\).
ARMA_1_1 <- function(phi_1, theta_1)
{
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- phi_1*y[i-1] + theta_1*e[i-1] + e[i]
return(y)
}
phi_1 <- 0.60
theta_1 <- 0.60
ARMA_1_1_result <- ARMA_1_1(phi_1,theta_1)
ggtsdisplay(ARMA_1_1_result,main=paste("ARMA(1,1) series, phi_1=", phi_1, ", theta_1=",theta_1))
Generate data from an AR(2) model with \(\phi_1=-0.8\), \(\phi_2=0.3\), and \(\sigma^2=1\).(Note that these parameters will give a non-stationary series.)
AR2 <- function(phi_1, phi_2)
{
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 3:100)
y[i] <- phi_1*y[i-1] + phi_2*y[i-2] + e[i]
return(y)
}
phi_1 <- -0.80
phi_2 <- 0.30
AR2_result <- AR2(phi_1,phi_2)
ggtsdisplay(AR2_result,main=paste("AR(2) series, phi_1=", phi_1, ", phi_2=",phi_2))
Graph the latter two series and compare them.
n=100
par(mfrow=c(2,1))
plot(ARMA_1_1_result[1:n], type="l", main="ARMA(1,1) - 100 observations",col="blue")
plot(AR2_result[1:n], type="l", main="AR(2) - 100 observations",col="red")
Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.
wmurders %>% ggtsdisplay(main="wmurders (raw dataset)")
The raw data series doesn’t indicates any trend (e.g., seasonality,extreme values.)There is a strong pattern in the ACF function which might cause also pattern in PACF
(KPSS) test:
library(tseries)
kpss.test(wmurders, "Level", lshort = F)
## Warning in kpss.test(wmurders, "Level", lshort = F): p-value greater than
## printed p-value
##
## KPSS Test for Level Stationarity
##
## data: wmurders
## KPSS Level = 0.28623, Truncation lag parameter = 10, p-value = 0.1
kpss.test(wmurders, "Trend", lshort = F)
##
## KPSS Test for Trend Stationarity
##
## data: wmurders
## KPSS Trend = 0.1591, Truncation lag parameter = 10, p-value = 0.03909
wmurders %>% ur.kpss(type="mu", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 10 lags.
##
## Value of test-statistic is: 0.2862
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
wmurders %>% ur.kpss(type="tau", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: tau with 10 lags.
##
## Value of test-statistic is: 0.1591
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.119 0.146 0.176 0.216
Level passes the test, but Trend fails.
ndiffs:
ndiffs(wmurders)
## [1] 2
I(1) first differences
wmurders_d1 <- wmurders %>% diff()
wmurders_d1 %>% ggtsdisplay(main="wmurders (first differences)")
(KPSS) test:
kpss.test(wmurders_d1, "Level", lshort = F)
## Warning in kpss.test(wmurders_d1, "Level", lshort = F): p-value greater than
## printed p-value
##
## KPSS Test for Level Stationarity
##
## data: wmurders_d1
## KPSS Level = 0.33134, Truncation lag parameter = 10, p-value = 0.1
kpss.test(wmurders_d1, "Trend", lshort = F)
## Warning in kpss.test(wmurders_d1, "Trend", lshort = F): p-value greater than
## printed p-value
##
## KPSS Test for Trend Stationarity
##
## data: wmurders_d1
## KPSS Trend = 0.11631, Truncation lag parameter = 10, p-value = 0.1
wmurders_d1 %>% ur.kpss(type="mu", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 10 lags.
##
## Value of test-statistic is: 0.3313
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
wmurders_d1 %>% ur.kpss(type="tau", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: tau with 10 lags.
##
## Value of test-statistic is: 0.1163
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.119 0.146 0.176 0.216
I(2) Second differences
wmurders_d2 <- wmurders %>% diff() %>% diff()
wmurders_d2 %>% ggtsdisplay(main="wmurders (second differences)")
(KPSS) test:
kpss.test(wmurders_d2, "Level", lshort = F)
## Warning in kpss.test(wmurders_d2, "Level", lshort = F): p-value greater than
## printed p-value
##
## KPSS Test for Level Stationarity
##
## data: wmurders_d2
## KPSS Level = 0.12299, Truncation lag parameter = 10, p-value = 0.1
kpss.test(wmurders_d2, "Trend", lshort = F)
## Warning in kpss.test(wmurders_d2, "Trend", lshort = F): p-value greater than
## printed p-value
##
## KPSS Test for Trend Stationarity
##
## data: wmurders_d2
## KPSS Trend = 0.10938, Truncation lag parameter = 10, p-value = 0.1
wmurders_d2 %>% ur.kpss(type="mu", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 10 lags.
##
## Value of test-statistic is: 0.123
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
wmurders_d2 %>% ur.kpss(type="tau", lags="long") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: tau with 10 lags.
##
## Value of test-statistic is: 0.1094
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.119 0.146 0.176 0.216
The differecing I(2) passes the KPSS test.
The differecing term,d,is 2 so our model will be ARIMA(p,2,q).For the AR and MA terms,we need to look at the ACF and PASF plots.There is as spike in lag 1 so the model may be ARIMA(1,2,1)
Should you include a constant in the model? Explain.
WE should Not include constant in the model, because a constant would create a shift, and there is no shift visible in the raw data.
Write this model in terms of the backshift operator.
We omitted the constat term :
\((1-\phi_1B) (1-B)^2 y_{t} = (1 + \theta_1 B )\varepsilon_t\)
Fit the model using R and examine the residuals. Is the model satisfactory?
my_arima <- Arima(wmurders, order=c(1,2,1))
checkresiduals(my_arima)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
##
## Model df: 2. Total lags used: 10
The residuals plot shows that it is normal distributed however p-values greater than 0.10. The model is satisfactory
Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
wmurders_forecast <- forecast(my_arima, h = 3)
wmurders_forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.470660 2.194836 2.746484 2.048824 2.892496
## 2006 2.363106 1.986351 2.739862 1.786908 2.939304
## 2007 2.252833 1.765391 2.740276 1.507354 2.998313
Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
autoplot(wmurders_forecast)
Does auto.arima()
give the same model you have chosen? If not, which model do you think is better?
wmurders_fit_autoarima <- auto.arima(wmurders)
wmurders_fit_autoarima
## Series: wmurders
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
wmurders_forecast_autoarima <- forecast(wmurders_fit_autoarima, h = 3)
wmurders_forecast_autoarima
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.470660 2.194836 2.746484 2.048824 2.892496
## 2006 2.363106 1.986351 2.739862 1.786908 2.939304
## 2007 2.252833 1.765391 2.740276 1.507354 2.998313
autoplot(wmurders_forecast_autoarima)
The model selected is the same as chosen.