Homework Questions

Problem 1

  1. Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
  a. Explain the differences among these figures. Do they all indicate that the data are white noise?

Yes, all three plots indicate that the data are white noise, since nearly all of the autocorrelation spikes lie within the blue dashed lines. The dashed lines mark the bounds outside which an autocorrelation is significantly different from zero. For white noise we expect about 95% of the spikes to fall inside the bounds, so the few spikes that go outside them are consistent with chance. The figures differ in that the bounds become narrower and the spikes smaller as the number of observations grows.

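  b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values differ because they are calculated as +/-2/sqrt(T), where T is the number of observations in the series. We expect about 95% of the spikes for a white noise series to fall within those bounds. As T increases, the critical values shrink toward zero, so a shorter series needs a larger sample autocorrelation before we can reject the null hypothesis of white noise. The autocorrelations differ between figures because each is a sample estimate computed from a different set of random numbers. The true autocorrelations are all zero, and the estimates vary around zero with a spread that decreases as T grows.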
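The bounds for the three sample sizes can be checked with a short base R sketch. The simulated series below are fresh random numbers, not the ones behind Figure 8.31, and the seed and the choice of 20 lags are arbitrary assumptions.

```r
# Sample sizes of the three white-noise series in Figure 8.31
n_obs <- c(36, 360, 1000)

# Approximate 95% critical values for white-noise autocorrelations: 2 / sqrt(T)
bounds <- 2 / sqrt(n_obs)

# Simulate white noise of each length and count spikes outside the bounds
set.seed(123)   # arbitrary seed, only for reproducibility
n_lags <- 20    # assumed number of lags to inspect
outside <- sapply(n_obs, function(n) {
  r <- acf(rnorm(n), lag.max = n_lags, plot = FALSE)$acf[-1]  # drop lag 0
  sum(abs(r) > 2 / sqrt(n))
})

data.frame(n_obs = n_obs, bound = round(bounds, 3), spikes_outside = outside)
```

The bound column shrinks as the series gets longer, while the number of spikes outside the bounds should stay small for every sample size.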

library(aTSA)
library(forecast)
library(fpp2)

Problem 2

  1. A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
ggtsdisplay(ibmclose)

As the plots above show, the series is non-stationary. The time plot shows the level of the series changing over time rather than fluctuating around a constant mean. The ACF decreases slowly, and the first lag value is near 1, which indicates strong autocorrelation and implies the data are not stationary. The PACF shows a strong correlation between IBM's closing price and its first lag. This means each closing price is largely determined by the previous day's price, as in a random walk, so the series should be differenced.
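The same ACF pattern can be reproduced with a simulated random walk, which serves here only as a stand-in for a price series like ibmclose. This is a minimal base R sketch; the series length and seed are arbitrary assumptions.

```r
set.seed(42)                 # arbitrary seed, only for reproducibility
rw <- cumsum(rnorm(300))     # simulated random walk (stand-in for a price series)

# Sample autocorrelations at lags 1 to 10, before and after differencing
acf_rw   <- acf(rw, lag.max = 10, plot = FALSE)$acf[-1]
acf_diff <- acf(diff(rw), lag.max = 10, plot = FALSE)$acf[-1]

round(rbind(level = acf_rw, differenced = acf_diff), 2)
```

The level series should show autocorrelations that start near 1 and decay slowly, while the differenced series should show values close to zero at every lag.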

Problem 3

  1. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
  a. usnetelec

Below we plot the raw data. The series has a roughly linear upward trend. The ACF decreases slowly, and most of the values exceed the critical value, so the series is not stationary. We use the BoxCox.lambda function in R to choose the transformation parameter and the BoxCox function to apply it. We then use the ndiffs function to estimate the order of differencing, which shows we must difference the transformed series twice. We then run an Augmented Dickey-Fuller (ADF) test on the transformed, differenced series. Every p-value is at most 0.01, so we reject the null hypothesis of a unit root and conclude the data are now stationary. The final plots show the transformed, differenced data.

ggtsdisplay(usnetelec)

df_3a<-BoxCox(usnetelec, lambda=BoxCox.lambda(usnetelec))
ndiffs(df_3a)
## [1] 2
adf.test(diff(diff(df_3a)))
## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag    ADF p.value
## [1,]   0 -10.65    0.01
## [2,]   1  -7.35    0.01
## [3,]   2  -8.29    0.01
## [4,]   3  -5.40    0.01
## Type 2: with drift no trend 
##      lag    ADF p.value
## [1,]   0 -10.55    0.01
## [2,]   1  -7.29    0.01
## [3,]   2  -8.23    0.01
## [4,]   3  -5.36    0.01
## Type 3: with drift and trend 
##      lag    ADF p.value
## [1,]   0 -10.44    0.01
## [2,]   1  -7.24    0.01
## [3,]   2  -8.18    0.01
## [4,]   3  -5.35    0.01
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01
ggtsdisplay(diff(diff(df_3a)))
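For reference, the Box-Cox family for positive data can be written out by hand. The sketch below is a minimal base R illustration; the helper names box_cox and inv_box_cox are hypothetical and are not part of the forecast package, whose BoxCox function should be preferred in practice.

```r
# Hypothetical helper: Box-Cox transformation for strictly positive data
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Hypothetical helper: inverse transformation back to the original scale
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(1, 4, 9, 16)
w <- box_cox(y, lambda = 0.5)   # (sqrt(y) - 1) / 0.5
w
inv_box_cox(w, lambda = 0.5)    # recovers y
```

A lambda of 1 only shifts the data, a lambda of 0 gives the natural log, and values in between compress large observations progressively more strongly.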

  b. usgdp

Below we plot the raw data. The series has a roughly linear upward trend. The ACF decreases slowly, and most of the values exceed the critical value, so the series is not stationary. We use the BoxCox.lambda function in R to choose the transformation parameter and the BoxCox function to apply it. We then use the ndiffs function to estimate the order of differencing, which shows we must difference the transformed series once. We then run an ADF test on the transformed, differenced series. Every p-value is at most 0.01, so we reject the null hypothesis of a unit root and conclude the data are now stationary. The final plots show the transformed, differenced data.

ggtsdisplay(usgdp)

df_3b<- BoxCox(usgdp, lambda=BoxCox.lambda(usgdp))
ndiffs(df_3b)
## [1] 1
adf.test(diff(df_3b))
## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag   ADF p.value
## [1,]   0 -7.16    0.01
## [2,]   1 -4.74    0.01
## [3,]   2 -4.27    0.01
## [4,]   3 -3.74    0.01
## [5,]   4 -3.42    0.01
## Type 2: with drift no trend 
##      lag    ADF p.value
## [1,]   0 -10.97    0.01
## [2,]   1  -7.90    0.01
## [3,]   2  -7.58    0.01
## [4,]   3  -7.12    0.01
## [5,]   4  -6.97    0.01
## Type 3: with drift and trend 
##      lag    ADF p.value
## [1,]   0 -11.05    0.01
## [2,]   1  -7.97    0.01
## [3,]   2  -7.69    0.01
## [4,]   3  -7.24    0.01
## [5,]   4  -7.13    0.01
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01
ggtsdisplay(diff(df_3b))

  c. mcopper

Below we plot the raw data. The series has a rough upward trend, and its level rises slowly over time. The ACF decreases slowly, and most of the values exceed the critical value, so the series is not stationary. We use the BoxCox.lambda function in R to choose the transformation parameter and the BoxCox function to apply it. We then use the ndiffs function to estimate the order of differencing, which shows we must difference the transformed series once. We then run an ADF test on the transformed, differenced series. Every p-value is at most 0.01, so we reject the null hypothesis of a unit root and conclude the data are now stationary. The final plots show the transformed, differenced data.

ggtsdisplay(mcopper)

df_3c<- BoxCox(mcopper, lambda=BoxCox.lambda(mcopper))
ndiffs(df_3c)
## [1] 1
adf.test(diff(df_3c))
## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag    ADF p.value
## [1,]   0 -16.90    0.01
## [2,]   1 -15.55    0.01
## [3,]   2 -12.19    0.01
## [4,]   3 -10.78    0.01
## [5,]   4  -9.63    0.01
## [6,]   5  -8.58    0.01
## Type 2: with drift no trend 
##      lag    ADF p.value
## [1,]   0 -16.95    0.01
## [2,]   1 -15.62    0.01
## [3,]   2 -12.27    0.01
## [4,]   3 -10.87    0.01
## [5,]   4  -9.73    0.01
## [6,]   5  -8.68    0.01
## Type 3: with drift and trend 
##      lag    ADF p.value
## [1,]   0 -16.94    0.01
## [2,]   1 -15.62    0.01
## [3,]   2 -12.27    0.01
## [4,]   3 -10.87    0.01
## [5,]   4  -9.73    0.01
## [6,]   5  -8.68    0.01
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01
ggtsdisplay(diff(df_3c))

  d. enplanements

Below we plot the raw data. The series has a rough upward trend and also shows seasonality. The ACF decreases slowly, and most of the values exceed the critical value, so the series is not stationary. We use the BoxCox.lambda function in R to choose the transformation parameter and the BoxCox function to apply it. We then use the ndiffs function to estimate the order of differencing, which shows we must difference the transformed series once. We then run an ADF test on the transformed, differenced series. Every p-value is at most 0.01, so we reject the null hypothesis of a unit root. Note that ndiffs only estimates the number of first differences; because the series is seasonal, a seasonal difference may also be warranted, which is not checked here. The final plots show the transformed, differenced data.

ggtsdisplay(enplanements)

df_3d<- BoxCox(enplanements, lambda=BoxCox.lambda(enplanements))
ndiffs(df_3d)
## [1] 1
adf.test(diff(df_3d))
## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag    ADF p.value
## [1,]   0 -21.37    0.01
## [2,]   1 -15.35    0.01
## [3,]   2 -13.97    0.01
## [4,]   3  -8.04    0.01
## [5,]   4  -7.43    0.01
## [6,]   5 -12.13    0.01
## Type 2: with drift no trend 
##      lag    ADF p.value
## [1,]   0 -21.36    0.01
## [2,]   1 -15.35    0.01
## [3,]   2 -14.00    0.01
## [4,]   3  -8.06    0.01
## [5,]   4  -7.45    0.01
## [6,]   5 -12.19    0.01
## Type 3: with drift and trend 
##      lag    ADF p.value
## [1,]   0 -21.32    0.01
## [2,]   1 -15.32    0.01
## [3,]   2 -13.98    0.01
## [4,]   3  -8.04    0.01
## [5,]   4  -7.43    0.01
## [6,]   5 -12.17    0.01
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01
ggtsdisplay(diff(df_3d))

  e. visitors

Below we plot the raw data. The series has a rough upward trend and also shows seasonality. The ACF decreases slowly, and most of the values exceed the critical value, so the series is not stationary. We use the BoxCox.lambda function in R to choose the transformation parameter and the BoxCox function to apply it. We then use the ndiffs function to estimate the order of differencing, which shows we must difference the transformed series once. We then run an ADF test on the transformed, differenced series. Every p-value is at most 0.01, so we reject the null hypothesis of a unit root. Note that ndiffs only estimates the number of first differences; because the series is seasonal, a seasonal difference may also be warranted, which is not checked here. The final plots show the transformed, differenced data.

ggtsdisplay(visitors)

df_3e<- BoxCox(visitors, lambda=BoxCox.lambda(visitors))
ndiffs(df_3e)
## [1] 1
adf.test(diff(df_3e))
## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag    ADF p.value
## [1,]   0 -20.90    0.01
## [2,]   1 -15.29    0.01
## [3,]   2 -13.01    0.01
## [4,]   3  -8.88    0.01
## [5,]   4  -6.40    0.01
## Type 2: with drift no trend 
##      lag    ADF p.value
## [1,]   0 -20.91    0.01
## [2,]   1 -15.34    0.01
## [3,]   2 -13.09    0.01
## [4,]   3  -8.97    0.01
## [5,]   4  -6.45    0.01
## Type 3: with drift and trend 
##      lag    ADF p.value
## [1,]   0 -20.89    0.01
## [2,]   1 -15.33    0.01
## [3,]   2 -13.10    0.01
## [4,]   3  -8.99    0.01
## [5,]   4  -6.46    0.01
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01
ggtsdisplay(diff(df_3e))
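For the seasonal series, a seasonal difference can be combined with a first difference. The sketch below uses base R and the built-in AirPassengers data purely as a stand-in, since it is not one of the exercise series. The log transformation (Box-Cox with lambda = 0) and the lag of 12 are assumptions that suit this monthly example. The forecast package also provides nsdiffs() for estimating the number of seasonal differences.

```r
y <- log(AirPassengers)        # stand-in monthly series; log is Box-Cox with lambda = 0

y_seas <- diff(y, lag = 12)    # seasonal difference removes the yearly pattern
y_both <- diff(y_seas)         # an additional first difference removes remaining trend

# Autocorrelation at lag 12 (one year) at each stage; index 13 because index 1 is lag 0
lag12 <- sapply(
  list(level = y, seasonal = y_seas, seasonal_and_first = y_both),
  function(s) acf(s, lag.max = 24, plot = FALSE)$acf[13]
)
round(lag12, 2)
```

Each seasonal difference costs one full year of observations, so the final series here is 13 observations shorter than the original.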

Problem 8

  1. Consider austa, the total international visitors to Australia (in millions) for the period 1980-2015.
  a. Use auto.arima() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
ggtsdisplay(austa)

df_auto<-auto.arima(austa)
df_auto
## Series: austa 
## ARIMA(0,1,1) with drift 
## 
## Coefficients:
##          ma1   drift
##       0.3006  0.1735
## s.e.  0.1647  0.0390
## 
## sigma^2 = 0.03376:  log likelihood = 10.62
## AIC=-15.24   AICc=-14.46   BIC=-10.57
ggtsdisplay(residuals(df_auto),main="ARIMA(0,1,1) model residuals")

df_auto%>%forecast(h=10,level=95)%>%autoplot()

The auto.arima() function selects ARIMA(0,1,1) with drift as the suggested model. The residual plots indicate that the residuals are white noise. The final plot is the forecast for the next 10 periods.
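A visual check of the residuals can be backed up with a Ljung-Box test. The sketch below uses base R on a simulated ARIMA(0,1,1) series, which is a stand-in and not the austa data. The seed, the MA coefficient of 0.3 and the choice of 10 lags are arbitrary assumptions. The forecast package offers checkresiduals() for a similar check on a fitted model.

```r
set.seed(7)   # arbitrary seed, only for reproducibility

# Simulated ARIMA(0,1,1) series (stand-in data)
y <- arima.sim(model = list(order = c(0, 1, 1), ma = 0.3), n = 60)

fit <- arima(y, order = c(0, 1, 1), method = "ML")
res <- residuals(fit)

# Ljung-Box test on the residuals; fitdf = 1 accounts for the estimated MA coefficient
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)
lb
```

A large p-value means the test finds no evidence against the residuals being white noise.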

  b. Plot forecasts from an ARIMA(0,1,1) model with no drift and compare these to part a. Remove the MA term and plot again.
# Arima() from the forecast package fits the model without drift directly
df_8b<-Arima(austa,order = c(0,1,1),include.drift = FALSE)
df_8b%>%forecast(h=10,level=95)%>%autoplot()

df_noma<-arima(austa,order = c(0,1,0),method="ML")
df_noma%>%forecast(h=10,level=95)%>%autoplot()

In part a, the drift was estimated as 0.1735. Subtracting that value from the data would not remove the drift, because differencing cancels any constant shift, so we instead fit the model with include.drift = FALSE. We also fit the model without the MA term, which is a random walk. Without drift, the point forecasts of both models are flat rather than trending upward as in part a. For the random walk they equal the last observed value.
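The effect of a drift term on ARIMA(0,1,1) forecasts can be sketched in base R. Since stats::arima() has no drift argument, the sketch assumes drift can be represented as a linear time regressor passed through xreg. The data are simulated (a random walk with drift 0.2) and stand in for austa; the seed and parameter values are arbitrary.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
n <- 40
y <- ts(cumsum(0.2 + rnorm(n, sd = 0.2)))   # simulated random walk with drift (stand-in data)

# Drift enters as the coefficient on a linear time index
t_idx <- seq_len(n)
fit_drift   <- arima(y, order = c(0, 1, 1), xreg = t_idx, method = "ML")
fit_nodrift <- arima(y, order = c(0, 1, 1), method = "ML")

h <- 10
fc_drift   <- predict(fit_drift, n.ahead = h, newxreg = n + seq_len(h))$pred
fc_nodrift <- predict(fit_nodrift, n.ahead = h)$pred

drift_est <- unname(coef(fit_drift)[2])   # second coefficient is the time-index slope

# Step-to-step change in the forecasts: the drift estimate vs. zero
round(cbind(with_drift = diff(fc_drift), no_drift = diff(fc_nodrift)), 4)
```

With drift, each forecast step rises by the estimated drift; without it, the forecasts stay at one level for the whole horizon.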

  c. Plot forecasts from an ARIMA(2,1,3) model with drift. Remove the constant and see what happens.
# arima() from stats cannot include drift when d=1, so use Arima() from forecast
df_8c<-Arima(austa,order = c(2,1,3),include.drift = TRUE,method="ML")
df_8c%>%forecast(h=10,level=0.95)%>%autoplot()

df_nocon<-Arima(austa,order = c(2,1,3),include.drift = FALSE,method="ML")
df_nocon%>%forecast(h=10,level=0.95)%>%autoplot()

With the drift term, the forecasts continue the upward trend of the data. Subtracting the mean from the data would only shift the plot downward and would not remove the constant from the model, so we instead fit the model with include.drift = FALSE. Without the constant the model has no drift, so the forecasts are expected to flatten out toward a constant level instead of continuing upward at a fixed rate.

  d. Plot forecasts from an ARIMA(0,0,1) model with a constant. Remove the MA term and plot again.
df_8d<-arima(austa,order = c(0,0,1),method="ML")
df_8d%>%forecast(h=10,level=0.95)%>%autoplot()

df_noma2<-arima(austa,order = c(0,0,0),method="ML")
df_noma2%>%forecast(h=10,level=0.95)%>%autoplot()

As the plots show, the forecasts drop sharply and then remain constant at the estimated mean of the series. With the MA term, only the first forecast differs from the mean. Without it, every forecast equals the mean.

  e. Plot forecasts from an ARIMA(0,2,1) model with no constant.
df_nocon2<-arima(austa,order = c(0,2,1),method="ML")
df_nocon2%>%forecast(h=10,level=0.95)%>%autoplot()

As the plot shows, the forecasts follow an increasing linear trend. The 95% prediction interval widens more than that of the ARIMA(0,1,1) model.
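Two of the forecast shapes discussed in this problem follow from the model structure and can be checked in base R: MA(1) forecasts revert to the mean after one step, and ARIMA(0,2,1) forecasts without a constant lie on a straight line. The sketch below uses simulated stand-in data rather than austa; the seed, series lengths and parameter values are arbitrary assumptions.

```r
set.seed(11)   # arbitrary seed, only for reproducibility
h <- 10

# (1) MA(1) with a constant: forecasts equal the estimated mean from step 2 onward
x <- 5 + arima.sim(model = list(ma = 0.5), n = 100)
fit_ma1  <- arima(x, order = c(0, 0, 1), method = "ML")
fc_ma1   <- predict(fit_ma1, n.ahead = h)$pred
mean_est <- coef(fit_ma1)[["intercept"]]
round(cbind(forecast = fc_ma1, mean = mean_est), 4)

# (2) ARIMA(0,2,1) without a constant: forecasts follow a straight line
y <- ts(cumsum(cumsum(rnorm(60, sd = 0.1))))   # simulated twice-integrated noise
fit_d2  <- arima(y, order = c(0, 2, 1), method = "ML")
pred_d2 <- predict(fit_d2, n.ahead = h)

# Second differences of the forecasts are zero; standard errors grow with the horizon
round(diff(pred_d2$pred, differences = 2), 6)
round(pred_d2$se, 3)
```

The growing standard errors in the second model are what make its prediction intervals fan out as the horizon increases.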

Technical Notes

sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] expsmooth_2.3 fma_2.4       ggplot2_3.3.5 fpp2_2.4      forecast_8.16
## [6] aTSA_3.1.2   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.8        lattice_0.20-45   zoo_1.8-9         assertthat_0.2.1 
##  [5] digest_0.6.29     lmtest_0.9-39     utf8_1.2.2        R6_2.5.1         
##  [9] evaluate_0.15     highr_0.9         pillar_1.7.0      rlang_1.0.2      
## [13] curl_4.3.2        rstudioapi_0.13   fracdiff_1.5-1    TTR_0.24.3       
## [17] jquerylib_0.1.4   rmarkdown_2.12    labeling_0.4.2    stringr_1.4.0    
## [21] munsell_0.5.0     compiler_4.1.1    xfun_0.30         pkgconfig_2.0.3  
## [25] urca_1.3-0        htmltools_0.5.2   nnet_7.3-17       tidyselect_1.1.2 
## [29] tibble_3.1.6      quadprog_1.5-8    fansi_1.0.2       crayon_1.5.0     
## [33] dplyr_1.0.8       withr_2.5.0       grid_4.1.1        nlme_3.1-155     
## [37] jsonlite_1.8.0    gtable_0.3.0      lifecycle_1.0.1   DBI_1.1.2        
## [41] magrittr_2.0.2    scales_1.1.1      quantmod_0.4.18   cli_3.2.0        
## [45] stringi_1.7.6     farver_2.1.0      tseries_0.10-49   timeDate_3043.102
## [49] bslib_0.3.1       ellipsis_0.3.2    xts_0.12.1        generics_0.1.2   
## [53] vctrs_0.3.8       tools_4.1.1       glue_1.6.2        purrr_0.3.4      
## [57] parallel_4.1.1    fastmap_1.1.0     yaml_2.3.5        colorspace_2.0-3 
## [61] knitr_1.37        sass_0.4.0