library(fpp2)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────── fpp2 2.4 ──
## ✓ ggplot2   3.3.2     ✓ fma       2.4  
## ✓ forecast  8.13      ✓ expsmooth 2.3
## 
library(gridExtra)
library(ggplot2)

Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

Answer: All the spikes are within the bands indicating that there is no auto-correlation at any lags.

b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

Answer: The higher the sample size the lower the critical values. The auto-correlations are different as they refer to different simulations of white noise. Sometimes by chance there will be more auto-correlation at different lags, not all white noises draws are the same. What matters though is that there are no significant auto-correlations.

2. A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

ggtsdisplay(ibmclose)

Answer: The price plot shows some upward and downward trend hence it is not stationary.
The ACF shows significant auto-correlations at all lags , hence data comes from a non stationary process. The PACF of almost 1 on lag 1 implies a presence of auto-regressive term AR(1). Presence of AR terms means the process is not stationary.

ggtsdisplay(diff(ibmclose))

Answer: The differenced series is stationary as the autocorrelations (for most part) fall within the critical regions.

3. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a. usnetelec

Box.test(usnetelec, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  usnetelec
## X-squared = 52.549, df = 1, p-value = 4.196e-13
lambda <- BoxCox.lambda(usnetelec)
usnetelec_withBoxCox <- BoxCox(usnetelec,lambda)
ggtsdisplay(usnetelec_withBoxCox)

print(c("lamba", lambda))
## [1] "lamba"             "0.516771443964645"
ndiffs(usnetelec_withBoxCox)
## [1] 2
usnetelec_withBoxCox %>% diff() %>% diff() -> usnetelec_withBoxCox_2diff
ggtsdisplay(usnetelec_withBoxCox_2diff)

Box.test(usnetelec_withBoxCox_2diff, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  usnetelec_withBoxCox_2diff
## X-squared = 8.3324, df = 1, p-value = 0.003895

Answer: Despite ndiffs suggesting differencing the BoxCox transformed data twice to render it stationary, the twice difference data is not stationary (marginally failing stationarty).

b. usgdp

Box.test(usgdp, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  usgdp
## X-squared = 233.07, df = 1, p-value < 2.2e-16
lambda <- BoxCox.lambda(usgdp)
usgdp_withBoxCox <- BoxCox(usgdp,lambda)
ggtsdisplay(usgdp_withBoxCox)

print(c("lamba", lambda))
## [1] "lamba"             "0.366352049520934"
ndiffs(usgdp_withBoxCox)
## [1] 1
nsdiffs(usgdp_withBoxCox)
## [1] 0
usgdp_withBoxCox %>% diff() -> usgdp_withBoxCox_1diff
ggtsdisplay(usgdp_withBoxCox_1diff)

Box.test(usgdp_withBoxCox_1diff, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  usgdp_withBoxCox_1diff
## X-squared = 23.874, df = 1, p-value = 1.029e-06
library(urca)
usgdp_withBoxCox_1diff %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.2013 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

Answer: The ndiff function tells us that we need to difference data once , which we have done. Despite this the Ljung-Box statistic p-value is still very low (H_0 : data comes from a time series which is auto-correlated) which implies that the auto-correlations remain even after differencing. However the KPSS t-statistic is lower than the critical value at 1% indicating the once differenced time series is indeed stationary. Interesting that presence of auto-correlation does not us from having a stationary process.

c. mcopper

Box.test(mcopper, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  mcopper
## X-squared = 539.54, df = 1, p-value < 2.2e-16
lambda <- BoxCox.lambda(mcopper)
mcopper_withBoxCox <- BoxCox(mcopper,lambda)
ggtsdisplay(mcopper_withBoxCox)

print(c("lamba", lambda))
## [1] "lamba"             "0.191904709003829"
ndiffs(mcopper_withBoxCox)
## [1] 1
nsdiffs(mcopper_withBoxCox)
## [1] 0
mcopper_withBoxCox %>% diff() -> mcopper_withBoxCox_1diff
ggtsdisplay(mcopper_withBoxCox_1diff)

Box.test(mcopper_withBoxCox_1diff, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  mcopper_withBoxCox_1diff
## X-squared = 57.517, df = 1, p-value = 3.353e-14
library(urca)
mcopper_withBoxCox_1diff %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 0.0573 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

Answer: The ndiff function tells us that we need to difference data once , which we have done. Despite this the Ljung-Box statistic p-value is still very low (H_0 : data comes from a time series which is auto-correlated) which implies that the auto-correlations remain even after differencing. However the KPSS t-statistic is lower than the critical value at 1% indicating the once differenced time series is indeed stationary.

d. enplanements

Box.test(enplanements, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  enplanements
## X-squared = 249.22, df = 1, p-value < 2.2e-16
lambda <- BoxCox.lambda(enplanements)
enplanements_withBoxCox <- BoxCox(enplanements,lambda)
ggtsdisplay(enplanements_withBoxCox)

print(c("lamba", lambda))
## [1] "lamba"              "-0.226946111237065"
ndiffs(enplanements_withBoxCox)
## [1] 1
nsdiffs(enplanements_withBoxCox)
## [1] 1
enplanements_withBoxCox_seasondiff <- diff(enplanements_withBoxCox,lag=3)
enplanements_withBoxCox_seasondiff %>% diff() -> enplanements_withBoxCox_seasondiff_1diff
ggtsdisplay(enplanements_withBoxCox_seasondiff_1diff)

Box.test(enplanements_withBoxCox_seasondiff_1diff, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  enplanements_withBoxCox_seasondiff_1diff
## X-squared = 31.047, df = 1, p-value = 2.519e-08
library(urca)
enplanements_withBoxCox_seasondiff_1diff %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0115 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

Answer: the ndiffs and nsdiffs function tells us we need a seasonal and non-seasonal differencing. We first seasonally difference and then non seasonally difference. The series post these two transformations is stationary (KPSS test statistic is lower than the critical value , hence we cannot reject the null that the data is stationary. H_0 for KPSS is that data are stationary)

e. visitors

Box.test(visitors, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  visitors
## X-squared = 195.64, df = 1, p-value < 2.2e-16
lambda <- BoxCox.lambda(visitors)
visitors_withBoxCox <- BoxCox(visitors,lambda)
ggtsdisplay(visitors_withBoxCox)

print(c("lamba", lambda))
## [1] "lamba"             "0.277524910835111"
ndiffs(visitors_withBoxCox)
## [1] 1
nsdiffs(visitors_withBoxCox)
## [1] 1
visitors_withBoxCox_seasondiff <- diff(visitors_withBoxCox,lag=3)
visitors_withBoxCox_seasondiff %>% diff(lag=1) -> visitors_withBoxCox_seasondiff_1diff
ggtsdisplay(visitors_withBoxCox_seasondiff_1diff)

Box.test(visitors_withBoxCox_seasondiff_1diff, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  visitors_withBoxCox_seasondiff_1diff
## X-squared = 25.08, df = 1, p-value = 5.499e-07
library(urca)
visitors_withBoxCox_seasondiff_1diff %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.0092 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

Answer: the ndiffs and nsdiffs function tells us we need a seasonal and non-seasonal differencing. We first seasonally difference (at lag=3) and then non seasonally difference. The series post these two transformations is stationary (KPSS test statistic is lower than the critical value , hence we cannot reject the null that the data is stationary. H_0 for KPSS is that data are stationary)

3.5 For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

setwd("/Users/stepniak/624")
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349335T"],
  frequency=12, start=c(1982,4))
series <- myts

3.6 Use R to simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with \(\phi_1\) = 0.6 and \(\sigma\)=1. The process starts with \(y_1\)=0.

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]

b. Produce a time plot for the series. How does the plot change as you change \(\phi_1\) ?

set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
ggtsdisplay(y)

Answer: The higher the value of \(\phi_1\) the more the process resembles white noise, the higher the value of \(\phi_1\) the more the process resembles a random walk.

c. Write your own code to generate data from an MA(1) model with \(\theta_1\) = 0.6 and \(\sigma\)=1.

set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]

d.

set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]
ggtsdisplay(y)

Answer: As the value of $_1 = 0 $ goes down the time series resembles white noise. Increasing the value of $_1 = 1 $ yields a process with auto-correlation at lag = 1. For this case the value of the process at any time is just the error in this time period plus an error in the previous time period hence high auto-correlation observed (at lag = 1).

e. Generate data from an ARMA(1,1) model with \(\phi_1\) = 0.6 and \(\sigma\)=1 and \(\theta_1\) = 0.6

set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1]+0.6*e[i-1] + e[i]
ggtsdisplay(y)

f. Generate data from an AR(2) model with \(\phi_1\) = -0.8, \(\phi_2\) = 0.3 and \(\sigma\)=1 . (Note that these parameters will give a non-stationary series.)

set.seed(123)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 3:100)
  y[i] <- -0.8*y[i-1]+0.3*y[i-2]+ e[i]
ggtsdisplay(y)

y %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.0728 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ndiffs(y)
## [1] 0

g. Graph the latter two series and compare them.

Answer: The ARMA(1,1) process looks stationary and hence does not require differencing to make it into one. The AR(2) process, on the other hand slowly but firmly becomes unstable and is unbound. It is interning to note that despite the process being non-stationary (as per exercise note) when ur.kpss() is run the we cannot reject the null that the data is stationary with H_0 for KPSS is that data are stationary. Hence we conclude that the data is stationary. This inconsistency is possibly due to the fact that the mu in the KPSS test is at lag = 4, possibly the test does not extend to the part of time series where the non-stationarity kicks in.

3.7 Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

a. By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.

ggtsdisplay(wmurders)

ndiffs(wmurders)
## [1] 2
wmurders %>% diff() %>% diff() -> wmurders_2diff
ggtsdisplay(wmurders_2diff)

Answer: The twice differenced series (as per ndiff() function) indicates some auto-correlation at lag 1 & 2, with PACF cut-off at lag 1. Therefore a possible candidate for ARIMA would be (1,2,1) or (2,2,1).

b. Should you include a constant in the model? Explain.

While the data appears to have a cycle, including a constant would impose a drift. The drift is not visible in the undifferenced data , hence it is not needed.

c. Write this model in terms of the backshift operator.

Answer: \((1-\phi_1 B) (1-B)^2 y_t=(1+ \theta_1 B) \epsilon_t\)

d. Fit the model using R and examine the residuals. Is the model satisfactory?

auto.arima(wmurders, approximation = FALSE)
## Series: wmurders 
## ARIMA(1,2,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.2434  -0.8261
## s.e.   0.1553   0.1143
## 
## sigma^2 estimated as 0.04632:  log likelihood=6.44
## AIC=-6.88   AICc=-6.39   BIC=-0.97
wmurdsers_fit <- auto.arima(wmurders, approximation = FALSE)
plot(wmurdsers_fit)

checkresiduals(wmurdsers_fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
## 
## Model df: 2.   Total lags used: 10

Answer: Residuals are stationary (as per per the plot of fit function, unit roots of the characteristic equation fall inside the unit circle) and there is no auto-correlation present as per the ACF plot and Ljung-Box test (where we fail to reject the null that the data is stationary). Hence given no serial correlation and stationarity of the errors we are comfortable that the model selected in adequate.

e. Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

Answer:

wmurders_forecast <- forecast(wmurdsers_fit, h = 3)
wmurders_forecast
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.470660 2.194836 2.746484 2.048824 2.892496
## 2006       2.363106 1.986351 2.739862 1.786908 2.939304
## 2007       2.252833 1.765391 2.740276 1.507354 2.998313

f. Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

autoplot(wmurders_forecast)

g. Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

Answer:Both auto-arima and manual selection gave the same result.