Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
a. Explain the differences among these figures. Do they all indicate that the data are white noise?
The main difference between the figures is the width of the critical bounds shown by the blue dashed lines.
The x-axis shows the lag, and the height of each spike shows the autocorrelation of the series at that lag.
All three figures suggest that the data are white noise: the autocorrelations are very close to zero and lie within the blue bounds.
The longer the series, the narrower the range within which the spikes must lie for the series to look like white noise (about 95% of the spikes in the ACF should lie within \(\pm 2/\sqrt{T}\)). All three figures satisfy this, which confirms that the data are indeed white noise.
b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
The critical values are at different distances from zero because the three data sets have different numbers of observations: 36, 360 and 1,000 respectively.
The critical values are \(\pm 2/\sqrt{T}\), so they shrink towards zero as the number of observations grows. The autocorrelations themselves also differ between the figures because each is computed from a different random sample; with more observations, the sample autocorrelations of white noise tend to lie closer to their true value of zero.
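As a quick illustration (a minimal sketch, not part of the original output), the \(\pm 2/\sqrt{T}\) bounds for the three sample sizes can be computed directly:
# approximate 95% bounds for the ACF of white noise: +-2/sqrt(T)
n <- c(36, 360, 1000)
data.frame(T = n, bound = round(2 / sqrt(n), 3))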
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
library(fpp2)
ggtsdisplay(ibmclose)
By definition, “A stationary time series is one whose properties do not depend on the time at which the series is observed. Thus, time series with trends, or with seasonality, are not stationary…” (page 223).
Clearly this is a non-stationary time series: the ACF decreases very slowly and does not drop quickly to zero, and the value \(r_1\) is large, positive and almost equal to 1.
The PACF plot has a significant spike only at lag 1, meaning that all the higher-order autocorrelations are effectively explained by the lag-1 autocorrelation.
The PACF value at lag 1 is almost 1, while all values at lags \(i > 1\) fall within the critical bounds. These are the signs of a random walk, which is a non-stationary process, so the data should be differenced before an ARMA model can be applied.
In conclusion, this non-stationary dataset needs to be differenced.
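As a quick confirmation (a sketch; this output is not reproduced here), the differenced series can be inspected in the same way and should look much closer to white noise:
# after first differencing, the ACF should drop to near zero quickly
ggtsdisplay(diff(ibmclose))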
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
usnetelec, usgdp, mcopper, enplanements and visitors
a. usnetelec
#help("usnetelec")
length(usnetelec)
## [1] 55
head(usnetelec)
## Time Series:
## Start = 1949
## End = 1954
## Frequency = 1
## [1] 296.1 334.1 375.3 403.8 447.0 476.3
summary(usnetelec)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 296.1 889.0 2040.9 1972.1 3002.7 3858.5
autoplot(usnetelec)
Box.test(diff(usnetelec), type = "Ljung-Box")
##
## Box-Ljung test
##
## data: diff(usnetelec)
## X-squared = 0.8508, df = 1, p-value = 0.3563
library(tseries)
kpss.test(diff(usnetelec))
## Warning in kpss.test(diff(usnetelec)): p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: diff(usnetelec)
## KPSS Level = 0.15848, Truncation lag parameter = 3, p-value = 0.1
#check if data is now stationary
ifelse(kpss.test(diff(usnetelec))$p.value > 0.05, "Stationary", "Not stationary")
## Warning in kpss.test(diff(usnetelec)): p-value greater than printed p-value
## [1] "Stationary"
The original series shows a roughly linear upward trend.
The Ljung-Box test shows that the first-differenced usnetelec data can be regarded as a white noise series.
After first differencing, the KPSS test indicates that the data are stationary.
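For completeness, a short sketch (output not reproduced here) that confirms the order of differencing and checks whether a Box-Cox transformation is worthwhile:
# ndiffs() should confirm that a single difference is sufficient
ndiffs(usnetelec)
# estimated Box-Cox parameter; the trend is roughly linear, so a
# transformation is arguably unnecessary for this series
BoxCox.lambda(usnetelec)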
b. usgdp - Quarterly US GDP, 1947:1 - 2006:1
#help("usnetelec")
length(usgdp)
## [1] 237
head(usgdp)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1947 1570.5 1568.7 1568.0 1590.9
## 1948 1616.1 1644.6
summary(usgdp)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1568 2632 4552 5168 7130 11404
autoplot(usgdp)
Box.test(diff(usgdp), type = "Ljung-Box")
##
## Box-Ljung test
##
## data: diff(usgdp)
## X-squared = 39.187, df = 1, p-value = 3.85e-10
library(tseries)
autoplot(diff(usgdp))
ndiffs(usgdp)
## [1] 2
autoplot(diff(diff(usgdp)))
Box.test(diff(diff(usgdp)), type = "Ljung-Box")
##
## Box-Ljung test
##
## data: diff(diff(usgdp))
## X-squared = 53.294, df = 1, p-value = 2.872e-13
ggAcf(diff(diff(usgdp)))
kpss.test(diff(diff(usgdp)))
## Warning in kpss.test(diff(diff(usgdp))): p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: diff(diff(usgdp))
## KPSS Level = 0.020988, Truncation lag parameter = 4, p-value = 0.1
#check if data is now stationary
ifelse(kpss.test(diff(diff(usgdp)))$p.value > 0.05, "Stationary", "Not stationary")
## Warning in kpss.test(diff(diff(usgdp))): p-value greater than printed p-value
## [1] "Stationary"
The KPSS test shows that differencing twice (no Box-Cox transformation was applied here) makes the data stationary.
In the usgdp case, even though twice differencing does not turn the data into a white noise series (the Ljung-Box test still rejects), it does make the data stationary.
The ACF of the twice-differenced data drops off quickly, with only a few spikes at larger lags slightly beyond the critical values, so the data can be considered stationary.
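Since the exercise also asks for a Box-Cox transformation, here is an alternative sketch (output not reproduced here) that applies the transformation before differencing; with the variance stabilised, fewer differences may be needed:
lambda_usgdp <- BoxCox.lambda(usgdp)
# number of differences required after the transformation
ndiffs(BoxCox(usgdp, lambda_usgdp))
# KPSS test on the transformed and first-differenced series
kpss.test(diff(BoxCox(usgdp, lambda_usgdp)))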
c. mcopper
autoplot(mcopper)
lambda_mcopper <- BoxCox.lambda(mcopper)
autoplot(diff(BoxCox(mcopper, lambda_mcopper)))
Box.test(diff(BoxCox(mcopper, lambda_mcopper)),
type = "Ljung-Box")
##
## Box-Ljung test
##
## data: diff(BoxCox(mcopper, lambda_mcopper))
## X-squared = 57.517, df = 1, p-value = 3.353e-14
ggAcf(diff(BoxCox(mcopper, lambda_mcopper)))
ggPacf(diff(BoxCox(mcopper, lambda_mcopper)))
kpss.test(diff(BoxCox(mcopper, lambda_mcopper)))
## Warning in kpss.test(diff(BoxCox(mcopper, lambda_mcopper))): p-value greater
## than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: diff(BoxCox(mcopper, lambda_mcopper))
## KPSS Level = 0.057275, Truncation lag parameter = 6, p-value = 0.1
#check if data is now stationary
ifelse(kpss.test(diff(BoxCox(mcopper, lambda_mcopper), ndiffs(BoxCox(mcopper, lambda_mcopper))))$p.value > 0.05, "Stationary", "Not stationary")
## Warning in kpss.test(diff(BoxCox(mcopper, lambda_mcopper),
## ndiffs(BoxCox(mcopper, : p-value greater than printed p-value
## [1] "Stationary"
The KPSS test result shows that a single differencing of the Box-Cox transformed series is enough to make the data stationary.
The ACF and PACF of the transformed and differenced data drop off quickly, with only a few spikes at larger lags slightly beyond the critical values, so the data can be considered stationary.
Even though differencing the Box-Cox transformed data does not produce a white noise series (the Ljung-Box test rejects), it does make the data stationary.
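As a quick check (a sketch; output not reproduced here), ndiffs() can be used to confirm that a single difference of the transformed series is sufficient:
# should confirm that one difference is enough for the transformed series
ndiffs(BoxCox(mcopper, lambda_mcopper))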
d. enplanements - Monthly US domestic enplanements
head(enplanements)
## Jan Feb Mar Apr May Jun
## 1979 21.12 22.92 25.90 24.38 23.41 26.82
summary(enplanements)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.14 27.18 34.88 35.67 42.78 56.14
autoplot(enplanements)
lambda_enplanements <- BoxCox.lambda(enplanements)
ndiffs(enplanements) # = 1
## [1] 1
nsdiffs(enplanements) # = 1
## [1] 1
# Data needs 1 differencing and 1 seasonal differencing.
#Since the data are seasonal, they should also be differenced at a lag equal to the seasonal period (monthly data, so lag = 12)
autoplot(diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12)))
Box.test(diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12)), type = "Ljung-Box")
##
## Box-Ljung test
##
## data: diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12))
## X-squared = 29.562, df = 1, p-value = 5.417e-08
ggAcf(diff(diff(BoxCox(enplanements, lambda_enplanements),lag = 12)))
ggPacf(diff(diff(BoxCox(enplanements, lambda_enplanements),lag = 12)))
kpss.test(diff(diff(BoxCox(enplanements, lambda_enplanements),lag = 12)))
## Warning in kpss.test(diff(diff(BoxCox(enplanements, lambda_enplanements), : p-
## value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12))
## KPSS Level = 0.042424, Truncation lag parameter = 5, p-value = 0.1
#check if data is now stationary
ifelse(kpss.test(diff(BoxCox(enplanements, lambda_enplanements), ndiffs(BoxCox(enplanements, lambda_enplanements))))$p.value > 0.05, "Stationary", "Not stationary")
## Warning in kpss.test(diff(BoxCox(enplanements, lambda_enplanements),
## ndiffs(BoxCox(enplanements, : p-value greater than printed p-value
## [1] "Stationary"
The KPSS test shows that the Box-Cox transformation followed by one seasonal and one first difference makes the data stationary.
e. visitors - Monthly Australian overseas visitors
summary(visitors)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 75.4 189.2 303.1 288.2 378.7 593.1
autoplot(visitors)
lambda_visitors <- BoxCox.lambda(visitors)
ndiffs(visitors)
## [1] 1
autoplot(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)))
Box.test(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)),type = "Ljung-Box")
##
## Box-Ljung test
##
## data: diff(diff(BoxCox(visitors, lambda_visitors), lag = 12))
## X-squared = 21.804, df = 1, p-value = 3.02e-06
ggAcf(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)))
ggPacf(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)))
kpss.test(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)))
## Warning in kpss.test(diff(diff(BoxCox(visitors, lambda_visitors), lag = 12))):
## p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: diff(diff(BoxCox(visitors, lambda_visitors), lag = 12))
## KPSS Level = 0.015833, Truncation lag parameter = 4, p-value = 0.1
#check if data is now stationary
ifelse(kpss.test(diff(BoxCox(visitors, lambda_visitors), ndiffs(BoxCox(visitors, lambda_visitors))))$p.value > 0.05, "Stationary", "Not stationary")
## Warning in kpss.test(diff(BoxCox(visitors, lambda_visitors),
## ndiffs(BoxCox(visitors, : p-value greater than printed p-value
## [1] "Stationary"
The KPSS test shows that the Box-Cox transformation followed by one seasonal and one first difference makes the data stationary.
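For completeness, nsdiffs() can be used to confirm the seasonal differencing applied above (a sketch; output not reproduced here):
# should confirm that one seasonal difference is needed
nsdiffs(visitors)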
For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
library(readxl)
retail_data <- read_excel("retail.xlsx", skip = 1)
myts <- ts(retail_data[, "A3349398A"], frequency = 12, start = c(1982, 4))
autoplot(myts) + ggtitle("Retail Sales Data Time Series")
# Data show an increasing trend and strong seasonality. Will use one first difference and one seasonal difference.
# how many differencing
print(paste0('Number of differencing: ', ndiffs(myts)))
## [1] "Number of differencing: 1"
# how many seasonal differencing
print(paste0('Number of seasonal differencing: ', nsdiffs(myts)))
## [1] "Number of seasonal differencing: 1"
#test with Box-Cox transformation
kpss.test(diff(diff(BoxCox(myts, BoxCox.lambda(myts)),lag = 12)))
## Warning in kpss.test(diff(diff(BoxCox(myts, BoxCox.lambda(myts)), lag = 12))):
## p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: diff(diff(BoxCox(myts, BoxCox.lambda(myts)), lag = 12))
## KPSS Level = 0.016483, Truncation lag parameter = 5, p-value = 0.1
#check if data is now stationary
ifelse(kpss.test(diff(BoxCox(myts, BoxCox.lambda(myts)), ndiffs(BoxCox(myts, BoxCox.lambda(myts)))))$p.value > 0.05, "Data is Stationary", "Data is Not stationary")
## Warning in kpss.test(diff(BoxCox(myts, BoxCox.lambda(myts)),
## ndiffs(BoxCox(myts, : p-value greater than printed p-value
## [1] "Data is Stationary"
In order to make the data stationary, we applied a Box-Cox transformation, one first difference and one seasonal difference.
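A quick visual check of the result (a sketch; output not reproduced here):
# inspect the transformed and differenced retail series
ggtsdisplay(diff(diff(BoxCox(myts, BoxCox.lambda(myts)), lag = 12)))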
Use R to simulate and plot some data from simple ARIMA models.
a. Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\).
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6 * y[i-1] + e[i]
b. Produce a time plot for the series. How does the plot change as you change ϕ1?
AR1model <- function(phi1){
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100){
y[i] <- phi1*y[i-1] + e[i]
}
return(y)
}
autoplot(AR1model(0.3), series = "0.3") +
geom_line(size = 1, colour = "red") +
autolayer(y, series = "0.6", size = 1) +
autolayer(AR1model(0.9), size = 1, series = "0.9") +
ylab("AR(1) models") +
guides(colour = guide_legend(title = "Phi1"))
When ϕ1 increases, the variation of y increases.
c. Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.
MA1Model <- function(theta1){
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100){
y[i] <- theta1*e[i-1] + e[i]
}
return(y)
}
The function above generates data from an MA(1) model with parameter theta1.
d. Produce a time plot for the series. How does the plot change as you change θ1 ?
autoplot(MA1Model(0.3), series = "0.3") +
geom_line(size = 1, colour = "red") +
autolayer(y, series = "0.6", size = 1) +
autolayer(AR1model(0.9), size = 1, series = "0.9") +
ylab("MA(1) models") +
guides(colour = guide_legend(title = "Theta1"))
When θ1 increases, the variation of y increases.
e. Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1.
ARMA.1.1 <- ts(numeric(50))
e <- rnorm(50)
for(i in 2:50){
ARMA.1.1[i] <- 0.6*ARMA.1.1[i-1] + 0.6*e[i-1] + e[i]
}
f. Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)
ARMA2 <- ts(numeric(50))
e <- rnorm(50)
for(i in 3:50){
ARMA2[i] <- -0.8*ARMA2[i-1] + 0.3*ARMA2[i-2] + e[i]
}
g. Graph the latter two series and compare them.
autoplot(ARMA.1.1, series = "ARMA(1, 1)") +
autolayer(ARMA2, series = "AR(2)") +
ylab("y") +
guides(colour = guide_legend(title = "Models"))
autoplot(ARMA.1.1)
The AR(2) series oscillates with growing amplitude and is non-stationary, whereas the data from the ARMA(1, 1) model are stationary.
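For reference, the standard stationarity conditions for an AR(2) process are
\[\phi_1 + \phi_2 < 1, \qquad \phi_2 - \phi_1 < 1, \qquad |\phi_2| < 1.\]
With ϕ1 = −0.8 and ϕ2 = 0.3 we get ϕ2 − ϕ1 = 1.1 > 1, which violates the second condition and explains why the simulated AR(2) series is non-stationary.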
Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
a. By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q ) model for these data.
autoplot(wmurders)
The data do not appear to need a seasonal differencing or a Box-Cox transformation.
autoplot(diff(wmurders))
# number of differencing needed (in this case = 2)
ndiffs(wmurders)
## [1] 2
Therefore the series needs to be differenced twice to become stationary.
kpss.test(diff(wmurders, differences = 2))
## Warning in kpss.test(diff(wmurders, differences = 2)): p-value greater than
## printed p-value
##
## KPSS Test for Level Stationarity
##
## data: diff(wmurders, differences = 2)
## KPSS Level = 0.045793, Truncation lag parameter = 3, p-value = 0.1
With differences = 2, the KPSS test no longer rejects stationarity, so the twice-differenced data can be considered stationary.
diff(wmurders, differences = 2) %>% ggtsdisplay()
Based on the ACF and PACF of the twice-differenced series, ARIMA(0, 2, 2) appears to be an appropriate model for the data.
b. Should you include a constant in the model? Explain.
A constant should be excluded from the model: since the model involves twice differencing (d = 2), a non-zero constant would induce a quadratic trend in the long-term forecasts, which is not appropriate here.
c. Write this model in terms of the backshift operator.
\[(1-B)^2 y_t = (1 + \theta_1 B + \theta_2 B^2)\varepsilon_t\]
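Expanding the backshift form gives the recursion used for the hand-calculated forecasts in part (e):
\[y_t = 2y_{t-1} - y_{t-2} + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2}\]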
d. Fit the model using R and examine the residuals. Is the model satisfactory?
wmurders.ARMA2.2 <- Arima(wmurders, order = c(0, 2, 2))
checkresiduals(wmurders.ARMA2.2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,2)
## Q* = 11.764, df = 8, p-value = 0.1621
##
## Model df: 2. Total lags used: 10
The residuals of the model appear to be a white noise series.
e. Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
FC.wmurders.ARMA2.2 <- forecast(wmurders.ARMA2.2, h = 3)
FC.wmurders.ARMA2.2$mean
## Time Series:
## Start = 2005
## End = 2007
## Frequency = 1
## [1] 2.480525 2.374890 2.269256
FC.wmurders.ARMA2.2$model
## Series: wmurders
## ARIMA(0,2,2)
##
## Coefficients:
## ma1 ma2
## -1.0181 0.1470
## s.e. 0.1220 0.1156
##
## sigma^2 estimated as 0.04702: log likelihood=6.03
## AIC=-6.06 AICc=-5.57 BIC=-0.15
years <- length(wmurders)
e <- FC.wmurders.ARMA2.2$residuals
# one-step forecast: y_{T+1} = 2*y_T - y_{T-1} + theta1*e_T + theta2*e_{T-1}
fc1 <- 2*wmurders[years] - wmurders[years - 1] - 1.0181*e[years] + 0.1470*e[years - 1]
# two-step forecast: future errors are set to zero, only theta2*e_T remains
fc2 <- 2*fc1 - wmurders[years] + 0.1470*e[years]
# three-step forecast: all error terms are zero
fc3 <- 2*fc2 - fc1
c(fc1, fc2, fc3)
## [1] 2.480523 2.374887 2.269252
The manual forecasts match the output of the forecast() function to within rounding error.
f. Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
autoplot(FC.wmurders.ARMA2.2)
g. Does auto.arima() give the same model you have chosen? If not, which model do you think is better?
FC.wmurders_autoarima <- forecast(auto.arima(wmurders), h = 3)
accuracy(FC.wmurders.ARMA2.2)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0113461 0.2088162 0.1525773 -0.2403396 4.331729 0.9382785
## ACF1
## Training set -0.05094066
accuracy(FC.wmurders_autoarima)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
## ACF1
## Training set 0.02176343
Based on the training-set accuracy measures, ARIMA(0, 2, 2) performs slightly better than the ARIMA(1, 2, 1) selected by auto.arima() on MAE and MASE, although its RMSE is marginally worse.
FC.wmurders_autoarima2 <- forecast(auto.arima(wmurders, stepwise = FALSE, approximation = FALSE), h = 3)
accuracy(FC.wmurders_autoarima2)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01336585 0.2016929 0.1531053 -0.3332051 4.387024 0.9415259
## ACF1
## Training set -0.03193856
Using auto.arima() with stepwise and approximation turned off, the error measures still do not show better performance or any significant improvement.
ggtsdisplay(diff(wmurders, differences = 2))
The ACF and PACF of the twice-differenced series still point to ARIMA(0, 2, 2) rather than ARIMA(0, 2, 3).
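A more direct comparison of the two candidates is via their information criteria (a small sketch; the values are not reproduced here):
# compare corrected AIC of the two candidate models
Arima(wmurders, order = c(0, 2, 2))$aicc
Arima(wmurders, order = c(0, 2, 3))$aicc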
checkresiduals(FC.wmurders.ARMA2.2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,2)
## Q* = 11.764, df = 8, p-value = 0.1621
##
## Model df: 2. Total lags used: 10
checkresiduals(FC.wmurders_autoarima2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,3)
## Q* = 10.706, df = 7, p-value = 0.152
##
## Model df: 3. Total lags used: 10
Looking at the residual diagnostics, both models pass the Ljung-Box test, so the simpler ARIMA(0, 2, 2) remains the preferred model.