Week 8 - ARIMA - Homework

C. Rosemond 10.18.20

library(fpp2)
library(urca)
library(readxl)


8.11.1

Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

knitr::include_graphics("week8_8111.png")

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

Series X1, X2, and X3 consist of 36, 360, and 1,000 random numbers, respectively. The white noise bounds in the ACF plots narrow as the number of random numbers increases. The plots suggest that each series is essentially white noise, though each includes at least a few autocorrelations that approach or just exceed its bounds.


b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The width of the white noise bounds is based on the 95% critical values \(\pm 1.96 / \sqrt{T}\), where \(T\) is the series length (here 36, 360, and 1,000). As \(T\) increases, the bounds narrow, which is why the bounds for X3 are narrower than those for X2 and X1. The actual autocorrelations differ across the figures because each series consists of a different set of random numbers; for white noise, the sample autocorrelations are themselves random and are simply expected to fall within the bounds.
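As a quick check, the bound widths can be computed directly (a minimal sketch):

# 95% white-noise bound half-widths narrow as T grows
n_obs <- c(36, 360, 1000)
round(1.96 / sqrt(n_obs), 3) # approximately 0.327, 0.103, and 0.062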



8.11.2

A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

autoplot(ibmclose, main="Daily closing IBM stock prices")

A plot of the series shows different price levels over time, increasing from approximately \(\$450\) to \(\$600\) before eventually falling to nearly \(\$300\). These changes in level suggest the series is non-stationary and should be differenced.

Acf(ibmclose, main="ACF: Daily closing IBM stock prices")

The correlogram shows large positive autocorrelations beginning at \(r_1\) and decreasing slowly as the lag increases. This pattern indicates a trended series, which suggests the series is non-stationary and should be differenced.


Pacf(ibmclose, main="PACF: Daily closing IBM stock prices")

The PACF shows a single large, significant spike at lag 1 followed by a series of insignificant ones. Combined with the slowly decaying ACF, this pattern is characteristic of a non-stationary series that should be differenced.
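As an additional check (a quick sketch), a unit root test also points to differencing:

ndiffs(ibmclose) # unit root test suggests at least one difference
autoplot(diff(ibmclose), main="Differenced daily closing IBM stock prices")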



8.11.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a. usnetelec

#?usnetelec - Annual US net electricity generation (billion kwh) for 1949-2003
autoplot(usnetelec, main="Annual US net electricity generation")

Acf(usnetelec, main="ACF: Annual US net electricity generation")

Pacf(usnetelec, main="PACF: Annual US net electricity generation")

The series plot and its ACF and PACF reveal an upward trend and thus non-stationary data.


BoxCox.lambda(usnetelec)
## [1] 0.5167714

The estimated Box-Cox \(\lambda\) is approximately 0.52, so a square root transformation (\(\lambda = 0.5\)) can be used instead to ease interpretation.
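For \(\lambda = 0.5\), the Box-Cox transform \((y^{\lambda} - 1)/\lambda\) reduces to \(2(\sqrt{y} - 1)\), a scaled and shifted square root; a quick sketch to confirm the equivalence for this positive-valued series:

all.equal(c(BoxCox(usnetelec, lambda=0.5)),
          c(2*(sqrt(usnetelec) - 1))) # should be TRUE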


usnetelec_bc <- BoxCox(usnetelec, lambda=0.5)
ndiffs(usnetelec_bc)
## [1] 2

A unit root test applied to the transformed series suggests 2 differences.


usnetelec_bc_d <- diff(diff(usnetelec_bc))
autoplot(usnetelec_bc_d, main="Annual US net electricity generation, after Box-Cox and differencing")

Acf(usnetelec_bc_d)

Pacf(usnetelec_bc_d)

After differencing twice, the trend is removed, and the ACF and PACF look much closer to white noise than those of the untransformed, undifferenced series. A few autocorrelations, however, still fall outside the bounds.


usnetelec_bc_d %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.0707 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

After transformation and differencing, the KPSS test statistic is approximately 0.07, well below even the 10% critical value, so the null hypothesis of stationarity is not rejected. The transformed and differenced series is stationary.


b. usgdp

#?usgdp - Quarterly US GDP (1947.1 - 2006.1)
autoplot(usgdp, main="Quarterly US GDP")

Acf(usgdp, main="Quarterly US GDP")

Pacf(usgdp, main="Quarterly US GDP")

The series plot and its ACF and PACF reveal an upward trend and thus non-stationary data.


BoxCox.lambda(usgdp)
## [1] 0.366352

The estimated Box-Cox \(\lambda\) is approximately 0.37, close to a cube root transformation; a rounded \(\lambda\) of 0.33 is used to ease interpretation.


usgdp_bc <- BoxCox(usgdp, lambda=0.33)
ndiffs(usgdp_bc)
## [1] 1

A unit root test applied to the transformed series suggests 1 difference.


usgdp_bc_d <- diff(usgdp_bc)
autoplot(usgdp_bc_d, main="Quarterly US GDP, after Box-Cox and differencing")

Acf(usgdp_bc_d)

Pacf(usgdp_bc_d)

After differencing, the trend is removed, and the ACF and PACF look much closer to white noise than those of the untransformed, undifferenced series. Several autocorrelations--notably at lags 1, 2, and 12--still fall on or outside the bounds.


usgdp_bc_d %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.1476 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

After transformation and differencing, the KPSS test statistic is approximately 0.15, well below even the 10% critical value, so the null hypothesis of stationarity is not rejected. The transformed and differenced series is stationary.


c. mcopper

#?mcopper - Monthly copper prices
autoplot(mcopper, main="Monthly copper prices")

Acf(mcopper, main="Monthly copper prices")

Pacf(mcopper, main="Monthly copper prices")

The series plot and its ACF and PACF reveal an upward trend--amplified by a large increase in price starting in 2005--and thus non-stationary data. There is also a roughly flat period from 1962 to 1964.


BoxCox.lambda(mcopper)
## [1] 0.1919047

The estimated Box-Cox \(\lambda\) is approximately 0.19; a rounded \(\lambda\) of 0.2 is used to ease interpretation.


mcopper_bc <- BoxCox(mcopper, lambda=0.2)
ndiffs(mcopper_bc)
## [1] 1

A unit root test applied to the transformed series suggests 1 difference.


mcopper_bc_d <- diff(mcopper_bc)
autoplot(mcopper_bc_d, main="Monthly copper prices, after Box-Cox and differencing")

Acf(mcopper_bc_d)

Pacf(mcopper_bc_d)

After differencing, the trend is removed, and the ACF and PACF look much closer to white noise than those of the untransformed, undifferenced series. Several autocorrelations, however, still fall outside the bounds.


mcopper_bc_d %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 0.0581 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

After transformation and differencing, the KPSS test statistic is approximately 0.06, well below even the 10% critical value, so the null hypothesis of stationarity is not rejected. The transformed and differenced series is stationary.


d. enplanements

#?enplanements - Monthly US domestic enplanements (millions), 1979-2004
autoplot(enplanements, main="Monthly US domestic enplanements")

Acf(enplanements, main="Monthly US domestic enplanements")

Pacf(enplanements, main="Monthly US domestic enplanements")

The series plot and its ACF and PACF reveal an upward trend, monthly seasonality, and thus non-stationary data. A large decrease around 2001-2002 temporarily interrupts the trend.


BoxCox.lambda(enplanements)
## [1] -0.2269461

The estimated Box-Cox \(\lambda\) is approximately -0.23; a rounded \(\lambda\) of -0.2 is used to ease interpretation.


enplanements_bc <- BoxCox(enplanements, lambda=-0.2)
nsdiffs(enplanements_bc) #seasonal
## [1] 1
ndiffs(diff(enplanements_bc, lag=12))
## [1] 1

Given its monthly seasonality, the transformed series requires a seasonal difference followed by a single difference.


enplanements_bc_d <- diff(diff(enplanements_bc, lag=12))
autoplot(enplanements_bc_d, main="Monthly US domestic enplanements, after Box-Cox and differencing")

Acf(enplanements_bc_d)

Pacf(enplanements_bc_d)

After the seasonal and first differences, the trend is removed, and the ACF and PACF look much closer to white noise than those of the untransformed, undifferenced series. Several autocorrelations--notably at lags 1 and 12--still fall outside the bounds.


enplanements_bc_d %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0427 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

After transformation and differencing, the KPSS test statistic is approximately 0.04, well below even the 10% critical value, so the null hypothesis of stationarity is not rejected. The transformed and differenced series is stationary.


e. visitors

#?visitors - Monthly Australian short-term overseas visitors (May 1985 - April 2005)
autoplot(visitors, main="Monthly Australian short-term overseas visitors")

Acf(visitors, main="Monthly Australian short-term overseas visitors")

Pacf(visitors, main="Monthly Australian short-term overseas visitors")

The series plot and its ACF and PACF reveal an upward trend, monthly seasonality, and thus non-stationary data. Variation increases with the level of the series.


BoxCox.lambda(visitors)
## [1] 0.2775249

The estimated Box-Cox \(\lambda\) is approximately 0.28; a rounded \(\lambda\) of 0.25 is used to ease interpretation.


visitors_bc <- BoxCox(visitors, lambda=0.25)
nsdiffs(visitors_bc) #seasonal
## [1] 1
ndiffs(diff(visitors_bc, lag=12))
## [1] 1

Given its monthly seasonality, the transformed series requires a seasonal difference followed by a single difference.


visitors_bc_d <- diff(diff(visitors_bc, lag=12))
autoplot(visitors_bc_d, main="Monthly Australian overseas visitors, after Box-Cox and differencing")

Acf(visitors_bc_d)

Pacf(visitors_bc_d)

After the seasonal and first differences, the trend is removed, and the ACF and PACF look much closer to white noise than those of the untransformed, undifferenced series. Several autocorrelations--notably at lags 1, 12, and 24--still fall well outside the bounds.


visitors_bc_d %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.0159 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

After transformation and differencing, the KPSS test statistic is approximately 0.02, well below even the 10% critical value, so the null hypothesis of stationarity is not rejected. The transformed and differenced series is stationary.



8.11.5

For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

retaildata <- read_excel("retail.xlsx", skip=1) # 381 x 19
myts <- ts(retaildata[,4], frequency=12, start=c(1982,4)) # 4th column ("A3349338X")
autoplot(myts, main="Retail - A3349338x")

The retail series shows both an upward trend and monthly seasonality, meaning that it is not stationary.


Acf(myts, main="Retail - A3349338x")

Pacf(myts, main="Retail - A3349338x")

Autocorrelation plots provide further evidence of trend, seasonality, and non-stationarity.


BoxCox.lambda(myts)
## [1] -0.2536078

The estimated Box-Cox \(\lambda\) is approximately -0.25, which is already a convenient value, so it is used directly.


myts_bc <- BoxCox(myts, lambda=-0.25)
nsdiffs(myts_bc) #seasonal
## [1] 1
ndiffs(diff(myts_bc, lag=12))
## [1] 1

The transformed series shows monthly seasonality and requires a seasonal difference, which is then followed by a single difference.


myts_bc_d <- diff(diff(myts_bc, lag=12))
autoplot(myts_bc_d, main="Retail - A3349338x, after Box-Cox and differencing")

Acf(myts_bc_d)

Pacf(myts_bc_d)

After transformation and differencing, the series continues to display significant autocorrelation, particularly at lags of 12 and 24.
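A Ljung-Box portmanteau test (a quick sketch; the 24-lag choice is mine) can check whether those remaining spikes are jointly significant:

Box.test(myts_bc_d, lag=24, type="Ljung-Box")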


myts_bc_d %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0219 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

After transformation and differencing, the KPSS test statistic is approximately 0.02, well below even the 10% critical value, so the null hypothesis of stationarity is not rejected. The transformed and differenced series is stationary.



8.11.6

Use R to simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\).

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]


b. Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

autoplot(y, main="AR(1) simulation, phi = 0.6")

The simulated series, with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\), appears stationary, with no apparent trend or seasonality; it varies irregularly about a mean of zero.


y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.1*y[i-1] + e[i]
autoplot(y, main="AR(1) simulation, phi = 0.1")

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- -0.6*y[i-1] + e[i]
autoplot(y, main="AR(1) simulation, phi = -0.6")

Changing \(\phi_1\) changes the character of the series. As \(\phi_1\) approaches zero, the series approximates white noise. When \(\phi_1\) is negative, the series oscillates about the mean, changing sign more frequently.
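To streamline such comparisons, the simulation can be wrapped in a small helper (a sketch; the function name sim_ar1 is mine) that reuses a single set of errors so only \(\phi_1\) varies:

sim_ar1 <- function(phi, e) {
  y <- ts(numeric(length(e)))
  for(i in 2:length(e))
    y[i] <- phi*y[i-1] + e[i]  # AR(1) recursion with y_1 = 0
  y
}
e <- rnorm(100)
autoplot(sim_ar1(0.9, e), main="AR(1) simulation, phi = 0.9")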


c. Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]


d. Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?

autoplot(y, main="MA(1) simulation, theta = 0.6")


y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0*e[i-1] + e[i]
autoplot(y, main="MA(1) simulation, theta = 0")


y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- -0.6*e[i-1] + e[i]
autoplot(y, main="MA(1) simulation, theta = -0.6")

As \(\theta_1\) decreases towards -1, the series shows increased variation, and it oscillates with greater frequency about the mean.
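As an aside, base R's arima.sim() offers a loop-free way to generate such series (a sketch; note that arima.sim() uses the same \(1 + \theta_1 B\) sign convention as the code above):

autoplot(arima.sim(model=list(ma=0.6), n=100, sd=1),
         main="MA(1) simulation via arima.sim, theta = 0.6")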


e. Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6\), \(\theta_1 = 0.6\), and \(\sigma^2 = 1\).

y1 <- ts(numeric(100))
e1 <- rnorm(100, sd=1)
for(i in 2:100)
  y1[i] <- 0.6*y1[i-1] + 0.6*e1[i-1] + e1[i]


f. Generate data from an AR(2) model with \(\phi_1 = -0.8\), \(\phi_2 = 0.3\), and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)

y2 <- ts(numeric(100))
e2 <- rnorm(100)
for(i in 3:100)
  y2[i] <- -0.8*y2[i-1] + 0.3*y2[i-2] + e2[i]


g. Graph the latter two series and compare them.

autoplot(y1, main="ARMA(1,1) simulation, phi_1 = 0.6, theta_1 = 0.6, sigma^2 = 1")

autoplot(y2, main="AR(2) simulation, phi_1 = -0.8, phi_2 = 0.3, sigma^2 = 1")

The ARMA(1,1) series appears stationary, varying without pattern about zero; notably, there is a large dip, past -5.0, at around time 13. The AR(2) series is clearly non-stationary: it oscillates with steadily increasing amplitude--from approximately zero to approximately 4000--as time increases.



8.11.7

Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

#?wmurders - Annual female murder rate (per 100,000 standard population) in the USA (1950-2004)


a. By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.

autoplot(wmurders, main="Annual female murder rate per 100,000 in USA (1950-2004)")

Acf(wmurders)

Pacf(wmurders)

The plots show a weak upward trend overall, no seasonality, and an ACF that decays slowly, indicating that the series is non-stationary and should be differenced.


BoxCox.lambda(wmurders)
## [1] -0.09529835
ndiffs(wmurders)
## [1] 2

The suggested \(\lambda\) for a Box-Cox transformation is approximately -0.09, or roughly zero, which would mean a logarithmic transformation. Based upon the series plot, the series' limited variation does not seem to warrant a log-transform.

The suggested number of differences is 2.


wmurders_d <- diff(diff(wmurders))
autoplot(wmurders_d, main="Annual female murder rate, after differencing")

Acf(wmurders_d)

Pacf(wmurders_d)

After differencing, both the ACF and PACF show a particularly large, significant spike at lag 1. The PACF spike suggests p = 1, and the ACF spike suggests q = 1. Combined with d = 2, an ARIMA(1,2,1) seems appropriate.


b. Should you include a constant in the model? Explain.

No. The original series shows only a slight trend, and that trend is already captured by the double differencing. Moreover, with d = 2, including a constant would make the long-term forecasts follow a quadratic trend, which is rarely desirable.


c. Write this model in terms of the backshift operator.

With the constant omitted (c = 0), the model in backshift notation is

\((1 - \phi_1B)(1 - B)^2y_t = (1 + \theta_1B)\epsilon_t\)
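Expanding the backshift polynomials gives the difference-equation form used for the hand calculations in part e:

\(y_t = (2 + \phi_1)y_{t-1} - (1 + 2\phi_1)y_{t-2} + \phi_1 y_{t-3} + \theta_1\epsilon_{t-1} + \epsilon_t\)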


d. Fit the model using R and examine the residuals. Is the model satisfactory?

(m1 <- Arima(wmurders, order=c(1,2,1)))
## Series: wmurders 
## ARIMA(1,2,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.2434  -0.8261
## s.e.   0.1553   0.1143
## 
## sigma^2 estimated as 0.04632:  log likelihood=6.44
## AIC=-6.88   AICc=-6.39   BIC=-0.97
checkresiduals(m1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
## 
## Model df: 2.   Total lags used: 10

Fitting an ARIMA(1,2,1) model with no constant term yields residuals whose autocorrelations are jointly insignificant per a Ljung-Box test (p-value of approximately 0.1335 on 8 degrees of freedom), consistent with white noise. The residuals are also relatively symmetrically distributed, if not quite normal. The model appears satisfactory to move forward.


e. Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

f1 <- forecast(m1, h=3)
f1$mean
## Time Series:
## Start = 2005 
## End = 2007 
## Frequency = 1 
## [1] 2.470660 2.363106 2.252833

Working through the expanded equation above by hand, with future errors set to zero, my forecast calculations are approximately the same as those produced by R.
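A sketch of that hand calculation in R, using the fitted coefficients, the last in-sample residual, and future errors set to zero:

phi <- coef(m1)["ar1"]; theta <- coef(m1)["ma1"]
y <- c(wmurders); n <- length(y)
e_T <- tail(residuals(m1), 1) # last in-sample error
f1_hand <- (2+phi)*y[n] - (1+2*phi)*y[n-1] + phi*y[n-2] + theta*e_T
f2_hand <- (2+phi)*f1_hand - (1+2*phi)*y[n] + phi*y[n-1]
f3_hand <- (2+phi)*f2_hand - (1+2*phi)*f1_hand + phi*y[n]
c(f1_hand, f2_hand, f3_hand) # should closely match f1$mean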


f. Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

autoplot(f1)

Visualized, the forecasts decrease roughly linearly. This is expected: when c = 0 and d = 2, the long-term point forecasts follow a straight line.
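The straight line is easier to see at a longer horizon (a quick sketch):

autoplot(forecast(m1, h=20))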


g. Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

(m2 <- auto.arima(wmurders, stepwise=FALSE, approximation=FALSE))
## Series: wmurders 
## ARIMA(0,2,3) 
## 
## Coefficients:
##           ma1     ma2      ma3
##       -1.0154  0.4324  -0.3217
## s.e.   0.1282  0.2278   0.1737
## 
## sigma^2 estimated as 0.04475:  log likelihood=7.77
## AIC=-7.54   AICc=-6.7   BIC=0.35
checkresiduals(m2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,3)
## Q* = 10.706, df = 7, p-value = 0.152
## 
## Model df: 3.   Total lags used: 10

The documentation for auto.arima() recommends setting the stepwise and approximation arguments to FALSE when the analysis focuses on a single series, since the slower, exhaustive search can afford to be more thorough. Doing so yields an ARIMA(0,2,3), compared with the chosen ARIMA(1,2,1). The former has a slightly higher log-likelihood (approximately 7.77 versus 6.44) and a lower AICc, and its residuals also pass the Ljung-Box test (p-value of approximately 0.152).


f2 <- forecast(m2, h=3)
f2$mean
## Time Series:
## Start = 2005 
## End = 2007 
## Frequency = 1 
## [1] 2.408817 2.365555 2.290976

In addition, its forecast follows a gentle arc rather than a straight line. Ultimately, the ARIMA(0,2,3) from auto.arima() seems better: it fits slightly better on log-likelihood and AICc, and the exhaustive search makes the selection more trustworthy.
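For a compact comparison, the information criteria can be pulled directly from the fitted objects (a sketch):

c(arima121 = m1$aicc, arima023 = m2$aicc) # the lower AICc favors the ARIMA(0,2,3)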