DATA 624 Homework 6

Question 8.1

Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

The ACF bands become narrower as the number of numbers increase. There is no discernable pattern in the bars of the ACF charts and the bars are within the bands, which indicates the data are white noise.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

It’s related to the law of large numbers. As the number of observations increase, the number of large outliers from the mean decreases. We are more certain that a large observation is really an outlier with more data.

Question 8.2

A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

ggtsplot <- function(ts, title){
  # A ggplot2 version of tsdisplay(df)
  # Args:
  #    ts (Time-Series): The time series we want to plot
  #    title (str): The title of the graph
  grid.arrange( 
    autoplot(ts) +
      ggtitle(title) +
      theme(axis.title = element_blank()),
    grid.arrange(
      ggAcf(ts) + 
        ggtitle(element_blank()), 
      ggPacf(ts) + 
        ggtitle(element_blank()), ncol = 2), nrow = 2)
}

ggtsplot(ibmclose, "IBM Closing Stock Price")

Time series with trends or with seasonality are not stationary. It is clear from the top plot that there is a trend in IBM’s stock price. The ACF will drop to zero quickly for a stationary time series. This ACF does not have that property. The gradual decrease is a tell that it is not stationary. The first PACF is almost one which is another sign that the data is not stationary.

Question 8.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

usnetelec

autoplot(usnetelec, main = "US Net Electricity Generation") +
  theme(axis.title = element_blank())

The trend looks nearly linear. It will probably only need to be differenced to be stationary. I will use the ndiffs function to determine the order of differencing.*

ggtsplot(diff(usnetelec, ndiffs(usnetelec)), paste0("Differenced US Net Electricity Generation (lag=", ndiffs(usnetelec), ")"))

It needed to be differenced by 1, or a first order difference. Now to check if the data is stationary.

is_stationary <- function(data){
  results <- kpss.test(data)
  if (results$p.value > 0.05){
    "Data IS Stationary"
  } else {
    "Data IS NOT Stationary"
  }
}

cat(is_stationary(diff(usnetelec, ndiffs(usnetelec))))

Data IS Stationary

The p-value of 0.36 indicates the data are stationary.

usgdp

autoplot(usgdp, main = "Quarterly US Gross Domestic Product") +
  theme(axis.title = element_blank())

Since this series is dollars we probably need to log transform it. It also looks like it needs to be differenced.

usgdp_lambda <- BoxCox.lambda(usgdp)
bc_usgdp <- BoxCox(usgdp, lambda = usgdp_lambda)
ggtsplot(diff(bc_usgdp, ndiffs(bc_usgdp)), paste0("Differenced Quarterly US GDP (lag=", ndiffs(bc_usgdp), ", lambda=",round(usgdp_lambda,2),")"))

The Box Cox transformation with a lambda of 0.37 and a differencing of 1.

cat(is_stationary(diff(bc_usgdp, ndiffs(bc_usgdp))))

Data IS Stationary

mcopper

autoplot(mcopper, main = "Monthly Grade A Copper Prices") +
  theme(axis.title = element_blank())

This has the spike in copper prices that happened late in the the 2000’s.

mcopper_lambda <- BoxCox.lambda(mcopper)
bc_mcopper <- BoxCox(mcopper, lambda = mcopper_lambda)
ggtsplot(diff(bc_mcopper, ndiffs(bc_mcopper)), paste0("Differenced Monthly Grade A Copper Prices (lag=", ndiffs(bc_mcopper), ", lambda=",round(mcopper_lambda,2),")"))

The Box Cox transformation with a lambda of 0.19 and a differencing of 1.

cat(is_stationary(diff(bc_mcopper, ndiffs(bc_mcopper))))

Data IS Stationary

enplanements

autoplot(enplanements, main = "Monthly US Domestic Enplanements") +
  theme(axis.title = element_blank())

The variance is not constant so this series needs a Box-Cox transformation and differencing to be stationary.

enplanements_lambda <- BoxCox.lambda(enplanements)
bc_enplanements <- BoxCox(enplanements, lambda = enplanements_lambda)
ggtsplot(diff(bc_enplanements, ndiffs(bc_enplanements)), paste0("Differenced Monthly US Domestic Enplanements (lag=", ndiffs(bc_enplanements), ", lambda=",round(enplanements_lambda,2),")"))

The Box Cox transformation with a lambda of -0.23 and a differencing of 1 (1st order).

cat(is_stationary(diff(bc_enplanements, ndiffs(bc_enplanements))))

Data IS Stationary

visitors

autoplot(visitors, main = "Monthly Australian Overseas Visitors") +
  theme(axis.title = element_blank())

This series also exhibits an increasing variance and trend so Box-Cox and differencing will be needed.

visitors_lambda <- BoxCox.lambda(visitors)
bc_visitors <- BoxCox(visitors, lambda = visitors_lambda)
ggtsplot(diff(bc_visitors, ndiffs(bc_visitors)), paste0("Differenced Monthly Australian Overseas Visitors (lag=", ndiffs(bc_visitors), ", lambda=",round(visitors_lambda,2),")"))

The Box Cox transformation had a lambda of 0.28 and a differencing of 1 (1st order).

cat(is_stationary(diff(bc_visitors, ndiffs(bc_visitors))))

Data IS Stationary

Question 8.5

For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

retaildata <- read_excel("retail.xlsx", skip = 1)
retail <- ts(retaildata[, "A3349873A"], frequency = 12, start = c(1982, 4))
retail_lambda <- BoxCox.lambda(retail)
bc_retail <- BoxCox(retail, lambda = retail_lambda)
ggtsplot(diff(bc_retail, ndiffs(bc_retail)), paste0("Differenced Monthly Retail Sales (lag=", ndiffs(bc_retail), ", lambda=",round(retail_lambda,2),")"))

The data is differenced with a first order difference and adjusted with a Box Cox transformation with a lambda of 0.13. The result:

cat(is_stationary(diff(bc_retail, ndiffs(bc_retail))))

Data IS Stationary

Question 8.6

Use R to simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with \(ϕ_1 = 0.6\) and \(σ_2 = 1\). The process starts with \(y_1=0\).

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]

Produce a time plot for the series. How does the plot change as you change \(ϕ_1\)?

In order to see the effects of a change in \(ϕ_1\) we will need to modify the code a little bit:

sim_AR1 <- function(phi){
  set.seed(42)
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 2:100){
    y[i] <- phi*y[i-1] + e[i]
  }
  return(y)
}

p <- autoplot(sim_AR1(0.6))
for(phi in seq(0.1, 0.9, 0.1)){
  p <- p + autolayer(sim_AR1(phi), series = paste(phi))
}
p +
  labs(title="The effects of changing Phi", color = "Phi") +
  theme(axis.title = element_blank(), legend.position = "bottom") +
  scale_color_brewer(palette = "Set1")

Now with a fixed random number seed we can see the effect. As \(ϕ_1\) increases, the distance from zero increases. It increases the autocorrelation with the preceeding value.

Write your own code to generate data from an MA(1) model with \(θ_1=0.6\) and \(σ_2=1\).

sim_MA1 <- function(theta){
  set.seed(42)
  y <- ts(numeric(100))
  e <- rnorm(100)
  e[1] <- 0
  for(i in 2:100){
    y[i] <- theta*e[i-1] + e[i]
  }
  return(y)
}

Produce a time plot for the series. How does the plot change as you change \(θ_1\)?

p <- autoplot(sim_MA1(0.6))
for(theta in seq(0.1, 0.9, 0.1)){
  p <- p + autolayer(sim_MA1(theta), series = paste(theta))
}

p +
  labs(title="The effects of changing Theta", color = "Theta") +
  theme(axis.title = element_blank(), legend.position = "bottom") +
  scale_color_brewer(palette = "Set1")

This is a bit harder to see but as \(ϕ_1\) increases, the distance from zero again increases. The \(θ_1\) deals with the error so it is not as pronounced as a change in \(ϕ_1\) though.

Generate data from an ARMA(1,1) model with \(ϕ_1=0.6\), \(θ_1=0.6\) and \(σ_2=1\).

sim_ARMA <- function(phi, theta){
  set.seed(42)
  y <- ts(numeric(100))
  e <- rnorm(100)
  e[1] <- 0
  for(i in 2:100)
    y[i] <- phi*y[i-1] + theta*e[i-1] + e[i]
  return(y)
}

Generate data from an AR(2) model with \(ϕ_1=−0.8\), \(ϕ_2=0.3\) and \(σ_2=1\). (Note that these parameters will give a non-stationary series.)

sim_AR2 <- function(phi_1, phi_2){
  set.seed(42)
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 3:100)
    y[i] <- phi_1*y[i-1] + phi_2*y[i-2] + e[i]
  return(y)
}

Graph the latter two series and compare them.

autoplot(sim_ARMA(0.6, 0.6), series = "ARMA(1,1)") +
  autolayer(sim_AR2(-0.8, 0.3), series = "AR(2)") +
  theme(axis.title = element_blank(), legend.position = "bottom", legend.title = element_blank()) +
  scale_color_brewer(palette = "Set1")

Question 8.7

Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.

autoplot(wmurders) +
  ggtitle("Women Murdered in the U.S.") +
  theme(axis.title = element_blank())

The trends (both up and down) suggest that the series should be differenced before moving forward. ndiffs estimates that 2 differened are required to make this time series stationary.

ggtsplot(diff(wmurders, ndiffs(wmurders)), "Differenced Women Murdered in the U.S.")

The first lag of the PACF is the only spike which indicates the value of p in the ARIMA model should be 1. The high first value in the ACF indicates that the d is also one.

Should you include a constant in the model? Explain.

No. Based on the blog post by the text book author, since the differencing term (d) is 2 if c = 0 the eventual forecast function (EFF) will follow a straight line with intercept and slope determined by the last few observations. If c != 0 then the EFF will follow a quadratic trend. Opting for the simplier model We should not include it in the model.

Write this model in terms of the backshift operator.

\((1-\phi_1B) (1-B)^2y_t = c + (1 + \theta_1B)e_t\)

Fit the model using R and examine the residuals. Is the model satisfactory?

fit <- arima(wmurders, order = c(1, 2, 1))
checkresiduals(fit)


    Ljung-Box test

data:  Residuals from ARIMA(1,2,1)
Q* = 12.419, df = 8, p-value = 0.1335

Model df: 2.   Total lags used: 10

The ACF show all autocorrelations are withing acceptable limits, indicating residuals are white noise. The p-value of the Ljung-Box test also indicate the residuals are white noise.

Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

forecast(fit, h=3) %>%
  kable() %>%
  kable_styling()

	Point Forecast	Lo 80	Hi 80	Lo 95	Hi 95
2005	2.470660	2.200091	2.741229	2.056861	2.884459
2006	2.363106	1.993529	2.732684	1.797886	2.928327
2007	2.252833	1.774677	2.730989	1.521557	2.984110

Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

autoplot(forecast(fit, h=3))

Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

auto.arima(wmurders)

Series: wmurders 
ARIMA(1,2,1) 

Coefficients:
          ar1      ma1
      -0.2434  -0.8261
s.e.   0.1553   0.1143

sigma^2 estimated as 0.04632:  log likelihood=6.44
AIC=-6.88   AICc=-6.39   BIC=-0.97

Yes auto.arima() gives the same model that I have chosen.