require(fpp2)
require(urca)
Yes, all three figures indicate that the data are white noise: in each one, nearly all of the autocorrelations stay inside the blue dashed lines marking the bounds. For a white noise series we expect roughly 95% of the spikes to fall within those bounds.
The critical values differ because of the length of each series. The bounds are \(\pm\frac{2}{\sqrt{T}}\), so the first figure has much wider bounds because \(T = 36\) gives \(\pm\frac{2}{\sqrt{36}} = \pm\frac{1}{3}\), while the last figure has the much larger \(T = 1{,}000\), giving the far narrower \(\pm\frac{2}{\sqrt{1000}} \approx \pm 0.0632\).
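We can compute these bounds directly. A quick sketch, assuming the three series have lengths 36, 360 and 1,000 (the middle length isn't stated above):
# white-noise bounds are +/- 2/sqrt(T) for a series of length T
2 / sqrt(c(36, 360, 1000))
# approximately 0.333, 0.105 and 0.063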
The question about the different critical values is answered above. As for why the autocorrelations themselves differ even though all three series are white noise: each figure is based on a different set of random numbers, and each series has a different length, so the individual peaks and troughs naturally differ from graph to graph. All three are white noise, but because none of them share the same data, their sample autocorrelations look different.
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
data("ibmclose")
autoplot(ibmclose)
ggAcf(ibmclose)
ggPacf(ibmclose)
We know that the ibmclose series is non-stationary because the time plot shows an obvious trend, and the correlogram reflects this: the ACF decays very slowly instead of dropping quickly to zero. In the PACF, \(r_1\) is very large and positive while the remaining partial autocorrelations stay within the white-noise bounds, which is the classic signature of a series that needs differencing.
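As a quick cross-check (a sketch), ndiffs() from the forecast package estimates how many first differences are needed, and plotting the differenced series should remove the trend:
ndiffs(ibmclose)             # estimated number of first differences required
ggtsdisplay(diff(ibmclose))  # differenced series together with its ACF and PACF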
usnetelecdata("usnetelec")
l1 <- BoxCox.lambda(usnetelec)
t1 <- BoxCox(usnetelec, l1)
autoplot(t1)
l1
## [1] 0.5167714
We see that even with the chosen lambda of 0.517, usnetelec still requires some differencing in order to become truly stationary.
autoplot(diff(t1))
Now the trend has been removed.
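We can back this up with a KPSS unit-root test on the differenced, transformed series (a quick sketch using ur.kpss() from urca):
# a test statistic below the critical values is consistent with stationarity
t1 %>% diff() %>% ur.kpss() %>% summary()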
usgdpdata("usgdp")
l2 <- BoxCox.lambda(usgdp)
t2 <- BoxCox(usgdp, l2)
autoplot(t2)
l2
## [1] 0.366352
Again, the same holds true for usgdp: there is still a very large, noticeable trend, despite using the optimal lambda of 0.366. We are going to need a first difference.
autoplot(diff(t2))
A first difference seems to be enough to make the data appear stationary.
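ndiffs() automates this kind of check (a sketch): it applies a unit-root test repeatedly and reports how many differences it recommends.
ndiffs(t2)  # suggested number of first differences for the transformed usgdp series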
mcopperdata("mcopper")
l3 <- BoxCox.lambda(mcopper)
t3 <- BoxCox(mcopper, l3)
autoplot(t3)
l3
## [1] 0.1919047
autoplot(diff(t3))
A single difference might not be enough to make the series stationary. We can test this using the ur.kpss() function.
mcopper %>%
diff() %>%
ur.kpss() %>%
summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 6 lags.
##
## Value of test-statistic is: 0.1843
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The test statistic of 0.1843 is actually below even the 10% critical value, so the once-differenced series already passes the KPSS test; still, let's take a second difference and see whether the statistic moves any closer to zero.
autoplot(diff(diff(t3)))
mcopper %>%
diff() %>%
diff() %>%
ur.kpss() %>%
summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 6 lags.
##
## Value of test-statistic is: 0.0237
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
A test statistic of 0.0237 is much better.
enplanementsdata("enplanements")
l4 <- BoxCox.lambda(enplanements)
t4 <- BoxCox(enplanements, l4)
autoplot(t4)
l4
## [1] -0.2269461
autoplot(diff(t4))
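enplanements is monthly data with a strong seasonal pattern, so it may also need a seasonal difference before it is truly stationary. A hedged sketch using nsdiffs() and ndiffs():
nsdiffs(t4)                 # suggested number of seasonal differences
ndiffs(diff(t4, lag = 12))  # ordinary differences after one seasonal difference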
visitorsdata("visitors")
l5 <- BoxCox.lambda(visitors)
t5 <- BoxCox(visitors, l5)
autoplot(t5)
l5
## [1] 0.2775249
autoplot(diff(t5))
A single difference is enough to make the visitors dataset appear stationary.
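We can check this claim with a KPSS test on the differenced, transformed series (a quick sketch, as above):
# a small test statistic, below the 10% critical value, supports stationarity
t5 %>% diff() %>% ur.kpss() %>% summary()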
So, as usual we will need to select a particular column from the Excel data set.
retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
myts <- ts(retaildata[, "A3349337W"], frequency = 12, start = c(1982, 4))
autoplot(myts)
ggAcf(myts)
ggPacf(myts)
From the correlogram we see that the data have not only a trend but also strong seasonality, and the time plot shows that the trend is upward. The ACF and PACF make it clear that this series does not resemble white noise: the retail figures are strongly correlated with previous months rather than being random.
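To back up the claim that the series is not white noise, a portmanteau test can be applied. A sketch using the base-R Ljung-Box test (lag = 24 covers two years of monthly data):
# a tiny p-value indicates significant autocorrelation, i.e. not white noise
Box.test(as.numeric(myts), lag = 24, type = "Ljung-Box")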
Let’s examine a different column.
myts2 <- ts(retaildata[, "A3349335T"], frequency = 12, start = c(1982, 4))
autoplot(myts2)
ggAcf(myts2)
ggPacf(myts2)
The results are the same. Let’s try to make the first set stationary and see what happens.
l6 <- BoxCox.lambda(myts)
t6 <- BoxCox(myts, l6)
cbind(
"Retail" = myts,
"Box Cox" = t6,
"Difference" = diff(t6)) %>%
autoplot(facets = TRUE)
myts %>%
diff() %>%
ur.kpss() %>%
summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0257
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Well, after only one difference we have a test statistic of 0.0257, and our plot certainly looks stationary.
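Because the retail series is monthly and clearly seasonal, it may also be worth checking whether a seasonal difference is needed. A sketch using nsdiffs() (the conversion to a plain univariate ts is just defensive, since the column was read in as a one-column matrix):
retail_bc <- ts(as.numeric(t6), frequency = 12, start = c(1982, 4))
nsdiffs(retail_bc)                 # suggested number of seasonal differences
ndiffs(diff(retail_bc, lag = 12))  # ordinary differences after one seasonal difference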
y <- ts(numeric(100))
e <- rnorm(100)
y1 <- y
e1 <- e
for(i in 2:100){
y1[i] <- 0.6*y1[i-1] + e1[i]
}
autoplot(y1) +
ggtitle("Original")
y2 <- y
e2 <- e
for(i in 2:100){
y2[i] <- 0.8*y2[i-1] + e2[i]
}
autoplot(y2) +
ggtitle("Phi = 0.8")
y3 <- y
e3 <- e
for(i in 2:100){
y3[i] <- 0.99*y3[i-1] + e3[i]
}
autoplot(y3) +
ggtitle("Phi = 0.99")
y4 <- y
e4 <- e
for(i in 2:100){
y4[i] <- 2*y4[i-1] + e4[i]
}
autoplot(y4) +
ggtitle("Phi = 1.5")
We see that as \(\phi_1\) increases towards 1, the series wanders further from its mean and takes on long, trend-like runs. Once \(\phi_1\) goes beyond 1 the process is no longer stationary: the early part of the plot looks almost flat while the right-hand end explodes upwards, so the series turns into a simple exponential-looking curve.
y5 <- y
e5 <- e
for(i in 2:100){
y5[i] <- 0.3*y5[i-1] + e5[i]
}
autoplot(y5) +
ggtitle("Phi = 0.3")
y6 <- y
e6 <- e
for(i in 2:100){
y6[i] <- 0.1*y6[i-1] + e6[i]
}
autoplot(y6) +
ggtitle("Phi = 0.1")
y7 <- y
e7 <- e
for(i in 2:100){
y7[i] <- -0.1*y7[i-1] + e7[i]
}
autoplot(y7) +
ggtitle("Phi = -0.1")
As we decrease \(\phi_1\) towards zero, the dependence on previous values dies away and the series looks more and more like plain white noise.
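All of these simulations differ only in \(\phi_1\), so a small helper makes the comparison easier to reproduce. A sketch (the function name sim_ar1 is just for illustration):
# generate an AR(1) series y_t = phi * y_{t-1} + e_t from a fixed error vector
sim_ar1 <- function(phi, e) {
  y <- ts(numeric(length(e)))
  for (i in 2:length(e)) {
    y[i] <- phi * y[i - 1] + e[i]
  }
  y
}
# e.g. autoplot(sim_ar1(0.3, e)) + ggtitle("Phi = 0.3")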
y.ma <- ts(numeric(100))
e.ma <- rnorm(100)
y.ma1 <- y.ma
e.ma1 <- e.ma
for(i in 2:100){
y.ma1[i] <- e.ma1[i] + 0.6*e.ma1[i-1]
}
autoplot(y.ma1) +
ggtitle("Original")
y.ma2 <- y.ma
e.ma2 <- e.ma
for(i in 2:100){
y.ma2[i] <- e.ma2[i] + 0.8*e.ma2[i-1]
}
autoplot(y.ma2) +
ggtitle("Theta = 0.8")
y.ma3 <- y.ma
e.ma3 <- e.ma
for(i in 2:100){
y.ma3[i] <- e.ma3[i] + 0.99*e.ma3[i-1]
}
autoplot(y.ma3) +
ggtitle("Theta = 0.99")
As \(\theta_1\) increases towards 1, neighbouring values become more strongly correlated, so the series looks a little smoother and its variance grows; it still reverts to its mean quickly, though, because an MA(1) only carries dependence over a single lag.
y.ma4 <- y.ma
e.ma4 <- e.ma
for(i in 2:100){
y.ma4[i] <- e.ma4[i] + 0.4*e.ma4[i-1]
}
y.ma5 <- y.ma
e.ma5 <- e.ma
for(i in 2:100){
y.ma5[i] <- e.ma5[i] + 0.2*e.ma5[i-1]
}
y.ma6 <- y.ma
e.ma6 <- e.ma
for(i in 2:100){
y.ma6[i] <- e.ma6[i] + 0.001*e.ma6[i-1]
}
autoplot(y.ma4) +
ggtitle("Theta = 0.4")
autoplot(y.ma5) +
ggtitle("Theta = 0.2")
autoplot(y.ma6) +
ggtitle("Theta = 0.001")
As \(\theta_1\) decreases towards zero, the MA term contributes less and less, so the series looks more and more like the underlying white noise errors and the plots become hard to tell apart.
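One way to see why: an MA(1) process has \(\rho_1 = \frac{\theta_1}{1 + \theta_1^2}\) and zero autocorrelation at every later lag, so smaller values of \(\theta_1\) just shrink that single spike. A quick sketch:
# for theta = 0.6, the theoretical lag-1 autocorrelation is 0.6 / (1 + 0.36), about 0.44
0.6 / (1 + 0.6^2)
ggAcf(y.ma1)  # the sample ACF should show one clear spike at lag 1 and little else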
To change things up and keep things a little simpler, let's use the arima.sim() function to generate the ARMA(1,1) model.
arma1 <- arima.sim(model = list(ar = 0.6, ma = 0.6), n = 100)
Because an AR(2) with \(\phi_1 = -0.8\) and \(\phi_2 = 0.3\) is non-stationary (it violates the stationarity condition \(\phi_2 - \phi_1 < 1\)), arima.sim() refuses to simulate it, so we generate it with a loop instead.
y.ar2 <- ts(numeric(100))
e.ar2 <- rnorm(100)
for(i in 3:100){
y.ar2[i] <- -0.8*y.ar2[i-1] + 0.3*y.ar2[i-2] + e.ar2[i]
}
autoplot(arma1) +
ggtitle("ARMA(1,1)")
autoplot(y.ar2) +
ggtitle("AR(2)")
They look drastically different, at least with the random seed used when this was originally run. The ARMA(1,1) series looks like the typical stationary, almost white-noise plots we've seen throughout this chapter, while the AR(2) series starts out nearly flat and then grows with ever-larger oscillations, like a seismograph that has picked up an earthquake.
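As a check on the non-stationarity claim (a sketch): an AR(2) is stationary only when all roots of \(1 - \phi_1 z - \phi_2 z^2\) lie outside the unit circle.
# with phi1 = -0.8 and phi2 = 0.3 the polynomial is 1 + 0.8z - 0.3z^2;
# one root has modulus below 1 (about 0.93), confirming non-stationarity
Mod(polyroot(c(1, 0.8, -0.3)))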
Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
data('wmurders')
autoplot(wmurders)
ggAcf(wmurders)
ggPacf(wmurders)
We can see from the original time-series plot that there is no seasonality in the data. The ACF shows an exponentially decaying pattern, and the PACF has a single significant spike at lag 1, which on its own would suggest an ARIMA(p, d, 0) model with p = 1. However, the series does not look stationary, so we need to difference it first.
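A quick check of how much differencing is called for (a sketch):
ndiffs(wmurders)  # estimated number of first differences needed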
wmurders %>%
diff() %>%
ggtsdisplay(main = "")
Now this is looking more like an ARIMA(0,d,q) model, where our q = 2.
In an ideal world we'd like the long-term forecasts to tend towards zero, but realistically that won't happen. Instead, we'd like them to tend towards some low, non-zero constant. According to Hyndman, that is what happens when c = 0 and d = 1, so we won't include a constant.
As of now, the model looks like it should be an ARIMA(0,1,2).
fit <- Arima(wmurders, order = c(0,1,2))
fit
## Series: wmurders
## ARIMA(0,1,2)
##
## Coefficients:
## ma1 ma2
## -0.0660 0.3712
## s.e. 0.1263 0.1640
##
## sigma^2 estimated as 0.0422: log likelihood=9.71
## AIC=-13.43 AICc=-12.95 BIC=-7.46
plot(fit$residuals)
Acf(fit$residuals)
Our model appears satisfactory: all of the residual autocorrelations are within the bounds, and the residual plot resembles white noise.
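For a more formal check (a sketch), checkresiduals() from the forecast package plots the residual ACF and histogram and runs a Ljung-Box test:
# a large Ljung-Box p-value is consistent with white-noise residuals
checkresiduals(fit)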
fcast <- forecast(fit, h=3)
fcast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.458450 2.195194 2.721707 2.055834 2.861066
## 2006 2.477101 2.116875 2.837327 1.926183 3.028018
## 2007 2.477101 1.979272 2.974929 1.715738 3.238464
To attempt this by hand, we'll start from the simplest case, a random walk with drift, \(y_{t} = c + y_{t-1} + \epsilon_{t}\), setting the future errors \(\epsilon_{T+h} = 0\) for the predictions.
c <- fit$coef[1]
yt <- wmurders[55]
et <- 0
f2005 <- c + yt + et
f2006 <- c + f2005
f2007 <- c + f2006
c(f2005, f2006, f2007)
## ma1 ma1 ma1
## 2.523390 2.457397 2.391403
Except the calculations don’t line up with the model. Presumably, while we know \(y_t\), as it is the last known value from wmurders, we are missing the last, corresponding \(\epsilon_{T}\) value.
autoplot(fcast)
Does auto.arima() give the same model you have chosen? If not, which model do you think is better?
autofit <- auto.arima(wmurders, seasonal = FALSE,
stepwise = FALSE, approximation = FALSE)
autofit
## Series: wmurders
## ARIMA(0,2,3)
##
## Coefficients:
## ma1 ma2 ma3
## -1.0154 0.4324 -0.3217
## s.e. 0.1282 0.2278 0.1737
##
## sigma^2 estimated as 0.04475: log likelihood=7.77
## AIC=-7.54 AICc=-6.7 BIC=0.35
autoplot(autofit$residuals)
Truthfully, given that I'm still very new to ARIMA, auto.arima() is probably the safer choice. Objectively, though, the AIC, AICc and BIC reported for its ARIMA(0,2,3) are actually higher than those of our ARIMA(0,1,2); and because the two models use different orders of differencing, the information criteria aren't directly comparable in any case.
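One hedged way to compare the two models directly is time-series cross-validation of one-step forecast errors, a sketch using tsCV() from the forecast package:
# refit each candidate on expanding windows and compare one-step RMSE
f012 <- function(x, h) forecast(Arima(x, order = c(0, 1, 2)), h = h)
f023 <- function(x, h) forecast(Arima(x, order = c(0, 2, 3)), h = h)
e012 <- tsCV(wmurders, f012, h = 1)
e023 <- tsCV(wmurders, f023, h = 1)
sqrt(mean(e012^2, na.rm = TRUE))  # one-step RMSE for ARIMA(0,1,2)
sqrt(mean(e023^2, na.rm = TRUE))  # one-step RMSE for ARIMA(0,2,3)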