require(fpp2)

8.1. Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

Figure 8.31: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

The figures differ mainly in the width of the critical-value bounds and in the size of the sample autocorrelations: both shrink as the number of observations grows. All three are consistent with white noise, since in each plot almost all of the spikes lie within the bounds.

b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values are at ±1.96/√T, where T is the number of observations, so their distance from the mean of zero shrinks as the series gets longer. The autocorrelations differ in each figure because each is computed from a different random sample, and the sampling variability of the estimated autocorrelations also decreases as T grows.
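As a quick check, these bounds can be computed directly from the ±1.96/√T approximation that the ACF plots use:

# Approximate 95% bounds: about 0.33 for T = 36, 0.10 for T = 360, 0.06 for T = 1000
1.96 / sqrt(c(36, 360, 1000))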

8.2. A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

autoplot(ibmclose)

The time plot shows that the data are non-stationary: the level changes over time, with a pronounced downward trend in the latter part of the series, so the mean is not constant.

ggAcf(ibmclose)

The ACF decays very slowly from values near 1, which also indicates that the series is non-stationary.

ggPacf(ibmclose)

The PACF has a single large spike close to 1 at lag 1, again indicating non-stationarity; the series should be differenced.
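As a further check (a quick sketch, output not shown), ndiffs() estimates how many differences are needed, and plotting the differenced series should show something close to white noise:

ndiffs(ibmclose)            # estimated number of first differences required
ggtsdisplay(diff(ibmclose)) # the differenced series should look stationary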

8.3. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a. usnetelec

autoplot(usnetelec)

(a <- BoxCox.lambda(usnetelec))
## [1] 0.5167714
ggtsdisplay(BoxCox(usnetelec,a))

The transformed series is still non-stationary, but the trend is roughly linear, so one order of differencing should be enough.

ggtsdisplay(diff(BoxCox(usnetelec, a)))

After one difference the series looks stationary: the ACF and PACF spikes lie within the critical values.
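As a cross-check (output not shown), ndiffs() on the transformed series should confirm that a single difference suffices:

ndiffs(BoxCox(usnetelec, a))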

b. usgdp

autoplot(usgdp)

(b <- BoxCox.lambda(usgdp))
## [1] 0.366352
ggtsdisplay(BoxCox(usgdp, b))

The transformed series is still non-stationary, but the trend is roughly linear, so one order of differencing should be enough.

ggtsdisplay(diff(BoxCox(usgdp, b)))

After one difference the series looks stationary: the ACF and PACF spikes lie almost entirely within the critical values.

c. mcopper

autoplot(mcopper)

(c <- BoxCox.lambda(mcopper))
## [1] 0.1919047
ggtsdisplay(BoxCox(mcopper, c))

ggtsdisplay(diff(BoxCox(mcopper, c)))

After one difference the series looks stationary: apart from a significant spike at lag 1 in the ACF and PACF, the autocorrelations lie within the critical values, so one difference is sufficient.
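For an additional check, a KPSS unit-root test can be run on the differenced series, as in the fpp2 text; this assumes the urca package is installed. A test statistic below the critical values is consistent with stationarity:

library(urca)
diff(BoxCox(mcopper, c)) %>% ur.kpss() %>% summary()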

d. enplanements

autoplot(enplanements)

(d <- BoxCox.lambda(enplanements))
## [1] -0.2269461
d1 <- BoxCox(enplanements, d)
ggtsdisplay(d1)

The ACF and PACF show large spikes at lag 12 and its multiples, reflecting a monthly seasonal pattern (the series rises and falls at the same times each year), so a seasonal difference at lag 12 is taken first.

d1 %>% diff(lag=12) %>% ggtsdisplay()

The seasonally differenced series still looks non-stationary, so it should be differenced one more time.

d1 %>% diff(lag=12) %>% diff() %>% ggtsdisplay()

This looks like a big improvement: the remaining ACF and PACF spikes are much closer to 0, so the doubly differenced series appears stationary.
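The choice of one seasonal and one first difference can be sanity-checked with nsdiffs() and ndiffs() (output not shown):

nsdiffs(d1)                 # seasonal differences suggested for the transformed series
ndiffs(diff(d1, lag = 12))  # first differences suggested after seasonal differencing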

e. visitors

autoplot(visitors)

(e <- BoxCox.lambda(visitors))
## [1] 0.2775249
e1 <- BoxCox(visitors, e)
ggtsdisplay(e1)

As with enplanements, the ACF and PACF show large spikes at lag 12 and its multiples from the monthly seasonal pattern, so a seasonal difference at lag 12 is taken first.

e1 %>% diff(lag=12) %>% ggtsdisplay()

The seasonally differenced series still looks non-stationary, so it should be differenced one more time.

e1 %>% diff(lag=12) %>% diff() %>% ggtsdisplay()

Again a clear improvement: the remaining ACF and PACF spikes are much closer to 0, and the doubly differenced series appears stationary.
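A Ljung-Box test on the doubly differenced series gives another view; note that a small p-value here would only mean some autocorrelation remains for an ARMA model to capture, not that more differencing is needed:

Box.test(diff(diff(e1, lag = 12)), lag = 24, type = "Ljung-Box")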

8.5. For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
myts <- ts(retaildata[,"A3349352V"], frequency=12, start=c(1982,4))
autoplot(myts) 

ggtsdisplay(myts)

(five <- BoxCox.lambda(myts))
## [1] 0.1738794
fiver <- BoxCox(myts, five)
ggtsdisplay(fiver)

Since the data are monthly and strongly seasonal, with spikes at every 12th lag, a seasonal difference at lag 12 is taken first.

fiver %>% diff(lag=12) %>% ggtsdisplay()

Let's see how it looks if it is differenced once more.

fiver %>% diff(lag=12) %>% diff() %>% ggtsdisplay()

This is a big improvement: the remaining ACF and PACF spikes are much closer to 0, so the doubly differenced series appears stationary.
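As a final check (output not shown), ndiffs() applied to the doubly differenced series should return 0 if no further differencing is required:

fiver %>% diff(lag = 12) %>% diff() %>% ndiffs()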

8.6. Use R to simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with ϕ1 = 0.6 and σ^2 = 1. The process starts with y_1 = 0.

set.seed(123)
# AR(1): y_t = phi * y_{t-1} + e_t, with y_1 = 0 and e_t ~ N(0, 1)
tsAR1 <- function(phi) {
  y <- ts(numeric(100))
  e <- rnorm(100)
  for (i in 2:100)
    y[i] <- phi * y[i - 1] + e[i]
  return(y)
}

b. Produce a time plot for the series. How does the plot change as you change ϕ1?

As ϕ1 increases towards 1, the series becomes smoother and wanders away from zero for longer stretches; for small ϕ1 it looks much like white noise. The plots below show ϕ1 = 0.6, 0.1, 0.8 and 0.9.

ggtsdisplay(tsAR1(0.6))

ggtsdisplay(tsAR1(0.1))

ggtsdisplay(tsAR1(0.8))

ggtsdisplay(tsAR1(0.9))

c. Write your own code to generate data from an MA(1) model with θ1 = 0.6 and σ^2 = 1.

# MA(1): y_t = e_t + theta * e_{t-1}, with e_t ~ N(0, 1)
tsMA1 <- function(theta) {
  y <- ts(numeric(100))
  e <- rnorm(100)
  for (i in 2:100)
    y[i] <- theta * e[i - 1] + e[i]
  return(y)
}

d. Produce a time plot for the series. How does the plot change as you change θ1?

Changing θ1 mainly changes the correlation between consecutive observations: larger values of θ1 make neighbouring points more alike, but unlike the AR(1) case the series never drifts far from its mean. The plots below show θ1 = 0.6, 0.1, 0.8 and 1.

ggtsdisplay(tsMA1(0.6))

ggtsdisplay(tsMA1(0.1))

ggtsdisplay(tsMA1(0.8))

ggtsdisplay(tsMA1(1))

e. Generate data from an ARMA(1,1) model with ϕ1 = 0.6, θ1 = 0.6, and σ^2=1 .

# ARMA(1,1): y_t = phi * y_{t-1} + theta * e_{t-1} + e_t
tsARMA1 <- function(phi, theta) {
  y <- ts(numeric(100))
  e <- rnorm(100)
  for (i in 2:100)
    y[i] <- phi * y[i - 1] + theta * e[i - 1] + e[i]
  return(y)
}

f. Generate data from an AR(2) model with ϕ1 = -0.8, ϕ2 = 0.3, and σ^2 = 1. (Note that these parameters will give a non-stationary series.)

# AR(2): y_t = phi_1 * y_{t-1} + phi_2 * y_{t-2} + e_t (needs two starting values)
tsAR2 <- function(phi_1, phi_2) {
  y <- ts(numeric(100))
  e <- rnorm(100)
  for (i in 3:100)
    y[i] <- phi_1 * y[i - 1] + phi_2 * y[i - 2] + e[i]
  return(y)
}

g. Graph the latter two series and compare them.

ggtsdisplay(tsARMA1(.6,.6))

ggtsdisplay(tsAR2(-0.8,0.3))

The ARMA(1,1) series fluctuates around a mean of zero and looks stationary. The AR(2) series starts near zero but, because the parameters violate the stationarity conditions, its oscillations grow exponentially over time, alternating in sign since ϕ1 is negative.
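To compare them directly, the two simulated series can also be drawn in stacked panels (a small sketch; faceting keeps the exploding AR(2) series from hiding the ARMA(1,1) one, and each call draws new random numbers):

autoplot(cbind("ARMA(1,1)" = tsARMA1(0.6, 0.6), "AR(2)" = tsAR2(-0.8, 0.3)),
         facets = TRUE)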

8.7. Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

a. By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.

autoplot(wmurders)

ggtsdisplay(wmurders)

ndiffs(wmurders)
## [1] 2

ndiffs() suggests two differences, but the ACF/PACF of the first-differenced series is examined below before settling on the order.

ggtsdisplay(diff(wmurders))

For the differenced series, the PACF suggests an AR(2) component, while the ACF equally supports an MA(2), so the candidate models are ARIMA(2,1,0) and ARIMA(0,1,2).

b. Should you include a constant in the model? Explain.

No. With d = 1, a constant would introduce drift, giving forecasts that trend steadily in one direction; the series shows no such long-run drift, so the constant is left out.
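For reference only (not part of the chosen model), a drift term could be added through the include.drift argument of Arima() if the series did trend:

# Illustration: how a constant/drift would enter; not used for the final model
Arima(wmurders, order = c(0,1,2), include.drift = TRUE)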

c. Write this model in terms of the backshift operator.

For the ARIMA(0,1,2) model with no constant:

\[(1 - B)y_{t} = (1 + \theta_{1}B + \theta_{2}B^{2})\varepsilon_{t}\]

d. Fit the model using R and examine the residuals. Is the model satisfactory?

(fit1 <- Arima(wmurders, order = c(2,1,0)))
## Series: wmurders 
## ARIMA(2,1,0) 
## 
## Coefficients:
##           ar1     ar2
##       -0.0572  0.2967
## s.e.   0.1277  0.1275
## 
## sigma^2 estimated as 0.04265:  log likelihood=9.48
## AIC=-12.96   AICc=-12.48   BIC=-6.99
(fit2 <- Arima(wmurders, order = c(0,1,2)))
## Series: wmurders 
## ARIMA(0,1,2) 
## 
## Coefficients:
##           ma1     ma2
##       -0.0660  0.3712
## s.e.   0.1263  0.1640
## 
## sigma^2 estimated as 0.0422:  log likelihood=9.71
## AIC=-13.43   AICc=-12.95   BIC=-7.46

ARIMA(0,1,2) has the lower AICc (-12.95 vs -12.48) and a slightly smaller sigma^2, so it is the better of the two candidates.

checkresiduals(fit2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,2)
## Q* = 9.7748, df = 8, p-value = 0.2812
## 
## Model df: 2.   Total lags used: 10

The residuals resemble white noise and the Ljung-Box p-value (0.2812) is well above 0.05, so there is no significant residual autocorrelation: the model is satisfactory.

e. Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

forecast(fit2, h=3)
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.458450 2.195194 2.721707 2.055834 2.861066
## 2006       2.477101 2.116875 2.837327 1.926183 3.028018
## 2007       2.477101 1.979272 2.974929 1.715738 3.238464

Checking the forecasts by hand for the ARIMA(0,1,2) model. With no constant, the model is

\[y_{t} = y_{t-1} + \varepsilon_{t} + \theta_{1}\varepsilon_{t-1} + \theta_{2}\varepsilon_{t-2}\]

Setting future errors to zero and replacing past errors with the residuals e_t gives

\[\hat{y}_{T+1} = y_{T} + \theta_{1}e_{T} + \theta_{2}e_{T-1}\] \[\hat{y}_{T+2} = \hat{y}_{T+1} + \theta_{2}e_{T}\] \[\hat{y}_{T+3} = \hat{y}_{T+2}\]

res <- resid(fit2)    # model residuals, used as estimates of the errors
len <- length(res)
et <- res[len]        # e_T, the last residual
et_1 <- res[len - 1]  # e_{T-1}
ma1 <- coef(fit2)["ma1"]  # theta_1 estimate
ma2 <- coef(fit2)["ma2"]  # theta_2 estimate

\[\hat{y}_{T+1} = y_{T} + (-0.0660)(0.0502) + (0.3712)(-0.3438) \approx 2.4585\]

yt1 <- wmurders[length(wmurders)] + ma1 * et + ma2 * et_1  # h = 1
yt2 <- yt1 + ma2 * et  # h = 2: only the theta_2 term remains
yt3 <- yt2             # h = 3: no error terms left, forecast is unchanged

h <- data.frame(c(yt1,yt2,yt3))
colnames(h) <- c("Forecasts h=3")
h
##   Forecasts h=3
## 1      2.458450
## 2      2.477101
## 3      2.477101
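
These match the point forecasts produced by forecast(), which can be extracted directly:

forecast(fit2, h = 3)$mean  # should equal the hand-calculated values above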

f. Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

autoplot(forecast(fit2, h=3))

g. Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

auto.arima(wmurders)
## Series: wmurders 
## ARIMA(1,2,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.2434  -0.8261
## s.e.   0.1553   0.1143
## 
## sigma^2 estimated as 0.04632:  log likelihood=6.44
## AIC=-6.88   AICc=-6.39   BIC=-0.97
(fit2 <- Arima(wmurders, order = c(0,1,2)))
## Series: wmurders 
## ARIMA(0,1,2) 
## 
## Coefficients:
##           ma1     ma2
##       -0.0660  0.3712
## s.e.   0.1263  0.1640
## 
## sigma^2 estimated as 0.0422:  log likelihood=9.71
## AIC=-13.43   AICc=-12.95   BIC=-7.46
autoplot(forecast(auto.arima(wmurders)))

auto.arima() selects ARIMA(1,2,1), which differs from the model chosen above. The AIC values are not directly comparable because the two models use different orders of differencing, but the ARIMA(0,1,2) model passed its residual checks and has a smaller sigma^2, so the chosen model is preferred.