Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
#### a. Explain the differences among these figures. Do they all indicate that the data are white noise?
Yes, the figures all indicate white noise, as the autocorrelations fall within the critical values. We can also see that as the sample size increases, the critical values decrease.
#### b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
Both ACF and PACF use the same critical values of \(\pm 1.96/\sqrt{T}\), where \(T\) is the length of the series, so as \(T\) increases the critical values shrink towards zero. The autocorrelations differ across the figures because each is estimated from a different random sample: sampling variation is larger in small samples, so the estimated autocorrelations lie further from zero for \(T=36\) than for \(T=1000\), even though the true autocorrelations are zero in every case.
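As a quick check of the first claim, the bounds for the three sample sizes can be computed directly (a minimal sketch in base R):
# 95% bounds for the ACF of white noise are +/- 1.96/sqrt(T)
n <- c(36, 360, 1000)
round(1.96 / sqrt(n), 3)
## [1] 0.327 0.103 0.062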
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
library(fpp2)       # forecast functions and the data sets used below (assumed setup)
library(urca)       # ur.kpss()
library(gridExtra)  # grid.arrange()
ggtsdisplay(ibmclose)
As is evident from the time series plot, there is a trend and there are changing levels in the data, indicating that the series is not stationary. Further, the ACF plot shows values of \(r_k\) that are large and positive and decrease only slowly, whereas for stationary data the ACF drops to zero relatively quickly. Lastly, the PACF has a single large spike close to 1 at lag 1, which is characteristic of a random walk. Differencing would help stabilise the mean by removing the aforementioned changes in level and trend.
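As a quick numerical confirmation, the number of differences needed can be checked with ndiffs(), and the differenced series tested with KPSS (a sketch, reusing the packages loaded above):
ndiffs(ibmclose)                                # suggested number of first differences
ibmclose %>% diff() %>% ur.kpss() %>% summary() # KPSS test on the differenced series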
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
#### a. usnetelec
lambda <- BoxCox.lambda(usnetelec)        # automatically selected Box-Cox parameter
transformed <- BoxCox(usnetelec, lambda)  # apply the transformation
a <- autoplot(usnetelec, series = 'Original')
b <- autoplot(transformed, series = 'Transformed')
grid.arrange(a, b, ncol = 1)
transformed %>% diff() %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.4315
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The appropriate Box-Cox lambda is 0.5167714, and one difference is required to make the data stationary, as confirmed by the KPSS test (the test statistic of 0.4315 is below the 5% critical value of 0.463).
#### b. usgdp
lambda <- BoxCox.lambda(usgdp)
transformed <- BoxCox(usgdp,lambda)
a <- autoplot(usgdp, series = 'Original')
b <- autoplot(transformed, series='Transformed')
grid.arrange(a, b, ncol = 1)
transformed %>% diff() %>% diff() %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.012
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The appropriate Box-Cox lambda is 0.366352, and two differences are required to make the data stationary, as confirmed by the KPSS test.
#### c. mcopper
lambda <- BoxCox.lambda(mcopper)
transformed <- BoxCox(mcopper,lambda)
a <- autoplot(mcopper, series = 'Original')
b <- autoplot(transformed, series='Transformed')
grid.arrange(a, b, ncol = 1)
transformed %>% diff() %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 6 lags.
##
## Value of test-statistic is: 0.0573
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The appropriate Box-Cox lambda is 0.1919047, and one difference is required to make the data stationary, as confirmed by the KPSS test.
#### d. enplanements
lambda <- BoxCox.lambda(enplanements)
transformed <- BoxCox(enplanements,lambda)
a <- autoplot(enplanements, series = 'Original')
b <- autoplot(transformed, series='Transformed')
grid.arrange(a, b, ncol = 1)
transformed %>% diff() %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0151
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The appropriate Box-Cox lambda is -0.2269461, and one difference is required to make the data stationary, as confirmed by the KPSS test. (The series is also strongly seasonal, so a seasonal difference could be considered as well.)
#### e. visitors
lambda <- BoxCox.lambda(visitors)
transformed <- BoxCox(visitors,lambda)
a <- autoplot(visitors, series = 'Original')
b <- autoplot(transformed, series='Transformed')
grid.arrange(a, b, ncol = 1)
transformed %>% diff() %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.0519
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The appropriate Box-Cox lambda is 0.2775249, and one difference is required to make the data stationary, as confirmed by the KPSS test.
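The per-series steps above can also be run compactly in a loop (a sketch; output omitted, and note that ndiffs() applies its own unit-root test internally, so it may occasionally disagree with a manual KPSS check):
series_list <- list(usnetelec = usnetelec, usgdp = usgdp, mcopper = mcopper,
                    enplanements = enplanements, visitors = visitors)
for (nm in names(series_list)) {
  lambda <- BoxCox.lambda(series_list[[nm]])     # Box-Cox parameter
  d <- ndiffs(BoxCox(series_list[[nm]], lambda)) # differences to stationarity
  cat(nm, ": lambda =", round(lambda, 4), ", d =", d, "\n")
}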
For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
download.file('https://otexts.com/fpp2/extrafiles/retail.xlsx','retail.xlsx')
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349398A"],
           frequency=12, start=c(1982,4))
ggtsdisplay(myts)
The variance seems to increase with the level of the series, so we will Box-Cox transform the retail data as before, using lambda equal to 0.1231563. Further, because of the seasonality in the data, we will also perform seasonal differencing in addition to ordinary differencing.
lambda <- BoxCox.lambda(myts)
transformed <- BoxCox(myts,lambda)
ndiffs(transformed)
## [1] 1
transformed %>% diff() %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0117
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Using the ndiffs() function, one difference is required, and the KPSS test on the differenced series confirms stationarity.
ggtsdisplay(transformed, main='Transformed')
ggtsdisplay(diff(transformed), main='Differencing')
After performing the above, however, the series still appears non-stationary, despite the KPSS test indicating otherwise. A different approach, log transforming the data and applying both seasonal and ordinary differencing, achieves a better outcome, as evident from the bottom panel of the chart below, which resembles white noise.
cbind('Original' = myts,
      'Logs' = log(myts),
      'Seasonally adjusted logs' = diff(log(myts), 12),
      'Double differenced logs' = diff(diff(log(myts), 12), 1)) %>%
  autoplot(facets=TRUE) +
  xlab("year") + ylab("")
Use the following R code to generate data from an AR(1) model with \(\phi_{1}=0.6\) and \(\sigma^{2}=1\). The process starts with \(y_{1}=0\).
Autoregression model uses linear combination of past values of the variable and can be written as \(y_{t} = c + \phi_{1}y_{t-1} + \phi_{2}y_{t-2} + ... + \phi_{p}y_{t-p} + e_{t}\)
set.seed(1)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
a <- autoplot(y, main='Phi = 0.6')

set.seed(1)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0*y[i-1] + e[i]
b <- autoplot(y, main='Phi = 0')

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- -0.6*y[i-1] + e[i]
c <- autoplot(y, main='Phi = -0.6')

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.9*y[i-1] + e[i]
d <- autoplot(y, main='Phi = 0.9')

grid.arrange(a, b, c, d, ncol = 1)
\(\phi = 0\) is equivalent to white noise, since essentially no weight is given to past values. A value of \(\phi\) close to 1 produces behaviour close to a random walk, since each observation carries almost the full weight of the previous one. A negative \(\phi\) makes the series keep bouncing up and down around the mean, as successive values tend to have opposite signs.
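The effect of \(\phi\) on variability can also be checked against theory: an AR(1) with unit innovation variance has standard deviation \(1/\sqrt{1-\phi^{2}}\). A quick sketch using stats::arima.sim (which requires \(|\phi| < 1\)):
set.seed(1)
sapply(c(0.3, 0.6, 0.9), function(phi)
  sd(arima.sim(model = list(ar = phi), n = 5000)))
# theoretical values: 1/sqrt(1 - c(0.3, 0.6, 0.9)^2) = 1.048, 1.250, 2.294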
Moving average model uses past forecast errors in a regression-like model and can be written as \(y_{t} = c + e_{t} + \theta_{1}e_{t-1} + \theta_{2}e_{t-2} + ... + \theta_{q}e_{t-q}\)
y <- ts(numeric(100))
e <- rnorm(100)
theta <- 0.6
for(i in 2:100)
  y[i] <- theta*e[i-1] + e[i]
a <- autoplot(y, main=theta)

y <- ts(numeric(100))
e <- rnorm(100)
theta <- -0.4
for(i in 2:100)
  y[i] <- theta*e[i-1] + e[i]
b <- autoplot(y, main=theta)

y <- ts(numeric(100))
e <- rnorm(100)
theta <- 0.9
for(i in 2:100)
  y[i] <- theta*e[i-1] + e[i]
c <- autoplot(y, main=theta)

grid.arrange(a, b, c, ncol = 1)
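Analogously to the AR case, an MA(1) with unit innovation variance has standard deviation \(\sqrt{1+\theta^{2}}\), so larger \(|\theta|\) yields a more variable series; a quick sketch with arima.sim:
set.seed(1)
sapply(c(-0.4, 0.6, 0.9), function(theta)
  sd(arima.sim(model = list(ma = theta), n = 5000)))
# theoretical values: sqrt(1 + c(-0.4, 0.6, 0.9)^2) = 1.077, 1.166, 1.345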
y <- ts(numeric(100))
e <- rnorm(100, sd=1)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]

y2 <- ts(numeric(100))
e2 <- rnorm(100, sd=1)
for(i in 3:100)
  y2[i] <- -0.8*y2[i-1] + 0.3*y2[i-2] + e2[i]
# ggtsdisplay() draws its plots directly and returns nothing usable,
# so the two displays are produced one after the other rather than via grid.arrange()
ggtsdisplay(y, main="ARMA(1,1): Phi = 0.6, Theta = 0.6")
ggtsdisplay(y2, main="AR(2): Phi1 = -0.8, Phi2 = 0.3")
The ARMA(1,1) series doesn't show any obvious patterns and in general appears to be stationary. The same cannot be said for the AR(2) series, which keeps oscillating with growing amplitude: the stationarity conditions for an AR(2) are violated here, since \(\phi_{2}-\phi_{1} = 0.3-(-0.8) = 1.1 > 1\).
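This can be verified via the roots of the AR characteristic polynomial \(1-\phi_{1}z-\phi_{2}z^{2}\): stationarity requires all roots to lie outside the unit circle, and here one does not.
phi1 <- -0.8; phi2 <- 0.3
Mod(polyroot(c(1, -phi1, -phi2)))  # moduli approx. 0.93 and 3.59; 0.93 < 1, so non-stationary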
Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
#### a. By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.
ggtsdisplay(wmurders)
wmurders %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.6331
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Because the data exhibit a trend and changing levels, differencing is required to make the series stationary before fitting an ARMA model. Although first-order differencing could be sufficient, we will apply second-order differencing, as it achieves a lower KPSS test statistic.
ndiffs(wmurders)
## [1] 2
wmurders %>% diff() %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.4697
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
wmurders %>% diff() %>% diff() %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.0458
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
autoplot(wmurders, main="Differencing") +
autolayer(diff(wmurders), series = 'First order') +
autolayer(diff(diff(wmurders)), series = 'Second order')
differenced <- wmurders %>% diff() %>% diff()
ggtsdisplay(differenced)
There are two significant spikes in the ACF and one in the PACF; there is also a slightly less significant spike at lag 5 in the PACF, which we can ignore as it is not in the first few lags. With the ACF decaying, this suggests an AR component, so we will try ARIMA(2,2,0).
#### b. Should you include a constant in the model? Explain.
No. Because we are using second-order differencing, including a constant would result in long-term forecasts following a quadratic trend (the constant is effectively summed twice, producing a term proportional to \(t^{2}\)), which is rarely desirable.
#### c. Write this model in terms of the backshift operator.
Since q = 0 and we include no constant, ARIMA(2,2,0) in backshift notation is \((1-\phi_{1}B-\phi_{2}B^{2})(1-B)^{2}y_{t} = e_{t}\).
#### d. Fit the model using R and examine the residuals. Is the model satisfactory?
(fit <- Arima(wmurders, order=c(2,2,0)))
## Series: wmurders
## ARIMA(2,2,0)
##
## Coefficients:
## ar1 ar2
## -0.8289 -0.2246
## s.e. 0.1346 0.1353
##
## sigma^2 estimated as 0.05292: log likelihood=3.34
## AIC=-0.68 AICc=-0.19 BIC=5.23
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,2,0)
## Q* = 21.05, df = 8, p-value = 0.007015
##
## Model df: 2. Total lags used: 10
The ACF plot of the residuals shows autocorrelations within the threshold limits; however, the residuals appear to have non-constant variance, and the Ljung-Box statistic shows a very small p-value, indicating that we should reject the null hypothesis that the residuals are independent. After adjusting the model to include q = 1, the model appears satisfactory, so we will use ARIMA(2,2,1) instead.
(fit <- Arima(wmurders, order=c(2,2,1)))
## Series: wmurders
## ARIMA(2,2,1)
##
## Coefficients:
## ar1 ar2 ma1
## -0.1247 0.2348 -0.9082
## s.e. 0.1612 0.1560 0.0866
##
## sigma^2 estimated as 0.04518: log likelihood=7.56
## AIC=-7.12 AICc=-6.28 BIC=0.76
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,2,1)
## Q* = 10.519, df = 7, p-value = 0.161
##
## Model df: 3. Total lags used: 10
#### e. Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
forecast(fit, h=3)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.494109 2.221720 2.766497 2.077526 2.910691
## 2006 2.416338 2.037396 2.795280 1.836797 2.995880
## 2007 2.331118 1.812864 2.849371 1.538517 3.123718
Rewriting the second model in terms of the backshift operator, we get \((1-\phi_{1}B-\phi_{2}B^{2})(1-B)^{2}y_{t} = (1+\theta_{1}B)e_{t}\), where \(\phi_{1}=-0.1247\), \(\phi_{2}=0.2348\) and \(\theta_{1}=-0.9082\).
Expanding the equation, we arrive at \(y_{t} = (2+\phi_{1})y_{t-1}-(1+2\phi_{1}-\phi_{2})y_{t-2}+(\phi_{1}-2\phi_{2})y_{t-3}+\phi_{2}y_{t-4}+e_{t}+\theta_{1}e_{t-1}\).
Replacing \(t\) with \(T+1\) and setting \(e_{T+1}=0\): \(y_{T+1} = 1.8753y_{T} - 0.5158y_{T-1} - 0.5943y_{T-2}+0.2348y_{T-3}-0.9082e_{T}\).
tail(wmurders, 4)
## Time Series:
## Start = 2001
## End = 2004
## Frequency = 1
## [1] 3.304467 2.797697 2.662227 2.589383
tail(residuals(fit), 1)
## Time Series:
## Start = 2004
## End = 2004
## Frequency = 1
## [1] 0.1121171
Substituting the historical values into the above formula, the point forecast for 2005 is approximately 2.494, which is very close to the 2.494109 generated by R; the small difference is rounding error in the coefficients.
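The hand calculation is easy to reproduce in R using the rounded coefficients above (a sketch; the residual value is taken from the output of tail(residuals(fit), 1)):
y  <- c(3.304467, 2.797697, 2.662227, 2.589383)  # wmurders, 2001-2004
eT <- 0.1121171                                  # last residual
1.8753*y[4] - 0.5158*y[3] - 0.5943*y[2] + 0.2348*y[1] - 0.9082*eT
## [1] 2.494086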
#### f. Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
autoplot(forecast(fit, h=3))
#### g. Does auto.arima() give the same model you have chosen? If not, which model do you think is better?
(fit <- auto.arima(wmurders, seasonal=FALSE, stepwise=FALSE, approximation=FALSE))
## Series: wmurders
## ARIMA(0,2,3)
##
## Coefficients:
## ma1 ma2 ma3
## -1.0154 0.4324 -0.3217
## s.e. 0.1282 0.2278 0.1737
##
## sigma^2 estimated as 0.04475: log likelihood=7.77
## AIC=-7.54 AICc=-6.7 BIC=0.35
No, auto.arima() suggests ARIMA(0,2,3). In this particular case, auto.arima() seems to perform better, as it achieves a lower AIC of -7.54 versus the manually chosen ARIMA(2,2,1)'s AIC of -7.12.