1 1. Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

1.1 a. Explain the differences among these figures. Do they all indicate that the data are white noise?

Ketiga gambar tersebut menunjukkan bahwa data tersebut adalah white noise, karena semua batang ACF berada di dalam garis putus-putus yang menunjukkan nilai kritis agar ACF dianggap signifikan secara statistik. Untuk data dengan jumlah sampel yang lebih kecil, batang ACF lebih tinggi daripada data dengan jumlah sampel yang lebih banyak. Karena data dibuat secara acak, mereka terdistribusi secara independen dan identik, dan oleh karena itu harus memiliki autokorelasi nol untuk semua kelambatannya. Hal ini ditunjukkan oleh gambar - semakin banyak sampel yang dihasilkan, autokorelasi cenderung ke arah nol.

1.2 b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

Garis putus-putus ini diperkirakan menggunakan ± 1.96 / N −− √ dengan pusat nol. Secara matematis, semakin besar N, nilai absolut dari nilai kritis menjadi semakin kecil. Secara statistik, ini berarti “lebih mudah” untuk kumpulan data yang lebih kecil menunjukkan korelasi secara kebetulan daripada kumpulan data yang lebih besar. Jadi untuk mengimbanginya, nilai absolut dari nilai kritis lebih besar untuk kumpulan data yang lebih kecil - Anda harus memiliki autokorelasi yang lebih tinggi untuk menolak hipotesis nol bahwa autokorelasi adalah nol (yaitu autokorelasi yang diamati hanya karena kebetulan).

2 2. A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

ggtsdisplay(gafa_stock$Close)

ndiffs(gafa_stock$Close)
## [1] 1
Gafadiff1 <- diff(gafa_stock$Close)

ggtsdisplay(Gafadiff1)

3 3. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

3.1 a. Turkish GDP from global_economy.

TurkishGDP <- global_economy %>% filter(Country == "Turkey")
ggtsdisplay(TurkishGDP$GDP)

Global_BC <- BoxCox(TurkishGDP$GDP, lambda = BoxCox.lambda(TurkishGDP$GDP))
ndiffs(Global_BC)
## [1] 1
Global1 <- diff(Global_BC)

ggtsdisplay(Global1)

3.2 b. Accommodation takings in the state of Tasmania from aus_accommodation.

Tasmania1 <- aus_accommodation %>% filter(State == "Tasmania")
ggtsdisplay(Tasmania1$Takings)

Tasmania_BC <- BoxCox(Tasmania1$Takings, lambda = BoxCox.lambda(Tasmania1$Takings))
ndiffs(Tasmania_BC)
## [1] 1
Tasmaniadiff <- diff(Tasmania_BC)

ggtsdisplay(Tasmaniadiff)

3.3 c. Monthly sales from souvenirs.

Souvenirs1 <- souvenirs
ggtsdisplay(Souvenirs1$Sales)

Souvenirs_BC <- BoxCox(Souvenirs1$Sales, lambda = BoxCox.lambda(Souvenirs1$Sales))
ndiffs(Souvenirs_BC)
## [1] 1
Souvenirsdiff <- diff(Souvenirs_BC)

ggtsdisplay(Souvenirsdiff)

4 4 For the souvenirs data, write down the differences you chose above using backshift operator notation.`

5 5. For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

retaildata <- readxl::read_excel("Retail.xlsx")
myts <- ts(retaildata[,"InvoiceDate"],frequency = 12, start = c(1982,4))
autoplot(myts)

myts_BC <- BoxCox(myts, lambda = BoxCox.lambda(myts))
ggtsdisplay(myts_BC)

myts_BC1 <- myts_BC %>% diff()

myts_BC1 %>% ndiffs()
## [1] 1
myts_BC1 %>% ggtsdisplay()

myts_BC2 <- myts_BC1 %>% diff(lag = 12)

myts_BC2 %>% ggtsdisplay()

ndiffs(myts_BC2)
## [1] 0
nsdiffs(myts_BC2)
## [1] 0

6 6. Simulate and plot some data from simple ARIMA models.

6.1 a. Use the following R code to generate data from an AR(1) model with \(phi_1=0.6\) and \(sigma^2=1\). The process starts with \(y_1=0\)

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

6.2 b. Produce a time plot for the series. How does the plot change as you change \(phi_1\)?

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
p1 <- autoplot(y) +ggtitle("phi=0.6")


for(i in 2:100)
  y[i] <- 0.1*y[i-1] + e[i]
p2<- autoplot(y)+ggtitle(" phi=0.1")




for(i in 2:100)
  y[i] <- 1.0*y[i-1] + e[i]
p3<- autoplot(y)+ggtitle("phi= 1.0")

p1

p2

p3

6.3 c. Write your own code to generate data from an MA(1) model with \(theta_1=0.6\) and \(sigma^2=1\).

ma.1 <- function(theta, sd=1, n=100){
  y <- ts(numeric(n))
  e <- rnorm(n, sd=sd)
  e[1] <- 0
  for(i in 2:n)
    y[i] <- theta*e[i-1] + e[i]
  return(y)
}

6.4 d. Produce a time plot for the series. How does the plot change as you change \(theta_1\)?

data.list <- list()
i <- 0
theta.vec <- c(0.0000001, 0.0001, 0.1) * 6
for (theta in theta.vec){
  i <- i + 1
  data.list[[i]] <- ma.1(theta)
}
gen.data <- do.call(cbind, data.list)
colnames(gen.data) <- paste('theta=', theta.vec, sep = '')
autoplot(gen.data) + ylab('Data')

6.5 e. Generate data from an ARMA(1,1) model with \(phi_1=0.6\), \(theta_1=0.6\) and \(sigma^2=1\)

y1 <- ts(numeric(100))
e <- rnorm(100, sd=1)
for(i in 2:100)
  y1[i] <- 0.6*y1[i-1] + 0.6*e[i-1] + e[i]
autoplot(y1) +
  ggtitle('ARMA(1,1) Generated Data')

6.6 f. Generate data from an AR(2) model with \(theta_1=-0.8\), \(theta_2=0.3\) and \(sigma^2=1\). (Note that these parameters will give a non-stationary series.)

y2 <- ts(numeric(100))
e <- rnorm(100, sd=1)
for(i in 3:100)
  y2[i] <- -0.8*y2[i-1] + 0.3*y2[i-2] + e[i]
autoplot(y2) +
  ggtitle('AR(2) Generated Data')

## g. Graph the latter two series and compare them.

par(mfrow=c(1,2))
acf(y1, main='ARMA(1,1) Generated Data')
acf(y2, main='AR(2) Generated Data')

# 7. Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011. ## a. Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods. ## b. Write the model in terms of the backshift operator. ## c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a. ## d. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens. ## e. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

7 8. For the United States GDP series (from global_economy):

data(global_economy)
plot(global_economy)

7.1 a. if necessary, find a suitable Box-Cox transformation for the data;

lam <- BoxCox.lambda(global_economy$GDP)
global.bcx <- BoxCox(global_economy$GDP, lambda = lam)
tsdisplay(global.bcx)

7.2 b. fit a suitable ARIMA model to the transformed data using ARIMA();

fit <- auto.arima(global_economy$GDP, trace = TRUE, ic ="aic", lambda = lam)
## 
##  Fitting models using approximations to speed things up...
## 
##  ARIMA(2,0,2) with non-zero mean : Inf
##  ARIMA(0,0,0) with non-zero mean : 87508.09
##  ARIMA(1,0,0) with non-zero mean : 45033.15
##  ARIMA(0,0,1) with non-zero mean : 36186.37
##  ARIMA(0,0,0) with zero mean     : 124043.8
##  ARIMA(1,0,1) with non-zero mean : 7587.691
##  ARIMA(2,0,1) with non-zero mean : Inf
##  ARIMA(1,0,2) with non-zero mean : Inf
##  ARIMA(0,0,2) with non-zero mean : 29097
##  ARIMA(2,0,0) with non-zero mean : 45122.56
##  ARIMA(1,0,1) with zero mean     : Inf
## 
##  Now re-fitting the best model(s) without approximations...
## 
##  ARIMA(1,0,1) with non-zero mean : 46053.52
## 
##  Best model: ARIMA(1,0,1) with non-zero mean
fit
## Series: global_economy$GDP 
## ARIMA(1,0,1) with non-zero mean 
## Box Cox transformation: lambda= 0.04769931 
## 
## Coefficients:
##          ar1     ma1     mean
##       0.9828  0.0621  44.7575
## s.e.  0.0015  0.0089   0.8371
## 
## sigma^2 estimated as 2.781:  log likelihood=-23022.76
## AIC=46053.52   AICc=46053.53   BIC=46084.03

7.3 c. try some other plausible models by experimenting with the orders chosen;

fit2 <- Arima(global_economy$GDP, order = c(0,1,2), lambda = lam)
fit3 <- Arima(global_economy$GDP, order = c(1,1,2), lambda = lam)

accuracy(fit)
##                        ME         RMSE          MAE       MPE     MAPE     MASE
## Training set 103909788575 1.051557e+12 165545654179 -3023.578 3039.788 1.542086
##                   ACF1
## Training set 0.2316001
accuracy(fit2)
##                       ME         RMSE          MAE       MPE     MAPE      MASE
## Training set 19358979256 958061637739 103255710294 -3301.808 3318.423 0.9618445
##                      ACF1
## Training set -0.005642225
accuracy(fit3)
##                       ME         RMSE          MAE       MPE     MAPE      MASE
## Training set 14526564856 953953513558 100406947882 -3297.973 3314.469 0.9353078
##                     ACF1
## Training set -0.01186029

7.4 d. choose what you think is the best model and check the residual diagnostics;

plot(fit$residuals)

7.5 e. produce forecasts of your fitted model. Do the forecasts look reasonable?

fcst <- forecast(fit, h = 10)
plot(fcst)

7.6 f. compare the results with what you would obtain using ETS() (with no transformation).

fit4 <- ets(global_economy$GDP); fit4
## Warning in ets(global_economy$GDP): Missing values encountered. Using longest
## contiguous portion of time series
## ETS(M,N,N) 
## 
## Call:
##  ets(y = global_economy$GDP) 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##     l = 44237333811.1732 
## 
##   sigma:  0.6476
## 
##      AIC     AICc      BIC 
## 24919.44 24919.50 24931.65
fcst2 <- forecast(fit4, h = 10)
plot(fcst2)

8 9. Consider aus_arrivals, the quarterly number of international visitors to Australia from several countries for the period 1981 Q1 – 2012 Q3.

8.1 a. Select one country and describe the time plot.

8.2 b. Use differencing to obtain stationary data.

8.3 c. What can you learn from the ACF graph of the differenced data?

8.4 d. What can you learn from the PACF graph of the differenced data?

8.5 e. What model do these graphs suggest?

8.6 f.Does ARIMA() give the same model that you chose? If not, which model do you think is better?

8.7 g. Write the model in terms of the backshift operator, then without using the backshift operator.

9 10. Choose a series from us_employment, the total employment in different industries in the United States.

9.1 a.Produce an STL decomposition of the data and describe the trend and seasonality.

9.2 b. Do the data need transforming? If so, find a suitable transformation.

9.3 c. Are the data stationary? If not, find an appropriate differencing which yields stationary data.

9.4 d. Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AICc values?

9.5 e. Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.

9.6 f. Forecast the next 3 years of data. Get the latest figures from https://fred.stlouisfed.org/categories/11 to check the accuracy of your forecasts.

9.7 g. Eventually, the prediction intervals are so wide that the forecasts are not particularly useful. How many years of forecasts do you think are sufficiently accurate to be usable?

10 11. Choose one of the following seasonal time series: the Australian production of electricity, cement, or gas (from aus_production).

10.1 a. Do the data need transforming? If so, find a suitable transformation.

10.2 b. Are the data stationary? If not, find an appropriate differencing which yields stationary data.

10.3 c. Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values?

10.4 d. Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.

10.5 e. Forecast the next 24 months of data using your preferred model.

10.6 f. Compare the forecasts obtained using ETS().

11 12. For the same time series you used in the previous exercise, try using a non-seasonal model applied to the seasonally adjusted data obtained from STL. Compare the forecasts with those obtained in the previous exercise. Which do you think is the best approach?

12 13. For the Australian tourism data (from tourism):

12.1 a. Fit ARIMA models for each time series.

12.2 b. Produce forecasts of your fitted models.

12.3 c. Check the forecasts for the “Snowy Mountains” and “Melbourne” regions. Do they look reasonable?

13 14. For your retail time series (Exercise 5 above):

13.1 a. develop an appropriate seasonal ARIMA model;

13.2 b. compare the forecasts with those you obtained in earlier chapters;

13.3 c. Obtain up-to-date retail data from the ABS website (Cat 8501.0, Table 11), and compare your forecasts with the actual numbers. ## d. How good were the forecasts from the various models?

14 15. Consider the number of Snowshoe Hare furs traded by the Hudson Bay Company between 1845 and 1935 (data set pelt).

15 16. The population of Switzerland from 1960 to 2017 is in data set global_economy.

16 17 Before doing this exercise, you will need to install the Quandl package in R using

16.1 Select a time series from Quandl. Then copy its short URL and import the data using

y <- as_tsibble(Quandl::Quandl("?????"), index = Date)
(Replace ????? with the appropriate code.)

16.2 Plot graphs of the data, and try to identify an appropriate ARIMA model.

autoplot(ibmclose) + ylab("Close Price") + xlab("Year")

ibmA <- auto.arima(ibmclose)
ibmA
## Series: ibmclose 
## ARIMA(0,1,0) 
## 
## sigma^2 estimated as 52.62:  log likelihood=-1251.37
## AIC=2504.74   AICc=2504.75   BIC=2508.64

16.3 Do residual diagnostic checking of your ARIMA model. Are the residuals white noise?

checkresiduals(ibmA)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,0)
## Q* = 14.151, df = 10, p-value = 0.1662
## 
## Model df: 0.   Total lags used: 10

16.4 Use your chosen ARIMA model to forecast the next four years.

forecast(arima(ibmclose))
##     Point Forecast    Lo 80    Hi 80   Lo 95    Hi 95
## 370       478.4688 370.6839 586.2538 313.626 643.3117
## 371       478.4688 370.6839 586.2538 313.626 643.3117
## 372       478.4688 370.6839 586.2538 313.626 643.3117
## 373       478.4688 370.6839 586.2538 313.626 643.3117
## 374       478.4688 370.6839 586.2538 313.626 643.3117
## 375       478.4688 370.6839 586.2538 313.626 643.3117
## 376       478.4688 370.6839 586.2538 313.626 643.3117
## 377       478.4688 370.6839 586.2538 313.626 643.3117
## 378       478.4688 370.6839 586.2538 313.626 643.3117
## 379       478.4688 370.6839 586.2538 313.626 643.3117
plot(forecast(arima(ibmclose)))

16.5 Now try to identify an appropriate ETS model.

ibm2 <- ets(ibmclose, model="ZZZ", lambda="auto")
ibm2
## ETS(A,N,N) 
## 
## Call:
##  ets(y = ibmclose, model = "ZZZ", lambda = "auto") 
## 
##   Box-Cox transformation: lambda= 1.9999 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##     l = 109276.6822 
## 
##   sigma:  3191.536
## 
##      AIC     AICc      BIC 
## 8139.453 8139.518 8151.185
autoplot(ibm2)

16.6 Do residual diagnostic checking of your ETS model. Are the residuals white noise?

checkresiduals(ibm2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,N)
## Q* = 16.112, df = 8, p-value = 0.04081
## 
## Model df: 2.   Total lags used: 10

16.7 Use your chosen ETS model to forecast the next four years.

ibm2F <- forecast.ets(ibm2)
ibm2F
##     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 370       356.9995 345.3475 368.2831 339.0172 374.1185
## 371       356.9995 340.4051 372.8561 331.2843 380.9830
## 372       356.9995 336.5634 376.3275 325.2259 386.1678
## 373       356.9995 333.2903 379.2294 320.0292 390.4853
## 374       356.9995 330.3797 381.7678 315.3798 394.2500
## 375       356.9995 327.7260 384.0482 311.1167 397.6229
## 376       356.9995 325.2667 386.1334 307.1441 400.6995
## 377       356.9995 322.9607 388.0642 303.3998 403.5421
## 378       356.9995 320.7798 389.8690 299.8405 406.1938
## 379       356.9995 318.7033 391.5683 296.4346 408.6860
autoplot(ibm2F) + ylab("Stock Close") + xlab("Year")

16.8 Which of the two models do you prefer?

ETS(A,N,N)