Load libraries

library(tidyverse)
library(fpp3)

Exercises 9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers

  1. Explain the differences among these figures. Do they all indicate that the data are white noise?

In the first figure on the left it indicates the 36 numbers have a lag that are between .05 and -.05. In the second figure the lags of the 360 random numbers are lower than .05 and -.05. In the third figure the lags are very close to zero. All three figures show the data is white noise as they are below the dotted blue lines.

  1. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values are different distances due to the size of the sample. As the number increases the critical values decreases. The autocorrelation is different in each figure but since it’s below the dotted blue noise they are alll white noise. The autocorrelations are closer to zero as the number random numbers increases.

Exercises 9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  gg_tsdisplay(Close, plot_type = "partial") +
  labs(title = "Amazon Closing Price")
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.

The time series plot shows an upward trend which indicates it is non-stationary. The ACF chart showing the lags are slowly decaying. For the PACF there are only few spikes.

Exercises 9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

  1. Turkish GDP from global_economy.

Conducted a box-cox transformation to get lambda and find out the number of difference needed. The data shows it is non-stationary since it trends upwards and ACF is slowing going down. After the box-cox transformation and differencing it shows stationary data for the time series plot. ACF is no longer showing it is slowing decay and PACF is slowly getting closer to zero.

lambda <-global_economy %>%
  filter(Country == "Turkey") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

global_economy %>%
  filter(Country == "Turkey") %>%
  features(box_cox(GDP,lambda), unitroot_ndiffs)
## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1
global_economy %>%
  filter(Country == "Turkey") %>%
  gg_tsdisplay(GDP, plot_type = "partial")+
  labs(title = "Turkey GDP")

global_economy %>%
  filter(Country == "Turkey") %>%
  gg_tsdisplay(difference(box_cox(GDP,lambda)), plot_type = "partial", lag_max = 20) +
  labs(title = "Box-Cox differencing for Turkey GDP")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

  1. Accommodation takings in the state of Tasmania from aus_accommodation.

In the initial data the time series plot, ACF and PACF are showing indication of non-stationary data.Had to take a second differencing to remove the trend and added a lag 12 to receive the seasonality.

lambda_aus <- aus_accommodation %>%
  filter(State == "Tasmania") %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

aus_accommodation %>%
  filter(State == "Tasmania") %>%
  features(box_cox(Takings,lambda_aus), unitroot_ndiffs)
## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      1
aus_accommodation %>%
  filter(State == "Tasmania") %>%
  gg_tsdisplay(Takings, plot_type = "partial")+
  labs(title = "Tasmania Takings")

aus_accommodation %>%
  filter(State == "Tasmania") %>%
  gg_tsdisplay(difference(difference(box_cox(Takings,lambda_aus)), lag = 12), plot_type = "partial", lag_max = 24) +
  labs(title = "Box-Cox differencing for Tasmania Takings")
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_point()`).

c. Monthly sales from souvenirs

The data in the original data shows the data was non- stationary and after conducting the box-transformation and difference with lag 12 it looks like it was stationary.

lambda_sou <- souvenirs %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

souvenirs %>%
  features(box_cox(Sales,lambda_sou), unitroot_ndiffs)
## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1
souvenirs %>%
  gg_tsdisplay(Sales, plot_type = "partial")+
  labs(title = "Monthly Sales")

souvenirs %>%
  gg_tsdisplay(difference(box_cox(Sales,lambda_sou), lag = 12), plot_type = "partial", lag_max = 12) +
  labs(title = "Box-Cox differencing for Montly Sales")
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

Exercies 9.5

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

Conducted a box-cox transformation and difference to get the data to show stationary data.

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>%
  gg_tsdisplay(Turnover, plot_type = "partial")+
  labs(title = "Turnover data")

lambda_ser <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>%
  features(box_cox(Turnover,lambda_ser), unitroot_ndiffs)
## # A tibble: 1 × 3
##   State              Industry                                            ndiffs
##   <chr>              <chr>                                                <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing      1
myseries %>%
  gg_tsdisplay(difference(box_cox(Turnover,lambda_ser), lag = 12), plot_type = "partial", lag_max = 24) +
  labs(title = "Box-Cox differencing Turnover")
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

Exercises 9.6

Simulate and plot some data from simple ARIMA models.

  1. Use the following R code to generate data from an AR(1) model with
    phi=0.6 and sigma2=1. The process starts with y1=0.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

# Plot the simulated AR(1) time series
sim %>%
  autoplot(y) + 
  labs(title = "AR(1) model with phi = .06")

b. Produce a time plot for the series. How does the plot change as you change phi1?

As the plot phi changes to closer to 1 it will show a positive autocorrelation. As the number reaches 1 it will be a random walk.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.9*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

# Plot the simulated AR(1) time series
sim %>%
  autoplot(y) + 
  labs(title = "AR(1) model with phi = .09")

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.3*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

# Plot the simulated AR(1) time series
sim %>%
  autoplot(y) + 
  labs(title = "AR(1) model with phi = .03")

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 1*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

# Plot the simulated AR(1) time series
sim %>%
  autoplot(y) + 
  labs(title = "AR(1) model with phi = 1")

c Write your own code to generate data from an MA(1) model with phi1=0.6 and sigma2 = 1.

for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1]

sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot(y) + 
  labs(title = "AR(1) model with phi = .06 and sigma2 = 1")

Exercises 9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

  1. Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

The model selected was ARIMA(0,2,1) and the residuals looks like white noise.

aus_fit <-aus_airpassengers %>%
  filter(Year <= 2011) %>%
  model(ARIMA(Passengers))

report(aus_fit)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8756
## s.e.   0.0722
## 
## sigma^2 estimated as 4.671:  log likelihood=-87.8
## AIC=179.61   AICc=179.93   BIC=182.99
gg_tsresiduals(aus_fit)

aus_fit %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers) + 
  labs(title = "Air Transport Passengers Australia from 1970-2011")

  1. Write the model in terms of the backshift operator. yt = -0.8756(e) t-1 +(e)t (1-B)^2 yt = (1-01 B)(e) t

Could not get the file to knit with the special characters.

  1. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

The model in a is slightly better with an AICc of 179.93 while the one in this model has 182.48. The line is above the middle compared to a which is below the middle.

aus_fit2 <- aus_airpassengers %>%
  filter(Year <= 2011) %>%
  model(ARIMA(Passengers ~ pdq(0,1,0)))

report(aus_fit2)
## Series: Passengers 
## Model: ARIMA(0,1,0) w/ drift 
## 
## Coefficients:
##       constant
##         1.3669
## s.e.    0.3319
## 
## sigma^2 estimated as 4.629:  log likelihood=-89.08
## AIC=182.17   AICc=182.48   BIC=185.59
aus_fit2 %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers) + 
  labs(title = "Air Transport Passengers Australia from 1970-2011 ARIMA (0,1,0) with drift")

  1. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

This model is closer to the middle compared to a and c. The AICc is higher than the previous model in c. 

aus_fit3 <- aus_airpassengers %>%
  filter(Year <= 2011) %>%
  model(ARIMA(Passengers ~ pdq(2,1,2)))

report(aus_fit3)
## Series: Passengers 
## Model: ARIMA(2,1,2) w/ drift 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2  constant
##       1.4694  -0.5103  -1.5736  0.6780    0.0650
## s.e.  0.3780   0.3558   0.3081  0.2688    0.0294
## 
## sigma^2 estimated as 4.748:  log likelihood=-87.74
## AIC=187.47   AICc=189.94   BIC=197.75
aus_fit3 %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers) + 
  labs(title = "Air Transport Passengers Australia from 1970-2011 ARIMA (2,1,2) with drift")

  1. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

The forecast model with a constant changes the forecast to lie closer to the mean.

aus_fit4 <- aus_airpassengers %>%
  filter(Year <= 2011) %>%
  model(ARIMA(Passengers ~ pdq(0,1,2) + 1))

report(aus_fit4)
## Series: Passengers 
## Model: ARIMA(0,1,2) w/ drift 
## 
## Coefficients:
##           ma1     ma2  constant
##       -0.0400  0.0509    1.3771
## s.e.   0.1576  0.1969    0.3370
## 
## sigma^2 estimated as 4.858:  log likelihood=-89.02
## AIC=186.05   AICc=187.16   BIC=192.9
aus_fit4 %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers) + 
  labs(title = "Air Transport Passengers Australia from 1970-2011 ARIMA (0,1,2) with constant")

Exercises 9.8

For the United States GDP series (from global_economy):

  1. if necessary, find a suitable Box-Cox transformation for the data;

A box cox transformation is needed as there is an increasing trend on the data.

global_economy %>%
  filter(Code == "USA") %>%
  autoplot(GDP)

global_economy %>%
  filter(Code == "USA") %>%
  gg_tsdisplay(GDP, plot_type = 'partial')

lambda_eco <- global_economy %>%
  filter(Code == "USA") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
  1. fit a suitable ARIMA model to the transformed data using ARIMA();

The ARIMA model used was ARIMA(1,1,0).

eco_fit <- global_economy %>%
  filter(Code == "USA") %>%
  model(ARIMA(box_cox(GDP,lambda_eco)))

report(eco_fit)
## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda_eco) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78
eco_fit %>%
  forecast(h = 10) %>%
  autoplot(global_economy) + 
  labs(title = "ARIMA Box-Cox transformation for USA GDP")

  1. try some other plausible models by experimenting with the orders chosen;
#find the number of difference
global_economy %>%
  filter(Code == "USA") %>%
  features(box_cox(GDP, lambda), unitroot_ndiffs)
## # A tibble: 1 × 2
##   Country       ndiffs
##   <fct>          <int>
## 1 United States      2
eco_fit2 <- global_economy %>%
  filter(Code == "USA") %>%
  model(
    arima_111 = ARIMA(box_cox(GDP, lambda_eco) ~ pdq(1,1,1)),
    arima_212 = ARIMA(box_cox(GDP, lambda_eco) ~ pdq(2,1,2)),
    arima_012 = ARIMA(box_cox(GDP, lambda_eco) ~ pdq(0,1,2)))

report(eco_fit2)
## Warning in report.mdl_df(eco_fit2): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.
## # A tibble: 3 × 9
##   Country       .model    sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <fct>         <chr>      <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 United States arima_111  5580.   -325.  659.  659.  667. <cpl [1]> <cpl [1]>
## 2 United States arima_212  5734.   -325.  662.  664.  674. <cpl [2]> <cpl [2]>
## 3 United States arima_012  5634.   -326.  659.  660.  667. <cpl [0]> <cpl [2]>
  1. choose what you think is the best model and check the residual diagnostics;

The best model is the ARIMA(1,1,0) as the AICc is the lowest.

eco_fit3 <- global_economy %>%
  filter(Code == "USA") %>%
  model(arima_111 = ARIMA(box_cox(GDP, lambda_eco) ~ pdq(1,1,0)))

gg_tsresiduals(eco_fit3)

  1. produce forecasts of your fitted model. Do the forecasts look reasonable?

The forecast looks reasonable as the trend is continuing to move upwards.

eco_fit3 %>%
  forecast(h = 10) %>%
  autoplot(global_economy) + 
  labs(title = "ARIMA Box-Cox transformation for USA GDP")

  1. compare the results with what you would obtain using ETS() (with no transformation).

The ARIMA model is better as it has a lower RMSE.

ets <- global_economy %>%
  filter(Code == "USA") %>%
  model(ETS(GDP))

report(ets)
## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089
ets %>%
  forecast(h=10) %>%
  autoplot(global_economy)

accuracy(ets)
## # A tibble: 1 × 11
##   Country     .model .type      ME    RMSE     MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <fct>       <chr>  <chr>   <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 United Sta… ETS(G… Trai… 2.10e10 1.67e11 1.05e11 0.484  1.91 0.307 0.409 0.153
accuracy(eco_fit3)
## # A tibble: 1 × 11
##   Country  .model .type      ME    RMSE     MAE     MPE  MAPE  MASE RMSSE   ACF1
##   <fct>    <chr>  <chr>   <dbl>   <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 United … arima… Trai… -2.94e9 1.47e11 8.75e10 -0.0121  1.52 0.257 0.361 0.0232