library(fpp3)
library(seasonal)
library(urca)

Exercise 9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

part a

Explain the differences among these figures. Do they all indicate that the data are white noise?

As the sample size increases, the autocorrelation values fluctuate closer to zero. The confidence bounds (blue dotted lines) also become narrower as the sample size increases, because more data gives more precise estimates. In all three ACF plots the autocorrelations are small, show no obvious pattern, and stay within the bounds, so all three series are consistent with white noise.

part b

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The distance of the critical values from zero depends on the sample size T: the bounds are approximately ±1.96/√T, so they move closer to zero as T increases because more data gives more precise estimates of the autocorrelations. The autocorrelations themselves differ across the figures because each one is computed from a different random sample; smaller samples produce more variable estimates, even though every series is white noise.
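
A quick calculation of those approximate ±1.96/√T bounds (a sketch, using the three sample sizes in the figure) shows how much they narrow:

# Approximate 95% critical values for the ACF of white noise: +/- 1.96 / sqrt(T)
tibble(T = c(36, 360, 1000),
       bound = qnorm(0.975) / sqrt(T))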

Exercise 9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

amazon <- gafa_stock %>%
  filter(Symbol == "AMZN")

amazon %>%
  autoplot(Close) + 
  ggtitle("Amazon Daily Closing Stock Prices")

The line plot above shows an overall upward trend in the stock price from 2014 to 2018. A trending series is not stationary, so the data should be differenced.

amazon %>%
  ACF(Close) %>%
  autoplot() + 
  ggtitle("ACF Plot of Amazon Daily Closing Stock Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

The ACF decreases very slowly, whereas the ACF of a stationary series drops to zero relatively quickly. In addition, the autocorrelations lie well outside the confidence bands. Both features indicate a non-stationary series that should be differenced.

amazon %>%
  PACF(Close) %>%
  autoplot() + 
  ggtitle("PACF Plot of Amazon Daily Closing Stock Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

amazon %>%
  features(Close, unitroot_ndiffs)
## # A tibble: 1 × 2
##   Symbol ndiffs
##   <chr>   <int>
## 1 AMZN        1

The PACF plot shows a single large spike at lag 1, well outside the confidence bands, which again points to a non-stationary series. The unitroot_ndiffs feature confirms that one difference is required to make the data stationary.
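
Although the exercise only asks for an explanation, a quick sketch of the differenced series (my addition, not part of the exercise) illustrates why one difference is enough:

# Plot the once-differenced closing price; it should look much more like white noise
amazon %>%
  autoplot(difference(Close)) + 
  ggtitle("Differenced Amazon Daily Closing Stock Prices")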

Exercise 9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

part a

Turkish GDP from global_economy

turkey <- global_economy %>% 
  filter(Country == "Turkey")

turkey %>%
  autoplot(GDP) + 
  ggtitle("Turkey GDP")

The line plot above shows an overall upward trend in Turkish GDP from 1960 to 2017. A trending series is not stationary.

turkey %>%
  ACF(GDP) %>%
  autoplot() + 
  ggtitle("ACF Plot of Turkey GDP")

The ACF decays very slowly, whereas the ACF of a stationary series drops to zero relatively quickly. This confirms that the series is non-stationary.

lambda <- turkey %>% 
  features(GDP, features = guerrero) %>% 
  pull(lambda_guerrero)

turkey <- turkey %>%
  mutate(transformed_GDP = box_cox(GDP, lambda))

The estimated lambda is close to 0, so a log transformation would give a very similar result; here we use the estimated lambda.
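
As a quick sketch of that equivalence, a plain log transform produces an almost identical plot:

# Log transform for comparison with the Box-Cox transform (lambda near 0)
turkey %>%
  autoplot(log(GDP)) + 
  ggtitle("Log-Transformed Turkish GDP (for comparison)")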

turkey %>%
  autoplot(box_cox(GDP, lambda)) + 
  ggtitle("Box-Cox Transformed Turkish GDP")

turkey %>%
  autoplot(difference(transformed_GDP, lag = 1)) + 
  ggtitle("First Difference of Box-Cox Transformed Turkish GDP")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

turkey %>%
  ACF(difference(transformed_GDP, lag = 1)) %>%
  autoplot() + 
  ggtitle("ACF Plot of First Order Difference in Box-Cox Transformed Turkish GDP")

After one difference the ACF drops to zero quickly, so a second difference is not needed.

turkey %>%
  mutate(transformed_GDP = box_cox(GDP, lambda)) %>%
  features(transformed_GDP, unitroot_ndiffs)
## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

A first difference is required to make this data stationary. Since this is annual data, no seasonal difference is needed.
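
As a cross-check (a sketch using the KPSS test from the feasts package), the differenced series should show no evidence of a unit root:

# KPSS unit-root test on the differenced, transformed series;
# a statistic below the 5% critical value (0.463) is consistent with stationarity
turkey %>%
  features(difference(transformed_GDP), unitroot_kpss)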

part b

Accommodation takings in the state of Tasmania from aus_accommodation

tasmania <- aus_accommodation %>%
  filter(State == "Tasmania")

tasmania %>%
  autoplot(Takings) + 
  ggtitle("Quarterly Accommodation Takings in Tasmania")

The line plot above shows an overall upward trend in quarterly accommodation takings together with strong seasonality. Both features indicate a non-stationary series.

tasmania %>%
  ACF(Takings) %>%
  autoplot() + 
  ggtitle("ACF Plot of Accommodation Takings in Tasmania")

The ACF plot shows that all the autocorrelations are positive, large, and outside the confidence bands, decaying only slowly. For a stationary series the ACF drops to zero quickly, so this series is non-stationary.

lambda <- tasmania %>% 
  features(Takings, features = guerrero)  %>% 
  pull(lambda_guerrero)

tasmania <- tasmania %>%
  mutate(transformed_Takings = box_cox(Takings, lambda))

The estimated lambda is close to 0, so a log transformation would give a very similar result; here we use the estimated lambda.

tasmania %>%
  autoplot(box_cox(Takings, lambda)) + 
  ggtitle("Box-Cox Transformed Accommodation Takings in Tasmania")

tasmania %>%
  autoplot(difference(transformed_Takings, lag = 4)) + 
  ggtitle("First Difference of Box-Cox Transformed Accommodation Takings in Tasmania")
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

tasmania %>%
  ACF(difference(transformed_Takings, lag = 4)) %>%
  autoplot() + 
  ggtitle("ACF plot of First Difference in Box-Cox Transformed Accommodation Takings in Tasmania")

The seasonally differenced series looks much closer to stationary, although a few lags are still outside the confidence bands. The autocorrelation starts relatively high but decays quickly, with values after lag 3 falling within the bands.

tasmania %>%
  features(transformed_Takings, unitroot_nsdiffs)
## # A tibble: 1 × 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1
tasmania %>%
  features(difference(transformed_Takings, lag = 4), unitroot_ndiffs)
## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      0

One seasonal difference is required to make this data stationary. No further regular difference is needed.

part c

Monthly sales from souvenirs

souvenirs %>%
  autoplot(Sales) + 
  ggtitle("Monthly Souvenirs sales")

The line plot above shows an overall upward trend in monthly sales with clear seasonality. Both features indicate a non-stationary series.

souvenirs %>%
  ACF(Sales) %>%
  autoplot() + 
  ggtitle("ACF Plot of Monthly Souvenirs sales")

The ACF plot shows many lags outside the confidence bands, and the very high value at lag 12 points to a strong seasonal component with period 12. This confirms that the series is non-stationary.

lambda <- souvenirs %>% 
  features(Sales, features = guerrero) %>% 
  pull(lambda_guerrero)

souvenirs <- souvenirs %>%
  mutate(transformed_Sales = box_cox(Sales, lambda))

The estimated lambda is close to 0, so a log transformation would give a very similar result; here we use the estimated lambda.

souvenirs %>%
  autoplot(box_cox(Sales, lambda)) + 
  ggtitle("Box-Cox Transformed Monthly Souvenirs sales")

souvenirs %>%
  autoplot(difference(transformed_Sales, lag = 12)) + 
  ggtitle("First Difference of Box-Cox Transformed Monthly Souvenirs sales")
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

souvenirs %>%
  ACF(difference(transformed_Sales, lag = 12)) %>%
  autoplot() + 
  ggtitle("ACF Plot of Seasonally Differenced Box-Cox Transformed Monthly Souvenirs sales")

After the seasonal difference (lag 12), the ACF drops to zero quickly, so no further first difference appears to be needed.

souvenirs %>%
  features(transformed_Sales, unitroot_nsdiffs)
## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       1
souvenirs %>%
  features(difference(transformed_Sales, lag = 12), unitroot_ndiffs)
## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      0

One seasonal difference is required to make this data stationary. No further regular difference is needed.

Exercise 9.5

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>%
  autoplot(Turnover) + 
  ggtitle("Retail Turnover")

The line plot above shows an overall upward trend in monthly retail turnover. A trending series is not stationary.

myseries %>%
  ACF(Turnover) %>%
  autoplot() + 
  ggtitle("ACF Plot of Monthly Retail Turnover")

The ACF plot shows that all the autocorrelations are positive, large, and outside the confidence bands. For a stationary series the ACF drops to zero quickly, so this series is non-stationary.

lambda <- myseries %>% 
  features(Turnover, features = guerrero)  %>% 
  pull(lambda_guerrero)

myseries <- myseries %>%
  mutate(transformed_Turnover = box_cox(Turnover, lambda))

The estimated lambda is close to 0, so a log transformation would give a very similar result; here we use the estimated lambda.

myseries %>%
  autoplot(box_cox(Turnover, lambda)) + 
  ggtitle("Box-Cox Transformed Retail Turnover")

myseries %>%
  autoplot(difference(transformed_Turnover, lag = 12)) + 
  ggtitle("First Difference of Box-Cox Transformed Retail Turnover")
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

myseries %>%
  ACF(difference(transformed_Turnover, lag = 12)) %>%
  autoplot() + 
  ggtitle("ACF plot of First Difference in Box-Cox Transformed Retail Turnover")

The seasonally differenced series looks much closer to stationary, although some lags are still outside the confidence bands. The autocorrelation starts relatively high but decays quickly, with values after lag 10 falling within the bands.

myseries %>%
  features(transformed_Turnover, unitroot_nsdiffs)
## # A tibble: 1 × 3
##   State              Industry                                            nsdiffs
##   <chr>              <chr>                                                 <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing       1
myseries %>%
  features(difference(transformed_Turnover, lag = 12), unitroot_ndiffs)
## # A tibble: 1 × 3
##   State              Industry                                            ndiffs
##   <chr>              <chr>                                                <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing      0

One seasonal difference is required to make this data stationary. No further regular difference is needed.

Exercise 9.6

Simulate and plot some data from simple ARIMA models.

part a

Use the following R code to generate data from an AR(1) model with ϕ1 = 0.6 and σ2 = 1. The process starts with y1 = 0.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

part b

Produce a time plot for the series. How does the plot change as you change
ϕ1?

As ϕ1 gets closer to 1, the series wanders further from its mean and takes longer to revert, and at ϕ1 = 1 the model becomes a random walk, which is non-stationary. As ϕ1 gets closer to 0, the series looks more like white noise. When ϕ1 is negative, the series oscillates about the mean, with larger negative values producing stronger oscillation.
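
Rather than repeating the simulation by hand for every value of ϕ1 (as done below), the same comparison can be made in a single faceted plot. This is a sketch, and simulate_ar1() is a helper of my own, not part of the exercise:

# Hypothetical helper: simulate an AR(1) series of length n with coefficient phi
simulate_ar1 <- function(phi, n = 100) {
  y <- numeric(n)
  e <- rnorm(n)
  for (i in 2:n)
    y[i] <- phi * y[i - 1] + e[i]
  y
}

# One series per value of phi1, plotted in facets for easy comparison
set.seed(123)
ar1_sims <- dplyr::bind_rows(lapply(c(-0.9, -0.3, 0, 0.3, 0.6, 0.9), function(phi) {
  tibble(phi1 = phi, idx = seq_len(100), y = simulate_ar1(phi))
}))

ar1_sims %>%
  ggplot(aes(x = idx, y = y)) +
  geom_line() +
  facet_wrap(vars(phi1), scales = "free_y", labeller = label_both) +
  ggtitle("AR(1) simulations for several values of phi1")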

phi1 = 0.6

sim %>%
  autoplot() + 
  ggtitle("AR(1) model with phi1 = 0.6 ")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') + 
  ggtitle("ACF Plot of AR(1) model with phi1 = 0.6")
## Warning: `gg_tsdisplay()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsdisplay()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

phi1 = 0

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("AR(1) model with phi1 = 0 ")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') + 
  ggtitle("ACF Plot of AR(1) model with phi1 = 0")

phi1 = 0.3

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.3*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("AR(1) model with phi1 = 0.3 ")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') + 
  ggtitle("ACF Plot of AR(1) model with phi1 = 0.3")

phi1 = 0.5

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.5*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("AR(1) model with phi1 = 0.5 ")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') + 
  ggtitle("ACF Plot of AR(1) model with phi1 = 0.5")

phi1 = 0.8

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.8*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("AR(1) model with phi1 = 0.8 ")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') + 
  ggtitle("ACF Plot of AR(1) model with phi1 = 0.8")

phi1 = 1

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 1*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("AR(1) model with phi1 = 1 ")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') + 
  ggtitle("ACF Plot of AR(1) model with phi1 = 1")

phi1 = -0.3

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- -0.3*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("AR(1) model with phi1 = -0.3 ")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') + 
  ggtitle("ACF Plot of AR(1) model with phi1 = -0.3")

phi1 = -0.5

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- -0.5*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("AR(1) model with phi1 = -0.5 ")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') +
  ggtitle("ACF Plot of AR(1) model with phi1 = -0.5")

phi1 = -0.9

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- -0.9*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("AR(1) model with phi1 = -0.9 ")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') +
  ggtitle("ACF Plot of AR(1) model with phi1 = -0.9")

part c

Write your own code to generate data from an MA(1) model with θ1 = 0.6 and σ2 = 1.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <-  e[i] + 0.6*e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

part d

Produce a time plot for the series. How does the plot change as you change
θ1?

Changing θ1 changes the strength of the lag-one correlation: a positive θ1 makes consecutive values more alike, a negative θ1 makes them alternate, and θ1 = 0 reduces the series to white noise. When θ1 is negative, the series therefore exhibits more oscillation about the mean.
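
As with the AR(1) case, a single faceted plot makes the comparison across values of θ1 compact. This is a sketch, and simulate_ma1() is a helper of my own:

# Hypothetical helper: simulate an MA(1) series of length n with coefficient theta
simulate_ma1 <- function(theta, n = 100) {
  e <- rnorm(n)
  y <- numeric(n)
  y[1] <- e[1]
  for (i in 2:n)
    y[i] <- e[i] + theta * e[i - 1]
  y
}

# One series per value of theta1, plotted in facets for easy comparison
set.seed(456)
ma1_sims <- dplyr::bind_rows(lapply(c(-1, -0.5, 0, 0.5, 1), function(theta) {
  tibble(theta1 = theta, idx = seq_len(100), y = simulate_ma1(theta))
}))

ma1_sims %>%
  ggplot(aes(x = idx, y = y)) +
  geom_line() +
  facet_wrap(vars(theta1), scales = "free_y", labeller = label_both) +
  ggtitle("MA(1) simulations for several values of theta1")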

theta1 = 1

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <-  e[i] +1*e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("MA(1) model with theta1 = 1")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') + 
  ggtitle("MA(1) model with theta1 = 1 ")

theta1 = 0.8

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <-  e[i] +0.8*e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("MA(1) model with theta1 = 0.8")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') + 
  ggtitle("MA(1) model with theta1 = 0.8 ")

theta1 = 0.5

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <-  e[i] +0.5*e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("MA(1) model with theta1 = 0.5")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') + 
  ggtitle("MA(1) model with theta1 = 0.5 ")

theta1 = 0.3

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <-  e[i] +0.3*e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("MA(1) model with theta1 = 0.3")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') + 
  ggtitle("MA(1) model with theta1 = 0.3 ")

theta1 = 0

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <-  e[i] +0*e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("MA(1) model with theta1 = 0")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') + 
  ggtitle("MA(1) model with theta1 = 0 ")

theta1 = -0.3

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <-  e[i] - 0.3*e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("MA(1) model with theta1 = -0.3")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') + 
  ggtitle("MA(1) model with theta1 = -0.3 ")

theta1 = -0.5

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <-  e[i] -0.5*e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("MA(1) model with theta1 = -0.5")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') + 
  ggtitle("MA(1) model with theta1 = -0.5 ")

theta1 = -0.8

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <-  e[i] -0.8*e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("MA(1) model with theta1 = -0.8")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') + 
  ggtitle("MA(1) model with theta1 = -0.8 ")

theta1 = -1

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <-  e[i] -1*e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("MA(1) model with theta1 = -1")
## Plot variable not specified, automatically selected `.vars = y`

sim %>%
  gg_tsdisplay(y, plot_type='partial') + 
  ggtitle("MA(1) model with theta1 = -1 ")

part e

Generate data from an ARMA(1,1) model with ϕ1 = 0.6, θ1 = 0.6 and σ2 = 1.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i] + 0.6*e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("AR(1,1) model with theta1 = 0.6, theta2 = 0.6, and sigma2 = 1 ")
## Plot variable not specified, automatically selected `.vars = y`

part f

Generate data from an AR(2) model with ϕ1 = -0.8, ϕ2 = 0.3 and σ2 = 1. (Note that these parameters will give a non-stationary series.)

y <- numeric(100)
e <- rnorm(100)
for(i in 3:100)
  y[i] <- -0.8*y[i-1] + e[i] + 0.3*y[i-2]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot() + 
  ggtitle("AR(2) model with phi1 = -0.8, theta2 = 0.3, and sigma2 = 1 ")
## Plot variable not specified, automatically selected `.vars = y`

part g

Graph the latter two series and compare them.

The ARMA(1,1) series fluctuates around a stable mean and looks stationary, while the AR(2) series is clearly non-stationary, with oscillations that grow explosively towards the end of the sample.
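
A side-by-side plot makes the contrast clearer. The sketch below re-simulates both series with a shared set of errors rather than reusing the objects above:

# Re-simulate the ARMA(1,1) and AR(2) series with common errors and plot them together
set.seed(789)
n <- 100
e <- rnorm(n)

y_arma11 <- numeric(n)
for (i in 2:n)
  y_arma11[i] <- 0.6 * y_arma11[i - 1] + e[i] + 0.6 * e[i - 1]

y_ar2 <- numeric(n)
for (i in 3:n)
  y_ar2[i] <- -0.8 * y_ar2[i - 1] + 0.3 * y_ar2[i - 2] + e[i]

dplyr::bind_rows(
  tibble(model = "ARMA(1,1)", idx = seq_len(n), y = y_arma11),
  tibble(model = "AR(2)", idx = seq_len(n), y = y_ar2)
) %>%
  ggplot(aes(x = idx, y = y)) +
  geom_line() +
  facet_wrap(vars(model), ncol = 1, scales = "free_y") +
  ggtitle("ARMA(1,1) (stationary) vs AR(2) (non-stationary)")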

Exercise 9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

part a

Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

aus_2011 <- aus_airpassengers %>%
  filter(Year < 2012)

aus_2011 %>%
  autoplot()
## Plot variable not specified, automatically selected `.vars = Passengers`

fit <- aus_2011 %>%
  model(ARIMA(Passengers))

report(fit)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8756
## s.e.   0.0722
## 
## sigma^2 estimated as 4.671:  log likelihood=-87.8
## AIC=179.61   AICc=179.93   BIC=182.99

ARIMA(0,2,1) was chosen as the model.

fit %>%
  gg_tsresiduals()
## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

The residual ACF plot shows all autocorrelations within the confidence bounds, so the residuals behave like white noise.

fit %>%
  forecast(h=10) %>%
  autoplot(aus_2011)

part b

Write the model in terms of the backshift operator.

The ARIMA(0,2,1) model can be written as (1 - B)^2 y_t = (1 - 0.8756 B) e_t, where e_t is white noise. Expanding gives y_t = 2 y_{t-1} - y_{t-2} + e_t - 0.8756 e_{t-1}.

part c

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

fit <- aus_2011 %>%
  model(arima021 = ARIMA(Passengers),
    arima010 = ARIMA(Passengers ~ pdq(0,1,0) ))

report(fit)
## Warning in report.mdl_df(fit): Model reporting is only supported for individual
## models, so a glance will be shown. To see the report for a specific model, use
## `select()` and `filter()` to identify a single model.
## # A tibble: 2 × 8
##   .model   sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 arima021   4.67   -87.8  180.  180.  183. <cpl [0]> <cpl [1]>
## 2 arima010   4.63   -89.1  182.  182.  186. <cpl [0]> <cpl [0]>
fit %>%
  select(arima010) %>%
  gg_tsresiduals()

fit %>%
  forecast(h=10) %>%
  autoplot(aus_2011, level = NULL)

The ARIMA(0,2,1) model produces higher forecasts than the ARIMA(0,1,0) model with drift. Its AIC is also lower, although AIC values are not strictly comparable between models with different orders of differencing.

part d

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

fit <- aus_2011 %>%
  model(arima021 = ARIMA(Passengers),
    arima010 = ARIMA(Passengers ~ pdq(0,1,0)),
    arima212 = ARIMA(Passengers ~ pdq(2,1,2)))

report(fit)
## Warning in report.mdl_df(fit): Model reporting is only supported for individual
## models, so a glance will be shown. To see the report for a specific model, use
## `select()` and `filter()` to identify a single model.
## # A tibble: 3 × 8
##   .model   sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 arima021   4.67   -87.8  180.  180.  183. <cpl [0]> <cpl [1]>
## 2 arima010   4.63   -89.1  182.  182.  186. <cpl [0]> <cpl [0]>
## 3 arima212   4.75   -87.7  187.  190.  198. <cpl [2]> <cpl [2]>
fit %>%
  forecast(h=10) %>%
  autoplot(aus_2011, level = NULL)

part e

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

fit <- aus_2011 %>%
  model(arima021 = ARIMA(Passengers),
    arima010 = ARIMA(Passengers ~ pdq(0,1,0)),
    arima212 = ARIMA(Passengers ~ pdq(2,1,2)),
    arima021_constant = ARIMA(Passengers ~ 1 + pdq(0,2,1) ))
## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.
report(fit)
## Warning in report.mdl_df(fit): Model reporting is only supported for individual
## models, so a glance will be shown. To see the report for a specific model, use
## `select()` and `filter()` to identify a single model.
## # A tibble: 4 × 8
##   .model            sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>              <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 arima021            4.67   -87.8  180.  180.  183. <cpl [0]> <cpl [1]>
## 2 arima010            4.63   -89.1  182.  182.  186. <cpl [0]> <cpl [0]>
## 3 arima212            4.75   -87.7  187.  190.  198. <cpl [2]> <cpl [2]>
## 4 arima021_constant   4.09   -85.7  177.  178.  183. <cpl [0]> <cpl [1]>
fit %>%
  forecast(h=10) %>%
  autoplot(aus_2011, level = NULL)

It results in higher forecasts than the previous models, because a constant combined with two orders of differencing induces a quadratic trend in the forecast function. R also warns that the “Model specification induces a quadratic or higher order polynomial trend”, which is generally not desirable for forecasting.

Exercise 9.8

For the United States GDP series (from global_economy):

part a

if necessary, find a suitable Box-Cox transformation for the data;

usa <- global_economy %>% 
  filter(Country == "United States")

usa %>%
  autoplot(GDP) + 
  ggtitle("US GDP")

The line plot above shows an overall upward trend in US GDP from 1960 to 2017. A trending series is not stationary.

usa %>%
  ACF(GDP) %>%
  autoplot() + 
  ggtitle("ACF Plot of US GDP")

The ACF decays very slowly, whereas the ACF of a stationary series drops to zero relatively quickly. This confirms that the series is non-stationary.

lambda <- usa %>% 
  features(GDP, features = guerrero) %>% 
  pull(lambda_guerrero)

usa <- usa %>%
  mutate(transformed_GDP = box_cox(GDP, lambda))

Lambda = 0.28194

usa %>%
  autoplot(box_cox(GDP, lambda)) + 
  ggtitle("Box-Cox Transformed Turkish GDP")

part b

fit a suitable ARIMA model to the transformed data using ARIMA();

fit <- usa %>%
  model(ARIMA(transformed_GDP))

report(fit)
## Series: transformed_GDP 
## Model: ARIMA(1,1,0) w/ drift 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

ARIMA(1,1,0) with drift was chosen as the model.

fit %>%
  gg_tsresiduals()

The residual ACF plot shows all autocorrelations within the confidence bounds, so the residuals behave like white noise.

part c

try some other plausible models by experimenting with the orders chosen;

fit <- usa %>%
  model(arima110 = ARIMA(transformed_GDP~ pdq(1,1,0)),
        arima111 = ARIMA(transformed_GDP~ pdq(1,1,1)),
        arima112 = ARIMA(transformed_GDP~ pdq(1,1,2)),
    arima011 = ARIMA(transformed_GDP ~ pdq(0,1,1)),
    arima212 = ARIMA(transformed_GDP ~ pdq(2,1,2)),
    arima021 = ARIMA(transformed_GDP ~ pdq(0,2,1)),
    arima021_constant = ARIMA(transformed_GDP ~ 1 + pdq(0,2,1))
  )
## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.
report(fit)
## Warning in report.mdl_df(fit): Model reporting is only supported for individual
## models, so a glance will be shown. To see the report for a specific model, use
## `select()` and `filter()` to identify a single model.
## # A tibble: 7 × 9
##   Country       .model        sigma2 log_lik   AIC  AICc   BIC ar_roots ma_roots
##   <fct>         <chr>          <dbl>   <dbl> <dbl> <dbl> <dbl> <list>   <list>  
## 1 United States arima110       5479.   -325.  657.  657.  663. <cpl>    <cpl>   
## 2 United States arima111       5580.   -325.  659.  659.  667. <cpl>    <cpl>   
## 3 United States arima112       5630.   -325.  660.  661.  670. <cpl>    <cpl>   
## 4 United States arima011       5689.   -326.  659.  659.  665. <cpl>    <cpl>   
## 5 United States arima212       5734.   -325.  662.  664.  674. <cpl>    <cpl>   
## 6 United States arima021       6020.   -323.  650.  650.  654. <cpl>    <cpl>   
## 7 United States arima021_con…  6115.   -323.  652.  652.  658. <cpl>    <cpl>

part d

choose what you think is the best model and check the residual diagnostics;

Among the candidates, the ARIMA(0,2,1) model has the lowest AIC and BIC by a clear margin, so we select it; it uses two orders of differencing and a single MA term. Note that information criteria are only strictly comparable between models with the same order of differencing, so this comparison should be treated with some caution.
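
A compact way to rank the candidates is to sort the glance() output (a sketch):

# Rank the candidate models; information criteria are only strictly comparable
# between models with the same order of differencing
glance(fit) %>%
  arrange(AICc) %>%
  select(.model, sigma2, AICc, BIC)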

fit %>%
  select(arima110) %>%
  gg_tsresiduals()

fit %>%
  select(arima011) %>%
  gg_tsresiduals()

fit %>%
  select(arima212) %>%
  gg_tsresiduals()

fit %>%
  select(arima021) %>%
  gg_tsresiduals()

fit %>%
  select(arima021_constant) %>%
  gg_tsresiduals()

part e

produce forecasts of your fitted model. Do the forecasts look reasonable?

fit <- usa %>%
  model(
    arima021 = ARIMA(transformed_GDP ~ pdq(0,2,1))
  )

fit %>%
    forecast(h=10)%>%
  autoplot(usa)

The forecasts look reasonable, continuing the upward trend of the historical series.

part f

compare the results with what you would obtain using ETS() (with no transformation).

fit <- usa %>%
  model(
    ETS = ETS(GDP)
  )

report(fit)
## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089
fit %>%
  gg_tsresiduals()

fit %>%
    forecast(h=10)%>%
  autoplot(usa)

The ETS(M,A,N) model reports a much higher AIC, but this value is not directly comparable with the ARIMA models because ETS is fitted to the untransformed GDP. Its forecasts still capture the overall increasing trend of GDP. We would expect the ARIMA model with the Box-Cox transformation to handle the strong trend and autocorrelation in this series somewhat better.
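
To make the comparison more concrete, a small holdout experiment can be run. This is a sketch; the 2007 cutoff and 10-year horizon are arbitrary choices of mine:

# Hold out the last 10 years and compare out-of-sample accuracy of the two approaches
train <- usa %>%
  filter(Year <= 2007)

cv_fit <- train %>%
  model(
    arima021 = ARIMA(box_cox(GDP, lambda) ~ pdq(0,2,1)),
    ets = ETS(GDP)
  )

cv_fit %>%
  forecast(h = 10) %>%
  accuracy(usa)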