HW6 Data 624 Predictive Analytics 2024 Spring Term

Question 9.1
Question 9.2
Question 9.3
Question 9.5
Question 9.6
Question 9.7
Question 9.8

Do the exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman. (https://otexts.com/fpp3/)

library(fpp3)
library(tidyverse)
library(gridExtra)
library(urca)

Question 9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise? The variations among these figures lie in the fact that the significant thresholds (the horizontal dashed lines) differ between the models. All of them indicate white noise because, for white noise series, we anticipate each autocorrelation to be close to zero, which these are.
Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise? Critical values are at varying distances from the zero mean because white noise data is expected to fall within the bounds of +-2/(sqrt(T)). As T increases (36, 360, 1000), the bounds or limits shrink, leading to smaller critical values.

Question 9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

gafa_stock_Amazon <- gafa_stock %>%
  filter(Symbol == "AMZN")

gafa_stock_Amazon %>%
  autoplot(Close)+
  labs(y="Close", title="Closing Prices for Amazon Stock")

p1 <- gafa_stock_Amazon %>%
  ACF(Close) %>%
  autoplot() +
  labs(title="ACF Closing Prices for Amazon Stock")

p2 <-gafa_stock_Amazon %>%
  PACF(Close) %>%
  autoplot() +
  labs(title="PACF Closing Prices for Amazon Stock")

grid.arrange(p1, p2,  nrow = 1)

In the graph depicting the closing prices for Amazon stock, a positive trend is evident. In the ACF graph, a gradual decrease of data towards zero is observed. The PACF graph exhibits a prominent initial spike. These observations indicate non-stationary series, implying that they need to be differenced to attain stationarity.

Question 9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.

global_economy_Turkey <- global_economy %>%
  filter(Country == "Turkey")

global_economy_Turkey %>%
  autoplot(GDP)+
  labs(title="Turkish GDP")

lambda_Turkey <- global_economy_Turkey %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

global_economy_Turkey  %>%
  autoplot(box_cox(GDP, lambda_Turkey)) +
  labs(y = "",title = latex2exp::TeX(paste0(
         "Transformed Turkish GDP with $\\lambda = ",
         round(lambda_Turkey,2))))

global_economy_Turkey  %>%
  features(box_cox(GDP, lambda_Turkey), unitroot_ndiffs)

## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

Accommodation takings in the state of Tasmania from aus_accommodation.

aus_accommodation_Tasmania <-  aus_accommodation%>%
  filter(State == "Tasmania")

aus_accommodation_Tasmania %>%
  autoplot(Takings)+
  labs(title="Accommodation Takings in the State of Tasmania")

lambda_Tasmania <- aus_accommodation_Tasmania %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

aus_accommodation_Tasmania  %>%
  autoplot(box_cox(Takings, lambda_Tasmania)) +
  labs(y = "",title = latex2exp::TeX(paste0(
         "Transformed Accommodation Takings in the State of Tasmania with $\\lambda = ",
         round(lambda_Tasmania,2))))

aus_accommodation_Tasmania   %>%
  features(box_cox(Takings, lambda_Tasmania), unitroot_ndiffs)

## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      1

Monthly sales from souvenirs.

souvenirs_sales <- souvenirs 

souvenirs_sales %>%
  autoplot(Sales)+
  labs(title="Sales from Souvenirs")

lambda_sales <- souvenirs_sales %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

souvenirs_sales  %>%
  autoplot(box_cox(Sales, lambda_sales)) +
  labs(y = "",title = latex2exp::TeX(paste0(
         "Transformed Sales from Souvenirs with $\\lambda = ",
         round(lambda_sales,2))))

souvenirs_sales   %>%
  features(box_cox(Sales, lambda_sales), unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1

Question 9.5

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(12345678)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(myseries,Turnover)

lambda_turnover <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries  %>%
  autoplot(box_cox(Turnover, lambda_turnover)) +
  labs(y = "",title = latex2exp::TeX(paste0(
         "Transformed with $\\lambda = ",
         round(lambda_turnover,2))))

myseries %>%
  features(box_cox(Turnover, lambda_turnover), unitroot_ndiffs)

## # A tibble: 1 × 3
##   State              Industry                                            ndiffs
##   <chr>              <chr>                                                <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing      1

myseries %>%
  autoplot(difference(box_cox(Turnover, lambda_turnover), 12))

p3 <- myseries %>%
  ACF(difference(box_cox(Turnover, lambda_turnover), 12)) %>%
  autoplot() +
  labs(title="ACF")

p4 <- myseries %>%
  PACF(difference(box_cox(Turnover, lambda_turnover), 12)) %>%
  autoplot() +
  labs(title="PACF")

grid.arrange(p3, p4, nrow = 1)

Question 9.6

Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with phi_1 = 0.6 and sigma^2 = 1. The process starts with y_1 = 0.

set.seed(24)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change phi_1?

set.seed(24)
generate_tsibble <- function(phi) {
  y <- numeric(100)
  e <- rnorm(100)
  y[1] <- e[1]
  for(i in 2:100) {
    y[i] <- phi * y[i-1] + e[i]
  }
  tsibble(idx = seq_len(100), y = y, index = idx)
}

phi_1 <- generate_tsibble(0.6)
phi_2 <- generate_tsibble(-0.6)
phi_3 <- generate_tsibble(0.01)
phi_4 <- generate_tsibble(0.99)
phi_5 <- generate_tsibble(0)

set.seed(24)
phi_values <- c(0.6, -0.6, 0.01, 0.99, 0)
plot_list <- list()

for (i in seq_along(phi_values)) {
  plot_list[[i]] <- generate_tsibble(phi_values[i]) %>%
    autoplot(y) +
    labs(paste("Phi =", phi_values[i]))
}

p5 <- plot_list[[1]]
p6 <- plot_list[[2]]
p7 <- plot_list[[3]]
p8 <- plot_list[[4]]
p9 <- plot_list[[5]]


grid.arrange(p5, p6, p7, p8, p9, nrow = 2)

When phi is close to or equal to zero, the graph resembles white noise. As phi approaches one, the graph resembles a random walk, and when phi is less than zero, it tends to oscillate around the mean.

Write your own code to generate data from an MA(1) model with theta_1 = 0.6 and sigma^2 = 1.

set.seed(24)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1]
sim_ma <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change theta_1?

set.seed(24)
generate_tsibble2 <- function(theta) {
  y <- numeric(100)
  e <- rnorm(100)
  y[1] <- e[1]
  for(i in 2:100) {
    y[i] <- e[i] + theta * e[i-1]
  }
  tsibble(idx = seq_len(100), y = y, index = idx)
}

theta_1 <- generate_tsibble2(0.6)
theta_2 <- generate_tsibble2(-0.6)
theta_3 <- generate_tsibble2(0.01)
theta_4 <- generate_tsibble2(0.99)
theta_5 <- generate_tsibble2(0)

set.seed(24)
theta_values <- c(0.6, -0.6, 0.01, 0.99, 0)
plot_list2 <- list()

for (i in seq_along(theta_values)) {
  plot_list2[[i]] <- generate_tsibble2(theta_values[i]) %>%
    autoplot(y) +
    labs(paste("Theta =", theta_values[i]))
}

p10 <- plot_list2[[1]]
p11 <- plot_list2[[2]]
p12 <- plot_list2[[3]]
p13 <- plot_list2[[4]]
p14 <- plot_list2[[5]]


grid.arrange(p10, p11, p12, p13, p14, nrow = 2)

When theta is close to zero or zero itself, the MA(1) model behaves more like white noise. When theta_1 is positive, the plot might exhibit a dampened oscillatory pattern. When theta_1 is negative, the plot may show an amplified oscillatory behavior.

Generate data from an ARMA(1,1) model with phi_1 = 0.6, theta_1 = 0.6, and sigma^2 = 1.

set.seed(24)
y <- numeric(100)
e <- rnorm(100)
for(i in 3:100){
  y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]
}

arme <- tsibble(idx = seq_len(100), y = y, index = idx)

Generate data from an AR(2) model with phi_1 = - 0.8 , phi_2 = 0.3 and sigma^2 = 1. (Note that these parameters will give a non-stationary series.)

set.seed(24)
y <- numeric(100)
e <- rnorm(100)
for(i in 3:100){
  y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]
}

ar2 <- tsibble(idx = seq_len(100), y = y, index = idx)

Graph the latter two series and compare them.

autoplot(arme) +
  labs(title="ARMA(1,1) Model")

autoplot(ar2) +
  labs(title="AR(2) Model")

Question 9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

autoplot(aus_airpassengers)

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

fit_passengers <- aus_airpassengers %>%
  model(ARIMA(Passengers))

report(fit_passengers)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

fit_passengers %>% gg_tsresiduals()

fit_passengers %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers)

b. Write the model in terms of the backshift operator. ARIMA(0,2,1)the backshift operator as: (1 - (theta)B^2)Y_t = (error term at time t)

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

fit_passengers2 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))

report(fit_passengers2)

## Series: Passengers 
## Model: ARIMA(0,1,0) w/ drift 
## 
## Coefficients:
##       constant
##         1.4191
## s.e.    0.3014
## 
## sigma^2 estimated as 4.271:  log likelihood=-98.16
## AIC=200.31   AICc=200.59   BIC=203.97

fit_passengers2 %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers)

The forecasts of part a and c appear quite similar, showing an upward trend.

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

fit_passengers3 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))

report(fit_passengers3)

## Series: Passengers 
## Model: ARIMA(2,1,2) w/ drift 
## 
## Coefficients:
##           ar1      ar2     ma1     ma2  constant
##       -0.5518  -0.7327  0.5895  1.0000    3.2216
## s.e.   0.1684   0.1224  0.0916  0.1008    0.7224
## 
## sigma^2 estimated as 4.031:  log likelihood=-96.23
## AIC=204.46   AICc=206.61   BIC=215.43

fit_passengers3 %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers)

This forecast seems to incorporate some curvature, unlike parts A and C where only a positive linear trend is present. Nonetheless, it remains very similar to the other forecasts.

fit_passengers4 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))

report(fit_passengers4)

## Series: Passengers 
## Model: NULL model 
## NULL model

When the constant is removed, the model becomes non-stationary and null.

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

fit_passengers5 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))

## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.

report(fit_passengers5)

## Series: Passengers 
## Model: ARIMA(0,2,1) w/ poly 
## 
## Coefficients:
##           ma1  constant
##       -1.0000    0.0571
## s.e.   0.0585    0.0213
## 
## sigma^2 estimated as 3.855:  log likelihood=-95.1
## AIC=196.21   AICc=196.79   BIC=201.63

R discourages forecasts with quadratic trends in ARIMA models because introducing higher-order polynomial trends can lead to overfitting and increased complexity.

Question 9.8

For the United States GDP series (from global_economy):

global_economy_United_States <- global_economy %>%
  filter(Country == "United States")

global_economy_United_States %>%
  autoplot(GDP)+
  labs(title="United States GDP")

if necessary, find a suitable Box-Cox transformation for the data; The Box-Cox transformation is unnecessary because the variance of the time series data remains relatively stable over time.
fit a suitable ARIMA model to the transformed data using ARIMA();

fit_United_States <- global_economy_United_States %>%
  model(ARIMA(GDP))

report(fit_United_States)

## Series: GDP 
## Model: ARIMA(0,2,2) 
## 
## Coefficients:
##           ma1      ma2
##       -0.4206  -0.3048
## s.e.   0.1197   0.1078
## 
## sigma^2 estimated as 2.615e+22:  log likelihood=-1524.08
## AIC=3054.15   AICc=3054.61   BIC=3060.23

try some other plausible models by experimenting with the orders chosen;

fit_models <- global_economy_United_States %>%
  model("ARIMA(0,2,2)" = ARIMA(GDP),
    "ARIMA(0,2,1)" = ARIMA(GDP ~ pdq(0, 2, 1)),
    "ARIMA(0,2,0)" = ARIMA(GDP ~ pdq(0, 2, 0)))

report(fit_models)

## # A tibble: 3 × 9
##   Country       .model        sigma2 log_lik   AIC  AICc   BIC ar_roots ma_roots
##   <fct>         <chr>          <dbl>   <dbl> <dbl> <dbl> <dbl> <list>   <list>  
## 1 United States ARIMA(0,2,2) 2.61e22  -1524. 3054. 3055. 3060. <cpl>    <cpl>   
## 2 United States ARIMA(0,2,1) 2.92e22  -1528. 3059. 3059. 3063. <cpl>    <cpl>   
## 3 United States ARIMA(0,2,0) 3.10e22  -1530. 3061. 3061. 3063. <cpl>    <cpl>

choose what you think is the best model and check the residual diagnostics;

fit_models %>%
  forecast(h = 10) %>%
  autoplot(global_economy_United_States) +
  ggtitle("US GDP Forecast")

fit_United_States  %>%
  gg_tsresiduals() +
  labs("ARIMA(0, 2, 2) Residual Plots")

produce forecasts of your fitted model. Do the forecasts look reasonable?

fit_United_States %>%
  forecast(h = 10) %>%
  autoplot(global_economy_United_States) +
  ggtitle("US GDP Forecast")

compare the results with what you would obtain using ETS() (with no transformation).

fit_compare <- global_economy_United_States %>%
  model(ARIMA = ARIMA(GDP),
    ETS = ETS(GDP))

glance(fit_compare)

## # A tibble: 2 × 12
##   Country   .model   sigma2 log_lik   AIC  AICc   BIC ar_roots ma_roots      MSE
##   <fct>     <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl> <list>   <list>      <dbl>
## 1 United S… ARIMA  2.61e+22  -1524. 3054. 3055. 3060. <cpl>    <cpl>    NA      
## 2 United S… ETS    6.78e- 4  -1590. 3191. 3192. 3201. <NULL>   <NULL>    2.79e22
## # ℹ 2 more variables: AMSE <dbl>, MAE <dbl>

fit_compare %>%
  forecast(h = 10) %>%
  autoplot(global_economy_United_States) +
  ggtitle("US GDP Forecast")