Packages:

library(fpp3)
library(RColorBrewer)
library(knitr)
library(pracma)
library(cowplot)

Exercise 9.1:

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

The differences in these figures can be explained by the increasing sample size (n). When n = 36, the sample size is relatively small, so the absolute value of autocorrelation is higher in the leftmost figure than in the other figures. As n increases, the absolute value of the autocorrelation gets smaller and ultimately converges toward 0. All of these figures do indicate that the data are white noise, as even in the leftmost plot, there are no spikes outside the blue bounds of the ACF plot.

The critical values sit at different distances from zero because they depend on the sample size: for white noise, the sampling variance of each sample autocorrelation is approximately \(1/n\), so the bounds are drawn at \(\pm 1.96/\sqrt{n}\). When n is small (36), the estimated autocorrelations of even truly uncorrelated data vary widely around zero, so the bounds must be placed further from zero to avoid false positives. As n increases (360, 1,000), the estimates stabilize and the bounds tighten toward zero, so a smaller spike in absolute autocorrelation is enough to distinguish genuinely autocorrelated data from white noise.
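As a quick supplementary check (not taken from the figures themselves), the bounds for each sample size can be computed directly from the \(\pm 1.96/\sqrt{n}\) approximation:

# Approximate 95% bounds for the ACF of a white noise series of length n
wn_bounds <- tibble(n = c(36, 360, 1000),
                    bound = 1.96 / sqrt(n))
kable(wn_bounds, format = "simple", digits = 3)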

Exercise 9.2:

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

data(gafa_stock)
keep <- c("Symbol", "Date", "Close")
amazon <- gafa_stock |>
    select(all_of(keep)) |>
    filter(Symbol == "AMZN") |>
    mutate(day = row_number()) |>
    update_tsibble(index = day, regular = TRUE)
p1 <- amazon |>
    autoplot(Close) +
    labs(title = "Amazon daily closing stock price")
p2a <- amazon |>
    ACF(Close) |>
    autoplot() +
    labs(subtitle = "ACF: Amazon daily closing stock price")
p2b <- amazon |>
    PACF(Close) |>
    autoplot() +
    labs(subtitle = "PACF: Amazon daily closing stock price")
p1

The time plot of the Amazon daily closing stock price demonstrates the first piece of evidence that the data are non-stationary: there is a clear upward trend visible in the data.

p2 <- plot_grid(p2a, p2b)
p2

The ACF plot of the Amazon daily closing stock price shows two more pieces of evidence indicating the data are non-stationary: the ACF decreases very slowly for this time series, and the value of \(r_1\) is large and positive.

Beyond confirming that \(r_1\) is large and positive (the PACF spike at lag 1 is close to one), it's unclear to us what additional evidence of non-stationarity the PACF provides. Our understanding is that the PACF is mainly useful for selecting values of p or q in an ARIMA model once the data are stationary (i.e. after any necessary differencing).
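As an optional check beyond what the exercise asks for, a unit-root test gives more formal evidence that the series should be differenced. A minimal sketch using the KPSS test and unitroot_ndiffs from feasts (loaded with fpp3):

amazon |>
    features(Close, unitroot_kpss)   # null hypothesis: stationary; a small p-value suggests differencing
amazon |>
    features(Close, unitroot_ndiffs) # suggested number of first differences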

Exercise 9.3:

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

keep <- c("Country", "Year", "GDP")
turkey_gdp <- global_economy |>
    select(all_of(keep)) |>
    filter(Country == "Turkey")
p3 <- turkey_gdp |>
    autoplot(GDP) +
    labs(title = "Annual Turkish GDP in USD")
p3

lam <- turkey_gdp |>
    features(GDP, features = guerrero) |>
    pull(lambda_guerrero)
lam <- round(lam, 2)

The ideal lambda proposed for a Box-Cox transformation is \(\approx 0.16\), so we will round down and use a log transformation here.
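For reference, the Box-Cox transformation used throughout this assignment (in its classical form for positive data) is \(w_t = \ln(y_t)\) when \(\lambda = 0\), and \(w_t = (y_t^{\lambda} - 1)/\lambda\) otherwise. A \(\lambda\) near 0 therefore justifies a plain log transformation, while \(\lambda = 0.25\) corresponds (up to scaling and shifting, which do not affect the shape) to a fourth-root transformation.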

turkey_gdp <- turkey_gdp |>
    mutate(Log_GDP = log(GDP))
p4 <- turkey_gdp |>
    autoplot(Log_GDP) +
    labs(title = "Annual Turkish log(GDP) in USD")
p4

There is no evidence of seasonality in the plot of log(GDP) (annual data cannot exhibit seasonality anyway), but we will confirm that the appropriate number of seasonal differences is zero. Then we will determine the appropriate number of first differences for the data.

nsdiffs <- turkey_gdp |>
    features(Log_GDP, unitroot_nsdiffs) |>
    select(-Country)
colnames(nsdiffs) <- "Appropriate N: Seasonal Differences"
ndiffs <- turkey_gdp |>
    features(Log_GDP, unitroot_ndiffs) |>
    select(-Country)
colnames(ndiffs) <- "Appropriate N: First Differences"
diffs <- cbind(nsdiffs, ndiffs)
kable(diffs, format = "simple")
Appropriate N: Seasonal Differences   Appropriate N: First Differences
------------------------------------  ---------------------------------
                                  0                                   1

The appropriate number of seasonal differences is in fact zero, and the appropriate number of first differences is one. Now we can difference the data and confirm we’ve made it stationary:
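For annual data, first differencing just means taking period-to-period changes of the logged series, \(y'_t = \ln(GDP_t) - \ln(GDP_{t-1})\), which is approximately the year-on-year growth rate of GDP.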

p5 <- turkey_gdp |>
    ACF(difference(Log_GDP)) |>
    autoplot() +
    labs(subtitle = "ACF: Annual Changes in Turkish log(GDP) in USD")
p5

The ACF of the differenced log(GDP) data looks just like that of a white noise series, as there are no spikes outside the blue bounds.

turkey_gdp |>
    mutate(diff_Log_GDP = difference(Log_GDP)) |>
    features(diff_Log_GDP, ljung_box, lag = 10)
## # A tibble: 1 × 3
##   Country lb_stat lb_pvalue
##   <fct>     <dbl>     <dbl>
## 1 Turkey     4.12     0.942

And the large p-value from the Ljung-Box test means we cannot distinguish the differenced log(GDP) series from white noise, which is consistent with it being stationary.
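For reference, the Ljung-Box statistic being tested here is \(Q^* = T(T+2)\sum_{k=1}^{\ell} r_k^2/(T-k)\), where \(T\) is the length of the series, \(\ell = 10\) is the number of lags used, and \(r_k\) is the lag-\(k\) sample autocorrelation. Large values of \(Q^*\) (small p-values) indicate autocorrelations that are too large to have come from white noise.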

data(aus_accommodation)
keep <- c("Date", "State", "Takings")
tasmania_takings <- aus_accommodation |>
    select(all_of(keep)) |>
    filter(State == "Tasmania")
p6 <- tasmania_takings |>
    autoplot(Takings) +
    labs(title = "Quarterly Takings in Tasmania")
p6

lam <- tasmania_takings |>
    features(Takings, features = guerrero) |>
    pull(lambda_guerrero)
lam <- round(lam, 3)

The ideal lambda proposed for a Box-Cox transformation is \(\approx 0.002\), so we will round down and use a log transformation again.

tasmania_takings <- tasmania_takings |>
    mutate(Log_Takings = log(Takings))
p7 <- tasmania_takings |>
    autoplot(Log_Takings) +
    labs(title = "Quarterly log(Takings) in Tasmania")
p7

There is strong evidence of seasonality in the plot of log(Takings), so we will determine the appropriate number of seasonal differences, then the appropriate number of first differences for the data.

nsdiffs <- tasmania_takings |>
    features(Log_Takings, unitroot_nsdiffs) |>
    select(-State)
colnames(nsdiffs) <- "Appropriate N: Seasonal Differences"
ndiffs <- tasmania_takings |>
    features(Log_Takings, unitroot_ndiffs) |>
    select(-State)
colnames(ndiffs) <- "Appropriate N: First Differences"
diffs <- cbind(nsdiffs, ndiffs)
kable(diffs, format = "simple")
Appropriate N: Seasonal Differences   Appropriate N: First Differences
------------------------------------  ---------------------------------
                                  1                                   1

The appropriate number of seasonal differences is one, and the appropriate number of first differences is one. Now we can doubly difference the data and confirm we’ve made it stationary:
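Here, doubly differencing means taking the lag-4 (annual) seasonal difference and then a first difference of the result: \(y''_t = (y_t - y_{t-4}) - (y_{t-1} - y_{t-5})\), where \(y_t = \ln(\text{Takings}_t)\).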

p8 <- tasmania_takings |>
    transmute(
        `Annual Change log(Takings)` = difference(Log_Takings, 4),
        `Doubly Diff. log(Takings)` = difference(difference(
            Log_Takings, 4), 1)) |>
    pivot_longer(-Date, names_to="Type", values_to="Log_Takings") |>
    mutate(Type = factor(Type, levels = c("Annual Change log(Takings)",
                                          "Doubly Diff. log(Takings)"))) |>
    ggplot(aes(x = Date, y = Log_Takings)) +
    geom_line() +
    facet_grid(vars(Type), scales = "free_y") +
    labs(title = "Tasmanian Takings", y = NULL)
p8 

The data now appear stationary.

data(souvenirs)
p9 <- souvenirs |>
    autoplot(Sales) +
    labs(title = "Monthly Souvenir Sales")
p9

lam <- souvenirs |>
    features(Sales, features = guerrero) |>
    pull(lambda_guerrero)
lam <- round(lam, 3)

The ideal lambda proposed for a Box-Cox transformation is again \(\approx 0.002\), so we will round down and use a log transformation yet again.

souvenirs <- souvenirs |>
    mutate(Log_Sales = log(Sales))
p10 <- souvenirs |>
    autoplot(Log_Sales) +
    labs(title = "Monthly Souvenir log(Sales)")
p10

There is strong evidence of seasonality in the plot of log(Sales), so we will determine the appropriate number of seasonal differences, then the appropriate number of first differences for the data.

nsdiffs <- souvenirs |>
    features(Log_Sales, unitroot_nsdiffs)
colnames(nsdiffs) <- "Appropriate N: Seasonal Differences"
ndiffs <- souvenirs |>
    features(Log_Sales, unitroot_ndiffs)
colnames(ndiffs) <- "Appropriate N: First Differences"
diffs <- cbind(nsdiffs, ndiffs)
kable(diffs, format = "simple")
Appropriate N: Seasonal Differences   Appropriate N: First Differences
------------------------------------  ---------------------------------
                                  1                                   1

The appropriate number of seasonal differences is one, and the appropriate number of first differences is one. Now we can doubly difference the data and confirm we’ve made it stationary:

p11 <- souvenirs |>
    transmute(
        `Annual Change log(Sales)` = difference(Log_Sales, 12),
        `Doubly Diff. log(Sales)` = difference(difference(
            Log_Sales, 12), 1)) |>
    pivot_longer(-Month, names_to="Type", values_to="Log_Sales") |>
    mutate(Type = factor(Type, levels = c("Annual Change log(Sales)",
                                          "Doubly Diff. log(Sales)"))) |>
    ggplot(aes(x = Month, y = Log_Sales)) +
    geom_line() +
    facet_grid(vars(Type), scales = "free_y") +
    labs(title = "Souvenir Sales", y = NULL)
p11

The data now appear stationary.

Exercise 9.5:

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

data(aus_retail)
set.seed(1221)
remove <- c("State", "Industry", "Series ID")
my_series <- aus_retail |>
    filter(`Series ID` == sample(aus_retail$`Series ID`, 1)) |>
    select(-all_of(remove))
p12 <- my_series |>
    autoplot(Turnover)
p12

lam <- my_series |>
    features(Turnover, features = guerrero) |>
    pull(lambda_guerrero)
lam <- round(lam, 2)

The ideal lambda proposed for a Box-Cox transformation is \(\approx 0.26\), so we will round down and use a fourth root transformation.

my_series <- my_series |>
    mutate(Turnover_4thRt = nthroot(Turnover, 4))
p13 <- my_series |>
    autoplot(Turnover_4thRt) +
    labs(title = "My Series: Monthly Turnover (Fourth Root)")
p13

We will determine the appropriate number of seasonal differences for the transformed data, then the appropriate number of first differences for the transformed data.

nsdiffs <- my_series |>
    features(Turnover_4thRt, unitroot_nsdiffs)
colnames(nsdiffs) <- "Appropriate N: Seasonal Differences"
ndiffs <- my_series |>
    features(Turnover_4thRt, unitroot_ndiffs)
colnames(ndiffs) <- "Appropriate N: First Differences"
diffs <- cbind(nsdiffs, ndiffs)
kable(diffs, format = "simple")
Appropriate N: Seasonal Differences   Appropriate N: First Differences
------------------------------------  ---------------------------------
                                  1                                   1

The appropriate number of seasonal differences is one, and the appropriate number of first differences is one. Now we can doubly difference the data and confirm we’ve made it stationary:

p14 <- my_series |>
    transmute(
        `Annual Change Turnover^1/4` = difference(Turnover_4thRt, 12),
        `Doubly Diff. Turnover^1/4` = difference(difference(
            Turnover_4thRt, 12), 1)) |>
    pivot_longer(-Month, names_to="Type", values_to="Turnover_4thRt") |>
    mutate(Type = factor(Type, levels = c("Annual Change Turnover^1/4",
                                          "Doubly Diff. Turnover^1/4"))) |>
    ggplot(aes(x = Month, y = Turnover_4thRt)) +
    geom_line() +
    facet_grid(vars(Type), scales = "free_y") +
    labs(title = "My Series: Turnover", y = NULL)
p14

The data now appear stationary.

Exercise 9.6:

Simulate and plot some data from simple ARIMA models.

y <- numeric(100)
e <- rnorm(100)
for (i in 2:100){
    y[i] <- (0.6 * y[i-1]) + e[i]
}
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
p15 <- sim |>
    autoplot(y) +
    labs(title = "Sim 1:")
p15

phis <- seq(from = -0.9, to = 0.9, by = 0.1)
sim_df <- data.frame(idx = integer(0), phi_val = numeric(0), y = numeric(0))
for (j in seq_along(phis)){
    y <- numeric(100)
    e <- rnorm(100)
    phi <- phis[j]
    for (i in 2:100){
        y[i] <- (phi * y[i-1]) + e[i]
    }
    new_rows <- data.frame(idx = seq_len(100),
                           phi_val = rep(phi, 100),
                           y = y)
    sim_df <- rbind(sim_df, new_rows)
}
sim_ts <- sim_df |>
    as_tsibble(index = idx, key = phi_val)
p16 <- sim_ts |>
    autoplot(y) +
    facet_wrap(~ phi_val, ncol = 5) +
    theme_minimal() + 
    theme(legend.position = "none") +
    labs(title = expression(paste("Repeated Simulations with Various ", phi[1],
    " Values")))
p16

Changing \(\phi_1\) from \(-0.9\) to \(0.9\) in steps of \(0.1\) produces clearly different time series patterns. For \(-0.9 \le \phi_1 < 0\), the series oscillates rapidly around its mean, with the swings largest and most frequent at \(\phi_1 = -0.9\). For \(0 \le \phi_1 \le 0.9\), the sign-flipping oscillation disappears: at \(\phi_1 = 0\) the series is simply white noise, and as \(\phi_1\) increases toward \(0.9\) the series becomes more persistent, wandering further from the mean with longer runs above or below it. Because \(|\phi_1| < 1\) in every case, each series is stationary and the most recent observations carry more weight than observations from the more distant past.
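These patterns line up with the standard properties of a zero-mean stationary AR(1) process: \(\text{Var}(y_t) = \sigma^2/(1 - \phi_1^2)\) and \(\rho_k = \phi_1^k\). The variance grows as \(|\phi_1|\) approaches 1, which is why the swings are largest at \(\phi_1 = \pm 0.9\), and a negative \(\phi_1\) makes consecutive observations negatively correlated, producing the rapid back-and-forth oscillation.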

y <- numeric(100)
e <- rnorm(100)
for (i in 2:100){
    y[i] <- (0.6 * e[i-1]) + e[i]
}
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)
p17 <- sim2 |>
    autoplot(y) +
    labs(title = "Sim 2:")
p17

thetas <- seq(from = -0.9, to = 0.9, by = 0.1)
sim_df2 <- data.frame(idx = integer(0), theta_val = numeric(0), y = numeric(0))
for (j in seq_along(thetas)){
    y <- numeric(100)
    e <- rnorm(100)
    theta <- thetas[j]
    for (i in 2:100){
        y[i] <- (theta * e[i-1]) + e[i]
    }
    new_rows <- data.frame(idx = seq_len(100),
                           theta_val = rep(theta, 100),
                           y = y)
    sim_df2 <- rbind(sim_df2, new_rows)
}
sim_ts2 <- sim_df2 |>
    as_tsibble(index = idx, key = theta_val)
p18 <- sim_ts2 |>
    autoplot(y) +
    facet_wrap(~ theta_val, ncol = 5) +
    theme_minimal() + 
    theme(legend.position = "none") +
    labs(title = expression(paste("Repeated Simulations with Various ", theta[1],
    " Values")))
p18

Changing \(\theta_1\) from \(-0.9\) to \(0.9\) in steps of \(0.1\) again produces different patterns, though the differences are subtler than for the AR(1) simulations. The more negative \(\theta_1\) is, the more frequently the data oscillate back and forth around the mean, because consecutive observations share an error term that enters with opposite signs. Because \(|\theta_1| < 1\) in every case, the models are invertible, so in the equivalent AR(\(\infty\)) representation the most recent observations again carry more weight than observations from the more distant past.
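For comparison with the AR(1) case, the MA(1) process has \(\text{Var}(y_t) = \sigma^2(1 + \theta_1^2)\), \(\rho_1 = \theta_1/(1 + \theta_1^2)\), and \(\rho_k = 0\) for \(k > 1\). Whatever the value of \(\theta_1\), the dependence only reaches back one period, which is why the panels look broadly similar apart from the sign and strength of the lag-one correlation.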

y <- numeric(100)
e <- rnorm(100)
for (i in 2:100){
    y[i] <- (0.6 * y[i-1]) + (0.6 * e[i-1]) + e[i]
}
sim3 <- tsibble(idx = seq_len(100), y = y, index = idx)
y <- numeric(100)
e <- rnorm(100)
for (i in 3:100){
    y[i] <- (-0.8 * y[i-1]) + (0.3 * y[i-2]) + e[i]
}
sim4 <- tsibble(idx = seq_len(100), y = y, index = idx)
p19a <- sim3 |>
    autoplot(y) +
    labs(title = "Sim 3:")
p19b <- sim4 |>
    autoplot(y) +
    labs(title = "Sim 4:")
p19 <- plot_grid(p19a, p19b)
p19

The ARMA(1,1) model (Sim 3) is stationary: with \(\phi_1 = \theta_1 = 0.6\), both \(|\phi_1| < 1\) and \(|\theta_1| < 1\), so the series fluctuates around a fixed mean and the most recent observations carry the greatest weight. The AR(2) model (Sim 4) is non-stationary because its parameters violate the stationarity conditions for an AR(2) process (see the check below). Because \(\phi_1 < 0\), the series oscillates around the mean, and the swings grow larger and larger over time.
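For a stationary AR(2), the coefficients must satisfy \(\phi_1 + \phi_2 < 1\), \(\phi_2 - \phi_1 < 1\), and \(|\phi_2| < 1\). Checking Sim 4: \(\phi_1 + \phi_2 = -0.8 + 0.3 = -0.5 < 1\) and \(|\phi_2| = 0.3 < 1\), but \(\phi_2 - \phi_1 = 0.3 - (-0.8) = 1.1 > 1\), so the second condition fails and the process is explosive rather than stationary.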

Exercise 9.7:

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

data(aus_airpassengers)
p20 <- aus_airpassengers |>
    autoplot(Passengers)
p20

lam <- aus_airpassengers |>
    features(Passengers, features = guerrero) |>
    pull(lambda_guerrero)
lam <- round(lam, 2)

With a proposed Box-Cox lambda of \(-0.18\), a log transformation would probably be appropriate, but we will skip it so the following questions are less complicated to answer as written.

cols <- c("stepwise", "search")
fit <- aus_airpassengers |>
    model(stepwise = ARIMA(Passengers),
          search = ARIMA(Passengers, stepwise=FALSE))
pivot_fit <- fit |>
    pivot_longer(cols = all_of(cols), names_to = "Model Name",
                 values_to = "Orders")
kable(pivot_fit, format = "simple")
Model Name   Orders
-----------  ----------------
stepwise     <ARIMA(0,2,1)>
search       <ARIMA(0,2,1)>
glance_fit <- glance(fit) |>
    arrange(AICc) |>
    select(.model:BIC)
kable(glance_fit, format = "simple")
.model        sigma2     log_lik        AIC       AICc        BIC
---------  ---------  ----------  ---------  ---------  ---------
stepwise    4.307764   -97.01896   198.0379   198.3236   201.6512
search      4.307764   -97.01896   198.0379   198.3236   201.6512

Both the default stepwise ARIMA() procedure and the more extensive search procedure selected the same model: ARIMA(0,2,1).

We check that the residuals look like white noise.

fit |>
    select(stepwise) |>
    gg_tsresiduals()

The residuals do in fact look like white noise.

We plot forecasts for the next 10 periods.

p22a <- fit |>
    forecast(h=10) |>
    filter(.model == "stepwise") |>
    autoplot(aus_airpassengers) +
    labs(subtitle = "AUS Air Passengers: ARIMA(0,2,1) Forecasts")
p22a

In terms of the backshift operator, the selected ARIMA(0,2,1) model can be written as

\((1 - B)^2y_t = (1 + \theta_1B)\epsilon_t\)
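Expanding the backshift terms gives the equivalent difference-equation form (a routine algebraic rewrite, not part of the original solution): \(y_t = 2y_{t-1} - y_{t-2} + \epsilon_t + \theta_1\epsilon_{t-1}\).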

fit <- aus_airpassengers |>
    model(arima010c = ARIMA(Passengers ~ 1 + pdq(0,1,0)))
p22b <- fit |>
    forecast(h=10) |>
    filter(.model == "arima010c") |>
    autoplot(aus_airpassengers) +
    labs(subtitle = "AUS Air Passengers: ARIMA(0,1,0) w/ Drift Forecasts")
p22b

The forecasts from the ARIMA(0,1,0) w/ drift model are very similar to those from the ARIMA(0,2,1) model, though slightly less optimistic. The prediction intervals from the ARIMA(0,2,1) model are wider than those from the ARIMA(0,1,0) w/ drift model.
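This makes sense from the form of the two models. The ARIMA(0,1,0) w/ drift model is a random walk with drift, \(y_t = c + y_{t-1} + \epsilon_t\), so its point forecasts follow a straight line with slope equal to the estimated drift \(c\), whereas the ARIMA(0,2,1) model (with \(d = 2\)) allows the implied trend itself to keep changing, which helps explain both the slightly more optimistic forecasts and the wider intervals.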

fit <- aus_airpassengers |>
    model(arima212c = ARIMA(Passengers ~ 1 + pdq(2, 1, 2)),
          arima212 = ARIMA(Passengers ~ 0 + pdq(2, 1, 2)),
          arima021c = ARIMA(Passengers ~ 1 + pdq(0, 2, 1)))
p23 <- fit |>
    forecast(h=10) |>
    filter(.model == "arima212c") |>
    autoplot(aus_airpassengers) +
    labs(subtitle = "AUS Air Passengers: ARIMA(2,1,2) w/ Drift Forecasts")
p23

The forecasts aren't very different from those of the earlier models, but the additional AR and MA terms introduce a slight wobble into both the point forecasts and the prediction intervals.

p24 <- fit |>
    forecast(h=10) |>
    filter(.model == "arima212") |>
    autoplot(aus_airpassengers) +
    labs(subtitle = "AUS Air Passengers: ARIMA(2,1,2) Forecasts")
p24

Removing the constant results in a NULL model: the estimation of an ARIMA(2,1,2) without a constant fails for these data, so no forecasts can be produced or plotted.

p25 <- fit |>
    forecast(h=10) |>
    filter(.model == "arima021c") |>
    autoplot(aus_airpassengers) +
    labs(subtitle = "AUS Air Passengers: ARIMA(0,2,1) w/ Drift Forecasts")
p25

The forecasts become more optimistic. We've been warned that including a constant when the order of differencing is \(\ge 2\) is dangerous when forecasting, because the constant induces a quadratic (or higher-order polynomial) trend in the long-run forecasts.

Exercise 9.8:

For the United States GDP series (from global_economy):

keep <- c("Country", "Year", "GDP")
us_gdp <- global_economy |>
    select(all_of(keep)) |>
    filter(Country == "United States")
p26 <- us_gdp |>
    autoplot(GDP) +
    labs(title = "Annual United States GDP in USD")
p26

lam <- us_gdp |>
    features(GDP, features = guerrero) |>
    pull(lambda_guerrero)
lam <- round(lam, 2)

With a proposed lambda of \(0.28\), we round down to \(0.25\), a fourth root transformation.

lam <- 0.25
fit1 <- us_gdp |>
    model(stepwise = ARIMA(box_cox(GDP, lam)),
          search = ARIMA(box_cox(GDP, lam), stepwise = FALSE))
pivot_fit <- fit1 |>
    pivot_longer(-Country, names_to = "Model Name",
                 values_to = "Orders")
kable(pivot_fit, format = "simple")
Country         Model Name   Orders
--------------  -----------  -------------------------
United States   stepwise     <ARIMA(1,1,0) w/ drift>
United States   search       <ARIMA(1,1,0) w/ drift>
glance_fit <- glance(fit1) |>
    arrange(AICc) |>
    select(.model:BIC)
kable(glance_fit, format = "simple")
.model        sigma2     log_lik        AIC       AICc        BIC
---------  ---------  ----------  ---------  ---------  ---------
stepwise    839.3346   -271.8562   549.7123   550.1652   555.8415
search      839.3346   -271.8562   549.7123   550.1652   555.8415
fit2 <- us_gdp |>
    model(arima011 = ARIMA(box_cox(GDP, lam) ~ 1 + pdq(0,1,1)),
          arima211 = ARIMA(box_cox(GDP, lam) ~ 1 + pdq(2,1,1)))
pivot_fit <- fit2 |>
    pivot_longer(-Country, names_to = "Model Name",
                 values_to = "Orders")
kable(pivot_fit, format = "simple")
Country         Model Name   Orders
--------------  -----------  -------------------------
United States   arima011     <ARIMA(0,1,1) w/ drift>
United States   arima211     <ARIMA(2,1,1) w/ drift>
glance_fit <- glance(fit2) |>
    arrange(AICc) |>
    select(.model:BIC)
kable(glance_fit, format = "simple")
.model        sigma2     log_lik        AIC       AICc        BIC
---------  ---------  ----------  ---------  ---------  ---------
arima011    876.2163   -273.0490   552.0979   552.5508   558.2271
arima211    861.4592   -271.5637   553.1274   554.3039   563.3427

The stepwise model produced by ARIMA() had the lowest AICc (identical to the model produced by the more exhaustive search procedure), so that model is the best. We check the residual diagnostics.

fit1 |>
    select(stepwise) |>
    gg_tsresiduals()

The residuals look like white noise.

p27 <- fit1 |>
    forecast(h=10) |>
    filter(.model == "stepwise") |>
    autoplot(us_gdp) +
    labs(subtitle = "US GDP: ARIMA(1,1,0) w/ Drift Forecasts")
p27

Yes, the forecasts look reasonable.

compare <- us_gdp |> 
    slice(-n()) |>
    stretch_tsibble(.init = 10) |>
    model(ETS(GDP),
          ARIMA(box_cox(GDP, lam))) |>
    forecast(h = 1) |>
    accuracy(us_gdp) |>
    select(.model, RMSE:MAPE)
kable(compare, format = "simple")
.model                              RMSE            MAE         MPE       MAPE
-------------------------  -------------  -------------  ----------  ---------
ARIMA(box_cox(GDP, lam))    180880178014   114151663572   0.0725540   1.679234
ETS(GDP)                    190604101250   126812323851   0.5935103   1.731149
fit3 <- us_gdp |>
    model(ETS = ETS(GDP))
p28 <- fit3 |>
    forecast(h=10) |>
    filter(.model == "ETS") |>
    autoplot(us_gdp) +
    labs(subtitle = "US GDP: ETS Forecasts")
p28

ETS() is less accurate by comparison on the cross-validated error measures. Its forecasts are less optimistic and its prediction intervals are much wider, although ARIMA prediction intervals tend to be too narrow in any case, since they do not account for all sources of uncertainty (such as parameter and model uncertainty).