Do the exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman. Please submit both the Rpubs link as well as your .rmd file.
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble 3.3.0 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.2
## ✔ lubridate 1.9.4 ✔ fable 0.4.1
## ✔ ggplot2 4.0.0
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ purrr 1.1.0 ✔ stringr 1.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tsibble)
Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
All three ACF plots have autocorrelations that stay within the significance lines and are centered around zero, indicating that the data are white noise. However, in the ACF with 36 random numbers, the random fluctuations in autocorrelation are greater than in the ACF plot with 1000 samples. The blue significance boundary is also much wider in the small sample because the standard error of the autocorrelations is larger when the sample size is small. As the number of observations grows larger, the fluctuations are smaller and the boundaries are tighter.
The critical values are different because the significance bounds are calculated as \(\pm 1.96/\sqrt{T}\), where \(T\) is the number of observations, so smaller samples have wider bounds and larger samples have narrower bounds. The autocorrelations are different in each figure because when there are fewer observations, the random variation is greater, so the spikes look larger. When the sample size is larger, the autocorrelations of white noise converge towards zero, making the plot appear smoother.
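As a quick supplementary sketch (added here, not part of the original output), the bound widths for the three sample sizes can be computed directly:
tibble(T = c(36, 360, 1000)) %>%
  mutate(bound = 1.96 / sqrt(T))   # approx 0.327, 0.103, 0.062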
A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
# Amazon stock price
amzn <- gafa_stock %>%
filter(Symbol == "AMZN") %>%
select(Date, Close)
amzn %>%
gg_tsdisplay(Close, plot_type = 'partial')
## Warning: `gg_tsdisplay()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsdisplay()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
The plot of the daily close price of Amazon stock shows a strong upward trend until 2019, where a downward trend begins. Since the price does not fluctuate around a constant mean, this violates stationarity and indicates that the series is non-stationary. The ACF plot shows autocorrelation close to 1 across all lags, slowly decaying as the lag increases. This is common with random walks, a non-stationary process. The PACF plot shows a large spike at lag 1, meaning today’s price is highly dependent on yesterday’s price. Together, these plots confirm that the series is non-stationary and must be differenced to achieve stationarity before fitting an ARIMA model.
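A formal unit-root check can back up the visual diagnosis; this sketch (added here, output not shown) uses the KPSS-based unitroot_ndiffs feature from feasts, which we would expect to suggest at least one difference:
# KPSS-based estimate of how many first differences are needed
amzn %>% features(Close, unitroot_ndiffs)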
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
gdp <- global_economy %>%
filter(Country == "Turkey") %>%
select(Year, GDP)
gdp %>%
gg_tsdisplay(GDP, plot_type = 'partial')
The plot of the Turkish GDP shows a strong upward trend and the size of the fluctuations grows as GDP grows. The ACF plot shows a slow decay with large positive correlations across many lags, indicative of a non-stationary series. To determine which Box-Cox transformation is appropriate, we can estimate lambda first.
gdp %>% features(GDP, guerrero)
## # A tibble: 1 × 1
## lambda_guerrero
## <dbl>
## 1 0.157
Since the estimated lambda (about 0.16) is fairly close to 0, the transformation behaves much like a log transformation; we apply the Box–Cox transformation with the estimated lambda.
gdp_t <- gdp %>%
mutate(y = box_cox(GDP, 0.1572187))
gg_tsdisplay(gdp_t, y, plot_type = "partial") +
labs(title = "Turkey GDP (Box–Cox transformed)")
After transforming, the fluctuations are not as large as before, but the ACF still shows a slow decay with positive correlations. This indicates that the trend has not been removed and the series is still non-stationary. We can difference it once to remove the trend.
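Before differencing, a quick check of the required order (a sketch added here, output not shown) should agree that one difference suffices:
# KPSS-based check on the transformed series
gdp_t %>% features(y, unitroot_ndiffs)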
gdp_t %>%
mutate(dy = difference(y)) %>%
gg_tsdisplay(dy, plot_type = "partial") +
labs(title = "Differenced log(GDP): Turkey")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
After differencing once, the strong upward trend has disappeared and the series now fluctuates around a constant mean, consistent with stationarity. The ACF plot no longer shows slow decay, and the autocorrelations are small and remain within the significance bounds. After applying a Box–Cox transformation (lambda ≈ 0.16, close to a log) followed by one regular difference, the Turkish GDP data can be considered stationary.
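A KPSS test on the differenced series (a sketch added here, output not shown) should fail to reject stationarity, supporting this conclusion:
gdp_t %>%
  mutate(dy = difference(y)) %>%
  features(dy, unitroot_kpss)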
ac_tas <- aus_accommodation %>%
filter(State == "Tasmania") %>%
select(Date, Takings)
ac_tas %>%
gg_tsdisplay(Takings, plot_type = "partial")
The time plot shows a clear upward trend and strong seasonal pattern, with the variance increasing as the level of the series rises. The ACF plot also shows strong seasonality, with significant spikes at lags 4, 8, 12, and 16. This indicates the data is non-stationary and requires both a Box–Cox transformation (log) to stabilize the variance and differencing to remove the trend and seasonality.
# Using Guerrero's method to check lambda
ac_tas %>%
features(Takings, guerrero)
## # A tibble: 1 × 1
## lambda_guerrero
## <dbl>
## 1 0.00182
ac_tas_t <- ac_tas %>%
mutate(y = box_cox(Takings, 0.001819643))
gg_tsdisplay(ac_tas_t, y, plot_type = "partial") +
labs(title = "Tasmania Accommodation Takings(Box–Cox transformed)")
After transformation, the upward trend is still present but the variance no longer increases with the level of the series. The ACF plot shows strong autocorrelation at lags 4, 8, 12, and 16 with slow decay, indicating non-stationarity. We apply seasonal differencing at lag 4 to remove the quarterly seasonal pattern.
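The number of seasonal differences can also be checked formally (a sketch added here, output not shown); we would expect one seasonal difference to be suggested:
ac_tas_t %>% features(y, unitroot_nsdiffs)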
ac_tas_S1 <- ac_tas_t %>%
mutate(yS = difference(y, lag = 4))
gg_tsdisplay(ac_tas_S1, yS, plot_type = "partial") +
labs(title = "Seasonally differenced (lag = 4)")
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
After applying the log transform and one seasonal difference, the series fluctuates around a constant mean and the ACF/PACF show no patterns, indicating that the data are stationary. The seasonal spikes in the ACF at lags 4, 8, and 12 have disappeared, confirming that the seasonal component has been removed. The PACF no longer shows significant autocorrelations beyond lag 1, suggesting that no further differencing is needed. Thus, a log transformation with one seasonal difference is sufficient to obtain stationarity.
sou <- souvenirs %>%
select(Month, Sales)
sou %>%
gg_tsdisplay(Sales, plot_type = "partial")
sou %>%
features(Sales, guerrero)
## # A tibble: 1 × 1
## lambda_guerrero
## <dbl>
## 1 0.00212
sou %>%
features(Sales, unitroot_ndiffs)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
sou %>%
features(Sales, unitroot_nsdiffs)
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 1
The time plot shows a very strong upward trend as well as a clear seasonal pattern that repeats every 12 months. Variance also increases as sales increase, suggesting a Box–Cox transformation is needed. Using Guerrero’s method, the lambda is very close to zero, indicating a log transformation should be used. The ACF plot shows large spikes at lags 1, 12, and 24 and a slow decay, indicating non-stationary data. Based on the KPSS tests, one regular and one seasonal difference should be performed to get stationary data. Since the ACF shows strong seasonal spikes, seasonal differencing should be applied first.
# Log transformation and seasonal difference
sou_t <- sou %>% mutate(y = box_cox(Sales, 0))
sou_D1 <- sou_t %>% mutate(yS = difference(y, lag = 12))
gg_tsdisplay(sou_D1, yS, plot_type="partial") +
labs(title = "Log transform + seasonal difference (lag = 12)")
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
The ACF plot still shows some slow decay so we can perform a regular difference next.
# Regular differencing
sou_D1d1 <- sou_D1 %>% mutate(ySd = difference(yS))
gg_tsdisplay(sou_D1d1, ySd, plot_type = "partial") +
labs(title = "Log transform + seasonal (12) + regular difference")
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_point()`).
After a log transformation, seasonal differencing, and one regular difference, the strong trend and seasonal cycles are gone. The series fluctuates randomly around a constant mean and the ACF plot shows no slow decay, indicating stationarity.
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(12345)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
myseries %>%
gg_tsdisplay(Turnover, plot_type='partial')
# Checking lambda
myseries |> features(Turnover, guerrero)
## # A tibble: 1 × 3
## State Industry lambda_guerrero
## <chr> <chr> <dbl>
## 1 Western Australia Clothing, footwear and personal accessory r… 0.00219
The time plot shows a strong upward trend and clear seasonality in the data. Since the variance increases over time and the lambda value is close to zero, a log transformation is needed. Also, the ACF plot shows strong seasonal spikes at multiples of 12 months and slow decay, indicating non-stationarity.
myseries_t <- myseries %>%
mutate(y = box_cox(Turnover, 0))
gg_tsdisplay(myseries_t, y, plot_type = "partial") +
labs(title = "Log transform Turnover")
After a log transformation, the variance stabilized but the ACF plot still shows a slow decay, so differencing is needed. Since there are large spikes at the seasonal lags, suggesting strong annual seasonality, seasonal differencing should be performed first.
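As with the earlier series, the KPSS-based features offer a formal check (a sketch added here, output not shown); one seasonal difference is the expected suggestion:
myseries_t %>% features(y, unitroot_nsdiffs)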
# seasonal diff
myseries_D1 <- myseries_t %>%
mutate(yS = difference(y, lag = 12))
gg_tsdisplay(myseries_D1, yS, plot_type = "partial") +
labs(title = "Retail: log + seasonal (12)")
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
After seasonal differencing, the seasonality is gone but the ACF still shows a remaining trend. A regular difference should be applied to obtain stationary data.
myseries_D1d1 <- myseries_D1 %>%
mutate(ySd = difference(yS))
gg_tsdisplay(myseries_D1d1, ySd, plot_type = "partial") +
labs(title = "Retail: log + seasonal (12) + regular diff")
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_point()`).
After applying a log transform and seasonal differencing at lag 12, the strong annual seasonality was removed but the series still showed a trend. Adding a further regular difference eliminated this trend and the series now fluctuates around a constant mean. The ACF and PACF values are small and within the significance bands, indicating no strong autocorrelation. The transformed and differenced series is now stationary.
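A final KPSS test on the fully differenced series (a sketch added here, output not shown) should return a small test statistic with a p-value above 0.05, consistent with stationarity:
myseries_D1d1 %>% features(ySd, unitroot_kpss)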
Simulate and plot some data from simple ARIMA models.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_ar1 <- function(phi, n = 100) {
  e <- rnorm(n)                    # fresh errors for each simulation
  y <- numeric(n)                  # y[1] = 0 as the starting value
  for (i in 2:n) {
    y[i] <- phi * y[i-1] + e[i]
  }
  tsibble(idx = seq_len(n), y = y, index = idx)
}
set.seed(123)
sim_02 <- sim_ar1(0.2)
sim_06 <- sim_ar1(0.6)
sim_09 <- sim_ar1(0.9)
sim_neg <- sim_ar1(-0.6)
autoplot(sim_02, y) + labs(title="AR(1) with phi=0.2")
autoplot(sim_06, y) + labs(title="AR(1) with phi=0.6")
autoplot(sim_09, y) + labs(title="AR(1) with phi=0.9")
autoplot(sim_neg, y) + labs(title="AR(1) with phi=-0.6")
When phi is small, the series looks close to white noise, with little dependence on past values. As phi increases towards 1, the dependence increases and the series shows longer streaks of highs or lows. When phi is very close to 1, the series is extremely dependent on past values, so it resembles a random walk. When phi is negative, the series oscillates around the mean.
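The sample ACFs make the effect of phi visible (a sketch added here; plots not shown in the original): higher phi gives slower ACF decay, while negative phi gives alternating signs.
sim_02 %>% ACF(y) %>% autoplot() + labs(title = "ACF of AR(1), phi = 0.2")
sim_09 %>% ACF(y) %>% autoplot() + labs(title = "ACF of AR(1), phi = 0.9")
sim_neg %>% ACF(y) %>% autoplot() + labs(title = "ACF of AR(1), phi = -0.6")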
set.seed(123)
n <- 100
theta <- 0.6
sigma <- 1
# Generate random errors
e <- rnorm(n + 1, mean = 0, sd = sigma)
# Initialize and compute MA(1)
y <- numeric(n)
for (t in 2:(n+1)) {
y[t-1] <- e[t] + theta * e[t-1]
}
# Store in a tsibble
sim_ma1 <- tsibble(idx = 1:n, y = y, index = idx)
# Plot
autoplot(sim_ma1, y) +
labs(title = "Simulated MA(1) with theta = 0.6, sigma^2 = 1",
x = "Time", y = "y")
set.seed(123)
n <- 100
thetas <- c(-0.9, -0.6, 0, 0.6, 0.9)
simulate_ma1 <- function(theta, n){
e <- rnorm(n)
y <- numeric(n)
for(i in 2:n){
y[i] <- e[i] + theta*e[i-1]
}
return(y)
}
for (theta in thetas){
y <- simulate_ma1(theta, n)
ts.plot(y, main = paste("MA(1) with theta1 =", theta), ylab="y")
}
When theta1 = 0, the series is pure white noise with no dependence. As theta1 increases, nearby values become more correlated and the series shows smoother short-term patterns. When theta1 is negative, the series shows a strong oscillating pattern around zero.
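For an MA(1), the theoretical lag-1 autocorrelation is \(\theta_1/(1+\theta_1^2)\), which is about 0.44 when theta1 = 0.6. A quick check against a longer simulated series (a sketch added here, output not shown):
theta <- 0.6
theta / (1 + theta^2)                      # theoretical lag-1 autocorrelation, ~0.44
acf(simulate_ma1(theta, 1000), lag.max = 5, plot = FALSE)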
phi <- 0.6; theta <- 0.6
e <- rnorm(n); y <- numeric(n)      # fresh errors, reset series
for (t in 2:n) {
  y[t] <- phi * y[t-1] + theta * e[t-1] + e[t]   # ARMA(1,1) recursion
}
sim_arma11 <- tsibble(idx = 1:n, y = y, index = idx)
autoplot(sim_arma11, y) +
labs(title = "Simulated ARMA(1,1) with phi = 0.6, theta = 0.6, sigma^2 = 1",
x = "Time", y = "y")
phi1 <- -0.8
phi2 <- 0.3
sigma <- 1
e <- rnorm(n, sd = sigma); y <- numeric(n)   # fresh errors; y[1] = y[2] = 0
for (t in 3:n) {
  y[t] <- phi1 * y[t-1] + phi2 * y[t-2] + e[t]
}
sim_ar2 <- tsibble(idx = 1:n, y = y, index = idx)
autoplot(sim_ar2, y) +
labs(title = "Simulated AR(2) with phi1 = -0.8, phi2 = 0.3, sigma^2 = 1",
x = "Time", y = "y")
### g. Graph the latter two series and compare them.
sim_arma11 %>%
gg_tsdisplay(y, plot_type='partial') +
labs(title = "ARMA(1,1) model")
sim_ar2 %>%
gg_tsdisplay(y, plot_type='partial') +
labs(title = "ARMA(2) model")
The ARMA(1,1) series is stationary, as it fluctuates around a constant level and the ACF shows no slow decay. On the other hand, the AR(2) series oscillates with variance that increases over time, indicating non-stationarity. This instability makes the AR(2) model with these parameters unsuitable for forecasting without further transformation.
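The AR(2) behaviour can be confirmed algebraically: stationarity requires all roots of \(1 - \phi_1 z - \phi_2 z^2\) to lie outside the unit circle. A sketch of that check (added here, output not shown):
# Roots of 1 - phi1*z - phi2*z^2 for phi1 = -0.8, phi2 = 0.3
Mod(polyroot(c(1, 0.8, -0.3)))
# One root has modulus of about 0.93 < 1, so this AR(2) is non-stationary.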
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
fit <- aus_airpassengers %>%
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
# Check residuals
fit %>%
gg_tsresiduals() +
labs(title = "Residuals for ARIMA(0,2,1)")
## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
The residuals do look like white noise. The time plot of residuals fluctuates randomly around zero with no visible trend or seasonality. The ACF plot shows all autocorrelations are within the significance bound.
# Forecast next 10 periods
fc <- fit %>% forecast(h = 10)
autoplot(fc, aus_airpassengers) +
labs(title = "ARIMA Forecast for Australian Air Passengers",
y = "Passengers (millions)")
### b. Write the model in terms of the backshift operator.
With the estimated MA coefficient \(\hat{\theta}_1 = -0.8963\), the ARIMA(0,2,1) model is \[(1 - B)^2 y_t = (1 - 0.8963B)\varepsilon_t\]
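Expanding \((1-B)^2 = 1 - 2B + B^2\) gives the equivalent forecast-equation form: \[y_t = 2y_{t-1} - y_{t-2} + \varepsilon_t - 0.8963\,\varepsilon_{t-1}\]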
### c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
# Fit ARIMA(0,1,0) with drift
fit_rwdrift <- aus_airpassengers %>%
model(ARIMA(Passengers ~ pdq(0,1,0)))
report(fit_rwdrift)
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
## constant
## 1.4191
## s.e. 0.3014
##
## sigma^2 estimated as 4.271: log likelihood=-98.16
## AIC=200.31 AICc=200.59 BIC=203.97
# Forecast next 10 periods
fc_rwdrift <- fit_rwdrift %>% forecast(h = 10)
autoplot(fc_rwdrift, aus_airpassengers) +
labs(title = "ARIMA(0,1,0) with Drift Forecast",
y = "Passengers (millions)")
The ARIMA(0,2,1) model from part (a) had a lower AIC and BIC, indicating a better fit than the ARIMA(0,1,0) with drift model. Both models captured the upward trend of the data, but ARIMA(0,2,1) performed better due to its moving average term. ARIMA(0,1,0) with drift produced a simple linear forecast, whereas the ARIMA(0,2,1) model incorporated past errors, leading to a better forecast.
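Fitting both models in a single mable makes the information-criteria comparison direct (a sketch added here, output not shown):
aus_airpassengers %>%
  model(
    auto  = ARIMA(Passengers),               # ARIMA(0,2,1) from part (a)
    drift = ARIMA(Passengers ~ pdq(0,1,0))   # random walk with drift
  ) %>%
  glance() %>%
  select(.model, AIC, AICc, BIC)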
fit_212_drift <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))
report(fit_212_drift)
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## -0.5518 -0.7327 0.5895 1.0000 3.2216
## s.e. 0.1684 0.1224 0.0916 0.1008 0.7224
##
## sigma^2 estimated as 4.031: log likelihood=-96.23
## AIC=204.46 AICc=206.61 BIC=215.43
fc_212_drift <- forecast(fit_212_drift, h = 10)
autoplot(fc_212_drift, aus_airpassengers) +
labs(title = "ARIMA(2,1,2) with drift")
# Removing the constant
fit_212_nodrift <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
report(fit_212_nodrift)
## Series: Passengers
## Model: NULL model
## NULL model
fc_212_nodrift <- forecast(fit_212_nodrift, h = 10)
autoplot(fc_212_nodrift, aus_airpassengers) +
labs(title = "ARIMA(2,1,2) without drift")
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).
The ARIMA(2,1,2) model with drift had higher AIC and BIC than the models in parts (a) and (c), indicating a poorer fit. Its forecasts resemble those from part (c) but with added curvature from the AR and MA terms. When the constant is removed, estimation fails ("non-stationary AR part from CSS") and a null model is returned.
# ARIMA(0,2,1) with constant
fit_021_const <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
report(fit_021_const)
## Series: Passengers
## Model: ARIMA(0,2,1) w/ poly
##
## Coefficients:
## ma1 constant
## -1.0000 0.0571
## s.e. 0.0585 0.0213
##
## sigma^2 estimated as 3.855: log likelihood=-95.1
## AIC=196.21 AICc=196.79 BIC=201.63
# Forecast 10 periods ahead
fc_021_const <- forecast(fit_021_const, h = 10)
autoplot(fc_021_const, aus_airpassengers) +
labs(title = "ARIMA(0,2,1) with Constant")
When a constant is added to ARIMA(0,2,1), the model produces a quadratic trend and the forecast curves upward sharply. Instead of a steady linear growth, the forecast accelerates rapidly, which is inconsistent with the overall trend of the series.
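In backshift form, including the constant with two differences makes the issue explicit: the model \((1-B)^2 y_t = c + (1+\theta_1 B)\varepsilon_t\) implies that long-run forecasts follow a quadratic polynomial in time, which is why fable warns against this specification.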
For the United States GDP series (from global_economy):
us_gdp <- global_economy %>%
filter(Country == "United States")
autoplot(us_gdp, GDP)
us_gdp %>%
features(GDP, features = guerrero)
## # A tibble: 1 × 2
## Country lambda_guerrero
## <fct> <dbl>
## 1 United States 0.282
The US GDP shows a strong upward trend. The lambda value is 0.282; although not exactly zero, it is low enough that a log transformation (lambda = 0) is a reasonable and more interpretable choice.
# Log transformation of GDP
us_gdp <- global_economy %>%
filter(Country == "United States") %>%
mutate(log_GDP = box_cox(GDP, 0))
gg_tsdisplay(us_gdp, log_GDP, plot_type = "partial") +
labs(title = "Log transform US GDP")
# Fit ARIMA model
fit_us <- us_gdp %>%
model(ARIMA(log_GDP))
report(fit_us)
## Series: log_GDP
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.6353
## s.e. 0.1138
##
## sigma^2 estimated as 0.0004278: log likelihood=139.76
## AIC=-275.53 AICc=-275.3 BIC=-271.48
# Forecast 10 periods ahead
fc_us <- forecast(fit_us, h = 10)
autoplot(fc_us, us_gdp) +
labs(title = "US GDP Forecast (log transformed + ARIMA)", y = "Log(GDP)")
gg_tsresiduals(fit_us)
After applying a log transformation, the time plot appears smoother, but
the ACF shows slow decay, indicating that differencing is required. The
ARIMA() function selected an ARIMA(0,2,1) model, which is reasonable
given the strong trend in the series. Two differences were needed to
achieve stationarity and the MA(1) term captures the short-run
correlation.
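A unit-root check on the transformed series (a sketch added here, output not shown) indicates how many first differences the KPSS procedure suggests:
us_gdp %>% features(log_GDP, unitroot_ndiffs)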
# ARIMA(1,2,0)
fit120 <- us_gdp %>%
model(ARIMA(log_GDP ~ pdq(1,2,0)))
report(fit120)
## Series: log_GDP
## Model: ARIMA(1,2,0)
##
## Coefficients:
## ar1
## -0.3542
## s.e. 0.1268
##
## sigma^2 estimated as 0.0004842: log likelihood=136.28
## AIC=-268.55 AICc=-268.33 BIC=-264.5
# ARIMA(0,2,2)
fit022 <- us_gdp %>%
model(ARIMA(log_GDP ~ pdq(0,2,2)))
report(fit022)
## Series: log_GDP
## Model: ARIMA(0,2,2)
##
## Coefficients:
## ma1 ma2
## -0.5482 -0.1165
## s.e. 0.1403 0.1332
##
## sigma^2 estimated as 0.0004302: log likelihood=140.15
## AIC=-274.3 AICc=-273.83 BIC=-268.22
# ARIMA(1,2,1)
fit121 <- us_gdp %>%
model(ARIMA(log_GDP ~ pdq(1,2,1)))
report(fit121)
## Series: log_GDP
## Model: ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## 0.1550 -0.7167
## s.e. 0.1922 0.1297
##
## sigma^2 estimated as 0.000431: log likelihood=140.09
## AIC=-274.18 AICc=-273.72 BIC=-268.11
Among the other plausible models, the ARIMA(0,2,1) model from part (b) had the lowest AIC, AICc, and BIC, making it the best-fitting option; the comparison sketched below summarizes this.
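Collecting the candidates in one mable gives a compact comparison table (a sketch added here, output not shown):
us_gdp %>%
  model(
    `ARIMA(0,2,1)` = ARIMA(log_GDP ~ pdq(0,2,1)),
    `ARIMA(1,2,0)` = ARIMA(log_GDP ~ pdq(1,2,0)),
    `ARIMA(0,2,2)` = ARIMA(log_GDP ~ pdq(0,2,2)),
    `ARIMA(1,2,1)` = ARIMA(log_GDP ~ pdq(1,2,1))
  ) %>%
  glance() %>%
  arrange(AICc) %>%
  select(.model, AIC, AICc, BIC)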
fit_best <- us_gdp %>%
model(ARIMA(log_GDP ~ pdq(0,2,1)))
report(fit_best)
## Series: log_GDP
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.6353
## s.e. 0.1138
##
## sigma^2 estimated as 0.0004278: log likelihood=139.76
## AIC=-275.53 AICc=-275.3 BIC=-271.48
# Residual diagnostics
fit_best %>% gg_tsresiduals()
The ARIMA(0,2,1) model was selected as the best because it has the lowest AIC. This residual diagnostic plot confirms that the ARIMA(0,2,1) model is a good fit. The residuals fluctuate randomly around zero without any visible trend. The ACF of the residuals shows no significant autocorrelation, which means the model captured the dependence in the data.
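A Ljung-Box test formalizes the white-noise check on the residuals (a sketch added here, output not shown); a large p-value would support the model:
fit_best %>%
  augment() %>%
  features(.innov, ljung_box, lag = 10, dof = 1)   # dof = 1 for the MA coefficient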
fit_best %>%
forecast(h = 10) %>%
autoplot(us_gdp) +
labs(title = "US GDP Forecast from ARIMA(0,2,1)",
y = "Log US GDP")
The forecasts from the ARIMA(0,2,1) model show a steadily increasing upward trend, in line with the long-term growth in US GDP. Thus, the forecasts look reasonable for the series.
# Fit ETS model
fit_ets <- us_gdp %>%
model(ETS(GDP))
# Forecast 10 steps ahead
fc_ets <- fit_ets %>% forecast(h = 10)
# Compare with ARIMA forecasts
fit_arima <- us_gdp %>%
model(ARIMA(log_GDP ~ pdq(0,2,1)))
fc_arima <- fit_arima %>% forecast(h = 10)
# Plot
autoplot(fc_ets, us_gdp) +
labs(title = "ETS Forecasts of US GDP", y = "GDP")
autoplot(fc_arima, us_gdp) +
labs(title = "ARIMA(0,2,1) Forecasts of log-transformed GDP", y = "Log GDP")
The ETS model forecasts exponential growth with a wide prediction interval, whereas the ARIMA(0,2,1) model forecasts smoother, steadier growth with narrower intervals. Since the ARIMA model used the log-transformed GDP, it produced a more linear-looking forecast than the ETS model.