library(fpp3)

Exercise 9.1

1. Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Answer:

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

Figure 9.32: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

Yes. All three plots represent white noise; the differences among them are due to sample size.

As the sample size increases, the sampling variation in the estimated autocorrelations decreases, so the estimates settle closer to zero, the true autocorrelation of white noise.

b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values are different in each plot because they are based on the sample size. The blue dashed lines represent the significance limits, calculated using the formula \(\pm \frac{1.96}{\sqrt{T}}\), where \(T\) is the sample size. As the sample size increases (from 36 to 360 to 1000), the denominator becomes larger, causing the critical values to move closer to zero. This narrowing of the limits reflects greater precision in estimating the true autocorrelations, which are expected to be near zero for white noise.
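These limits can be computed directly (a small supplementary calculation using the formula above):

# Significance limits +/- 1.96 / sqrt(T) for each sample size
T_vals <- c(36, 360, 1000)
data.frame(T = T_vals, limit = round(1.96 / sqrt(T_vals), 3))
# approx +/- 0.327, +/- 0.103 and +/- 0.062 respectively

The widest limits belong to the smallest sample, which is why the 36-observation ACF needs much larger spikes before any autocorrelation looks significant.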


Exercise 9.2

A classic example of a non-stationary series is stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

Answer:

data("gafa_stock")
amazon_stock <- gafa_stock |>
  filter(Symbol == "AMZN")

A non-stationary series often shows trends, increasing variance, or seasonality. Visualizing the data will help identify these characteristics.

amazon_stock |>
  gg_tsdisplay(Close, lag_max = 30) +
  labs(title = "Amazon Stock - Time Series, ACF & PACF")

The Amazon stock time series plot shows a clear upward trend, indicating non-stationarity. The ACF plot confirms this with autocorrelations that persist across many lags, which is a common feature of non-stationary data. The PACF plot shows a strong spike at lag 1, further suggesting the need for first-order differencing.

To address this, we apply first-order differencing to remove the trend and stabilize the series.

amazon_stock_diff <- amazon_stock |>
  mutate(Diff_Close = difference(Close))

# Plot differenced series
amazon_stock_diff |>
  autoplot(Diff_Close) +
  labs(title = "Differenced Amazon Stock Prices (d = 1)",
       y = "Differenced Closing Price",
       x = "Date")

The differenced Amazon stock price plot shows that the upward trend has been removed; the data now fluctuates around zero with no obvious trend, a sign that the series is stationary.

Next, we confirm stationarity by examining the ACF and PACF of the differenced series.

amazon_stock_diff |>
  gg_tsdisplay(Diff_Close, lag_max = 30) +
  labs(title = "Amazon Stock (Differenced) - Time Series, ACF & PACF")

The ACF plot now shows quick cut-off behavior, confirming that the series is stationary. The PACF no longer shows a strong spike at lag 1, confirming that first-order differencing (d = 1) was appropriate.

The Amazon stock series was identified as non-stationary from its upward trend, the slow decay in the ACF, and the PACF spike at lag 1. After first-order differencing, the series became stationary, as confirmed by the trend-free time plot, the quickly cutting-off ACF, and the stabilized PACF.
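As an extra check beyond the plots (a supplementary step, not required by the exercise), the KPSS-based unitroot_ndiffs feature reports how many first differences are needed:

# Number of first differences suggested by the KPSS unit-root test
amazon_stock |>
  features(Close, unitroot_ndiffs)
# a result of 1 corroborates the first-order differencing applied above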


Exercise 9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

To obtain stationary data, we will analyze the Turkish GDP series from the global_economy dataset.

a. Find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data for Turkish GDP from global_economy.

Let’s explore the Turkish GDP data:

global_economy |>
  glimpse()
## Rows: 15,150
## Columns: 9
## Key: Country [263]
## $ Country    <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",…
## $ Code       <fct> AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG,…
## $ Year       <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969,…
## $ GDP        <dbl> 537777811, 548888896, 546666678, 751111191, 800000044, 1006…
## $ Growth     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CPI        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Imports    <dbl> 7.024793, 8.097166, 9.349593, 16.863910, 18.055555, 21.4128…
## $ Exports    <dbl> 4.132233, 4.453443, 4.878051, 9.171601, 8.888893, 11.258279…
## $ Population <dbl> 8996351, 9166764, 9345868, 9533954, 9731361, 9938414, 10152…
turkish_gdp <- global_economy |>
  filter(Country == "Turkey")

turkish_gdp |>
  autoplot(GDP) +
  labs(title = "Turkish GDP", y = "GDP (in USD billions)")

The series shows a clear upward trend with increasing variance, indicating non-stationarity and the need for a Box-Cox transformation to stabilize variance.

Identify non-stationarity with the ACF and PACF.

turkish_gdp |>
  gg_tsdisplay(GDP, lag_max = 30) +
  labs(title = "Turkish GDP - Time Series, ACF & PACF")

The slow decay in the ACF confirms that the series is non-stationary. The strong autocorrelation across multiple lags suggests the need for differencing.

Identify an Appropriate Box-Cox Transformation: A Box-Cox transformation stabilizes variance and improves stationarity. The features() function with guerrero() helps determine the optimal lambda (the Box-Cox parameter).

turkish_gdp |>
  features(GDP, features = guerrero)
## # A tibble: 1 × 2
##   Country lambda_guerrero
##   <fct>             <dbl>
## 1 Turkey            0.157

The optimal Box-Cox parameter was identified as \(\lambda = 0.157.\)

Apply the Box-Cox Transformation and Differencing:

turkish_gdp <- turkish_gdp |>
  mutate(GDP_transformed = box_cox(GDP, lambda = 0.157)) |> # Box-Cox Transformation 
  mutate(GDP_diff = difference(GDP_transformed)) # Differencing

turkish_gdp |>
  autoplot(GDP_diff) +
  labs(title = "Differenced Turkish GDP (Box-Cox Transformed)", 
       y = "Differenced GDP")

The differenced Turkish GDP series (after the Box-Cox transformation) now fluctuates around zero with no clear upward or downward trend, a key characteristic of a stationary series. Some variability remains, but this is common in economic data and does not by itself indicate non-stationarity. The Box-Cox transformation and first-order differencing were therefore appropriate steps.

turkish_gdp |>
  gg_tsdisplay(GDP_diff, lag_max = 30) +
  labs(title = "Differenced Turkish GDP - ACF & PACF")

Here we can see that the combination of the Box-Cox transformation \((\lambda = 0.157)\) and first-order differencing (d = 1) successfully transformed the series into a stationary one.
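The same conclusion can be cross-checked numerically (a supplementary step): applying unitroot_ndiffs to the transformed series reports how many first differences the KPSS test recommends.

# Differences needed for the Box-Cox-transformed series
turkish_gdp |>
  features(GDP_transformed, unitroot_ndiffs)
# a result of 1 corroborates the single difference applied above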

b. Find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data for accommodation takings in the state of Tasmania from ‘aus_accommodation’.

aus_accommodation |>
  filter(State == "Tasmania") |>
  gg_tsdisplay(Takings, lag_max = 30) +
  labs(title = "Non-transformed Tasmania Accomodation Takings")

The Tasmania accommodation takings series shows a clear upward trend with seasonal patterns, indicating non-stationarity. The ACF’s slow decay confirms the presence of a trend, suggesting the need for differencing and possibly a Box-Cox transformation to stabilize variance.

lambda <- aus_accommodation |>
  filter(State == "Tasmania") |>
  features(Takings, features = guerrero) |>
  pull(lambda_guerrero)  # Store lambda in a variable

aus_accommodation |>
  filter(State == "Tasmania") |>
  features(box_cox(Takings, lambda), unitroot_nsdiffs)
## # A tibble: 1 × 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1
aus_accommodation |>
  filter(State == "Tasmania") |>
  gg_tsdisplay(difference(box_cox(Takings, lambda), 4), # seasonal difference, lag = 4 (quarterly data)
               lag_max = 30) +
  labs(title = "Tasmania Accommodation Takings (Transformed Data)")

The transformed Tasmania accommodation takings series no longer shows any clear upward trend, with fluctuations appearing stable around a constant mean. The ACF plot shows a sharp drop-off, indicating that the data is now stationary and suitable for modeling.
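To make sure no further first difference is needed after the seasonal difference (a supplementary check), unitroot_ndiffs can be applied to the seasonally differenced series:

# First differences still needed after one seasonal difference (lag = 4)
aus_accommodation |>
  filter(State == "Tasmania") |>
  features(difference(box_cox(Takings, lambda), 4), unitroot_ndiffs)
# a result of 0 would confirm that the seasonal difference alone suffices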

c. Find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data for monthly sales from ‘souvenirs’.

souvenirs |>
  gg_tsdisplay(Sales, lag_max = 30) +
  labs(title = "Monthly Souvenir Sales - ACF & PACF")

The above plot of the souvenir sales data shows a clear upward trend with strong seasonal peaks occurring annually. The ACF plot confirms non-stationarity, as the slow decay in the ACF suggests a persistent trend, while the noticeable spikes at lag 12 indicate seasonality. These characteristics indicate the need for transformation and differencing.

lambda <- souvenirs |>
  features(Sales, features = guerrero) |>
  pull(lambda_guerrero)
lambda
## [1] 0.002118221
lambda <- 0.0021  # rounded; since lambda is near 0, this is close to a log transformation
souvenirs |>
  gg_tsdisplay(box_cox(Sales, lambda), lag_max = 30) +
  labs(title = "Transformed Souvenir Sales")

A Box-Cox transformation with a calculated lambda value of 0.0021 is applied to stabilize the variance. The transformed series shows reduced variability, but the upward trend and seasonal pattern are still present, confirming that further adjustments are needed.

souvenirs |>
  mutate(Sales_transformed = box_cox(Sales, lambda)) |>
  mutate(Sales_diff = difference(Sales_transformed)) |>
  mutate(Sales_seasonal_diff = difference(Sales_diff, lag = 12)) |>  
  drop_na() |>  
  autoplot(Sales_seasonal_diff) +
  labs(title = "Differenced Souvenir Sales")

To address the remaining trend and seasonality, first-order differencing is applied to remove the trend, followed by seasonal differencing with a lag of 12 to account for the seasonal pattern. The resulting plot shows the series fluctuating around zero, suggesting improved stationarity.

souvenirs |>
  mutate(Sales_transformed = box_cox(Sales, lambda)) |>
  mutate(Sales_diff = difference(Sales_transformed)) |>
  mutate(Sales_seasonal_diff = difference(Sales_diff, lag = 12)) |>  
  drop_na() |>  
  gg_tsdisplay(Sales_seasonal_diff, lag_max = 30) +
  labs(title = "Souvenir Sales - ACF & PACF (Final)")

The ACF and PACF plots confirm stationarity. The ACF now shows a rapid cutoff after a few lags, and the PACF exhibits mostly insignificant spikes, both of which are common indicators of stationary data. This confirms that the Box-Cox transformation combined with differencing successfully stabilized the series.
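The differencing orders can also be read off directly from the unit-root features (a supplementary check; lag 12 corresponds to the monthly seasonality):

# Seasonal and first differences suggested for the transformed series
souvenirs |>
  features(box_cox(Sales, lambda), unitroot_nsdiffs)
souvenirs |>
  features(difference(box_cox(Sales, lambda), 12), unitroot_ndiffs)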


Exercise 9.5

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

Let’s first plot the data from Exercise 7 in Section 2.10:

set.seed(678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# plot
myseries |>
  gg_tsdisplay(Turnover, lag_max = 30) +
  labs(title = "Australian Retail Series - ACF & PACF (Original Data)")

The Australian retail data shows a clear upward trend and seasonal patterns, confirming non-stationarity. The ACF plot’s slow decay and seasonal spikes indicate the need for a Box-Cox transformation and differencing to achieve stationarity.

lambda <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)
lambda
## [1] 0.4449179
# Apply Box-Cox Transformation
myseries |>
  gg_tsdisplay(box_cox(Turnover, lambda), lag_max = 30) +
  labs(title = "Transformed Australian Retail Series")

The Box-Cox transformation with \(\lambda = 0.4449\) was applied to stabilize the variance, but the series still showed a trend and seasonality. The ACF plot confirmed the need for further differencing to achieve stationarity.

myseries |>
  mutate(Turnover_transformed = box_cox(Turnover, lambda)) |>
  mutate(Turnover_diff = difference(Turnover_transformed)) |>
  mutate(Turnover_seasonal_diff = difference(Turnover_diff, lag = 12)) |>  
  drop_na() |>  
  gg_tsdisplay(Turnover_seasonal_diff, lag_max = 30) +
  labs(title = "Differenced & Transformed Australian Retail Series")

The final plot shows the transformed and differenced Australian retail series. A Box-Cox transformation with \(\lambda = 0.4449\) was applied to stabilize the variance. First-order differencing removed the trend, and seasonal differencing (lag = 12) eliminated the seasonal pattern. The series now fluctuates around zero, indicating stationarity. The ACF plot confirms this with a rapid cutoff and small spikes, showing that the data has been successfully stabilized.
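The same orders can be corroborated with the unit-root features (a supplementary check):

# One seasonal difference, then check for remaining first differences
myseries |>
  features(box_cox(Turnover, lambda), unitroot_nsdiffs)
myseries |>
  features(difference(box_cox(Turnover, lambda), 12), unitroot_ndiffs)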


Exercise 9.6

Simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\).

y <- numeric(100)   # y[1] = 0, as specified
e <- rnorm(100)     # white noise errors, sigma^2 = 1

for(i in 2:100) {
  y[i] <- 0.6 * y[i-1] + e[i]  # AR(1) with phi_1 = 0.6
}

# Convert the series to a tsibble 
library(tsibble)
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

b. Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

library(ggplot2)
library(tsibble)
library(latex2exp)
library(patchwork)
# Plot 
sim |>
  autoplot(y) +
  labs(title = TeX("Simulated AR(1) Series with $\\phi_1 = 0.6$"),
       y = "Value",
       x = "Time")

Conditions for Stationarity in an AR(1) Model:

For an AR(1) model \[ y_t = \phi_1 y_{t-1} + e_t, \] the process is stationary when \(-1 < \phi_1 < 1\). At \(\phi_1 = 1\) it becomes a random walk, and for \(|\phi_1| > 1\) it explodes, which is why the simulations below use values strictly inside that range.

Exploring Different \(\phi_1\) Values:

simulate_ar1 <- function(phi, sigma = 1) {
  y <- numeric(100)
  e <- rnorm(100, mean = 0, sd = sigma)
  for (i in 2:100) {
    y[i] <- phi * y[i - 1] + e[i]
  }
  tsibble(idx = seq_len(100), y = y, index = idx)
}

phi_values <- c(0.3, 0.6, 0.9, -0.6, -0.9) # Different phi values for comparison

# Plot 
plots <- lapply(phi_values, function(phi) {
  simulate_ar1(phi) |>
    autoplot(y) +
    labs(title = TeX(paste0("AR(1) Process with $\\phi_1 = ", phi, "$")),
         y = "Value",
         x = "Time")
})

wrap_plots(plots, ncol = 2)

As \(\phi_1\) gets closer to \(1\), the series becomes smoother and takes longer to return to the mean. When \(\phi_1\) is negative, the series alternates more sharply. Larger absolute values of \(\phi_1\) produce stronger persistence (positive \(\phi_1\)) or stronger alternation (negative \(\phi_1\)), and in both cases wider swings away from the mean.
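One way to quantify the wider swings (a small supplementary calculation, assuming \(\sigma^2 = 1\)): the stationary variance of an AR(1) process is \(\sigma^2 / (1 - \phi_1^2)\), which grows quickly as \(|\phi_1|\) approaches 1.

# Stationary variance of an AR(1) process with sigma^2 = 1
phi <- c(0.3, 0.6, 0.9)
data.frame(phi, variance = 1 / (1 - phi^2))
# approx 1.10, 1.56 and 5.26 -- the variance balloons as phi nears 1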

c. Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).

y <- numeric(100)
e <- rnorm(100)
y[1] <- e[1]  # initial value (no earlier error to carry over)
for(i in 2:100) {
  y[i] <- e[i] + 0.6 * e[i-1]  # MA(1) with theta_1 = 0.6
}

sim_ma1 <- tsibble(idx = seq_len(100), y = y, index = idx)

d. Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?

# Function to simulate MA(1) process
simulate_ma1 <- function(theta, n = 100) {
  e <- rnorm(n)
  y <- numeric(n)
  y[1] <- e[1]  # Initial value
  for(i in 2:n) {
    y[i] <- e[i] + theta * e[i-1]
  }
  return(tsibble(idx = seq_len(n), y = y, index = idx))
}

theta_values <- c(0.3, 0.6, 0.9, -0.6, -0.9) 
plots <- lapply(theta_values, function(theta) {
  simulate_ma1(theta) |>
    ggplot(aes(x = idx, y = y)) +
    geom_line() +
    labs(title = latex2exp::TeX(paste0("MA(1) Process with $\\theta_1 = ", theta, "$")),
     x = "Time", 
     y = "Value")
})
wrap_plots(plots, ncol = 2)

As \(|\theta_1|\) increases, the lag-1 dependence strengthens. Positive values of \(\theta_1\) make consecutive observations move together, so the series looks smoother; negative values introduce alternating behavior, making the series appear more jagged.
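This pattern matches the theoretical lag-1 autocorrelation of an MA(1) process, \(\rho_1 = \theta_1 / (1 + \theta_1^2)\) (a supplementary calculation, not part of the original answer):

# Theoretical lag-1 autocorrelation for each simulated theta
theta <- c(0.3, 0.6, 0.9, -0.6, -0.9)
data.frame(theta, rho1 = round(theta / (1 + theta^2), 3))
# positive theta gives positive rho1 (smoother series); negative theta
# gives negative rho1 (alternating, jagged series)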

e. Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6\), \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).

# Parameters
phi <- 0.6
theta <- 0.6
n <- 100  # observations

# Simulate ARMA(1,1) data
y <- numeric(n)
e <- rnorm(n)

for (i in 2:n) {
  y[i] <- phi * y[i - 1] + e[i] + theta * e[i - 1]
}
arma_data <- tsibble(idx = seq_len(n), y = y, index = idx)

f. Generate data from an AR(2) model with \(\phi_1 = -0.8\), \(\phi_2 = 0.3\) and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)

# AR(2) model parameters
phi1 <- -0.8
phi2 <- 0.3

# Generate data
y <- numeric(100)
e <- rnorm(100)
y[1:2] <- c(0, 0)  # Starting values

for (i in 3:100) {
  y[i] <- phi1 * y[i-1] + phi2 * y[i-2] + e[i]
}
ar2_data <- tsibble(idx = seq_len(100), y = y, index = idx)

g. Graph the latter two series and compare them.

# Plot both series
p1 <- ggplot(arma_data, aes(x = idx, y = y)) +
  geom_line() +
  labs(title = expression("ARMA(1,1) Process with " ~ phi[1] == 0.6 ~ "," ~ theta[1] == 0.6),
       x = "Time", y = "Value")

p2 <- ggplot(ar2_data, aes(x = idx, y = y)) +
  geom_line() +
  labs(title = expression("AR(2) Process with " ~ phi[1] == -0.8 ~ "," ~ phi[2] == 0.3),
       x = "Time", y = "Value")

# Combine plots
p1 + p2

The ARMA(1,1) process shows fluctuating values that remain relatively stable around zero, exhibiting random behavior with moderate variation. In contrast, the AR(2) process displays an explosive pattern, where the values grow rapidly and oscillate with increasing amplitude. This difference highlights the non-stationary nature of the AR(2) process caused by the chosen parameters, whereas the ARMA(1,1) process appears more stable and stationary.
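The explosive AR(2) behavior can be verified directly (a supplementary check): stationarity requires every root of the characteristic polynomial \(1 - \phi_1 z - \phi_2 z^2\) to lie outside the unit circle.

# Roots of 1 - phi1*z - phi2*z^2 with phi1 = -0.8, phi2 = 0.3
Mod(polyroot(c(1, 0.8, -0.3)))
# approx 0.93 and 3.59 -- one root is inside the unit circle, confirming
# that this parameter choice gives a non-stationary process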


Exercise 9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

a. Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

aus_airpassengers |>
  autoplot(Passengers) +
  labs(title = "Australian Air Passengers (1970-2011)",
       y = "Passengers (Millions)")

fit <- aus_airpassengers |>
  model(ARIMA(Passengers))
report(fit)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65
fit |>
  gg_tsresiduals() +
  labs(title = "Residual Diagnostics for ARIMA(0,2,1)")

fit |>
  forecast(h = 10) |>
  autoplot(aus_airpassengers) +
  labs(title = "ARIMA(0,2,1) Forecast for Australian Air Passengers",
       y = "Passengers (Millions)",
       x = "Year")

The chosen ARIMA(0,2,1) model effectively captures the upward trend in Australian air passengers. The residuals resemble white noise, and the 10-period forecast suggests continued growth with increasing uncertainty over time.
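The visual residual check can be supplemented with a Ljung-Box test (an added step; the choice of lag = 10 follows the convention used later in Exercise 9.8):

# Formal test for remaining autocorrelation in the residuals
augment(fit) |>
  features(.innov, ljung_box, lag = 10)
# a large p-value is consistent with white-noise residuals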

b. Write the model in terms of the backshift operator.

The selected ARIMA(0,2,1) model can be written in terms of the backshift operator as follows:

\[ (1 - B)^2 y_t = (1 + \theta_1 B) e_t \]

where \(B\) is the backshift operator. With the fitted coefficient \(\theta_1 = -0.8963\), this is \((1 - B)^2 y_t = (1 - 0.8963\, B)\, e_t\).

c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

ARIMA(0,1,0) with Drift:

fit_drift <- aus_airpassengers |>
  model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))
report(fit_drift)
## Series: Passengers 
## Model: ARIMA(0,1,0) w/ drift 
## 
## Coefficients:
##       constant
##         1.4191
## s.e.    0.3014
## 
## sigma^2 estimated as 4.271:  log likelihood=-98.16
## AIC=200.31   AICc=200.59   BIC=203.97

Forecast for the Next 10 Periods

fc_drift <- fit_drift |>
  forecast(h = 10)

fc_drift |>
  autoplot(aus_airpassengers) +
  labs(title = "ARIMA(0,1,0) with Drift Forecast for Australian Air Passengers")

Comparison with ARIMA(0,2,1)

fc_comparison <- bind_rows(
  forecast(fit, h = 10) |> mutate(Model = "ARIMA(0,2,1)"),
  forecast(fit_drift, h = 10) |> mutate(Model = "ARIMA(0,1,0) with Drift")
)

fc_comparison |>
  autoplot(aus_airpassengers) +
  labs(title = "Forecast Comparison: ARIMA(0,2,1) vs ARIMA(0,1,0) with Drift")

The ARIMA(0,1,0) with drift model produces a linear forecast that extends the trend seen in the data, while the ARIMA(0,2,1) model captures more of the data’s curvature and projects a steeper increase. The comparison plot shows that ARIMA(0,2,1) forecasts higher passenger numbers with wider prediction intervals, while ARIMA(0,1,0) forecasts are smoother and more conservative.

d. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

# Fit ARIMA(2,1,2) with drift
fit_arima212_drift <- aus_airpassengers |>
  model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))

# Forecast for the next 10 periods
fc_arima212_drift <- fit_arima212_drift |>
  forecast(h = 10)

# Plot
fc_arima212_drift |>
  autoplot(aus_airpassengers) +
  labs(title = "ARIMA(2,1,2) with Drift")

The forecast plot for the ARIMA(2,1,2) with drift model shows a reasonable extension of the trend, maintaining a similar pattern to the previous ARIMA models. The uncertainty bands widen gradually, as expected.

fc_comparison <- bind_rows(
  forecast(fit, h = 10) |> mutate(Model = "ARIMA(0,2,1)"),
  forecast(fit_drift, h = 10) |> mutate(Model = "ARIMA(0,1,0) with Drift"),
  forecast(fit_arima212_drift, h = 10) |> mutate(Model = "ARIMA(2,1,2) with Drift")
)

fc_comparison |>
  autoplot(aus_airpassengers) +
  labs(title = "Forecast Comparison: ARIMA(0,2,1) vs ARIMA(0,1,0) vs ARIMA(2,1,2) with Drift")

The side-by-side comparison effectively visualizes how each model behaves. The ARIMA(0,2,1) forecast (red) shows a slightly sharper upward trend, while the ARIMA(0,1,0) with drift (green) has a moderate upward slope. The ARIMA(2,1,2) with drift (blue) appears to strike a balance between the two models, producing a reasonable and smooth forecast.

fit_arima212_noconstant <- aus_airpassengers |>
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
report(fit_arima212_noconstant)
## Series: Passengers 
## Model: NULL model 
## NULL model

The error message indicates that the non-stationary AR part caused the model to fail when the constant was removed.

The ARIMA(2,1,2) with drift produced a reasonable forecast, while the model without a constant failed to fit the data. This suggests that the drift term plays a significant role in modeling the trend effectively.

e. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

fit_arima021_constant <- aus_airpassengers |>
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.
report(fit_arima021_constant)
## Series: Passengers 
## Model: ARIMA(0,2,1) w/ poly 
## 
## Coefficients:
##           ma1  constant
##       -1.0000    0.0571
## s.e.   0.0585    0.0213
## 
## sigma^2 estimated as 3.855:  log likelihood=-95.1
## AIC=196.21   AICc=196.79   BIC=201.63
fc_arima021_constant <- fit_arima021_constant |>
  forecast(h = 10)

fc_arima021_constant |>
  autoplot(aus_airpassengers) +
  labs(title = "ARIMA(0,2,1) with Constant Forecast")

The ARIMA(0,2,1) model with a constant shows a steep upward trend in the forecast, resembling quadratic growth. This happens because a constant combined with two orders of differencing is effectively integrated twice, inducing a quadratic polynomial trend. The warning suggests this is generally unrealistic; removing the constant or reducing the number of differences may improve the forecast.


Exercise 9.8

For the United States GDP series (from global_economy):

us_gdp  <- global_economy |>
  filter(Code == 'USA') |>
  select(Country, GDP)
  
us_gdp |>
  autoplot(GDP)

a. If necessary, find a suitable Box-Cox transformation for the data;

lambda_us <- us_gdp |>
  features(GDP, features=guerrero) |>
  pull(lambda_guerrero)
lambda_us
## [1] 0.2819443
us_gdp |> 
  autoplot(box_cox(GDP, lambda_us))

The Box-Cox transformation has helped stabilize the variance. This is crucial for fitting an effective ARIMA model.

b. Fit a suitable ARIMA model to the transformed data using ARIMA();

fit_us <- us_gdp |>
    model(ARIMA(box_cox(GDP, lambda_us)))
report(fit_us)
## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda_us) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

The ARIMA() function suggests an ARIMA(1,1,0) model with drift.

c. Try some other plausible models by experimenting with the orders chosen;

fit_others <- us_gdp |>
  model(
    "ARIMA(2,1,2)" = ARIMA(box_cox(GDP, lambda_us) ~ 1 + pdq(2,1,2)),
    "ARIMA(0,1,0)" = ARIMA(box_cox(GDP, lambda_us) ~ 1 + pdq(0,1,0)),
    "ARIMA(1,1,0)" = ARIMA(box_cox(GDP, lambda_us) ~ 1 + pdq(1,1,0))       
      )

glance(fit_others) |>
  arrange(AICc) |>
  select(.model:BIC)
## # A tibble: 3 × 6
##   .model       sigma2 log_lik   AIC  AICc   BIC
##   <chr>         <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA(1,1,0)  5479.   -325.  657.  657.  663.
## 2 ARIMA(2,1,2)  5734.   -325.  662.  664.  674.
## 3 ARIMA(0,1,0)  6774.   -332.  668.  668.  672.

From the table above we can see that the model with the lowest AICc is ARIMA(1,1,0) with drift, matching the model previously selected automatically by ARIMA().

d. Choose what you think is the best model and check the residual diagnostics;

fit_us |>
  gg_tsresiduals()

# Ljung-Box Test for residual autocorrelation
augment(fit_us) |>
  features(.innov, ljung_box, lag = 10)
## # A tibble: 1 × 4
##   Country       .model                         lb_stat lb_pvalue
##   <fct>         <chr>                            <dbl>     <dbl>
## 1 United States ARIMA(box_cox(GDP, lambda_us))    3.81     0.955

The residuals show no significant autocorrelation (p = 0.955), indicating they resemble random noise. This suggests the ARIMA(1,1,0) model is a good fit for the data.

e. Produce forecasts of your fitted model. Do the forecasts look reasonable?

fit_us |>
  forecast(h=10) |>
  autoplot(us_gdp) +
  labs(title = "ARIMA(1,1,0) forecast", y = "GDP" )

Yes, the forecasts look reasonable, as they follow the trend in the data.

f. Compare the results with what you would obtain using ETS() (with no transformation).

fit_ets <- us_gdp |>
  model(ETS(GDP))

fit_ets |>
  forecast(h=10) |>
  autoplot(us_gdp) +
  labs(title = "ETS Forecast", y = "GDP" )

The ARIMA(1,1,0) and ETS models both follow the upward trend of GDP. However, the ETS model shows wider prediction intervals, indicating greater uncertainty in its forecasts. This suggests the ARIMA model provides tighter, more stable predictions for this dataset.
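For a more rigorous comparison than eyeballing the intervals (a sketch going beyond the original answer; the initial window of 10 observations is an arbitrary assumption), time-series cross-validation can score one-step forecasts from both models:

# Stretching-window CV: refit both models on growing samples
us_gdp |>
  stretch_tsibble(.init = 10) |>
  model(
    ARIMA = ARIMA(box_cox(GDP, lambda_us)),
    ETS   = ETS(GDP)
  ) |>
  forecast(h = 1) |>
  accuracy(us_gdp)
# the model with the lower RMSE/MAE is the better one-step forecaster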