Exercise 9.1: Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

  a. Explain the differences among these figures. Do they all indicate that the data are white noise?

Each plot in Figure 9.32 represents the ACF for a white noise series but differs in sample size (36, 360, and 1,000 observations). The key distinction among them is the range of the confidence intervals (dashed blue lines) and the magnitude of the autocorrelation values. With fewer observations (leftmost plot), the confidence intervals are wider, and the autocorrelations appear more variable due to the small sample size. As the number of observations increases (middle and right plots), the confidence intervals narrow, and the autocorrelations cluster closer to zero.

Despite these differences, all three figures indicate that the data are white noise, as there are no significant autocorrelations outside the confidence bounds. The fluctuations seen in the smaller sample sizes are expected due to random variation, but they do not imply any underlying structure in the data.

  b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values (dashed blue lines) are at different distances from zero because they depend on the sample size. The approximate 95% bounds are ±1.96/√N, so as N increases the bounds shrink toward zero, reflecting the greater precision of the autocorrelation estimates.

Even though each plot represents white noise, the autocorrelations in smaller samples appear more pronounced due to random chance. With fewer data points, random fluctuations have more room to appear significant. As the sample size grows, these variations average out and the sample autocorrelations settle closer to zero. This is why the rightmost plot (N = 1,000) appears the most stable, with values nearly indistinguishable from zero across all lags.
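To make this concrete, the bounds for the three sample sizes can be computed directly; a minimal base-R sketch (sample sizes taken from the exercise):

# Approximate 95% critical values for the ACF of white noise
n <- c(36, 360, 1000)
round(1.96 / sqrt(n), 3)
# gives roughly 0.327, 0.103, 0.062, matching the narrowing bands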

Exercise 9.2: A classic example of a non-stationary series is stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

# Load required packages
library(fpp3)
# Filter Amazon stock data
amzn_stock <- gafa_stock %>%
  filter(Symbol == "AMZN")

# Plot the time series, ACF, and PACF
amzn_stock %>%
  gg_tsdisplay(Close, plot_type = 'partial') +
  labs(title = "Amazon Closing Stock Price: Time Series, ACF, and PACF")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

### Analysis of Non-Stationarity in Amazon Stock Prices

Stock prices, such as Amazon’s, are a classic example of a non-stationary time series. The time plot of the closing prices exhibits a clear upward trend over time, indicating that the mean is not constant. This violates one of the key conditions of stationarity, which requires a constant mean over time.

The ACF (Autocorrelation Function) plot provides further evidence of non-stationarity. In a stationary time series, the ACF should decay quickly toward zero as lag increases. However, in this case, the ACF decreases slowly and remains significantly positive for a large number of lags. This gradual decay is a strong indicator of the presence of a trend in the data.

The PACF (Partial Autocorrelation Function) plot supports the same conclusion. The partial autocorrelation at lag 1 is very large (close to one), while subsequent lags drop sharply to near zero. A lag-1 spike of this size points to a near unit root, reinforcing that the series should be differenced to achieve stationarity.
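As a supplementary check (a sketch, not part of the output above), the KPSS-based unit root features from feasts should confirm that differencing is required:

# Unit root checks on the raw closing price: unitroot_kpss tests the null
# of stationarity, and unitroot_ndiffs reports how many differences are needed
amzn_stock %>%
  features(Close, list(unitroot_kpss, unitroot_ndiffs))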

Exercise 9.3: For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

  a. Turkish GDP from global_economy.
library(fpp3)

# Filtering Turkish GDP data
turkish_gdp <- global_economy %>%
  filter(Country == "Turkey")

# Displaying time plot, ACF, and PACF
turkish_gdp %>%
  gg_tsdisplay(GDP, plot_type = 'partial')

# Finding optimal Box-Cox lambda
lambda <- turkish_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

lambda  # Output optimal lambda
## [1] 0.1572187
# Checking order of differencing
turkish_gdp %>%
  features(box_cox(GDP, lambda), unitroot_ndiffs)
## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1
# Displaying transformed and differenced data
turkish_gdp %>%
  gg_tsdisplay(difference(box_cox(GDP, lambda), 1), plot_type = 'partial')
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

#### Analysis:

For Turkish GDP, the time series showed a strong upward trend, indicating non-stationarity. Using the Guerrero method, we determined an optimal Box-Cox lambda of approximately 0.157. After applying the transformation, the KPSS-based unitroot_ndiffs feature indicated that first-order differencing (d = 1) was required to make the data stationary. Since the data is annual, no seasonal differencing was necessary. The transformed and differenced series appeared stationary, with ACF and PACF plots showing no remaining trend.
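Since unitroot_ndiffs is driven by successive KPSS tests, the underlying statistic can also be inspected directly; a sketch along these lines (output not shown):

# KPSS test before and after differencing the transformed series; the
# statistic should fall below its critical value after one difference
turkish_gdp %>%
  features(box_cox(GDP, lambda), unitroot_kpss)
turkish_gdp %>%
  features(difference(box_cox(GDP, lambda)), unitroot_kpss)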

  b. Accommodation takings in the state of Tasmania from aus_accommodation.
# Filtering Tasmania accommodation data
tasmania_acc <- aus_accommodation %>%
  filter(State == "Tasmania")

# Displaying time plot, ACF, and PACF
tasmania_acc %>%
  gg_tsdisplay(Takings, plot_type = 'partial')

# Finding optimal Box-Cox lambda
lambda <- tasmania_acc %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

lambda  # Output optimal lambda
## [1] 0.001819643
# Checking seasonal differencing
tasmania_acc %>%
  features(box_cox(Takings, lambda), unitroot_nsdiffs)
## # A tibble: 1 × 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1
# Applying seasonal differencing
tasmania_acc %>%
  gg_tsdisplay(difference(box_cox(Takings, lambda), 4), plot_type = 'partial')
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

# Checking additional differencing
tasmania_acc %>%
  features(difference(box_cox(Takings, lambda), 4), unitroot_ndiffs)
## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      0

#### Analysis:

Accommodation takings in Tasmania displayed both trend and strong quarterly seasonality. The optimal Box-Cox lambda was approximately 0.002, very close to zero, indicating that a log transformation would be appropriate. The seasonal unit root test called for one seasonal difference (lag = 4 for quarterly data). After seasonal differencing, the follow-up unit root test returned ndiffs = 0, so no additional regular differencing was required; the seasonally differenced series appeared stationary, with ACF and PACF plots showing no remaining trend.
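Because lambda is so close to zero, a plain log transform is a reasonable substitute; a quick sketch to verify the two give essentially the same picture:

# Seasonally differenced log series; should look essentially identical to
# the Box-Cox version plotted above
tasmania_acc %>%
  gg_tsdisplay(difference(log(Takings), 4), plot_type = 'partial')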

  c. Monthly sales from souvenirs.
# Loading souvenir sales data
monthly_sales <- souvenirs

# Displaying time plot
monthly_sales %>%
  gg_tsdisplay(Sales)

# Finding optimal Box-Cox lambda
lambda <- monthly_sales %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

lambda  # Output optimal lambda
## [1] 0.002118221
# Displaying Box-Cox transformed data
monthly_sales %>%
  gg_tsdisplay(box_cox(Sales, lambda))

# Checking seasonal differencing
monthly_sales %>%
  features(box_cox(Sales, lambda), unitroot_nsdiffs)
## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       1
# Applying seasonal differencing (lag = 12 for monthly data)
monthly_sales %>%
  gg_tsdisplay(difference(box_cox(Sales, lambda), 12))
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

# Checking additional differencing
monthly_sales %>%
  features(difference(box_cox(Sales, lambda), 12), unitroot_ndiffs)
## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      0

#### Analysis:

The monthly souvenir sales data exhibited strong seasonality, with annual cycles (lag = 12) visible in the ACF plot. The Guerrero method identified an optimal Box-Cox lambda of approximately 0.002, again suggesting a transformation close to the log. Seasonal differencing at lag 12 was necessary to remove the seasonality, and after this step the KPSS-based unit root test indicated that no further differencing was needed. The final transformed and differenced series showed no remaining trend or seasonality, consistent with stationarity.
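The seasonal differencing decision rests on the STL-based seasonal strength measure underlying unitroot_nsdiffs; it can be inspected directly with feat_stl (a sketch, output not shown):

# unitroot_nsdiffs recommends a seasonal difference when the STL seasonal
# strength exceeds 0.64; feat_stl reports that strength directly
monthly_sales %>%
  features(box_cox(Sales, lambda), feat_stl)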

Exercise 9.5: For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

library(fpp3)

# Set seed for reproducibility
set.seed(12345678)

# Select a random time series from aus_retail
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Display initial time series data
myseries %>%
  gg_tsdisplay(Turnover)

# Finding the optimal Box-Cox transformation lambda
lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

lambda  # Print lambda value
## [1] 0.08303631
# Apply Box-Cox transformation and check for seasonal differencing
myseries %>%
  gg_tsdisplay(box_cox(Turnover, lambda))

# Determine seasonal differencing order
myseries %>%
  features(box_cox(Turnover, lambda), unitroot_nsdiffs)
## # A tibble: 1 × 3
##   State              Industry                                            nsdiffs
##   <chr>              <chr>                                                 <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing       1
# Apply seasonal differencing (lag = 12 for monthly data)
myseries %>%
  gg_tsdisplay(difference(box_cox(Turnover, lambda), 12))
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

# Check if additional differencing is needed
myseries %>%
  features(difference(box_cox(Turnover, lambda), 12), unitroot_ndiffs)
## # A tibble: 1 × 3
##   State              Industry                                            ndiffs
##   <chr>              <chr>                                                <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing      0
# Apply regular differencing if necessary
myseries %>%
  gg_tsdisplay(difference(difference(box_cox(Turnover, lambda), 12), 1))
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_point()`).

#### Analysis:

The selected retail series (Northern Territory: clothing, footwear and personal accessory retailing) is monthly turnover data, which typically exhibits both seasonality and trend. To assess stationarity, the following steps were taken:

  1. Initial Exploration

Using gg_tsdisplay(), the data exhibited a strong positive trend and seasonal fluctuations, with higher turnover near the end of the year. The ACF plot showed significant autocorrelation, confirming non-stationarity.

  2. Box-Cox Transformation

The Guerrero method identified an optimal Box-Cox transformation λ ≈ 0.083 (the lambda value printed above), which was applied to stabilize the variance.

  3. Seasonal Differencing

Since the data is monthly, a seasonal difference of lag = 12 was applied. The ACF plot after seasonal differencing showed that the trend was removed, but there were still some persistent correlations, indicating additional differencing might be necessary.

  4. Additional Differencing

A unit root test on the seasonally differenced series returned ndiffs = 0, indicating that no further regular differencing was strictly required. The doubly differenced series (lag = 12 plus a first difference) was plotted for comparison and also looks stationary, but based on the tests the appropriate orders are D = 1 (lag = 12) and d = 0.
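A direct KPSS test on the seasonally differenced series (a sketch, output not shown) should be consistent with the ndiffs = 0 result above:

# KPSS test on the seasonally differenced series; a p-value reported at
# 0.1 (the upper bound) would be consistent with stationarity
myseries %>%
  features(difference(box_cox(Turnover, lambda), 12), unitroot_kpss)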

Exercise 9.6: Simulate and plot some data from simple ARIMA models.

  a. Use the following R code to generate data from an AR(1) model with ϕ₁ = 0.6 and σ² = 1. The process starts with y₁ = 0.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

autoplot(sim, y) + ggtitle("AR(1) Model with ϕ₁ = 0.6")

  b. Produce a time plot for the series. How does the plot change as you change ϕ₁?
library(gridExtra)
set.seed(123)
y2 <- numeric(100)
y3 <- numeric(100)
e2 <- rnorm(100)
e3 <- rnorm(100)

for(i in 2:100){
    y2[i] <- -0.6 * y2[i-1] + e2[i]
    y3[i] <- 0.9 * y3[i-1] + e3[i]
}

sim <- tsibble(idx = seq_len(100), y = y, y2 = y2, y3 = y3, index = idx)

plot1 <- autoplot(sim, y2) + ggtitle("ϕ₁ = -0.6")
plot2 <- autoplot(sim, y) + ggtitle("ϕ₁ = 0.6")
plot3 <- autoplot(sim, y3) + ggtitle("ϕ₁ = 0.9")

grid.arrange(plot1, plot2, plot3, nrow = 2)

#### Analysis:

As ϕ₁ increases toward 1 (ϕ₁ = 0.9), the series becomes smoother and more persistent, wandering away from zero for long stretches. With ϕ₁ = −0.6, consecutive values tend to have opposite signs, so the series oscillates rapidly around zero. As ϕ₁ approaches 0, the series approaches white noise.
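A small helper makes it easy to experiment with other values of ϕ₁; a sketch (the function name sim_ar1 is our own, not from fpp3):

# Simulate an AR(1) with a given phi (sigma^2 = 1, y1 = 0), returning a tsibble
sim_ar1 <- function(phi, n = 100, seed = 123) {
  set.seed(seed)
  e <- rnorm(n)
  y <- numeric(n)
  for (i in 2:n) y[i] <- phi * y[i - 1] + e[i]
  tsibble(idx = seq_len(n), y = y, index = idx)
}

autoplot(sim_ar1(0.3), y) + ggtitle("ϕ₁ = 0.3")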

  c. Write your own code to generate data from an MA(1) model with θ₁ = 0.6 and σ² = 1.
set.seed(123)
y4 <- numeric(100)
e4 <- rnorm(100)

for(i in 2:100)
  y4[i] <- 0.6 * e4[i-1] + e4[i]

sim_ma1 <- tsibble(idx = seq_len(100), y4 = y4, index = idx)

autoplot(sim_ma1, y4) + ggtitle("MA(1) Model with θ₁ = 0.6")

#### Analysis:

An MA(1) model introduces dependence between consecutive observations through the shared error terms. Unlike AR models, where past values of the series influence the present, here each noise term carries over into the next observation, creating short-term (lag-1) correlation only.

  d. Produce a time plot for the series. How does the plot change as you change θ₁?
set.seed(123)
y5 <- numeric(100)
y6 <- numeric(100)
e5 <- rnorm(100)
e6 <- rnorm(100)

for(i in 2:100){
  y5[i] <- -0.6 * e5[i-1] + e5[i]
  y6[i] <- 0.9 * e6[i-1] + e6[i]
}

sim2 <- tsibble(idx = seq_len(100), y4 = y4, y5 = y5, y6 = y6, index = idx)

plot4 <- autoplot(sim2, y5) + ggtitle("θ₁ = -0.6")
plot5 <- autoplot(sim2, y4) + ggtitle("θ₁ = 0.6")
plot6 <- autoplot(sim2, y6) + ggtitle("θ₁ = 0.9")

grid.arrange(plot4, plot5, plot6, nrow = 2)

#### Analysis:

Changing θ₁ changes the strength of the lag-1 correlation: larger positive values make neighbouring observations more alike, while negative θ₁ makes consecutive values tend to alternate in sign. Because an MA(1) has no memory beyond one lag, all three series remain stationary and fluctuate around zero, without the long persistent swings seen for large ϕ₁ in the AR(1) case.
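The visual differences line up with the theoretical lag-1 autocorrelation of an MA(1), ρ₁ = θ₁ / (1 + θ₁²); a quick arithmetic check:

# Theoretical lag-1 autocorrelation for the three theta values used above
theta <- c(-0.6, 0.6, 0.9)
round(theta / (1 + theta^2), 3)
# approximately -0.441, 0.441, 0.497; all higher lags are zero for an MA(1)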

  e. Generate data from an ARMA(1,1) model with ϕ₁ = 0.6, θ₁ = 0.6 and σ² = 1.
set.seed(123)
y7 <- numeric(100)
e7 <- rnorm(100)

for(i in 2:100)
  y7[i] <- 0.6 * y7[i-1] + 0.6 * e7[i-1] + e7[i]

sim_arma <- tsibble(idx = seq_len(100), y7 = y7, index = idx)

autoplot(sim_arma, y7) + ggtitle("ARMA(1,1) Model with ϕ₁ = 0.6, θ₁ = 0.6")

#### Analysis:

This ARMA(1,1) process combines features of both AR and MA models. The persistence from the AR(1) component and the short-term noise correlation from the MA(1) component interact, producing a mix of smooth and fluctuating behaviour.
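As a cross-check (a sketch, not part of the original output), base R's arima.sim() can generate the same class of process for comparison with the hand-rolled loop:

# Simulate the same ARMA(1,1) using stats::arima.sim() and plot it
set.seed(123)
sim_check <- arima.sim(model = list(ar = 0.6, ma = 0.6), n = 100)
autoplot(as_tsibble(sim_check), value) + ggtitle("ARMA(1,1) via arima.sim()")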

  f. Generate data from an AR(2) model with ϕ₁ = −0.8, ϕ₂ = 0.3 and σ² = 1. (Note that these parameters will give a non-stationary series.)

set.seed(123)
y8 <- numeric(100)
e8 <- rnorm(100)

for(i in 3:100)
  y8[i] <- -0.8 * y8[i-1] + 0.3 * y8[i-2] + e8[i]

sim_ar2 <- tsibble(idx = seq_len(100), y8 = y8, index = idx)

autoplot(sim_ar2, y8) + ggtitle("AR(2) Model with ϕ₁ = -0.8, ϕ₂ = 0.3")

#### Analysis:

An AR(2) process introduces a second lag, making the dynamics richer. Here, the negative ϕ₁ induces oscillations, while the ϕ₂ term pushes the parameter pair outside the stationarity region, so the oscillations grow rather than damp out and the series exhibits increasing variance over time.
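The non-stationarity can be verified from the characteristic polynomial 1 − ϕ₁z − ϕ₂z²: stationarity requires all of its roots to lie outside the unit circle. A quick check:

# Roots of 1 - phi1*z - phi2*z^2 with phi1 = -0.8, phi2 = 0.3;
# coefficients passed in increasing order: c(1, -phi1, -phi2)
Mod(polyroot(c(1, 0.8, -0.3)))
# approximately 0.93 and 3.59; the root inside the unit circle (0.93)
# confirms the process is non-stationary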

  g. Graph the latter two series and compare them.

plot7 <- autoplot(sim_arma, y7) + ggtitle("ARMA(1,1)")
plot8 <- autoplot(sim_ar2, y8) + ggtitle("AR(2)")

grid.arrange(plot7, plot8, nrow = 2)

#### Analysis:

  * ARMA(1,1): The combination of AR and MA elements results in a balanced, stationary series with short-term dependencies and a generally stable variance.
  * AR(2): The negative first-order coefficient induces oscillations, and the parameter combination is non-stationary, causing the variance to grow over time.

Exercise 9.7: Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

# Load required libraries
library(fpp3)

# Load the dataset
data("aus_airpassengers")

# Train-test split
train <- aus_airpassengers %>%
  filter(Year <= 2011)

test <- aus_airpassengers %>%
  filter(Year > 2011)

  a. Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
# Fit an ARIMA model automatically
fit <- train %>%
  model(ARIMA(Passengers))

# Report the model
report(fit)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8756
## s.e.   0.0722
## 
## sigma^2 estimated as 4.671:  log likelihood=-87.8
## AIC=179.61   AICc=179.93   BIC=182.99
# Check residuals
fit %>% gg_tsresiduals()

# Forecast for the next 10 periods
fit %>%
  forecast(h=10) %>%
  autoplot(train)

#### Analysis:

The automatic selection yields an ARIMA(0,2,1) model: two orders of differencing with a first-order moving average term. The residuals resemble white noise, indicating a well-fitted model, and the 10-period forecast extends the recent trend in the data.
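A formal residual check can back up the visual impression; a sketch using the Ljung-Box feature (dof = 1 for the single MA coefficient):

# Ljung-Box test on the innovation residuals; a large p-value is
# consistent with white noise
fit %>%
  augment() %>%
  features(.innov, ljung_box, lag = 10, dof = 1)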

  b. Write the model in terms of the backshift operator.
# Backshift notation of ARIMA(0,2,1):
# (1 - B)^2 yₜ = (1 + θ₁B) εₜ, with θ₁ = -0.8756

In backshift notation, second-order differencing is applied to the series, with an MA(1) term accounting for past errors. Expanding the left-hand side gives (1 − B)²yₜ = yₜ − 2yₜ₋₁ + yₜ₋₂, so the model states that the twice-differenced series equals εₜ + θ₁εₜ₋₁.

  c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
# Fit ARIMA(0,1,0) with drift
fit2 <- train %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))

# Report the model
report(fit2)
## Series: Passengers 
## Model: ARIMA(0,1,0) w/ drift 
## 
## Coefficients:
##       constant
##         1.3669
## s.e.    0.3319
## 
## sigma^2 estimated as 4.629:  log likelihood=-89.08
## AIC=182.17   AICc=182.48   BIC=185.59
# Forecast for 10 periods
fit2 %>%
  forecast(h=10) %>%
  autoplot(train)

#### Analysis:

The ARIMA(0,1,0) model with drift is a random walk with drift, so it projects a constant average growth rate. Compared with ARIMA(0,2,1), it gives a slightly smoother forecast that still follows the overall trend. The prediction intervals are similar, though ARIMA(0,2,1) projects slightly higher future values.
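For a direct visual comparison (a sketch), both models can be fitted together and their point forecasts overlaid:

# Overlay point forecasts from both candidate models on the training data
train %>%
  model(
    "ARIMA(0,2,1)"          = ARIMA(Passengers ~ pdq(0, 2, 1)),
    "ARIMA(0,1,0) w/ drift" = ARIMA(Passengers ~ 1 + pdq(0, 1, 0))
  ) %>%
  forecast(h = 10) %>%
  autoplot(train, level = NULL)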

  d. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
# Fit ARIMA(2,1,2) with drift
fit3 <- train %>%
  model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))

# Report the model
report(fit3)
## Series: Passengers 
## Model: ARIMA(2,1,2) w/ drift 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2  constant
##       1.4694  -0.5103  -1.5736  0.6780    0.0650
## s.e.  0.3780   0.3558   0.3081  0.2688    0.0294
## 
## sigma^2 estimated as 4.748:  log likelihood=-87.74
## AIC=187.47   AICc=189.94   BIC=197.75
# Forecast for 10 periods
fit3 %>%
  forecast(h=10) %>%
  autoplot(train)

# Fit ARIMA(2,1,2) without constant
fit4 <- train %>%
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
# Report the model
report(fit4)
## Series: Passengers 
## Model: NULL model 
## NULL model
# Forecast for 10 periods
fit4 %>%
  forecast(h=10) %>%
  autoplot(train)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

#### Analysis:

The ARIMA(2,1,2) model includes two autoregressive and two moving average terms, and captures more local fluctuation than the previous models. When the constant is removed, estimation fails: the warning reports a non-stationary AR part from the CSS starting values, and fable returns a NULL model, so no forecasts can be produced.

  e. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
# Fit ARIMA(0,2,1) with a constant
fit5 <- train %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.
# Report the model
report(fit5)
## Series: Passengers 
## Model: ARIMA(0,2,1) w/ poly 
## 
## Coefficients:
##           ma1  constant
##       -1.0000    0.0721
## s.e.   0.0709    0.0260
## 
## sigma^2 estimated as 4.086:  log likelihood=-85.74
## AIC=177.48   AICc=178.15   BIC=182.55
# Forecast for 10 periods
fit5 %>%
  forecast(h=10) %>%
  autoplot(train)

#### Analysis:

Including a constant in an ARIMA(0,2,1) model induces a quadratic (or higher-order polynomial) trend in the forecasts, which is generally discouraged. The warning message makes this explicit: with two orders of differencing, a non-zero constant implies an artificially accelerating trend over time.

Exercise 9.8: For the United States GDP series (from global_economy):

  a. if necessary, find a suitable Box-Cox transformation for the data;
# Load required libraries
library(fpp3)

# Load the dataset for the United States
us_gdp <- global_economy %>%
  filter(Code == "USA") %>%
  select(Year, GDP)

# Plot the GDP data
us_gdp %>% autoplot(GDP)

# Compute the optimal Box-Cox transformation parameter
lambda <- us_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

lambda  # Output the lambda value
## [1] 0.2819443
# Apply Box-Cox transformation
us_gdp <- us_gdp %>%
  mutate(GDP_trans = box_cox(GDP, lambda))

# Plot transformed data
us_gdp %>% autoplot(GDP_trans)

#### Analysis:

The Guerrero method suggests a Box-Cox transformation parameter of about 0.28, indicating some need for variance stabilization. The transformed GDP series shows a smoother trend and reduces the steep drop seen around 2008.

  b. fit a suitable ARIMA model to the transformed data using ARIMA();
# Fit an ARIMA model to the transformed data
fit_arima <- us_gdp %>%
  model(ARIMA(GDP_trans))

# Report the model details
report(fit_arima)
## Series: GDP_trans 
## Model: ARIMA(1,1,0) w/ drift 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78
# Check residuals
fit_arima %>% gg_tsresiduals()

#### Analysis:

The automatic selection chooses an ARIMA(1,1,0) model with drift: first-order differencing plus a drift constant capture the trend, with one autoregressive term for the short-run dynamics. The residuals appear randomly distributed, which supports the model's validity.

  c. try some other plausible models by experimenting with the orders chosen;
# Explore different ARIMA models
fit_alternatives <- us_gdp %>%
  model(
    "ARIMA(2,1,2)" = ARIMA(GDP_trans ~ pdq(2,1,2)),
    "ARIMA(0,1,0)" = ARIMA(GDP_trans ~ pdq(0,1,0)),
    "ARIMA(1,1,0)" = ARIMA(GDP_trans ~ pdq(1,1,0)),
    "ARIMA(1,1,1)" = ARIMA(GDP_trans ~ pdq(1,1,1))
  )

# Compare model performance
fit_alternatives %>% glance()
## # A tibble: 4 × 8
##   .model       sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>         <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 ARIMA(2,1,2)  5734.   -325.  662.  664.  674. <cpl [2]> <cpl [2]>
## 2 ARIMA(0,1,0)  6774.   -332.  668.  668.  672. <cpl [0]> <cpl [0]>
## 3 ARIMA(1,1,0)  5479.   -325.  657.  657.  663. <cpl [1]> <cpl [0]>
## 4 ARIMA(1,1,1)  5580.   -325.  659.  659.  667. <cpl [1]> <cpl [1]>

#### Analysis:

Among the tested models, ARIMA(1,1,0) with drift has the lowest AICc value, confirming it as the best option. Other models, like ARIMA(2,1,2), add more complexity without significantly improving fit.

  d. choose what you think is the best model and check the residual diagnostics;
# Fit the best ARIMA model
best_fit <- us_gdp %>%
  model(ARIMA(GDP_trans ~ pdq(1,1,0)))

# Residual diagnostics
best_fit %>% gg_tsresiduals()

#### Analysis:

The residuals show no clear autocorrelation, confirming that ARIMA(1,1,0) sufficiently captures the trend in the transformed GDP series.

  e. produce forecasts of your fitted model. Do the forecasts look reasonable?
# Forecast future GDP values
forecast_best <- best_fit %>%
  forecast(h = 10)

# Plot forecasts
forecast_best %>% autoplot(us_gdp)

#### Analysis:

The forecast follows the historical trend, suggesting continued GDP growth. Note that these forecasts are on the Box-Cox transformed scale, so they would need back-transforming to be read in dollars. GDP growing indefinitely at an increasing rate is also unrealistic, so external economic factors should be considered.

  f. compare the results with what you would obtain using ETS() (with no transformation)
# Fit an ETS model
fit_ets <- us_gdp %>%
  model(ETS(GDP))

# Fit both ARIMA and ETS models on the original GDP (for fair comparison)
models_comparison <- us_gdp %>%
  model(
    "ARIMA(1,1,0)" = ARIMA(GDP ~ pdq(1,1,0)),  # Using original GDP
    "ETS" = ETS(GDP)  # Using original GDP
  )

# Compare AIC values
models_comparison %>% glance()
## # A tibble: 2 × 11
##   .model    sigma2 log_lik   AIC  AICc   BIC ar_roots ma_roots      MSE     AMSE
##   <chr>      <dbl>   <dbl> <dbl> <dbl> <dbl> <list>   <list>      <dbl>    <dbl>
## 1 ARIMA(… 3.04e+22  -1557. 3117. 3118. 3122. <cpl>    <cpl>    NA       NA      
## 2 ETS     6.78e- 4  -1590. 3191. 3192. 3201. <NULL>   <NULL>    2.79e22  1.20e23
## # ℹ 1 more variable: MAE <dbl>
# Forecast using ETS
fit_ets %>%
  forecast(h=10) %>%
  autoplot(us_gdp)

#### Analysis:

The ETS model forecasts slower growth than ARIMA. Note that the AIC/AICc values in the table are not directly comparable across the two model classes: the ARIMA model is fitted to differenced data and ETS uses a different likelihood, so model choice here should rest on forecast accuracy rather than on the information criteria. Visually, the ARIMA model tracks the strong trend in GDP more closely, while the ETS forecasts grow more conservatively.
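A fairer comparison is out-of-sample accuracy via one-step time series cross-validation, following the approach in FPP3 Section 9.10; a sketch (the .init value is an arbitrary choice):

# One-step cross-validated accuracy for ARIMA vs ETS on the original GDP
us_gdp %>%
  stretch_tsibble(.init = 20, .step = 1) %>%
  model(
    ARIMA = ARIMA(GDP ~ pdq(1, 1, 0)),
    ETS   = ETS(GDP)
  ) %>%
  forecast(h = 1) %>%
  accuracy(us_gdp)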