DATA 624 HOMEWORK 6

Forecasting: Principles & Practice - Chapter 9 Exercises: 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

The differences among the three ACF plots are due to sample size. As the sample size increases, the ACF becomes more stable and closer to zero, and the confidence bounds become narrower. All three indicate white noise because there is no consistent pattern in the autocorrelations, and most values lie within the confidence bounds. The differences seen are due to random variation in smaller samples

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values (blue dashed lines) are at different distances from zero because they depend on the sample size. The autocorrelations are different in each figure because of sampling variability. Even though all datasets are white noise, each sample is different, so the sample autocorrelations will vary randomly. Larger samples produce autocorrelations that are closer to zero.

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

# Filter Amazon stock
amazon <- gafa_stock %>%
  filter(Symbol == "AMZN")

# Plot time series
autoplot(amazon, Close) +
  labs(title = "Amazon Daily Closing Prices")

# ACF
amazon %>%
  ACF(Close) %>%
  autoplot()

## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

# PACF
amazon %>%
  PACF(Close) %>%
  autoplot()

## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

The time plot shows a trend, the ACF decays slowly, and the PACF shows significant early lags, all indicating non-stationarity. Therefore, the series should be differenced.

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.

turkey_gdp <- global_economy %>%
  filter(Country == "Turkey")

autoplot(turkey_gdp, GDP) +
  labs(title = "Turkish GDP", y = "GDP", x = "Year")

turkey_gdp %>%
  features(GDP, guerrero)

#Guerrero's lambda

lambda <- turkey_gdp %>%
  features(GDP, guerrero) %>%
  pull(lambda_guerrero)

lambda

## [1] 0.1572187

turkey_gdp %>%
  features(GDP, unitroot_ndiffs)

autoplot(turkey_gdp, box_cox(GDP, lambda)) +
  labs(title = "Box-Cox Turkish GDP",
       y = "Transformed GDP", x = "Year")

Accommodation takings in the state of Tasmania from aus_accommodation.

tas_accom <- aus_accommodation %>%
  filter(State == "Tasmania")

autoplot(tas_accom, Takings)

# Box-Cox
tas_accom %>%
  features(Takings, guerrero)

# Differencing
tas_accom %>%
  features(Takings, unitroot_ndiffs)

tas_accom %>%
  features(Takings, unitroot_nsdiffs)

Monthly sales from souvenirs.

autoplot(souvenirs)

## Plot variable not specified, automatically selected `.vars = Sales`

# Box-Cox
souvenirs %>%
  features(Sales, guerrero)

# Differencing
souvenirs %>%
  features(Sales, unitroot_ndiffs)

souvenirs %>%
  features(Sales, unitroot_nsdiffs)

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(1234)

myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

autoplot(myseries, Turnover)

# Box-Cox transformation
myseries %>%
  features(Turnover, guerrero)

#Regular differencing
myseries %>%
  features(Turnover, unitroot_ndiffs)

# Seasonal differencing
myseries %>%
  features(Turnover, unitroot_nsdiffs)

Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\).The process starts with y1 = 0.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

autoplot(sim, y) +
  labs(title = "Simulated AR(1)",
       x = "Time", y = "y")

As ϕ1 gets larger and stays positive, the series becomes smoother and shows more persistence. When ϕ1 is close to 0, the series looks more like white noise.

Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).

y <- numeric(100)
e <- rnorm(100)

for(i in 2:100){
  y[i] <- 0.6*e[i-1] + e[i]
}
sim_ma1 <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?

autoplot(sim_ma1, y) +
  labs(title = "MA(1) Series with theta = 0.6", x = "Time", y = "y")

As θ1 changes, the short-run dependence in the series changes.

Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6\), \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).

y <- numeric(100)
e <- rnorm(100)

for(i in 2:100){
  y[i] <- 0.6 * y[i-1] + e[i] + 0.6 * e[i-1]
}

sim_arma11 <- tsibble(idx = seq_len(100), y = y, index = idx)

Generate data from an AR(2) model with \(\phi_1 = -0.8\), \(\phi_2 = 0.3\) and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)

y <- numeric(100)
e <- rnorm(100)

for(i in 3:100){
  y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]
}

sim_ar2 <- tsibble(idx = seq_len(100), y = y, index = idx)

Graph the latter two series and compare them.

autoplot(sim_arma11, y) +
  labs(title = "ARMA(1,1) Series", x = "Time", y = "y")

autoplot(sim_ar2, y) +
  labs(title = "AR(2) Series", x = "Time", y = "y")

Overall, the ARMA(1,1) series looks more stable, while the AR(2) series appears more unstable and oscillatory.

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

data <- aus_airpassengers

fit <- data %>%
  model(ARIMA(Passengers))

report(fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

gg_tsresiduals(fit)

## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

fc <- fit %>% forecast(h = 10)

autoplot(fc, data) +
  labs(title = "ARIMA Forecasts for Air Passengers",
       y = "Passengers (millions)")

Write the model in terms of the backshift operator.

\((1 - B)y_t = c + (1 + \theta_1 B)\varepsilon_t\). Where: B is the backshift operator, c is the drift term𝜃1 is the MA parameter.

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

fit_rw <- data %>%
  model(ARIMA(Passengers ~ pdq(0,1,0)))

fc_rw <- fit_rw %>% forecast(h = 10)

autoplot(fc_rw, data) +
  labs(title = "ARIMA(0,1,0) with Drift Forecast")

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

fit_212 <- data %>%
  model(ARIMA(Passengers ~ pdq(2,1,2)))

## Warning: It looks like you're trying to fully specify your ARIMA model but have not said
## if a constant should be included. You can include a constant using `ARIMA(y~1)`
## to the formula or exclude it by adding `ARIMA(y~0)`.

## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2))
## [1] Could not find an appropriate ARIMA model.
## This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
## For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots

fc_212 <- fit_212 %>% forecast(h = 10)

autoplot(fc_212, data) +
  labs(title = "ARIMA(2,1,2) with Drift Forecast")

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

# Without constant (no drift)
fit_212_nodrift <- data %>%
  model(ARIMA(Passengers ~ pdq(2,1,2) + 0))

## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2) + 0)
## [1] non-stationary AR part from CSS

fc_212_nodrift <- fit_212_nodrift %>% forecast(h = 10)

autoplot(fc_212_nodrift, data) +
  labs(title = "ARIMA(2,1,2) without Drift Forecast")

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

fit_021 <- data %>%
  model(ARIMA(Passengers ~ pdq(0,2,1)))

fc_021 <- fit_021 %>% forecast(h = 10)

autoplot(fc_021, data) +
  labs(title = "ARIMA(0,2,1) with Constant Forecast")

The ARIMA(0,2,1) model applies second differencing,As a result, the forecasts may show curvature or instability and can behave unrealistically.

For the United States GDP series (from global_economy):

if necessary, find a suitable Box-Cox transformation for the data;

us_gdp <- global_economy %>%
  filter(Country == "United States")

autoplot(us_gdp, GDP)

us_gdp %>%
  features(GDP, guerrero)

fit a suitable ARIMA model to the transformed data using ARIMA();

fit_arima <- us_gdp %>%
  model(ARIMA(box_cox(GDP, 0)))  

report(fit_arima)

## Series: GDP 
## Model: ARIMA(0,2,1) 
## Transformation: box_cox(GDP, 0) 
## 
## Coefficients:
##           ma1
##       -0.6353
## s.e.   0.1138
## 
## sigma^2 estimated as 0.0004278:  log likelihood=139.76
## AIC=-275.53   AICc=-275.3   BIC=-271.48

try some other plausible models by experimenting with the orders chosen;

fit_alt <- us_gdp %>%
  model(
    arima1 = ARIMA(box_cox(GDP, 0) ~ pdq(0,1,1)),
    arima2 = ARIMA(box_cox(GDP, 0) ~ pdq(1,1,1)),
    arima3 = ARIMA(box_cox(GDP, 0) ~ pdq(2,1,2))
  )

report(fit_alt)

## Warning in report.mdl_df(fit_alt): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.

choose what you think is the best model and check the residual diagnostics;

gg_tsresiduals(fit_arima)

The best model is ARIMA(1,1,1) because it has the lowest residual variance among the candidate models. The residuals appear to be uncorrelated and centered around zero.

produce forecasts of your fitted model. Do the forecasts look reasonable?

fc_arima <- fit_alt %>%
  select(arima2) %>%
  forecast(h = 10)

autoplot(us_gdp, GDP) +
  autolayer(fc_arima) +
  labs(title = "ARIMA(1,1,1) Forecasts for US GDP")

The forecasts show a smooth upward trend, which is consistent with historical GDP growth. The forecasts look reasonable.

compare the results with what you would obtain using ETS() (with no transformation).

fit_ets <- us_gdp %>%
  model(ETS(GDP))

fc_ets <- fit_ets %>% forecast(h = 10)

autoplot(fc_ets, us_gdp) +
  labs(title = "ETS Forecasts for US GDP")

**The ETS model (without transformation) also produces upward-trending forecasts.

DATA 624 HOMEWORK 6

Chanice Mckenzie

Forecasting: Principles & Practice - Chapter 9 Exercises: 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8