Data624_Homework6_ahmed

1a) Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

figure9.32

The three ACF plots look different mainly because the sample sizes are different.

With only 36 observations, the ACF shows larger random spikes.
With 360 observations, the spikes are smaller and closer to zero.
With 1000 observations, the spikes are even smaller and stay very close to zero.

Yes, all three figures are still consistent with white noise because there is no clear pattern in the autocorrelations, and most values stay near zero. Random variation is expected, especially for smaller samples.

1b) Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values are at different distances from zero because they depend on the sample size. The approximate bounds are

\[ \pm \frac{1.96}{\sqrt{n}} \]

As the sample size \(n\) gets larger, the bounds get smaller and move closer to zero.

The autocorrelations are different in each figure because each plot is based on a different random sample. Even though all three series are white noise, random variation will make the sample autocorrelations look different. Smaller samples tend to have more variation, while larger samples tend to have autocorrelations closer to zero.

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

library(fpp3)

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──

## ✔ tibble      3.3.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.2     ✔ feasts      0.4.2
## ✔ lubridate   1.9.4     ✔ fable       0.5.0
## ✔ ggplot2     4.0.1

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

amazon <- gafa_stock |>
  filter(Symbol == "AMZN")

amazon |>
  autoplot(Close)

amazon |>
  ACF(Close) |>
  autoplot()

## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

amazon |>
  PACF(Close) |>
  autoplot()

## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

Answer:

The time plot shows that Amazon’s closing prices have a clear upward trend over time and do not fluctuate around a constant mean. This indicates the series is non-stationary.

The ACF remains very high and close to 1 for many lags and decreases slowly. This slow decay is a strong sign of non-stationarity, since a stationary series would show correlations dropping off quickly.

The PACF shows a very large spike at lag 1, with smaller values afterward. This pattern is also consistent with a non-stationary series and suggests that differencing is needed.

All three plots indicate that the series is non-stationary, so differencing should be applied before fitting an ARIMA model.

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

3a) Turkish GDP from global_economy.

lambda_gdp <- global_economy |>
  filter(Country == "Turkey") |>
  features(GDP, features = guerrero)

gdp_ts <- global_economy |>
  filter(Country == "Turkey") |>
  mutate(GDP = box_cox(GDP, lambda_gdp$lambda_guerrero))

gdp_ts |>
  gg_tsdisplay(difference(GDP))

## Warning: `gg_tsdisplay()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsdisplay()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

3b) Accommodation takings in the state of Tasmania from aus_accommodation.

lambda_acc <- aus_accommodation |>
  filter(State == "Tasmania") |>
  features(Takings, features = guerrero)

acc_ts <- aus_accommodation |>
  filter(State == "Tasmania") |>
  mutate(Takings = box_cox(Takings, lambda_acc$lambda_guerrero))

acc_ts |>
  gg_tsdisplay(difference(Takings, 4))

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

3c) Monthly sales from souvenirs.

lambda_souv <- souvenirs |>
  features(Sales, features = guerrero)

souv_ts <- souvenirs |>
  mutate(Sales = box_cox(Sales, lambda_souv$lambda_guerrero))

souv_ts |>
  gg_tsdisplay(difference(Sales, 12))

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

Simulate and plot some data from simple ARIMA models.

6a) Use the following R code to generate data from an AR(1) model with φ₁ = 0.6 and σ² = 1. The process starts with y₁ = 0.

set.seed(1)

y <- numeric(100)
e <- rnorm(100)

for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]

tsibble(idx = 1:100, y = y, index = idx) |>
  autoplot(y)

6b) Produce a time plot for the series. How does the plot change as you change φ₁?

set.seed(1)

y <- numeric(100)
e <- rnorm(100)

for(i in 2:100)
  y[i] <- 0.9 * y[i-1] + e[i]

tsibble(idx = 1:100, y = y, index = idx) |>
  autoplot(y)

The time plot changes depending on the value of ( _1 ). When ( _1 ) is close to 0, the series looks more like random noise with little dependence between consecutive observations. As ( _1 ) becomes larger and positive, the series becomes smoother and shows longer runs above or below the mean, indicating stronger persistence. When ( _1 ) is close to 1, the series changes more slowly over time. If ( _1 ) is negative, the series tends to alternate more sharply from one observation to the next, producing a zig zag pattern. Overall, larger values of ( |_1| ) create stronger dependence in the series.

6c) Write your own code to generate data from an MA(1) model with θ₁ = 0.6 and σ² = 1.

set.seed(1)

e <- rnorm(101)
y <- numeric(100)

for(i in 1:100)
  y[i] <- e[i+1] + 0.6 * e[i]

tsibble(idx = 1:100, y = y, index = idx) |>
  autoplot(y)

6d) Produce a time plot for the series. How does the plot change as you change θ₁?

set.seed(1)

e <- rnorm(101)
y <- numeric(100)

for(i in 1:100)
  y[i] <- e[i+1] + 0.9 * e[i]

tsibble(idx = 1:100, y = y, index = idx) |>
  autoplot(y)

As ( _1 ) changes, the time plot changes in the short-term dependence between neighboring observations. When ( _1 ) is close to 0, the series looks more like white noise. As ( _1 ) increases in magnitude, consecutive observations show stronger local association. With a positive ( _1 ), nearby values tend to move together more, so the series appears to have slightly smoother short run swings. If ( _1 ) were negative, the series would show more short run alternation from one point to the next. Overall, changing ( _1 ) affects the local pattern of the series, but unlike an AR(1) model, the dependence does not persist for long over time.

6e) Generate data from an ARMA(1,1) model with φ₁ = 0.6, θ₁ = 0.6 and σ² = 1.

set.seed(1)

y <- numeric(100)
e <- rnorm(101)

for(i in 2:100)
  y[i] <- 0.6 * y[i-1] + e[i+1] + 0.6 * e[i]

tsibble(idx = 1:100, y = y, index = idx) |>
  autoplot(y)

6f) Generate data from an AR(2) model with φ₁ = −0.8, φ₂ = 0.3 and σ² = 1. (Note that these parameters will give a non-stationary series.)

set.seed(1)

y <- numeric(100)
e <- rnorm(100)

for(i in 3:100)
  y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]

tsibble(idx = 1:100, y = y, index = idx) |>
  autoplot(y)

6g) Graph the latter two series and compare them.

set.seed(1)

y_arma <- numeric(100)
e1 <- rnorm(101)

for(i in 2:100)
  y_arma[i] <- 0.6 * y_arma[i-1] + e1[i+1] + 0.6 * e1[i]

y_ar2 <- numeric(100)
e2 <- rnorm(100)

for(i in 3:100)
  y_ar2[i] <- -0.8 * y_ar2[i-1] + 0.3 * y_ar2[i-2] + e2[i]

ts_arma <- tibble(idx = 1:100, y = y_arma, series = "ARMA(1,1)")
ts_ar2  <- tibble(idx = 1:100, y = y_ar2, series = "AR(2)")

bind_rows(ts_arma, ts_ar2) |>
  ggplot(aes(x = idx, y = y)) +
  geom_line() +
  facet_wrap(~series, scales = "free_y")

The ARMA(1,1) series appears stable and fluctuates around a roughly constant mean with fairly constant variance over time. In contrast, the AR(2) series shows oscillations whose amplitude increases dramatically, so it is not stable and does not appear stationary. The AR(2) process here is explosive, which is why the values grow larger over time, while the ARMA(1,1) process remains bounded and behaves like a stationary time series. Overall, the ARMA(1,1) plot looks realistic for a stationary series, whereas the AR(2) plot shows an unstable process.

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970–2011.

7a) Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

fit <- aus_airpassengers |>
  model(ARIMA(Passengers))

report(fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

glance(fit)

## # A tibble: 1 × 8
##   .model            sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>              <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 ARIMA(Passengers)   4.31   -97.0  198.  198.  202. <cpl [0]> <cpl [1]>

gg_tsresiduals(fit)

## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

fit |>
  forecast(h = 10) |>
  autoplot(aus_airpassengers)

The ARIMA() function selected an ARIMA(0,2,1) model for the aus_airpassengers series. Based on the residual diagnostics, the residuals appear to be approximately white noise. The residual plot shows no clear systematic pattern over time, and the residual ACF does not show any significant autocorrelation since all spikes remain within the significance bounds. The histogram is centered near zero, although there are a few larger residuals. Overall, the residual checks suggest that the model is adequate.

The 10 period forecast indicates that passenger numbers are expected to continue increasing. The point forecast follows the upward pattern seen in the historical data, and the forecast intervals widen as the forecast horizon increases, showing greater uncertainty further into the future.

7b) Write the model in terms of the backshift operator.

(1 - phi1B)(1 - B)y[t] = c + (1 + theta1B)e[t]

7c) Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part (a).

fit_010 <- aus_airpassengers |>
  model(ARIMA(Passengers ~ pdq(0,1,0) + drift()))

## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(0, 1, 0) + drift())
## [1] could not find function "drift"

fit_010 |>
  forecast(h = 10) |>
  autoplot(aus_airpassengers)

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

The forecasts from an ARIMA(0,1,0) model with drift also show a continued upward trend, since this model projects future values by extending the average growth over time. Compared with part (a), the forecasts are fairly similar in direction, but the ARIMA(0,2,1) model from part (a) is more flexible because it allows for additional structure beyond a simple random walk with drift. The drift model gives a steadier linear increase, while the model in part (a) is better supported by the model selection process and residual diagnostics. Overall, both models predict growth in passenger numbers, but the model from part (a) is the preferred choice.

7d) Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts (a) and (c). Remove the constant and see what happens.

fit_212_drift <- aus_airpassengers |>
  model(ARIMA(Passengers ~ pdq(2,1,2) + drift()))

## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2) + drift())
## [1] could not find function "drift"

fit_212_drift |>
  forecast(h = 10) |>
  autoplot(aus_airpassengers)

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

# remove constant

fit_212_noconstant <- aus_airpassengers |>
  model(ARIMA(Passengers ~ pdq(2,1,2) + 0))

## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2) + 0)
## [1] non-stationary AR part from CSS

fit_212_noconstant |>
  forecast(h = 10) |>
  autoplot(aus_airpassengers)

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

The ARIMA(2,1,2) model with drift also produces forecasts that continue upward, similar to parts (a) and (c). Its forecasts are generally close to the ARIMA(0,1,0) with drift because both include a drift term that extends the long-run upward movement in the data. Compared with part (a), the forecasts may differ slightly in shape because the extra AR and MA terms allow more short run adjustment, but the overall direction remains increasing. When the constant is removed, the forecasts noticeably changes. Without the drift term, the model no longer forces a continued upward average increase, so the forecasts become much flatter. This shows that the long run growth in the forecasts is being driven mainly by the drift term. Overall, the models with drift are more consistent with the upward trend in the historical series, while removing the constant weakens that trend in the forecasts.

7e) Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

fit_021 <- aus_airpassengers |>
  model(ARIMA(Passengers ~ pdq(0,2,1)))

fit_021 |>
  forecast(h = 10) |>
  autoplot(aus_airpassengers)

The ARIMA(0,2,1) model with a constant produces forecasts that rise more quickly over time and show a curved upward pattern rather than a straight line increase. This happens because with second differencing, adding a constant implies a linear trend in the differenced series, which becomes an accelerating trend in the original series. As a result, the forecasts can grow unrealistically fast. This is why including a constant in a model with (d = 2) is often not appropriate.

For the United States GDP series (from global_economy):

If necessary, find a suitable Box-Cox transformation for the data.

us_gdp <- global_economy |>
  filter(Country == "United States")

lambda <- us_gdp |>
  features(GDP, guerrero)

lambda

## # A tibble: 1 × 2
##   Country       lambda_guerrero
##   <fct>                   <dbl>
## 1 United States           0.282

# transformation

us_gdp <- us_gdp |>
  mutate(GDP_bc = box_cox(GDP, lambda$lambda_guerrero))

Fit a suitable ARIMA model to the transformed data using ARIMA().

fit_arima <- us_gdp |>
  model(ARIMA(GDP_bc))

report(fit_arima)

## Series: GDP_bc 
## Model: ARIMA(1,1,0) w/ drift 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

Try some other plausible models by experimenting with the orders chosen.

fit_models <- us_gdp |>
  model(
    auto = ARIMA(GDP_bc),
    arima111 = ARIMA(GDP_bc ~ pdq(1,1,1)),
    arima011 = ARIMA(GDP_bc ~ pdq(0,1,1)),
    arima211 = ARIMA(GDP_bc ~ pdq(2,1,1)),
    arima212 = ARIMA(GDP_bc ~ pdq(2,1,2))
  )

glance(fit_models)

## # A tibble: 5 × 9
##   Country       .model   sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <fct>         <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 United States auto      5479.   -325.  657.  657.  663. <cpl [1]> <cpl [0]>
## 2 United States arima111  5580.   -325.  659.  659.  667. <cpl [1]> <cpl [1]>
## 3 United States arima011  5689.   -326.  659.  659.  665. <cpl [0]> <cpl [1]>
## 4 United States arima211  5647.   -325.  660.  661.  671. <cpl [2]> <cpl [1]>
## 5 United States arima212  5734.   -325.  662.  664.  674. <cpl [2]> <cpl [2]>

Choose what you think is the best model and check the residual diagnostics.

best_fit <- us_gdp |>
  model(ARIMA(GDP_bc))

report(best_fit)

## Series: GDP_bc 
## Model: ARIMA(1,1,0) w/ drift 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

gg_tsresiduals(best_fit)

Produce forecasts from your fitted model. Do the forecasts look reasonable?

best_fit |>
  forecast(h = 10) |>
  autoplot(us_gdp)

The forecasts look reasonable. They continue the upward pattern seen in the historical U.S. GDP series and do not show any sudden or unrealistic changes. The point forecasts increase steadily over the forecast horizon, which is consistent with the long run growth in the data. The prediction intervals widen as the horizon increases, showing greater uncertainty further into the future, which is what we would expect.

Compare the results with what you would obtain using ETS() (with no transformation).

fit_ets <- us_gdp |>
  model(ETS(GDP))

report(fit_ets)

## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089

fit_ets |>
  forecast(h = 10) |>
  autoplot(us_gdp)

Yes:

The ETS() forecasts grow faster and have much wider intervals than the transformed ARIMA forecasts. The ARIMA model gives steadier and more reasonable forecasts, while the untransformed ETS() model appears less stable here.

Data624_Homework6_ahmed_hassan

Ahmed Hassan

2026-03-11