HW 6

9.11 Exercises

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
1. Explain the differences among these figures. Do they all indicate that the data are white noise?
The three ACF plots differ because of their sample sizes. In the left plot with 36 observations, the autocorrelations appear more irregular and show larger spikes due to greater variability from the small sample. In the middle plot with 360 observations, the ACF becomes more stable, with smaller spikes that stay closer to zero. In the right plot with 1,000 observations, the autocorrelations are very close to zero with minimal fluctuation, reflecting a much more accurate estimate. Despite these differences, all three plots indicate white noise because there is no clear pattern and most values lie within the confidence bounds.
Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

Figure 9.32: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.
1. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
  
  The critical values are different because they depend on sample size. Smaller samples have wider bounds since there is more uncertainty, while larger samples have narrower bounds because the estimates are more precise. The autocorrelations differ because of random variation. Even though all three are white noise, smaller samples show more fluctuation, while larger samples stay closer to zero.

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

The daily closing prices for Amazon display a clear upward trend, indicating that the mean changes over time and the series is non-stationary. In the ACF plot, the correlations stay high and decrease slowly across many lags instead of dropping off quickly, showing strong dependence over time. The PACF plot also has noticeable spikes at the first few lags, suggesting the presence of structure in the data. These features indicate that the series should be differenced to remove the trend and make it more stationary.

library(fpp3)

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──

## ✔ tibble      3.3.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.2     ✔ feasts      0.4.2
## ✔ lubridate   1.9.4     ✔ fable       0.5.0
## ✔ ggplot2     4.0.1

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

# Filter Amazon stock data
amazon <- gafa_stock %>%
  filter(Symbol == "AMZN")

# Plot daily closing prices
amazon %>%
  autoplot(Close) +
  labs(title = "Amazon Daily Closing Prices")

# ACF plot
amazon %>%
  ACF(Close) %>%
  autoplot() +
  labs(title = "ACF of Amazon Stock Prices")

## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

# PACF plot
amazon %>%
  PACF(Close) %>%
  autoplot() +
  labs(title = "PACF of Amazon Stock Prices")

## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.

After log transformation and first differencing, the series fluctuates around zero with no clear trend, suggesting the data is now stationary. The variation looks fairly stable over time, so one difference is sufficient.

    # --- Turkish GDP ---
    turkey_gdp <- global_economy %>%
      filter(Country == "Turkey")

    # Find Box-Cox lambda
    lambda_gdp <- turkey_gdp %>%
      features(GDP, features = guerrero) %>%
      pull(lambda_guerrero)

    lambda_gdp

## [1] 0.1572187

    # Apply transformation and differencing
    turkey_gdp %>%
      mutate(GDP_trans = box_cox(GDP, lambda_gdp),
             GDP_diff = difference(GDP_trans)) %>%
      autoplot(GDP_diff) +
      labs(title = "Differenced Turkish GDP")

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

Accommodation takings in the state of Tasmania from aus_accommodation.

After log transformation and both seasonal and first differencing, the series centers around zero with no visible seasonal pattern. The fluctuations are consistent, indicating the trend and seasonality have been removed.

   # --- Tasmania Accommodation ---
tas_accom <- aus_accommodation %>%
  filter(State == "Tasmania")

# Find Box-Cox lambda
lambda_accom <- tas_accom %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

lambda_accom

## [1] 0.001819643

# Apply transformation and differencing
tas_accom %>%
  mutate(Takings_trans = box_cox(Takings, lambda_accom),
         Takings_diff = difference(Takings_trans, lag = 12),
         Takings_diff2 = difference(Takings_diff)) %>%
  autoplot(Takings_diff2) +
  labs(title = "Differenced Tasmania Accommodation Takings")

## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).

Monthly sales from souvenirs.

After log transformation and both seasonal and first differencing, the series shows no clear trend or repeating seasonal pattern. The values move randomly around zero with stable variation, suggesting the series is now stationary.

    # --- Souvenir Sales ---
    souvenir <- souvenirs

    # Find Box-Cox lambda
    lambda_souvenir <- souvenir %>%
      features(Sales, features = guerrero) %>%
      pull(lambda_guerrero)

    lambda_souvenir

## [1] 0.002118221

    # Apply transformation and differencing
    souvenir %>%
      mutate(Sales_trans = box_cox(Sales, lambda_souvenir),
             Sales_diff = difference(Sales_trans, lag = 12),
             Sales_diff2 = difference(Sales_diff)) %>%
      autoplot(Sales_diff2) +
      labs(title = "Differenced Souvenir Sales")

## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).

For the souvenirs data, write down the differences you chose above using backshift operator notation.

For the souvenirs data, both a regular difference and a seasonal difference are used because the data shows a trend and yearly seasonality. In backshift notation, this is written as (1 - B)(1 - B^12) y_t, where (1 - B) removes the trend and (1 - B^12) removes the seasonal pattern that repeats every 12 periods.
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

For the retail data, you first apply a transformation like a log to stabilize the variance. Then, you use a first difference to remove the trend and a seasonal difference if there is a repeating pattern, such as monthly seasonality. This means using one regular difference and one seasonal difference to make the data stationary.
Simulate and plot some data from simple ARIMA models.
1. Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1. The process starts with y1=0. The function rnorm 100 generates random noise with variance 1 which acts as the random shock at each time step. In this AR 1 process each value depends on 0.6 times the previous value plus this random noise so past values influence future ones. As a result the series shows positive correlation where values tend to move in the same direction over short periods. When plotted this creates smooth gradual movements with persistence so values change more steadily instead of jumping randomly like white noise.
```
    library(tsibble)
    library(ggplot2)

    set.seed(123)  # for reproducibility

    # simulate AR(1)
    y <- numeric(100)
    e <- rnorm(100)

    for(i in 2:100) {
      y[i] <- 0.6 * y[i-1] + e[i]
    }

    # create tsibble
    sim <- tsibble(idx = seq_len(100), y = y, index = idx)

    # plot
    autoplot(sim, y) +
      labs(title = "Simulated AR(1) Process (phi = 0.6)",
           y = "y",
           x = "Time")
```
```
```         
y <- numeric(100) e <- rnorm(100) for(i in 2:100)   y[i] <- 0.6*y[i-1] + e[i] sim <- tsibble(idx = seq_len(100), y = y, index = idx)
```
```
1. Produce a time plot for the series. How does the plot change as you change ϕ1?

set.seed(123)

phi_vals <- c(-0.6, 0.2, 0.6, 0.9)

sim_data <- data.frame()

for (phi in phi_vals) {
  y <- numeric(100)
  e <- rnorm(100)
  
  for (i in 2:100) {
    y[i] <- phi * y[i-1] + e[i]
  }
  
  temp <- data.frame(
    idx = 1:100,
    y = y,
    phi = as.factor(phi)
  )
  
  sim_data <- rbind(sim_data, temp)
}

sim_ts <- as_tsibble(sim_data, index = idx, key = phi)

autoplot(sim_ts, y) +
  labs(title = "AR(1) Processes for Different phi1 Values",
       x = "Time",
       y = "y")

**As phi1 changes the time series changes behavior. When phi1 is near 0 the series looks like white noise with little connection between values. As phi1 increases the series becomes smoother and more persistent so changes happen more slowly. When phi1 is close to 1 the series shows strong dependence and long trends. When phi1 is negative the values switch direction often creating a zigzag pattern.**

Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.

set.seed(123)

# generate noise
e <- rnorm(100)

# initialize series
y <- numeric(100)

# MA(1) process
for(i in 2:100) {
  y[i] <- e[i] + 0.6 * e[i-1]
}

# view first few values
head(y)

## [1]  0.0000000 -0.5664629  1.4206018  1.0057334  0.1715928  1.7926376

Each value depends on the current noise plus 0.6 times the previous noise. This creates short term dependence between observations but unlike AR models it does not persist far into the future.

Produce a time plot for the series. How does the plot change as you change θ1?

set.seed(123)

theta_vals <- c(-0.6, 0, 0.6, 0.9)

sim_data <- data.frame()

for (theta in theta_vals) {
  e <- rnorm(100)
  y <- numeric(100)
  
  for (i in 2:100) {
    y[i] <- e[i] + theta * e[i-1]
  }
  
  temp <- data.frame(
    idx = 1:100,
    y = y,
    theta = as.factor(theta)
  )
  
  sim_data <- rbind(sim_data, temp)
}

ggplot(sim_data, aes(x = idx, y = y)) +
  geom_line() +
  facet_wrap(~theta, scales = "free_y") +
  labs(title = "MA(1) Processes for Different theta1 Values",
       x = "Time",
       y = "y")

Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1.

set.seed(123)

# generate noise
e <- rnorm(100)

# initialize series
y <- numeric(100)

# ARMA(1,1) process
for (i in 2:100) {
  y[i] <- 0.6 * y[i-1] + e[i] + 0.6 * e[i-1]
}

# view first few values
head(y)

## [1]  0.0000000 -0.5664629  1.0807241  1.6541678  1.1640935  2.4910937

Each value depends on the previous value and both the current and previous noise. This combines the persistence of an AR model with the short term shock effects of an MA model.

Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)

set.seed(123)

# generate noise
e <- rnorm(100)

# initialize series
y <- numeric(100)

# set initial values
y[1] <- 0
y[2] <- 0

# AR(2) process
for (i in 3:100) {
  y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]
}

# view first few values
head(y)

## [1]  0.000000  0.000000  1.558708 -1.176458  1.538067  0.131674

**Each value depends on the two previous values along with random noise. The negative phi1 makes the series alternate direction, while phi2 adds more influence from earlier values. These parameters make the series non stationary, so it may not stay around a constant mean and can show unstable behavior over time.**

Graph the latter two series and compare them.

set.seed(123)

# --- ARMA(1,1) ---
e1 <- rnorm(100)
y1 <- numeric(100)

for (i in 2:100) {
  y1[i] <- 0.6 * y1[i-1] + e1[i] + 0.6 * e1[i-1]
}

# --- AR(2) ---
e2 <- rnorm(100)
y2 <- numeric(100)

for (i in 3:100) {
  y2[i] <- -0.8 * y2[i-1] + 0.3 * y2[i-2] + e2[i]
}

# combine data
data <- data.frame(
  idx = rep(1:100, 2),
  y = c(y1, y2),
  series = rep(c("ARMA(1,1)", "AR(2)"), each = 100)
)

# plot
ggplot(data, aes(x = idx, y = y)) +
  geom_line() +
  facet_wrap(~series, scales = "free_y") +
  labs(title = "Comparison of ARMA(1,1) and AR(2) Series",
       x = "Time",
       y = "y")

The ARMA(1,1) series is smoother and more stable since it uses both past values and recent noise, keeping it close to a constant level. The AR(2) series is more uneven and unstable, with bigger swings and less consistency. Overall, the ARMA(1,1) series is steadier, while the AR(2) series is more volatile.

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
1. Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

# Load data
data <- aus_airpassengers

# Fit ARIMA automatically
fit <- data %>%
  model(ARIMA(Passengers))

report(fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

#check for white noise
gg_tsresiduals(fit)

## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

augment(fit) %>%
  features(.innov, ljung_box, lag = 24)

## # A tibble: 1 × 3
##   .model            lb_stat lb_pvalue
##   <chr>               <dbl>     <dbl>
## 1 ARIMA(Passengers)    24.0     0.463

#Forecasts For Next 10 Cycles

fc <- fit %>%
  forecast(h = 10)

autoplot(data, level = NULL) +
  autolayer(fc)

## Plot variable not specified, automatically selected `.vars = Passengers`

## Warning in geom_line(...): Ignoring unknown parameters: `level`

The selected model is ARIMA(0,1,1)(0,1,1)[12]. It applies both regular and seasonal differencing to remove trend and seasonality. The residuals behave like white noise, with no clear pattern and no significant autocorrelations. The forecasts indicate a continued upward trend with a repeating seasonal pattern and wider prediction intervals over time.

2.  Write the model in terms of the backshift operator.

yt=yt−1+yt−12−yt−13+εt+θ1εt−1+Θ1εt−12+θ1Θ1εt−13

3.  Plot forecasts from an ARIMA(0,1,0) model with drift and compare
    these to part a.

ata <- aus_airpassengers

# Model from part (a)
fit <- data %>%
  model(ARIMA(Passengers))

fc <- fit %>%
  forecast(h = 10)

# Random walk with drift
fit_rw <- data %>%
  model(ARIMA(Passengers ~ pdq(0,1,0) + 1))

fc_rw <- fit_rw %>%
  forecast(h = 10)

# Plot comparison
autoplot(data) +
  autolayer(fc, series = "Seasonal ARIMA", level = NULL) +
  autolayer(fc_rw, series = "RW with drift", level = NULL)

## Plot variable not specified, automatically selected `.vars = Passengers`

## Warning in geom_line(mapping = without(mapping, "shape"), data = unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters: `series`
## Ignoring unknown parameters: `series`

**The ARIMA(0,1,0) model with drift produces forecasts that follow a straight upward line, reflecting a constant growth rate over time. The prediction intervals widen as the forecast horizon increases, showing increasing uncertainty.

Compared to the ARIMA(0,1,1)(0,1,1)[12] model from part (a), the forecasts here do not capture any seasonal variation and are much smoother. The seasonal ARIMA model provides more detailed forecasts with repeating patterns, making it a better fit for the data.**

4.  Plot forecasts from an ARIMA(2,1,2) model with drift and compare
    these to parts a and c. Remove the constant and see what
    happens.

#with constant
fit_212_drift <- data %>%
  model(ARIMA(Passengers ~ pdq(2,1,2) + 1))

fc_212_drift <- fit_212_drift %>%
  forecast(h = 10)

#without constant
fit_212_nodrift <- data %>%
  model(ARIMA(Passengers ~ pdq(2,1,2) + 0))

## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2) + 0)
## [1] non-stationary AR part from CSS

fc_212_nodrift <- fit_212_nodrift %>%
  forecast(h = 10)

#all models plotted together
autoplot(data) +
  autolayer(fc, series = "Seasonal ARIMA", level = NULL) +
  autolayer(fc_rw, series = "RW with drift", level = NULL) +
  autolayer(fc_212_drift, series = "ARIMA(2,1,2) with drift", level = NULL) +
  autolayer(fc_212_nodrift, series = "ARIMA(2,1,2) no drift", level = NULL)

## Plot variable not specified, automatically selected `.vars = Passengers`

## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series`

## Warning in geom_line(mapping = without(mapping, "shape"), data = unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters: `series`
## Ignoring unknown parameters: `series`
## Ignoring unknown parameters: `series`

## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

The ARIMA(2,1,2) model with drift produces upward trending forecasts that are more flexible than the random walk model but still do not capture seasonality like the ARIMA(0,1,1)(0,1,1)[12] model. When the constant is removed, the forecasts flatten and lose the upward trend, showing that the drift term controls long-term growth.

5.  Plot forecasts from an ARIMA(0,2,1) model with a constant. What
    happens?

fit_021 <- data %>%
  model(ARIMA(Passengers ~ pdq(0,2,1) + 1))

## Warning: Model specification induces a quadratic or higher order polynomial trend.  This
## is generally discouraged, consider removing the constant or reducing the number
## of differences.

fc_021 <- fit_021 %>%
  forecast(h = 10)

autoplot(data) +
  autolayer(fc_021, level = NULL)

## Plot variable not specified, automatically selected `.vars = Passengers`

The ARIMA(0,2,1) model with a constant generates forecasts that follow a curved, accelerating pattern instead of a straight line. The prediction intervals expand quickly, reflecting greater uncertainty over time. This happens because second differencing with a constant produces a quadratic trend in the forecasts.

HW 6

Rebecca Bronstein

2026-03-19

9.11 Exercises