9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a: Explain the differences among these figures. Do they all indicate that the data are white noise?

The figures show autocorrelations that fluctuate randomly around zero, with no discernible pattern or consistently significant spikes beyond the confidence bounds. The main difference though is that as the sample size increases from 36 to 360 to 1,000, the blue critical bands narrow considerably, reflecting greater precision in estimating true zero autocorrelation. So each figure indicates the data are consistent with white noise, although the larger series provide stronger visual evidence.

b: Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values depend on the sample size. When T increases, the standard error of the autocorrelation estimate also decreases, so the confidence bands narrow around zero. Each white noise series is an independent random draw, so the autocorrelation values vary due to sampling variability. So the differences across figures reflect both the effect of sample size on precision and random variation in the generated series.

9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  gg_tsdisplay(Close, plot_type = "partial")

## Warning: `gg_tsdisplay()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsdisplay()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.

The times series plot with the stock price shows an upward trend over time. The ACF plot bars decrease very slowly and stay large for many lags, this slow decrease means the data has a long memory or trend and is non-stationary. The PACF has a very large spike at lag 1 (close to 1), and then other lags are small.The pattern is usual for a random walk or highly persistent series. So since the mean is not stable, the series should be differenced by 1.

gafa_stock  %>%
  filter(Symbol == "AMZN")  %>%
  features(Close, unitroot_ndiffs)

9.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.

global_economy  %>%
  filter(Country == "Turkey")  %>%
  gg_tsdisplay(GDP, plot_type='partial')

library(feasts)

#optimal lambda
lambda <- global_economy %>%
  filter(Country == "Turkey") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

lambda

## [1] 0.1572187

global_economy %>% 
  filter(Country == "Turkey") %>% 
  mutate(GDP_clean = difference(box_cox(GDP, lambda))) %>% 
  gg_tsdisplay(GDP_clean, plot_type = "partial") +
  labs(
    title = "Turkish GDP After Box-Cox & Differencing",
    subtitle = paste("λ =", round(lambda, 2)),
    y = "Transformed & Differenced GDP",
    x = "Year"
  ) +
  theme_minimal()

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

#find order of differencing
global_economy  %>%
  filter(Country == "Turkey") %>%
  features(box_cox(GDP,lambda), unitroot_ndiffs)

From the non-transformed graph of the Turkish GDP, we see that it is not stationary and doesn’t follow a seasonal pattern. However, after transforming the graph using a box-cox transformation, the graphs become stationary. Lastly, the order of differencing needed for this is 1.

Accommodation takings in the state of Tasmania from aus_accommodation.

aus_accommodation  %>%
  filter(State == "Tasmania")  %>%
  gg_tsdisplay(Takings, plot_type='partial')

#optimal lambda
lambda <- aus_accommodation  %>%
  filter(State == "Tasmania") %>%
  features(Takings, features = guerrero)  %>%
  pull(lambda_guerrero)

lambda

## [1] 0.001819643

aus_accommodation %>% 
  filter(State == "Tasmania") %>%
  mutate(
    Takings_transformed = box_cox(Takings, lambda),
    Takings_clean = difference(Takings_transformed, 4)
  ) %>%
  gg_tsdisplay(Takings_clean, plot_type = "partial") +
  labs(
    title = "Tasmania Accommodation Takings",
    subtitle = paste("Box-Cox λ =", round(lambda, 2)),
    y = "Transformed & Differenced Takings",
    x = "Date"
  ) +
  theme_minimal()

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

#find order of differencing
aus_accommodation  %>%
  filter(State == "Tasmania")  %>%
  features(box_cox(Takings,lambda), unitroot_nsdiffs)

From the accommodations taking in Tasmania, we can see from the non-seasonal graph that it is not stationary and seasonal. From the 1 differincing and transformed graph, the graph is stationary. In addition, we can see that the PACF plot shortens after the first lag, and that the ACF shows that it decreases rapidly.

c.Monthly sales from souvenirs.

souvenirs  %>%
  gg_tsdisplay(Sales, plot_type='partial', lag = 48)

lambda <- souvenirs  %>%
  features(Sales, features = guerrero)  %>%
  pull(lambda_guerrero)

souvenirs %>% 
  mutate(
    Sales_clean = difference(box_cox(Sales, lambda), 12)
  ) %>% 
  gg_tsdisplay(Sales_clean, plot_type = "partial", lag = 48) +
  labs(
    title = paste("Differenced Monthly Souvenir Sales"),
    subtitle = paste("Box-Cox λ =", round(lambda, 2)),
    y = "Sales (transformed & differenced)",
    x = "Month"
  ) +
  theme_minimal()

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

souvenirs   %>%
  features(box_cox(Sales,lambda), unitroot_nsdiffs)

The non-transform graph of the monthly sovenir sales shows that they are non stationary and have a seasonal pattern. The transformed graph with a differencing of 1 show that the series is stationary, and that the PACF plot truncates after the first lag while the ACF decreases rapidly.

9.5 For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

aus_series <- aus_retail  %>% 
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 
aus_series %>%  gg_tsdisplay(Turnover, plot_type='partial', lag = 36)

lambda <- aus_series %>% 
  features(Turnover, features = guerrero) %>% 
  pull(lambda_guerrero)

aus_series %>% 
  mutate(
    Turnover_clean = difference(box_cox(Turnover, lambda), 12)
  ) %>% 
  gg_tsdisplay(Turnover_clean, plot_type = "partial", lag = 36) +
  labs(
    title = "Queensland Food Retailing Turnover",
    subtitle = paste("After Box-Cox (λ =", round(lambda, 2), ") and seasonal differencing (lag 12)"),
    y = "Transformed & differenced",
    x = "Date"
  ) +
  theme_bw()

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

aus_series %>% 
  features(box_cox(Turnover, lambda), unitroot_nsdiffs)

The non-transformed graph of the retail data shows that it has a seasonal pattern and is not stationary. Through applying a differencing of 1 and the box-cox transformation, we can see that the data is now stationary, centering itself around zero.

9.6 Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with ϕ1 = 0.6 and σ2 = 1. The process starts with y1 = 0.

set.seed(245)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change ϕ1? from ϕ1 = .6

sim   %>% 
  autoplot(y)

changing ϕ1 = -1

for(i in 2:100)
  y[i] <- -1*y[i-1] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx)  %>% 
  autoplot(y)

changing ϕ1 = 1

for(i in 2:100)
  y[i] <- 1*y[i-1] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx)   %>% 
  autoplot(y)

changing ϕ1 = -.6

for(i in 2:100)
  y[i] <- -.6*y[i-1] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx)  %>% 
  autoplot(y)

changing ϕ1 = 0

for(i in 2:100)
  y[i] <- -0*y[i-1] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx)   %>% 
  autoplot(y)

changing ϕ1 = -.3

for(i in 2:100)
  y[i] <- -.3*y[i-1] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx)   %>% 
  autoplot(y)

changing ϕ1 = .3

for(i in 2:100)
  y[i] <- .3*y[i-1] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx)  %>% 
  autoplot(y)

As shown in the graphs, when ϕ1 increases from 0 to 1, the series becomes smoother and more persistent. As ϕ1 becomes negative, the series oscillates more wildly around the mean.

Write your own code to generate data from an MA(1) model with θ1 = 0.6 and σ2 = 1.

for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1]

sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change θ1.

MA(1) model with θ1 = 0.6 and σ2 = 1.

sim2   %>% 
  autoplot(y)

MA(1) model with θ1 = -1 and σ2 = 1.

for(i in 2:100)
  y[i] <- e[i] + -1 * e[i-1]

tsibble(idx = seq_len(100), y = y, index = idx)  %>% 
  autoplot(y)

MA(1) model with θ1 = 1 and σ2 = 1.

for(i in 2:100)
  y[i] <- e[i] + 1 * e[i-1]

tsibble(idx = seq_len(100), y = y, index = idx)  %>% 
  autoplot(y)

MA(1) model with θ1 = -.6 and σ2 = 1.

for(i in 2:100)
  y[i] <- e[i] + -.6 * e[i-1]

tsibble(idx = seq_len(100), y = y, index = idx)  %>% 
  autoplot(y)

MA(1) model with θ1 = 0 and σ2 = 1.

for(i in 2:100)
  y[i] <- e[i] + 0 * e[i-1]

tsibble(idx = seq_len(100), y = y, index = idx)  %>% 
  autoplot(y)

MA(1) model with θ1 = -.3 and σ2 = 1.

for(i in 2:100)
  y[i] <- e[i] + -.3 * e[i-1]

tsibble(idx = seq_len(100), y = y, index = idx)  %>% 
  autoplot(y)

MA(1) model with θ1 = .3 and σ2 = 1.

for(i in 2:100)
  y[i] <- e[i] + .3 * e[i-1]

tsibble(idx = seq_len(100), y = y, index = idx)  %>% 
  autoplot(y)

For an MA(1) model, when ϕ1 = 0 the series is white noise. Positive ϕ1 values smooth the series, making it more persistent and less volatile as ϕ1 increases. While negative ϕ1 vals cause wild oscillations around the mean, with more frequent spikes as ϕ1 decreases further.

Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1.

set.seed(1432)

y <- numeric(100)

for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1] + 0.6 * y[i-1]

arma_mod <- tsibble(idx = seq_len(100), y = y, index = idx)

Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)

set.seed(2531)

y <- numeric(100)

for(i in 3:100)
  y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]

arma_mod2 <- tsibble(idx = seq_len(100), y = y, index = idx)

Graph the latter two series and compare them.

ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1

arma_mod  %>% 
  gg_tsdisplay(y, plot_type='partial') +
  theme_light()

AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1

arma_mod2  %>% 
  gg_tsdisplay(y, plot_type='partial') +
  theme_light()

The ARMA(1,1) model produced a stationary, random-looking series. The ACF decays to zero and its PACF cuts off after lag 1, indicating moderate influence from both the previous value and previous error. However, the AR(2) model is non-stationary, causing the series to oscillate between positive and negative values with exponentially growing magnitude. Its ACF oscillates without decaying, and its PACF shows only a negative spike at lag 1, failing the stationarity constraint (ϕ₂ − ϕ₁ < 1).

9.7 Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

ARIMA(0,2,1) was reported with the most appropriate fit

fit_mod <- aus_airpassengers   %>%
  filter(Year < 2012)  %>%
  model(ARIMA(Passengers)) 

report(fit_mod)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8756
## s.e.   0.0722
## 
## sigma^2 estimated as 4.671:  log likelihood=-87.8
## AIC=179.61   AICc=179.93   BIC=182.99

fit_mod   %>%
  gg_tsresiduals()

## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

fit_mod   %>%
  forecast(h = 10)   %>%
  autoplot(aus_airpassengers)

From the series plot, it appears to be white noise.

Write the model in terms of the backshift operator.

y_t = -0.87 * e_{t-1} + e_t (1 - B)^2 * y_t = (1 - 0.87 * B) * e_t

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

fit_mod2 <-aus_airpassengers   %>%
  filter(Year < 2012)  %>%
  model(ARIMA(Passengers ~ pdq(0,1,0)))

fit_mod2   %>%
  gg_tsresiduals()

fit_mod2   %>%
  forecast(h = 10)   %>%
  autoplot(aus_airpassengers)

The ARIMA(0,2,1) model underestimates passenger numbers, while this model overestimated them.

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

ARIMA(2,1,2) model with drift

fit_mod3 <-aus_airpassengers %>%
  filter(Year < 2012)  %>%
  model(ARIMA(Passengers ~ pdq(2,1,2)))

fit_mod3 %>%
  forecast(h = 10)   %>%
  autoplot(aus_airpassengers)

fit_mod3 %>%
  gg_tsresiduals()

Now we remove the constant from the model

fit_mod3 <-aus_airpassengers   %>%
  filter(Year < 2012)   %>%
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))

## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS

fit_mod3 %>%
  forecast(h = 10)   %>%
  autoplot(aus_airpassengers)

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

report(fit_mod3)

## Series: Passengers 
## Model: NULL model 
## NULL model

The model is similar to that of ARIMA(0,2,1), with the residuals seeming to be white noise. When the constant is removed from the model, there does not seem to be a forecast.

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

fit_mod4 <-aus_airpassengers   %>%
  filter(Year < 2012)   %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))

## Warning: Model specification induces a quadratic or higher order polynomial trend.  This
## is generally discouraged, consider removing the constant or reducing the number
## of differences.

fit_mod4 %>%
  forecast(h = 10)   %>%
  autoplot(aus_airpassengers)

Through the inclusion of a constant, the slope of the forecast appears to be steeper, indicating a greater slope. However, since the model induces a higher order polynomial trend, it is encouraged to remove the constant or reducing the amount of differences.

9.8 For the United States GDP series (from global_economy):

if necessary, find a suitable Box-Cox transformation for the data;

lambda <- global_economy %>%
  filter(Code == "USA") |>
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

lambda

## [1] 0.2819443

global_economy  %>%
  filter(Code == "USA") %>%
  gg_tsdisplay(box_cox(GDP, lambda), plot_type = 'partial')

fit a suitable ARIMA model to the transformed data using ARIMA();

ARIMA(1,1,0) w/ drift

us_econ_arima <- global_economy %>%
  filter(Code == "USA")  %>%
  model(ARIMA(box_cox(GDP, lambda))) 

report(us_econ_arima)

## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

us_econ_arima  %>%
  gg_tsresiduals()

try some other plausible models by experimenting with the orders chosen;

ARIMA(1,1,2) seems likely model with lowest AICc values

test_fits <- global_economy  %>%
  filter(Code == "USA")  %>%
  model(arima_110 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,0)),
        arima_112 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,2)),
        arima_120 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,0)),
        arima_210 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,0)),
        arima_310 = ARIMA(box_cox(GDP,lambda) ~ pdq(3,1,0))
        )

report(test_fits) %>%
  arrange(AICc)

## Warning in report.mdl_df(test_fits): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.

choose what you think is the best model and check the residual diagnostics;

From ARIMA(1,1,2) model time series, it appears to be white noise.

test_fits  %>%
  select(arima_120)  %>%
  gg_tsresiduals()

produce forecasts of your fitted model. Do the forecasts look reasonable?

Yes, the forecasts of my fitted model look reasonable, as it projects 10 years into the future. From the time series forecast graph, we can see that the graph has been generally increasing in slope overtime. The ARIMA forecast I chose, matches closely to the trajectory of the US economy is going towards.

test_fits  %>%
  forecast(h=10)  %>%
  filter(.model=='arima_120') %>%
  autoplot(global_economy)

compare the results with what you would obtain using ETS() (with no transformation).

It appears from the ETS model that the slope is less steep than the model I chose, however the ETS forecasting band is much wider confidence level than the ARIMA model, showing that there is greater uncertainty with it than the ARIMA model.

us_ets_fit <- global_economy %>%
  filter(Code == "USA") %>%
  model(ETS(GDP))

us_ets_fit  %>%
  forecast(h = 10) %>%
  autoplot(global_economy)

DATA624

Jackie Yee

2026-03-08

9.1

9.2

9.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

9.5 For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

9.6 Simulate and plot some data from simple ARIMA models.

9.7 Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

9.8 For the United States GDP series (from global_economy):