1a) Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
The three ACF plots look different mainly because the sample sizes are different.
Yes, all three figures are still consistent with white noise because there is no clear pattern in the autocorrelations, and most values stay near zero. Random variation is expected, especially for smaller samples.
1b) Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
The critical values are at different distances from zero because they depend on the sample size. The approximate bounds are
\[ \pm \frac{1.96}{\sqrt{n}} \]
As the sample size \(n\) gets larger, the bounds get smaller and move closer to zero.
The autocorrelations are different in each figure because each plot is based on a different random sample. Even though all three series are white noise, random variation will make the sample autocorrelations look different. Smaller samples tend to have more variation, while larger samples tend to have autocorrelations closer to zero.
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble 3.3.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.2 ✔ feasts 0.4.2
## ✔ lubridate 1.9.4 ✔ fable 0.5.0
## ✔ ggplot2 4.0.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
amazon <- gafa_stock |>
filter(Symbol == "AMZN")
amazon |>
autoplot(Close)
amazon |>
ACF(Close) |>
autoplot()
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
amazon |>
PACF(Close) |>
autoplot()
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
Answer:
The time plot shows that Amazon’s closing prices have a clear upward trend over time and do not fluctuate around a constant mean. This indicates the series is non-stationary.
The ACF remains very high and close to 1 for many lags and decreases slowly. This slow decay is a strong sign of non-stationarity, since a stationary series would show correlations dropping off quickly.
The PACF shows a very large spike at lag 1, with smaller values afterward. This pattern is also consistent with a non-stationary series and suggests that differencing is needed.
All three plots indicate that the series is non-stationary, so differencing should be applied before fitting an ARIMA model.
3a) Turkish GDP from global_economy.
lambda_gdp <- global_economy |>
filter(Country == "Turkey") |>
features(GDP, features = guerrero)
gdp_ts <- global_economy |>
filter(Country == "Turkey") |>
mutate(GDP = box_cox(GDP, lambda_gdp$lambda_guerrero))
gdp_ts |>
gg_tsdisplay(difference(GDP))
## Warning: `gg_tsdisplay()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsdisplay()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
3b) Accommodation takings in the state of Tasmania from
aus_accommodation.
lambda_acc <- aus_accommodation |>
filter(State == "Tasmania") |>
features(Takings, features = guerrero)
acc_ts <- aus_accommodation |>
filter(State == "Tasmania") |>
mutate(Takings = box_cox(Takings, lambda_acc$lambda_guerrero))
acc_ts |>
gg_tsdisplay(difference(Takings, 4))
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
3c) Monthly sales from souvenirs.
lambda_souv <- souvenirs |>
features(Sales, features = guerrero)
souv_ts <- souvenirs |>
mutate(Sales = box_cox(Sales, lambda_souv$lambda_guerrero))
souv_ts |>
gg_tsdisplay(difference(Sales, 12))
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
6a) Use the following R code to generate data from an AR(1) model with φ₁ = 0.6 and σ² = 1. The process starts with y₁ = 0.
set.seed(1)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
tsibble(idx = 1:100, y = y, index = idx) |>
autoplot(y)
6b) Produce a time plot for the series. How does the plot change as you change φ₁?
set.seed(1)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.9 * y[i-1] + e[i]
tsibble(idx = 1:100, y = y, index = idx) |>
autoplot(y)
The time plot changes depending on the value of ( _1 ). When ( _1 ) is close to 0, the series looks more like random noise with little dependence between consecutive observations. As ( _1 ) becomes larger and positive, the series becomes smoother and shows longer runs above or below the mean, indicating stronger persistence. When ( _1 ) is close to 1, the series changes more slowly over time. If ( _1 ) is negative, the series tends to alternate more sharply from one observation to the next, producing a zig zag pattern. Overall, larger values of ( |_1| ) create stronger dependence in the series.
6c) Write your own code to generate data from an MA(1) model with θ₁ = 0.6 and σ² = 1.
set.seed(1)
e <- rnorm(101)
y <- numeric(100)
for(i in 1:100)
y[i] <- e[i+1] + 0.6 * e[i]
tsibble(idx = 1:100, y = y, index = idx) |>
autoplot(y)
6d) Produce a time plot for the series. How does the plot change as you change θ₁?
set.seed(1)
e <- rnorm(101)
y <- numeric(100)
for(i in 1:100)
y[i] <- e[i+1] + 0.9 * e[i]
tsibble(idx = 1:100, y = y, index = idx) |>
autoplot(y)
As ( _1 ) changes, the time plot changes in the short-term dependence
between neighboring observations. When ( _1 ) is close to 0, the series
looks more like white noise. As ( _1 ) increases in magnitude,
consecutive observations show stronger local association. With a
positive ( _1 ), nearby values tend to move together more, so the series
appears to have slightly smoother short run swings. If ( _1 ) were
negative, the series would show more short run alternation from one
point to the next. Overall, changing ( _1 ) affects the local pattern of
the series, but unlike an AR(1) model, the dependence does not persist
for long over time.
6e) Generate data from an ARMA(1,1) model with φ₁ = 0.6, θ₁ = 0.6 and σ² = 1.
set.seed(1)
y <- numeric(100)
e <- rnorm(101)
for(i in 2:100)
y[i] <- 0.6 * y[i-1] + e[i+1] + 0.6 * e[i]
tsibble(idx = 1:100, y = y, index = idx) |>
autoplot(y)
6f) Generate data from an AR(2) model with φ₁ = −0.8, φ₂ = 0.3 and σ² = 1. (Note that these parameters will give a non-stationary series.)
set.seed(1)
y <- numeric(100)
e <- rnorm(100)
for(i in 3:100)
y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]
tsibble(idx = 1:100, y = y, index = idx) |>
autoplot(y)
6g) Graph the latter two series and compare them.
set.seed(1)
y_arma <- numeric(100)
e1 <- rnorm(101)
for(i in 2:100)
y_arma[i] <- 0.6 * y_arma[i-1] + e1[i+1] + 0.6 * e1[i]
y_ar2 <- numeric(100)
e2 <- rnorm(100)
for(i in 3:100)
y_ar2[i] <- -0.8 * y_ar2[i-1] + 0.3 * y_ar2[i-2] + e2[i]
ts_arma <- tibble(idx = 1:100, y = y_arma, series = "ARMA(1,1)")
ts_ar2 <- tibble(idx = 1:100, y = y_ar2, series = "AR(2)")
bind_rows(ts_arma, ts_ar2) |>
ggplot(aes(x = idx, y = y)) +
geom_line() +
facet_wrap(~series, scales = "free_y")
The ARMA(1,1) series appears stable and fluctuates around a roughly constant mean with fairly constant variance over time. In contrast, the AR(2) series shows oscillations whose amplitude increases dramatically, so it is not stable and does not appear stationary. The AR(2) process here is explosive, which is why the values grow larger over time, while the ARMA(1,1) process remains bounded and behaves like a stationary time series. Overall, the ARMA(1,1) plot looks realistic for a stationary series, whereas the AR(2) plot shows an unstable process.
aus_airpassengers, the total number of
passengers (in millions) from Australian air carriers for the period
1970–2011.7a) Use ARIMA() to find an appropriate ARIMA model. What
model was selected? Check that the residuals look like white noise. Plot
forecasts for the next 10 periods.
fit <- aus_airpassengers |>
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
glance(fit)
## # A tibble: 1 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 ARIMA(Passengers) 4.31 -97.0 198. 198. 202. <cpl [0]> <cpl [1]>
gg_tsresiduals(fit)
## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
fit |>
forecast(h = 10) |>
autoplot(aus_airpassengers)
The ARIMA() function selected an ARIMA(0,2,1) model for the aus_airpassengers series. Based on the residual diagnostics, the residuals appear to be approximately white noise. The residual plot shows no clear systematic pattern over time, and the residual ACF does not show any significant autocorrelation since all spikes remain within the significance bounds. The histogram is centered near zero, although there are a few larger residuals. Overall, the residual checks suggest that the model is adequate.
The 10 period forecast indicates that passenger numbers are expected to continue increasing. The point forecast follows the upward pattern seen in the historical data, and the forecast intervals widen as the forecast horizon increases, showing greater uncertainty further into the future.
7b) Write the model in terms of the backshift operator.
(1 - phi1B)(1 - B)y[t] = c + (1 + theta1B)e[t]
7c) Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part (a).
fit_010 <- aus_airpassengers |>
model(ARIMA(Passengers ~ pdq(0,1,0) + drift()))
## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(0, 1, 0) + drift())
## [1] could not find function "drift"
fit_010 |>
forecast(h = 10) |>
autoplot(aus_airpassengers)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).
The forecasts from an ARIMA(0,1,0) model with drift also show a continued upward trend, since this model projects future values by extending the average growth over time. Compared with part (a), the forecasts are fairly similar in direction, but the ARIMA(0,2,1) model from part (a) is more flexible because it allows for additional structure beyond a simple random walk with drift. The drift model gives a steadier linear increase, while the model in part (a) is better supported by the model selection process and residual diagnostics. Overall, both models predict growth in passenger numbers, but the model from part (a) is the preferred choice.
7d) Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts (a) and (c). Remove the constant and see what happens.
fit_212_drift <- aus_airpassengers |>
model(ARIMA(Passengers ~ pdq(2,1,2) + drift()))
## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2) + drift())
## [1] could not find function "drift"
fit_212_drift |>
forecast(h = 10) |>
autoplot(aus_airpassengers)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).
# remove constant
fit_212_noconstant <- aus_airpassengers |>
model(ARIMA(Passengers ~ pdq(2,1,2) + 0))
## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2) + 0)
## [1] non-stationary AR part from CSS
fit_212_noconstant |>
forecast(h = 10) |>
autoplot(aus_airpassengers)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).
The ARIMA(2,1,2) model with drift also produces forecasts that continue upward, similar to parts (a) and (c). Its forecasts are generally close to the ARIMA(0,1,0) with drift because both include a drift term that extends the long-run upward movement in the data. Compared with part (a), the forecasts may differ slightly in shape because the extra AR and MA terms allow more short run adjustment, but the overall direction remains increasing. When the constant is removed, the forecasts noticeably changes. Without the drift term, the model no longer forces a continued upward average increase, so the forecasts become much flatter. This shows that the long run growth in the forecasts is being driven mainly by the drift term. Overall, the models with drift are more consistent with the upward trend in the historical series, while removing the constant weakens that trend in the forecasts.
7e) Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
fit_021 <- aus_airpassengers |>
model(ARIMA(Passengers ~ pdq(0,2,1)))
fit_021 |>
forecast(h = 10) |>
autoplot(aus_airpassengers)
The
ARIMA(0,2,1) model with a constant produces forecasts
that rise more quickly over time and show a curved upward pattern rather
than a straight line increase. This happens because with second
differencing, adding a constant implies a linear trend in the
differenced series, which becomes an accelerating trend in the original
series. As a result, the forecasts can grow unrealistically fast. This
is why including a constant in a model with (d = 2) is often not
appropriate.
global_economy):us_gdp <- global_economy |>
filter(Country == "United States")
lambda <- us_gdp |>
features(GDP, guerrero)
lambda
## # A tibble: 1 × 2
## Country lambda_guerrero
## <fct> <dbl>
## 1 United States 0.282
# transformation
us_gdp <- us_gdp |>
mutate(GDP_bc = box_cox(GDP, lambda$lambda_guerrero))
ARIMA().fit_arima <- us_gdp |>
model(ARIMA(GDP_bc))
report(fit_arima)
## Series: GDP_bc
## Model: ARIMA(1,1,0) w/ drift
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
fit_models <- us_gdp |>
model(
auto = ARIMA(GDP_bc),
arima111 = ARIMA(GDP_bc ~ pdq(1,1,1)),
arima011 = ARIMA(GDP_bc ~ pdq(0,1,1)),
arima211 = ARIMA(GDP_bc ~ pdq(2,1,1)),
arima212 = ARIMA(GDP_bc ~ pdq(2,1,2))
)
glance(fit_models)
## # A tibble: 5 × 9
## Country .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 United States auto 5479. -325. 657. 657. 663. <cpl [1]> <cpl [0]>
## 2 United States arima111 5580. -325. 659. 659. 667. <cpl [1]> <cpl [1]>
## 3 United States arima011 5689. -326. 659. 659. 665. <cpl [0]> <cpl [1]>
## 4 United States arima211 5647. -325. 660. 661. 671. <cpl [2]> <cpl [1]>
## 5 United States arima212 5734. -325. 662. 664. 674. <cpl [2]> <cpl [2]>
best_fit <- us_gdp |>
model(ARIMA(GDP_bc))
report(best_fit)
## Series: GDP_bc
## Model: ARIMA(1,1,0) w/ drift
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
gg_tsresiduals(best_fit)
best_fit |>
forecast(h = 10) |>
autoplot(us_gdp)
The forecasts look reasonable. They continue the upward pattern seen in the historical U.S. GDP series and do not show any sudden or unrealistic changes. The point forecasts increase steadily over the forecast horizon, which is consistent with the long run growth in the data. The prediction intervals widen as the horizon increases, showing greater uncertainty further into the future, which is what we would expect.
ETS() (with no transformation).fit_ets <- us_gdp |>
model(ETS(GDP))
report(fit_ets)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
fit_ets |>
forecast(h = 10) |>
autoplot(us_gdp)
Yes:
The ETS() forecasts grow faster and have much wider
intervals than the transformed ARIMA forecasts. The ARIMA model gives
steadier and more reasonable forecasts, while the untransformed
ETS() model appears less stable here.