Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
a: Explain the differences among these figures. Do they all indicate that the data are white noise?
The figures show autocorrelations that fluctuate randomly around zero, with no discernible pattern or consistently significant spikes beyond the confidence bounds. The main difference though is that as the sample size increases from 36 to 360 to 1,000, the blue critical bands narrow considerably, reflecting greater precision in estimating true zero autocorrelation. So each figure indicates the data are consistent with white noise, although the larger series provide stronger visual evidence.
b: Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
The critical values depend on the sample size. When T increases, the standard error of the autocorrelation estimate also decreases, so the confidence bands narrow around zero. Each white noise series is an independent random draw, so the autocorrelation values vary due to sampling variability. So the differences across figures reflect both the effect of sample size on precision and random variation in the generated series.
A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
gafa_stock %>%
filter(Symbol == "AMZN") %>%
gg_tsdisplay(Close, plot_type = "partial")
## Warning: `gg_tsdisplay()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsdisplay()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
The times series plot with the stock price shows an upward trend over time. The ACF plot bars decrease very slowly and stay large for many lags, this slow decrease means the data has a long memory or trend and is non-stationary. The PACF has a very large spike at lag 1 (close to 1), and then other lags are small.The pattern is usual for a random walk or highly persistent series. So since the mean is not stable, the series should be differenced by 1.
gafa_stock %>%
filter(Symbol == "AMZN") %>%
features(Close, unitroot_ndiffs)
global_economy %>%
filter(Country == "Turkey") %>%
gg_tsdisplay(GDP, plot_type='partial')
library(feasts)
#optimal lambda
lambda <- global_economy %>%
filter(Country == "Turkey") %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.1572187
global_economy %>%
filter(Country == "Turkey") %>%
mutate(GDP_clean = difference(box_cox(GDP, lambda))) %>%
gg_tsdisplay(GDP_clean, plot_type = "partial") +
labs(
title = "Turkish GDP After Box-Cox & Differencing",
subtitle = paste("λ =", round(lambda, 2)),
y = "Transformed & Differenced GDP",
x = "Year"
) +
theme_minimal()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
#find order of differencing
global_economy %>%
filter(Country == "Turkey") %>%
features(box_cox(GDP,lambda), unitroot_ndiffs)
From the non-transformed graph of the Turkish GDP, we see that it is not stationary and doesn’t follow a seasonal pattern. However, after transforming the graph using a box-cox transformation, the graphs become stationary. Lastly, the order of differencing needed for this is 1.
aus_accommodation %>%
filter(State == "Tasmania") %>%
gg_tsdisplay(Takings, plot_type='partial')
#optimal lambda
lambda <- aus_accommodation %>%
filter(State == "Tasmania") %>%
features(Takings, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.001819643
aus_accommodation %>%
filter(State == "Tasmania") %>%
mutate(
Takings_transformed = box_cox(Takings, lambda),
Takings_clean = difference(Takings_transformed, 4)
) %>%
gg_tsdisplay(Takings_clean, plot_type = "partial") +
labs(
title = "Tasmania Accommodation Takings",
subtitle = paste("Box-Cox λ =", round(lambda, 2)),
y = "Transformed & Differenced Takings",
x = "Date"
) +
theme_minimal()
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).
#find order of differencing
aus_accommodation %>%
filter(State == "Tasmania") %>%
features(box_cox(Takings,lambda), unitroot_nsdiffs)
From the accommodations taking in Tasmania, we can see from the non-seasonal graph that it is not stationary and seasonal. From the 1 differincing and transformed graph, the graph is stationary. In addition, we can see that the PACF plot shortens after the first lag, and that the ACF shows that it decreases rapidly.
c.Monthly sales from souvenirs.
souvenirs %>%
gg_tsdisplay(Sales, plot_type='partial', lag = 48)
lambda <- souvenirs %>%
features(Sales, features = guerrero) %>%
pull(lambda_guerrero)
souvenirs %>%
mutate(
Sales_clean = difference(box_cox(Sales, lambda), 12)
) %>%
gg_tsdisplay(Sales_clean, plot_type = "partial", lag = 48) +
labs(
title = paste("Differenced Monthly Souvenir Sales"),
subtitle = paste("Box-Cox λ =", round(lambda, 2)),
y = "Sales (transformed & differenced)",
x = "Month"
) +
theme_minimal()
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
souvenirs %>%
features(box_cox(Sales,lambda), unitroot_nsdiffs)
The non-transform graph of the monthly sovenir sales shows that they are non stationary and have a seasonal pattern. The transformed graph with a differencing of 1 show that the series is stationary, and that the PACF plot truncates after the first lag while the ACF decreases rapidly.
aus_series <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
aus_series %>% gg_tsdisplay(Turnover, plot_type='partial', lag = 36)
lambda <- aus_series %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
aus_series %>%
mutate(
Turnover_clean = difference(box_cox(Turnover, lambda), 12)
) %>%
gg_tsdisplay(Turnover_clean, plot_type = "partial", lag = 36) +
labs(
title = "Queensland Food Retailing Turnover",
subtitle = paste("After Box-Cox (λ =", round(lambda, 2), ") and seasonal differencing (lag 12)"),
y = "Transformed & differenced",
x = "Date"
) +
theme_bw()
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
aus_series %>%
features(box_cox(Turnover, lambda), unitroot_nsdiffs)
The non-transformed graph of the retail data shows that it has a seasonal pattern and is not stationary. Through applying a differencing of 1 and the box-cox transformation, we can see that the data is now stationary, centering itself around zero.
set.seed(245)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>%
autoplot(y)
changing ϕ1 = -1
for(i in 2:100)
y[i] <- -1*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y)
changing ϕ1 = 1
for(i in 2:100)
y[i] <- 1*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y)
changing ϕ1 = -.6
for(i in 2:100)
y[i] <- -.6*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y)
changing ϕ1 = 0
for(i in 2:100)
y[i] <- -0*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y)
changing ϕ1 = -.3
for(i in 2:100)
y[i] <- -.3*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y)
changing ϕ1 = .3
for(i in 2:100)
y[i] <- .3*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y)
As shown in the graphs, when ϕ1 increases from 0 to 1, the
series becomes smoother and more persistent. As ϕ1 becomes negative, the
series oscillates more wildly around the mean.
for(i in 2:100)
y[i] <- e[i] + 0.6 * e[i-1]
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)
MA(1) model with θ1 = 0.6 and σ2 = 1.
sim2 %>%
autoplot(y)
MA(1) model with θ1 = -1 and σ2 = 1.
for(i in 2:100)
y[i] <- e[i] + -1 * e[i-1]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y)
MA(1) model with θ1 = 1 and σ2 = 1.
for(i in 2:100)
y[i] <- e[i] + 1 * e[i-1]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y)
MA(1) model with θ1 = -.6 and σ2 = 1.
for(i in 2:100)
y[i] <- e[i] + -.6 * e[i-1]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y)
MA(1) model with θ1 = 0 and σ2 = 1.
for(i in 2:100)
y[i] <- e[i] + 0 * e[i-1]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y)
MA(1) model with θ1 = -.3 and σ2 = 1.
for(i in 2:100)
y[i] <- e[i] + -.3 * e[i-1]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y)
MA(1) model with θ1 = .3 and σ2 = 1.
for(i in 2:100)
y[i] <- e[i] + .3 * e[i-1]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y)
For an MA(1) model, when ϕ1 = 0 the series is white noise. Positive ϕ1 values smooth the series, making it more persistent and less volatile as ϕ1 increases. While negative ϕ1 vals cause wild oscillations around the mean, with more frequent spikes as ϕ1 decreases further.
set.seed(1432)
y <- numeric(100)
for(i in 2:100)
y[i] <- e[i] + 0.6 * e[i-1] + 0.6 * y[i-1]
arma_mod <- tsibble(idx = seq_len(100), y = y, index = idx)
set.seed(2531)
y <- numeric(100)
for(i in 3:100)
y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]
arma_mod2 <- tsibble(idx = seq_len(100), y = y, index = idx)
ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1
arma_mod %>%
gg_tsdisplay(y, plot_type='partial') +
theme_light()
AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1
arma_mod2 %>%
gg_tsdisplay(y, plot_type='partial') +
theme_light()
The ARMA(1,1) model produced a stationary, random-looking series. The ACF decays to zero and its PACF cuts off after lag 1, indicating moderate influence from both the previous value and previous error. However, the AR(2) model is non-stationary, causing the series to oscillate between positive and negative values with exponentially growing magnitude. Its ACF oscillates without decaying, and its PACF shows only a negative spike at lag 1, failing the stationarity constraint (ϕ₂ − ϕ₁ < 1).
ARIMA(0,2,1) was reported with the most appropriate fit
fit_mod <- aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers))
report(fit_mod)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8756
## s.e. 0.0722
##
## sigma^2 estimated as 4.671: log likelihood=-87.8
## AIC=179.61 AICc=179.93 BIC=182.99
fit_mod %>%
gg_tsresiduals()
## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
fit_mod %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
From the series plot, it appears to be white noise.
y_t = -0.87 * e_{t-1} + e_t (1 - B)^2 * y_t = (1 - 0.87 * B) * e_t
fit_mod2 <-aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ pdq(0,1,0)))
fit_mod2 %>%
gg_tsresiduals()
fit_mod2 %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
The ARIMA(0,2,1) model underestimates passenger numbers, while this model overestimated them.
ARIMA(2,1,2) model with drift
fit_mod3 <-aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ pdq(2,1,2)))
fit_mod3 %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
fit_mod3 %>%
gg_tsresiduals()
Now we remove the constant from the model
fit_mod3 <-aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
fit_mod3 %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).
report(fit_mod3)
## Series: Passengers
## Model: NULL model
## NULL model
The model is similar to that of ARIMA(0,2,1), with the residuals seeming to be white noise. When the constant is removed from the model, there does not seem to be a forecast.
fit_mod4 <-aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend. This
## is generally discouraged, consider removing the constant or reducing the number
## of differences.
fit_mod4 %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
Through the inclusion of a constant, the slope of the forecast appears to be steeper, indicating a greater slope. However, since the model induces a higher order polynomial trend, it is encouraged to remove the constant or reducing the amount of differences.
lambda <- global_economy %>%
filter(Code == "USA") |>
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.2819443
global_economy %>%
filter(Code == "USA") %>%
gg_tsdisplay(box_cox(GDP, lambda), plot_type = 'partial')
ARIMA(1,1,0) w/ drift
us_econ_arima <- global_economy %>%
filter(Code == "USA") %>%
model(ARIMA(box_cox(GDP, lambda)))
report(us_econ_arima)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
us_econ_arima %>%
gg_tsresiduals()
ARIMA(1,1,2) seems likely model with lowest AICc values
test_fits <- global_economy %>%
filter(Code == "USA") %>%
model(arima_110 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,0)),
arima_112 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,2)),
arima_120 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,0)),
arima_210 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,0)),
arima_310 = ARIMA(box_cox(GDP,lambda) ~ pdq(3,1,0))
)
report(test_fits) %>%
arrange(AICc)
## Warning in report.mdl_df(test_fits): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.
From ARIMA(1,1,2) model time series, it appears to be white noise.
test_fits %>%
select(arima_120) %>%
gg_tsresiduals()
Yes, the forecasts of my fitted model look reasonable, as it projects 10 years into the future. From the time series forecast graph, we can see that the graph has been generally increasing in slope overtime. The ARIMA forecast I chose, matches closely to the trajectory of the US economy is going towards.
test_fits %>%
forecast(h=10) %>%
filter(.model=='arima_120') %>%
autoplot(global_economy)
It appears from the ETS model that the slope is less steep than the model I chose, however the ETS forecasting band is much wider confidence level than the ARIMA model, showing that there is greater uncertainty with it than the ARIMA model.
us_ets_fit <- global_economy %>%
filter(Code == "USA") %>%
model(ETS(GDP))
us_ets_fit %>%
forecast(h = 10) %>%
autoplot(global_economy)