library(fpp3)
library(tidyverse)
Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
Since the spikes
lie within the 95% interval bounds the series can be considered white
noise. For smaller sample size such as 36 random numbers several lags
may appear significant just by chance while some may not. In more
moderate sample size such as 360 the fluctuations are smaller and most
autocorrelations are closer to zero. And lastly in larger sample size
such as 1000 the ACF values are cluster tightly around zero, very few
appear significant.
The critical values are at different distances from the mean of zero because as our sample sizes increases we can be more confident that the true autocrrelations are near zero allowing the boundaries to be more narrow. The autocorrelations are different in each figure becuase each random sample produces different random fluctuations around zero. And also as the sample sizes become larger these random differences average out and the ACFs look closer to the theoretical expectations flat at zero.
A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
amazon_stock <- gafa_stock %>%
filter(Symbol == "AMZN")
amazon_stock %>%
autoplot(Close) +
labs(title = "Amazon Daily Closing Prices",
y = "Closing Price (USD)",
x = "Date")
amazon_stock %>%
ACF(Close) %>%
autoplot() +
labs(title = "ACF of Amazon Closing Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
amazon_stock %>%
PACF(Close) %>%
autoplot() +
labs(title = "PACF of Amazon Closing Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
The series is non-stationary because the there is a clear upward trend over time and the vairance and mean change over time. In the ACF plot the values decay slowly and remain significantly positive across many lags. And in the PACF plot there is a significant spike at lag 1 with smaller values in the subsequent lags. This supports that differencing once might help remove the trend.
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
Turkish GDP from global_economy.
turkey_gdp <- global_economy %>%
filter(Country == "Turkey")
turkey_gdp %>%
autoplot(GDP) +
labs(title = "Turkish GDP")
turkey_gdp %>%
features(GDP, features = guerrero)
## # A tibble: 1 × 2
## Country lambda_guerrero
## <fct> <dbl>
## 1 Turkey 0.157
turkey_gdp <- turkey_gdp %>%
mutate(GDP_trans = box_cox(GDP, 0.1572187))
turkey_gdp %>%
autoplot(GDP_trans)
turkey_gdp %>%
features(GDP_trans, unitroot_kpss)
## # A tibble: 1 × 3
## Country kpss_stat kpss_pvalue
## <fct> <dbl> <dbl>
## 1 Turkey 1.50 0.01
turkey_gdp %>%
mutate(diff_gdp = difference(GDP_trans)) %>%
ACF(diff_gdp, na.action = na.exclude) %>%
autoplot()
Accommodation takings in the state of Tasmania from aus_accommodation.
tas_accom <- aus_accommodation %>%
filter(State == "Tasmania")
tas_accom %>%
autoplot(Takings)
tas_accom %>%
features(Takings, features = guerrero)
## # A tibble: 1 × 2
## State lambda_guerrero
## <chr> <dbl>
## 1 Tasmania 0.00182
tas_accom <- tas_accom %>%
mutate(Takings_trans = box_cox(Takings, 0.001819643))
tas_accom %>%
autoplot(Takings_trans)
tas_accom %>%
features(Takings_trans, unitroot_ndiffs)
## # A tibble: 1 × 2
## State ndiffs
## <chr> <int>
## 1 Tasmania 1
tas_accom %>%
features(Takings_trans, unitroot_nsdiffs)
## # A tibble: 1 × 2
## State nsdiffs
## <chr> <int>
## 1 Tasmania 1
tas_accom_diff <- tas_accom %>%
mutate(diff_takings = difference(Takings_trans, lag = 12))
tas_accom_diff %>%
autoplot(diff_takings)
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
Monthly sales from souvenirs.
souvenirs %>%
autoplot(Sales)
souvenirs %>%
features(Sales, features = guerrero)
## # A tibble: 1 × 1
## lambda_guerrero
## <dbl>
## 1 0.00212
souvenirs <- souvenirs %>%
mutate(Sales_trans = box_cox(Sales, 0.002118221))
souvenirs %>%
features(Sales_trans, unitroot_nsdiffs)
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 1
souvenirs %>%
features(Sales_trans, unitroot_ndiffs)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
souvenirs_diff <- souvenirs %>%
mutate(Sales_diff = difference(difference(Sales_trans, lag = 12), lag = 1))
souvenirs_diff %>%
autoplot(Sales_diff)
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(624)
retail <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
retail |>
autoplot(Turnover) +
labs(title = "Retail Turnover")
retail %>%
features(Turnover, features = guerrero)
## # A tibble: 1 × 3
## State Industry lambda_guerrero
## <chr> <chr> <dbl>
## 1 New South Wales Takeaway food services 0.00214
retail <- retail %>%
mutate(Turnover_trans = box_cox(Turnover, 0.002144737))
retail %>%
features(Turnover_trans, unitroot_nsdiffs)
## # A tibble: 1 × 3
## State Industry nsdiffs
## <chr> <chr> <int>
## 1 New South Wales Takeaway food services 1
retail %>%
features(Turnover_trans, unitroot_ndiffs)
## # A tibble: 1 × 3
## State Industry ndiffs
## <chr> <chr> <int>
## 1 New South Wales Takeaway food services 1
Both have 1 so well apply both
retail_diff <- retail %>%
mutate(
Turnover_diff = difference(difference(Turnover_trans, lag = 12), lag = 1)
)
retail_diff %>%
autoplot(Turnover_diff) +
labs(title = "Differenced & Transformed Retail Turnover")
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
retail_diff %>%
ACF(Turnover_diff, na.action = na.exclude) %>%
autoplot()
retail_diff %>%
PACF(Turnover_diff, na.action = na.exclude) %>%
autoplot()
The stationary series is
difference(difference(box_cox(Turnover, 0.002144737), lag = 12), lag = 1)
Simulate and plot some data from simple ARIMA models.
Use the following R code to generate data from an AR(1) model with phi 1 = 0.6 and sigma 2 = 1 . The process starts with y 1 = 0.
set.seed(624)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
Produce a time plot for the series. How does the plot change as you change phi1?
sim %>%
autoplot(y) +
labs(title = "AR(1) (phi = 0.6)")
when phi is 0.9 it is more persistent and slower decay when phi is 0.2
less persistent quicker delay and noisy when phi is -0.6 alternating
values oscillatory patterns
Write your own code to generate data from an MA(1) model with theta1=0.6 and sigma2=1.
set.seed(624)
e <- rnorm(101, mean = 0, sd = 1)
y <- numeric(100)
for(i in 1:100) {
y[i] <- e[i + 1] + 0.6 * e[i]
}
sim_ma1 <- tsibble(idx = 1:100, y = y, index = idx)
Produce a time plot for the series. How does the plot change as you change theta1 ?
sim_ma1 %>%
autoplot(y) +
labs(title = "MA(1) (theta = 0.6)")
when theta is 0 it becomes white noise when theta is 0.9 short bursts of
correlation when theta is -0.6 more negative correlation between
successive points
Generate data from an ARMA(1,1) model with phi1=0.6 , theta1=0.6 and sigma2=1.
set.seed(624)
e <- rnorm(101)
y <- numeric(100)
for(i in 2:100) {
y[i] <- 0.6 * y[i - 1] + e[i] + 0.6 * e[i - 1]
}
sim_arma11 <- tsibble(idx = 1:100, y = y, index = idx)
sim_arma11 %>%
autoplot(y) +
labs(title = "ARMA(1,1) (phi = 0.6, theta = 0.6)")
Generate data from an AR(2) model with phi1=-0.8 , phi2=0.3 and sigma2=1. (Note that these parameters will give a non-stationary series.)
set.seed(624)
e <- rnorm(100)
y <- numeric(100)
y[1] <- 0
y[2] <- 0
for(i in 3:100) {
y[i] <- -0.8 * y[i - 1] + 0.3 * y[i - 2] + e[i]
}
sim_ar2_ns <- tsibble(idx = 1:100, y = y, index = idx)
sim_ar2_ns %>%
autoplot(y) +
labs(title = "AR(2) (phi1 = -0.8, phi2 = 0.3)")
Graph the latter two series and compare them.
library(patchwork)
p1 <- sim_arma11 %>%
autoplot(y) +
labs(title = "ARMA(1,1)")
p2 <- sim_ar2_ns %>%
autoplot(y) +
labs(title = "Non-Stationary AR(2)")
p1 / p2
ARMA(1,1) mean reverting, stationary, fluctuations centered around 0
AR(2) non stationary values drift, thye explode and oscillate doesnt
settle back to the mean
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
fit_auto <- aus_airpassengers %>%
model(ARIMA(Passengers))
report(fit_auto)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
fit_auto %>%
gg_tsresiduals()
## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
fit_auto %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers) +
labs(title = "Forecast: Automatic ARIMA Model")
Write the model in terms of the backshift operator.
(1-B)^2 [yt] = (1+0.8963B)log(Passenger)
Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
fit_b <- aus_airpassengers |>
model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))
fit_b %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers) +
labs(title = "ARIMA(0,1,0) with Drift")
Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
air <- aus_airpassengers %>%
filter(Year >= 1970, Year <= 2011)
fit_with_drift <- air %>%
model(ARIMA(Passengers ~ pdq(2,1,2) + drift()))
## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2) + drift())
## [1] could not find function "drift"
fit_no_drift <- air %>%
model(ARIMA(Passengers ~ pdq(2,1,2)))
fc_with_drift <- fit_with_drift %>% forecast(h = 10)
fc_no_drift <- fit_no_drift %>% forecast(h = 10)
library(patchwork)
p1 <- fc_with_drift %>%
autoplot(air) +
labs(title = "ARIMA(2,1,2) with Drift",
y = "Passengers (millions)")
p2 <- fc_no_drift %>%
autoplot(air) +
labs(title = "ARIMA(2,1,2) without Drift",
y = "Passengers (millions)")
p1 / p2
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).
Doesn’t forecast with drift but does without it.
Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
fit_d <- air %>%
model(ARIMA(Passengers ~ pdq(0,2,1) + drift()))
## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(0, 2, 1) + drift())
## [1] could not find function "drift"
fit_d %>%
forecast(h = 10) %>%
autoplot(air) +
labs(title = "Part D: ARIMA(0,2,1) with Constant",
y = "Passengers (millions)")
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).
It doesn’t work with drift but it does without it.
fit_d <- air %>%
model(ARIMA(Passengers ~ pdq(0,2,1)))
fit_d %>%
forecast(h = 10) %>%
autoplot(air) +
labs(title = "Part D: ARIMA(0,2,1) with Constant",
y = "Passengers (millions)")
For the United States GDP series (from global_economy): if necessary, find a suitable Box-Cox transformation for the data; fit a suitable ARIMA model to the transformed data using ARIMA();
us_gdp <- global_economy %>%
filter(Country == "United States") %>%
select(Year, GDP)
autoplot(us_gdp, GDP) +
labs(title = "US GDP over Time", y = "GDP (USD Billions)")
lambda <- us_gdp %>% features(GDP, features = guerrero) %>% pull(lambda_guerrero)
lambda
## [1] 0.2819443
fit_arima <- us_gdp %>%
model(ARIMA(box_cox(GDP, lambda)))
report(fit_arima)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
try some other plausible models by experimenting with the orders chosen; choose what you think is the best model and check the residual diagnostics; produce forecasts of your fitted model. Do the forecasts look reasonable? compare the results with what you would obtain using ETS() (with no transformation).
fit_simple <- us_gdp %>%
model(ARIMA(box_cox(GDP, lambda) ~ pdq(1,1,0)))
report(fit_simple)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
fit_alt1 <- us_gdp %>%
model(ar210 = ARIMA(box_cox(GDP, lambda) ~ pdq(2,1,0)),
ar111 = ARIMA(box_cox(GDP, lambda) ~ pdq(1,1,1)),
ar211 = ARIMA(box_cox(GDP, lambda) ~ pdq(2,1,1)))
glance(fit_alt1)
## # A tibble: 3 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 ar210 5580. -325. 659. 659. 667. <cpl [2]> <cpl [0]>
## 2 ar111 5580. -325. 659. 659. 667. <cpl [1]> <cpl [1]>
## 3 ar211 5647. -325. 660. 661. 671. <cpl [2]> <cpl [1]>
fit_alt1 %>%
select(ar111) %>%
gg_tsresiduals()
fit_alt1 %>%
select(ar111) %>%
augment() %>%
features(.innov, ljung_box, lag = 10)
## # A tibble: 1 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 ar111 3.84 0.954
fit_alt1 %>%
select(ar111) %>%
forecast(h = 10) %>%
autoplot(us_gdp) +
labs(title = "Forecast: US GDP using ARIMA(1,1,1)",
y = "GDP (USD Billions)")
fit_ets <- us_gdp %>%
model(ets = ETS(GDP))
forecast(fit_ets, h = 10) %>%
autoplot(us_gdp) +
labs(title = "Forecast: US GDP using ETS Model")
fc_combined <- bind_rows(
forecast(fit_alt1 %>% select(ar111), h = 10) %>% mutate(Model = "ARIMA(1,1,1)"),
forecast(fit_ets, h = 10) %>% mutate(Model = "ETS")
)
fc_combined %>%
autoplot(us_gdp, level = NULL) +
facet_wrap(~Model) +
labs(title = "US GDP Forecasts: ARIMA vs ETS",
y = "GDP (USD Billions)")