library(tidyverse)
library(fpp3)
library(kableExtra)
library(cowplot)
Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
The differences among these figures are that the ACF range are different. We can see as we progress from left to right, the ranges becomes narrower. They all indicate that they are white noise as we do not see any lags cross the blue line boundary.
As our sample size increases, we see that the variation between two numbers become less influential to each other compared to smaller sample sizes. As we have less numbers to choose from, and wider ranges, the ACF then detects these critical values at a larger scale from zero.
A classic example of a non-stationary series are stock prices. Plot
the daily closing prices for Amazon stock (contained in
gafa_stock), along with the ACF and PACF. Explain how each
plot shows that the series is non-stationary and should be
differenced.
amazon <-
gafa_stock |>
filter(Symbol == 'AMZN') |>
mutate(trading_day = row_number()) |>
update_tsibble(index=trading_day, regular=T)
kbl(head(amazon)) |>
kable_classic(full_width = F, html_font = "Cambria")
| Symbol | Date | Open | High | Low | Close | Adj_Close | Volume | trading_day |
|---|---|---|---|---|---|---|---|---|
| AMZN | 2014-01-02 | 398.80 | 399.36 | 394.02 | 397.97 | 397.97 | 2137800 | 1 |
| AMZN | 2014-01-03 | 398.29 | 402.71 | 396.22 | 396.44 | 396.44 | 2210200 | 2 |
| AMZN | 2014-01-06 | 395.85 | 397.00 | 388.42 | 393.63 | 393.63 | 3170600 | 3 |
| AMZN | 2014-01-07 | 395.04 | 398.47 | 394.29 | 398.03 | 398.03 | 1916000 | 4 |
| AMZN | 2014-01-08 | 398.47 | 403.00 | 396.04 | 401.92 | 401.92 | 2316500 | 5 |
| AMZN | 2014-01-09 | 403.71 | 406.89 | 398.44 | 401.01 | 401.01 | 2103000 | 6 |
We can clearly see a strong upward trend where the data is not stationary in its closing price.
amazon |>
autoplot(Close) +
labs(x = "Day", y="Price ($ USD)", title = "Amazon Closing Price")
We see high autocorrelation in each lag that has a slowly decreasing effect as we lag further ahead.
amazon |>
ACF(Close) |>
autoplot() +
labs(subtitle = "Amazon ACF")
Our PACF plot shows an extremely high autocorrelation in the first lag, while there are more lags outside our critical values of concern later on.
amazon |>
PACF(Close) |>
autoplot() +
labs(title = "Amazon PACF")
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
turkey <-
global_economy |>
filter(Country == "Turkey")
kbl(head(turkey)) |>
kable_classic(full_width = F, html_font = "Cambria")
| Country | Code | Year | GDP | Growth | CPI | Imports | Exports | Population |
|---|---|---|---|---|---|---|---|---|
| Turkey | TUR | 1960 | 13995067818 | NA | 5.40e-05 | 3.671072 | 2.055800 | 27472331 |
| Turkey | TUR | 1961 | 8022222222 | 1.156069 | 5.57e-05 | 6.786704 | 5.124654 | 28146893 |
| Turkey | TUR | 1962 | 8922222222 | 5.571429 | 5.78e-05 | 7.970112 | 5.603985 | 28832805 |
| Turkey | TUR | 1963 | 10355555556 | 9.066306 | 6.15e-05 | 6.974249 | 4.184549 | 29531342 |
| Turkey | TUR | 1964 | 11177777778 | 5.459057 | 6.22e-05 | 5.467197 | 4.473161 | 30244232 |
| Turkey | TUR | 1965 | 11944444444 | 2.823529 | 6.50e-05 | 5.395349 | 4.558140 | 30972965 |
turkey |>
autoplot(GDP) +
labs(title = "Turkey GDP", x='Year', y='')
lambda <-
turkey |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
turkey |>
autoplot(box_cox(GDP, lambda)) +
labs(title = latex2exp::TeX(paste0(
"Transformed Turkey GDP with $\\lambda$ = ",
round(lambda,3))),
x='Year',
y='')
No differencing is needed as the data is recorded yearly.
tasmania <- aus_accommodation |>
filter(State == "Tasmania")
kbl(head(tasmania)) |>
kable_classic(full_width = F, html_font = "Cambria")
| Date | State | Takings | Occupancy | CPI |
|---|---|---|---|---|
| 1998 Q1 | Tasmania | 28.69907 | 67 | 67.0 |
| 1998 Q2 | Tasmania | 19.01751 | 45 | 67.4 |
| 1998 Q3 | Tasmania | 16.10556 | 39 | 67.5 |
| 1998 Q4 | Tasmania | 25.88618 | 56 | 67.8 |
| 1999 Q1 | Tasmania | 28.35130 | 66 | 67.8 |
| 1999 Q2 | Tasmania | 20.12543 | 48 | 68.1 |
tasmania |>
autoplot(Takings) +
labs(title = "Tasmanian Tourist Accommodation Takings")
lambda <-
tasmania |>
features(Takings, features = guerrero) |>
pull(lambda_guerrero)
tasmania |>
autoplot(box_cox(Takings, lambda)) +
labs(title = latex2exp::TeX(paste0(
"Transformed Tasmania Takings with $\\lambda$ = ",
round(lambda,3))),
y='')
First Order Differencing
Since we are provided quarterly data, we will use a seasonality (lag)
of 4 for the difference() function.
trans_tasmania <-
tasmania |>
mutate(bc_trans = box_cox(Takings, lambda))
trans_tasmania |>
autoplot(difference(bc_trans, lag=4)) +
labs(caption = latex2exp::TeX(paste0(
"Transformed Tasmania Takings with $\\lambda$ = ",
round(lambda,3))),
y='',
title = "First Order Differencing")
Unfortunately we still see issues of obtaining stationary data as our first three lags are outside our critical values.
trans_tasmania |>
ACF(difference(bc_trans, lag=4)) |>
autoplot() +
labs(caption = latex2exp::TeX(paste0(
"Transformed Tasmania Takings with $\\lambda$ = ",
round(lambda,3))),
y='',
title = "ACF: First Order Differencing")
Second Order Differencing
trans_tasmania |>
autoplot(difference(bc_trans, lag=4) |> difference()) +
labs(caption = latex2exp::TeX(paste0(
"Transformed Tasmania Takings with $\\lambda$ = ",
round(lambda,3))),
y='',
title = "Second Order Differencing")
Even after our second order, we still have lags outside of our critical values, so we are still unable to make it completely stationary, even though we have significant improvements.
trans_tasmania |>
ACF(difference(bc_trans, lag=4) |> difference()) |>
autoplot() +
labs(caption = latex2exp::TeX(paste0(
"Transformed Tasmania Takings with $\\lambda$ = ",
round(lambda,3))),
y='',
title = "ACF: Second Order Differencing")
kbl(head(souvenirs)) |>
kable_classic(full_width = F, html_font = "Cambria")
| Month | Sales |
|---|---|
| 1987 Jan | 1664.81 |
| 1987 Feb | 2397.53 |
| 1987 Mar | 2840.71 |
| 1987 Apr | 3547.29 |
| 1987 May | 3752.96 |
| 1987 Jun | 3714.74 |
souvenirs |>
autoplot(Sales) +
labs(title = "Souvenir Sales")
lambda <-
souvenirs |>
features(Sales, features = guerrero) |>
pull(lambda_guerrero)
souvenirs |>
autoplot(box_cox(Sales, lambda)) +
labs(title = latex2exp::TeX(paste0(
"Transformed Souvenir Sales with $\\lambda$ = ",
round(lambda,3))),
y='')
First Order Differencing
Since we are provided monthly data, we will use a seasonality (lag)
of 12 for the difference() function.
trans_souvenirs <-
souvenirs |>
mutate(bc_trans = box_cox(Sales, lambda))
trans_souvenirs |>
autoplot(difference(bc_trans, lag=12)) +
labs(caption = latex2exp::TeX(paste0(
"Transformed Souvenir Sales with $\\lambda$ = ",
round(lambda,3))),
y='',
title = "First Order Differencing")
We still see issues of obtaining stationary data as our first two lags are outside our critical values.
trans_souvenirs |>
ACF(difference(bc_trans, lag=12)) |>
autoplot() +
labs(caption = latex2exp::TeX(paste0(
"Transformed Souvenir Sales with $\\lambda$ = ",
round(lambda,3))),
y='',
title = "ACF: First Order Differencing")
Second Order Differencing
trans_souvenirs |>
autoplot(difference(bc_trans, lag=12) |> difference()) +
labs(caption = latex2exp::TeX(paste0(
"Transformed Souvenir Sales with $\\lambda$ = ",
round(lambda,3))),
y='',
title = "Second Order Differencing")
After our second order, we have improved the data to become stationary with only the initial lag outside the boundaries.
trans_souvenirs |>
ACF(difference(bc_trans, lag=12) |> difference()) |>
autoplot() +
labs(caption = latex2exp::TeX(paste0(
"Transformed Souvenir Sales with $\\lambda$ = ",
round(lambda,3))),
y='',
title = "ACF: Second Order Differencing")
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(42)
retail <-
aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
kbl(head(retail)) |>
kable_classic(full_width = F, html_font = "Cambria")
| State | Industry | Series ID | Month | Turnover |
|---|---|---|---|---|
| Western Australia | Newspaper and book retailing | A3349822A | 1982 Apr | 9.7 |
| Western Australia | Newspaper and book retailing | A3349822A | 1982 May | 11.0 |
| Western Australia | Newspaper and book retailing | A3349822A | 1982 Jun | 10.7 |
| Western Australia | Newspaper and book retailing | A3349822A | 1982 Jul | 9.0 |
| Western Australia | Newspaper and book retailing | A3349822A | 1982 Aug | 9.1 |
| Western Australia | Newspaper and book retailing | A3349822A | 1982 Sep | 10.0 |
retail |>
autoplot(Turnover) +
labs(title = "Newspaper and Book Turnover")
lambda <-
retail |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
retail |>
autoplot(box_cox(Turnover, lambda)) +
labs(title = latex2exp::TeX(paste0(
"Transformed Newspaper and Book Turnover with $\\lambda$ = ",
round(lambda,3))),
y='')
First Order Differencing
Since we are provided monthly data, we will use a seasonality (lag)
of 12 for the difference() function.
trans_retail <-
retail |>
mutate(bc_trans = box_cox(Turnover, lambda))
trans_retail |>
autoplot(difference(bc_trans, lag=12)) +
labs(caption = latex2exp::TeX(paste0(
"Transformed Newspaper and Book Turnover with $\\lambda$ = ",
round(lambda,3))),
y='',
title = "First Order Differencing")
We still see a lot of issues of obtaining stationary data as multiple lags are outside our critical values.
trans_retail |>
ACF(difference(bc_trans, lag=12)) |>
autoplot() +
labs(caption = latex2exp::TeX(paste0(
"Transformed Newspaper and Book Turnover with $\\lambda$ = ",
round(lambda,3))),
y='',
title = "ACF: First Order Differencing")
Second Order Differencing
trans_retail |>
autoplot(difference(bc_trans, lag=12) |> difference()) +
labs(caption = latex2exp::TeX(paste0(
"Transformed Newspaper and Book Turnover with $\\lambda$ = ",
round(lambda,3))),
y='',
title = "Second Order Differencing")
After our second order, we have significantly improved the data to become stationary. However, at lag 12, we have a serious issue of autocorrelation occurring.
trans_retail |>
ACF(difference(bc_trans, lag=12) |> difference()) |>
autoplot() +
labs(caption = latex2exp::TeX(paste0(
"Transformed Newspaper and Book Turnover with $\\lambda$ = ",
round(lambda,3))),
y='',
title = "ACF: Second Order Differencing")
Simulate and plot some data from simple ARIMA models.
set.seed(42)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <-
tsibble(idx = seq_len(100), y = y, index = idx)
sim |>
autoplot(y) +
labs(title = "AR(1)", caption = "Phi = 0.6")
When \(\phi\) is adjusted throughout the time plot series, we can see the variation becomes much more stable as we approach zero, then becomes increasingly different when we \(\phi\) is approaching 1 again.
set.seed(42)
phi_step <-
seq(from = -1, to = 1, by = 0.25)
sim_phi <-
as.data.frame(matrix(ncol = 2, nrow = 0))
colnames(sim_phi) <- c("phi_val", "y")
for (j in 1:length(phi_step)){
y <-
numeric(100)
e <-
rnorm(100)
phi <-
phi_step[j]
phi_val <-
(y + 1) * phi
for (i in 2:100){
y[i] <- (phi * y[i-1]) + e[i]
}
idx <-
seq_len(100)
new_rows <-
cbind(idx, phi_val, y)
sim_phi <-
rbind(sim_phi, new_rows)
}
sim_phi |>
as_tsibble(index = idx, key = phi_val) |>
autoplot(y) +
facet_wrap(~ phi_val, ncol = 3) +
theme_bw() +
theme(legend.position = "none")
set.seed(42)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]
sim <-
tsibble(idx = seq_len(100), y = y, index = idx)
When we look at \(\theta\) approaching 0, we see that our values are closer to the mean. When it begins to approach towards 1, the variation in values begin to become larger again.
set.seed(42)
theta_step <-
seq(from = -1, to = 1, by = 0.25)
sim_theta <-
as.data.frame(matrix(ncol = 2, nrow = 0))
colnames(sim_theta) <- c("theta_val", "y")
for (j in 1:length(theta_step)){
y <-
numeric(100)
e <-
rnorm(100)
theta <-
theta_step[j]
theta_val <-
(y + 1) * theta
for (i in 2:100){
y[i] <- (theta * e[i-1]) + e[i]
}
idx <-
seq_len(100)
new_rows <-
cbind(idx, theta_val, y)
sim_theta <-
rbind(sim_theta, new_rows)
}
sim_theta |>
as_tsibble(index = idx, key = theta_val) |>
autoplot(y) +
facet_wrap(~ theta_val, ncol = 3) +
theme_bw() +
theme(legend.position = "none")
y <- numeric(100)
e <- rnorm(100)
for (i in 2:100){
y[i] <- (0.6 * y[i-1]) + (0.6 * e[i-1]) + e[i]
}
sim_arma <-
tsibble(idx = seq_len(100), y = y, index = idx)
y <- numeric(100)
e <- rnorm(100)
for (i in 3:100){
y[i] <- (-0.8 * y[i-1]) + (0.3 * y[i-2]) + e[i]
}
sim_ar <-
tsibble(idx = seq_len(100), y = y, index = idx)
ARMA(1,1) does not show any specific trend or seasonality. AR(2) starts from almost a straight line with approximately zero variance and begins to grow exponentially in variance as we go further through the series.
arma_plot <-
sim_arma |>
autoplot(y) +
labs(title = "ARMA(1,1)")
ar_plot <-
sim_ar |>
autoplot(y) +
labs(title="AR(2)" , caption = latex2exp::TeX("ARMA(1,1): ($\\phi_1 = 0.6$, $\\theta_1 = 0.6$)
AR(2): ($\\phi_1 = -0.8$, $\\phi_2 = 0.3$)"))
plot_grid(arma_plot, ar_plot, ncol=1)
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
kbl(head(aus_airpassengers)) |>
kable_classic(full_width = F, html_font = "Cambria")
| Year | Passengers |
|---|---|
| 1970 | 7.3187 |
| 1971 | 7.3266 |
| 1972 | 7.7956 |
| 1973 | 9.3846 |
| 1974 | 10.6647 |
| 1975 | 11.0551 |
ARIMA(0,2,1) model was selected
fit <- aus_airpassengers |>
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
We see the remainder as white noise.
fit |>
gg_tsresiduals()
pass_fc <-
fit |>
forecast(h="10 years")
pass_fc |>
autoplot(aus_airpassengers) +
labs(title = "Australian Air Passengers", caption = "Forecasting Next 10 Years")
\(y_t = (1 + \theta B) \epsilon_t(1-B2)\)
We can see the forecast is dampened and the prediction intervals appear to be narrower.
fit <-
aus_airpassengers |>
model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))
arima_fc_010 <-
fit |>
forecast(h="10 years")
arima_fc_010 |>
autoplot(aus_airpassengers) +
labs(title = "10 Year Forecast with Drift", caption = "ARIMA(0, 1, 0)")
The forecasts trends are similar, but now we have more variation within the forecasted years and not a straight line. When we removed the constant an error is given thar the forecasts can’t be created.
fit <-
aus_airpassengers |>
model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))
arima_fc_212 <-
fit |>
forecast(h="10 years")
arima_fc_212 |>
autoplot(aus_airpassengers) +
labs(title = "10 Year Forecast with Drift", caption = "ARIMA(2, 1, 2)")
We now get a warning
Warning: Model specification induces a quadratic or higher order polynomial trend.
This is generally discouraged, consider removing the constant or reducing the number of differences.
fit <-
aus_airpassengers |>
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
arima_fc_021 <-
fit |>
forecast(h="10 years")
arima_fc_021 |>
autoplot(aus_airpassengers) +
labs(title = "10 Year Forecast with Drift", caption = "ARIMA(0, 2, 1)")