library(fpp3)
library(tidyverse)
library(fable)
library(latex2exp)
Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
Since each of these graphs represent vastly different quantities of data we have large disparities in the dashed blue lines since they are dependent on the size of the time series. In addition the values get smaller with more data because the variance will decrease in the calculated acf due to having more data, this produces a smaller average over the course of all the lags. Since for all of these graphs the values all appear within the range of the dashed blue lines it is reasonable to assume that the data are all white noise.
The critical distances are determined by the length of the time series, specifically by the expression \(\pm\frac{1.96}{\sqrt{T}}\) where \(T\) is the length of the time series. Thus by increasing the length of the time series we have decreased the distance of the critical values. Since white noise is different they should be expected to be different because white noise should not be discernable from random changes.
A classic example of a non-stationary series are stock prices. Plot
the daily closing prices for Amazon stock (contained in
gafa_stock
), along with the ACF and PACF. Explain how each
plot shows that the series is non-stationary and should be
differenced.
The first plot of just the closing price shows that there is an upward trend to the data which would disqualify it from being stationary. The ACF plot shows that for all the lags the acf is outside the bounds of the dashed blue lines, this indicates that there is trend to the data since it is not just white noise. Similarly for the PACF there are at least 4 different lag values with an pacf outside the blue bounds indicating that the data is not white noise and thus non-stationary.
amzn_close <- gafa_stock |>
filter(Symbol == 'AMZN') |>
select(Close) |>
print()
## # A tsibble: 1,258 x 2 [!]
## Close Date
## <dbl> <date>
## 1 398. 2014-01-02
## 2 396. 2014-01-03
## 3 394. 2014-01-06
## 4 398. 2014-01-07
## 5 402. 2014-01-08
## 6 401. 2014-01-09
## 7 398. 2014-01-10
## 8 391. 2014-01-13
## 9 398. 2014-01-14
## 10 396. 2014-01-15
## # ℹ 1,248 more rows
amzn_close |>
gg_tsdisplay(plot_type = "partial")
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
global_economy.
#Showing initial data
turkish_gdp <- global_economy |>
filter(Country == 'Turkey') |>
select(GDP)
turkish_gdp |>
gg_tsdisplay(plot_type = "partial") +
ggtitle(paste("Turkish GDP"))
#Applying Box-Cox transform
lambda <- turkish_gdp |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
bc_turkish_gdp <- turkish_gdp |>
mutate(GDP = box_cox(turkish_gdp$GDP, lambda))
bc_turkish_gdp |>
gg_tsdisplay(plot_type = "partial") +
ggtitle(paste("Turkish GDP w/ Box-Cox = ", as.character(lambda)))
#Testing Box-Cox transform
bc_turkish_gdp |>
features(GDP, unitroot_kpss) |>
print()
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 1.50 0.01
#Finding suggested difference
bc_turkish_gdp |>
features(GDP, unitroot_ndiffs) |>
print()
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
#Applying difference
diff_bc_turkish_gdp <- bc_turkish_gdp |>
mutate(GDP = difference(GDP))
diff_bc_turkish_gdp |>
gg_tsdisplay(plot_type = "partial") +
ggtitle(paste("Turkish GDP w/ Box-Cox = ", as.character(lambda), " & Diff = 1"))
#Testing differenced data
diff_bc_turkish_gdp |>
features(GDP, unitroot_kpss) |>
print()
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.0889 0.1
aus_accommodation
.#Showing initial data
taz_take <- aus_accommodation |>
filter(State == 'Tasmania') |>
select(Takings)
taz_take |>
gg_tsdisplay(plot_type = "partial") +
ggtitle("Tasmania Takings")
#Applying Box-Cox transform
lambda <- taz_take |>
features(Takings, features = guerrero) |>
pull(lambda_guerrero)
bc_taz_take <- taz_take |>
mutate(Takings = box_cox(taz_take$Takings, lambda))
bc_taz_take |>
gg_tsdisplay(plot_type = "partial") +
ggtitle(paste("Tasmania Takings w/ Box-Cox = ", as.character(lambda)))
#Testing Box-Cox transform
bc_taz_take |>
features(Takings, unitroot_kpss) |>
print()
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 1.84 0.01
#Finding suggested difference
bc_taz_take |>
features(Takings, unitroot_ndiffs) |>
print()
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
#Applying difference
diff_bc_taz_take <- bc_taz_take |>
mutate(Takings = difference(Takings))
diff_bc_taz_take |>
gg_tsdisplay(plot_type = "partial") +
ggtitle(paste("Tasmania Takings w/ Box-Cox = ", as.character(lambda), " & Diff = 1"))
#Testing differenced data
diff_bc_taz_take |>
features(Takings, unitroot_kpss) |>
print()
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.258 0.1
souvenirs
.#Showing initial data
souvenirs |>
gg_tsdisplay(plot_type = "partial") +
ggtitle("Souvenirs")
#Applying Box-Cox transform
lambda <- souvenirs |>
features(Sales, features = guerrero) |>
pull(lambda_guerrero)
bc_souvenirs <- souvenirs |>
mutate(Sales = box_cox(souvenirs$Sales, lambda))
bc_souvenirs |>
gg_tsdisplay(plot_type = "partial") +
ggtitle(paste("Tasmania Sales w/ Box-Cox = ", as.character(lambda)))
#Testing Box-Cox transform
bc_souvenirs |>
features(Sales, unitroot_kpss) |>
print()
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 1.79 0.01
#Finding suggested difference
bc_souvenirs |>
features(Sales, unitroot_ndiffs) |>
print()
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
#Applying difference
diff_bc_souvenirs <- bc_souvenirs |>
mutate(Sales = difference(Sales))
diff_bc_souvenirs |>
gg_tsdisplay(plot_type = "partial") +
ggtitle(paste("Tasmania Sales w/ Box-Cox = ", as.character(lambda), " & Diff = 1"))
#Testing differenced data
diff_bc_souvenirs |>
features(Sales, unitroot_kpss) |>
print()
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.0631 0.1
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1)) |>
select(Turnover)
myseries |>
gg_tsdisplay(plot_type = "partial") +
ggtitle("Australian Turnover")
#Testing initial data
myseries |>
features(Turnover, unitroot_kpss) |>
print()
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 5.72 0.01
#Applying Box-Cox transform
lambda <- myseries |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
bc_myseries <- myseries |>
mutate(Turnover = box_cox(myseries$Turnover, lambda))
bc_myseries |>
gg_tsdisplay(plot_type = "partial") +
ggtitle(paste("Australian Turnover w/ Box-Cox = ", as.character(lambda)))
#Testing Box-Cox transform
bc_myseries |>
features(Turnover, unitroot_kpss) |>
print()
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 5.43 0.01
#Finding suggested difference
bc_myseries |>
features(Turnover, unitroot_ndiffs) |>
print()
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
#Applying difference
diff_bc_myseries <- bc_myseries |>
mutate(Turnover = difference(Turnover))
diff_bc_myseries |>
gg_tsdisplay(plot_type = "partial") +
ggtitle(paste("Australian Turnover w/ Box-Cox = ", as.character(lambda), " & Diff = 1"))
#Testing differenced data
diff_bc_myseries |>
features(Turnover, unitroot_kpss) |>
print()
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.0243 0.1
Simulate and plot some data from simple ARIMA models.
set.seed(314156295)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim |> autoplot()
for(i in 2:100)
y[i] <- 0.0*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\phi_1 = 0$'))
for(i in 2:100)
y[i] <- 0.2*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\phi_1 = 0.2$'))
for(i in 2:100)
y[i] <- 0.4*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\phi_1 = 0.4$'))
sim |> autoplot() + ggtitle(TeX('$\\phi_1 = 0.6$'))
for(i in 2:100)
y[i] <- 0.8*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\phi_1 = 0.8$'))
for(i in 2:100)
y[i] <- 1.0*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\phi_1 = 1.0$'))
set.seed(314156295)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim |> autoplot()
for(i in 2:100)
y[i] <- 0.0*e[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\theta_1 = 0$'))
for(i in 2:100)
y[i] <- 0.2*e[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\theta_1 = 0.2$'))
for(i in 2:100)
y[i] <- 0.4*e[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\theta_1 = 0.4$'))
sim |> autoplot() + ggtitle(TeX('$\\theta_1 = 0.6$'))
for(i in 2:100)
y[i] <- 0.8*e[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\theta_1 = 0.8$'))
for(i in 2:100)
y[i] <- 1.0*e[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\theta_1 = 1.0$'))
set.seed(314156295)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]
arma11 <- tsibble(idx = seq_len(100), y = y, index = idx)
set.seed(314156295)
y <- numeric(100)
e <- rnorm(100)
for(i in 3:100)
y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]
ar2 <- tsibble(idx = seq_len(100), y = y, index = idx)
arma11 |> autoplot() + ggtitle(TeX('ARMA(1,1) $\\phi_1 = 0.6$, $\\theta_1 = 0.6$, $\\sigma^2 = 1$'))
ar2 |> autoplot() + ggtitle(TeX('AR(2) $\\phi_1 = -0.8$, $\\phi_2 = 0.3$, $\\sigma^2 = 1$'))
Consider aus_airpassengers
, the total number of
passengers (in millions) from Australian air carriers for the period
1970-2011.
ARIMA()
to find an appropriate ARIMA model. What
model was selected. Check that the residuals look like white noise. Plot
forecasts for the next 10 periods.ARIMA(0,2,1) was selected and the residuals do indeed look like white noise with a normal distribution and the ACF being within the dashed bounds.
arima_air <- aus_airpassengers|>
model(arima = ARIMA(Passengers)) |>
print()
## # A mable: 1 x 1
## arima
## <model>
## 1 <ARIMA(0,2,1)>
arima_air |>
select(arima) |>
gg_tsresiduals()
arima_air |>
forecast(h=10) |>
filter(.model=='arima') |>
autoplot(aus_airpassengers)
Write the model in terms of the backshift operator. \[\begin{align*} (1-B)^2y_t &= c+(1+\theta_1B)\varepsilon_t \\ (1-2B+B^2)y_t &= c+(1+\theta_1B)\varepsilon_t \\ y_t-2By_t+B^2y_t &= c+\varepsilon_t+\theta_1B\varepsilon_t \\ y_t-2y_{t-1}+y_{t-2} &= c+\varepsilon_t+\theta_1\varepsilon_{t-1} \\ y_t &= 2y_{t-1}-y_{t-2} + c+\varepsilon_t+\theta_1\varepsilon_{t-1} \end{align*}\]
Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
The slope of the forecast is slightly shallower and the error projection has a different shape but it remains largely the same.
arima_air <- aus_airpassengers|>
model(arima = ARIMA(Passengers~ 1+pdq(0,1,0)))
arima_air |>
forecast(h=10) |>
filter(.model=='arima') |>
autoplot(aus_airpassengers)
This model looks a lot like c but instead of being linear it has a bit of a wobble likely due to the increase in the moving average order. Removing the constant leads to an error in the ARIMA model, the constant is needed for this order selection.
arima_air <- aus_airpassengers|>
model(arima = ARIMA(Passengers~ 1+pdq(2,1,2)))
arima_air |>
forecast(h=10) |>
filter(.model=='arima') |>
autoplot(aus_airpassengers)
arima_air <- aus_airpassengers|>
model(arima = ARIMA(Passengers~ pdq(2,1,2)))
The model appears to have a bit more of an exponential shape to it rather than the linear models from a and c. The error bounds appear to be smaller as well but it is hard to tell if this is from the model itself or because the scale of the axes changed.
arima_air <- aus_airpassengers|>
model(arima = ARIMA(Passengers~ 1+pdq(0,2,1)))
arima_air |>
forecast(h=10) |>
filter(.model=='arima') |>
autoplot(aus_airpassengers)
For the United States GDP series (from
global_economy
):
us_gdp <- global_economy |>
filter(Country == 'United States') |>
select(GDP)
us_gdp |> autoplot()
lambda <- us_gdp |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
bc_us_gdp <- us_gdp |>
mutate(GDP = box_cox(us_gdp$GDP, lambda))
bc_us_gdp |>
gg_tsdisplay(plot_type = "partial") +
ggtitle(paste("Australian Turnover w/ Box-Cox = ", as.character(lambda)))
ARIMA()
arima_us_gdp <- bc_us_gdp |>
model(arima = ARIMA()) |>
print()
## # A mable: 1 x 1
## arima
## <model>
## 1 <ARIMA(1,1,0) w/ drift>
arima_us_gdp <- bc_us_gdp |>
model(arima100 = ARIMA(GDP ~ 1+pdq(1,0,0)),
arima011 = ARIMA(GDP ~ 1+pdq(0,1,1)),
arima001 = ARIMA(GDP ~ 1+pdq(0,0,1)),
arima210 = ARIMA(GDP ~ 1+pdq(2,1,0)),
arima220 = ARIMA(GDP ~ 1+pdq(2,2,0)),
arima120 = ARIMA(GDP ~ 1+pdq(1,2,0)),
arima = ARIMA()) |>
print()
## # A mable: 1 x 7
## arima100 arima011 arima001
## <model> <model> <model>
## 1 <ARIMA(1,0,0) w/ mean> <ARIMA(0,1,1) w/ drift> <ARIMA(0,0,1) w/ mean>
## # ℹ 4 more variables: arima210 <model>, arima220 <model>, arima120 <model>,
## # arima <model>
arima_us_gdp |>
glance() |>
arrange(AICc) |>
print()
## # A tibble: 7 × 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 arima 5479. -325. 657. 657. 663. <cpl [1]> <cpl [0]>
## 2 arima220 6592. -324. 656. 657. 665. <cpl [2]> <cpl [0]>
## 3 arima120 6898. -326. 658. 658. 664. <cpl [1]> <cpl [0]>
## 4 arima011 5689. -326. 659. 659. 665. <cpl [0]> <cpl [1]>
## 5 arima210 5580. -325. 659. 659. 667. <cpl [2]> <cpl [0]>
## 6 arima100 57172. -402. 810. 811. 817. <cpl [1]> <cpl [0]>
## 7 arima001 4263605. -526. 1058. 1058. 1064. <cpl [0]> <cpl [1]>
I chose the recommended ARIMA model because it has the best values for AIC, AICc and BIC. The residuals looks good and the acf values are well within the bounds of the blue dashed lines.
arima_us_gdp |>
select(arima) |>
gg_tsresiduals()
e) produce forecasts of your fitted model. Do the forecasts look
reasonable?
The forecast looks extremely reasonable, like a natural extension of the graph.
arima_us_gdp |>
forecast(h=10) |>
filter(.model=='arima') |>
autoplot(bc_us_gdp)
f) compare the results with what you would obtain using
ETS()
(with no transformation).
It seems that ARIMA has much tighter error bounds for the forecast than the ETS forecast.
bc_us_gdp |>
model(ETS(GDP)) |>
forecast(h = 10) |>
autoplot(bc_us_gdp)