Do the exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman. (https://otexts.com/fpp3/)
library(fpp3)
library(tidyverse)
library(gridExtra)
library(urca)
Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
Explain the differences among these figures. Do they all indicate that the data are white noise? The variations among these figures lie in the fact that the significant thresholds (the horizontal dashed lines) differ between the models. All of them indicate white noise because, for white noise series, we anticipate each autocorrelation to be close to zero, which these are.
Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise? Critical values are at varying distances from the zero mean because white noise data is expected to fall within the bounds of +-2/(sqrt(T)). As T increases (36, 360, 1000), the bounds or limits shrink, leading to smaller critical values.
A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
gafa_stock_Amazon <- gafa_stock %>%
filter(Symbol == "AMZN")
gafa_stock_Amazon %>%
autoplot(Close)+
labs(y="Close", title="Closing Prices for Amazon Stock")
p1 <- gafa_stock_Amazon %>%
ACF(Close) %>%
autoplot() +
labs(title="ACF Closing Prices for Amazon Stock")
p2 <-gafa_stock_Amazon %>%
PACF(Close) %>%
autoplot() +
labs(title="PACF Closing Prices for Amazon Stock")
grid.arrange(p1, p2, nrow = 1)
In the graph depicting the closing prices for Amazon stock, a positive trend is evident. In the ACF graph, a gradual decrease of data towards zero is observed. The PACF graph exhibits a prominent initial spike. These observations indicate non-stationary series, implying that they need to be differenced to attain stationarity.
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
global_economy_Turkey <- global_economy %>%
filter(Country == "Turkey")
global_economy_Turkey %>%
autoplot(GDP)+
labs(title="Turkish GDP")
lambda_Turkey <- global_economy_Turkey %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
global_economy_Turkey %>%
autoplot(box_cox(GDP, lambda_Turkey)) +
labs(y = "",title = latex2exp::TeX(paste0(
"Transformed Turkish GDP with $\\lambda = ",
round(lambda_Turkey,2))))
global_economy_Turkey %>%
features(box_cox(GDP, lambda_Turkey), unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 Turkey 1
aus_accommodation_Tasmania <- aus_accommodation%>%
filter(State == "Tasmania")
aus_accommodation_Tasmania %>%
autoplot(Takings)+
labs(title="Accommodation Takings in the State of Tasmania")
lambda_Tasmania <- aus_accommodation_Tasmania %>%
features(Takings, features = guerrero) %>%
pull(lambda_guerrero)
aus_accommodation_Tasmania %>%
autoplot(box_cox(Takings, lambda_Tasmania)) +
labs(y = "",title = latex2exp::TeX(paste0(
"Transformed Accommodation Takings in the State of Tasmania with $\\lambda = ",
round(lambda_Tasmania,2))))
aus_accommodation_Tasmania %>%
features(box_cox(Takings, lambda_Tasmania), unitroot_ndiffs)
## # A tibble: 1 × 2
## State ndiffs
## <chr> <int>
## 1 Tasmania 1
souvenirs_sales <- souvenirs
souvenirs_sales %>%
autoplot(Sales)+
labs(title="Sales from Souvenirs")
lambda_sales <- souvenirs_sales %>%
features(Sales, features = guerrero) %>%
pull(lambda_guerrero)
souvenirs_sales %>%
autoplot(box_cox(Sales, lambda_sales)) +
labs(y = "",title = latex2exp::TeX(paste0(
"Transformed Sales from Souvenirs with $\\lambda = ",
round(lambda_sales,2))))
souvenirs_sales %>%
features(box_cox(Sales, lambda_sales), unitroot_ndiffs)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(12345678)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(myseries,Turnover)
lambda_turnover <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
myseries %>%
autoplot(box_cox(Turnover, lambda_turnover)) +
labs(y = "",title = latex2exp::TeX(paste0(
"Transformed with $\\lambda = ",
round(lambda_turnover,2))))
myseries %>%
features(box_cox(Turnover, lambda_turnover), unitroot_ndiffs)
## # A tibble: 1 × 3
## State Industry ndiffs
## <chr> <chr> <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing 1
myseries %>%
autoplot(difference(box_cox(Turnover, lambda_turnover), 12))
p3 <- myseries %>%
ACF(difference(box_cox(Turnover, lambda_turnover), 12)) %>%
autoplot() +
labs(title="ACF")
p4 <- myseries %>%
PACF(difference(box_cox(Turnover, lambda_turnover), 12)) %>%
autoplot() +
labs(title="PACF")
grid.arrange(p3, p4, nrow = 1)
Simulate and plot some data from simple ARIMA models.
set.seed(24)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
set.seed(24)
generate_tsibble <- function(phi) {
y <- numeric(100)
e <- rnorm(100)
y[1] <- e[1]
for(i in 2:100) {
y[i] <- phi * y[i-1] + e[i]
}
tsibble(idx = seq_len(100), y = y, index = idx)
}
phi_1 <- generate_tsibble(0.6)
phi_2 <- generate_tsibble(-0.6)
phi_3 <- generate_tsibble(0.01)
phi_4 <- generate_tsibble(0.99)
phi_5 <- generate_tsibble(0)
set.seed(24)
phi_values <- c(0.6, -0.6, 0.01, 0.99, 0)
plot_list <- list()
for (i in seq_along(phi_values)) {
plot_list[[i]] <- generate_tsibble(phi_values[i]) %>%
autoplot(y) +
labs(paste("Phi =", phi_values[i]))
}
p5 <- plot_list[[1]]
p6 <- plot_list[[2]]
p7 <- plot_list[[3]]
p8 <- plot_list[[4]]
p9 <- plot_list[[5]]
grid.arrange(p5, p6, p7, p8, p9, nrow = 2)
When phi is close to or equal to zero, the graph resembles
white noise. As phi approaches one, the graph resembles a random walk,
and when phi is less than zero, it tends to oscillate around the
mean.
set.seed(24)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] + 0.6 * e[i-1]
sim_ma <- tsibble(idx = seq_len(100), y = y, index = idx)
set.seed(24)
generate_tsibble2 <- function(theta) {
y <- numeric(100)
e <- rnorm(100)
y[1] <- e[1]
for(i in 2:100) {
y[i] <- e[i] + theta * e[i-1]
}
tsibble(idx = seq_len(100), y = y, index = idx)
}
theta_1 <- generate_tsibble2(0.6)
theta_2 <- generate_tsibble2(-0.6)
theta_3 <- generate_tsibble2(0.01)
theta_4 <- generate_tsibble2(0.99)
theta_5 <- generate_tsibble2(0)
set.seed(24)
theta_values <- c(0.6, -0.6, 0.01, 0.99, 0)
plot_list2 <- list()
for (i in seq_along(theta_values)) {
plot_list2[[i]] <- generate_tsibble2(theta_values[i]) %>%
autoplot(y) +
labs(paste("Theta =", theta_values[i]))
}
p10 <- plot_list2[[1]]
p11 <- plot_list2[[2]]
p12 <- plot_list2[[3]]
p13 <- plot_list2[[4]]
p14 <- plot_list2[[5]]
grid.arrange(p10, p11, p12, p13, p14, nrow = 2)
When theta is close to zero or zero itself, the MA(1) model
behaves more like white noise. When theta_1 is positive, the plot might
exhibit a dampened oscillatory pattern. When theta_1 is negative, the
plot may show an amplified oscillatory behavior.
set.seed(24)
y <- numeric(100)
e <- rnorm(100)
for(i in 3:100){
y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]
}
arme <- tsibble(idx = seq_len(100), y = y, index = idx)
set.seed(24)
y <- numeric(100)
e <- rnorm(100)
for(i in 3:100){
y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]
}
ar2 <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(arme) +
labs(title="ARMA(1,1) Model")
autoplot(ar2) +
labs(title="AR(2) Model")
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
autoplot(aus_airpassengers)
fit_passengers <- aus_airpassengers %>%
model(ARIMA(Passengers))
report(fit_passengers)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
fit_passengers %>% gg_tsresiduals()
fit_passengers %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
b. Write the model in terms of the backshift operator.
ARIMA(0,2,1)the backshift operator as: (1 - (theta)B^2)Y_t = (error term
at time t)
fit_passengers2 <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))
report(fit_passengers2)
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
## constant
## 1.4191
## s.e. 0.3014
##
## sigma^2 estimated as 4.271: log likelihood=-98.16
## AIC=200.31 AICc=200.59 BIC=203.97
fit_passengers2 %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
The forecasts of part a and c appear quite similar, showing an
upward trend.
fit_passengers3 <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))
report(fit_passengers3)
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## -0.5518 -0.7327 0.5895 1.0000 3.2216
## s.e. 0.1684 0.1224 0.0916 0.1008 0.7224
##
## sigma^2 estimated as 4.031: log likelihood=-96.23
## AIC=204.46 AICc=206.61 BIC=215.43
fit_passengers3 %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
This forecast seems to incorporate some curvature, unlike
parts A and C where only a positive linear trend is present.
Nonetheless, it remains very similar to the other forecasts.
fit_passengers4 <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
report(fit_passengers4)
## Series: Passengers
## Model: NULL model
## NULL model
When the constant is removed, the model becomes non-stationary and null.
fit_passengers5 <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
report(fit_passengers5)
## Series: Passengers
## Model: ARIMA(0,2,1) w/ poly
##
## Coefficients:
## ma1 constant
## -1.0000 0.0571
## s.e. 0.0585 0.0213
##
## sigma^2 estimated as 3.855: log likelihood=-95.1
## AIC=196.21 AICc=196.79 BIC=201.63
R discourages forecasts with quadratic trends in ARIMA models because introducing higher-order polynomial trends can lead to overfitting and increased complexity.
For the United States GDP series (from global_economy):
global_economy_United_States <- global_economy %>%
filter(Country == "United States")
global_economy_United_States %>%
autoplot(GDP)+
labs(title="United States GDP")
if necessary, find a suitable Box-Cox transformation for the data; The Box-Cox transformation is unnecessary because the variance of the time series data remains relatively stable over time.
fit a suitable ARIMA model to the transformed data using ARIMA();
fit_United_States <- global_economy_United_States %>%
model(ARIMA(GDP))
report(fit_United_States)
## Series: GDP
## Model: ARIMA(0,2,2)
##
## Coefficients:
## ma1 ma2
## -0.4206 -0.3048
## s.e. 0.1197 0.1078
##
## sigma^2 estimated as 2.615e+22: log likelihood=-1524.08
## AIC=3054.15 AICc=3054.61 BIC=3060.23
fit_models <- global_economy_United_States %>%
model("ARIMA(0,2,2)" = ARIMA(GDP),
"ARIMA(0,2,1)" = ARIMA(GDP ~ pdq(0, 2, 1)),
"ARIMA(0,2,0)" = ARIMA(GDP ~ pdq(0, 2, 0)))
report(fit_models)
## # A tibble: 3 × 9
## Country .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 United States ARIMA(0,2,2) 2.61e22 -1524. 3054. 3055. 3060. <cpl> <cpl>
## 2 United States ARIMA(0,2,1) 2.92e22 -1528. 3059. 3059. 3063. <cpl> <cpl>
## 3 United States ARIMA(0,2,0) 3.10e22 -1530. 3061. 3061. 3063. <cpl> <cpl>
fit_models %>%
forecast(h = 10) %>%
autoplot(global_economy_United_States) +
ggtitle("US GDP Forecast")
fit_United_States %>%
gg_tsresiduals() +
labs("ARIMA(0, 2, 2) Residual Plots")
fit_United_States %>%
forecast(h = 10) %>%
autoplot(global_economy_United_States) +
ggtitle("US GDP Forecast")
fit_compare <- global_economy_United_States %>%
model(ARIMA = ARIMA(GDP),
ETS = ETS(GDP))
glance(fit_compare)
## # A tibble: 2 × 12
## Country .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots MSE
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list> <dbl>
## 1 United S… ARIMA 2.61e+22 -1524. 3054. 3055. 3060. <cpl> <cpl> NA
## 2 United S… ETS 6.78e- 4 -1590. 3191. 3192. 3201. <NULL> <NULL> 2.79e22
## # ℹ 2 more variables: AMSE <dbl>, MAE <dbl>
fit_compare %>%
forecast(h = 10) %>%
autoplot(global_economy_United_States) +
ggtitle("US GDP Forecast")