remove(list = ls())
library(fpp3)
library(tidyverse)
conflicted::conflicts_prefer(fabletools::report)
conflicted::conflicts_prefer(dplyr::filter)
conflicted::conflicts_prefer(lubridate::year)
conflicted::conflicts_prefer(lubridate::quarter)ARIMA Models: Theory and Implementation
Introduction to ARIMA Models
ARIMA (AutoRegressive Integrated Moving Average) models are a cornerstone of time series forecasting. They combine three key components to capture different patterns in temporal data:
- AR (AutoRegressive): Uses past values to predict future values
- I (Integrated): Applies differencing to achieve stationarity
- MA (Moving Average): Uses past forecast errors to improve predictions
ARIMA Notation: ARIMA(p, d, q)
The general ARIMA model is denoted as ARIMA(p, d, q) where:
- p: Order of autoregression (number of lagged observations)
- d: Degree of differencing (number of times data is differenced)
- q: Order of moving average (number of lagged forecast errors)
Mathematical Representation
The ARIMA(p, d, q) model can be written as:
\[(1 - \phi_1L - \phi_2L^2 - ... - \phi_pL^p)(1-L)^d y_t = c + (1 + \theta_1L + \theta_2L^2 + ... + \theta_qL^q)\epsilon_t\]
Where: - \(L\) is the lag operator - \(\phi_i\) are the autoregressive parameters - \(\theta_j\) are the moving average parameters - \(\epsilon_t\) is white noise - \(c\) is a constant
Key Concepts in ARIMA Modeling
1. Stationarity
A stationary time series has: - Constant mean over time - Constant variance over time - Autocovariance that doesn’t depend on time
Testing for Stationarity
2. Differencing
Differencing removes trends and achieves stationarity:
- First difference: \(\Delta y_t = y_t - y_{t-1}\) (removes linear trend)
- Second difference: \(\Delta^2 y_t = \Delta y_t - \Delta y_{t-1}\) (removes quadratic trend)
- Seasonal difference: \(\Delta_s y_t = y_t - y_{t-s}\) (removes seasonal pattern)
# Demonstrate differencing
aus_production %>%
mutate(
diff_beer = difference(Beer),
diff2_beer = difference(Beer, differences = 2),
seasonal_diff = difference(Beer, lag = 4)
) %>%
pivot_longer(
cols = c(Beer, diff_beer, diff2_beer, seasonal_diff),
names_to = "transformation",
values_to = "value"
) %>%
autoplot(value) +
facet_wrap(~ transformation, scales = "free_y", ncol = 2) +
labs(title = "Effects of Different Types of Differencing")3. ACF and PACF Analysis
The ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) help identify ARIMA parameters:
- ACF: Helps identify MA(q) order
- PACF: Helps identify AR(p) order
Model Identification Guidelines
Box-Jenkins Method
- Plot the data: Identify trends, seasonality, and outliers
- Transform if needed: Apply log or Box-Cox transformation for variance stabilization
- Difference if needed: Achieve stationarity through differencing
- Examine ACF/PACF: Identify potential p and q values
- Estimate models: Fit several candidate models
- Diagnostic checking: Verify residuals are white noise
- Forecast: Use the best model for prediction
Pattern Recognition Table
| ACF Pattern | PACF Pattern | Suggested Model |
|---|---|---|
| Exponential decay or damped sine wave | Significant spike at lag 1, no correlation for other lags | AR(1) |
| Exponential decay or damped sine wave | Significant spikes at lags 1 to p | AR(p) |
| Significant spike at lag 1, no correlation for other lags | Exponential decay or damped sine wave | MA(1) |
| Significant spikes at lags 1 to q | Exponential decay or damped sine wave | MA(q) |
| Exponential decay | Exponential decay | ARMA(p,q) |
ARIMA Model Fitting in R
Manual ARIMA Specification
# Fit a specific ARIMA model
beer_model <- aus_production %>%
model(
arima_210 = ARIMA(Beer ~ 0 + pdq(2,1,0)),
arima_011 = ARIMA(Beer ~ 0 + pdq(0,1,1)),
arima_111 = ARIMA(Beer ~ 0 + pdq(1,1,1))
)
# Compare models using information criteria
glance(beer_model) %>%
arrange(AICc) %>%
select(.model, AIC, AICc, BIC)Automatic ARIMA Selection
The ARIMA() function in fpp3 automatically selects the best ARIMA model based on information criteria.
# Automatic ARIMA selection
auto_arima_model <- aus_production %>%
model(
auto = ARIMA(Beer),
stepwise = ARIMA(Beer, stepwise = TRUE),
search = ARIMA(Beer, stepwise = FALSE)
)
# Display selected models
auto_arima_modelSeries: Beer
Model: ARIMA(1,1,2)(0,1,1)[4]
Coefficients:
ar1 ma1 ma2 sma1
0.0495 -1.0091 0.3746 -0.7434
s.e. 0.1959 0.1826 0.1530 0.0502
sigma^2 estimated as 241.3: log likelihood=-886.41
AIC=1782.82 AICc=1783.11 BIC=1799.63
Understanding Auto ARIMA Algorithm
The Hyndman-Khandakar Algorithm
The automatic ARIMA selection follows these steps:
-
Determine differencing order (d):
- Uses unit root tests (KPSS test)
- Typically d ∈ {0, 1, 2}
-
Determine seasonal differencing (D) for seasonal data:
- Uses seasonal unit root tests
- Usually D ∈ {0, 1}
-
Select p and q by minimizing information criteria:
- Default uses AICc (corrected AIC)
- Can also use AIC or BIC
-
Stepwise search (default) or exhaustive search:
- Stepwise: Starts with simple model and adds/removes terms
- Exhaustive: Tests all combinations within constraints
# Demonstrate that auto ARIMA and best ARIMA give same results
set.seed(123)
tsibble_data <- aus_production %>%
filter(year(Quarter) >= 1990)
# Method 1: Auto ARIMA (automatic selection)
auto_model <- tsibble_data %>%
model(auto_arima = ARIMA(Beer))
# Method 2: Grid search to find best ARIMA
# Define search space
p_values <- 0:3
d_values <- 0:2
q_values <- 0:3
# Create all combinations
arima_grid <- expand.grid(p = p_values, d = d_values, q = q_values)
# Fit all models
all_models <- map_dfr(1:nrow(arima_grid), function(i) {
tryCatch({
model_fit <- tsibble_data %>%
model(ARIMA(Beer ~ 0 + pdq(arima_grid$p[i],
arima_grid$d[i],
arima_grid$q[i])))
tibble(
p = arima_grid$p[i],
d = arima_grid$d[i],
q = arima_grid$q[i],
AICc = glance(model_fit)$AICc
)
}, error = function(e) {
tibble(p = NA, d = NA, q = NA, AICc = NA)
})
})
# Find best model based on AICc
best_manual <- all_models %>%
filter(!is.na(AICc)) %>%
arrange(AICc) %>%
slice(1)
cat("Best model from grid search: ARIMA(",
best_manual$p, ",", best_manual$d, ",", best_manual$q, ")\n", sep = "")Best model from grid search: ARIMA(0,1,2)
cat("AICc:", best_manual$AICc, "\n\n")AICc: 641.4609
Auto ARIMA model:
report(auto_model)Series: Beer
Model: ARIMA(1,0,2)(0,1,1)[4]
Coefficients:
ar1 ma1 ma2 sma1
0.9805 -1.0080 0.2221 -0.6351
s.e. 0.0300 0.1182 0.1161 0.1019
sigma^2 estimated as 217.2: log likelihood=-319.53
AIC=649.06 AICc=649.89 BIC=660.84
cat("\nBest ARIMA from grid search:\n")
Best ARIMA from grid search:
report(best_model)Series: Beer
Model: ARIMA(0,1,2)(0,1,1)[4]
Coefficients:
ma1 ma2 sma1
-1.0202 0.2196 -0.6285
s.e. 0.1139 0.1136 0.1017
sigma^2 estimated as 215.7: log likelihood=-316.45
AIC=640.91 AICc=641.46 BIC=650.28
# Verify they produce the same forecasts
auto_forecast <- auto_model %>%
forecast(h = 8)
best_forecast <- best_model %>%
forecast(h = 8)
comparison <- bind_rows(
auto_forecast %>% mutate(method = "Auto ARIMA"),
best_forecast %>% mutate(method = "Best ARIMA")
)
comparison %>%
autoplot(tsibble_data) +
labs(title = "Comparison of Auto ARIMA and Best ARIMA Forecasts",
subtitle = "Both methods produce identical results")Seasonal ARIMA Models
SARIMA Notation: ARIMA(p,d,q)(P,D,Q)[m]
Seasonal ARIMA extends regular ARIMA to handle seasonal patterns:
- (p,d,q): Non-seasonal part
- (P,D,Q): Seasonal part
- [m]: Seasonal period (e.g., 12 for monthly, 4 for quarterly)
mdl_df [1 × 2] (S3: mdl_df/tbl_df/tbl/data.frame)
$ auto_seasonal : lst_mdl [1:1]
..$ :List of 5
.. ..$ fit :List of 6
.. .. ..$ par : tibble [4 × 5] (S3: tbl_df/tbl/data.frame)
.. .. .. ..$ term : chr [1:4] "ar1" "ma1" "ma2" "sma1"
.. .. .. ..$ estimate : num [1:4] 0.981 -1.008 0.222 -0.635
.. .. .. ..$ std.error: Named num [1:4] 0.03 0.118 0.116 0.102
.. .. .. .. ..- attr(*, "names")= chr [1:4] "ar1" "ma1" "ma2" "sma1"
.. .. .. ..$ statistic: Named num [1:4] 32.65 -8.53 1.91 -6.23
.. .. .. .. ..- attr(*, "names")= chr [1:4] "ar1" "ma1" "ma2" "sma1"
.. .. .. ..$ p.value : Named num [1:4] 3.04e-47 8.82e-13 5.94e-02 2.20e-08
.. .. .. .. ..- attr(*, "names")= chr [1:4] "ar1" "ma1" "ma2" "sma1"
.. .. ..$ est : tibble [82 × 3] (S3: tbl_df/tbl/data.frame)
.. .. .. ..$ .fitted : num [1:82] 485 441 429 598 482 ...
.. .. .. ..$ .resid : num [1:82] 0.485 0.441 0.429 0.599 -17.756 ...
.. .. .. ..$ .regression_resid: num [1:82] 485 441 429 599 464 424 436 574 443 410 ...
.. .. ..$ fit : tibble [1 × 7] (S3: tbl_df/tbl/data.frame)
.. .. .. ..$ sigma2 : Named num 217
.. .. .. .. ..- attr(*, "names")= chr "year"
.. .. .. ..$ log_lik : num -320
.. .. .. ..$ AIC : num 649
.. .. .. ..$ AICc : Named num 650
.. .. .. .. ..- attr(*, "names")= chr "year"
.. .. .. ..$ BIC : Named num 661
.. .. .. .. ..- attr(*, "names")= chr "year"
.. .. .. ..$ ar_roots:List of 1
.. .. .. .. ..$ : cplx 1.02+0i
.. .. .. ..$ ma_roots:List of 1
.. .. .. .. ..$ : cplx [1:6] -2.68e-17+1.12i -1.12+4.34e-17i -8.81e-17-1.12i ...
.. .. ..$ tsp :List of 2
.. .. .. ..$ range : qtr [1:2] 1990 Q1, 2010 Q2
.. .. .. .. ..@ fiscal_start: num 1
.. .. .. ..$ interval: interval [1:1] 1Q
.. .. .. .. ..@ .regular: logi TRUE
.. .. ..$ spec :'data.frame': 1 obs. of 8 variables:
.. .. .. ..$ p : int 1
.. .. .. ..$ d : int 0
.. .. .. ..$ q : int 2
.. .. .. ..$ P : int 0
.. .. .. ..$ D : int 1
.. .. .. ..$ Q : int 1
.. .. .. ..$ constant: logi FALSE
.. .. .. ..$ period : num 4
.. .. .. ..- attr(*, "out.attrs")=List of 2
.. .. .. .. ..$ dim : Named int [1:7] 6 3 6 3 2 3 2
.. .. .. .. .. ..- attr(*, "names")= chr [1:7] "p" "d" "q" "P" ...
.. .. .. .. ..$ dimnames:List of 7
.. .. .. .. .. ..$ p : chr [1:6] "p=0" "p=1" "p=2" "p=3" ...
.. .. .. .. .. ..$ d : chr [1:3] "d=0" "d=1" "d=2"
.. .. .. .. .. ..$ q : chr [1:6] "q=0" "q=1" "q=2" "q=3" ...
.. .. .. .. .. ..$ P : chr [1:3] "P=0" "P=1" "P=2"
.. .. .. .. .. ..$ D : chr [1:2] "D=0" "D=1"
.. .. .. .. .. ..$ Q : chr [1:3] "Q=0" "Q=1" "Q=2"
.. .. .. .. .. ..$ constant: chr [1:2] "constant=TRUE" "constant=FALSE"
.. .. ..$ model:List of 16
.. .. .. ..$ coef : Named num [1:4] 0.981 -1.008 0.222 -0.635
.. .. .. .. ..- attr(*, "names")= chr [1:4] "ar1" "ma1" "ma2" "sma1"
.. .. .. ..$ sigma2 : Named num 217
.. .. .. .. ..- attr(*, "names")= chr "year"
.. .. .. ..$ var.coef : num [1:4, 1:4] 0.000902 -0.001086 -0.000384 -0.000374 -0.001086 ...
.. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. ..$ : chr [1:4] "ar1" "ma1" "ma2" "sma1"
.. .. .. .. .. ..$ : chr [1:4] "ar1" "ma1" "ma2" "sma1"
.. .. .. ..$ mask : logi [1:4] TRUE TRUE TRUE TRUE
.. .. .. ..$ loglik : num -320
.. .. .. ..$ aic : num 649
.. .. .. ..$ arma : int [1:7] 1 2 0 1 4 0 1
.. .. .. ..$ residuals: Time-Series [1:82] from 1 to 21.2: 0.485 0.441 0.429 0.599 -17.756 ...
.. .. .. ..$ call : language .f(x = ..1, order = ..2, seasonal = ..3, xreg = ..4, include.mean = FALSE, fixed = ..7, method = ..5)
.. .. .. ..$ series : chr "y"
.. .. .. ..$ code : int 0
.. .. .. ..$ n.cond : int 0
.. .. .. ..$ nobs : int 78
.. .. .. ..$ model :List of 10
.. .. .. .. ..$ phi : num 0.981
.. .. .. .. ..$ theta: num [1:6] -1.008 0.222 0 -0.635 0.64 ...
.. .. .. .. ..$ Delta: num [1:4] 0 0 0 1
.. .. .. .. ..$ Z : num [1:11] 1 0 0 0 0 0 0 0 0 0 ...
.. .. .. .. ..$ a : num [1:11] -24 17.422 0.359 7.881 6.587 ...
.. .. .. .. ..$ P : num [1:11, 1:11] 0.00 0.00 8.33e-17 -2.71e-17 0.00 ...
.. .. .. .. ..$ T : num [1:11, 1:11] 0.981 0 0 0 0 ...
.. .. .. .. ..$ V : num [1:11, 1:11] 1 -1.008 0.222 0 -0.635 ...
.. .. .. .. ..$ h : num 0
.. .. .. .. ..$ Pn : num [1:11, 1:11] 1.00 -1.01 2.22e-01 3.20e-09 -6.35e-01 ...
.. .. .. ..$ bic : Named num 661
.. .. .. .. ..- attr(*, "names")= chr "year"
.. .. .. ..$ aicc : Named num 650
.. .. .. .. ..- attr(*, "names")= chr "year"
.. .. .. ..- attr(*, "class")= chr "Arima"
.. .. ..- attr(*, "class")= chr "ARIMA"
.. ..$ model :Classes 'mdl_defn', 'R6' <mdl_defn>
Public:
add_data: function (.data)
check: function (.data)
clone: function (deep = FALSE)
data: NULL
env: environment
extra: list
formula: name
initialize: function (formula, ..., .env)
model: ARIMA
origin: 1990 Q1
prepare: function (...)
print: function (...)
recall_lag: function (x, n = 1L, ...)
recent_data: NULL
remove_data: function ()
specials: environment
stage: NULL
train: function (.data, specials, ic = "aicc", selection_metric = function(x) x[[ic]],
.. ..$ data : tbl_ts [82 × 2] (S3: tbl_ts/tbl_df/tbl/data.frame)
.. .. ..$ Quarter: qtr [1:82] 1990 Q1, 1990 Q2, 1990 Q3, 1990 Q4, 1991 Q1, 1991 ...
.. .. ..$ Beer : num [1:82] 485 441 429 599 464 424 436 574 443 410 ...
.. .. ..- attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
.. .. .. ..$ .rows:List of 1
.. .. .. .. ..$ : int [1:82] 1 2 3 4 5 6 7 8 9 10 ...
.. .. ..- attr(*, "index")= chr "Quarter"
.. .. .. ..- attr(*, "ordered")= logi TRUE
.. .. ..- attr(*, "index2")= chr "Quarter"
.. .. ..- attr(*, "interval")= interval [1:1] 1Q
.. .. .. ..@ .regular: logi TRUE
.. ..$ response :List of 1
.. .. ..$ : symbol Beer
.. ..$ transformation:List of 1
.. .. ..$ :function (Beer)
.. .. .. ..- attr(*, "class")= chr "transformation"
.. .. .. ..- attr(*, "inverse")=function (Beer)
.. ..- attr(*, "class")= chr "mdl_ts"
$ manual_seasonal: lst_mdl [1:1]
..$ :List of 5
.. ..$ fit :List of 6
.. .. ..$ par : tibble [4 × 5] (S3: tbl_df/tbl/data.frame)
.. .. .. ..$ term : chr [1:4] "ar1" "ma1" "sar1" "sma1"
.. .. .. ..$ estimate : num [1:4] 0.984 -0.834 -0.082 -0.549
.. .. .. ..$ std.error: Named num [1:4] 0.0239 0.0703 0.1822 0.164
.. .. .. .. ..- attr(*, "names")= chr [1:4] "ar1" "ma1" "sar1" "sma1"
.. .. .. ..$ statistic: Named num [1:4] 41.17 -11.86 -0.45 -3.35
.. .. .. .. ..- attr(*, "names")= chr [1:4] "ar1" "ma1" "sar1" "sma1"
.. .. .. ..$ p.value : Named num [1:4] 1.14e-54 3.80e-19 6.54e-01 1.25e-03
.. .. .. .. ..- attr(*, "names")= chr [1:4] "ar1" "ma1" "sar1" "sma1"
.. .. ..$ est : tibble [82 × 3] (S3: tbl_df/tbl/data.frame)
.. .. .. ..$ .fitted : num [1:82] 485 441 429 598 482 ...
.. .. .. ..$ .resid : num [1:82] 0.485 0.441 0.429 0.599 -17.751 ...
.. .. .. ..$ .regression_resid: num [1:82] 485 441 429 599 464 424 436 574 443 410 ...
.. .. ..$ fit : tibble [1 × 7] (S3: tbl_df/tbl/data.frame)
.. .. .. ..$ sigma2 : Named num 226
.. .. .. .. ..- attr(*, "names")= chr "year"
.. .. .. ..$ log_lik : num -321
.. .. .. ..$ AIC : num 652
.. .. .. ..$ AICc : Named num 653
.. .. .. .. ..- attr(*, "names")= chr "year"
.. .. .. ..$ BIC : Named num 664
.. .. .. .. ..- attr(*, "names")= chr "year"
.. .. .. ..$ ar_roots:List of 1
.. .. .. .. ..$ : cplx [1:5] 1.02+2.02e-28i -1.32+1.32i -1.32-1.32i ...
.. .. .. ..$ ma_roots:List of 1
.. .. .. .. ..$ : cplx [1:5] -2.94e-17+1.16i -1.16-1.12e-16i 6.97e-16-1.16i ...
.. .. ..$ tsp :List of 2
.. .. .. ..$ range : qtr [1:2] 1990 Q1, 2010 Q2
.. .. .. .. ..@ fiscal_start: num 1
.. .. .. ..$ interval: interval [1:1] 1Q
.. .. .. .. ..@ .regular: logi TRUE
.. .. ..$ spec :'data.frame': 1 obs. of 8 variables:
.. .. .. ..$ p : num 1
.. .. .. ..$ d : num 0
.. .. .. ..$ q : num 1
.. .. .. ..$ P : num 1
.. .. .. ..$ D : num 1
.. .. .. ..$ Q : num 1
.. .. .. ..$ constant: logi FALSE
.. .. .. ..$ period : num 4
.. .. .. ..- attr(*, "out.attrs")=List of 2
.. .. .. .. ..$ dim : Named int [1:7] 1 1 1 1 1 1 2
.. .. .. .. .. ..- attr(*, "names")= chr [1:7] "p" "d" "q" "P" ...
.. .. .. .. ..$ dimnames:List of 7
.. .. .. .. .. ..$ p : chr "p=1"
.. .. .. .. .. ..$ d : chr "d=0"
.. .. .. .. .. ..$ q : chr "q=1"
.. .. .. .. .. ..$ P : chr "P=1"
.. .. .. .. .. ..$ D : chr "D=1"
.. .. .. .. .. ..$ Q : chr "Q=1"
.. .. .. .. .. ..$ constant: chr [1:2] "constant=TRUE" "constant=FALSE"
.. .. ..$ model:List of 16
.. .. .. ..$ coef : Named num [1:4] 0.984 -0.834 -0.082 -0.549
.. .. .. .. ..- attr(*, "names")= chr [1:4] "ar1" "ma1" "sar1" "sma1"
.. .. .. ..$ sigma2 : Named num 226
.. .. .. .. ..- attr(*, "names")= chr "year"
.. .. .. ..$ var.coef : num [1:4, 1:4] 0.000571 -0.000832 0.000231 -0.000441 -0.000832 ...
.. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. ..$ : chr [1:4] "ar1" "ma1" "sar1" "sma1"
.. .. .. .. .. ..$ : chr [1:4] "ar1" "ma1" "sar1" "sma1"
.. .. .. ..$ mask : logi [1:4] TRUE TRUE TRUE TRUE
.. .. .. ..$ loglik : num -321
.. .. .. ..$ aic : num 652
.. .. .. ..$ arma : int [1:7] 1 1 1 1 4 0 1
.. .. .. ..$ residuals: Time-Series [1:82] from 1 to 21.2: 0.485 0.441 0.429 0.599 -17.751 ...
.. .. .. ..$ call : language .f(x = ..1, order = ..2, seasonal = ..3, xreg = ..4, include.mean = FALSE, fixed = ..7, method = ..5)
.. .. .. ..$ series : chr "y"
.. .. .. ..$ code : int 0
.. .. .. ..$ n.cond : int 0
.. .. .. ..$ nobs : int 78
.. .. .. ..$ model :List of 10
.. .. .. .. ..$ phi : num [1:5] 0.9842 0 0 -0.082 0.0807
.. .. .. .. ..$ theta: num [1:5] -0.834 0 0 -0.549 0.458
.. .. .. .. ..$ Delta: num [1:4] 0 0 0 1
.. .. .. .. ..$ Z : num [1:10] 1 0 0 0 0 0 0 0 0 1
.. .. .. .. ..$ a : num [1:10] -24 12.84 5.41 7.07 5.85 ...
.. .. .. .. ..$ P : num [1:10, 1:10] 0.00 6.66e-16 -1.25e-15 2.47e-16 5.55e-16 ...
.. .. .. .. ..$ T : num [1:10, 1:10] 0.9842 0 0 -0.082 0.0807 ...
.. .. .. .. ..$ V : num [1:10, 1:10] 1 -0.834 0 0 -0.549 ...
.. .. .. .. ..$ h : num 0
.. .. .. .. ..$ Pn : num [1:10, 1:10] 1.00 -8.34e-01 -4.87e-12 6.45e-11 -5.49e-01 ...
.. .. .. ..$ bic : Named num 664
.. .. .. .. ..- attr(*, "names")= chr "year"
.. .. .. ..$ aicc : Named num 653
.. .. .. .. ..- attr(*, "names")= chr "year"
.. .. .. ..- attr(*, "class")= chr "Arima"
.. .. ..- attr(*, "class")= chr "ARIMA"
.. ..$ model :Classes 'mdl_defn', 'R6' <mdl_defn>
Public:
add_data: function (.data)
check: function (.data)
clone: function (deep = FALSE)
data: NULL
env: environment
extra: list
formula: formula
initialize: function (formula, ..., .env)
model: ARIMA
origin: 1990 Q1
prepare: function (...)
print: function (...)
recall_lag: function (x, n = 1L, ...)
recent_data: NULL
remove_data: function ()
specials: environment
stage: NULL
train: function (.data, specials, ic = "aicc", selection_metric = function(x) x[[ic]],
.. ..$ data : tbl_ts [82 × 2] (S3: tbl_ts/tbl_df/tbl/data.frame)
.. .. ..$ Quarter: qtr [1:82] 1990 Q1, 1990 Q2, 1990 Q3, 1990 Q4, 1991 Q1, 1991 ...
.. .. ..$ Beer : num [1:82] 485 441 429 599 464 424 436 574 443 410 ...
.. .. ..- attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
.. .. .. ..$ .rows:List of 1
.. .. .. .. ..$ : int [1:82] 1 2 3 4 5 6 7 8 9 10 ...
.. .. ..- attr(*, "index")= chr "Quarter"
.. .. .. ..- attr(*, "ordered")= logi TRUE
.. .. ..- attr(*, "index2")= chr "Quarter"
.. .. ..- attr(*, "interval")= interval [1:1] 1Q
.. .. .. ..@ .regular: logi TRUE
.. ..$ response :List of 1
.. .. ..$ : symbol Beer
.. ..$ transformation:List of 1
.. .. ..$ :function (Beer)
.. .. .. ..- attr(*, "class")= chr "transformation"
.. .. .. ..- attr(*, "inverse")=function (Beer)
.. ..- attr(*, "class")= chr "mdl_ts"
- attr(*, "key")= tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
..$ .rows: list<int> [1:1]
.. ..$ : int 1
.. ..@ ptype: int(0)
- attr(*, "model")= chr [1:2] "auto_seasonal" "manual_seasonal"
- attr(*, "response")= chr "Beer"
Model Diagnostics
Residual Analysis
Good ARIMA models should have residuals that are:
- Uncorrelated (white noise)
- Normally distributed (approximately)
- Zero mean
- Constant variance
Forecasting with ARIMA
Point Forecasts and Prediction Intervals
Cross-Validation and Accuracy Assessment
# Time series cross-validation
cv_results <- tsibble_data %>%
stretch_tsibble(.init = 20, .step = 1) %>%
model(arima = ARIMA(Beer)) %>%
forecast(h = 4)
# Calculate accuracy metrics
cv_accuracy <- cv_results %>%
accuracy(tsibble_data)
cv_accuracy %>%
group_by(.model) %>%
summarise(
RMSE = mean(RMSE),
MAE = mean(MAE),
MAPE = mean(MAPE),
MASE = mean(MASE)
)Advanced Topics
1. ARIMAX: ARIMA with External Regressors
# 1. Prepare data ----------------------------------------------------------
enhanced_data <- aus_production %>%
filter(year(Quarter) >= 1990) %>%
mutate(
temperature = rnorm(n(), mean = 20, sd = 5),
holiday = if_else(quarter(Quarter) == 4, 1, 0)
)
# 2. Fit models ------------------------------------------------------------
arimax_model <- enhanced_data %>%
model(
arima_base = ARIMA(Beer),
arimax = ARIMA(Beer ~ temperature + holiday)
)
# 3. Compare fit statistics ------------------------------------------------
glance(arimax_model) %>%
select(.model, AIC, AICc, BIC)# 5. Create future regressors ---------------------------------------------
future_regressors <- enhanced_data %>%
new_data(12) %>%
mutate(
temperature = rnorm(12, mean = 20, sd = 5),
holiday = if_else(quarter(Quarter) == 4, 1, 0)
)
# 6. Generate forecasts with regressors ------------------------------------
arimax_forecasts <- arimax_model %>%
select(arimax) %>%
forecast(new_data = future_regressors)
# Plot forecasts
arimax_forecasts %>%
autoplot(enhanced_data) +
labs(title = "ARIMAX Forecasts for Beer Production",
subtitle = "12-quarter ahead forecast with 80% and 95% prediction intervals",
y = "Megalitres")2. Fourier Terms for Complex Seasonality
# ARIMA with Fourier terms for complex seasonal patterns
fourier_model <- tsibble_data %>%
model(
arima_fourier = ARIMA(Beer ~ fourier(K = 2))
)
report(fourier_model)Series: Beer
Model: LM w/ ARIMA(0,1,2)(1,0,1)[4] errors
Coefficients:
ma1 ma2 sar1 sma1 fourier(K = 2)C1_4 fourier(K = 2)S1_4
-1.0149 0.2281 0.8716 -0.5851 9.5225 -58.2040
s.e. 0.1111 0.1143 0.0968 0.1449 5.2062 5.2777
fourier(K = 2)C2_4
-16.4747
s.e. 4.5939
sigma^2 estimated as 217.9: log likelihood=-330.71
AIC=677.42 AICc=679.42 BIC=696.58
3. Model Comparison and Ensemble
# Fit multiple models
all_models <- tsibble_data %>%
model(
arima = ARIMA(Beer),
ets = ETS(Beer),
snaive = SNAIVE(Beer)
)
# Generate forecasts from all models
all_forecasts <- all_models %>%
forecast(h = 8)
# Plot comparison
all_forecasts %>%
autoplot(tsibble_data, level = NULL) +
labs(title = "Comparison of Different Forecasting Methods",
y = "Megalitres")Summary: Key Takeaways
ARIMA vs Auto ARIMA vs Best ARIMA
- Manual ARIMA: You specify the (p,d,q) orders based on analysis
- Auto ARIMA: Algorithm automatically selects optimal orders
- Best ARIMA: Exhaustive search through parameter space
Important: Auto ARIMA and exhaustive search for the best ARIMA produce the same model because: - Both optimize the same criterion (usually AICc) - Both search the same parameter space - The only difference is the search strategy (stepwise vs exhaustive)
When to Use Each Approach
-
Use Manual ARIMA when:
- You have domain knowledge about the process
- You want to test specific hypotheses
- You need a parsimonious model
-
Use Auto ARIMA when:
- You need quick, reliable results
- You’re processing many series
- You want data-driven model selection
-
Use Grid Search when:
- You want to ensure global optimum
- Computational time is not a constraint
- You need to document all alternatives
Best Practices
- Always check residuals: Even the “best” model may have issues
- Consider transformations: Log or Box-Cox for variance stabilization
- Don’t over-difference: Usually d ≤ 2 is sufficient
- Use holdout validation: Test on unseen data
- Consider ensemble methods: Combine multiple models
- Update models regularly: Refit as new data arrives
Common Pitfalls to Avoid
- Over-parameterization: More complex isn’t always better
- Ignoring seasonality: Use seasonal ARIMA for seasonal data
- Extrapolating too far: ARIMA assumes patterns continue
- Ignoring structural breaks: Check for regime changes
- Blind trust in auto selection: Always validate results
References and Further Reading
- Hyndman, R.J., & Athanasopoulos, G. (2021). Forecasting: principles and practice, 3rd edition, OTexts.
- Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time series analysis: forecasting and control. John Wiley & Sons.
- Shumway, R. H., & Stoffer, D. S. (2017). Time series analysis and its applications. Springer.