This report is a summary of the forecasting lessons by Rob J. Hyndman on DataCamp.

library(tidyverse)
library(astsa)
library(xts)
library(readxl)
library(ggfortify)
library(fpp2)
library(gt)

theme_set(theme_bw())
mydata <- read_excel("exercise1.xlsx")

1 Exploring and visualizing time series

1.0.1 Creating time series objects in R

Time series plots

  • autoplot(facets = ...)
head(mydata)
## # A tibble: 6 × 4
##   ...1   Sales AdBudget   GDP
##   <chr>  <dbl>    <dbl> <dbl>
## 1 Mar-81 1020.     659.  252.
## 2 Jun-81  889.     589   291.
## 3 Sep-81  795      512.  291.
## 4 Dec-81 1004.     614.  292.
## 5 Mar-82 1058.     647.  279.
## 6 Jun-82  944.     602   254
# Create a ts object called myts
myts <- ts(mydata[, 2:4], start = c(1981, 1), frequency = 4)

autoplot(myts, facets = F)

Seasonal plots

  • ggseasonplot()
  • ggsubseriesplot()
autoplot(AirPassengers)

ggseasonplot(AirPassengers)

ggseasonplot(AirPassengers, polar = TRUE)

ggsubseriesplot(AirPassengers)

1.2 White noise

Just randomness: “i.i.d. data”, with no trend, seasonality, or cycle.

1.2.1 WN ACF

  • No significant autocorrelation is observed at any lag
set.seed(3)
wn <- ts(rnorm(36))
autoplot(wn)

ggAcf(wn) +
  ggtitle("Sample ACF for white noise")


1.2.2 Ljung-Box test

The Ljung-Box test considers the first h autocorrelation values together.

A significant test (small p-value) indicates the data are probably not white noise.

  • Box.test(data, lag = ..., fitdf = ..., type = ...)

Case 1. pigs

ggAcf(pigs)

Box.test(pigs, lag = 24, fitdf = 0, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  pigs
## X-squared = 634.15, df = 24, p-value < 2.2e-16

The p-value is very small, suggesting that this series is not white noise


Case 2. goog

autoplot(goog)

autoplot(diff(goog))

ggAcf(diff(goog))

Box.test(diff(goog), lag = 10, fitdf = 0, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  diff(goog)
## X-squared = 13.123, df = 10, p-value = 0.2169

The p-value is not significant, suggesting that the differenced series is white noise


2 Benchmark methods and forecast accuracy

2.1 Forecasts and potential futures

  • A forecast is the mean or median of simulated futures of a time series

2.1.1 Forecast intervals

  • 80% forecast interval should contain 80% of future observations
  • 95% forecast interval should contain 95% of future observations

2.1.2 Naive forecasting methods

Simple methods (both sketched below)

  • naive method: use only the most recent observation as the forecast for all future periods
  • mean method: use the average of all observations as the forecast for all future periods
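
A minimal sketch of the two methods, using naive() and meanf() on the goog series (the choice of series and horizon here is my own illustration, not a course exercise):

# Sketch: naive vs. mean forecasts of the goog series
fc_naive <- naive(goog, h = 20)   # repeats the last observation
fc_mean <- meanf(goog, h = 20)    # repeats the historical mean
autoplot(goog) +
  autolayer(fc_naive, series = "Naive", PI = FALSE) +
  autolayer(fc_mean, series = "Mean", PI = FALSE)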

2.2 Fitted values and residuals

A fitted value is the forecast of an observation using all previous observations

  • That is, they are one-step forecasts
    • At each point in time, it is the value predicted from the preceding data; it is not a true forecast of the future but is used to assess how well the model fits
  • Often not true forecasts since parameters are estimated on all data
    • Fitted values are not genuine forecasts because the parameters are estimated from the whole series, so they are only in-sample estimates


A residual is the difference between an observation and its fitted value

  • That is, they are one-step forecast errors (see the sketch below)
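
A minimal sketch (assuming the goog series used later in this chapter) showing how observations, fitted values, and residuals relate:

# Sketch: residuals are the observations minus the one-step fitted values
fit <- naive(goog)
head(fitted(fit))                              # one-step "forecasts" of each observation
all.equal(residuals(fit), goog - fitted(fit))  # expected to be TRUE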

■ Residuals should look like white noise

Essential assumptions:

  • They should be uncorrelated
  • They should have mean zero

Useful properties (for computing prediction intervals)

  • They should have constant variance
  • They should be normally distributed

These assumptions can be checked with the checkresiduals() function

# Check the residuals from the naive forecasts applied to the goog series
goog %>% naive() %>% checkresiduals()

## 
##  Ljung-Box test
## 
## data:  Residuals from Naive method
## Q* = 13.123, df = 10, p-value = 0.2169
## 
## Model df: 0.   Total lags used: 10
# Check the residuals from the seasonal naive forecasts applied to the ausbeer series
ausbeer %>% snaive() %>% checkresiduals()

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 60.535, df = 8, p-value = 3.661e-10
## 
## Model df: 0.   Total lags used: 8

The goog residuals look like white noise; the ausbeer residuals do not.


2.3 Training and test sets

2.3.1 Forecast errors

Forecast “error” = the difference between an observed value and its forecast in the test set.

\(\neq\) residuals

  • Residuals are errors on the training set (vs. the test set)
  • Residuals are based on one-step forecasts (vs. multi-step forecasts)

2.3.2 Measures of forecast accuracy

  • MAE, MSE
    • depend on the scale of the data
    • can’t compare forecast accuracy between two series on very different scales
  • MAPE
    • better for comparisons, but needs some assumptions
      • data are all positive and have no zeros or small values
      • there is a natural zero (so it can’t be used with temperature forecasts)
  • MASE
    • scaled MAE (the MAE divided by the MAE of a naive benchmark forecast)

In all cases, a smaller value indicates a better forecast.

  • accuracy(forecast_object, actual_data): computes all of these measures (a short sketch of the underlying calculations follows)
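
A minimal sketch of what accuracy() reports, computed by hand on a train/test split of goog (the split point is my own assumption, not from the course):

# Sketch: MAE, RMSE, and MAPE computed directly from test-set forecast errors
train <- subset(goog, end = 900)
test <- subset(goog, start = 901)
fc <- naive(train, h = 100)
e <- test - fc$mean                  # forecast errors on the test set
c(MAE = mean(abs(e)),
  RMSE = sqrt(mean(e^2)),
  MAPE = mean(abs(100 * e / test)))
accuracy(fc, goog)                   # the "Test set" row should match these values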

Evaluating forecast accuracy of non-seasonal methods: gold

# Create the training data as train
train <- subset(gold, end = 1000)

# Compute naive forecasts and save to naive_fc
naive_fc <- naive(train, h = 108)

# Compute mean forecasts and save to mean_fc
mean_fc <- meanf(train, h = 108)

# Use accuracy() to compute RMSE statistics
accuracy(naive_fc, gold)
##                      ME      RMSE      MAE        MPE      MAPE     MASE
## Training set  0.1079897  6.358087  3.20366  0.0201449 0.8050646 1.014334
## Test set     -6.5383495 15.842361 13.63835 -1.7462269 3.4287888 4.318139
##                    ACF1 Theil's U
## Training set -0.3086638        NA
## Test set      0.9793153  5.335899
accuracy(mean_fc, gold)
##                         ME     RMSE      MAE       MPE      MAPE      MASE
## Training set -4.239671e-15 59.17809 53.63397 -2.390227 14.230224 16.981449
## Test set      1.319363e+01 19.55255 15.66875  3.138577  3.783133  4.960998
##                   ACF1 Theil's U
## Training set 0.9907254        NA
## Test set     0.9793153  6.123788
# Assign one of the two forecasts as bestforecasts
bestforecasts <- naive_fc

Evaluating forecast accuracy of seasonal methods: visnights

# Create three training series omitting the last 1, 2, and 3 years
train1 <- window(visnights[, "VICMetro"], end = c(2015, 4))
train2 <- window(visnights[, "VICMetro"], end = c(2014, 4))
train3 <- window(visnights[, "VICMetro"], end = c(2013, 4))

# Produce forecasts using snaive()
fc1 <- snaive(train1, h = 4)
fc2 <- snaive(train2, h = 4)
fc3 <- snaive(train3, h = 4)

# Use accuracy() to compare the MAPE of each series
accuracy(fc1, visnights[, "VICMetro"])["Test set", "MAPE"]
## [1] 4.691658
accuracy(fc2, visnights[, "VICMetro"])["Test set", "MAPE"]
## [1] 2.503255
accuracy(fc3, visnights[, "VICMetro"])["Test set", "MAPE"]
## [1] 11.92176

2.4 Time series cross-validation

[Diagrams: time series cross-validation with 1-, 2-, and 3-step-ahead forecasts]

  • tsCV(): returns the forecast errors from time series cross-validation; the MSE can then be computed from them
sq <- function(u) {u^2}

tsCV(oil, forecastfunction = naive, h = 10) %>% 
  sq() %>% 
  colMeans(na.rm = TRUE)
##       h=1       h=2       h=3       h=4       h=5       h=6       h=7       h=8 
##  2355.753  5734.838  9842.239 14299.997 18560.887 23264.410 26932.799 30766.136 
##       h=9      h=10 
## 32892.200 32986.214

The MSE increases with the forecast horizon


3. Exponential smoothing

3.1 Exponentially weighted forecasts

3.1.1 Simple exponential smoothing [SES]

Forecasting Notation:

\(\hat{y}_{t+h|t}\) = point forecast of \(y_{t+h}\) given data \(y_1, ..., y_t\)

Forecast Equation:

\(\hat{y}_{t+h|t} = \alpha y_t + \alpha(1-\alpha)y_{t-1} + \alpha(1-\alpha)^2y_{t-2} + \cdots\)

where \(0 \leq \alpha \leq 1\)


  • All observations are used, no matter how long the series, but the weights decay so that recent observations count the most
  • The larger \(\alpha\), the more weight is placed on the most recent observation and the faster the weights decay

  • \(l_t\) is the level (or the smoothed value) of the series at time \(t\)
  • We choose \(\alpha\) and \(l_0\) by minimizing SSE:

\(SSE = \sum_{t=1}^{T} (y_t - \hat{y}_{t|t-1})^2\)

  • ses() function
oildata <- window(oil, start = 1996)
fc <- ses(oildata, h = 5)
summary(fc)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
## ses(y = oildata, h = 5)
## 
##   Smoothing parameters:
##     alpha = 0.8339 
## 
##   Initial states:
##     l = 446.5868 
## 
##   sigma:  29.8282
## 
##      AIC     AICc      BIC 
## 178.1430 179.8573 180.8141 
## 
## Error measures:
##                    ME     RMSE     MAE      MPE     MAPE      MASE        ACF1
## Training set 6.401975 28.12234 22.2587 1.097574 4.610635 0.9256774 -0.03377748
## 
## Forecasts:
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2014       542.6806 504.4541 580.9070 484.2183 601.1429
## 2015       542.6806 492.9073 592.4539 466.5589 618.8023
## 2016       542.6806 483.5747 601.7864 452.2860 633.0752
## 2017       542.6806 475.5269 609.8343 439.9778 645.3834
## 2018       542.6806 468.3452 617.0159 428.9945 656.3667
autoplot(fc) +
  ylab("Oil (million of tonnes)") +
  xlab("Year") +
  autolayer(fitted(fc))

With \(\alpha = 0.8339\), about 83% of each forecast is based on the most recent observation, and the initial level is \(l_0 = 446.5868\).
Because \(\alpha\) is high, the forecasts stay close to the most recent observations.
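
As a sanity check, the point forecast can be reproduced from the weighted-average formula above, plugging in the fitted \(\alpha\) and \(l_0\) (a minimal sketch, not part of the course code):

# Sketch: rebuild the SES point forecast as an explicit weighted average of past observations
y <- window(oil, start = 1996)
alpha <- 0.8339
l0 <- 446.5868
n <- length(y)
w <- alpha * (1 - alpha)^(0:(n - 1))               # weights on y_n, y_{n-1}, ..., y_1
sum(w * rev(as.numeric(y))) + (1 - alpha)^n * l0   # ~542.68, matching the ses() forecast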


3.1.2 SES vs. naive

# Create a training set using subset()
train <- subset(marathon, end = length(marathon) - 20)

# Compute SES and naive forecasts, save to fcses and fcnaive
fcses <- ses(train, h = 20)
fcnaive <- naive(train, h = 20)

# Calculate forecast accuracy measures
accuracy(fcses, marathon)
##                      ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -1.0851741 5.863790 4.155948 -0.8603998 2.827993 0.8990906
## Test set      0.4574579 2.493971 1.894237  0.3171919 1.463862 0.4097960
##                     ACF1 Theil's U
## Training set -0.01595953        NA
## Test set     -0.12556096 0.6870735
accuracy(fcnaive, marathon)
##                      ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -0.4638047 6.904742 4.622391 -0.4086317 3.123559 1.0000000
## Test set      0.2266667 2.462113 1.846667  0.1388780 1.429608 0.3995047
##                    ACF1 Theil's U
## Training set -0.3589323        NA
## Test set     -0.1255610 0.6799062
# Save the best forecasts as fcbest
fcbest <- fcnaive

The test-set RMSE is 2.49 for fcses and 2.46 for fcnaive, so fcnaive is the better model.


3.2 Exponential smoothing with trend

3.2.1 Holt’s linear trend

  • The \(\beta^*\) parameter controls how quickly the slope can change
  • Two smoothing parameters \(\alpha\) and \(\beta^*\) where \(0~\le~\alpha\) and \(\beta^*~\le~1\)
  • A small \(\beta^*\) value means that the slope hardly changes
    • so the data will have a trend that is close to linear throughout the series
  • A large \(\beta^*\) value means the slope changes rapidly, allowing for highly nonlinear trend
  • Four parameters to estimate by minimizing SSE:
    1. \(\alpha\)
    2. \(\beta^*\)
    3. \(l_0\)
    4. \(b_0\)

  • holt() function

3.2.2 Damped trend method

  • A variation of Holt’s method
  • Damping parameter \(0 < \phi < 1\)
  • If \(\phi=1\), identical to Holt’s linear trend
  • To allow the trend to dampen over time
  • Under this method, the short-run forecasts are trended, but the long-run forecasts are constant

  • holt(damped = TRUE) function (sketched below)
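
A minimal sketch comparing Holt's linear trend with its damped variant on austa (my own illustration, not a course exercise):

# Sketch: Holt's linear trend vs. the damped trend method
fc_holt <- holt(austa, h = 15)
fc_damped <- holt(austa, damped = TRUE, h = 15)
autoplot(austa) +
  autolayer(fc_holt, series = "Holt", PI = FALSE) +
  autolayer(fc_damped, series = "Damped Holt", PI = FALSE)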

Holt’s trend methods

# Produce 10 year forecasts of austa using holt()
fcholt <- holt(austa, h = 10)

# Look at fitted model using summary()
summary(fcholt)
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
## holt(y = austa, h = 10)
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 0.0085 
## 
##   Initial states:
##     l = 0.656 
##     b = 0.1706 
## 
##   sigma:  0.1952
## 
##      AIC     AICc      BIC 
## 17.14959 19.14959 25.06719 
## 
## Error measures:
##                      ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.00372838 0.1840662 0.1611085 -1.222083 5.990319 0.7907078
##                   ACF1
## Training set 0.2457733
## 
## Forecasts:
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2016       7.030683 6.780483 7.280882 6.648036 7.413330
## 2017       7.202446 6.847114 7.557778 6.659013 7.745879
## 2018       7.374209 6.937169 7.811249 6.705814 8.042604
## 2019       7.545972 7.039179 8.052765 6.770899 8.321045
## 2020       7.717736 7.148723 8.286748 6.847506 8.587965
## 2021       7.889499 7.263543 8.515455 6.932181 8.846816
## 2022       8.061262 7.382302 8.740222 7.022882 9.099642
## 2023       8.233025 7.504136 8.961915 7.118285 9.347766
## 2024       8.404788 7.628444 9.181133 7.217472 9.592105
## 2025       8.576552 7.754792 9.398311 7.319779 9.833324
# Plot the forecasts
autoplot(fcholt)

# Check that the residuals look like white noise
checkresiduals(fcholt)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method
## Q* = 4.8886, df = 7, p-value = 0.6736
## 
## Model df: 0.   Total lags used: 7

3.3 Exponential smoothing with trend and seasonality

Holt-Winters’ method has two versions.

3.3.1 Holt-Winters’ additive method

  • \(S_{t-m+h_m^+}\) = seasonal component from final year of data
  • Smoothing parameters:

\(0 \le \alpha \le 1,~0 \le \beta^* \le 1,~0 \le \gamma \le 1-\alpha\)

  • \(m\) = period of seasonality(e.g. \(m=4\) for quarterly data)
  • Seasonal component averages zero

3.3.2 Holt-Winters’ multiplicative method

Used when the seasonal variation grows over time.

  • Seasonal component averages one

  • hw(seasonal = "additive" or "multiplicative") function (a short sketch follows)
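
A minimal sketch contrasting the two versions on austourists, an fpp2 series with growing seasonal variation (my own illustration):

# Sketch: additive vs. multiplicative Holt-Winters forecasts
fc_add <- hw(austourists, seasonal = "additive", h = 8)
fc_mult <- hw(austourists, seasonal = "multiplicative", h = 8)
autoplot(austourists) +
  autolayer(fc_add, series = "HW additive", PI = FALSE) +
  autolayer(fc_mult, series = "HW multiplicative", PI = FALSE)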

3.3.3 Taxonomy of exponential smoothing methods

Trend \ Seasonal        N (None)    A (Additive)    M (Multiplicative)
N (None)                (N, N)      (N, A)          (N, M)
A (Additive)            (A, N)      (A, A)          (A, M)
A_d (Additive damped)   (A_d, N)    (A_d, A)        (A_d, M)

Model (Trend, Seasonality)   Description                                   Function
(N, N)                       Simple exponential smoothing                  ses()
(A, N)                       Holt’s linear method                          holt()
(A_d, N)                     Additive damped trend method                  holt(damped = TRUE)
(A, A)                       Additive Holt-Winters’ method                 hw()
(A, M)                       Multiplicative Holt-Winters’ method           hw()
(A_d, M)                     Damped multiplicative Holt-Winters’ method    hw()

Holt-Winters with monthly data

# Produce 3 year forecasts
fc <- hw(a10, seasonal = "multiplicative", h = 36)

# Check if residuals look like white noise
checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method
## Q* = 75.764, df = 24, p-value = 2.835e-07
## 
## Model df: 0.   Total lags used: 24
whitenoise <- FALSE

# Plot forecasts
autoplot(fc)

Exercise: Holt-Winters method with daily data

# Create training data with subset()
train <- subset(hyndsight, end = 337)

# Holt-Winters additive forecasts as fchw
fchw <- hw(train, seasonal = "additive", h = 28)

# Seasonal naive forecasts as fcsn
fcsn <- snaive(train, h = 28)

# Find better forecasts with accuracy()
accuracy(fchw, hyndsight)
##                     ME     RMSE      MAE       MPE    MAPE      MASE      ACF1
## Training set -3.976241 228.2440 165.0244 -2.407211 13.9955 0.7492131 0.1900853
## Test set     -3.999460 201.7656 152.9584 -3.218292 10.5558 0.6944332 0.3013328
##              Theil's U
## Training set        NA
## Test set     0.4868701
accuracy(fcsn, hyndsight)
##                 ME     RMSE      MAE        MPE     MAPE      MASE      ACF1
## Training set 10.50 310.3282 220.2636 -2.1239387 18.01077 1.0000000 0.4255730
## Test set      0.25 202.7610 160.4643 -0.6888732 10.25880 0.7285101 0.3089795
##              Theil's U
## Training set        NA
## Test set      0.450266
# Plot the better forecasts
autoplot(fchw)

3.4 State space models for exponential smoothing

3.4.1 Innovations state-space models

  • Each exponential smoothing method can be written as an “innovations state-space model”
  • A state-space model mathematically describes the unobserved states behind the observed data

  • ETS models: Error, Trend, Seasonal

3.4.2 ETS models

  • Parameters: estimated using the “likelihood”, the probability of the data arising from the specified model
  • For models with additive errors, this is equivalent to minimizing SSE
    • The parameters of an ETS model are chosen to maximize the likelihood, i.e. the probability of the data arising from the specified model; when the errors are additive, this is equivalent to minimizing the SSE
  • Choose the best model by minimizing a corrected version of Akaike’s Information Criterion (\(AIC_c\))

  • ets() function
    • Given a time series, it automatically returns the model that minimizes the \(AIC_c\)

Example: Australian air traffic

autoplot(ausair)

ets(ausair)
## ETS(M,A,N) 
## 
## Call:
## ets(y = ausair)
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 0.0269 
## 
##   Initial states:
##     l = 6.5431 
##     b = 0.7393 
## 
##   sigma:  0.0755
## 
##      AIC     AICc      BIC 
## 241.6910 243.1544 250.9417

ETS(M,A,N): multiplicative errors, additive trend, and no seasonality


Key points

  • The parameters are estimated much as with holt(), but by maximizing the likelihood rather than minimizing the SSE
  • Unlike holt(), ets() does not compute forecasts; it only returns a model, so the model must be passed to forecast() separately
ausair %>% ets() %>% forecast() %>% autoplot()

3.4.3 When does ETS fail? : lynx

ETS models do not work well for all series.

# Plot the lynx series
autoplot(lynx)

# Use ets() to model the lynx series
fit <- ets(lynx)

# Use summary() to look at model and parameters
summary(fit)
## ETS(M,N,N) 
## 
## Call:
## ets(y = lynx)
## 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##     l = 2372.8047 
## 
##   sigma:  0.9594
## 
##      AIC     AICc      BIC 
## 2058.138 2058.356 2066.346 
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE     MASE      ACF1
## Training set 8.975647 1198.452 842.0649 -52.12968 101.3686 1.013488 0.3677583
# Plot 20-year forecasts of the lynx series
fit %>% forecast(h = 20) %>% autoplot()

ETS does not work well for every time series!


4. Forecasting with ARIMA models

ARIMA models provide another approach to time series forecasting. Exponential smoothing and ARIMA models are the two most widely-used approaches to time series forecasting, and provide complementary approaches to the problem. While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.

4.1 Transformations for variance stabilization

In ETS models, when the variance of the series grows over time, we used multiplicative errors and multiplicative seasonality. An alternative is to transform the time series itself.

4.1.1 Variance stabilization

  • If the data show increasing variation as the level of the series increases, then a transformation can be useful
  • \(y_1, ..., y_n\): original observations; \(w_1, ..., w_n\): transformed observations

4.1.2 Box-Cox transformations

  • Each of these transformations is close to a member of the family of Box-Cox transformations
  • A transformation technique used to improve the normality and variance stability of the data

\[ w_t = \begin{cases} \log(y_t), & \lambda = 0 \\ \frac{y_t^\lambda - 1}{\lambda}, & \lambda \neq 0 \end{cases} \]

  • \(\lambda = 1\): essentially no transformation
  • \(\lambda = 0\): just \(\log(y_t)\)
  • \(\lambda \neq 0\): \(\frac{y_t^\lambda - 1}{\lambda}\), where the strength of the transformation depends on \(\lambda\)
    • \(\lambda = 1/2\): square root plus linear transformation
    • \(\lambda = 1/3\): cube root plus linear transformation
    • \(\lambda = -1\): inverse transformation


  • BoxCox.lambda() function: returns an estimate of lambda that should roughly balance the variation across the series
BoxCox.lambda(a10)
## [1] 0.1313326
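
A minimal sketch applying the estimated \(\lambda\) with BoxCox() to see the stabilized series (my own illustration):

# Sketch: transform a10 with the estimated lambda and plot the result
lambda <- BoxCox.lambda(a10)
autoplot(BoxCox(a10, lambda = lambda)) +
  ggtitle("Box-Cox transformed a10")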

Non-seasonal differencing for stationarity

# Plot the US female murder rate
autoplot(wmurders)

# Plot the differenced murder rate
autoplot(diff(wmurders))

# Plot the ACF of the differenced murder rate
ggAcf(diff(wmurders))

Seasonal differencing for stationarity

# Plot the data
autoplot(h02)

# Take logs and seasonal differences of h02
difflogh02 <- diff(log(h02), lag = 12)

# Plot difflogh02
autoplot(difflogh02)

# Take another difference and plot
ddifflogh02 <- diff(difflogh02)
autoplot(ddifflogh02)

# Plot ACF of ddifflogh02
ggAcf(ddifflogh02)

Even after these transformations the data still do not look like white noise, but an ARIMA model can now be applied.


4.2 ARIMA models

Autoregressive (AR) models:

  • Multiple regression with lagged observations as predictors
  • \(y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + ... + \phi_p y_{t-p} + e_t\)

Moving average (MA) models:

  • Multiple regression with lagged errors as predictors
  • \(y_t = c + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + ... + \theta_q e_{t-q}\)

Autoregressive moving average (ARMA) models:

  • Multiple regression with lagged observations and errors as predictors
  • \(y_t = c + \phi_1 y_{t-1} + ... + \phi_p y_{t-p} + \theta_1 e_{t-1} + ... + \theta_q e_{t-q} + e_t\)

ARIMA(p, d, q) models:

  • Combine an ARMA model with \(d\) levels of differencing

  • auto.arima() function: automatically selects and fits an ARIMA model
    • Like ets(), it returns the model that minimizes the \(AIC_c\), but \(AIC_c\) values are not comparable across model classes
    • That is, compare \(AIC_c\) only between models of the same class!
  • Arima(order = c(p, d, q), include.constant = T/F) function: fits a user-specified ARIMA model (sketched below)
    • order: set to a vector that specifies the values of \(p, d, q\)
    • include.constant: a boolean that determines if the constant \(c\), or drift, should be included
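
As a minimal sketch, the same kind of model can be written down by hand with Arima(); here an ARIMA(0,1,1) with drift for austa, which is the model auto.arima() selects in the next exercise:

# Sketch: a manually specified ARIMA(0,1,1) with drift
fit_manual <- Arima(austa, order = c(0, 1, 1), include.constant = TRUE)
summary(fit_manual)
fit_manual %>% forecast(h = 10) %>% autoplot()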

Automatic ARIMA models for non-seasonal time series

# Fit an automatic ARIMA model to the austa series
fit <- auto.arima(austa)

# Check that the residuals look like white noise
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1) with drift
## Q* = 2.297, df = 6, p-value = 0.8905
## 
## Model df: 1.   Total lags used: 7
residualsok <- TRUE

# Summarize the model
summary(fit)
## Series: austa 
## ARIMA(0,1,1) with drift 
## 
## Coefficients:
##          ma1   drift
##       0.3006  0.1735
## s.e.  0.1647  0.0390
## 
## sigma^2 = 0.03376:  log likelihood = 10.62
## AIC=-15.24   AICc=-14.46   BIC=-10.57
## 
## Training set error measures:
##                        ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.0008313383 0.1759116 0.1520309 -1.069983 5.513269 0.7461559
##                      ACF1
## Training set -0.000571993
# Find the AICc value and the number of differences used
AICc <- -14.46
d <- 1

# Plot forecasts of fit
fit %>% forecast(h = 10) %>% autoplot()

Comparing auto.arima() and ets() on non-seasonal data

# Set up forecast functions for ETS and ARIMA models
fets <- function(x, h) {
  forecast(ets(x), h = h)
}
farima <- function(x, h) {
  forecast(auto.arima(x), h = h)
}

# Compute CV errors for ETS on austa as e1
e1 <- tsCV(austa, fets, 1)

# Compute CV errors for ARIMA on austa as e2
e2 <- tsCV(austa, farima, 1)

# Find MSE of each model class
mean(e1^2, na.rm = TRUE)
## [1] 0.05623684
mean(e2^2, na.rm = TRUE)
## [1] 0.04336277
# Plot 10-year forecasts using the best model class
austa %>% farima(h = 10) %>% autoplot()

4.3 Seasonal ARIMA models

\(ARIMA(p, d, q)(P, D, Q)_m\), where \((P, D, Q)\) are the seasonal AR, differencing, and MA orders and \(m\) is the seasonal period

4.3.1 Example: Monthly retail debit card usage in Iceland

fit <- auto.arima(debitcards, lambda = 0)
fit
## Series: debitcards 
## ARIMA(0,1,4)(0,1,1)[12] 
## Box Cox transformation: lambda= 0 
## 
## Coefficients:
##           ma1     ma2     ma3      ma4     sma1
##       -0.7959  0.0856  0.2628  -0.1751  -0.8144
## s.e.   0.0825  0.0988  0.0999   0.0798   0.1118
## 
## sigma^2 = 0.002321:  log likelihood = 239.33
## AIC=-466.67   AICc=-466.08   BIC=-448.56
fit %>% 
  forecast(h = 36) %>% 
  autoplot() +
    xlab("Year")

Automatic ARIMA models for seasonal time series

# Check that the logged h02 data have stable variance
h02 %>% log() %>% autoplot()

# Fit a seasonal ARIMA model to h02 with lambda = 0
fit <- auto.arima(h02, lambda = 0)

# Summarize the fitted model
summary(fit)
## Series: h02 
## ARIMA(2,1,1)(0,1,2)[12] 
## Box Cox transformation: lambda= 0 
## 
## Coefficients:
##           ar1      ar2     ma1     sma1     sma2
##       -1.1358  -0.5753  0.3683  -0.5318  -0.1817
## s.e.   0.1608   0.0965  0.1884   0.0838   0.0881
## 
## sigma^2 = 0.004278:  log likelihood = 248.25
## AIC=-484.51   AICc=-484.05   BIC=-465
## 
## Training set error measures:
##                        ME      RMSE        MAE        MPE     MAPE      MASE
## Training set -0.003931805 0.0501571 0.03629816 -0.5323365 4.611253 0.5987988
##                      ACF1
## Training set -0.003740267
# Record the amount of lag-1 differencing and seasonal differencing used
d <- 1
D <- 1

# Plot 2-year forecasts
fit %>% forecast(h = 24) %>% autoplot()

Exploring auto.arima() options

  • auto.arima(stepwise = TRUE/FALSE)
    • The default, stepwise = TRUE, searches quickly for a good ARIMA model
    • It is fast, but may miss the best model
    • With stepwise = FALSE, a much larger set of models is evaluated; this is more accurate but computationally more expensive
# Find an ARIMA model for a10
fit1 <- auto.arima(a10)

# Don't use a stepwise search
fit2 <- auto.arima(a10, stepwise = FALSE)

# AICc of better model
AICc <- 523.55

# Compute 2-year forecasts from better model
fit2 %>% forecast(h=24) %>% autoplot()

Comparing auto.arima() and ets() on seasonal data

# Use 20 years of the qcement data beginning in 1988
train <- window(qcement, start = 1988, end = c(2007, 4))

# Fit an ARIMA and an ETS model to the training data
fit1 <- auto.arima(train)
fit2 <- ets(train)

# Check that both models have white noise residuals
checkresiduals(fit1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,1)(2,1,1)[4] with drift
## Q* = 0.78311, df = 3, p-value = 0.8535
## 
## Model df: 5.   Total lags used: 8
checkresiduals(fit2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,N,M)
## Q* = 6.0952, df = 8, p-value = 0.6366
## 
## Model df: 0.   Total lags used: 8
# Produce forecasts for each model
fc1 <- forecast(fit1, h = 25)
fc2 <- forecast(fit2, h = 25)

# Use accuracy() to find better model based on RMSE
accuracy(fc1, qcement)
##                        ME      RMSE        MAE        MPE     MAPE      MASE
## Training set -0.006205705 0.1001195 0.07988903 -0.6704455 4.372443 0.5458078
## Test set     -0.158835253 0.1996098 0.16882205 -7.3332836 7.719241 1.1534049
##                     ACF1 Theil's U
## Training set -0.01133907        NA
## Test set      0.29170452 0.7282225
accuracy(fc2, qcement)
##                       ME      RMSE        MAE        MPE     MAPE      MASE
## Training set  0.01406512 0.1022079 0.07958478  0.4938163 4.371823 0.5437292
## Test set     -0.13495515 0.1838791 0.15395141 -6.2508975 6.986077 1.0518075
##                     ACF1 Theil's U
## Training set -0.03346295        NA
## Test set      0.53438371  0.680556
bettermodel <- fit2

The test-set RMSE is 0.1996 for the ARIMA model and 0.1838 for the ETS model, so the ETS model is better.


5. Advanced methods

The time series models in the previous chapters work well for many time series, but they are often not good for weekly or hourly data, and they do not allow for the inclusion of other information such as the effects of holidays, competitor activity, changes in the law, etc. In this chapter, you will look at some methods that handle more complicated seasonality, and you will consider how to extend ARIMA models to allow other information to be included in them.

5.1 Dynamic regression

Dynamic regression is one way of combining external information with the history of the time series in a single model.

Regression model with ARIMA errors:

\(y_t = \beta_0 + \beta_1x_{1,t} + ... + \beta_rx_{r,t} + e_t\)

  • \(y_t\) modeled as function of \(r\) explanatory variables: \(x_{1,t},...,x_{r,t}\)
  • In dynamic regression, we allow \(e_t\) to be an ARIMA process
  • In ordinary regression, we assume that \(e_t\) is white noise

  • auto.arima(xreg = ...)
    • xreg: contains a matrix of predictor variables that you want to include in the model

5.1.1 US personal consumption and income

  • In general, when income falls, consumption falls as well
autoplot(uschange[,1:2], facets = TRUE) +
  xlab("Year") +
  ylab("") +
  ggtitle("Quarterly changes in US consumption and personal income")

as.data.frame(uschange) %>% 
  ggplot(aes(x = Income, y = Consumption)) +
  geom_point()

fit <- auto.arima(uschange[, "Consumption"],
                  xreg = uschange[, "Income"])
fit
## Series: uschange[, "Consumption"] 
## Regression with ARIMA(1,0,2) errors 
## 
## Coefficients:
##          ar1      ma1     ma2  intercept    xreg
##       0.6922  -0.5758  0.1984     0.5990  0.2028
## s.e.  0.1159   0.1301  0.0756     0.0884  0.0461
## 
## sigma^2 = 0.3219:  log likelihood = -156.95
## AIC=325.91   AICc=326.37   BIC=345.29

Consumption changes have a time series pattern of their own, which the ARIMA(1,0,2) error model captures.
A one-percentage-point increase in income growth is associated with an increase of about 0.2028 percentage points in consumption growth.


Forecasts from a dynamic regression model

Future values of the predictor variables must be supplied. There are two options:

  1. Forecast the predictors with separate models, then plug those forecasts in
  2. Scenario forecasting
  • e.g. assume Income grows by 0.8% in each of the next 8 quarters
fcast <- forecast(fit, xreg = rep(0.8, 8))
autoplot(fcast) + 
  xlab("Year") +
  ylab("Percentage change")

5.1.2 Forecasting sales allowing for advertising expenditure

  • Scenario: advertising expenditure of 10 units in each future month
# Time plot of both variables
autoplot(advert, facets = TRUE)

# Fit ARIMA model
fit <- auto.arima(advert[, "sales"], xreg = advert[, "advert"], stationary = TRUE)

# Check model. Increase in sales for each unit increase in advertising
salesincrease <- coefficients(fit)[3]

# Forecast fit as fc
fc <- forecast(fit, xreg = rep(10, 6))

# Plot fc with x and y labels
autoplot(fc) + xlab("Month") + ylab("Sales")

5.1.3 Forecasting electricity demand

# Time plots of demand and temperatures
autoplot(elecdaily[, c("Demand", "Temperature")], facets = TRUE)

# Matrix of regressors
xreg <- cbind(MaxTemp = elecdaily[, "Temperature"],
              MaxTempSq = elecdaily[, "Temperature"]^2,
              Workday = elecdaily[, "WorkDay"])
# Fit model
fit <- auto.arima(elecdaily[, "Demand"], xreg = xreg)

# Forecast fit one day ahead
forecast(fit, xreg = cbind(20, 20^2, 1))
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 53.57143       185.4008 176.9271 193.8745 172.4414 198.3602

If the next day is a working day (indicator = 1) with a maximum temperature forecast of 20°C, the forecast demand is 185.4008.


5.2 Dynamic harmonic regression [DHR]

A method for forecasting time series with strong seasonality. A standard ARIMA model handles seasonality through differencing, whereas DHR includes the seasonal pattern in the model directly via Fourier terms.

5.2.1 Why DHR?

  • When the seasonal period is very long
  • When the seasonal pattern is unusual or irregular
  • When differencing would discard too much data

5.2.2 Formula

Periodic seasonality can be handled using pairs of Fourier terms:

\[ s_k(t) = \sin \left(\frac{2\pi k t}{m}\right) ~~~~~~~~ c_k(t) = \cos \left(\frac{2\pi k t}{m}\right) \\ y_t = \beta_0 + \sum_{k = 1}^{K} [\alpha_k s_k(t) + \gamma_k c_k(t)] + e_t \]

  • \(y_t\): the time series
  • \(m\): seasonal period (e.g. 12, 365, …)
  • \(K\): number of Fourier term pairs (larger \(K\) allows more complex seasonal patterns)
    • Every periodic function can be approximated by sums of sin and cos terms for large enough \(K\)
  • \(\alpha_k, \gamma_k\): regression coefficients
  • \(e_t\): error term
    • can be modeled as a non-seasonal ARIMA process
  • Assumes the seasonal pattern is unchanging

\[ y_t = \beta_0 + \beta_1 x_{t,1} + ... + \beta_r x_{t,r} + \sum_{k = 1}^{K} [\alpha_k s_k(t) + \gamma_k c_k(t)] + e_t \]

  • Other predictor variables can be added as well: \(x_{t,1}, ..., x_{t,r}\)
  • Choose \(K\) to minimize the \(AIC_c\)
  • \(K\) cannot be more than \(m/2\)
  • This is particularly useful for weekly data, daily data, and sub-daily data
    • i.e. it is well suited to long seasonal periods

  • fourier(x, K, h = NULL): easy to generate the required harmonics (a short sketch follows)
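
A quick sketch of what fourier() returns, using the monthly a10 series (my own illustration):

# Sketch: Fourier term pairs for monthly data (m = 12) with K = 2
head(fourier(a10, K = 2))
# columns such as S1-12, C1-12, S2-12, C2-12: sin/cos pairs for the first two harmonics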

5.2.3 Exercise

Forecasting weekly data

With weekly data, it is difficult to handle seasonality using ETS or ARIMA models as the seasonal length is too large (approximately 52). Instead, you can use harmonic regression which uses sines and cosines to model the seasonality.

# Set up harmonic regressors of order 13
harmonics <- fourier(gasoline, K = 13)

# Fit regression model with ARIMA errors
fit <- auto.arima(gasoline, xreg = harmonics, seasonal = FALSE)

# Forecasts next 3 years
newharmonics <- fourier(gasoline, K = 13, h = 156)
fc <- forecast(fit, xreg = newharmonics)

# Plot forecasts fc
autoplot(fc)

Harmonic regression for multiple seasonality

  • taylor dataset
    • contains half-hourly electricity demand in England and Wales over a few months in the year 2000.
    • The seasonal periods are 48 (daily seasonality) and 7 x 48 = 336 (weekly seasonality).
    • auto.arima() takes too long to run on series with such long seasonal periods
    • Instead you will fit a standard regression model with Fourier terms using the tslm() function.
      • Need to specify the order \(K\) for each of the seasonal periods.
# Fit a harmonic regression using order 10 for each type of seasonality
fit <- tslm(taylor ~ fourier(taylor, K = c(10, 10)))

# Forecast 20 working days ahead
fc <- forecast(fit, newdata = data.frame(fourier(taylor, K = c(10, 10), h = 20 * 48)))

# Plot the forecasts
autoplot(fc)

# Check the residuals of fit
checkresiduals(fit)

## 
##  Breusch-Godfrey test for serial correlation of order up to 672
## 
## data:  Residuals from Linear regression model
## LM test = 3938.9, df = 672, p-value < 2.2e-16

Forecasting call bookings

  • calls dataset
    • 20 consecutive days of data
    • 169 five-minute observations per day
    • 5 x 169 = 845 observations per (five-day) week
    • There is both daily and weekly seasonality, but the weekly seasonality is relatively weak, so only the daily seasonality is modeled
    • stationary = TRUE fits a non-seasonal ARMA model without differencing (the seasonality is handled by the Fourier terms)
subset_calls <- window(calls, start = 30, end = 33)

# Plot the calls data
autoplot(subset_calls)

# Set up the xreg matrix
xreg <- fourier(subset_calls, K = c(10, 0))

# Fit a dynamic regression model
fit <- auto.arima(subset_calls, xreg = xreg, seasonal = FALSE, stationary = TRUE)

# Check the residuals
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(1,0,1) errors
## Q* = 602.6, df = 505, p-value = 0.001787
## 
## Model df: 2.   Total lags used: 507
# Plot forecasts for 10 working days ahead
fc <- forecast(fit, xreg =  fourier(subset_calls, c(10, 0), h = 1690))
autoplot(fc)

5.3 TBATS models

An automated tool that combines everything covered so far:

  • tbats() function
  • Trigonometric terms for seasonality
  • Box-Cox transformations for heterogeneity
  • ARMA errors for short-term dynamics
  • Trend(possibly damped)
  • Seasonal(including multiple and non-integer periods)

But it can be somewhat dangerous, as the automatic choices are sometimes not very good!

Example

US Gasoline data

gasoline %>% tbats() %>% forecast() %>% autoplot() +
  xlab("Year") + ylab("thousand barrels per day")

TBATS(1, {0,0}, -, {<52.18, 12>})

  • 1: Box-Cox transformation parameter; \(\lambda=1\) means no transformation is needed
  • {0,0}: the ARMA error orders p and q; here the errors are simply white noise
  • -: the damping parameter; “-” means no damping is used
  • {<52.18, 12>}: Fourier terms, with seasonal period 52.18 and order K = 12


  • Handles non-integer seasonality, multiple seasonal periods
  • Entirely automated
  • Prediction intervals often too wide
  • Very slow on long series


TBATS models for electricity demand

  • gas dataset
    • The variance grows over time, so a transformation looks necessary
    • The seasonal pattern changes over time and there is a strong trend
# Plot the gas data
autoplot(gas)

# Fit a TBATS model to the gas data
fit <- tbats(gas)

# Forecast the series for the next 5 years
fc <- forecast(fit, h = 60)

# Plot the forecasts
autoplot(fc)

# Record the Box-Cox parameter and the order of the Fourier terms
lambda <- 0.082
K <- 5