This report is a summary of the forecasting lessons by Rob J. Hyndman on DataCamp.

library(tidyverse)
library(astsa)
library(xts)
library(readxl)
library(ggfortify)
library(fpp2)
library(gt)

theme_set(theme_bw())
mydata <- read_excel("exercise1.xlsx")

1 Exploring and visualizing time series

1.0.1 Creating time series objects in R

Time series plots

  • autoplot(facets = ...)
head(mydata)
## # A tibble: 6 × 4
##   ...1   Sales AdBudget   GDP
##   <chr>  <dbl>    <dbl> <dbl>
## 1 Mar-81 1020.     659.  252.
## 2 Jun-81  889.     589   291.
## 3 Sep-81  795      512.  291.
## 4 Dec-81 1004.     614.  292.
## 5 Mar-82 1058.     647.  279.
## 6 Jun-82  944.     602   254
# Create a ts object called myts
myts <- ts(mydata[, 2:4], start = c(1981, 1), frequency = 4)

autoplot(myts, facets = F)

Seasonal plots

  • ggseasonplot()
  • ggsubseriesplot()
autoplot(AirPassengers)

ggseasonplot(AirPassengers)

ggseasonplot(AirPassengers, polar = TRUE)

ggsubseriesplot(AirPassengers)

1.2 White noise

Just randomness: “i.i.d. data”, with no trend, seasonality, or cycle.

1.2.1 WN ACF

  • No significant autocorrelation is observed at any lag
set.seed(3)
wn <- ts(rnorm(36))
autoplot(wn)

ggAcf(wn) +
  ggtitle("Sample ACF for white noise")


1.2.2 Ljung-Box test

The Ljung-Box test considers the first h autocorrelation values together.

A significant test (small p-value) indicates the data are probably not white noise.

  • Box.test(data, lag = ..., fitdf = ..., type = ...)

Case 1. pigs

ggAcf(pigs)

Box.test(pigs, lag = 24, fitdf = 0, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  pigs
## X-squared = 634.15, df = 24, p-value < 2.2e-16

The p-value is very small, suggesting that this series is not white noise


Case 2. goog

autoplot(goog)

autoplot(diff(goog))

ggAcf(diff(goog))

Box.test(diff(goog), lag = 10, fitdf = 0, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  diff(goog)
## X-squared = 13.123, df = 10, p-value = 0.2169

The p-value is not significant, suggesting that the differenced series is white noise


2 Benchmark methods and forecast accuracy

2.1 Forecasts and potential futures

  • A forecast is the mean or median of simulated futures of a time series

2.1.1 Forecast intervals

  • 80% forecast interval should contain 80% of future observations
  • 95% forecast interval should contain 95% of future observations

2.1.2 Naive forecasting methods

Simple methods (both sketched below)

  • naive method: use only the most recent observation as the forecast for all future periods
  • mean method: use the average of all observations as the forecast for all future periods
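
A minimal sketch of the two methods, using naive() and meanf() on the goog series (the choice of series and horizon here is my own illustration, not a course exercise):

# Sketch: naive vs. mean forecasts of the goog series
fc_naive <- naive(goog, h = 20)   # repeats the last observation
fc_mean <- meanf(goog, h = 20)    # repeats the historical mean
autoplot(goog) +
  autolayer(fc_naive, series = "Naive", PI = FALSE) +
  autolayer(fc_mean, series = "Mean", PI = FALSE)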

2.2 Fitted values and residuals

A fitted value is the forecast of an observation using all previous observations

  • That is, they are one-step forecasts
    • At each point in time, it is the value predicted from the preceding data; it is not a true forecast of the future but is used to assess how well the model fits
  • Often not true forecasts since parameters are estimated on all data
    • Fitted values are not genuine forecasts because the parameters are estimated from the whole series, so they are only in-sample estimates


A residual is the difference between an observation and its fitted value

  • That is, they are one-step forecast errors (see the sketch below)
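
A minimal sketch (assuming the goog series used later in this chapter) showing how observations, fitted values, and residuals relate:

# Sketch: residuals are the observations minus the one-step fitted values
fit <- naive(goog)
head(fitted(fit))                              # one-step "forecasts" of each observation
all.equal(residuals(fit), goog - fitted(fit))  # expected to be TRUE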

■ Residuals should look like white noise

Essential assumptions:

  • They should be uncorrelated
  • They should have mean zero

Useful properties (for computing prediction intervals)

  • They should have constant variance
  • They should be normally distributed

These assumptions can be checked with the checkresiduals() function

# Check the residuals from the naive forecasts applied to the goog series
goog %>% naive() %>% checkresiduals()

## 
##  Ljung-Box test
## 
## data:  Residuals from Naive method
## Q* = 13.123, df = 10, p-value = 0.2169
## 
## Model df: 0.   Total lags used: 10
# Check the residuals from the seasonal naive forecasts applied to the ausbeer series
ausbeer %>% snaive() %>% checkresiduals()

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 60.535, df = 8, p-value = 3.661e-10
## 
## Model df: 0.   Total lags used: 8

The goog residuals look like white noise; the ausbeer residuals do not.


2.3 Training and test sets

2.3.1 Forecast errors

Forecast “error” = the difference between an observed value and its forecast in the test set.

\(\neq\) residuals

  • Residuals are errors on the training set (vs. the test set)
  • Residuals are based on one-step forecasts (vs. multi-step forecasts)

2.3.2 Measures of forecast accuracy

  • MAE, MSE
    • depend on the scale of the data
    • can’t compare forecast accuracy between two series on very different scales
  • MAPE
    • better for comparisons, but needs some assumptions
      • data are all positive and have no zeros or small values
      • there is a natural zero (so it can’t be used with temperature forecasts)
  • MASE
    • scaled MAE (the MAE divided by the MAE of a naive benchmark forecast)

In all cases, a smaller value indicates a better forecast.

  • accuracy(forecast_object, actual_data): computes all of these measures (a short sketch of the underlying calculations follows)
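
A minimal sketch of what accuracy() reports, computed by hand on a train/test split of goog (the split point is my own assumption, not from the course):

# Sketch: MAE, RMSE, and MAPE computed directly from test-set forecast errors
train <- subset(goog, end = 900)
test <- subset(goog, start = 901)
fc <- naive(train, h = 100)
e <- test - fc$mean                  # forecast errors on the test set
c(MAE = mean(abs(e)),
  RMSE = sqrt(mean(e^2)),
  MAPE = mean(abs(100 * e / test)))
accuracy(fc, goog)                   # the "Test set" row should match these values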

Evaluating forecast accuracy of non-seasonal methods: gold

# Create the training data as train
train <- subset(gold, end = 1000)

# Compute naive forecasts and save to naive_fc
naive_fc <- naive(train, h = 108)

# Compute mean forecasts and save to mean_fc
mean_fc <- meanf(train, h = 108)

# Use accuracy() to compute RMSE statistics
accuracy(naive_fc, gold)
##                      ME      RMSE      MAE        MPE      MAPE     MASE
## Training set  0.1079897  6.358087  3.20366  0.0201449 0.8050646 1.014334
## Test set     -6.5383495 15.842361 13.63835 -1.7462269 3.4287888 4.318139
##                    ACF1 Theil's U
## Training set -0.3086638        NA
## Test set      0.9793153  5.335899
accuracy(mean_fc, gold)
##                         ME     RMSE      MAE       MPE      MAPE      MASE
## Training set -4.239671e-15 59.17809 53.63397 -2.390227 14.230224 16.981449
## Test set      1.319363e+01 19.55255 15.66875  3.138577  3.783133  4.960998
##                   ACF1 Theil's U
## Training set 0.9907254        NA
## Test set     0.9793153  6.123788
# Assign one of the two forecasts as bestforecasts
bestforecasts <- naive_fc

Evaluating forecast accuracy of seasonal methods: visnights

# Create three training series omitting the last 1, 2, and 3 years
train1 <- window(visnights[, "VICMetro"], end = c(2015, 4))
train2 <- window(visnights[, "VICMetro"], end = c(2014, 4))
train3 <- window(visnights[, "VICMetro"], end = c(2013, 4))

# Produce forecasts using snaive()
fc1 <- snaive(train1, h = 4)
fc2 <- snaive(train2, h = 4)
fc3 <- snaive(train3, h = 4)

# Use accuracy() to compare the MAPE of each series
accuracy(fc1, visnights[, "VICMetro"])["Test set", "MAPE"]
## [1] 4.691658
accuracy(fc2, visnights[, "VICMetro"])["Test set", "MAPE"]
## [1] 2.503255
accuracy(fc3, visnights[, "VICMetro"])["Test set", "MAPE"]
## [1] 11.92176

2.4 Time series cross-validation

[Diagrams: time series cross-validation with 1-, 2-, and 3-step-ahead forecasts]

  • tsCV(): returns the forecast errors from time series cross-validation; the MSE can then be computed from them
sq <- function(u) {u^2}

tsCV(oil, forecastfunction = naive, h = 10) %>% 
  sq() %>% 
  colMeans(na.rm = TRUE)
##       h=1       h=2       h=3       h=4       h=5       h=6       h=7       h=8 
##  2355.753  5734.838  9842.239 14299.997 18560.887 23264.410 26932.799 30766.136 
##       h=9      h=10 
## 32892.200 32986.214

The MSE increases with the forecast horizon


3. Exponential smoothing

3.1 Exponentially weighted forecasts

3.1.1 Simple exponential smoothing [SES]

Forecasting Notation:

\(\hat{y}_{t+h|t}\) = point forecast of \(y_{t+h}\) given data \(y_1, ..., y_t\)

Forecast Equation:

\(\hat{y}_{t+h|t} = \alpha y_t + \alpha(1-\alpha)y_{t-1} + \alpha(1-\alpha)^2y_{t-2} + \cdots\)

where \(0 \leq \alpha \leq 1\)


  • All observations are used, no matter how long the series, but the weights decay so that recent observations count the most
  • The larger \(\alpha\), the more weight is placed on the most recent observation and the faster the weights decay

  • \(l_t\) is the level (or the smoothed value) of the series at time \(t\)
  • We choose \(\alpha\) and \(l_0\) by minimizing SSE:

\(SSE = \sum_{t=1}^{T} (y_t - \hat{y}_{t|t-1})^2\)

  • ses() function
oildata <- window(oil, start = 1996)
fc <- ses(oildata, h = 5)
summary(fc)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
## ses(y = oildata, h = 5)
## 
##   Smoothing parameters:
##     alpha = 0.8339 
## 
##   Initial states:
##     l = 446.5868 
## 
##   sigma:  29.8282
## 
##      AIC     AICc      BIC 
## 178.1430 179.8573 180.8141 
## 
## Error measures:
##                    ME     RMSE     MAE      MPE     MAPE      MASE        ACF1
## Training set 6.401975 28.12234 22.2587 1.097574 4.610635 0.9256774 -0.03377748
## 
## Forecasts:
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2014       542.6806 504.4541 580.9070 484.2183 601.1429
## 2015       542.6806 492.9073 592.4539 466.5589 618.8023
## 2016       542.6806 483.5747 601.7864 452.2860 633.0752
## 2017       542.6806 475.5269 609.8343 439.9778 645.3834
## 2018       542.6806 468.3452 617.0159 428.9945 656.3667
autoplot(fc) +
  ylab("Oil (million of tonnes)") +
  xlab("Year") +
  autolayer(fitted(fc))

With \(\alpha = 0.8339\), about 83% of each forecast is based on the most recent observation, and the initial level is \(l_0 = 446.5868\).
Because \(\alpha\) is high, the forecasts stay close to the most recent observations.
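
As a sanity check, the point forecast can be reproduced from the weighted-average formula above, plugging in the fitted \(\alpha\) and \(l_0\) (a minimal sketch, not part of the course code):

# Sketch: rebuild the SES point forecast as an explicit weighted average of past observations
y <- window(oil, start = 1996)
alpha <- 0.8339
l0 <- 446.5868
n <- length(y)
w <- alpha * (1 - alpha)^(0:(n - 1))               # weights on y_n, y_{n-1}, ..., y_1
sum(w * rev(as.numeric(y))) + (1 - alpha)^n * l0   # ~542.68, matching the ses() forecast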


3.1.2 SES vs. naive

# Create a training set using subset()
train <- subset(marathon, end = length(marathon) - 20)

# Compute SES and naive forecasts, save to fcses and fcnaive
fcses <- ses(train, h = 20)
fcnaive <- naive(train, h = 20)

# Calculate forecast accuracy measures
accuracy(fcses, marathon)
##                      ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -1.0851741 5.863790 4.155948 -0.8603998 2.827993 0.8990906
## Test set      0.4574579 2.493971 1.894237  0.3171919 1.463862 0.4097960
##                     ACF1 Theil's U
## Training set -0.01595953        NA
## Test set     -0.12556096 0.6870735
accuracy(fcnaive, marathon)
##                      ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -0.4638047 6.904742 4.622391 -0.4086317 3.123559 1.0000000
## Test set      0.2266667 2.462113 1.846667  0.1388780 1.429608 0.3995047
##                    ACF1 Theil's U
## Training set -0.3589323        NA
## Test set     -0.1255610 0.6799062
# Save the best forecasts as fcbest
fcbest <- fcnaive

The test-set RMSE is 2.49 for fcses and 2.46 for fcnaive, so fcnaive is the better model.


3.2 Exponential smoothing with trend

3.2.1 Holt’s linear trend

  • The \(\beta^*\) parameter controls how quickly the slope can change
  • Two smoothing parameters \(\alpha\) and \(\beta^*\) where \(0~\le~\alpha\) and \(\beta^*~\le~1\)
  • A small \(\beta^*\) value means that the slope hardly changes
    • so the data will have a trend that is close to linear throughout the series
  • A large \(\beta^*\) value means the slope changes rapidly, allowing for highly nonlinear trend
  • Four parameters to estimate by minimizing SSE:
    1. \(\alpha\)
    2. \(\beta^*\)
    3. \(l_0\)
    4. \(b_0\)

  • holt() function

3.2.2 Damped trend method

  • A variation of Holt’s method
  • Damping parameter \(0 < \phi < 1\)
  • If \(\phi=1\), identical to Holt’s linear trend
  • To allow the trend to dampen over time
  • Under this method, the short-run forecasts are trended, but the long-run forecasts are constant

  • holt(damped = TRUE) function (sketched below)
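
A minimal sketch comparing Holt's linear trend with its damped variant on austa (my own illustration, not a course exercise):

# Sketch: Holt's linear trend vs. the damped trend method
fc_holt <- holt(austa, h = 15)
fc_damped <- holt(austa, damped = TRUE, h = 15)
autoplot(austa) +
  autolayer(fc_holt, series = "Holt", PI = FALSE) +
  autolayer(fc_damped, series = "Damped Holt", PI = FALSE)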

Holt’s trend methods

# Produce 10 year forecasts of austa using holt()
fcholt <- holt(austa, h = 10)

# Look at fitted model using summary()
summary(fcholt)
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
## holt(y = austa, h = 10)
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 0.0085 
## 
##   Initial states:
##     l = 0.656 
##     b = 0.1706 
## 
##   sigma:  0.1952
## 
##      AIC     AICc      BIC 
## 17.14959 19.14959 25.06719 
## 
## Error measures:
##                      ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.00372838 0.1840662 0.1611085 -1.222083 5.990319 0.7907078
##                   ACF1
## Training set 0.2457733
## 
## Forecasts:
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2016       7.030683 6.780483 7.280882 6.648036 7.413330
## 2017       7.202446 6.847114 7.557778 6.659013 7.745879
## 2018       7.374209 6.937169 7.811249 6.705814 8.042604
## 2019       7.545972 7.039179 8.052765 6.770899 8.321045
## 2020       7.717736 7.148723 8.286748 6.847506 8.587965
## 2021       7.889499 7.263543 8.515455 6.932181 8.846816
## 2022       8.061262 7.382302 8.740222 7.022882 9.099642
## 2023       8.233025 7.504136 8.961915 7.118285 9.347766
## 2024       8.404788 7.628444 9.181133 7.217472 9.592105
## 2025       8.576552 7.754792 9.398311 7.319779 9.833324
# Plot the forecasts
autoplot(fcholt)

# Check that the residuals look like white noise
checkresiduals(fcholt)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt's method
## Q* = 4.8886, df = 7, p-value = 0.6736
## 
## Model df: 0.   Total lags used: 7

3.3 Exponential smoothing with trend and seasonality

Holt-Winters’ method has two versions.

3.3.1 Holt-Winters’ additive method

  • \(S_{t-m+h_m^+}\) = seasonal component from final year of data
  • Smoothing parameters:

\(0 \le \alpha \le 1,~0 \le \beta^* \le 1,~0 \le \gamma \le 1-\alpha\)

  • \(m\) = period of seasonality(e.g. \(m=4\) for quarterly data)
  • Seasonal component averages zero

3.3.2 Holt-Winters’ multiplicative method

Used when the seasonal variation grows over time.

  • Seasonal component averages one

  • hw(seasonal = "additive" or "multiplicative") function (a short sketch follows)
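
A minimal sketch contrasting the two versions on austourists, an fpp2 series with growing seasonal variation (my own illustration):

# Sketch: additive vs. multiplicative Holt-Winters forecasts
fc_add <- hw(austourists, seasonal = "additive", h = 8)
fc_mult <- hw(austourists, seasonal = "multiplicative", h = 8)
autoplot(austourists) +
  autolayer(fc_add, series = "HW additive", PI = FALSE) +
  autolayer(fc_mult, series = "HW multiplicative", PI = FALSE)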

3.3.3 Taxonomy of exponential smoothing methods

Trend \ Seasonal        N (None)    A (Additive)    M (Multiplicative)
N (None)                (N, N)      (N, A)          (N, M)
A (Additive)            (A, N)      (A, A)          (A, M)
A_d (Additive damped)   (A_d, N)    (A_d, A)        (A_d, M)

Model (Trend, Seasonality)   Description                                   Function
(N, N)                       Simple exponential smoothing                  ses()
(A, N)                       Holt’s linear method                          holt()
(A_d, N)                     Additive damped trend method                  holt(damped = TRUE)
(A, A)                       Additive Holt-Winters’ method                 hw()
(A, M)                       Multiplicative Holt-Winters’ method           hw()
(A_d, M)                     Damped multiplicative Holt-Winters’ method    hw()

Holt-Winters with monthly data

# Produce 3 year forecasts
fc <- hw(a10, seasonal = "multiplicative", h = 36)

# Check if residuals look like white noise
checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method
## Q* = 75.764, df = 24, p-value = 2.835e-07
## 
## Model df: 0.   Total lags used: 24
whitenoise <- FALSE

# Plot forecasts
autoplot(fc)

Exercise: Holt-Winters method with daily data

# Create training data with subset()
train <- subset(hyndsight, end = 337)

# Holt-Winters additive forecasts as fchw
fchw <- hw(train, seasonal = "additive", h = 28)

# Seasonal naive forecasts as fcsn
fcsn <- snaive(train, h = 28)

# Find better forecasts with accuracy()
accuracy(fchw, hyndsight)
##                     ME     RMSE      MAE       MPE    MAPE      MASE      ACF1
## Training set -3.976241 228.2440 165.0244 -2.407211 13.9955 0.7492131 0.1900853
## Test set     -3.999460 201.7656 152.9584 -3.218292 10.5558 0.6944332 0.3013328
##              Theil's U
## Training set        NA
## Test set     0.4868701
accuracy(fcsn, hyndsight)
##                 ME     RMSE      MAE        MPE     MAPE      MASE      ACF1
## Training set 10.50 310.3282 220.2636 -2.1239387 18.01077 1.0000000 0.4255730
## Test set      0.25 202.7610 160.4643 -0.6888732 10.25880 0.7285101 0.3089795
##              Theil's U
## Training set        NA
## Test set      0.450266
# Plot the better forecasts
autoplot(fchw)

3.4 State space models for exponential smoothing

3.4.1 Innovations state-space models

  • Each exponential smoothing method can be written as an “innovations state-space model”
  • A state-space model mathematically describes the unobserved states behind the observed data

  • ETS models: Error, Trend, Seasonal

3.4.2 ETS models

  • Parameters: estimated using the “likelihood”, the probability of the data arising from the specified model
  • For models with additive errors, this is equivalent to minimizing SSE
    • The parameters of an ETS model are chosen to maximize the likelihood, i.e. the probability of the data arising from the specified model; when the errors are additive, this is equivalent to minimizing the SSE
  • Choose the best model by minimizing a corrected version of Akaike’s Information Criterion (\(AIC_c\))

  • ets() function
    • Given a time series, it automatically returns the model that minimizes the \(AIC_c\)

Example: Australian air traffic

autoplot(ausair)

ets(ausair)
## ETS(M,A,N) 
## 
## Call:
## ets(y = ausair)
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 0.0269 
## 
##   Initial states:
##     l = 6.5431 
##     b = 0.7393 
## 
##   sigma:  0.0755
## 
##      AIC     AICc      BIC 
## 241.6910 243.1544 250.9417

ETS(M,A,N): multiplicative errors, additive trend, and no seasonality


Key points

  • The parameters are estimated much as with holt(), but by maximizing the likelihood rather than minimizing the SSE
  • Unlike holt(), ets() does not compute forecasts; it only returns a model, so the model must be passed to forecast() separately
ausair %>% ets() %>% forecast() %>% autoplot()

3.4.3 When does ETS fail? : lynx

ETS models do not work well for all series.

# Plot the lynx series
autoplot(lynx)

# Use ets() to model the lynx series
fit <- ets(lynx)

# Use summary() to look at model and parameters
summary(fit)
## ETS(M,N,N) 
## 
## Call:
## ets(y = lynx)
## 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##     l = 2372.8047 
## 
##   sigma:  0.9594
## 
##      AIC     AICc      BIC 
## 2058.138 2058.356 2066.346 
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE     MASE      ACF1
## Training set 8.975647 1198.452 842.0649 -52.12968 101.3686 1.013488 0.3677583
# Plot 20-year forecasts of the lynx series
fit %>% forecast(h = 20) %>% autoplot()

ETS does not work well for every time series!


4. Forecasting with ARIMA models

ARIMA models provide another approach to time series forecasting. Exponential smoothing and ARIMA models are the two most widely-used approaches to time series forecasting, and provide complementary approaches to the problem. While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.

4.1 Transformations for variance stabilization

In ETS models, when the variance of the series grows over time, we used multiplicative errors and multiplicative seasonality. An alternative is to transform the time series itself.

4.1.1 Variance stabilization

  • If the data show increasing variation as the level of the series increases, then a transformation can be useful
  • \(y_1, ..., y_n\): original observations; \(w_1, ..., w_n\): transformed observations

4.1.2 Box-Cox transformations

  • Each of these transformations is close to a member of the family of Box-Cox transformations
  • A transformation technique used to improve the normality and variance stability of the data

\[ w_t = \begin{cases} \log(y_t), & \lambda = 0 \\ \frac{y_t^\lambda - 1}{\lambda}, & \lambda \neq 0 \end{cases} \]

  • \(\lambda = 1\): essentially no transformation
  • \(\lambda = 0\): just \(\log(y_t)\)
  • \(\lambda \neq 0\): \(\frac{y_t^\lambda - 1}{\lambda}\), where the strength of the transformation depends on \(\lambda\)
    • \(\lambda = 1/2\): square root plus linear transformation
    • \(\lambda = 1/3\): cube root plus linear transformation
    • \(\lambda = -1\): inverse transformation


  • BoxCox.lambda() function: returns an estimate of lambda that should roughly balance the variation across the series
BoxCox.lambda(a10)
## [1] 0.1313326
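
A minimal sketch applying the estimated \(\lambda\) with BoxCox() to see the stabilized series (my own illustration):

# Sketch: transform a10 with the estimated lambda and plot the result
lambda <- BoxCox.lambda(a10)
autoplot(BoxCox(a10, lambda = lambda)) +
  ggtitle("Box-Cox transformed a10")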

Non-seasonal differencing for stationarity

# Plot the US female murder rate
autoplot(wmurders)

# Plot the differenced murder rate
autoplot(diff(wmurders))

# Plot the ACF of the differenced murder rate
ggAcf(diff(wmurders))

Seasonal differencing for stationarity

# Plot the data
autoplot(h02)

# Take logs and seasonal differences of h02
difflogh02 <- diff(log(h02), lag = 12)

# Plot difflogh02
autoplot(difflogh02)

# Take another difference and plot
ddifflogh02 <- diff(difflogh02)
autoplot(ddifflogh02)

# Plot ACF of ddifflogh02
ggAcf(ddifflogh02)

Even after these transformations the data still do not look like white noise, but an ARIMA model can now be applied.


4.2 ARIMA models

Autoregressive (AR) models:

  • Multiple regression with lagged observations as predictors
  • \(y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + ... + \phi_p y_{t-p} + e_t\)

Moving average (MA) models:

  • Multiple regression with lagged errors as predictors
  • \(y_t = c + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + ... + \theta_q e_{t-q}\)

Autoregressive moving average (ARMA) models:

  • Multiple regression with lagged observations and errors as predictors
  • \(y_t = c + \phi_1 y_{t-1} + ... + \phi_p y_{t-p} + \theta_1 e_{t-1} + ... + \theta_q e_{t-q} + e_t\)

ARIMA(p, d, q) models:

  • Combine an ARMA model with \(d\) levels of differencing

  • auto.arima() function: automatically selects and fits an ARIMA model
    • Like ets(), it returns the model that minimizes the \(AIC_c\), but \(AIC_c\) values are not comparable across model classes
    • That is, compare \(AIC_c\) only between models of the same class!
  • Arima(order = c(p, d, q), include.constant = T/F) function: fits a user-specified ARIMA model (sketched below)
    • order: set to a vector that specifies the values of \(p, d, q\)
    • include.constant: a boolean that determines if the constant \(c\), or drift, should be included
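
As a minimal sketch, the same kind of model can be written down by hand with Arima(); here an ARIMA(0,1,1) with drift for austa, which is the model auto.arima() selects in the next exercise:

# Sketch: a manually specified ARIMA(0,1,1) with drift
fit_manual <- Arima(austa, order = c(0, 1, 1), include.constant = TRUE)
summary(fit_manual)
fit_manual %>% forecast(h = 10) %>% autoplot()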

Automatic ARIMA models for non-seasonal time series

# Fit an automatic ARIMA model to the austa series
fit <- auto.arima(austa)

# Check that the residuals look like white noise
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1) with drift
## Q* = 2.297, df = 6, p-value = 0.8905
## 
## Model df: 1.   Total lags used: 7
residualsok <- TRUE

# Summarize the model
summary(fit)
## Series: austa 
## ARIMA(0,1,1) with drift 
## 
## Coefficients:
##          ma1   drift
##       0.3006  0.1735
## s.e.  0.1647  0.0390
## 
## sigma^2 = 0.03376:  log likelihood = 10.62
## AIC=-15.24   AICc=-14.46   BIC=-10.57
## 
## Training set error measures:
##                        ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.0008313383 0.1759116 0.1520309 -1.069983 5.513269 0.7461559
##                      ACF1
## Training set -0.000571993
# Find the AICc value and the number of differences used
AICc <- -14.46
d <- 1

# Plot forecasts of fit
fit %>% forecast(h = 10) %>% autoplot()

Comparing auto.arima() and ets() on non-seasonal data

# Set up forecast functions for ETS and ARIMA models
fets <- function(x, h) {
  forecast(ets(x), h = h)
}
farima <- function(x, h) {
  forecast(auto.arima(x), h = h)
}

# Compute CV errors for ETS on austa as e1
e1 <- tsCV(austa, fets, 1)

# Compute CV errors for ARIMA on austa as e2
e2 <- tsCV(austa, farima, 1)

# Find MSE of each model class
mean(e1^2, na.rm = TRUE)
## [1] 0.05623684
mean(e2^2, na.rm = TRUE)
## [1] 0.04336277
# Plot 10-year forecasts using the best model class
austa %>% farima(h = 10) %>% autoplot()

4.3 Seasonal ARIMA models

\(ARIMA(p, d, q)(P, D, Q)_m\), where \((P, D, Q)\) are the seasonal AR, differencing, and MA orders and \(m\) is the seasonal period

4.3.1 Example: Monthly retail debit card usage in Iceland

fit <- auto.arima(debitcards, lambda = 0)
fit
## Series: debitcards 
## ARIMA(0,1,4)(0,1,1)[12] 
## Box Cox transformation: lambda= 0 
## 
## Coefficients:
##           ma1     ma2     ma3      ma4     sma1
##       -0.7959  0.0856  0.2628  -0.1751  -0.8144
## s.e.   0.0825  0.0988  0.0999   0.0798   0.1118
## 
## sigma^2 = 0.002321:  log likelihood = 239.33
## AIC=-466.67   AICc=-466.08   BIC=-448.56
fit %>% 
  forecast(h = 36) %>% 
  autoplot() +
    xlab("Year")

Automatic ARIMA models for seasonal time series

# Check that the logged h02 data have stable variance
h02 %>% log() %>% autoplot()

# Fit a seasonal ARIMA model to h02 with lambda = 0
fit <- auto.arima(h02, lambda = 0)

# Summarize the fitted model
summary(fit)
## Series: h02 
## ARIMA(2,1,1)(0,1,2)[12] 
## Box Cox transformation: lambda= 0 
## 
## Coefficients:
##           ar1      ar2     ma1     sma1     sma2
##       -1.1358  -0.5753  0.3683  -0.5318  -0.1817
## s.e.   0.1608   0.0965  0.1884   0.0838   0.0881
## 
## sigma^2 = 0.004278:  log likelihood = 248.25
## AIC=-484.51   AICc=-484.05   BIC=-465
## 
## Training set error measures:
##                        ME      RMSE        MAE        MPE     MAPE      MASE
## Training set -0.003931805 0.0501571 0.03629816 -0.5323365 4.611253 0.5987988
##                      ACF1
## Training set -0.003740267
# Record the amount of lag-1 differencing and seasonal differencing used
d <- 1
D <- 1

# Plot 2-year forecasts
fit %>% forecast(h = 24) %>% autoplot()

Exploring auto.arima() options

  • auto.arima(stepwise = TRUE/FALSE)
    • The default, stepwise = TRUE, searches quickly for a good ARIMA model
    • It is fast, but may miss the best model
    • With stepwise = FALSE, a much larger set of models is evaluated; this is more accurate but computationally more expensive
# Find an ARIMA model for a10
fit1 <- auto.arima(a10)

# Don't use a stepwise search
fit2 <- auto.arima(a10, stepwise = FALSE)

# AICc of better model
AICc <- 523.55

# Compute 2-year forecasts from better model
fit2 %>% forecast(h=24) %>% autoplot()

Comparing auto.arima() and ets() on seasonal data

# Use 20 years of the qcement data beginning in 1988
train <- window(qcement, start = 1988, end = c(2007, 4))

# Fit an ARIMA and an ETS model to the training data
fit1 <- auto.arima(train)
fit2 <- ets(train)

# Check that both models have white noise residuals
checkresiduals(fit1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,1)(2,1,1)[4] with drift
## Q* = 0.78311, df = 3, p-value = 0.8535
## 
## Model df: 5.   Total lags used: 8
checkresiduals(fit2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,N,M)
## Q* = 6.0952, df = 8, p-value = 0.6366
## 
## Model df: 0.   Total lags used: 8
# Produce forecasts for each model
fc1 <- forecast(fit1, h = 25)
fc2 <- forecast(fit2, h = 25)

# Use accuracy() to find better model based on RMSE
accuracy(fc1, qcement)
##                        ME      RMSE        MAE        MPE     MAPE      MASE
## Training set -0.006205705 0.1001195 0.07988903 -0.6704455 4.372443 0.5458078
## Test set     -0.158835253 0.1996098 0.16882205 -7.3332836 7.719241 1.1534049
##                     ACF1 Theil's U
## Training set -0.01133907        NA
## Test set      0.29170452 0.7282225
accuracy(fc2, qcement)
##                       ME      RMSE        MAE        MPE     MAPE      MASE
## Training set  0.01406512 0.1022079 0.07958478  0.4938163 4.371823 0.5437292
## Test set     -0.13495515 0.1838791 0.15395141 -6.2508975 6.986077 1.0518075
##                     ACF1 Theil's U
## Training set -0.03346295        NA
## Test set      0.53438371  0.680556
bettermodel <- fit2

The test-set RMSE is 0.1996 for the ARIMA model and 0.1838 for the ETS model, so the ETS model is better.


5. Advanced methods

The time series models in the previous chapters work well for many time series, but they are often not good for weekly or hourly data, and they do not allow for the inclusion of other information such as the effects of holidays, competitor activity, changes in the law, etc. In this chapter, you will look at some methods that handle more complicated seasonality, and you will consider how to extend ARIMA models to allow other information to be included in them.

5.1 Dynamic regression

Dynamic regression is one way of combining external information with the history of the time series in a single model.

Regression model with ARIMA errors:

\(y_t = \beta_0 + \beta_1x_{1,t} + ... + \beta_rx_{r,t} + e_t\)

  • \(y_t\) modeled as function of \(r\) explanatory variables: \(x_{1,t},...,x_{r,t}\)
  • In dynamic regression, we allow \(e_t\) to be an ARIMA process
  • In ordinary regression, we assume that \(e_t\) is white noise

  • auto.arima(xreg = ...)
    • xreg: contains a matrix of predictor variables that you want to include in the model

5.1.1 US personal consumption and income

  • In general, when income falls, consumption falls as well
autoplot(uschange[,1:2], facets = TRUE) +
  xlab("Year") +
  ylab("") +
  ggtitle("Quarterly changes in US consumption and personal income")

as.data.frame(uschange) %>% 
  ggplot(aes(x = Income, y = Consumption)) +
  geom_point()

fit <- auto.arima(uschange[, "Consumption"],
                  xreg = uschange[, "Income"])
fit
## Series: uschange[, "Consumption"] 
## Regression with ARIMA(1,0,2) errors 
## 
## Coefficients:
##          ar1      ma1     ma2  intercept    xreg
##       0.6922  -0.5758  0.1984     0.5990  0.2028
## s.e.  0.1159   0.1301  0.0756     0.0884  0.0461
## 
## sigma^2 = 0.3219:  log likelihood = -156.95
## AIC=325.91   AICc=326.37   BIC=345.29

Consumption changes have a time series pattern of their own, which the ARIMA(1,0,2) error model captures.
A one-percentage-point increase in income growth is associated with an increase of about 0.2028 percentage points in consumption growth.


Forecasts from a dynamic regression model

Future values of the predictor variables must be supplied. There are two options:

  1. Forecast the predictors with separate models, then plug those forecasts in
  2. Scenario forecasting
  • e.g. assume Income grows by 0.8% in each of the next 8 quarters
fcast <- forecast(fit, xreg = rep(0.8, 8))
autoplot(fcast) + 
  xlab("Year") +
  ylab("Percentage change")

5.1.2 Forecasting sales allowing for advertising expenditure

  • Scenario: advertising expenditure of 10 units in each future month
# Time plot of both variables
autoplot(advert, facets = TRUE)

# Fit ARIMA model
fit <- auto.arima(advert[, "sales"], xreg = advert[, "advert"], stationary = TRUE)

# Check model. Increase in sales for each unit increase in advertising
salesincrease <- coefficients(fit)[3]

# Forecast fit as fc
fc <- forecast(fit, xreg = rep(10, 6))

# Plot fc with x and y labels
autoplot(fc) + xlab("Month") + ylab("Sales")

5.1.3 Forecasting electricity demand

# Time plots of demand and temperatures
autoplot(elecdaily[, c("Demand", "Temperature")], facets = TRUE)

# Matrix of regressors
xreg <- cbind(MaxTemp = elecdaily[, "Temperature"],
              MaxTempSq = elecdaily[, "Temperature"]^2,
              Workday = elecdaily[, "WorkDay"])
# Fit model
fit <- auto.arima(elecdaily[, "Demand"], xreg = xreg)

# Forecast fit one day ahead
forecast(fit, xreg = cbind(20, 20^2, 1))
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 53.57143       185.4008 176.9271 193.8745 172.4414 198.3602

If the next day is a working day (indicator = 1) with a maximum temperature forecast of 20°C, the forecast demand is 185.4008.


5.2 Dynamic harmonic regression [DHR]

A method for forecasting time series with strong seasonality. A standard ARIMA model handles seasonality through differencing, whereas DHR includes the seasonal pattern in the model directly via Fourier terms.

5.2.1 Why DHR?

  • When the seasonal period is very long
  • When the seasonal pattern is unusual or irregular
  • When differencing would discard too much data

5.2.2 Formula

Periodic seasonality can be handled using pairs of Fourier terms:

\[ s_k(t) = \sin \left(\frac{2\pi k t}{m}\right) ~~~~~~~~ c_k(t) = \cos \left(\frac{2\pi k t}{m}\right) \\ y_t = \beta_0 + \sum_{k = 1}^{K} [\alpha_k s_k(t) + \gamma_k c_k(t)] + e_t \]

  • \(y_t\): the time series
  • \(m\): seasonal period (e.g. 12, 365, …)
  • \(K\): number of Fourier term pairs (larger \(K\) allows more complex seasonal patterns)
    • Every periodic function can be approximated by sums of sin and cos terms for large enough \(K\)
  • \(\alpha_k, \gamma_k\): regression coefficients
  • \(e_t\): error term
    • can be modeled as a non-seasonal ARIMA process
  • Assumes the seasonal pattern is unchanging

\[ y_t = \beta_0 + \beta_1 x_{t,1} + ... + \beta_r x_{t,r} + \sum_{k = 1}^{K} [\alpha_k s_k(t) + \gamma_k c_k(t)] + e_t \]

  • Other predictor variables can be added as well: \(x_{t,1}, ..., x_{t,r}\)
  • Choose \(K\) to minimize the \(AIC_c\)
  • \(K\) cannot be more than \(m/2\)
  • This is particularly useful for weekly data, daily data, and sub-daily data
    • i.e. it is well suited to long seasonal periods

  • fourier(x, K, h = NULL): easy to generate the required harmonics (a short sketch follows)
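
A quick sketch of what fourier() returns, using the monthly a10 series (my own illustration):

# Sketch: Fourier term pairs for monthly data (m = 12) with K = 2
head(fourier(a10, K = 2))
# columns such as S1-12, C1-12, S2-12, C2-12: sin/cos pairs for the first two harmonics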

5.2.3 Exercise

Forecasting weekly data

With weekly data, it is difficult to handle seasonality using ETS or ARIMA models as the seasonal length is too large (approximately 52). Instead, you can use harmonic regression which uses sines and cosines to model the seasonality.

# Set up harmonic regressors of order 13
harmonics <- fourier(gasoline, K = 13)

# Fit regression model with ARIMA errors
fit <- auto.arima(gasoline, xreg = harmonics, seasonal = FALSE)

# Forecasts next 3 years
newharmonics <- fourier(gasoline, K = 13, h = 156)
fc <- forecast(fit, xreg = newharmonics)

# Plot forecasts fc
autoplot(fc)

Harmonic regression for multiple seasonality

  • taylor dataset
    • contains half-hourly electricity demand in England and Wales over a few months in the year 2000.
    • The seasonal periods are 48 (daily seasonality) and 7 x 48 = 336 (weekly seasonality).
    • auto.arima() takes too long to run on series with such long seasonal periods
    • Instead you will fit a standard regression model with Fourier terms using the tslm() function.
      • Need to specify the order \(K\) for each of the seasonal periods.
# Fit a harmonic regression using order 10 for each type of seasonality
fit <- tslm(taylor ~ fourier(taylor, K = c(10, 10)))

# Forecast 20 working days ahead
fc <- forecast(fit, newdata = data.frame(fourier(taylor, K = c(10, 10), h = 20 * 48)))

# Plot the forecasts
autoplot(fc)

# Check the residuals of fit
checkresiduals(fit)

## 
##  Breusch-Godfrey test for serial correlation of order up to 672
## 
## data:  Residuals from Linear regression model
## LM test = 3938.9, df = 672, p-value < 2.2e-16

Forecasting call bookings

  • calls dataset
    • 20 consecutive days of data
    • 169 five-minute observations per day
    • 5 x 169 = 845 observations per (five-day) week
    • There is both daily and weekly seasonality, but the weekly seasonality is relatively weak, so only the daily seasonality is modeled
    • stationary = TRUE fits a non-seasonal ARMA model without differencing (the seasonality is handled by the Fourier terms)
subset_calls <- window(calls, start = 30, end = 33)

# Plot the calls data
autoplot(subset_calls)

# Set up the xreg matrix
xreg <- fourier(subset_calls, K = c(10, 0))

# Fit a dynamic regression model
fit <- auto.arima(subset_calls, xreg = xreg, seasonal = FALSE, stationary = TRUE)

# Check the residuals
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from Regression with ARIMA(1,0,1) errors
## Q* = 602.6, df = 505, p-value = 0.001787
## 
## Model df: 2.   Total lags used: 507
# Plot forecasts for 10 working days ahead
fc <- forecast(fit, xreg =  fourier(subset_calls, c(10, 0), h = 1690))
autoplot(fc)

5.3 TBATS models

An automated tool that combines everything covered so far:

  • tbats() function
  • Trigonometric terms for seasonality
  • Box-Cox transformations for heterogeneity
  • ARMA errors for short-term dynamics
  • Trend(possibly damped)
  • Seasonal(including multiple and non-integer periods)

But it can be somewhat dangerous, as the automatic choices are sometimes not very good!

Example

US Gasoline data

gasoline %>% tbats() %>% forecast() %>% autoplot() +
  xlab("Year") + ylab("thousand barrels per day")

TBATS(1, {0,0}, -, {<52.18, 12>})

  • 1: Box-Cox transformation parameter; \(\lambda=1\) means no transformation is needed
  • {0,0}: the ARMA error orders p and q; here the errors are simply white noise
  • -: the damping parameter; “-” means no damping is used
  • {<52.18, 12>}: Fourier terms, with seasonal period 52.18 and order K = 12


  • Handles non-integer seasonality, multiple seasonal periods
  • Entirely automated
  • Prediction intervals often too wide
  • Very slow on long series


TBATS models for electricity demand

  • gas dataset
    • The variance grows over time, so a transformation looks necessary
    • The seasonal pattern changes over time and there is a strong trend
# Plot the gas data
autoplot(gas)

# Fit a TBATS model to the gas data
fit <- tbats(gas)

# Forecast the series for the next 5 years
fc <- forecast(fit, h = 60)

# Plot the forecasts
autoplot(fc)

# Record the Box-Cox parameter and the order of the Fourier terms
lambda <- 0.082
K <- 5