This report is a summary of lesson by David Stoffer, Data Camp

library(tidyverse)
library(astsa)
library(xts)

1 Time series data and models

1.1 ARIMA

The model is autoregression with autocorrelated errors

plot(AirPassengers, main = "Airpassengers plot")

  • There is seasonality
  • There is trend
  • There is heteroscedasticity

1.2 Stationarity

It means that we can use simple averaging to estimate correlation

1.2.1 Dealing with trend and heteroscedasticity

# Plot GNP series (gnp) and its growth rate
par(mfrow = c(2,1))
plot(gnp)
plot(diff(log(gnp)))

# Plot DJIA closings (djia$Close) and its returns
par(mfrow = c(2,1))
plot(djia$Close)
plot(diff(log(djia$Close)))

1.3 Stationary time series: ARMA

1.3.1 Why it is valid to use ARIMA models for stationary time series data

Wold Decomposition

Wold proved that any stationary time series may be represented as a linear combination of white noise:

\(X_t = W_t + a_1W_{t-1} + a_2W_{t-2} + ...\)

For constants \(a_1, a_2, ...\)

Any ARIMA model has this form, which means they are suited to modeling time series

Note: Special case of MA(q) is already of this form, where constants are 0 after q-th term

1.3.2 Generating and plottin MA(1)

\(X_t = W_t + 0.9W_{t-1}\)

x <- arima.sim(model = list(order = c(0, 0, 1), ma = 0.9), n = 100)
plot(x)

1.3.3 Generating and plottin AR(2)

\(X_t = -0.9X_{t-2} + W_t\)

x <- arima.sim(model = list(order = c(2, 0, 0), ar = c(0, -0.9)), n = 100)
plot(x)

\(X_t = 1.5X_{t-1} -0.75X_{t-2} + W_t\)

x <- arima.sim(model = list(order = c(2, 0, 0), ar = c(1.5, -0.75)), n = 100)
plot(x)

2 Fitting ARMA models with astsa

2.1 ACF and PACF

plot으로는 AR인지 MA인지 구분이 어려움

  • acf2(): both ACF and PACF

2.2 Estimation

  • 생성된 데이터를 바탕으로 모델을 학습하는 과정
  • sarima함수를 사용해 데이터에 가장 잘 맞는 모델의 계수를 찾음
  • AR(2) with mean 50:

\(X_t = 50 + 1.5(X_{t-1}-50) -0.75(X_{t-2}-50) + W_t\)

AR_2 <- arima.sim(model = list(order = c(2, 0, 0), ar = c(1.5, -0.75)), n = 100) + 50
plot(AR_2)

acf2(AR_2)

##      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7] [,8]  [,9] [,10] [,11] [,12]
## ACF  0.77  0.34 -0.09 -0.37 -0.44 -0.32 -0.10 0.12  0.20  0.14 -0.01 -0.15
## PACF 0.77 -0.59 -0.19 -0.02  0.00  0.07  0.05 0.00 -0.22  0.04 -0.10  0.01
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## ACF  -0.21 -0.14  0.02  0.15  0.21  0.17  0.05 -0.06
## PACF  0.04  0.12 -0.01 -0.07  0.04 -0.05  0.02  0.09
AR_2_fit <- sarima(AR_2, p =2, d = 0, q = 0)
## initial  value 0.875805 
## iter   2 value 0.706187
## iter   3 value 0.368232
## iter   4 value 0.247738
## iter   5 value 0.119223
## iter   6 value 0.041274
## iter   7 value 0.009812
## iter   8 value -0.000002
## iter   9 value -0.000006
## iter  10 value -0.000018
## iter  11 value -0.000020
## iter  12 value -0.000020
## iter  13 value -0.000020
## iter  14 value -0.000020
## iter  14 value -0.000020
## iter  14 value -0.000020
## final  value -0.000020 
## converged
## initial  value 0.028972 
## iter   2 value 0.028404
## iter   3 value 0.028180
## iter   4 value 0.028153
## iter   5 value 0.028150
## iter   6 value 0.028150
## iter   7 value 0.028150
## iter   7 value 0.028150
## iter   7 value 0.028150
## final  value 0.028150 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##       Estimate     SE  t.value p.value
## ar1     1.4099 0.0689  20.4578       0
## ar2    -0.7552 0.0684 -11.0481       0
## xmean  50.0757 0.2947 169.9014       0
## 
## sigma^2 estimated as 1.029467 on 97 degrees of freedom 
##  
## AIC = 2.974177  AICc = 2.976677  BIC = 3.078384 
## 

AR_2_fit$ttable
##       Estimate     SE  t.value p.value
## ar1     1.4099 0.0689  20.4578       0
## ar2    -0.7552 0.0684 -11.0481       0
## xmean  50.0757 0.2947 169.9014       0
  • true parameter 1.5, -0.75와 estimated parameter 1.4135, -0.6738이 비슷한 수준임을 확인 가능하다!!!

2.3 AR and MA together: ARMA

2.3.1 ARMA(1,1) estimation

\(X_t = 0.9X_{t-1} + W_t - 0.4W_{t-1}\)

ARMA_11 <- arima.sim(model = list(order = c(1, 0, 1), ar = 0.9, ma = -0.4), n = 100)
plot(ARMA_11)

acf2(ARMA_11)

##      [,1] [,2] [,3]  [,4] [,5]  [,6]  [,7] [,8]  [,9] [,10] [,11] [,12] [,13]
## ACF  0.81 0.67 0.57  0.44 0.37  0.26  0.18 0.15  0.09  0.03  0.03  0.07  0.11
## PACF 0.81 0.03 0.05 -0.11 0.09 -0.16 -0.01 0.09 -0.08 -0.07  0.12  0.17  0.00
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## ACF   0.13  0.14  0.15  0.17  0.14  0.13  0.10
## PACF  0.02 -0.01  0.00  0.02 -0.09  0.07 -0.11
ARMA_11_fit <- sarima(ARMA_11, p =1, d = 0, q = 1)
## initial  value 0.501963 
## iter   2 value 0.313757
## iter   3 value -0.018408
## iter   4 value -0.032802
## iter   5 value -0.042273
## iter   6 value -0.047098
## iter   7 value -0.047131
## iter   8 value -0.047165
## iter   9 value -0.047173
## iter  10 value -0.047208
## iter  11 value -0.047230
## iter  12 value -0.047230
## iter  13 value -0.047230
## iter  13 value -0.047230
## iter  13 value -0.047230
## final  value -0.047230 
## converged
## initial  value -0.039826 
## iter   2 value -0.039988
## iter   3 value -0.040303
## iter   4 value -0.040502
## iter   5 value -0.040531
## iter   6 value -0.040534
## iter   7 value -0.040534
## iter   8 value -0.040535
## iter   9 value -0.040535
## iter  10 value -0.040535
## iter  11 value -0.040535
## iter  12 value -0.040535
## iter  12 value -0.040535
## iter  12 value -0.040535
## final  value -0.040535 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##       Estimate     SE t.value p.value
## ar1     0.8316 0.0681 12.2203  0.0000
## ma1    -0.0553 0.1289 -0.4291  0.6688
## xmean   0.3597 0.5119  0.7028  0.4839
## 
## sigma^2 estimated as 0.9121771 on 97 degrees of freedom 
##  
## AIC = 2.836807  AICc = 2.839307  BIC = 2.941014 
## 

ARMA_11_fit$ttable
##       Estimate     SE t.value p.value
## ar1     0.8316 0.0681 12.2203  0.0000
## ma1    -0.0553 0.1289 -0.4291  0.6688
## xmean   0.3597 0.5119  0.7028  0.4839

2.3.2 Identify an ARMA model

dl_varve <- diff(log(varve))

plot(varve)

plot(dl_varve)

acf2(dl_varve)

##      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
## ACF  -0.4 -0.04 -0.06  0.01  0.00  0.04 -0.04  0.04  0.01 -0.05  0.06 -0.06
## PACF -0.4 -0.24 -0.23 -0.18 -0.15 -0.08 -0.11 -0.05 -0.01 -0.07  0.02 -0.05
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF  -0.04  0.08 -0.02  0.01  0.00  0.03 -0.05 -0.06  0.07  0.04 -0.06  0.05
## PACF -0.12 -0.03 -0.05 -0.04 -0.03  0.03 -0.03 -0.13 -0.04  0.01 -0.06  0.01
##      [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
## ACF  -0.01 -0.04  0.05 -0.05  0.03 -0.02  0.00  0.06 -0.05 -0.03  0.04 -0.05
## PACF  0.02 -0.04  0.03 -0.02  0.00 -0.03 -0.02  0.04 -0.02 -0.02  0.01 -0.07
  • In this case, the ACF cuts off at lag 1 and the PACF tails off, suggesting an MA(1) model

2.4 Model choice and residual analysis

2.4.1 AIC & BIC

  • AIC and BIC measure the error and penalize (differently) for adding parameters
  • BIC has a bigger penalty and tends to choose a model with fewer parameters
  • The goal is to find the model with the smallest AIC or BIC

2.4.2 Residual analysis

To make sure the residuals are white Gaussian noise

AR_1 <- arima.sim(model = list(order = c(1, 0, 0), ar = 0.9), n = 100)
sarima(AR_1, 1, 0, 0)
## initial  value 0.501392 
## iter   2 value -0.036888
## iter   3 value -0.036992
## iter   4 value -0.037045
## iter   5 value -0.037079
## iter   6 value -0.037126
## iter   7 value -0.037134
## iter   8 value -0.037137
## iter   9 value -0.037139
## iter  10 value -0.037142
## iter  11 value -0.037142
## iter  12 value -0.037142
## iter  13 value -0.037142
## iter  13 value -0.037142
## iter  13 value -0.037142
## final  value -0.037142 
## converged
## initial  value -0.030549 
## iter   2 value -0.030673
## iter   3 value -0.031082
## iter   4 value -0.031090
## iter   5 value -0.031095
## iter   6 value -0.031095
## iter   7 value -0.031095
## iter   8 value -0.031095
## iter   9 value -0.031095
## iter  10 value -0.031095
## iter  10 value -0.031095
## iter  10 value -0.031095
## final  value -0.031095 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##       Estimate     SE t.value p.value
## ar1     0.8085 0.0573 14.1069   0.000
## xmean  -1.6484 0.4841 -3.4053   0.001
## 
## sigma^2 estimated as 0.9297902 on 98 degrees of freedom 
##  
## AIC = 2.835687  AICc = 2.836924  BIC = 2.913842 
## 

sarima() includes residual analysis graphic showing:

  1. Standardized residuals
  • Should be inspected for patterns
  1. Sample ACF of residuals
  • 95% of the ACF values should be between the blue dashed lines
  1. Normal Q-Q plot
  • If the residuals are normal, the points will line up with the line
  1. Q-statistic p-values
  • As long as most points are above the blue dashed line, then you can safely assume the noise is white

2.4.3 Exercise

oil_returns <- diff(log(oil))

plot(oil_returns)

acf2(oil_returns)

##      [,1]  [,2] [,3]  [,4] [,5]  [,6]  [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF  0.13 -0.07 0.13 -0.01 0.02 -0.03 -0.03 0.13 0.08  0.02  0.01     0 -0.02
## PACF 0.13 -0.09 0.16 -0.06 0.05 -0.08  0.00 0.12 0.05  0.03 -0.02     0 -0.03
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF   0.06 -0.05 -0.09  0.03  0.05 -0.05 -0.07  0.04  0.09 -0.05 -0.08 -0.07
## PACF  0.09 -0.07 -0.06  0.01  0.04 -0.05 -0.05  0.05  0.06 -0.06 -0.05 -0.08
##      [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF   0.00 -0.11 -0.07  0.02 -0.02 -0.03 -0.05 -0.03  0.00 -0.09 -0.01 -0.04
## PACF  0.02 -0.11  0.01  0.00 -0.01 -0.05 -0.04  0.02  0.02 -0.08  0.02 -0.04
##      [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49]
## ACF  -0.01  0.02 -0.01 -0.06  0.01  0.00 -0.01  0.04  0.01  0.05  0.07 -0.01
## PACF  0.04 -0.01 -0.01 -0.05  0.03 -0.03  0.00  0.08  0.00  0.05  0.01  0.04
##      [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60] [,61]
## ACF  -0.03  0.01 -0.04 -0.04 -0.03     0 -0.01 -0.10 -0.01 -0.05 -0.04 -0.03
## PACF -0.08  0.01 -0.07  0.00 -0.06     0 -0.06 -0.11  0.01 -0.09 -0.01 -0.04
##      [,62] [,63] [,64] [,65] [,66] [,67] [,68] [,69] [,70] [,71] [,72] [,73]
## ACF   0.01  0.01 -0.01 -0.04  0.02     0 -0.01 -0.03 -0.02 -0.05 -0.01 -0.01
## PACF  0.04 -0.01  0.00 -0.04  0.03     0  0.00 -0.04 -0.02 -0.04  0.00 -0.01
##      [,74] [,75] [,76] [,77] [,78] [,79] [,80] [,81] [,82] [,83] [,84] [,85]
## ACF  -0.02  0.01  0.02  0.04 -0.01  0.03  0.02 -0.04 -0.01  0.02  0.03  0.01
## PACF  0.00  0.02 -0.01  0.04 -0.02  0.08 -0.03 -0.03 -0.03  0.03 -0.03 -0.02
##      [,86] [,87] [,88] [,89] [,90] [,91] [,92] [,93] [,94] [,95] [,96] [,97]
## ACF   0.03  0.08 -0.04 -0.02  0.01 -0.04  0.05  0.07 -0.04  0.02  0.05  0.01
## PACF  0.03  0.04 -0.09 -0.01 -0.02 -0.03  0.03  0.05 -0.11  0.02 -0.01  0.02
##      [,98] [,99] [,100] [,101] [,102] [,103] [,104] [,105] [,106] [,107] [,108]
## ACF   0.00  0.01   0.04   0.01  -0.03  -0.04  -0.01   0.02   0.01   0.01   0.06
## PACF -0.03  0.06   0.01  -0.05   0.02  -0.03   0.01   0.00   0.04  -0.01   0.07
##      [,109] [,110] [,111] [,112] [,113] [,114] [,115] [,116] [,117] [,118]
## ACF    0.08   0.04   0.02   0.01   0.03   0.02  -0.02  -0.04  -0.01   0.04
## PACF   0.04   0.04   0.00   0.05  -0.01   0.00  -0.04  -0.03  -0.03   0.02
##      [,119] [,120] [,121] [,122] [,123] [,124] [,125] [,126] [,127] [,128]
## ACF    0.05  -0.02  -0.02   0.03   0.01  -0.04  -0.08   0.02   0.00  -0.04
## PACF   0.04  -0.01  -0.04  -0.01   0.03  -0.03  -0.07   0.00  -0.02  -0.04
##      [,129] [,130] [,131] [,132] [,133] [,134] [,135] [,136] [,137] [,138]
## ACF    0.01   0.02   0.01   0.02   0.00  -0.01   0.00  -0.03  -0.06   0.01
## PACF   0.01   0.01  -0.01   0.02   0.05   0.02   0.01   0.02  -0.02   0.04
##      [,139] [,140] [,141] [,142] [,143] [,144] [,145] [,146] [,147] [,148]
## ACF   -0.02  -0.02   0.02  -0.01  -0.03   0.00   0.00  -0.04  -0.01  -0.02
## PACF   0.01  -0.03  -0.02   0.02  -0.01   0.02  -0.01   0.02  -0.02  -0.03
##      [,149] [,150] [,151] [,152] [,153] [,154] [,155] [,156] [,157] [,158]
## ACF   -0.04  -0.04   0.01   0.01   0.04   0.03   0.01   0.05   0.01  -0.06
## PACF  -0.02  -0.01   0.02  -0.01   0.04   0.03  -0.04   0.03   0.00  -0.05
##      [,159] [,160] [,161] [,162] [,163] [,164] [,165] [,166] [,167] [,168]
## ACF    0.02   0.05  -0.02   0.05   0.00  -0.01      0  -0.01  -0.02  -0.01
## PACF   0.02   0.03   0.00   0.03   0.01   0.00      0   0.02   0.01  -0.01
##      [,169] [,170] [,171] [,172] [,173] [,174] [,175] [,176] [,177] [,178]
## ACF    0.00  -0.03  -0.01  -0.02  -0.02   0.04  -0.01  -0.03   0.02   0.01
## PACF   0.03  -0.03   0.00  -0.04   0.00   0.02  -0.03  -0.01   0.01   0.02
##      [,179] [,180] [,181] [,182] [,183] [,184] [,185] [,186] [,187] [,188]
## ACF   -0.01  -0.01  -0.04   0.07  -0.01  -0.04   0.05  -0.02  -0.01   0.01
## PACF  -0.01   0.00  -0.04   0.08  -0.05   0.02  -0.01  -0.02   0.03   0.00
##      [,189] [,190] [,191] [,192] [,193] [,194] [,195] [,196] [,197] [,198]
## ACF   -0.05  -0.04  -0.01   0.01   0.04  -0.01   0.00   0.06  -0.06  -0.02
## PACF  -0.03  -0.04   0.01  -0.01   0.07  -0.01   0.02  -0.01  -0.03   0.01
##      [,199] [,200] [,201] [,202] [,203] [,204] [,205] [,206] [,207] [,208]
## ACF    0.02      0   0.00   0.00  -0.04   0.00   0.04   0.04   0.04   0.01
## PACF   0.00      0  -0.02  -0.01  -0.05  -0.01   0.02   0.04   0.05  -0.01
sarima(oil_returns, 1, 0, 1)
## initial  value -3.057594 
## iter   2 value -3.061420
## iter   3 value -3.067360
## iter   4 value -3.067479
## iter   5 value -3.071834
## iter   6 value -3.074359
## iter   7 value -3.074843
## iter   8 value -3.076656
## iter   9 value -3.080467
## iter  10 value -3.081546
## iter  11 value -3.081603
## iter  12 value -3.081615
## iter  13 value -3.081642
## iter  14 value -3.081643
## iter  14 value -3.081643
## iter  14 value -3.081643
## final  value -3.081643 
## converged
## initial  value -3.082345 
## iter   2 value -3.082345
## iter   3 value -3.082346
## iter   4 value -3.082346
## iter   5 value -3.082346
## iter   5 value -3.082346
## iter   5 value -3.082346
## final  value -3.082346 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##       Estimate     SE t.value p.value
## ar1    -0.5264 0.0871 -6.0422  0.0000
## ma1     0.7146 0.0683 10.4699  0.0000
## xmean   0.0018 0.0022  0.7981  0.4252
## 
## sigma^2 estimated as 0.002101997 on 541 degrees of freedom 
##  
## AIC = -3.312109  AICc = -3.312027  BIC = -3.280499 
## 


3 ARIMA Models

3.1 ARIMA - integrated ARMA

  • p(자기회귀 계수, AR): 과거의 값이 현재 값에 영향을 주는 정도
  • d(차분 차수, I): 데이터를 몇 번 차분해야 정상성을 갖게 되는지
  • q(이동평균 차수, MA): 과거의 오차(노이즈)가 현재 값에 영향을 주는 정도

3.1.1 Identifying ARIMA

  • A time series exhibits ARIMA behavior if the differenced data has ARMA behavior
  • arima.sim(model = list(order = c(1, 1, 0), ar = 0.9), n = 100)
  • d = 1 means that if we difference once, we will get stationary
    • 즉, 원래 데이터는 non-stationary를 보이지만, 한 번 차분하면 AR(1) 모델이 됨
x <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.9), n = 100)
plot(x, main = "ARIMA(p = 1, d = 1, q = 0) -> non stationary")

plot(diff(x), main = "ARMA(p = 1, d = 0, q = 0) -> stationary")


3.1.2 ACF and PCF of an Integrated ARMA

acf2(x)

##      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12]
## ACF  0.98  0.95  0.92  0.90  0.87  0.84  0.81  0.79  0.76  0.73  0.70  0.68
## PACF 0.98 -0.05 -0.05 -0.03 -0.02 -0.01 -0.01 -0.01 -0.01 -0.01 -0.01 -0.02
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
## ACF   0.65  0.62  0.59  0.56  0.53  0.50  0.47  0.44  0.42
## PACF -0.02 -0.03 -0.02 -0.03 -0.03 -0.02 -0.01 -0.02 -0.03
PACF의 1 lag가 거의 "1"이다.


3.1.3 ACF and PCF of a Differenced ARIMA

acf2(diff(x))

##      [,1]  [,2]  [,3]  [,4] [,5]  [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF  0.86  0.73  0.60  0.48 0.42  0.32 0.26 0.21 0.18  0.16  0.11  0.08  0.07
## PACF 0.86 -0.01 -0.08 -0.04 0.14 -0.18 0.04 0.06 0.01 -0.02 -0.07  0.02  0.06
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## ACF   0.06  0.06  0.04  0.02  0.00  0.00 -0.01
## PACF  0.00 -0.02 -0.01 -0.04 -0.04  0.09 -0.04
  • 전형적인 AR(1)의 ACF, PACF이다.


3.1.4 Exercise Ⅰ Simulated ARIMA

\(Y_t = 1 + 1.5Y_{t-1} - 0.75Y_{t-2} + W_t\)

where \(Y_t = \nabla X_t = X_t - X_{t-1}\)

x <- arima.sim(model = list(order = c(2, 1, 0), ar = c(1.5, -0.75)), n = 250) + 1

plot(x, main = expression(X[t]))

plot(diff(x), main = expression(Y[t]))

sarima(x, 2, 1, 0)
## initial  value 1.049537 
## iter   2 value 0.909539
## iter   3 value 0.514055
## iter   4 value 0.305928
## iter   5 value 0.285037
## iter   6 value 0.012520
## iter   7 value 0.010980
## iter   8 value 0.004197
## iter   9 value 0.004137
## iter  10 value 0.004023
## iter  11 value 0.003978
## iter  12 value 0.003976
## iter  13 value 0.003975
## iter  14 value 0.003975
## iter  14 value 0.003975
## iter  14 value 0.003975
## final  value 0.003975 
## converged
## initial  value 0.010300 
## iter   2 value 0.010291
## iter   3 value 0.010279
## iter   4 value 0.010279
## iter   5 value 0.010279
## iter   6 value 0.010279
## iter   7 value 0.010279
## iter   7 value 0.010279
## iter   7 value 0.010279
## final  value 0.010279 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##          Estimate     SE  t.value p.value
## ar1        1.4832 0.0411  36.1235  0.0000
## ar2       -0.7576 0.0410 -18.4906  0.0000
## constant   0.1984 0.2316   0.8565  0.3926
## 
## sigma^2 estimated as 1.00879 on 247 degrees of freedom 
##  
## AIC = 2.890435  AICc = 2.890825  BIC = 2.946779 
## 


3.1.5 Exercise Ⅱ Global warming

  • astsa::gtemp_land dataset
astsa::gtemp_land
## Time Series:
## Start = 1850 
## End = 2023 
## Frequency = 1 
##   [1] -0.50 -0.60 -0.50 -0.50 -0.20 -0.50 -0.80 -0.40 -0.40 -0.10 -0.70 -0.20
##  [13] -0.50 -0.20 -0.30 -0.60 -0.40 -0.60 -0.30 -0.50 -0.20 -0.10 -0.40 -0.30
##  [25] -0.40 -0.50 -0.20 -0.10  0.26 -0.20 -0.50 -0.40 -0.20 -0.40 -0.70 -0.40
##  [37] -0.50 -0.30 -0.70  0.00 -0.20 -0.30 -0.50 -0.40  0.10 -0.50 -0.50 -0.70
##  [49] -1.10 -0.50  0.00  0.21 -0.40  0.05 -0.60 -0.70 -0.30 -0.30 -0.80 -0.70
##  [61] -0.50 -0.80 -0.70 -0.40 -0.10 -0.20 -0.50 -1.00 -0.30 -0.50  0.05 -0.10
##  [73]  0.00 -0.30  0.01 -0.20  0.53 -0.60 -0.20 -0.30  0.21 -0.10 -0.10 -0.30
##  [85] -0.30  0.13 -0.30 -0.40  0.38 -0.20  0.06 -0.10 -0.10  0.00  0.36  0.00
##  [97]  0.00  0.32 -0.30  0.24  0.00 -0.20 -0.30  0.33 -0.30 -0.60 -0.20 -0.30
## [109]  0.03  0.49 -0.90  0.08  0.36 -0.40 -0.30 -0.10 -0.10  0.19  0.80 -0.20
## [121]  0.00  0.00  0.13  0.61  0.18  0.32 -0.20  0.29  0.37  0.10  0.19  0.85
## [133]  0.00  0.64  0.36  0.42  0.67 -0.20  0.51  0.66  1.61  0.45  0.78  0.64
## [145]  0.41  0.75  0.44  0.81  0.94  0.36  1.14  0.85  1.58  0.84  1.02  1.36
## [157]  1.20  1.27  1.65  0.83  1.54  1.27  0.93  1.10  1.52  1.76  2.50  2.15
## [169]  1.63  2.14  1.95  1.58  2.13  2.26
plot(gtemp_land)

plot(diff(gtemp_land))

acf2(diff(gtemp_land))

##       [,1]  [,2] [,3]  [,4]  [,5]  [,6]  [,7]  [,8] [,9] [,10] [,11] [,12]
## ACF  -0.51  0.04  0.0 -0.05  0.00  0.09 -0.06 -0.06 0.13 -0.05 -0.13  0.24
## PACF -0.51 -0.30 -0.2 -0.22 -0.22 -0.06 -0.04 -0.15 0.01  0.07 -0.16  0.10
##      [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF  -0.15  0.01  0.07 -0.05 -0.02  0.03  0.05 -0.08  0.03     0 -0.08  0.14
## PACF  0.07  0.00  0.06  0.09  0.05 -0.03  0.10  0.09 -0.02     0 -0.04  0.00


In this case, you can say that:

  1. The ACF and the PACF are both tailing off, implying an ARIMA(1,1,1) model.
  2. The ACF cuts off at lag 1, and the PACF is tailing off, implying an ARIMA(0,1,1) model.

3.1.6 Which one is better model???

sarima(gtemp_land, 1, 1, 1)
## initial  value -0.839760 
## iter   2 value -1.032017
## iter   3 value -1.070147
## iter   4 value -1.084198
## iter   5 value -1.096002
## iter   6 value -1.103816
## iter   7 value -1.103823
## iter   8 value -1.105269
## iter   9 value -1.105358
## iter  10 value -1.105364
## iter  11 value -1.105385
## iter  12 value -1.105387
## iter  13 value -1.105387
## iter  13 value -1.105387
## iter  13 value -1.105387
## final  value -1.105387 
## converged
## initial  value -1.105486 
## iter   2 value -1.105497
## iter   3 value -1.105511
## iter   4 value -1.105513
## iter   5 value -1.105513
## iter   5 value -1.105513
## iter   5 value -1.105513
## final  value -1.105513 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##          Estimate     SE  t.value p.value
## ar1       -0.0519 0.0890  -0.5839  0.5601
## ma1       -0.7829 0.0495 -15.8110  0.0000
## constant   0.0146 0.0053   2.7568  0.0065
## 
## sigma^2 estimated as 0.1089368 on 170 degrees of freedom 
##  
## AIC = 0.6730943  AICc = 0.6739152  BIC = 0.7460028 
## 

sarima(gtemp_land, 0, 1, 1)
## initial  value -0.842449 
## iter   2 value -1.043995
## iter   3 value -1.086437
## iter   4 value -1.105162
## iter   5 value -1.106533
## iter   6 value -1.107112
## iter   7 value -1.107288
## iter   8 value -1.107352
## iter   9 value -1.107368
## iter  10 value -1.107368
## iter  10 value -1.107368
## iter  10 value -1.107368
## final  value -1.107368 
## converged
## initial  value -1.104513 
## iter   2 value -1.104517
## iter   3 value -1.104520
## iter   4 value -1.104533
## iter   4 value -1.104533
## iter   4 value -1.104533
## final  value -1.104533 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##          Estimate     SE  t.value p.value
## ma1       -0.7974 0.0395 -20.1909  0.0000
## constant   0.0145 0.0052   2.7905  0.0059
## 
## sigma^2 estimated as 0.1091637 on 171 degrees of freedom 
##  
## AIC = 0.663493  AICc = 0.663901  BIC = 0.7181744 
## 

ARIMA(0, 1, 1)의 AIC, BIC가 모두 ARIMA(1, 1, 1)보다 작으므로 better model !!!

3.2 ARIMA diagnostics

  1. ACF를 바탕으로 적합한 모델 선정
  2. 추가 AR, MA를 추가해서 리모델 실시 - 의도적으로 overfitting
  3. 추가된 인수가 유의미한지 모델 계수가 변하는지 확인 - p-value 확인
  4. 만약 유의미한 변화가 관측된다면 기존 모델 재검토

3.3 Forcasting ARIMA

  • The model describes how the dynamics of the time series behave over time
  • Forcasting simply continues the model dynamics into the future
  • Use astsa::sarima.for()
oil <- window(astsa::oil, end = 2006)
oilf <- window(astsa::oil, end = 2007)

sarima.for(oil, n.ahead = 52, 1, 1, 1)
## $pred
## Time Series:
## Start = c(2006, 2) 
## End = c(2007, 1) 
## Frequency = 52 
##  [1] 60.71882 60.43909 60.74214 60.75455 60.91190 60.99696 61.11808 61.22122
##  [9] 61.33332 61.44095 61.55081 61.65956 61.76887 61.87790 61.98706 62.09616
## [17] 62.20529 62.31441 62.42353 62.53265 62.64177 62.75089 62.86001 62.96913
## [25] 63.07825 63.18737 63.29649 63.40561 63.51473 63.62385 63.73297 63.84209
## [33] 63.95121 64.06033 64.16945 64.27857 64.38769 64.49681 64.60593 64.71505
## [41] 64.82417 64.93329 65.04241 65.15153 65.26065 65.36977 65.47889 65.58801
## [49] 65.69713 65.80625 65.91537 66.02449
## 
## $se
## Time Series:
## Start = c(2006, 2) 
## End = c(2007, 1) 
## Frequency = 52 
##  [1]  1.430557  2.270997  2.776644  3.245575  3.636008  3.996918  4.323903
##  [8]  4.629671  4.915600  5.186193  5.443159  5.688620  5.923876  6.150160
## [15]  6.368398  6.579407  6.783853  6.982317  7.175292  7.363212  7.546454
## [22]  7.725351  7.900198  8.071258  8.238767  8.402937  8.563961  8.722013
## [29]  8.877251  9.029820  9.179855  9.327476  9.472797  9.615922  9.756948
## [36]  9.895965 10.033055 10.168297 10.301764 10.433524 10.563640 10.692173
## [43] 10.819180 10.944712 11.068821 11.191554 11.312955 11.433067 11.551931
## [50] 11.669584 11.786062 11.901401
lines(oilf)

  • dark grey swatch: \(\pm 1 sd\)
  • light grey swatch: \(\pm 2 sd\)(95% cf.level)

3.3.1 Forecasting global temperatures

# Fit an ARIMA(0,1,2) to globtemp and check the fit
sarima(gtemp_land, 0, 1, 1)
## initial  value -0.842449 
## iter   2 value -1.043995
## iter   3 value -1.086437
## iter   4 value -1.105162
## iter   5 value -1.106533
## iter   6 value -1.107112
## iter   7 value -1.107288
## iter   8 value -1.107352
## iter   9 value -1.107368
## iter  10 value -1.107368
## iter  10 value -1.107368
## iter  10 value -1.107368
## final  value -1.107368 
## converged
## initial  value -1.104513 
## iter   2 value -1.104517
## iter   3 value -1.104520
## iter   4 value -1.104533
## iter   4 value -1.104533
## iter   4 value -1.104533
## final  value -1.104533 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##          Estimate     SE  t.value p.value
## ma1       -0.7974 0.0395 -20.1909  0.0000
## constant   0.0145 0.0052   2.7905  0.0059
## 
## sigma^2 estimated as 0.1091637 on 171 degrees of freedom 
##  
## AIC = 0.663493  AICc = 0.663901  BIC = 0.7181744 
## 

# Forecast data 35 years into the future
sarima.for(gtemp_land, 35, 0, 1, 1)

## $pred
## Time Series:
## Start = 2024 
## End = 2058 
## Frequency = 1 
##  [1] 1.993396 2.007945 2.022493 2.037041 2.051590 2.066138 2.080686 2.095235
##  [9] 2.109783 2.124331 2.138880 2.153428 2.167976 2.182525 2.197073 2.211622
## [17] 2.226170 2.240718 2.255267 2.269815 2.284363 2.298912 2.313460 2.328008
## [25] 2.342557 2.357105 2.371654 2.386202 2.400750 2.415299 2.429847 2.444395
## [33] 2.458944 2.473492 2.488040
## 
## $se
## Time Series:
## Start = 2024 
## End = 2058 
## Frequency = 1 
##  [1] 0.3303993 0.3371092 0.3436881 0.3501434 0.3564818 0.3627094 0.3688320
##  [8] 0.3748545 0.3807818 0.3866183 0.3923679 0.3980345 0.4036215 0.4091323
## [15] 0.4145698 0.4199369 0.4252362 0.4304704 0.4356416 0.4407522 0.4458042
## [22] 0.4507996 0.4557402 0.4606279 0.4654642 0.4702508 0.4749891 0.4796807
## [29] 0.4843268 0.4889288 0.4934878 0.4980051 0.5024818 0.5069190 0.5113177

4 Seasonal ARIMA

4.1 Pure seasonal models

  • Often collect data with a known seasonal component
  • Air Passengers(1 cycle every S = 12 months)
  • Johnson & Johnson Earnings(1 cycle every S = 4 quarters)

Consider pure seasonal models such as an SAR\((P = 1)_{s=12}\)

\(X_t = \phi X_{t-12} + W_t\)

4.1.1 Fit a pure seasonal model

SARMA(P=1, Q=1)S=12

\(X_t = 0.9X_{t-12} + W_t + 0.5W_{t-12}\)

# create simulation data
s_12 <- arima.sim(n = 252, model = list(order = c(0, 0, 0),
                                     seasonal = list(order = c(1, 0, 1),
                                                     ar = 0.9,
                                                     ma = 0.5,
                                                     period = 12)))
s_12 <- ts(s_12, frequency = 12)
# Plot sample P/ACF to lag 60 and compare to the true values
acf2(s_12, max.lag = 60)

##      [,1] [,2]  [,3]  [,4] [,5]  [,6] [,7]  [,8]  [,9] [,10] [,11] [,12] [,13]
## ACF  0.09 0.02 -0.04 -0.02 0.06 -0.03 0.08 -0.07 -0.06 -0.03  0.10  0.05 -0.10
## PACF 0.09 0.01 -0.05 -0.01 0.06 -0.05 0.08 -0.08 -0.06 -0.01  0.11  0.01 -0.11
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF  -0.08 -0.09 -0.04 -0.03  0.04  0.05 -0.02  -0.1 -0.02 -0.05 -0.01  -0.1
## PACF -0.06 -0.06 -0.04 -0.03  0.04  0.05  0.00  -0.1 -0.02 -0.07  0.00  -0.1
##      [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF  -0.03 -0.07  0.03 -0.07 -0.02 -0.02 -0.01 -0.03  0.06  0.03  0.05  0.14
## PACF -0.01 -0.06  0.05 -0.13 -0.04 -0.05  0.04 -0.05  0.08 -0.02  0.06  0.12
##      [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49]
## ACF   0.12  0.05  0.02  0.02  0.11 -0.03 -0.02  0.03  0.05  0.01  0.06  0.03
## PACF  0.08 -0.01  0.04  0.03  0.10 -0.06 -0.04  0.05  0.05 -0.01  0.04  0.03
##      [,50] [,51] [,52] [,53] [,54] [,55] [,56] [,57] [,58] [,59] [,60]
## ACF   0.05 -0.07 -0.05 -0.03 -0.06  0.09 -0.03  0.05 -0.07 -0.08  0.05
## PACF  0.08 -0.04 -0.05 -0.03 -0.05  0.15 -0.10  0.07 -0.07  0.00  0.03
sarima(s_12, 0, 0, 0, 1, 0, 1, 12)
## initial  value 0.106191 
## iter   2 value 0.105910
## iter   3 value 0.105146
## iter   4 value 0.104987
## iter   5 value 0.103842
## iter   6 value 0.103069
## iter   7 value 0.102479
## iter   8 value 0.102385
## iter   9 value 0.102365
## iter  10 value 0.102352
## iter  11 value 0.102351
## iter  12 value 0.102351
## iter  13 value 0.102351
## iter  14 value 0.102350
## iter  15 value 0.102350
## iter  16 value 0.102350
## iter  17 value 0.102350
## iter  17 value 0.102350
## iter  17 value 0.102350
## final  value 0.102350 
## converged
## initial  value 0.106479 
## iter   2 value 0.106388
## iter   3 value 0.106167
## iter   4 value 0.106145
## iter   5 value 0.106132
## iter   6 value 0.106130
## iter   7 value 0.106122
## iter   8 value 0.106102
## iter   9 value 0.106093
## iter  10 value 0.106083
## iter  11 value 0.106076
## iter  12 value 0.106071
## iter  13 value 0.106056
## iter  14 value 0.106013
## iter  15 value 0.105973
## iter  16 value 0.105803
## iter  17 value 0.105489
## iter  18 value 0.105394
## iter  19 value 0.105366
## iter  20 value 0.105345
## iter  21 value 0.105295
## iter  22 value 0.105120
## iter  23 value 0.105032
## iter  24 value 0.104919
## iter  25 value 0.104888
## iter  26 value 0.104850
## iter  27 value 0.104696
## iter  28 value 0.104361
## iter  29 value 0.103898
## iter  30 value 0.103752
## iter  31 value 0.103532
## iter  32 value 0.103512
## iter  33 value 0.103407
## iter  34 value 0.103247
## iter  35 value 0.103033
## iter  36 value 0.102753
## iter  37 value 0.102706
## iter  38 value 0.102658
## iter  39 value 0.102608
## iter  40 value 0.102602
## iter  41 value 0.102595
## iter  42 value 0.102579
## iter  43 value 0.102550
## iter  44 value 0.102534
## iter  45 value 0.102526
## iter  46 value 0.102515
## iter  47 value 0.102511
## iter  48 value 0.102506
## iter  49 value 0.102497
## iter  50 value 0.102480
## iter  51 value 0.102472
## iter  52 value 0.102466
## iter  53 value 0.102457
## iter  54 value 0.102456
## iter  55 value 0.102453
## iter  56 value 0.102449
## iter  57 value 0.102439
## iter  58 value 0.102435
## iter  59 value 0.102432
## iter  60 value 0.102429
## iter  61 value 0.102428
## iter  62 value 0.102427
## iter  63 value 0.102425
## iter  64 value 0.102420
## iter  65 value 0.102418
## iter  66 value 0.102417
## iter  67 value 0.102416
## iter  68 value 0.102415
## iter  69 value 0.102415
## iter  70 value 0.102414
## iter  71 value 0.102411
## iter  72 value 0.102410
## iter  73 value 0.102410
## iter  74 value 0.102409
## iter  75 value 0.102409
## iter  76 value 0.102409
## iter  77 value 0.102408
## iter  78 value 0.102407
## iter  79 value 0.102407
## iter  80 value 0.102407
## iter  81 value 0.102406
## iter  82 value 0.102406
## iter  83 value 0.102406
## iter  84 value 0.102406
## iter  85 value 0.102406
## iter  86 value 0.102406
## iter  87 value 0.102405
## iter  88 value 0.102405
## iter  89 value 0.102405
## iter  90 value 0.102405
## iter  91 value 0.102405
## iter  92 value 0.102405
## iter  93 value 0.102405
## iter  94 value 0.102405
## iter  95 value 0.102405
## iter  96 value 0.102405
## iter  97 value 0.102405
## iter  98 value 0.102405
## iter  99 value 0.102405
## iter 100 value 0.102405
## final  value 0.102405 
## stopped after 100 iterations
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##       Estimate     SE t.value p.value
## sar1    0.9300 0.1105  8.4195  0.0000
## sma1   -0.8877 0.1350 -6.5743  0.0000
## xmean  -0.0305 0.0919 -0.3321  0.7401
## 
## sigma^2 estimated as 1.22378 on 249 degrees of freedom 
##  
## AIC = 3.074433  AICc = 3.074817  BIC = 3.130456 
## 


4.1.2 Mixed seasonal models

  • Mixed model: SARIMA\((p,d,q) \times (P,D,Q)_s\) model
  • Consider a SARIMA\((0,0,1) \times (1,0,0)_{12}\) model

\(X_t = \phi X_{t-12} + W_t + \theta W_{t-1}\)

  • SAR(1): Value this month is related to last year’s value \(X_{t-12}\)
  • MA(1): This month’s value related to last month’s shock \(W_{t-1}\)

4.2 ACF and PACF of SARIMA(0,0,1) x (1,0,0)s=12

  • The ACF and PACF for this mixed model:

\(X_t = 0.8 X_{t-12} + W_t - 0.5 W_{t-1}\)


  • ACF가 12 lag 마다 tail off하고 PACF가 12 lag에서 cut off 된 형태 => SAR(12) 필요



  • ACF가 12 lag 이전 1 lag에서 cut off 하고 PACF가 12 lag 이전에서 tail off 된 형태 => MA(1) 필요


4.2.1 Seasonal Persistence

4.2.2 Exercise: Air Passengers

STEP 1: Plotting

x <- AirPassengers
x_list <- list(x, log(x), diff(log(x)), diff(diff(log(x), 12)))
map(x_list, plot)


STEP 2: ACF and PACF of ddlx

ddlx <- diff(diff(log(x), 12))
acf2(ddlx)

  • Seansonal: ACF cutting off at lag 1s (s = 12); PACF tailing off at lags 1s, 2s, 3s … => SMA(1)
  • Non-Seasonal: ACF and PACF both tailing off => ARMA(1, 1)
  • AS RESULT, SARIMA(1,1,1)(0,1,1)12 model

STEP 3: Fitting

airpass_fit1 <- sarima(log(AirPassengers), p = 1, d = 1, q = 1,
                       P = 0, D = 1, Q = 1, S = 12)
## initial  value -3.085211 
## iter   2 value -3.225399
## iter   3 value -3.276697
## iter   4 value -3.276902
## iter   5 value -3.282134
## iter   6 value -3.282524
## iter   7 value -3.282990
## iter   8 value -3.286319
## iter   9 value -3.286413
## iter  10 value -3.288141
## iter  11 value -3.288262
## iter  12 value -3.288394
## iter  13 value -3.288768
## iter  14 value -3.288969
## iter  15 value -3.289089
## iter  16 value -3.289094
## iter  17 value -3.289100
## iter  17 value -3.289100
## iter  17 value -3.289100
## final  value -3.289100 
## converged
## initial  value -3.288388 
## iter   2 value -3.288459
## iter   3 value -3.288530
## iter   4 value -3.288649
## iter   5 value -3.288753
## iter   6 value -3.288781
## iter   7 value -3.288784
## iter   7 value -3.288784
## iter   7 value -3.288784
## final  value -3.288784 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##      Estimate     SE t.value p.value
## ar1    0.1960 0.2475  0.7921  0.4298
## ma1   -0.5784 0.2132 -2.7127  0.0076
## sma1  -0.5643 0.0747 -7.5544  0.0000
## 
## sigma^2 estimated as 0.001341097 on 128 degrees of freedom 
##  
## AIC = -3.678622  AICc = -3.677179  BIC = -3.59083 
## 

airpass_fit1$ttable
##      Estimate     SE t.value p.value
## ar1    0.1960 0.2475  0.7921  0.4298
## ma1   -0.5784 0.2132 -2.7127  0.0076
## sma1  -0.5643 0.0747 -7.5544  0.0000

Fitting 이후, ar1p.value0.4298로 유의하지 않음 => SARIMA(0,1,1)(0,1,1)12 으로 re-Fitting


STEP 4: re-Fitting

airpass_fit2 <- sarima(log(AirPassengers), p = 0, d = 1, q = 1,
                       P = 0, D = 1, Q = 1, S = 12)
## initial  value -3.086228 
## iter   2 value -3.267980
## iter   3 value -3.279950
## iter   4 value -3.285996
## iter   5 value -3.289332
## iter   6 value -3.289665
## iter   7 value -3.289672
## iter   8 value -3.289676
## iter   8 value -3.289676
## iter   8 value -3.289676
## final  value -3.289676 
## converged
## initial  value -3.286464 
## iter   2 value -3.286855
## iter   3 value -3.286872
## iter   4 value -3.286874
## iter   4 value -3.286874
## iter   4 value -3.286874
## final  value -3.286874 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##      Estimate     SE t.value p.value
## ma1   -0.4018 0.0896 -4.4825       0
## sma1  -0.5569 0.0731 -7.6190       0
## 
## sigma^2 estimated as 0.001348035 on 129 degrees of freedom 
##  
## AIC = -3.690069  AICc = -3.689354  BIC = -3.624225 
## 

airpass_fit2$ttable
##      Estimate     SE t.value p.value
## ma1   -0.4018 0.0896 -4.4825       0
## sma1  -0.5569 0.0731 -7.6190       0

ma1, sma1 계수 모두 유의한 수준을 보이고 있기에 better model


4.2.3 Exercise: Unemployment

STEP 1: Plotting

x <- unemp
x_list <- list(x, diff(x), diff(diff(x, 12)))
map(x_list, plot)


STEP 2: ACF and PACF of ddlx

ddlx <- diff(diff(x), 12)
acf2(ddlx, max.lag = 60)

  • Seansonal: ACF cutting off at lag 1s (s = 12); PACF tailing off at lags 1s, 2s, 3s … => SMA(1)
  • Non-Seasonal: ACF tailing off; PACF cutting off at lags 2 => AR(2)
  • AS RESULT, SARIMA(2,1,0)(0,1,1)12 model

STEP 3: Fitting

unemp_fit <- sarima(unemp, p = 2, d = 1, q = 0,
                       P = 0, D = 1, Q = 1, S = 12)
## initial  value 3.340809 
## iter   2 value 3.105512
## iter   3 value 3.086631
## iter   4 value 3.079778
## iter   5 value 3.069447
## iter   6 value 3.067659
## iter   7 value 3.067426
## iter   8 value 3.067418
## iter   8 value 3.067418
## final  value 3.067418 
## converged
## initial  value 3.065481 
## iter   2 value 3.065478
## iter   3 value 3.065477
## iter   3 value 3.065477
## iter   3 value 3.065477
## final  value 3.065477 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##      Estimate     SE  t.value p.value
## ar1    0.1351 0.0513   2.6326  0.0088
## ar2    0.2464 0.0515   4.7795  0.0000
## sma1  -0.6953 0.0381 -18.2362  0.0000
## 
## sigma^2 estimated as 449.637 on 356 degrees of freedom 
##  
## AIC = 8.991114  AICc = 8.991303  BIC = 9.034383 
## 

unemp_fit$ttable
##      Estimate     SE  t.value p.value
## ar1    0.1351 0.0513   2.6326  0.0088
## ar2    0.2464 0.0515   4.7795  0.0000
## sma1  -0.6953 0.0381 -18.2362  0.0000

Fitting 이후, ar1, ar2, sma1p.value가 모두 유의하므로 good model


4.2.4 Exercise: Birth rate

STEP 1: Plotting

x <- birth
x_list <- list(x, diff(x), diff(diff(x, 12)))
map(x_list, plot)


STEP 2: ACF and PACF of ddlx

ddlx <- diff(diff(x), 12)
acf2(ddlx, max.lag = 60)

  • Seansonal: ACF cutting off at lag 1s (s = 12); PACF tailing off at lags 1s, 2s, 3s … => SMA(1)
  • Non-Seasonal: ACF cutting off at lag 1; PACF tailing off => MA(1)
  • AS RESULT, SARIMA(0,1,1)(0,1,1)12 model

STEP 3: Fitting

birth_fit <- sarima(birth, p = 0, d = 1, q = 1,
                       P = 0, D = 1, Q = 1, S = 12)
## initial  value 2.219164 
## iter   2 value 2.013310
## iter   3 value 1.988107
## iter   4 value 1.980026
## iter   5 value 1.967594
## iter   6 value 1.965384
## iter   7 value 1.965049
## iter   8 value 1.964993
## iter   9 value 1.964992
## iter   9 value 1.964992
## iter   9 value 1.964992
## final  value 1.964992 
## converged
## initial  value 1.951264 
## iter   2 value 1.945867
## iter   3 value 1.945729
## iter   4 value 1.945723
## iter   5 value 1.945723
## iter   5 value 1.945723
## iter   5 value 1.945723
## final  value 1.945723 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##      Estimate     SE  t.value p.value
## ma1   -0.4734 0.0598  -7.9097       0
## sma1  -0.7861 0.0451 -17.4227       0
## 
## sigma^2 estimated as 47.40199 on 358 degrees of freedom 
##  
## AIC = 6.74599  AICc = 6.746084  BIC = 6.778375 
## 

birth_fit$ttable
##      Estimate     SE  t.value p.value
## ma1   -0.4734 0.0598  -7.9097       0
## sma1  -0.7861 0.0451 -17.4227       0

Fitting 이후, 계수는 모두 유의하지만 residualnot WN을 보이고 있음 => AR(1) 추가해서 re-Fitting


STEP 4: re-Fitting

birth_fit2 <- sarima(birth, p = 1, d = 1, q = 1,
                       P = 0, D = 1, Q = 1, S = 12)
## initial  value 2.218186 
## iter   2 value 2.032584
## iter   3 value 1.982464
## iter   4 value 1.975643
## iter   5 value 1.971721
## iter   6 value 1.967284
## iter   7 value 1.963840
## iter   8 value 1.961106
## iter   9 value 1.960849
## iter  10 value 1.960692
## iter  11 value 1.960683
## iter  12 value 1.960675
## iter  13 value 1.960672
## iter  13 value 1.960672
## iter  13 value 1.960672
## final  value 1.960672 
## converged
## initial  value 1.940459 
## iter   2 value 1.934425
## iter   3 value 1.932752
## iter   4 value 1.931750
## iter   5 value 1.931074
## iter   6 value 1.930882
## iter   7 value 1.930860
## iter   8 value 1.930859
## iter   8 value 1.930859
## final  value 1.930859 
## converged
## <><><><><><><><><><><><><><>
##  
## Coefficients: 
##      Estimate     SE  t.value p.value
## ar1    0.3038 0.0865   3.5104   5e-04
## ma1   -0.7006 0.0604 -11.5984   0e+00
## sma1  -0.8000 0.0441 -18.1302   0e+00
## 
## sigma^2 estimated as 45.91462 on 357 degrees of freedom 
##  
## AIC = 6.721818  AICc = 6.722006  BIC = 6.764997 
## 

birth_fit2$ttable
##      Estimate     SE  t.value p.value
## ar1    0.3038 0.0865   3.5104   5e-04
## ma1   -0.7006 0.0604 -11.5984   0e+00
## sma1  -0.8000 0.0441 -18.1302   0e+00

계수와 residual 모두 fitting 되었기에 better model


4.3 Forcasting seasonal ARIMA

4.3.1 Forecasting monthly unemployment : unemp

  • SARIMA(2,1,0)(0,1,1)12
sarima.for(unemp, n.ahead = 36, 2, 1, 0, 0, 1, 1, 12)

## $pred
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1979 676.4664 685.1172 653.2388 585.6939 553.8813 664.4072 647.0657 611.0828
## 1980 683.3045 687.7649 654.8658 586.1507 553.9285 664.1108 646.6220 610.5345
## 1981 682.6406 687.0977 654.1968 585.4806 553.2579 663.4398 645.9508 609.8632
##           Sep      Oct      Nov      Dec
## 1979 594.6414 569.3997 587.5801 581.1833
## 1980 594.0427 568.7684 586.9320 580.5249
## 1981 593.3713 568.0970 586.2606 579.8535
## 
## $se
##            Jan       Feb       Mar       Apr       May       Jun       Jul
## 1979  21.20465  32.07710  43.70167  53.66329  62.85364  71.12881  78.73590
## 1980 116.99599 124.17344 131.51281 138.60466 145.49706 152.12863 158.52302
## 1981 194.25167 201.10648 208.17066 215.11503 221.96039 228.64285 235.16874
##            Aug       Sep       Oct       Nov       Dec
## 1979  85.75096  92.28663  98.41329 104.19488 109.67935
## 1980 164.68623 170.63839 176.39520 181.97333 187.38718
## 1981 241.53258 247.74268 253.80549 259.72970 265.52323


4.3.2 How hard is it to forecast commodity prices? : chicken

  • SARIMA(2,1,0)(1,0,0)12
sarima.for(chicken, n.ahead = 60, 2, 1, 0, 1, 0, 0, 12)

## $pred
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 2016                                                                111.0907
## 2017 110.5358 110.5612 110.5480 110.7055 111.0047 111.1189 111.1552 111.1948
## 2018 111.8108 111.9782 112.1330 112.3431 112.5991 112.7952 112.9661 113.1380
## 2019 114.1331 114.3464 114.5556 114.7827 115.0247 115.2473 115.4617 115.6765
## 2020 116.7942 117.0224 117.2492 117.4819 117.7193 117.9505 118.1790 118.4077
## 2021 119.5651 119.7980 120.0306 120.2650 120.5010 120.7350 120.9681         
##           Sep      Oct      Nov      Dec
## 2016 110.8740 110.6853 110.5045 110.5527
## 2017 111.2838 111.3819 111.4825 111.6572
## 2018 113.3260 113.5168 113.7085 113.9242
## 2019 115.8965 116.1174 116.3386 116.5675
## 2020 118.6380 118.8686 119.0993 119.3326
## 2021                                    
## 
## $se
##             Jan        Feb        Mar        Apr        May        Jun
## 2016                                                                  
## 2017  3.7414959  4.1793190  4.5747009  4.9373266  5.2742129  5.5903499
## 2018  8.2010253  8.5605811  8.9054714  9.2372195  9.5572539  9.8667955
## 2019 12.0038164 12.2921541 12.5738417 12.8492868 13.1188976 13.3830477
## 2020 15.1557253 15.3959082 15.6323906 15.8653300 16.0948844 16.3212022
## 2021 17.8397890 18.0473081 18.2524651 18.4553364 18.6559977 18.8545213
##             Jul        Aug        Sep        Oct        Nov        Dec
## 2016             0.6187194  1.3368594  2.0462419  2.6867986  3.2486625
## 2017  5.8893133  6.2367345  6.6253573  7.0309771  7.4344077  7.8255932
## 2018 10.1668604 10.4736807 10.7857727 11.0980056 11.4063211 11.7085266
## 2019 13.6420693 13.9002670 14.1573839 14.4122197 14.6638269 14.9117124
## 2020 16.5444204 16.7657634 16.9852163 17.2025022 17.4174076 17.6298379
## 2021 19.0509752