ARIMA による時系列分析の流れ

イギリスの交通事故死亡者数(運転手)
データの可視化
ARIMA
- モデルの同定
- 残差チェック
モデルの当てはめ
モデルによる予測
- ガソリン価格の推定
- モデルによる予測
参考文献

イギリスの交通事故死亡者数(運転手)

Road Casualties in Great Britain 1969–84

drivers: 死傷者数(ドライバー)
PetrolPrice: ガソリンの値段
law: シートベルト義務化フラグ(1983.01.31)

DriversKilled	drivers	front	rear	kms	PetrolPrice	VanKilled
107	1687	867	269	9059	0.1029718	12
97	1508	825	265	7685	0.1023630	6
102	1507	806	319	9963	0.1020625	12
87	1385	814	407	10955	0.1008733	8
119	1632	991	454	11823	0.1010197	10
106	1511	945	427	12391	0.1005812	13
110	1559	1004	522	13460	0.1037740	11
106	1630	1091	536	14055	0.1040764	6
107	1579	958	405	12106	0.1037740	10
134	1653	850	437	11372	0.1030264	16

# 運転手の死亡および重傷者数
ts.drivers <- Seatbelts[, "drivers"]

# ガソリン価格
ts.petrol_price <- Seatbelts[, "PetrolPrice"]

# シートベルト義務付けの法律
ts.law <- Seatbelts[, "law"]

データの可視化

原系列

シートベルト義務化に関する法改正(1983〜)辺りから死亡者数の極端な低下が見られる

gridExtra::grid.arrange(
  layout_matrix = matrix(c(1, 2, 3), nrow = 3),
  ggplot2::autoplot(ts.drivers, ylab = "drivers"),
  ggplot2::autoplot(ts.petrol_price, ylab = "petrol price"),
  ggplot2::autoplot(ts.law, ylab = "law")
)

対数化された系列

乗法モデルを適用し、対数変換した系列を分析対象とする
自己相関プロットによると1年ごとの周期が現れており季節性を持ったデータであると考えられる

ts.drivers.log <- log(ts.drivers)
forecast::ggtsdisplay(ts.drivers.log)

差分系列

差分をとる事による定常過程への変換を狙う

単位根の検定(KPSS)

単位根検定により和分過程の次数を特定する

arima.param.d <- forecast::ndiffs(ts.drivers.log)

KPSS による検定により当該データは 1 次の和分過程であると判断される

差分系列の可視化

差分系列は定常過程と十分に見なせるようなデータに落ち着いている

ts.drivers.log.diff1 <- diff(ts.drivers.log)
forecast::ggtsdisplay(ts.drivers.log.diff1)

ARIMA

自己回帰和分移動平均を用いたモデルの構築

モデルの同定

ARIMA(p,d,q) の次数(p,q)を同定する

※外生変数は matrix でなければならない(Data.Frame だとエラー)事に注意

# 外生変数の定義
# 対数変換しておく
mtx.petrol_law <- matrix(
  c(
    log(ts.petrol_price),
    ts.law
  ),
  nrow = length(ts.law),
  ncol = 2,
  dimnames = list(NULL, c("petrol_price.log", "law"))
)

model.arima <- forecast::auto.arima(
  ts.drivers.log,
  xreg = mtx.petrol_law,
  d = arima.param.d,
  max.p = 3,
  max.q = 3,
  max.D = 1,
  max.P = 2,
  max.Q = 2,
  seasonal = T,
  stepwise = F,
  approximation = F,
  parallel = T,
  num.cores = 4,
  trace = T
)

## Tracing model searching in parallel is not supported.

model.arima

## Series: ts.drivers.log 
## Regression with ARIMA(0,1,3)(0,1,1)[12] errors 
## 
## Coefficients:
##           ma1     ma2      ma3     sma1  petrol_price.log      law
##       -0.7056  0.0450  -0.1735  -0.8443           -0.3190  -0.2316
## s.e.   0.0732  0.0967   0.0802   0.0777            0.0963   0.0478
## 
## sigma^2 estimated as 0.005656:  log likelihood=204.11
## AIC=-394.21   AICc=-393.56   BIC=-371.9

推定されたモデルは ARIMA(0,1,3) + Seasonal(0,1,1)

推定パラメータ

broom::tidy(model.arima) %>%

  knitr::kable(
    format = "html",
    align = "c"
  ) %>%
  kableExtra::kable_styling(full_width = F, position = "center")

term	estimate	std.error
ma1	-0.7056369	0.0732008
ma2	0.0449858	0.0966639
ma3	-0.1734711	0.0802363
sma1	-0.8443198	0.0777334
petrol_price.log	-0.3190037	0.0963187
law	-0.2315617	0.0477688

モデル評価

broom::glance(model.arima) %>%
  knitr::kable(
    format = "html",
    align = "c"
  ) %>%
  kableExtra::kable_styling(full_width = F, position = "center")

sigma	logLik	AIC	BIC
0.0752036	204.1055	-394.211	-371.8993

残差チェック

モデルの残差について自己相関および正規性の検定を行う

残差の可視化

自己相関もほぼ無く、目視でも正規乱数と十分にみなせる分布であると思われる

forecast::checkresiduals(model.arima$residuals)

## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.

残差の自己相関検定

Ljung-Box 検定

Box.test(model.arima$residuals, type = "Ljung-Box", lag = 20)

## 
##  Box-Ljung test
## 
## data:  model.arima$residuals
## X-squared = 19.903, df = 20, p-value = 0.464

帰無仮説は棄却されず、残差には自己相関があるとは言い難い

残差の正規性検定

Jarque-Bera 検定

tseries::jarque.bera.test(model.arima$residuals)

## 
##  Jarque Bera Test
## 
## data:  model.arima$residuals
## X-squared = 1.7104, df = 2, p-value = 0.4252

帰無仮説は棄却されず、残差は正規分布に従わないとは言い難い

モデルの当てはめ

既存の範囲におけるモデルの当てはまりを確認した後に将来の予測を行う

黒: 原系列
赤: モデルによる推定

それなりに精度はよさ気

#ggplot2::autoplot(model.arima)

tibble(
  t = 1:(length(ts.drivers)),
  original = ts.drivers,
  predict  = exp(model.arima$fitted)
) %>%
  ggplot(aes(x = t)) +
    geom_line(aes(y = original), alpha = 1/2) +
    geom_line(aes(y = predict), alpha = 1/2, colour = "red")

## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.

モデルによる予測

ガソリン価格の推定

予測を行うに際しガソリン価格(PetrolPrice)の値が不明なのでまずこの値の推定を行う

# 対数変換
ts.petrol_price.log <- log(ts.petrol_price)

# 1次の和分過程
forecast::ndiffs(ts.petrol_price.log)

## [1] 1

model.arima.petrol_price.log <- forecast::auto.arima(
  ts.petrol_price.log,
  d = 1,
#  max.p = 3,
#  max.q = 3,
  max.order = 8,
  seasonal = T,
  stepwise = F,
  approximation = F,
  parallel = T,
  num.cores = 4,
  trace = T
)

## Tracing model searching in parallel is not supported.

model.arima.petrol_price.log

## Series: ts.petrol_price.log 
## ARIMA(5,1,3) 
## 
## Coefficients:
##          ar1     ar2      ar3      ar4     ar5      ma1      ma2     ma3
##       0.5019  0.7411  -0.9899  -0.2492  0.2882  -0.5023  -0.4469  0.9456
## s.e.  0.0768  0.0831   0.0722   0.0782  0.0722   0.0452   0.0441  0.0558
## 
## sigma^2 estimated as 0.0007811:  log likelihood=415.07
## AIC=-812.13   AICc=-811.14   BIC=-782.86

pred.petrol_price.log <- forecast::forecast(model.arima.petrol_price.log, h = 12 * 2)

ggplot2::autoplot(pred.petrol_price.log, xlab = NULL, ylab = "PetrolPrice(log)")

モデルによる予測

mtx.petrol_law.pred <- matrix(
  c(
    pred.petrol_price.log$mean,
    rep(1, 12 * 2)
  ),
  nrow = 12 * 2,
  ncol = 2,
  dimnames = list(NULL, c("petrol_price.log", "law"))
)

pred.drivers.log <- forecast::forecast(
  model.arima,
  xreg = mtx.petrol_law.pred,
  level = c(95, 70),
  h = 12 * 2
)

対数系列

ggplot2::autoplot(pred.drivers.log, xlab = NULL, ylab = "Drivers(log)")

原系列

tibble(
  date = seq(lubridate::ymd(19690101), lubridate::ymd(19861231), by = "month"),
  drivers       = c(ts.drivers, rep(NA, 12 * 2)),
  fitted        = c(exp(model.arima$fitted), rep(NA, 12 * 2)),
  pred.mean     = c(rep(NA, 192), exp(pred.drivers.log$mean)),
  pred.lower.50 = c(rep(NA, 192), exp(pred.drivers.log$lower[,1])),
  pred.lower.95 = c(rep(NA, 192), exp(pred.drivers.log$lower[,2])),
  pred.upper.50 = c(rep(NA, 192), exp(pred.drivers.log$upper[,1])),
  pred.upper.95 = c(rep(NA, 192), exp(pred.drivers.log$upper[,2]))
) %>%
  ggplot(aes(date)) +
    geom_line(aes(y = drivers)) +
    geom_line(aes(y = fitted), colour = "tomato", linetype = 2, alpha = 2/3) +
    geom_line(aes(y = pred.mean), colour = "blue") +
    geom_ribbon(aes(ymin = pred.lower.50, ymax = pred.upper.50), alpha = 1/3, fill = "blue") +
    geom_ribbon(aes(ymin = pred.lower.95, ymax = pred.upper.95), alpha = 1/5, fill = "blue") +
    scale_y_continuous(limits = c(0, 3000), labels = scales::comma) +
    labs(
      x = NULL,
      y = "死傷者数"
    ) +
    theme_gray(base_family = "Osaka")

## Warning: Removed 24 rows containing missing values (geom_path).

## Warning: Removed 24 rows containing missing values (geom_path).

## Warning: Removed 192 rows containing missing values (geom_path).

参考文献

馬場真哉(2018) 『時系列分析と状態空間モデルの基礎』プレアデス出版.

ARIMA による時系列分析の流れ

しまじろう

2019-04-28

イギリスの交通事故死亡者数(運転手)

データの可視化

原系列

対数化された系列

差分系列

単位根の検定(KPSS)

差分系列の可視化

ARIMA

モデルの同定

推定パラメータ

モデル評価

残差チェック

残差の可視化

残差の自己相関検定

残差の正規性検定

モデルの当てはめ

モデルによる予測

ガソリン価格の推定

モデルによる予測

対数系列

原系列

参考文献