Road Casualties in Great Britain 1969–84
DriversKilled | drivers | front | rear | kms | PetrolPrice | VanKilled | law |
---|---|---|---|---|---|---|---|
107 | 1687 | 867 | 269 | 9059 | 0.1029718 | 12 | 0 |
97 | 1508 | 825 | 265 | 7685 | 0.1023630 | 6 | 0 |
102 | 1507 | 806 | 319 | 9963 | 0.1020625 | 12 | 0 |
87 | 1385 | 814 | 407 | 10955 | 0.1008733 | 8 | 0 |
119 | 1632 | 991 | 454 | 11823 | 0.1010197 | 10 | 0 |
106 | 1511 | 945 | 427 | 12391 | 0.1005812 | 13 | 0 |
110 | 1559 | 1004 | 522 | 13460 | 0.1037740 | 11 | 0 |
106 | 1630 | 1091 | 536 | 14055 | 0.1040764 | 6 | 0 |
107 | 1579 | 958 | 405 | 12106 | 0.1037740 | 10 | 0 |
134 | 1653 | 850 | 437 | 11372 | 0.1030264 | 16 | 0 |
# 運転手の死亡および重傷者数
ts.drivers <- Seatbelts[, "drivers"]
# ガソリン価格
ts.petrol_price <- Seatbelts[, "PetrolPrice"]
# シートベルト義務付けの法律
ts.law <- Seatbelts[, "law"]
シートベルト義務化に関する法改正(1983〜)辺りから死亡者数の極端な低下が見られる
gridExtra::grid.arrange(
layout_matrix = matrix(c(1, 2, 3), nrow = 3),
ggplot2::autoplot(ts.drivers, ylab = "drivers"),
ggplot2::autoplot(ts.petrol_price, ylab = "petrol price"),
ggplot2::autoplot(ts.law, ylab = "law")
)
乗法モデルを適用し、対数変換した系列を分析対象とする
自己相関プロットによると1年ごとの周期が現れており季節性を持ったデータであると考えられる
ts.drivers.log <- log(ts.drivers)
forecast::ggtsdisplay(ts.drivers.log)
差分をとる事による定常過程への変換を狙う
単位根検定により和分過程の次数を特定する
arima.param.d <- forecast::ndiffs(ts.drivers.log)
KPSS による検定により当該データは 1
次の和分過程であると判断される
差分系列は定常過程と十分に見なせるようなデータに落ち着いている
ts.drivers.log.diff1 <- diff(ts.drivers.log)
forecast::ggtsdisplay(ts.drivers.log.diff1)
自己回帰和分移動平均を用いたモデルの構築
ARIMA(p,d,q) の次数(p,q)を同定する
※外生変数は matrix でなければならない(Data.Frame だとエラー)事に注意
# 外生変数の定義
# 対数変換しておく
mtx.petrol_law <- matrix(
c(
log(ts.petrol_price),
ts.law
),
nrow = length(ts.law),
ncol = 2,
dimnames = list(NULL, c("petrol_price.log", "law"))
)
model.arima <- forecast::auto.arima(
ts.drivers.log,
xreg = mtx.petrol_law,
d = arima.param.d,
max.p = 3,
max.q = 3,
max.D = 1,
max.P = 2,
max.Q = 2,
seasonal = T,
stepwise = F,
approximation = F,
parallel = T,
num.cores = 4,
trace = T
)
## Tracing model searching in parallel is not supported.
model.arima
## Series: ts.drivers.log
## Regression with ARIMA(0,1,3)(0,1,1)[12] errors
##
## Coefficients:
## ma1 ma2 ma3 sma1 petrol_price.log law
## -0.7056 0.0450 -0.1735 -0.8443 -0.3190 -0.2316
## s.e. 0.0732 0.0967 0.0802 0.0777 0.0963 0.0478
##
## sigma^2 estimated as 0.005656: log likelihood=204.11
## AIC=-394.21 AICc=-393.56 BIC=-371.9
推定されたモデルは ARIMA(0,1,3) + Seasonal(0,1,1)
broom::tidy(model.arima) %>%
knitr::kable(
format = "html",
align = "c"
) %>%
kableExtra::kable_styling(full_width = F, position = "center")
term | estimate | std.error |
---|---|---|
ma1 | -0.7056369 | 0.0732008 |
ma2 | 0.0449858 | 0.0966639 |
ma3 | -0.1734711 | 0.0802363 |
sma1 | -0.8443198 | 0.0777334 |
petrol_price.log | -0.3190037 | 0.0963187 |
law | -0.2315617 | 0.0477688 |
broom::glance(model.arima) %>%
knitr::kable(
format = "html",
align = "c"
) %>%
kableExtra::kable_styling(full_width = F, position = "center")
sigma | logLik | AIC | BIC |
---|---|---|---|
0.0752036 | 204.1055 | -394.211 | -371.8993 |
モデルの残差について自己相関および正規性の検定を行う
自己相関もほぼ無く、目視でも正規乱数と十分にみなせる分布であると思われる
forecast::checkresiduals(model.arima$residuals)
## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.
Ljung-Box 検定
Box.test(model.arima$residuals, type = "Ljung-Box", lag = 20)
##
## Box-Ljung test
##
## data: model.arima$residuals
## X-squared = 19.903, df = 20, p-value = 0.464
帰無仮説は棄却されず、残差には自己相関があるとは言い難い
Jarque-Bera 検定
tseries::jarque.bera.test(model.arima$residuals)
##
## Jarque Bera Test
##
## data: model.arima$residuals
## X-squared = 1.7104, df = 2, p-value = 0.4252
帰無仮説は棄却されず、残差は正規分布に従わないとは言い難い
既存の範囲におけるモデルの当てはまりを確認した後に将来の予測を行う
それなりに精度はよさ気
#ggplot2::autoplot(model.arima)
tibble(
t = 1:(length(ts.drivers)),
original = ts.drivers,
predict = exp(model.arima$fitted)
) %>%
ggplot(aes(x = t)) +
geom_line(aes(y = original), alpha = 1/2) +
geom_line(aes(y = predict), alpha = 1/2, colour = "red")
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
予測を行うに際しガソリン価格(PetrolPrice)の値が不明なのでまずこの値の推定を行う
# 対数変換
ts.petrol_price.log <- log(ts.petrol_price)
# 1次の和分過程
forecast::ndiffs(ts.petrol_price.log)
## [1] 1
model.arima.petrol_price.log <- forecast::auto.arima(
ts.petrol_price.log,
d = 1,
# max.p = 3,
# max.q = 3,
max.order = 8,
seasonal = T,
stepwise = F,
approximation = F,
parallel = T,
num.cores = 4,
trace = T
)
## Tracing model searching in parallel is not supported.
model.arima.petrol_price.log
## Series: ts.petrol_price.log
## ARIMA(5,1,3)
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ma1 ma2 ma3
## 0.5019 0.7411 -0.9899 -0.2492 0.2882 -0.5023 -0.4469 0.9456
## s.e. 0.0768 0.0831 0.0722 0.0782 0.0722 0.0452 0.0441 0.0558
##
## sigma^2 estimated as 0.0007811: log likelihood=415.07
## AIC=-812.13 AICc=-811.14 BIC=-782.86
pred.petrol_price.log <- forecast::forecast(model.arima.petrol_price.log, h = 12 * 2)
ggplot2::autoplot(pred.petrol_price.log, xlab = NULL, ylab = "PetrolPrice(log)")
mtx.petrol_law.pred <- matrix(
c(
pred.petrol_price.log$mean,
rep(1, 12 * 2)
),
nrow = 12 * 2,
ncol = 2,
dimnames = list(NULL, c("petrol_price.log", "law"))
)
pred.drivers.log <- forecast::forecast(
model.arima,
xreg = mtx.petrol_law.pred,
level = c(95, 70),
h = 12 * 2
)
ggplot2::autoplot(pred.drivers.log, xlab = NULL, ylab = "Drivers(log)")
tibble(
date = seq(lubridate::ymd(19690101), lubridate::ymd(19861231), by = "month"),
drivers = c(ts.drivers, rep(NA, 12 * 2)),
fitted = c(exp(model.arima$fitted), rep(NA, 12 * 2)),
pred.mean = c(rep(NA, 192), exp(pred.drivers.log$mean)),
pred.lower.50 = c(rep(NA, 192), exp(pred.drivers.log$lower[,1])),
pred.lower.95 = c(rep(NA, 192), exp(pred.drivers.log$lower[,2])),
pred.upper.50 = c(rep(NA, 192), exp(pred.drivers.log$upper[,1])),
pred.upper.95 = c(rep(NA, 192), exp(pred.drivers.log$upper[,2]))
) %>%
ggplot(aes(date)) +
geom_line(aes(y = drivers)) +
geom_line(aes(y = fitted), colour = "tomato", linetype = 2, alpha = 2/3) +
geom_line(aes(y = pred.mean), colour = "blue") +
geom_ribbon(aes(ymin = pred.lower.50, ymax = pred.upper.50), alpha = 1/3, fill = "blue") +
geom_ribbon(aes(ymin = pred.lower.95, ymax = pred.upper.95), alpha = 1/5, fill = "blue") +
scale_y_continuous(limits = c(0, 3000), labels = scales::comma) +
labs(
x = NULL,
y = "死傷者数"
) +
theme_gray(base_family = "Osaka")
## Warning: Removed 24 rows containing missing values (geom_path).
## Warning: Removed 24 rows containing missing values (geom_path).
## Warning: Removed 192 rows containing missing values (geom_path).