This data set provides information on the number of women murdered each year (per 100,000 standard population) in the U.S.
# load package & set environment
if(!require("pacman")) install.packages("pacman")
pacman::p_load(fpp2, fpp3, dplyr, patchwork, purrr)
data("wmurders")
options(digits = 3)
set.seed(42)
theme_set(theme_minimal())
By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model using first difference (d = 1). If there are two equally likely candidate models, then choose the one with a moving average process (MA).
# plot chart
wmurders %>% as_tsibble() %>% gg_tsdisplay(plot_type = "partial")
# plot chart
wmurders %>% diff() %>% as_tsibble() %>% gg_tsdisplay(plot_type = "partial")
The data at first is clearly not stationary due to having trends. We do the first difference (d = 1) based on the instruction. The result looks better. Also, we decide by the ACF and PACF plot to know that the model could be either p or q as 2. Thus, we have two options But, we choose the one with a moving average process (MA) (q = 2) based on the instruction.
Should you include a constant term in the model? Explain your answer.
# plot data
wmurders %>% autoplot() # no drift
wmurders %>% diff() %>% autoplot() # no drift, no trend
No, I should not include a constant term in the model because I do not see any significant drift in the plots.
Fit the model using R and examine the residuals. Is the model satisfactory?
# fit model
fit = wmurders %>% as_tsibble() %>% model(arima = ARIMA(value ~ pdq(0, 1, 2)))
# print report
fit %>% report()
## Series: value
## Model: ARIMA(0,1,2)
##
## Coefficients:
## ma1 ma2
## -0.066 0.371
## s.e. 0.126 0.164
##
## sigma^2 estimated as 0.0422: log likelihood=9.71
## AIC=-13.4 AICc=-12.9 BIC=-7.46
# check residuals
fit %>% gg_tsresiduals()
Yes, the model is satisfactory due to no significant autocorrelation in residuals and distribution of residuals being normal distributed.
Forecast three times ahead and include the results in a table. Also, create a plot of the series with forecasts and prediction intervals for the next three periods shown.
# generate forecast
fit %>% forecast(h = 3) %>% print.data.frame()
## .model index value .mean
## 1 arima 2005 N(2.46, 0.0422) 2.46
## 2 arima 2006 N(2.48, 0.079) 2.48
## 3 arima 2007 N(2.48, 0.151) 2.48
fit %>% forecast(h = 3) %>% autoplot(wmurders)
The table with data and the plot show above.
Does ARIMA() give the same model you have chosen? If not, which model do you think is better?
# weak auto fit model
fit.auto.w = wmurders %>% as_tsibble() %>% model(ARIMA(value ~ pdq()))
fit.auto.w %>% report()
## Series: value
## Model: ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.243 -0.826
## s.e. 0.155 0.114
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
# strong auto fit model
fit.auto.s = wmurders %>% as_tsibble() %>% model(ARIMA(value ~ pdq(),
stepwise = F,
approximation = F))
fit.auto.s %>% report()
## Series: value
## Model: ARIMA(0,2,3)
##
## Coefficients:
## ma1 ma2 ma3
## -1.015 0.432 -0.322
## s.e. 0.128 0.228 0.174
##
## sigma^2 estimated as 0.04475: log likelihood=7.77
## AIC=-7.54 AICc=-6.7 BIC=0.35
# compare result
# fit AICc=-12.9 ARIMA(0,1,2)
# fit.auto.w AICc=-6.39 ARIMA(1,2,1)
# fit.auto.s AICc=-6.70 ARIMA(0,2,3)
No, the ARIMA() function in either way of no restriction (fit.auto.w) or yes restriction (fit.auto.s) gives different models as chosen before (fit). The chosen before one (fit) is the best one with the lowest AICc score.