Plot variable not specified, automatically selected `.vars = Passengers`
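# Log-transform to stabilise the variance
ap_log <- ap |>
  mutate(log_Passengers = log(Passengers))

# Seasonal difference (lag 12), then a first difference
ap_diff <- ap_log |>
  mutate(d_pass_log = difference(log_Passengers, lag = 12) |> difference())

ap_diff |>
  autoplot(d_pass_log)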
Warning: Removed 13 rows containing missing values or values outside the scale range
(`geom_line()`).
The ACF shows significant spikes at lags 1 and 12 and cuts off after each, which suggests q = 1 and Q = 1 for the non-seasonal and seasonal parts, respectively.
The PACF shows spikes at the same lags but tails off rather than cutting off, which points to MA terms rather than AR terms. That gives p = 0 and P = 0 for the non-seasonal and seasonal parts, respectively.
I took one first difference and one seasonal difference (lag 12), so d = 1 and D = 1.
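As a quick check on the differencing, and on the 13 rows reported as removed in the plot warning, here is a minimal base R sketch. It assumes `ap` holds the same data as the built-in `AirPassengers` series, which the output above does not confirm. Base `diff()` drops the leading observations, whereas `difference()` keeps the length and pads with `NA`, which is why the plot warns about missing values.

```r
# Assumption: `ap` matches the built-in monthly AirPassengers series.
x <- log(AirPassengers)

# Seasonal difference (lag 12) followed by a first difference (lag 1).
d <- diff(diff(x, lag = 12), lag = 1)

# The two differences commute, so the order does not matter.
d_alt <- diff(diff(x, lag = 1), lag = 12)

# Each difference costs its lag in observations: 12 + 1.
n_lost <- length(x) - length(d)
print(n_lost)
```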
This leaves us with ARIMA(0,1,1)(0,1,1)[12]
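Written out in backshift notation, this model can be sketched as follows. The symbols $y_t$ (passengers), $B$ (backshift operator), $\theta_1$, $\Theta_1$ and $\varepsilon_t$ are notation introduced here, and the sign convention assumed is the one in which MA terms enter with a plus sign.

```latex
(1 - B)\,(1 - B^{12})\,\log y_t \;=\; (1 + \theta_1 B)\,(1 + \Theta_1 B^{12})\,\varepsilon_t
```

The two factors on the left are the first difference (d = 1) and the seasonal difference (D = 1); the two on the right are the non-seasonal MA(1) and seasonal MA(1) terms.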
fit <- ap |>
  model(
    manual = ARIMA(log(Passengers) ~ pdq(0, 1, 1) + PDQ(0, 1, 1)),
    auto = ARIMA(log(Passengers))
  )

glance(fit)
Both models estimate the same seasonal coefficient, -0.555, but auto selects AR(2) with drift rather than MA(1) for the non-seasonal component. The BIC is essentially the same for both models, with auto slightly better than manual. Auto also does better on AIC, but again the difference is minimal.
# A tibble: 5 × 9
Region State Purpose .model term estimate std.error statistic p.value
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Sydney New South W… Holiday manual sar1 -0.585 0.0969 -6.04 5.44e- 8
2 Sydney New South W… Holiday auto ar1 0.210 0.122 1.73 8.81e- 2
3 Sydney New South W… Holiday auto sar1 0.954 0.0674 14.1 1.76e-23
4 Sydney New South W… Holiday auto sma1 -0.833 0.135 -6.15 2.87e- 8
5 Sydney New South W… Holiday auto cons… 20.2 0.882 22.9 6.19e-37
The manual model appears to fit better, since its AIC and BIC are both lower than the auto model's.
In the manual model, the SAR(1) coefficient is -0.585, suggesting that the current quarter's trips tend to move in the opposite direction from the same quarter of the previous year.
In the auto model, AR(1) is 0.210, which is not significant at the 5% level (p = 0.088). SAR(1) is 0.954, indicating strong positive seasonal persistence from the same quarter last year. SMA(1) is -0.833, which offsets much of the SAR component.
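To see how much of the SAR effect is offset by the SMA term, one rough check is to look at the implied impulse-response (psi) weights. The sketch below is a simplification: it keeps only the seasonal AR and MA terms at lag 4 (quarterly data) and leaves out the AR(1) term and the constant. It uses `ARMAtoMA()` from base R's `stats` package.

```r
# Simplification: seasonal terms only; AR(1) and the constant are omitted.
sar1 <- 0.954
sma1 <- -0.833

# psi-weights of (1 + sma1 * B^4) / (1 - sar1 * B^4), lags 1 to 12.
psi <- ARMAtoMA(ar = c(0, 0, 0, sar1), ma = c(0, 0, 0, sma1), lag.max = 12)

# Weights at the seasonal lags 4, 8 and 12.
print(round(psi[c(4, 8, 12)], 3))
```

The weight at lag 4 is 0.954 - 0.833 = 0.121, and later seasonal weights shrink by a factor of 0.954 each year. Under this simplification the cancellation is large but not complete: a small seasonal effect remains and decays slowly.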
fit_3 <- tour |>
  model(
    WN = MEAN(Trips),
    RW = NAIVE(Trips),
    RWd = NAIVE(Trips ~ drift())
  )

fc_3 <- fit_3 |>
  forecast(h = 12)

fc_3 |>
  autoplot(tour, level = 95) +
  facet_wrap(~ .model, ncol = 1) +
  labs(
    title = "WN, RW, RW with Drift on Sydney Holiday Trips",
    subtitle = "WN = mean forecast | RW = flat at last value | RWd = trending",
    x = NULL, y = "Trips"
  )
Trend:
None of the three forecasts shows much of a trend, so there is little to compare. RWd has a slight downward trend in its forecast because the estimated drift, the average quarter-to-quarter change over the sample, is slightly negative.
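The drift behind the RWd forecast can be sketched in a few lines of base R. The numbers below are hypothetical, not the Sydney Holiday series. The point is only that the average one-step change reduces to the line joining the first and last observations.

```r
# Hypothetical values for illustration, not the actual Trips data.
y <- c(210, 198, 205, 190, 201, 187)
n <- length(y)

# Drift estimate: the average one-step change ...
drift_mean <- mean(diff(y))
# ... which telescopes to (last - first) / (n - 1).
drift_ends <- (y[n] - y[1]) / (n - 1)

# Point forecasts h steps ahead: last value plus h times the drift.
h <- 1:4
fc <- y[n] + h * drift_ends
print(fc)
```

Because the drift depends only on the first and last values, a slightly lower final observation is enough to tilt the forecast line downward.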
Shock:
In the WN model, a shock has no effect on the next quarter. In the RW model, a shock permanently shifts the level. In the RWd model, the shock also shifts the level permanently, and it changes the estimated drift (the average change), which here pulls the forecast path down slightly.
Memory:
WN has no memory. RW has memory because every past shock remains in the current level. The same goes for RWd.
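A small simulation can make the shock and memory points concrete. This is a sketch with simulated data and a hypothetical shock of size 5 at time 50; it does not use the Trips series.

```r
set.seed(1)
n <- 100
e <- rnorm(n)             # baseline shocks

# Hypothetical one-off shock of size 5 at time 50.
shock <- numeric(n)
shock[50] <- 5

# White noise: each value is just its own shock.
wn_effect <- (e + shock) - e

# Random walk: the level is the running sum of all shocks.
rw_effect <- cumsum(e + shock) - cumsum(e)

print(wn_effect[49:52])   # effect visible only at time 50
print(rw_effect[49:52])   # effect persists from time 50 onward
```

In the white noise series the shock disappears after one period, while in the random walk it stays in every later value. Adding a constant drift to the walk would not change this difference, since the drift cancels out.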
Stationarity:
The only stationary series is WN, because its mean and variance are constant. RW and RWd are both non-stationary: the variance of RW grows over time, and RWd additionally has a mean that changes over time with the drift.
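The variance claim can be checked roughly by simulating many paths and measuring the spread across paths at a few time points. This is a sketch on simulated standard-normal shocks; the path count and length are arbitrary choices.

```r
set.seed(42)
n_paths <- 2000
n_time <- 100

# One column per simulated path, one row per time point.
e <- matrix(rnorm(n_paths * n_time), nrow = n_time)

wn <- e                     # white noise paths
rw <- apply(e, 2, cumsum)   # random walk paths (cumulative sums by column)

# Variance across paths at each time point.
var_wn <- apply(wn, 1, var)
var_rw <- apply(rw, 1, var)

print(round(var_wn[c(10, 50, 100)], 2))
print(round(var_rw[c(10, 50, 100)], 2))
```

The white noise variance should stay near 1 at every time point, while the random walk variance should grow roughly in proportion to time.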
Forecast:
The WN forecast is the sample mean with a band of constant width. The RW forecast is a flat line at the last observed value, but the band widens with the horizon because forecast variance accumulates. RWd differs in that its forecast line slopes with the estimated drift, here downward, and its band also widens with the horizon.
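These band shapes match the usual textbook expressions for the three benchmark methods. The formulas below are a sketch under the assumption of uncorrelated residuals with constant variance. Here $T$ is the number of observations, $h$ the forecast horizon, and $\hat\sigma$ the residual standard deviation; all of this notation is introduced here, and the software's exact computation may differ in detail.

```latex
\begin{aligned}
\text{WN (mean):}  \quad & \hat{y}_{T+h|T} = \bar{y},
  & \hat\sigma_h &= \hat\sigma \sqrt{1 + 1/T} \\
\text{RW (naive):} \quad & \hat{y}_{T+h|T} = y_T,
  & \hat\sigma_h &= \hat\sigma \sqrt{h} \\
\text{RWd (drift):} \quad & \hat{y}_{T+h|T} = y_T + h\,\frac{y_T - y_1}{T - 1},
  & \hat\sigma_h &= \hat\sigma \sqrt{h\left(1 + \frac{h}{T - 1}\right)}
\end{aligned}
```

The WN standard error does not depend on $h$, which gives the constant band. The RW standard error grows with $\sqrt{h}$, and the RWd band is slightly wider still because the drift itself is estimated.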