Discussion 8

Author

Ethan Wright

1.

library(fpp3)

Warning: package 'fpp3' was built under R version 4.5.2

Registered S3 method overwritten by 'tsibble':
  method               from 
  as_tibble.grouped_df dplyr

── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──

✔ tibble      3.3.0     ✔ tsibble     1.1.6
✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
✔ tidyr       1.3.1     ✔ feasts      0.4.2
✔ lubridate   1.9.4     ✔ fable       0.5.0
✔ ggplot2     3.5.2

Warning: package 'tsibble' was built under R version 4.5.2

Warning: package 'tsibbledata' was built under R version 4.5.2

Warning: package 'feasts' was built under R version 4.5.2

Warning: package 'fabletools' was built under R version 4.5.2

Warning: package 'fable' was built under R version 4.5.2

── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
✖ lubridate::date()    masks base::date()
✖ dplyr::filter()      masks stats::filter()
✖ tsibble::intersect() masks base::intersect()
✖ tsibble::interval()  masks lubridate::interval()
✖ dplyr::lag()         masks stats::lag()
✖ tsibble::setdiff()   masks base::setdiff()
✖ tsibble::union()     masks base::union()

library(patchwork)

ap <- AirPassengers |>
  as_tsibble() |> 
  rename(Passengers = value) |>
  mutate(
    Month   = yearmonth(index) 
  ) |>
  as_tsibble(index = Month)

autoplot(ap)

Plot variable not specified, automatically selected `.vars = Passengers`

ap_log <- ap |>
  mutate(log_Passengers = log(Passengers))

ap_diff<- ap_log|>
  mutate(d_pass_log = difference(log_Passengers, lag = 12) |> 
                     difference())
ap_diff|> autoplot(d_pass_log)

Warning: Removed 13 rows containing missing values or values outside the scale range
(`geom_line()`).

acf<-ap_diff |> ACF(d_pass_log, lag_max = 36) |> autoplot()

pacf<-ap_diff |> PACF(d_pass_log, lag_max = 36) |> autoplot()

acf/pacf

From the ACF, we see a spike at 1 and 12 with subsequent cutoffs, meaning q=1 and Q=1, for non-seasonal and seasonal, respectively.

From the PACF, we see the same thing, but it tails off instead, meaning MA term is dominant. This means p=0 and P=0, for non-seasonal and seasonal, respectively.

I differenced once, so d, D=1.

This leaves us with ARIMA(0,1,1)(0,1,1)[12]

fit <- ap |>
  model(
    manual = ARIMA(log(Passengers) ~ pdq(0,1,1) + PDQ(0,1,1)),
    auto   = ARIMA(log(Passengers))
  )
glance(fit)

# A tibble: 2 × 8
  .model  sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots  
  <chr>    <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>    
1 manual 0.00137    245. -483. -483. -475. <cpl [0]> <cpl [13]>
2 auto   0.00132    250. -489. -489. -475. <cpl [2]> <cpl [12]>

report(fit |> select(auto))

Series: Passengers 
Model: ARIMA(2,0,0)(0,1,1)[12] w/ drift 
Transformation: log(Passengers) 

Coefficients:
         ar1     ar2     sma1  constant
      0.5754  0.2614  -0.5553    0.0193
s.e.  0.0843  0.0842   0.0771    0.0015

sigma^2 estimated as 0.001323:  log likelihood=249.65
AIC=-489.29   AICc=-488.82   BIC=-474.88

the auto() model returns ARIMA(2,0,0)(0,1,1)[12] w/drift

tidy(fit)

# A tibble: 6 × 6
  .model term     estimate std.error statistic  p.value
  <chr>  <chr>       <dbl>     <dbl>     <dbl>    <dbl>
1 manual ma1       -0.402    0.0896      -4.48 1.59e- 5
2 manual sma1      -0.557    0.0731      -7.62 4.54e-12
3 auto   ar1        0.575    0.0843       6.83 2.83e-10
4 auto   ar2        0.261    0.0842       3.11 2.33e- 3
5 auto   sma1      -0.555    0.0771      -7.21 3.99e-11
6 auto   constant   0.0193   0.00149     13.0  2.07e-25

Both models identify the same seasonal factor at -0.555, but auto shows AR(2) w/drift rather than MA(1) for the trend component. The BIC is basically equivalent across both models, with auto slightly better than manual. Auto outperforms on AIC, but again, it is minimal.

2.

tour<-tourism|>
  filter(Region == "Sydney", Purpose == "Holiday")
tour|>autoplot(Trips) +
  labs(title = "Sydney Holiday Trips (quarterly)")

tour_diff<- tour |>
  mutate(d_trips = difference(Trips, lag = 4))

tour_diff|>autoplot(d_trips) +
  labs(title = "Sydney Holiday Trips (quarterly)")

Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_line()`).

acf2<-tour_diff |> ACF(d_trips, lag_max = 40) |> autoplot()
pacf2<-tour_diff |> PACF(d_trips, lag_max = 40) |> autoplot()

acf2/pacf2

Model should be ARIMA(0,1,0)(1,1,0)[4] since AR seems to dominate due to tailing ACF.

fit_tour <- tour |>
  model(
    manual = ARIMA(Trips ~ pdq(0,1,0) + PDQ(1,1,0, period=4)),
    auto   = ARIMA(Trips)
  )
glance(fit_tour)

# A tibble: 2 × 11
  Region State Purpose .model sigma2 log_lik   AIC  AICc   BIC ar_roots ma_roots
  <chr>  <chr> <chr>   <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <list>   <list>  
1 Sydney New … Holiday manual 11487.   -457.  919.  919.  923. <cpl>    <cpl>   
2 Sydney New … Holiday auto    5732.   -458.  927.  928.  939. <cpl>    <cpl>

report(fit_tour |> select(auto))

Series: Trips 
Model: ARIMA(1,0,0)(1,0,1)[4] w/ mean 

Coefficients:
         ar1    sar1     sma1  constant
      0.2102  0.9541  -0.8330   20.2379
s.e.  0.1218  0.0674   0.1354    0.8821

sigma^2 estimated as 5732:  log likelihood=-458.49
AIC=926.99   AICc=927.8   BIC=938.9

tidy(fit_tour)

# A tibble: 5 × 9
  Region State        Purpose .model term  estimate std.error statistic  p.value
  <chr>  <chr>        <chr>   <chr>  <chr>    <dbl>     <dbl>     <dbl>    <dbl>
1 Sydney New South W… Holiday manual sar1    -0.585    0.0969     -6.04 5.44e- 8
2 Sydney New South W… Holiday auto   ar1      0.210    0.122       1.73 8.81e- 2
3 Sydney New South W… Holiday auto   sar1     0.954    0.0674     14.1  1.76e-23
4 Sydney New South W… Holiday auto   sma1    -0.833    0.135      -6.15 2.87e- 8
5 Sydney New South W… Holiday auto   cons…   20.2      0.882      22.9  6.19e-37

Looks like the manual model actually has a better fit since the AIC and BIC metrics are better than the auto model.

In the manual model, the SAR(1) coefficient is -0.585, suggesting that the current quarter’s trip amount would move in the opposite direction of the previous year’s same quarter.

In the auto model, AR(1) is 0.210, which is not significant (p=.088). SAR(1) is 0.954 meaning there is a positive seasonal persistence from last year. SMA(1) is -0.833 basically cancelling out the SAR component.

3.

Distribution

\[ \varepsilon_t \overset{iid}{\sim} N(0, \sigma^2) \]

White Noise

\[ y_t = \varepsilon_t \]

Random Walk

\[ y_t = y_{t-1} + \varepsilon_t \]

Random Walk w/ Drift

\[ y_t = \mu + y_{t-1} + \varepsilon_t \]

Relationship Through Differencing

\[ \Delta y_t = y_t - y_{t-1} = \varepsilon_t \quad \text{(RW} \rightarrow \text{WN)} \]

\[ \Delta y_t = y_t - y_{t-1} = \mu + \varepsilon_t \quad \text{(RWd} \rightarrow \text{WN + constant)} \]

fit_3 <- tour |>
  model(
    WN   = MEAN(Trips),
    RW   = NAIVE(Trips),
    RWd  = NAIVE(Trips ~ drift())
  )

fc_3 <- fit_3 |>
  forecast(h = 12)

fc_3 |>
  autoplot(tour, level = 95) +
  facet_wrap(~ .model, ncol = 1) +
  labs(title    = "WN, RW, RW with Drift on Sydney Holiday Trips",
       subtitle = "WN = mean forecast | RW = flat at last value | RWd = trending",
       x = NULL, y = "Trips")

Trend:

Not really any trend across all 3 models, so nothing to compare. RWd kind of has a trend in the forecast because the average pushes it down slightly.

Shock:

In white noise models, shock has no effect on the next quarter. In RW models, the shock is reflected in the level change. In the RWd, the shock changes the average, pulling all the data points down slightly.

Memory:

WN has no memory, RW has memory because each data point is reflected in the new level. Same goes for RWd.

Stationarity:

The only stationary series is WN because mean and variance are constant. For RW and RWd, they are both non-stationary because the mean and variance change over time.

Forecast:

The WN band is at a consistent width. The RW is a flat line, but since variance is not constant the band widens as time increases. The difference for RWd is that the mean can drag the band down since the average changes over time along with the variance.