HW5_624.rmd

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

#CH8 Q1:
#A.
vic_pigs <- aus_livestock |>
  filter(State == "Victoria",
         Animal == "Pigs")

fit_vic_pigs <- vic_pigs |>
  model(
    SES = ETS(Count ~ error("A") + trend("N") + season("N"))
  )

report(fit_vic_pigs)

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

fc_vic_pigs <- fit_vic_pigs |>
  forecast(h = 4)

autoplot(fc_vic_pigs, vic_pigs)

#B.
resid <- residuals(fit_vic_pigs)
s <- sd(resid$.resid, na.rm = TRUE)

first_forecast <- fc_vic_pigs$.mean[1]

lower <- first_forecast - 1.96 * s
upper <- first_forecast + 1.96 * s

lower

## [1] 76871.01

upper

## [1] 113502.1

fc_vic_pigs |> hilo(level = 95)

## # A tsibble: 4 x 7 [1M]
## # Key:       Animal, State, .model [1]
##   Animal State    .model    Month
##   <fct>  <fct>    <chr>     <mth>
## 1 Pigs   Victoria SES    2019 Jan
## 2 Pigs   Victoria SES    2019 Feb
## 3 Pigs   Victoria SES    2019 Mar
## 4 Pigs   Victoria SES    2019 Apr
## # ℹ 3 more variables: Count <dist>, .mean <dbl>, `95%` <hilo>

#In this example we can see the manually computed using s as the sd of residuals for the lower and upper intervals are (76871,113502) for January 2019. This is insanely close to the 95% CI (76855, 113518) produced by our models first projection. This shows that the ETS model to compute uncertainty was off by about 16 pigs which is far less than a 1% difference from the manual approximation.

#Q5
#A.
italy_exports <- global_economy |>
  filter(Country == "Italy")

autoplot(italy_exports, Exports) +
  labs(
    title = "Annual Exports for Italy",
    y = "Exports (% of GDP)",
    x = "Year"
  )

#We can see this is yearly data so we don't worry about seasonality. We still can observe a consistent growth with some strong inflections that occur from ~1980 to 1990 where it takes a strong change of course towards growth. The other large movement was right before 2010 which was the global financial crisis. This would be an expected inflection but it is followed by an extremely strong consistent growth going to 2017. The fact that there are changes in the direction of the mdoel definitely suggests that a trend model would perform better.

#B.
model_fits_italy <- italy_exports |>
  model(
    ANN = ETS(Exports ~ error("A") + trend("N") + season("N")),
    AAN = ETS(Exports ~ error("A") + trend("A") + season("N"))
  )
report(model_fits_italy)

## Warning in report.mdl_df(model_fits_italy): Model reporting is only supported
## for individual models, so a glance will be shown. To see the report for a
## specific model, use `select()` and `filter()` to identify a single model.

## # A tibble: 2 × 10
##   Country .model sigma2 log_lik   AIC  AICc   BIC   MSE  AMSE   MAE
##   <fct>   <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Italy   ANN      1.85   -135.  275.  275.  281.  1.78  3.73 1.00 
## 2 Italy   AAN      1.80   -133.  276.  277.  286.  1.68  3.24 0.934

fc_italy <- model_fits_italy |>
  forecast(h = 10)

autoplot(fc_italy, italy_exports) +
  labs(
    title = "Italy Exports Forecasts from ETS(A,N,N) and ETS(A,A,N)",
    y = "Exports (% of GDP)",
    x = "Year"
  )

#C.
accuracy(model_fits_italy)

## # A tibble: 2 × 11
##   Country .model .type          ME  RMSE   MAE    MPE  MAPE  MASE RMSSE     ACF1
##   <fct>   <chr>  <chr>       <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 Italy   ANN    Training  0.324    1.34 1.00   1.39   4.75 0.983 0.991 -0.00701
## 2 Italy   AAN    Training -0.00841  1.30 0.934 -0.291  4.47 0.916 0.962  0.0375

accuracy(model_fits_italy) |>
  select(.model, RMSE)

## # A tibble: 2 × 2
##   .model  RMSE
##   <chr>  <dbl>
## 1 ANN     1.34
## 2 AAN     1.30

#D
#As expected the rmse of the RMSE of the AAN model was better. This highlights the value of adding the trend to improve predictability, the extra parameter helps us track trend and could produce better forecasting.This type of economic data is likely to have trends for a country like Italy but also follow a long term growth trend as well.

#E. I would argue that the longer we forecast, it is not a guarantee that AAN model would actually outperform the ANN model considering how strong the long term pattern looks here. Still for the term we have defined, I would suggest the trend model is better from the lower MAE and RMSE.

#F
acc_italy_models <- accuracy(model_fits_italy) |>
  filter(.type == "Training") |>
  select(.model, RMSE)

fc_first <- fc_italy |>
  as_tibble() |>
  group_by(.model) |>
  slice(1) |>
  ungroup()

manual_intervals_italy <- fc_first |>
  left_join(acc_italy_models, by = ".model") |>
  mutate(
    lower = .mean - 1.96 * RMSE,
    upper = .mean + 1.96 * RMSE
  ) |>
  select(.model, Year, .mean, RMSE, lower, upper)

manual_intervals_italy

## # A tibble: 2 × 6
##   .model  Year .mean  RMSE lower upper
##   <chr>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAN     2018  31.6  1.30  29.0  34.1
## 2 ANN     2018  31.3  1.34  28.7  33.9

#Here the interval for ANN is (28.65,33.91) while for AAN it is (29.04,34.11). The slightly higher interval is probably more reliable as it is smaller which is in support of the lower RMSE value and thus is the preferred model here. It would be interesting to see if this trend persists over the 8 years of missing datapoints, and compare the accuracy.

#Q6
china_gdp_fc <- global_economy |>
  filter(Country == "China")

autoplot(china_gdp_fc, GDP) +
  labs(
    title = "China GDP",
    y = "GDP",
    x = "Year"
  )

model_fits_china <- china_gdp_fc |>
  model(
    ETS_standard = ETS(GDP),
    ETS_damped = ETS(GDP ~ trend("Ad")),
    ETS_boxcox = ETS(box_cox(GDP, 0.3)),
    ETS_damped_boxcox = ETS(box_cox(GDP, 0.3) ~ trend("Ad"))
  )

fc_china <- model_fits_china |>
  forecast(h = 20)

autoplot(fc_china, china_gdp_fc) +
  labs(title = "China GDP Forecasts")

#Here I compared the different ETS  models forecasts on the Chinese GDP dataset. This data shows a strong trend of growth but very little downward momentum so forecasting may be strong. The damped trend flattens the long ter projection of growth from the standard model which may be more realistic. The Box-Cox transformation is meant to help stabilize variance and thus gives a smooth forecast that also has very controlled growth. More similar to the damped forecast, the Box_cox slows growth over time which may be reasonable for something like GDP growth.

#Q7
aus_gas <- aus_production |>
  select(Quarter, Gas)

#fit
aus_gas_models <- aus_gas |>
  model(
    additive      = ETS(Gas ~ error("A") + trend("A") + season("A")),
    multiplicative = ETS(Gas ~ error("A") + trend("A") + season("M")),
    damped        = ETS(Gas ~ error("A") + trend("Ad") + season("M"))
  )

# forecast
fc_gas <- aus_gas_models |>
  forecast(h = 24)

# Plot
autoplot(aus_gas) +
  autolayer(fc_gas, .vars = TRUE) +   # adds forecast series + PI ribbons
  labs(
    title = "Australian Gas Production",
    y = "Gas production",
    x = "Year"
  )

## Plot variable not specified, automatically selected `.vars = Gas`

## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `.vars`

## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `.vars`

#The gas production data shows very obvious seasonality as there is a huge drop from Q4->Q1 of every year we see. It is obvious that the seasonality grows with the prodcution growth trend so this immediately suggests to us that the multiplicative model will be better than the additive.It looks like th emultiplicative model does a good job capturing seasonality but the damped model is a good question if we had some reason to assume the growth rate would smooth out. From this span of data it looks like the damped model may not be best but perhaps this could only be verified by some other situation being considered such as the switch to renewable energies or some discovery of massive australian gas reserves could make the damped model better or worse.

#Q8
set.seed(111)
#A. Retail turnover series are shown to have seasonality which increases with the level of the series, so multiplicative seasonality is more appropriately fit than additive seasonality.
my_retail <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

myseries_train <- my_retail |>
  filter(year(Month) < 2011)

myseries_test <- my_retail |>
  filter(year(Month) >= 2011)

autoplot(my_retail, Turnover) +
  labs(title = "Retail turnover series", y = "Turnover")

#B.
# Holt-Winters multiplicative models using damped ad trend option and multiplicative seasonality
hw_models <- myseries_train |>
  model(
    HW_M  = ETS(Turnover ~ error("M") + trend("A") + season("M")),
    HW_MD = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))
  )

#C.
# Training RMSE
train_acc <- accuracy(hw_models)
train_acc |>
  select(.model, RMSE)

## # A tibble: 2 × 2
##   .model  RMSE
##   <chr>  <dbl>
## 1 HW_M    12.7
## 2 HW_MD   12.4

#Here we would take the non-damped version of our model as it does not produce a lower RMSE so I would say the HW multiplicative model is not the better of the two simply from this

#D/E
# Forecast on test set and SNAIVE models
fc_hw <- hw_models |>
  forecast(new_data = myseries_test)

test_acc_hw <- accuracy(fc_hw, myseries_test)
test_acc_hw |>
  select(.model, RMSE)

## # A tibble: 2 × 2
##   .model  RMSE
##   <chr>  <dbl>
## 1 HW_M    67.1
## 2 HW_MD   74.8

snaive_model <- myseries_train |>
  model(SNAIVE = SNAIVE(Turnover))

fc_snaive <- snaive_model |>
  forecast(new_data = myseries_test)

test_acc_snaive <- accuracy(fc_snaive, myseries_test)
test_acc_snaive |>
  select(.model, RMSE)

## # A tibble: 1 × 2
##   .model  RMSE
##   <chr>  <dbl>
## 1 SNAIVE  116.

# Best HW model residuals
best_model <- myseries_train |>
  model(
    BEST = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))
  )

gg_tsresiduals(best_model)

## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

augment(best_model) |>
  features(.innov, ljung_box, lag = 24, dof = 5)

## # A tibble: 1 × 5
##   State    Industry                                     .model lb_stat lb_pvalue
##   <chr>    <chr>                                        <chr>    <dbl>     <dbl>
## 1 Victoria Cafes, restaurants and takeaway food servic… BEST      28.4    0.0755

# The HW multiplicative model test RMSE of 67.05, is substantially lower than the seasonal SNAIVE benchmark RMSE of 116.4. This indicates that the HW model captures the trend and seasonal value, which is in contradiciton to our training RMSE.The ACF function shows most spikes in the 95% CI, suggesting little autocorrelation.
#Residual diagnostics show us the preferred model though. The Ljung-Box test returned a p-value of 0.07, which is greater than 0.05. so we fail to reject the null hypothesis of no residual autocorrelation.That is why the residuals behave visibly like white noise and we can assume the model has captured most of the structure in the data.
#Q9
# STL + Box-Cox + ETS
stl_ets_model <- myseries_train |>
  model(
    STL_ETS = decomposition_model(
      STL(box_cox(Turnover, 0) ~ season(window = "periodic")),
      ETS(season_adjust ~ error("A") + trend("A") + season("N"))
    )
  )

fc_stl_ets <- stl_ets_model |>
  forecast(new_data = myseries_test)

test_acc_stl <- accuracy(fc_stl_ets, myseries_test)


# Final comparison
bind_rows(
  test_acc_hw |>
    select(.model, RMSE),
  test_acc_snaive |>
    select(.model, RMSE),
  test_acc_stl |>
    select(.model, RMSE)
)

## # A tibble: 4 × 2
##   .model   RMSE
##   <chr>   <dbl>
## 1 HW_M     67.1
## 2 HW_MD    74.8
## 3 SNAIVE  116. 
## 4 STL_ETS 224.

#Here we have a final comparsion of all our RMSE values and we can conclude that the multiplicative model is still our best. A more complex model could improve sample fit but perform worse out of sample. Model selection is therefore be based on forecasting performance rather than training accuracy alone. When using an STL decomposition applied to a Box-Cox transformed version of the series followed by ETS modeling of the seasonally adjusted data we still could not beat the RMSE from our non-damped HW model.

HW5_624.rmd

2026-03-06

R Markdown

Including Plots