Data624

library(fpp3)

## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.4
## ✔ dplyr       1.1.3     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.3.1
## ✔ lubridate   1.9.3     ✔ fable       0.3.3
## ✔ ggplot2     3.4.4     ✔ fabletools  0.3.4

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ readr   2.1.4
## ✔ purrr   1.0.2     ✔ stringr 1.5.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(caret)

## Loading required package: lattice
## 
## Attaching package: 'caret'
## 
## The following object is masked from 'package:purrr':
## 
##     lift
## 
## The following objects are masked from 'package:fabletools':
## 
##     MAE, RMSE

library(forecast)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(MASS)

## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select

Do exercises 8.1, 8.5, 8.6, 8.7, 8.8, 8.9 in Hyndman.

8.1 Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0 , and generate forecasts for the next four months.

vic_pigs <- aus_livestock |> filter(str_detect(State,'Victoria') & str_detect(Animal,'Pig'))

vic_pigs |> autoplot()

## Plot variable not specified, automatically selected `.vars = Count`

 vic_pigs_fit <- vic_pigs |>
    model(ETS(Count ~ error("A") + trend("N") + season("N")))

pig_fc <- vic_pigs_fit |> forecast(h=4)
    
pig_fc |>
    autoplot(vic_pigs |> filter(year(Month)>='2015'))

report(vic_pigs_fit)

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

The alpha value of 0.32212 would indicate some movement for the level but not so much dramatic change. The initial state is 100,646.6 although there isn’t so much significance for this value.

Compute a 95% prediction interval for the first forecast using \(\hat{y} ± 1.96\) where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

pig_se <- sd(augment(vic_pigs_fit)$.resid)

lower_pred <- mean(pig_fc |> pull(Count) |> head(1)) - 1.96 * pig_se
upper_pred <- mean(pig_fc |> pull(Count) |> head(1)) + 1.96 * pig_se

manual_distr <- c(lower_pred,upper_pred)
names(manual_distr) <- c('Lower Band','Upper Band')
manual_distr

## Lower Band Upper Band 
##   76871.01  113502.10

hilo(pig_fc)$`95%`[1]

## <hilo[1]>
## [1] [76854.79, 113518.3]95

The interval from the model is not that different from the manually calculated bounds although it is a little wider than what was calculated.

8.5) Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

After looking at the top export countries I was surprised to see Ireland so high on the list and decided to use their data for this problem.

ireland_ex <- global_economy |> filter(Country=='Ireland')

Plot the Exports series and discuss the main features of the data.

ireland_ex |> autoplot(Exports)

Given that it’s annual data there isn’t any seasonality to account for in the model, but the data shows a clear long term upward trend and cycle. There was a dip in the early 2000s but the exports have continued to increase and follow growth cycles.

ireland_ex |> ACF(Exports) |> autoplot()

The autocorrelation between lagged values is fairly high and decreases after more lagged periods. It would make sense that this time series has lagged correlation given most of the upward trends.

Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

ire_ex_fit <- ireland_ex |> model(ets=ETS(Exports ~ error('A')+trend("N")+season('N')))

ire_ex_fit |> forecast(h=5) |> autoplot(ireland_ex)

With only a simple exponential smoothing technique applied, the forecasts seem to miss a newer downward trend although the distribution shown with the confidence intervals would hopefully capture this future activity.

Compute the RMSE values for the training data.

ann_rmse <- sqrt(ire_ex_fit$ets[[1]]$fit$fit$MSE)
ann_rmse

## [1] 4.02662

Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

ire_ex_fit_aan <- ireland_ex |> model(ets=ETS(Exports ~ error('A')+trend("A")+season('N')))
ire_ex_fit_aan |> forecast(h=5) |> autoplot(ireland_ex)

From comparing the initial forecasts between the two methods it looks like the additive trend inclusion follows the time series longer term fairly consistent increasing cycles whereas the simple exponential has a less powerful predicted slope/level given it doesn’t factor in the trend component.

Let’s see how the fit was evaluated for RMSE:

aan_rmse <- sqrt(ire_ex_fit_aan$ets[[1]]$fit$fit$MSE)
aan_rmse

## [1] 3.738005

This metric appears to also indicate that the residuals are smaller when factoring in the trend parameter within the exponential model. At a high level that makes sense because the process of determining the fit would likely better factor in the long term upward cycles that are present in Ireland’s exports. It seems more likely to represent the potential future values even if a future cycle has just started by having due to the trend parameter in the model.

Compare the forecasts from both methods. Which do you think is best?

Given the overall cyclic trend of the data set it would appear that adding the trend component would help with trying to best capture future data with a forecast.

Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

ire_ann_fc <- ire_ex_fit |> forecast(h=5) 
ire_aan_fc <- ire_ex_fit_aan |> forecast(h=5)
ire_pred_interval <- rbind(c(ire_ann_fc$.mean[1]-1.96*ann_rmse,ire_ann_fc$.mean[1]+1.96*ann_rmse),c(120-1.96*sqrt(17),120+1.96*sqrt(17)),c(ire_aan_fc$.mean[1]-1.96*aan_rmse,ire_aan_fc$.mean[1]+1.96*aan_rmse),c(121-1.96*sqrt(15),121+1.96*sqrt(15)))
rownames(ire_pred_interval) <- c('SES ANN','R ANN','AAN','R AAN')
colnames(ire_pred_interval) <- c('Lower','Upper')

ire_pred_interval

##            Lower    Upper
## SES ANN 112.1205 127.9049
## R ANN   111.9187 128.0813
## AAN     113.9043 128.5573
## R AAN   113.4090 128.5910

The intervals are all fairly close to one another although the automated calculated distribution has a slightly larger interval. The variance is calculated in a different way with the automated method but either means of estimated the error should be reasonable.

8.6 Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

chinese_gdp <- global_economy |> filter(Country=='China') |> mutate(gdp_mil=GDP/1e6)
summary(chinese_gdp)

##            Country        Code         Year           GDP           
##  China         :58   CHN    :58   Min.   :1960   Min.   :4.721e+10  
##  Afghanistan   : 0   ABW    : 0   1st Qu.:1974   1st Qu.:1.455e+11  
##  Albania       : 0   AFG    : 0   Median :1988   Median :3.301e+11  
##  Algeria       : 0   AGO    : 0   Mean   :1988   Mean   :1.970e+12  
##  American Samoa: 0   ALB    : 0   3rd Qu.:2003   3rd Qu.:1.613e+12  
##  Andorra       : 0   AND    : 0   Max.   :2017   Max.   :1.224e+13  
##  (Other)       : 0   (Other): 0                                     
##      Growth             CPI            Imports          Exports      
##  Min.   :-27.270   Min.   : 26.05   Min.   : 2.133   Min.   : 2.492  
##  1st Qu.:  7.298   1st Qu.: 60.33   1st Qu.: 4.352   1st Qu.: 4.357  
##  Median :  9.131   Median : 81.85   Median :12.376   Median :13.048  
##  Mean   :  8.227   Mean   : 79.00   Mean   :12.625   Mean   :14.117  
##  3rd Qu.: 11.235   3rd Qu.: 98.22   3rd Qu.:18.442   3rd Qu.:20.748  
##  Max.   : 19.300   Max.   :119.09   Max.   :28.444   Max.   :36.035  
##  NA's   :1         NA's   :26                                        
##    Population           gdp_mil        
##  Min.   :6.603e+08   Min.   :   47209  
##  1st Qu.:9.044e+08   1st Qu.:  145522  
##  Median :1.110e+09   Median :  330061  
##  Mean   :1.077e+09   Mean   : 1970486  
##  3rd Qu.:1.286e+09   3rd Qu.: 1612853  
##  Max.   :1.386e+09   Max.   :12237700  
##

chinese_gdp |> autoplot(gdp_mil)

The massive upward trend is rather incredible and speaks to the growth in economic power in China that has occurred over the last half century. The values before 1970 seem a bit too low and makes me wonder if there was accurate measurement of the values at this time.

chinese_gdp |> filter(Year<='1980') |> 
    autoplot(gdp_mil)

By drilling into this time frame it appears that due to the substantial growth these earlier values appear to be less than a million, but are just so much smaller than more recent economic numbers.

chinese_gdp |> ggplot(aes(x=gdp_mil)) +
    geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The data is heavily right skewed from reviewing the value distributions, which on some level makes sense given the massive trend cycle.

Let’s start by testing transformations of the data based on the prior skewed histogram:

gdp_mil <- as.matrix(chinese_gdp$gdp_mil)
colnames(gdp_mil) <- c('Chinese_GDP')
bc_gdp <- caret::preProcess(x=gdp_mil,method='BoxCox')
bc_gdp_lambda <- bc_gdp[2]$bc$Chinese_GDP$lambda

par(mfrow=c(1,2))
hist(gdp_mil^gdp_mil^bc_gdp_lambda,main='Box Cox Chinese GDP')
hist(log(gdp_mil),main='Log Transformed Chinese GDP')

The new distribution isn’t exactly normal, but we will test using it in the ETS modeling to see how it impacts forecast quality. Alternatively, taking the logarithm of the GDP appears to do a better job of partially normalizing the data although it is perhaps not centered on the mean. This distribution would appear to do more for GDP than the Box-Cox.

ch_fits <- chinese_gdp|> 
    model(aan= ETS(gdp_mil ~ error("A") + trend("A") + season("N")),
          aadn= ETS(gdp_mil ~ error("A") + trend("Ad") + season("N")),
          bc_aadn=ETS(gdp_mil^bc_gdp_lambda ~ error('A') + trend('Ad') + season('N')),
          log_aadn_0.8=ETS(log(gdp_mil) ~ error('A') + trend('Ad',phi=0.8) + season('N')),
          log_aadn_opt=ETS(log(gdp_mil) ~ error('A') + trend('Ad') + season('N')),
          bc_madn=ETS(gdp_mil^bc_gdp_lambda ~ error('M') + trend('Ad') + season('N')),
          log_madn=ETS(log(gdp_mil) ~ error('M') + trend('Ad') + season('N')))

ch_fits |> 
    forecast(h=25) |>
    autoplot(chinese_gdp,level=NULL) +
    scale_color_manual(values=c('aadn'='#E69F00','aan'='#56B4E9','bc_aadn'='#009E73',
                      'log_aadn_0.8'='#F0E442','log_aadn_opt'='#0072B2','bc_madn'='#D55E00','log_madn'='#CC79A7')) +
    labs(title='Experimenting with Exponential Smoothing')

It’s rather surprising that the model that has such a high trend has a damping factor included, but that is true of both of the box cox transformed values. This would seem to indicate that the specific power tranformation is not right for predicting the values as it’s forecasting continued large high growth trend/cycle which does not seem sustainable.

Let’s replot the non-box cox forecasts to build further intuition:

ch_fits2 <- chinese_gdp |> 
    model(aan= ETS(gdp_mil ~ error("A") + trend("A") + season("N")),
          aadn= ETS(gdp_mil ~ error("A") + trend("Ad") + season("N")),
          log_aadn_0.8=ETS(log(gdp_mil) ~ error('A') + trend('Ad',phi=0.8) + season('N')),
          log_aadn_opt=ETS(log(gdp_mil) ~ error('A') + trend('Ad') + season('N')),
          log_madn=ETS(log(gdp_mil) ~ error('M') + trend('Ad') + season('N')))

ch_fits2 |> 
    forecast(h=25) |>
    autoplot(chinese_gdp,level=NULL) +
    scale_color_manual(values=c('aadn'='#E69F00','aan'='#56B4E9',
                      'log_aadn_0.8'='#F0E442','log_aadn_opt'='#0072B2','log_madn'='#CC79A7')) +
    labs(title='Experimenting with Exponential Smoothing')

It is rather interesting that some of the log plots even with a damping effect are also continuing to show a very large increase. As a way to test, two different methods were used to determine \(\phi\) with the optimized version and a specified parameter. The specified value for damping is showing more modest growth along with the other non-transformed exponential models. Given that the multiplicative error models showed very high trend cycles it’s likely the error term isn’t growing at such a rate over the time series. The damping factor comparing AAN and AADN is a good visual representation of the effect of more conservative differences that would exist in the forecast.

8.7 Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

aus_production |> autoplot(Gas)

Reviewing the initial autoplot of this time series on gas consumption it is pretty clear the seasonal period has changed quite dramatically from the 1960s - 1970s to the 1990s onward.

gg_season(aus_production)

## Plot variable not specified, automatically selected `y = Beer`

There are clear patterns driven off of weather that require more gas consumption in winter and summer.

aus_production |> ACF(Gas) |> autoplot()

There is extremely high lagged correlation even 24 months out and there are some peaks that reoccur every winter/summer alternation which likely stem from energy requirements for more extreme weather patterns.

Let’s plot the forecasts experimenting with \(\phi\) for dampening and using multiplicative for most of the seasonality components.

gas_fit <- aus_production |> 
    model(aaa = ETS(Gas ~ error("A") + trend("A") + season("A")),
         mam =ETS(Gas ~ error('M') + trend('A') + season('M')),
         madm = ETS(Gas ~ error('M') + trend('Ad') + season('M')))

gas_fit |> 
    forecast(h=12) |>
    autoplot(aus_production |> dplyr::select(Gas) |> filter(year(Quarter)>='2000'),level=NULL) +
    scale_color_manual(values=c('aaa'='#E69F00','madm'='#56B4E9',
                      'mam'='#009E73')) +
    labs(title='Experimenting with Exponential Smoothing on Gas Data')

Including the additive seasonality was purely meant to provide a comparative point and the orange forecast is different despite the fact that most of the forecasts overlap one another for the most part. Given that the season component needs to be multiplicative, there are a few combinations of model parameters that are discouraged by Hyndman due to optimization difficulties. The damping parameter has an impact in terms of the tapering off or extending how high or low the seasonal periods will go.The most recent trend cycle has some positive slope which the damping factor would attempt to balance although the forecast is likely not for a long enough period for this to be apparent.

Let’s review some of the fit metrics to see if there are more noticeable differences between the models although this is purely based on the train set.

knitr::kable(accuracy(gas_fit))

.model	.type	ME	RMSE	MAE	MPE	MAPE	MASE	RMSSE	ACF1
aaa	Training	0.0052496	4.763979	3.345044	-4.6854479	10.864395	0.6000330	0.6282550	0.0772293
mam	Training	-0.1148865	4.595113	3.021727	0.1987614	4.082508	0.5420365	0.6059856	-0.0130629
madm	Training	-0.0043856	4.591840	3.031478	0.3258961	4.104484	0.5437856	0.6055540	-0.0217035

The accuracy statistics across the board show an improvement compared to additive seasonality (as expected) and it appears that the damping parameter may not be as helpful for this model comparing aam and madm. The comparison between those two models would indicate that the error term is likely better estimated as an additive term.

Let’s review the residuals to confirm:

gas_fit %>% dplyr::select(madm) |>gg_tsresiduals()

gas_fit %>% dplyr::select(mam) |>gg_tsresiduals()

The residuals appear to be less variable in the multiplicative error model (especially when considering the different scales!) although there are a few more prominent peaks that occur in the time plot; however, there is quite a bit of autocorrelation that exists and a couple of outliers from the approximately normal distribution. The additive histogram is definitely too spread out although the lagged correlations aren’t as significant. Interestingly the innovation residuals are much larger than the multiplicative model, but perhaps why these additive error models were more accurate in the aggregate is because of the fairly accurate fits at the beginning of the time series. It might be similar to the madm model if the scale were modified to be the same.

8.8 Recall your retail time series data (from Exercise 7 in Section 2.10).

set.seed(12345678)

sample_aus_retail <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

sample_aus_retail |> autoplot(Turnover)

Why is multiplicative seasonality necessary for this series?

The seasonality component is growing at a potentially exponential scale and is therefore not going to be accurately represented by incorporating an additive component. Multiplicative seasonality can better account for this type of change compared to additive methods.

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

ret_fit <- sample_aus_retail |>
model(hw_mult_damped = ETS(Turnover ~ error("M") + trend("Ad") + season("M")),
    hw_mult = ETS(Turnover ~ error("M") + trend("A") + season("M"))
  )
ret_fc <- ret_fit |> forecast(h = 16)
ret_fc |>
  autoplot(sample_aus_retail |> filter(year(Month)>='2012'), level = NULL) +
  labs(title="Exponential Forecasts on Retail Turnover Data",
       y="AUD (millions)") +
  guides(colour = guide_legend(title = "Forecast"))

The biggest difference between the damped forecast and other Holt-Winters’ multiplicative method appears in the low points of the seasonal forecast and the damping factor seems to lower those estimates.

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

knitr::kable(accuracy(ret_fit) |> dplyr::select(c('.model','RMSE','MAE','MASE','RMSSE')))

.model	RMSE	MAE	MASE	RMSSE
hw_mult_damped	0.6155067	0.4444157	0.5073758	0.5311343
hw_mult	0.6130408	0.4497149	0.5134257	0.5290064

The RMSE values for both models are fairly similar to one another and it’s rather hard purely based off of that factor to make a decision. Retail Turnover in general is prone to fairly large swings that are generally lead indicators I would think to economic cycle changes. Therefore, perhaps a dampening factor is not needed given that the longer term trends are somewhat extreme although not as long lasting.

Check that the residuals from the best method look like white noise.

ret_fit |> dplyr::select(hw_mult) |> gg_tsresiduals()

The residuals do not appear to be problematic as they are homoskedastic and appear to have an reasonable split above and below zero across the time series. There is perhaps slightly more autocorrelation than an ideal case, but not so many lagged values are statistically significant. The distribution is reasonably able to approximate a normal bell shape although it may not be centered exactly at zero.

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

ret_fit_train <- sample_aus_retail |> filter(year(Month)<'2011') |> 
model(snaive = SNAIVE(Turnover),
    hw_mult = ETS(Turnover ~ error("M") + trend("A") + season("M"))
  )

ret_fit_fc_train <- ret_fit_train |>
    forecast(h=12*8)

ret_fit_fc_train |> autoplot(sample_aus_retail,level=NULL)

It’s rather hard to compare these two forecasts as the seasonal naive appears to poorly estimate the seasons with less Turnover, while the Holts-Winters’ multiplicative is the opposite for the seasons more prone to turnover.

Let’s review the accuracy statistics to quantitatively evaluate the two methods:

knitr::kable(ret_fit_fc_train %>% 
  accuracy(sample_aus_retail))

.model	State	Industry	.type	ME	RMSE	MAE	MPE	MAPE	MASE	RMSSE	ACF1
hw_mult	Northern Territory	Clothing, footwear and personal accessory retailing	Test	-1.5388814	1.782947	1.597517	-12.292535	12.642079	1.746761	1.468981	0.4951790
snaive	Northern Territory	Clothing, footwear and personal accessory retailing	Test	0.8364583	1.551981	1.240625	5.940129	9.064092	1.356528	1.278687	0.6011751

When reviewing these accuracy measures it still looks as though the seasonal naive method has a lower value across many of these values.

8.9 For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

caret::preProcess(sample_aus_retail |> as.data.frame() |> dplyr::select(Turnover),method='BoxCox')

## Created from 369 samples and 1 variables
## 
## Pre-processing:
##   - Box-Cox transformation (1)
##   - ignored (0)
## 
## Lambda estimates for Box-Cox transformation:
## 0.5

The recommended lambda for this series is 0.5 based on the Caret package.

ret_fit <- sample_aus_retail |> 
    model(bc_stl<- STL(Turnover^0.5))

ret_sa <-ret_fit |>  components() |> dplyr::select(season_adjust)
 
ret_sa |> autoplot(season_adjust) + labs(title='Seasonally Adjusted Retail Turnover in Australia')

ret_sa_fit <- ret_sa |> filter(year(Month)<'2011') |>
    model(sa_aaa=ETS(season_adjust ~ error("A") + trend("A") + season("A")),
          sa_madn=ETS(season_adjust ~ error("M") + trend("Ad") + season("N")),
          sa_aada=ETS(season_adjust ~ error("A") + trend("Ad") + season("A")),
          sa_mada=ETS(season_adjust ~ error("M") + trend("Ad") + season("A")),
          sa_madm=ETS(season_adjust ~ error("M") + trend("Ad") + season("M")))

ret_fc <- ret_sa_fit |> forecast(h=12*8) 

ret_fc |> autoplot(ret_sa,level=NULL)

Checking the accuracy measures for the seasonally adjusted time series:

knitr::kable(ret_fc |> accuracy(ret_sa))

.model	.type	ME	RMSE	MAE	MPE	MAPE	MASE	RMSSE	ACF1
sa_aaa	Test	-0.1561432	0.1812880	0.1615810	-4.339976	4.483439	0.9579625	0.8132754	0.5344650
sa_aada	Test	0.1322558	0.1932272	0.1537924	3.509169	4.143077	0.9117859	0.8668358	0.7966006
sa_mada	Test	0.1346678	0.1951547	0.1555047	3.575845	4.189795	0.9219380	0.8754829	0.7923661
sa_madm	Test	0.1426841	0.2042763	0.1644560	3.792477	4.433454	0.9750073	0.9164032	0.8046337
sa_madn	Test	0.1078741	0.1749964	0.1378687	2.838014	3.717208	0.8173796	0.7850509	0.7947684

The RMSE is substantially lower in all of the models for the test set although from a visual review it does not appear that these forecasts are as closely aligned with the real values compared to the seasonal naive method. The models tested various combinations of error, trend, and season components although not surprisingly the model without any seasonal component (the error term was multiplicative with a damped trend) scored lowest across almost all of the metrics. These results would seem to indicate that removing some of the seasonable component can help to stablilize the forecast particularly if STL can reliably identify the seasonal patterns in a time series.

Let’s review the residuals of this model:

ret_sa_fit |> dplyr::select(sa_madn) |> gg_tsresiduals()

There is limited statistically significant autocorrelation with only a 2 year lag period have any non-zero relationship. The distribution of the residuals is fairly normal although there are possibly a few outliers that exist, but overall it is about as approximately normal as possible and there are equivalent points on both left and right that could be outliers. The innovation residuals also look to be fairly homoskedastic with a reasonable mixture of differences above and below the true values.

Data624_HW5

jfor: Josh Forster

2024-02-23