Hyndman, Chapter 8 Ex.

Question 1

Part a

Using the ETS function in R, an $ETS(A,N,N)$ model was constructed with $\alpha = 0.322$ and $l_0 = 100,646.6$. Forecasts for the next 4 months are summarized and displayed visually in the table and plot below.

## # A tsibble: 4 x 2 [1M]
##      Month  .mean
##      <mth>  <dbl>
## 1 2019 Jan 95187.
## 2 2019 Feb 95187.
## 3 2019 Mar 95187.
## 4 2019 Apr 95187.

Part b

The first month’s 95% forecasted prediction interval, created using the variance of the model’s residuals, is compared to the first month’s 95% prediction interval from the forecast variable (ETS_forecast_hilo) below. There is a negligible difference between the 2 calculations, as expected. The forecast variance for any month’s forecasted prediction interval is calculated using $\sigma_h^2 = \sigma^2[1 + \alpha^2(h-1)]$. Since in the first month $h = 1$, the forecast variance is equivalent to the residual variance.

## [1] "95% lower bound (residual variance):  76854"

## [1] "95% upper bound (residual variance):  113519"

## [1] "95% lower bound (forecast):  76855"

## [1] "95% upper bound (forecast):  113518"

Question 2

The code for a function that returns the forecast for the next observation in the series is below. Of note, the $y$ variable in the function requires that the modeled variable within the time series object be specified (e.g., aus_livestockV$Count in this case). As the printed value of $95,186.56$ demonstrates, the forecast for the next observation in the series is identical to the forecasted values in Question 1a.

#QUESTION 2

ETS_fxn1 <- function(y, alpha, level) {
  
  time_forecast <- length(y)
  
  vector_forecast <- c()
  
  for(j in 1:length(y)) {
    
    vector_forecast[j] <- alpha*(1-alpha)^(j-1)*y[time_forecast-(j-1)]
    
  }
  
  fcast1 <- sum(vector_forecast) + (1-alpha)^time_forecast
  
  return(fcast1)
  
}

print(ETS_fxn1(aus_livestockV$Count, 0.3221247, 100646.6))

## [1] 95186.56

Question 3

The code for a function that returns $\alpha$ and $l_0$ such that the SSE is minimized is displayed below. It returns $\alpha = 0.323$ and $l_0 = 107,959.609$. Although not identical to the values produced with the ETS() function, they are close in scale.

#QUESTION 3

ETS_fxn2 <- function(y, par) {
  
  SSE <- c(par[2]-y[1])
  
  for(i in 2:length(y)) {
    
    time_forecast <- i-1
    
    vector_forecast <- c()
    
    for(j in 1:time_forecast) {
      
      vector_forecast[j] <- 
        par[1]*(1-par[1])^(j-1)*y[time_forecast-(j-1)]
      
    }
    
    y_fitted <- sum(vector_forecast)+(1-par[1])^time_forecast*par[2]
    
    SSE[i] <- (y_fitted - y[i])^2
    
  }
  
  return(sum(SSE))
  
}

result <- optim(par = c(0, mean(aus_livestockV$Count)), 
                fn = ETS_fxn2, 
                y = aus_livestockV$Count)

print(paste("Alpha = ", round(result$par[1], 3)))

## [1] "Alpha =  0.323"

print(paste("Level = ", round(result$par[2], 3)))

## [1] "Level =  107959.609"

Question 4

A function combining the functions from 2 and 3 is shown below. After calculating the optimized values of $\alpha$ and $l_0$, it proceeds to calculating the first forecasted observation returning a value of $95,185.67$. The difference between this forecast and the one produced with the ETS() function is negligible.

#QUESTION 4

ETS_fxn3 <- function(y) {
  
  result <- optim(par = c(0, mean(aus_livestockV$Count)), 
                  fn = ETS_fxn2, 
                  y = aus_livestockV$Count)
  
  return(ETS_fxn1(y, result$par[1], result$par[2]))
  
}

print(ETS_fxn3(aus_livestockV$Count))

## [1] 95185.67

Question 5

Part a

A plot of US export data is below (left). Overall, there is an upward trend in the nominal value of exports with cyclical dips corresponding to times of slowed economic activity; seasonality does not appear to be present in the data.

In an effort to understand the data more clearly, export values were inflation-adjusted to be quoted in 2000 USD. The plot on the right below shows the downward shift in real US exports over time, showing sharp decreases between 1960 and early-1970, the mid-1970’s to the mid-1980’s and from the mid-1990’s to early-2000’s. Aside from these decreases and a sharp increase in the value of real exports from early-1970 through mid-1970, change in real US exports in the remaining years was largely unremarkable.

Part b

Although the value of US exports was explored both in a nominal and real sense, forecasts were conducted on the nominal export data. If a trend in nominal values exists (e.g., inflation potentially), an appropriate ETS model will account for it.

The results of a 6 year forecast with the ETS(A,N,N) model are shown below along with a plot of fitted (blue dashed) vs. actual (black solid) values.

Part c

The RMSE value for the training data is shown below as $0.622$.

## [1] 0.622

Part d

Six year forecasts produced by the ETS(A,N,N) and ETS(A,A,N) models are shown below. In general, both models display some credible merits based on observation alone. The ETS(A,N,N) model largely acknowledges the variability in the data and suggests that nominal exports should be weighted heavily toward prior year exports. With $\alpha$ nearly equivalent to $1$, the ETS(A,N,N) model operates very much like a Naive model.

On the other hand, with $\alpha$ nearly equivalent to $1$ and $\beta$ close to $0$, the ETS(A,A,N) model relies heavily on the value of the prior observation and the prior trend, which may be intuitively credible due to the existence of inflation and its non-negligible impact on the data (see plots from part a).

Finally, the ETS_AAN model gives a marginally better RMSE value suggesting it as the stronger model.

## # A tibble: 2 × 2
##   .model   RMSE
##   <chr>   <dbl>
## 1 ETS_ANN 0.622
## 2 ETS_AAN 0.610

Part e

After comparing the plots in part d, it is evident that the forecasts produced under the two models differ. The ETS(A,N,N) model essentially carries forward the last actual observation whereas the ETS(A,A,N) model carries forward the trend existent at the time of the last observation.

In general, the ETS(A,A,N) model produces a more favorable forecast. While there can be pragmatism in naive forecasting, there isn’t an example across the time series of a flattening in exports after a decline, which is what the ETS(A,N,N) model suggests. Further, a slight upward trend in nominal exports, shown in the ETS(A,A,N) model, would be consistent with inflationary expectations.

Part f

An RMSE-constructed 95% prediction interval for the ETS(A,N,N) model is shown below, along with the R-constructed 95% interval.

## [1] "RMSE 95% Prediction Interval Lower Bound, ETS(A,N,N):  10.672"

## [1] "95% Prediction Interval Lower Bound, ETS(A,N,N):  10.651"

## [1] "RMSE vs. R ratio (95% lower):  1.002"

## [1] "RMSE 95% Prediction Interval Upper Bound, ETS(A,N,N):  13.109"

## [1] "95% Prediction Interval Upper Bound, ETS(A,N,N):  13.131"

## [1] "RMSE vs. R ratio (95% upper):  0.998"

An RMSE-constructed 95% prediction interval for the ETS(A,A,N) model is shown below, along with the R-constructed 95% interval.

## [1] "RMSE 95% Prediction Interval Lower Bound, ETS(A,A,N):  10.818"

## [1] "95% Prediction Interval Lower Bound, ETS(A,A,N):  10.774"

## [1] "RMSE vs. R ratio (95% lower):  1.004"

## [1] "RMSE 95% Prediction Interval Upper Bound, ETS(A,A,N):  13.209"

## [1] "95% Prediction Interval Upper Bound, ETS(A,A,N):  13.252"

## [1] "RMSE vs. R ratio (95% upper):  0.997"

Overall, the prediction intervals constructed with the RMSE are similar to the R-generated prediction intervals for both models. Once again, there are some favorable results generated by the ETS(A,A,N) model with a more restricted range.

Question 6

Forecasts of China’s nominal GDP 24 months from 2017 are shown below for an ETS(M,A,N) and ETS(M,Ad,N) model. Both plots look similar, although there is evident dampening in the second plot. The dampening does not appear to improve the prediction interval range.

RMSE results of the ETS(M,A,N) and ETS(M,Ad,N) models are shown below; the ETS(M,Ad,N) model performs better in this view.

## # A tibble: 2 × 2
##   .model      RMSE
##   <chr>      <dbl>
## 1 ETS_MAN  200056.
## 2 ETS_MAdN 199411.

China’s annual GDP displays an exponential trend; addressing this might improve model results. Therefore, two additional models were constructed as follows:

ETS_boxcox: $\lambda = -0.03$
ETS_lnGDP_M: GDP_M was log-transformed

Plots of forecasts of China’s GDP 24 months from 2017 are shown below for the two specified models. Of particular note, the training data shows a linear trend due to the transformations, and the forecasts display a more restricted prediction interval range. Further, the forecasts continue to follow a linear path, similar to the point forecasts from the ETS(M,A,N) and ETS(M,Ad,N) models.

RMSE statistics are shown below for the Box-Cox and log-transformed models. The ETS_boxcox model gives the lowest value of RMSE between the two.

## # A tibble: 2 × 2
##   .model       RMSE
##   <chr>       <dbl>
## 1 ETS_boxcox 0.0587
## 2 ETS_log    0.0906

Question 7

A plot of quarterly gas production in Australia is shown below. The plot shows an increasing trend, and seasonality that increases in variability over time; the latter observation necessitates the use of multiplicative seasonality in any model.

Forecasts of gas production 8 quarters (2 years) beyond the end of the series are shown below for both models, each specified in the chart’s title. The model’s appear to perform similarly, an observation substantiated by the models’ metrics shown in the tables preceeding the plots. The ETSauto function, ETS(M,A,M), performs only slightly better on nearly all metrics on the training data. In this way, it is likely that dampening the trend does not improve the forecasts.

## # A tibble: 2 × 9
##   .model    sigma2 log_lik   AIC  AICc   BIC   MSE  AMSE    MAE
##   <chr>      <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 ETSauto  0.00324   -831. 1681. 1682. 1711.  21.1  32.2 0.0413
## 2 ETS_MAdM 0.00329   -832. 1684. 1685. 1718.  21.1  32.0 0.0417

## # A tibble: 2 × 10
##   .model   .type          ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <chr>    <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 ETSauto  Training -0.115    4.60  3.02 0.199  4.08 0.542 0.606 -0.0131
## 2 ETS_MAdM Training -0.00439  4.59  3.03 0.326  4.10 0.544 0.606 -0.0217

Question 8

Part a

A plot of monthly retail turnover is below. Evident in the plot is increasing seasonal variability suggesting that employing multiplicative seasonality in any ETS model is necessary.

Part b

Model results for an ETS(M,A,M) and ETS(M,Ad,M) model are below; of note, the ETS(M,A,M) model is equivalent to Holt-Winters’ multiplicative method. Based on the models’ metrics, the ETS(M,A,M) model performs better with slightly favorable results in all metrics other than the MAE.

## # A tibble: 2 × 9
##   .model    sigma2 log_lik   AIC  AICc   BIC   MSE  AMSE    MAE
##   <chr>      <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 ETS_MAM  0.00447   -870. 1775. 1776. 1841. 0.376 0.473 0.0511
## 2 ETS_MAdM 0.00457   -872. 1781. 1783. 1851. 0.379 0.479 0.0505

Part c

A table and plotted comparison of the one-step forecasts generated under both the ETS(M,A,M) and ETS(M,Ad,M) models are below. Once again, the models’ RMSE metrics are nearly identical with the ETS(M,A,M) having slightly more favorable results. Seeing as the plotted results also look nearly identical, the ETS(M,A,M) model remains favorable for this reason.

## # A tibble: 2 × 2
##   .model    RMSE
##   <chr>    <dbl>
## 1 ETS_MAM  0.613
## 2 ETS_MAdM 0.616

Part d

Residual plots for the ETS(M,A,M) model are shown below. Overall, the residuals appear to be white noise based on the top-most plot of the figures below.

The other plots display some skew in the distribution of residuals (bottom right) and some significant autocorrelation among the residuals, particularly at the 8-month and 12-month lags; of note, the correlation at these lags is partially offsetting. A ljung-box test confirms the statistical significance of the auto-correlation among error terms.

## # A tibble: 1 × 3
##   .model  lb_stat lb_pvalue
##   <chr>     <dbl>     <dbl>
## 1 ETS_MAM    34.5 0.0000736

Part e

A comparison of forecasts on retail data from January 2011 through December 2018 is below, along with a table of RMSE comparisons. Although an ETS(M,A,M) model tends to fit the training data more accurately, the opposite is true for the testing data; the seasonal Naive model performs more strongly with an RMSE of $1.55$ vs. $1.78$.

## # A tibble: 4 × 2
##   .model   RMSE
##   <chr>   <dbl>
## 1 ETS_MAM 0.518
## 2 NAIVE   1.21 
## 3 ETS_MAM 1.78 
## 4 NAIVE   1.55

Question 9

Results of a new ETS model on seasonally-adjusted retail turnover data are below, including an RMSE comparison to the prior 2 models. Based on the RMSE metric, the new model has an overall better RMSE than the prior two models.

## # A tibble: 3 × 2
##   .model     RMSE
##   <chr>     <dbl>
## 1 ETS_STLbc  1.44
## 2 ETS_MAM    1.78
## 3 NAIVE      1.55

Question 10

Part a

A plot of total domestic overnight trips in Australia by quarter is shown below. The data displays apparent seasonality; however, the trend of the data changes between pre- and post-2010Q1 from flat to steadily rising.

Part b

Code to obtain the seasonally-adjusted values of an STL decomposition of the Trips data is below, and stored in the variable tourism_summ_STLSA

#QUESTION 10B

#STL Decomposition of tourism trips data
tourism_summ_STL <-
  tourism_summ %>%
  model(STL(Trips)) %>%
  components()

#Plot of STL Decomposition
tourism_summ_STL %>%
  autoplot()

#Variable definition --> seasonally adjusted data
tourism_summ_STLSA <- tourism_summ_STL$season_adjust

Part c

Seasonally-adjusted overnight domestic trips were forecasted over two years (8 quarters) with an ETS(A,Ad,N) model. A plot of forecasted values is shown below.

Part d

Seasonally-adjusted overnight domestic trips were forecasted over two years (8 quarters) with an ETS(A,A,N) model. A plot of forecasted values is shown below.

Part e

A report of a seasonal ETS model fit to the original data is below, along with the code that generated it.

#QUESTION 10E

#Generate seasonal ETS model on original Trips data
tourism_summ_ETS <-
  tourism_summ %>%
  model(ETS_auto = ETS(Trips))

#Print report of trips ETS model
print(report(tourism_summ_ETS))

## Series: Trips 
## Model: ETS(A,A,A) 
##   Smoothing parameters:
##     alpha = 0.4495675 
##     beta  = 0.04450178 
##     gamma = 0.0001000075 
## 
##   Initial states:
##      l[0]      b[0]      s[0]     s[-1]     s[-2]    s[-3]
##  21689.64 -58.46946 -125.8548 -816.3416 -324.5553 1266.752
## 
##   sigma^2:  699901.4
## 
##      AIC     AICc      BIC 
## 1436.829 1439.400 1458.267 
## # A mable: 1 x 1
##       ETS_auto
##        <model>
## 1 <ETS(A,A,A)>

#Forecast 2 years (for question 10g)
tourism_summ_forecast <-
  tourism_summ_ETS %>%
  forecast(h = 8)

Part f

A comparison of RMSE values across the three models produced is below. Based on the RMSE values, either the ETS_AAdN or the ETS_AAN model, fit to seasonally-adjusted data, is the strongest fit with values of $746$ vs. $794$.

## # A tibble: 3 × 2
##   .model    RMSE
##   <chr>    <dbl>
## 1 ETS_AAdN  746.
## 2 ETS_AAN   746.
## 3 ETS_auto  794.

Part g

Plots of the forecasts for the three models are shown below; the model is referenced in the titling of each plot.

Although it was determined in part f that the two seasonally-adjusted models performed better on the RMSE metric, the difference wasn’t strong enough to eliminate the seasonal model, ETS(A,A,A), from consideration. The forecasts strengthen the desirability of the seasonal model considering the inclusion of seasonal shifts in the tourism data, and an ability nearly on par with the seasonally-adjusted models to forecast in-series Trip values.

Part h

The residual plots of the ETS(A,A,A) model are shown below. The residuals show strong normality; however, with slightly upward trending residuals and significant autocorrelation at the 9- and 14-month lags, the true strength of the model is questionable.

Question 11

Part a

A plot of arrivals to Australia from New Zealand is shown below. Generally, the plot displays an upward trend in arrivals with clear seasonality. The seasonality in the data appears to increase in variability suggesting that a multiplicative approach to modeling is appropriate.

Part b

After creating a training set from the Arrivals data, two ETS models were constructed whose 2 year forecasts on the test dataset are shown below. Based on the visual comparison, the ETS(M,A,M) model appears to perform better with more narrow prediction intervals and similar point forecasts. Of note, ETS(M,A,M) is identical to the Holt-Winters’ Multiplicative Method (e.g., additive trend and multiplicative seasonality).

Part c

Incorporating multiplicative seasonality into the model is necessary here due to the increased variability in the seasonality over time. With an upward trend, multiplicative seasonality will create a wider range between peaks and valleys in the data.

Part d

The results of comparing forecasts across four different models are shown below. For reference, the four models include:

ETS(M,A,M) - Auto-generated;
ETS(A,A,A) - Developed on log-transformed Arrival counts;
SNAIVE; and
ETS(A,A,N) - Developed on seasonally-adjusted values of log(Arrivals).

Overall, the seasonal models (first 3 in list) tend to model the data most closely. Based on the model metrics, also shown below, the ETS(M,A,M) model (auto-generated) performed best on the test data.

## # A tibble: 4 × 5
##   .model         RMSE        MAE    MPE  MAPE
##   <chr>         <dbl>      <dbl>  <dbl> <dbl>
## 1 ETS_auto 14913.     11421.     -0.964 3.78 
## 2 ETS_AAA  17765.     12851.     -2.17  4.20 
## 3 NAIVE_S  18051.     17156.      3.44  5.80 
## 4 ETS_STL      0.0652     0.0518 -0.406 0.412

Part e

Based on the RMSE value, the ETS(M,A,M) model appears to give the best forecasts for the seasonal data.

Residual plots for the ETS(M,A,M) model are below. The residuals show no significant autocorrelation (bottom-left), and a reasonable distribution with largely constant variance; it is apparent that the absolute values of residuals are smaller toward the later periods in the data. Despite this and some right skew in residual distribution, the model appears to perform pretty strongly on the residual tests.

Part f

A table of cross-validated model results is below; overall, the conclusions from parts d and e generally hold with the auto-generated ETS seasonal model fitting the seasonal data best. That said, the ETS model on the seasonally-adjusted, log-transformed data remains low; it is difficult to know how to appropriately compare the values considering the difference in scales.

## # A tibble: 4 × 11
##   .model Origin .type       ME    RMSE     MAE     MPE  MAPE  MASE RMSSE    ACF1
##   <chr>  <chr>  <chr>    <dbl>   <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 ETS_a… NZ     Test   1.45e+3 1.35e+4 1.07e+4  0.548  6.08  0.726 0.711 -0.123 
## 2 ETS_A… NZ     Test  -2.44e+3 1.39e+4 1.09e+4 -1.33   6.29  0.744 0.731 -0.0123
## 3 NAIVE… NZ     Test   7.51e+3 1.96e+4 1.50e+4  3.24   8.37  1.02  1.03   0.559 
## 4 ETS_S… NZ     Test   8.52e-3 7.41e-2 5.42e-2  0.0683 0.450 0.509 0.532 -0.0469

Question 12

Part a

The code associated with applying cross-validation techniques to produce 1-year ahead ETS and seasonal naive forecasts for Portland cement production is below.

#QUESTION 12A

#Portland Cement Production dataset
aus_cement <-
  aus_production %>%
  select(Quarter,
         Cement)

#CROSS-VALIDATION Dataset
aus_cement_tr <-
  aus_cement %>%
  stretch_tsibble(.init = 5, .step = 1) %>%
  relocate(Quarter, .id)

#ETS Model Development
aus_cement_tr_ETS <-
  aus_cement_tr %>%
  model(ETSauto = ETS(Cement)) %>%
  forecast(h = 4) %>%
  group_by (.id) %>%
  mutate(h = row_number()) %>%
  ungroup()

aus_cement_tr_SNAIVE <-
  aus_cement_tr %>%
  model(NAIVE_S = SNAIVE(Cement)) %>%
  forecast(h = 4) %>%
  group_by (.id) %>%
  mutate(h = row_number()) %>%
  ungroup()

Part b

A table and plot of RMSE results for both the ETS and SNAIVE models are shown below. For each forecasted period, the ETS model performs better than the SNAIVE model; however, the RMSE values associated with the ETS model appear to have a linear trend stronger than that of the SNAIVE model suggesting the seasonal naive model to be more appropriate for longer forecasts.

Overall, this outcome is largely expected; the ETS model relies on the availability of prior error, trend, and seasonality, all things that are likely to continue predictably over a short period of time. However, as time continues, it becomes more appropriate to simply rely on the last observed value as a forecast.

## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
## 4 observations are missing between 2010 Q3 and 2011 Q2

## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
## 4 observations are missing between 2010 Q3 and 2011 Q2

## # A tibble: 8 × 3
##       h .model   RMSE
##   <int> <chr>   <dbl>
## 1     1 ETSauto  82.0
## 2     2 ETSauto  98.8
## 3     3 ETSauto 116. 
## 4     4 ETSauto 133. 
## 5     1 NAIVE_S 134. 
## 6     2 NAIVE_S 134. 
## 7     3 NAIVE_S 135. 
## 8     4 NAIVE_S 135.

## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
## 4 observations are missing between 2010 Q3 and 2011 Q2

## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
## 4 observations are missing between 2010 Q3 and 2011 Q2

Question 13

For each time series analyzed in question 13, the same approach was taken. This reality is particularly important when thinking about STL decomposition models, for which the following was considered:

If a lambda value was calculated to be less than $0.5$, the response variable was log-transformed; otherwise, it was kept as is.
The seasonal window for STL decomposition was set to be infinite.
An ETS model was built on the seasonally-adjusted data, and a seasonal naive model was employed for the seasonal observations (season_year variable).

Due to the above steps, it is possible that a more appropriate decomposed model could be constructed; however, in the interest of comparing the 3 modeling types, research into this observation was not conducted.

Beer Production

A plot of beer production along with a 3-year forecast across three different modeling approaches are shown below. The table with model metrics suggests that the auto-generated ETS model is the most appropriate for the data based on the RMSE value.

## # A tibble: 3 × 5
##   .model    RMSE   MAE      MPE  MAPE
##   <chr>    <dbl> <dbl>    <dbl> <dbl>
## 1 ETSauto   9.62  8.92  0.00998  2.13
## 2 mod_dcmp 15.5  14.3  -1.20     3.34
## 3 NAIVE_S  10.8   9.75 -0.651    2.34

Bricks Production

A plot of bricks production along with a 3-year forecast across three different modeling approaches are shown below. The table with model metrics suggests that the auto-generated ETS model is the most appropriate for the data based on the RMSE value.

## # A tibble: 3 × 5
##   .model    RMSE   MAE   MPE  MAPE
##   <chr>    <dbl> <dbl> <dbl> <dbl>
## 1 ETSauto   17.5  13.2 0.474  3.31
## 2 mod_dcmp  20.1  16.0 3.13   3.89
## 3 NAIVE_S   36.5  32.6 7.85   7.85

Diabetes

A plot of diabetes cost subsidies along with a 3-year forecast across three different modeling approaches are shown below. The table with model metrics suggests that the auto-generated ETS model is the most appropriate for the data based on the RMSE value.

## # A tibble: 3 × 5
##   .model       RMSE      MAE    MPE  MAPE
##   <chr>       <dbl>    <dbl>  <dbl> <dbl>
## 1 ETSauto  1975114. 1580130.  0.579  7.22
## 2 mod_dcmp 2008990. 1698431.  0.281  7.64
## 3 NAIVE_S  3901300. 3362144. 12.2   14.7

Corticosteroids

A plot of corticosteroids cost subsidies along with a 3-year forecast across three different modeling approaches are shown below. The table with model metrics suggests that the auto-generated ETS model is the most appropriate for the data based on the RMSE value.

## # A tibble: 3 × 5
##   .model      RMSE    MAE   MPE  MAPE
##   <chr>      <dbl>  <dbl> <dbl> <dbl>
## 1 ETSauto   92803. 72655.  5.61  8.30
## 2 mod_dcmp  97955. 87481.  7.15  9.33
## 3 NAIVE_S  108094. 84943.  5.70  9.69

Food Retailing Turnover

A plot of food retail turnover along with a 3-year forecast across three different modeling approaches are shown below. The table with model metrics suggests that the auto-generated ETS model is the most appropriate for the data based on the RMSE value; however, of the 5 time series, this one has the most closely aligned results between the auto-generated ETS model and the STL decomposed model.

## # A tibble: 3 × 5
##   .model    RMSE   MAE     MPE  MAPE
##   <chr>    <dbl> <dbl>   <dbl> <dbl>
## 1 ETSauto   17.3  13.5  0.0722 0.985
## 2 mod_dcmp  17.4  14.4 -0.747  1.08 
## 3 NAIVE_S   52.3  49.2  3.59   3.59

Question 14

Part a, Tourism Models

Tourism data was summarized for the whole of Australia and then split into training and test data sets; the test data set covered a period of eight quarters for later comparison of models’ 2-year forecasts.

Two models were constructed:

ETS(M,N,M) - Auto-generated by R; and
ETS(M,Ad,M) - User-defined based on recent upward trend in Trips (figure 1 below)

Two-year forecast results for each of the models are below, along with metric performance on the forecast (e.g., RMSE) and residual plots. Overall, the ETS(M,Ad,M) performed best on the forecasts, suggesting that accounting for the recent upward trend in trip numbers was appropriate. Further, it was appropriate to keep to a multiplicative model given what appears to be increasingly variable seasonality (in an additive sense).

Residual plots provide a mixed, but overall good picture. Despite strong normality and mostly white noise in the residuals, concerns remain in the strong residual autocorrelation both models show at the 14 quarter lag, and the seeming upward trend in residual values, especially in later periods of the data.

## # A tibble: 2 × 5
##   .model    RMSE   MAE   MPE  MAPE
##   <chr>    <dbl> <dbl> <dbl> <dbl>
## 1 ETS_MAdM 1109.  884.  2.65  3.33
## 2 ETSauto  1721. 1395.  5.05  5.22

Part a, Closing Prices

Closing prices for Apple, Google, Amazon, and Facebook were averaged by month and by stock and then split into training and test datasets. ETS models were auto-generated for each of the stocks based on the training data set, and 3-month forecasts were generated and then compared to the test data. Results of the forecasts for each stocks are shown below. Overall, each over-predicted the next three-month of prices; residual plots showed mixed results, particularly in autocorrelated residuals.

## [1] "Model results for  AAPL"

## # A tibble: 1 × 5
##   .model   RMSE   MAE   MPE  MAPE
##   <chr>   <dbl> <dbl> <dbl> <dbl>
## 1 ETSauto  42.8  34.7 -19.9  19.9
## [1] "Model results for  AMZN"

## # A tibble: 1 × 5
##   .model   RMSE   MAE   MPE  MAPE
##   <chr>   <dbl> <dbl> <dbl> <dbl>
## 1 ETSauto  456.  433. -26.7  26.7
## [1] "Model results for  FB"

## # A tibble: 1 × 5
##   .model   RMSE   MAE   MPE  MAPE
##   <chr>   <dbl> <dbl> <dbl> <dbl>
## 1 ETSauto  26.1  24.5 -17.3  17.3
## [1] "Model results for  GOOG"

## # A tibble: 1 × 5
##   .model   RMSE   MAE   MPE  MAPE
##   <chr>   <dbl> <dbl> <dbl> <dbl>
## 1 ETSauto  132.  126. -11.9  11.9

Part a, Lynx Series

Lynx data from the pelt data set was separated into training and test data sets. An ETS model was auto-generated on the training data, and then forecast 4 years for comparison to the test dataset. A plot of the full dataset (training + test) is below along with forecast results and metrics. Overall, the model did not appear to perform well with an RMSE of $19,515$. Further, although the data appears to show seasonality, there is not a consistent variability, nor a consistent additive nature to it. Therefore, seasonality could not be used in the ETS modeling.

## # A tibble: 1 × 5
##   .model    RMSE    MAE   MPE  MAPE
##   <chr>    <dbl>  <dbl> <dbl> <dbl>
## 1 ETSauto 19515. 18173.  65.8  65.8

Part b

ETS modeling did not work well on the lynx data from the pelt dataset. There is seasonality in the data; however, due to its inconsistency, it couldn’t be properly incorporated into the ETS modeling.

R Code

knitr::opts_chunk$set(echo = TRUE)

#INVOKE APPROPRIATE LIBRARIES

library("feasts")
library("seasonal")
library("tsibble")
library("tsibbledata")
library("dplyr")
library("ggplot2")
library("forecast")
library("fable")
library("fpp3")
library("sqldf")


#QUESTION 1A

#Filter data to Victoria and Pigs
aus_livestockV <-
  aus_livestock %>%
    filter(State == "Victoria") %>%
    filter(Animal == "Pigs")

#Build a simple exponential smoothing function from the data
ETS_livestockV <-
  aus_livestockV %>%
  model(ETS = ETS(Count~error("A") + trend("N") + season("N")))

#Forecast 4 months
ETS_forecast <- 
  ETS_livestockV %>%
  forecast(h = "4 months")

#Plot 4 month forecast against the existing data
ETS_forecast %>%
  autoplot(aus_livestockV) + 
  labs(x = "Month",
       y = "Slaughter Count",
       title = "Monthly Slaughter w/ Forecast",
       subtitle = "Victoria, Pigs")

#Add R generated prediction intervals from the 
ETS_forecast_hilo <-
  ETS_forecast %>%
  hilo()

#Print summary of point forecasts
print(ETS_forecast_hilo %>%
        select(Month, .mean))


#QUESTION 1B

ETS_livestockV_sdRes <- sqrt(87480760) #Variance of residuals

print(paste("95% lower bound (residual variance): ", 
            round(ETS_forecast_hilo$.mean[1] - 1.96*ETS_livestockV_sdRes, 0)))
print(paste("95% upper bound (residual variance): ", 
            round(ETS_forecast_hilo$.mean[1] + 1.96*ETS_livestockV_sdRes, 0)))

print(paste("95% lower bound (forecast): ",
            round(unpack_hilo(ETS_forecast_hilo, "95%")$`95%_lower`[1], 0)))
print(paste("95% upper bound (forecast): ",
            round(unpack_hilo(ETS_forecast_hilo, "95%")$`95%_upper`[1], 0)))


#QUESTION 2

ETS_fxn1 <- function(y, alpha, level) {
  
  time_forecast <- length(y)
  
  vector_forecast <- c()
  
  for(j in 1:length(y)) {
    
    vector_forecast[j] <- alpha*(1-alpha)^(j-1)*y[time_forecast-(j-1)]
    
  }
  
  fcast1 <- sum(vector_forecast) + (1-alpha)^time_forecast
  
  return(fcast1)
  
}

print(ETS_fxn1(aus_livestockV$Count, 0.3221247, 100646.6))


#QUESTION 3

ETS_fxn2 <- function(y, par) {
  
  SSE <- c(par[2]-y[1])
  
  for(i in 2:length(y)) {
    
    time_forecast <- i-1
    
    vector_forecast <- c()
    
    for(j in 1:time_forecast) {
      
      vector_forecast[j] <- 
        par[1]*(1-par[1])^(j-1)*y[time_forecast-(j-1)]
      
    }
    
    y_fitted <- sum(vector_forecast)+(1-par[1])^time_forecast*par[2]
    
    SSE[i] <- (y_fitted - y[i])^2
    
  }
  
  return(sum(SSE))
  
}

result <- optim(par = c(0, mean(aus_livestockV$Count)), 
                fn = ETS_fxn2, 
                y = aus_livestockV$Count)

print(paste("Alpha = ", round(result$par[1], 3)))

print(paste("Level = ", round(result$par[2], 3)))


#QUESTION 4

ETS_fxn3 <- function(y) {
  
  result <- optim(par = c(0, mean(aus_livestockV$Count)), 
                  fn = ETS_fxn2, 
                  y = aus_livestockV$Count)
  
  return(ETS_fxn1(y, result$par[1], result$par[2]))
  
}

print(ETS_fxn3(aus_livestockV$Count))


#QUESTION 5A

#US Economy; select appropriate columns
US_economy <-
  global_economy %>%
  filter(Country == "United States") %>%
  select(Country,
       Code,
       Year,
       Exports,
       CPI)

#Lag US Exports variable to address null values
US_economy <-
  US_economy %>%
  mutate(Exports1L = lag(Exports, n = 1L)) %>%
  mutate(CPI_Ref = US_economy$CPI[Year == 2000])

#Null value set equal to lagged value of Exports (Naive approach)
for(i in 1:length(rownames(US_economy))) {
  
  if(is.na(US_economy$Exports[i])) {
    
    US_economy$Exports[i] = US_economy$Exports1L[i]
    
  } else {
    
    next
    
  }
  
}

#Finalize US_economy dataset
US_economy <-
  US_economy %>%
  mutate(RealExports = Exports*CPI_Ref/CPI) %>%
  select(Country, 
       Code,
       Year,
       Exports,
       RealExports)

US_economy %>%
  autoplot(Exports) +
  labs(x = "Year",
       y = "Exports",
       title = "Monthly Exports",
       subtitle = "United States")

US_economy %>%
  autoplot(RealExports) +
  labs(x = "Year",
       y = "Real Exports",
       title = "Monthly Real Exports",
       subtitle = "United States (2000 USD)")


#QUESTION 5B

US_economy_ETS <-
  US_economy %>%
  model(ETS_ANN = ETS(Exports ~ error("A") + trend("N") + season("N")),
        ETS_AAN = ETS(Exports ~ error("A") + trend("A") + season("N")))

#Select the ANN model from the ETS variable
US_economy_ANNf <-
  US_economy_ETS %>%
  select(ETS_ANN)

#Forecast another 6 years worth of Exports
US_economy_fcast <-
  US_economy_ETS %>%
  forecast(h = 6)

US_economy_fcast %>%
  filter(.model == "ETS_ANN") %>%
  autoplot(US_economy) +
  geom_line(data = fitted(US_economy_ANNf),
            aes(x = Year,
                y = .fitted),
            col = "blue",
            lty = "dashed") + 
  labs(x = "Year",
       y = "Nominal Export Value",
       title = "Annual Exports",
       subtitle = "US - ETS(A,N,N)")


#QUESTION 5C

#Training RMSE for ETS(A,N,N)
print(round(accuracy(US_economy_ANNf)$RMSE, 3))


#QUESTION 5D

#Repeat plot from 5c
US_economy_fcast %>%
  filter(.model == "ETS_ANN") %>%
  autoplot(US_economy) +
  geom_line(data = fitted(US_economy_ANNf),
            aes(x = Year,
                y = .fitted),
            col = "blue",
            lty = "dashed") + 
  labs(x = "Year",
       y = "Nominal Export Value",
       title = "Annual Exports",
       subtitle = "US - ETS(A,N,N)")

#Segregate ETS(A,A,N) model
US_economy_AANf <-
  US_economy_ETS %>%
  select(ETS_AAN)

#Generate comparison plot with ETS(A,A,N) model
US_economy_fcast %>%
  filter(.model == "ETS_AAN") %>%
  autoplot(US_economy) +
  geom_line(data = fitted(US_economy_AANf),
            aes(x = Year,
                y = .fitted),
            col = "blue",
            lty = "dashed") + 
  labs(x = "Year",
       y = "Nominal Export Value",
       title = "Annual Exports",
       subtitle = "US - ETS(A,A,N)")

#RMSE comparisons between ANN and AAN models
RMSE_compare <- rbind(US_economy_ANNf %>% accuracy() %>% select(.model,
                                                                  RMSE),
                      US_economy_AANf %>% accuracy() %>% select(.model,
                                                                  RMSE))

print(RMSE_compare)


#QUESTION 5F

US_economy_fcast_hilo <-
  US_economy_fcast %>%
  hilo()


#ETS(A,N,N) model forecasts
US_economy_fcast_ANN <-
  US_economy_fcast %>%
  filter(.model == "ETS_ANN")

#ETS(A,N,N) model forecast intervals
US_economy_ANN_hilo <-
  US_economy_fcast_hilo %>%
  filter(.model == "ETS_ANN")

#95% lower RMSE
print(paste("RMSE 95% Prediction Interval Lower Bound, ETS(A,N,N): ",
            round(US_economy_fcast_ANN$.mean[1] - 1.96 * RMSE_compare %>%
                    filter(.model == "ETS_ANN") %>%
                    select(RMSE), 3)))

#95% lower R-generated
print(paste("95% Prediction Interval Lower Bound, ETS(A,N,N): ",
            round(unpack_hilo(US_economy_ANN_hilo, "95%")$`95%_lower`[1], 3)))

print(paste("RMSE vs. R ratio (95% lower): ", 
      round((US_economy_fcast_ANN$.mean[1] - 1.96 * RMSE_compare %>%
         filter(.model == "ETS_ANN") %>%
         select(RMSE))/unpack_hilo(US_economy_ANN_hilo, "95%")$`95%_lower`[1], 3)
))

#95% upper RMSE
print(paste("RMSE 95% Prediction Interval Upper Bound, ETS(A,N,N): ",
            round(US_economy_fcast_ANN$.mean[1] + 1.96 * RMSE_compare %>%
                    filter(.model == "ETS_ANN") %>%
                    select(RMSE), 3)))

#95% upper R-generated
print(paste("95% Prediction Interval Upper Bound, ETS(A,N,N): ",
            round(unpack_hilo(US_economy_ANN_hilo, "95%")$`95%_upper`[1], 3)))

print(paste("RMSE vs. R ratio (95% upper): ",
      round((US_economy_fcast_ANN$.mean[1] + 1.96 * RMSE_compare %>%
         filter(.model == "ETS_ANN") %>%
         select(RMSE))/unpack_hilo(US_economy_ANN_hilo, "95%")$`95%_upper`[1], 3)
))


#ETS(A,A,N) model forecasts
US_economy_fcast_AAN <-
  US_economy_fcast %>%
  filter(.model == "ETS_AAN")

#ETS(A,A,N) model forecast prediction intervals
US_economy_AAN_hilo <-
  US_economy_fcast_hilo %>%
  filter(.model == "ETS_AAN")

#95% lower RMSE
print(paste("RMSE 95% Prediction Interval Lower Bound, ETS(A,A,N): ",
            round(US_economy_fcast_AAN$.mean[1] - 1.96 * RMSE_compare %>%
                    filter(.model == "ETS_AAN") %>%
                    select(RMSE), 3)))

#95% lower R-generated
print(paste("95% Prediction Interval Lower Bound, ETS(A,A,N): ",
            round(unpack_hilo(US_economy_AAN_hilo, "95%")$`95%_lower`[1], 3)))

print(paste("RMSE vs. R ratio (95% lower): ", 
      round((US_economy_fcast_AAN$.mean[1] - 1.96 * RMSE_compare %>%
         filter(.model == "ETS_AAN") %>%
         select(RMSE))/unpack_hilo(US_economy_AAN_hilo, "95%")$`95%_lower`[1], 3)
))

#95% upper RMSE
print(paste("RMSE 95% Prediction Interval Upper Bound, ETS(A,A,N): ",
            round(US_economy_fcast_AAN$.mean[1] + 1.96 * RMSE_compare %>%
                    filter(.model == "ETS_AAN") %>%
                    select(RMSE), 3)))

#95% upper R-generated
print(paste("95% Prediction Interval Upper Bound, ETS(A,A,N): ",
            round(unpack_hilo(US_economy_AAN_hilo, "95%")$`95%_upper`[1], 3)))

print(paste("RMSE vs. R ratio (95% upper): ", 
      round((US_economy_fcast_AAN$.mean[1] + 1.96 * RMSE_compare %>%
         filter(.model == "ETS_AAN") %>%
         select(RMSE))/unpack_hilo(US_economy_AAN_hilo, "95%")$`95%_upper`[1], 3)
))


#QUESTION 6

#Pull China's economic data
china_economy <- 
  global_economy %>%
  filter(Country == "China") %>%
  mutate(GDP_M = GDP/1000000) %>% #transform GDP to be quoted in millions
  select(Country,
         Code,
         Year,
         GDP_M)

#Plot of China's annual GDP
china_economy %>%
  autoplot(GDP_M) +
  labs(x = "Year",
       y = "GDP (Nominal)",
       title = "Annual GDP (in millions)",
       subtitle = "China")

#Auto ETS - creates ETS(M,A,N)
china_ETS1 <-
  china_economy %>%
  model(autoETS = ETS(GDP_M))

china_ETS1_fcast <-
  china_ETS1 %>%
  forecast(h = 24)

china_ETS1_fcast %>%
  autoplot(china_economy) +
  labs(x = "Year",
       y = "GDP (Nominal)",
       title = "Annual GDP (in millions)",
       subtitle = "China, ETS(M,A,N)")

#Dampened ETS - creates ETS(M,Ad,N)
china_ETS2 <-
  china_economy %>%
  model(ETS_MAN = ETS(GDP_M),
        ETS_MAdN = ETS(GDP_M ~ error("M") + trend("Ad") + season("N")))

china_ETS2_fcast <-
  china_ETS2 %>%
  forecast(h = 24)

china_ETS2_fcast %>%
  filter(.model == "ETS_MAdN") %>%
  autoplot(china_economy) +
  labs(x = "Year",
       y = "GDP (Nominal)",
       title = "Annual GDP (in millions)",
       subtitle = "China, ETS(M,Ad,N)")


print(rbind(china_ETS2 %>% 
              select(ETS_MAN) %>% 
              accuracy() %>% 
              select(.model, RMSE),
            china_ETS2 %>%
              select(ETS_MAdN) %>%
              accuracy() %>%
              select(.model, RMSE)))


#Add Box-Cox Transformation
lambda_china <-
  china_economy %>%
  features(GDP_M, features = guerrero) %>%
  pull(lambda_guerrero)

china_economy <- 
  china_economy %>%
  mutate(GDP_M_bc = box_cox(GDP_M, lambda_china))

china_economy %>%
  autoplot(GDP_M_bc) +
  labs(x = "Year",
       y = "GDP (Transformed)",
       title = "Annual GDP (millions)",
       subtitle = "China, Box-Cox (lambda = -0.03)")

china_ETSbc <-
  china_economy %>%
  model(ETS_boxcox = ETS(GDP_M_bc))

china_ETSbc_fcast <-
  china_ETSbc %>%
  forecast(h = 24)

china_ETSbc_fcast %>%
  autoplot(china_economy) +
  labs(x = "Year",
       y = "GDP (Transformed)",
       title = "Annual GDP (millions)",
       subtitle = "China, Box-Cox (lambda = -0.03)")

#Add logarithmic transformation
china_economy <-
  china_economy %>%
  mutate(lnGDP_M = log(GDP_M))

china_economy %>%
  autoplot(lnGDP_M) + 
  labs(x = "Year",
       y = "ln(GDP)",
       title = "Annual GDP (millions)",
       subtitle = "China, log-transformed")

china_ETSlog <-
  china_economy %>%
  model(ETS_log = ETS(lnGDP_M))

china_ETSlog_fcast <-
  china_ETSlog %>%
  forecast(h = 24)

china_ETSlog_fcast %>%
  autoplot(china_economy) + 
  labs(x = "Year",
       y = "ln(GDP)",
       title = "Annual GDP (millions)",
       subtitle = "China, log-transformed")


print(rbind(china_ETSbc %>%
              accuracy() %>% 
              select(.model, RMSE),
            china_ETSlog %>%
              accuracy() %>%
              select(.model, RMSE)))


#QUESTION 7

#Create aus_gas dataset
aus_gas <- 
  aus_production %>%
  select(Quarter,
         Gas)

#plot of quarterly gas production
aus_gas %>%
  autoplot(Gas) + 
  labs(x = "Quarter",
       y = "Gas Production",
       title = "Quarterly Gas Production",
       subtitle = "Australia")


#ETS model creation of quarterly gas production
aus_gas_ETS <-
  aus_gas %>%
  model(ETSauto = ETS(Gas),
        ETS_MAdM = ETS(Gas ~ error("M") + trend("Ad") + season("M")))

#forecast for 8 quarters (2 years)
aus_gas_ETSfcast <-
  aus_gas_ETS %>%
  forecast(h = 8)

#Plot of auto generated ETS forecasts
aus_gas_ETSfcast %>%
  filter(.model == "ETSauto") %>%
  autoplot(aus_gas) + 
  labs(x = "Quarter",
       y = "Gas Production",
       title = "Quarterly Gas Production",
       subtitle = "Australia, ETS(M,A,M)")

#Plot of trend-damped ETS forecasts
aus_gas_ETSfcast %>%
  filter(.model == "ETS_MAdM") %>%
  autoplot(aus_gas) + 
  labs(x = "Quarter",
       y = "Gas Production",
       title = "Quarterly Gas Production",
       subtitle = "Australia, ETS(M,Ad,M)")

#Model comparisons
print(glance(aus_gas_ETS))

rbind(aus_gas_ETS %>% select(ETSauto) %>% accuracy(),
      aus_gas_ETS %>% select(ETS_MAdM) %>% accuracy())


#QUESTION 8A

#aus_retail_series definition
set.seed(12345678)
aus_retail_series <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

aus_retail_series %>%
  autoplot(Turnover) +
  labs(x = "Month",
       y = "Turnover",
       title = "Monthly Retail Turnover")


#QUESTION 8B

aus_retail_ETS <-
  aus_retail_series %>%
  model(ETS_MAM = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        ETS_MAdM = ETS(Turnover ~ error("M") + trend("Ad") + season("M")))

print(glance(aus_retail_ETS) %>%
        select(.model,
               sigma2:MAE))


#QUESTION 8C

aus_retail_MAM <-
  aus_retail_ETS %>%
  select(ETS_MAM)

aus_retail_MAdM <-
  aus_retail_ETS %>%
  select(ETS_MAdM)

print(rbind(
  aus_retail_MAM %>% accuracy() %>% select(.model, RMSE),
  aus_retail_MAdM %>% accuracy() %>% select(.model, RMSE)
))

aus_retail_series %>%
  autoplot(Turnover) +
  labs(x = "Month",
       y = "Turnover",
       title = "Monthly Retail Turnover") +
  geom_line(data = fitted(aus_retail_MAM),
            aes(x = Month,
                y = .fitted),
            lty = "dotted",
            col = "light blue") +
  geom_line(data = fitted(aus_retail_MAdM),
            aes(x = Month,
                y = .fitted),
            lty = "dotted",
            col = "red") +
  theme_bw()


#QUESTION 8D

aus_retail_MAM %>%
  gg_tsresiduals() +
  labs(title = "Residual Plots",
       subtitle = "ETS(M,A,M)")

augment(aus_retail_MAM) %>%
  features(.innov, ljung_box, lag = 24, dof = 15)


#QUESTION 8E

aus_retail_training <-
  aus_retail_series %>%
  filter(Month < yearmonth("2011-01-01"))

test_start <- length(rownames(aus_retail_training))+1

aus_retail_testing <-
  aus_retail_series[test_start:length(rownames(aus_retail_series)),]

aus_training_models <-
  aus_retail_training %>%
  model(ETS_MAM = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        NAIVE = SNAIVE(Turnover))

aus_testing_forecast <-
  aus_training_models %>%
  forecast(h = length(rownames(aus_retail_testing)))

aus_testing_forecast %>%
  filter(.model == "ETS_MAM") %>%
  autoplot(aus_retail_testing) +
  labs(x = "Month",
       y = "Turnover",
       title = "Monthly Retail Turnover",
       subtitle = "Actual vs. Forecast, ETS(M,A,M)")

aus_testing_forecast %>%
  filter(.model == "NAIVE") %>%
  autoplot(aus_retail_testing) +
  labs(x = "Month",
       y = "Turnover",
       title = "Monthly Retail Turnover",
       subtitle = "Actual vs. Forecast, NAIVE")

print(rbind(
  aus_training_models %>% select(ETS_MAM) %>% 
    accuracy() %>% select(.model, RMSE),
  aus_training_models %>% select(NAIVE) %>%
    accuracy() %>% select(.model, RMSE),
  aus_testing_forecast %>% filter(.model == "ETS_MAM") %>%
    accuracy(aus_retail_testing) %>% select(.model, RMSE),
  aus_testing_forecast %>% filter(.model == "NAIVE") %>%
    accuracy(aus_retail_testing) %>% select(.model, RMSE)
))


#QUESTION 9

#Determine lambda for Box-Cox Transformation
lambda_retail <-
  aus_retail_series %>%
    features(Turnover, features = guerrero) %>%
    pull(lambda_guerrero)

#Create new Box-Cox transformed Turnover variable (new tsibble)
aus_retail_seriesbc <- 
  aus_retail_series %>%
  mutate(Turnover = box_cox(Turnover, lambda_retail))

#STL Decomposition
aus_retail_STLbc <-
  aus_retail_seriesbc %>%
  model(STL = STL(Turnover)) %>%
  components()

#Add seasonally adjusted variable to new tsibble (inverse box-cox)
aus_retail_seriesbc <-
  aus_retail_seriesbc %>%
  mutate(Turnover_SA = inv_box_cox(aus_retail_STLbc$season_adjust, lambda_retail))

#Create training dataset (pre-2011)
aus_STLbc_training <-
  aus_retail_seriesbc %>%
  filter(Month < yearmonth('2011-01-01')) %>%
  select(State,
         Industry,
         `Series ID`,
         Month,
         Turnover_SA)

aus_STLbc_training_plot <-
  aus_STLbc_training %>%
  select(Month,
         Turnover_SA)

#Create testing dataset (post-2010)
test_start <- length(rownames(aus_STLbc_training)) + 1

aus_STLbc_testing <-
  aus_retail_seriesbc[test_start:length(rownames(aus_retail_STLbc)),] %>%
  select(Month,
         Turnover_SA)

#ETS model on seasonally adjusted turnover (adjusted to real value)
aus_STLbc_ETS <-
  aus_STLbc_training %>%
  model(ETS_STLbc = ETS(Turnover_SA))

#Plot forecast against test data
aus_STLbc_forecast <-
  aus_STLbc_ETS %>%
  select(ETS_STLbc) %>%
  forecast(h = length(rownames(aus_STLbc_testing)))

aus_STLbc_forecast %>%
  autoplot(aus_STLbc_testing) +
  labs(x = "Month",
       y = "Turnover",
       title = "Monthly Seasonally Adjusted Turnover",
       subtitle = "ETS(A,Ad,N)")

print(rbind(
  aus_STLbc_forecast %>% accuracy(aus_STLbc_testing) %>% select (.model, RMSE),
  aus_testing_forecast %>% filter(.model == "ETS_MAM") %>%
    accuracy(aus_retail_testing) %>% select(.model, RMSE),
  aus_testing_forecast %>% filter(.model == "NAIVE") %>%
    accuracy(aus_retail_testing) %>% select(.model, RMSE)
))


#QUESTION 10A

#Create summary of tourism data
tourism_summ <- 
  tourism %>%
    select(Quarter, Region, State, Purpose, Trips) %>%
    summarise(Trips = sum(Trips))

#Plot tourism summary data
tourism_summ %>%
  autoplot(Trips)


#QUESTION 10B

#STL Decomposition of tourism trips data
tourism_summ_STL <-
  tourism_summ %>%
  model(STL(Trips)) %>%
  components()

#Plot of STL Decomposition
tourism_summ_STL %>%
  autoplot()

#Variable definition --> seasonally adjusted data
tourism_summ_STLSA <- tourism_summ_STL$season_adjust


#QUESTION 10C

#Create new tourism summary tsibble with seasonally adjusted trips values
tourism_summ_SA <-
  tourism_summ %>%
  mutate(Trips_SA = tourism_summ_STLSA) %>%
  select(Quarter, Trips_SA)

tourism_SA_ETS <-
  tourism_summ_SA %>%
  model(ETS_AAdN = ETS(Trips_SA ~ error("A") + trend("Ad") + season("N")),
        ETS_AAN = ETS(Trips_SA ~ error("A") + trend("A") + season("N")))

tourism_SA_forecast <-
  tourism_SA_ETS %>%
  forecast(h = 8)

tourism_SA_forecast %>%
  filter(.model == "ETS_AAdN") %>%
  autoplot(tourism_summ_SA) +
  labs(x = "Quarter",
       y = "Trips",
       title = "Quarterly Overnight Domestic Trips",
       subtitle = "Australia, Seasonally-Adjusted, ETS(A,Ad,N)")


#QUESTION 10D

#Create new tourism summary tsibble with seasonally adjusted trips values
tourism_SA_forecast %>%
  filter(.model == "ETS_AAN") %>%
  autoplot(tourism_summ_SA) +
  labs(x = "Quarter",
       y = "Trips",
       title = "Quarterly Overnight Domestic Trips",
       subtitle = "Australia, Seasonally-Adjusted, ETS(A,A,N)")


#QUESTION 10E

#Generate seasonal ETS model on original Trips data
tourism_summ_ETS <-
  tourism_summ %>%
  model(ETS_auto = ETS(Trips))

#Print report of trips ETS model
print(report(tourism_summ_ETS))

#Forecast 2 years (for question 10g)
tourism_summ_forecast <-
  tourism_summ_ETS %>%
  forecast(h = 8)


#QUESTION 10F

print(rbind(
  tourism_SA_ETS %>% select(ETS_AAdN) %>% accuracy() %>% select(.model, RMSE),
  tourism_SA_ETS %>% select(ETS_AAN) %>% accuracy() %>% select(.model, RMSE),
  tourism_summ_ETS %>% accuracy() %>% select(.model, RMSE)
))


#QUESTION 10G

tourism_SA_forecast %>%
  filter(.model == "ETS_AAdN") %>%
  autoplot(tourism_summ_SA) +
  labs(x = "Quarter",
       y = "Trips",
       title = "Quarterly Overnight Domestic Trips",
       subtitle = "Australia, Seasonally-Adjusted, ETS(A,Ad,N)")

tourism_SA_forecast %>%
  filter(.model == "ETS_AAN") %>%
  autoplot(tourism_summ_SA) +
  labs(x = "Quarter",
       y = "Trips",
       title = "Quarterly Overnight Domestic Trips",
       subtitle = "Australia, Seasonally-Adjusted, ETS(A,A,N)")

tourism_summ_forecast %>%
  autoplot(tourism_summ) +
  labs(x = "Quarter",
       y = "Trips",
       title = "Quarterly Overnight Domestic Trips",
       subtitle = "Australia, ETS(A,A,A)")


#QUESTION 10H

tourism_summ_ETS %>%
  gg_tsresiduals()


tourism_SA_ETS %>%
  select(ETS_AAdN) %>%
  gg_tsresiduals() +
  labs(title = "ETS(A,Ad,N)")

tourism_SA_ETS %>%
  select(ETS_AAN) %>%
  gg_tsresiduals() +
  labs(title = "ETS(A,A,N)")


#QUESTION 11A

aus_arrivals_NZ <-
  aus_arrivals %>%
    filter(Origin == "NZ")

aus_arrivals_NZ %>%
  autoplot(Arrivals) +
  labs(x = "Quarter",
       y = "Arrivals",
       title = "Quarterly Arrivals from New Zealand")


#QUESTION 11B
train_end <- length(rownames(aus_arrivals_NZ)) - 8
test_start <- train_end + 1
test_end <- length(rownames(aus_arrivals_NZ))

aus_NZ_train <- aus_arrivals_NZ[1:train_end,]

aus_NZ_test <- aus_arrivals_NZ[test_start:test_end,]

aus_NZ_ETS <-
  aus_NZ_train %>%
  model(ETS_MAM = ETS(Arrivals ~ error("M") + trend("A") + season("M")),
        ETS_MAdM = ETS(Arrivals ~ error("M") + trend("Ad") + season("M")))

aus_NZ_forecast <-
  aus_NZ_ETS %>%
  forecast(h = length(rownames(aus_NZ_test)))

aus_NZ_forecast %>%
  autoplot(aus_NZ_test) +
  labs(title = "Quarterly Arrivals from New Zealand",
       subtitle = "ETS Forecast Comparisons") +
  theme_bw()


#QUESTION 11D

#ETS model (Auto selection)
aus_NZ_ETSauto <-
  aus_NZ_train %>%
  model(ETS_auto = ETS(Arrivals))

aus_NZ_ETSauto_fc <-
  aus_NZ_ETSauto %>%
  forecast(h = length(rownames(aus_NZ_test)))

aus_NZ_ETSauto_fc %>%
  autoplot(aus_NZ_test) +
  labs(title = "Quarterly Arrivals from New Zealand",
       subtitle = aus_NZ_ETSauto$ETS_auto)

#ETS(A,A,A) Model on log(Arrivals)
aus_NZ_ETSAAA <-
  aus_NZ_train %>%
  model(ETS_AAA = ETS(log(Arrivals) ~ error("A") + trend("A") + season("A")))

aus_NZ_ETSAAA_fc <-
  aus_NZ_ETSAAA %>%
  forecast(h = length(rownames(aus_NZ_test)))

aus_NZ_ETSAAA_fc %>%
  autoplot(aus_NZ_test) +
  labs(title = "Quarterly Arrivals from New Zealand",
       subtitle = aus_NZ_ETSAAA$ETS_AAA)

#Seasonal Naive model
aus_NZ_SNAIVE <-
  aus_NZ_train %>%
  model(NAIVE_S = SNAIVE(Arrivals))

aus_NZ_SNAIVE_fc <-
  aus_NZ_SNAIVE %>%
  forecast(h = length(rownames(aus_NZ_test)))

aus_NZ_SNAIVE_fc %>%
  autoplot(aus_NZ_test) +
  labs(title = "Quarterly Arrivals from New Zealand",
       subtitle = aus_NZ_SNAIVE$NAIVE_S)

#STL Decomp on log(arrivals) - ETS on season_adjust
#Create new dataset with logArrivals
aus_larrivals_NZ <-
  aus_arrivals_NZ %>%
  mutate(logArrivals = log(Arrivals)) %>%
  select(Quarter,
         Origin,
         logArrivals)

#Decompose with STL() (Auto Selection)
aus_NZ_STL <-
  aus_larrivals_NZ %>%
  model(STL(logArrivals)) %>%
  components()

#New dataset with Arrivals as seasonally adjusted values (inv log transform)
aus_arrivals_NZ_STL <-
  aus_arrivals_NZ %>%
  mutate(Arrivals = aus_NZ_STL$season_adjust)

#Training and test data
aus_NZ_train_STL <- aus_arrivals_NZ_STL[1:train_end,]
aus_NZ_test_STL <- aus_arrivals_NZ_STL[test_start:test_end,]

#ETS model on seasonally adjusted data (Auto selection)
aus_NZ_ETSSTL <-
  aus_NZ_train_STL %>%
  model(ETS_STL = ETS(Arrivals))

aus_NZ_ETSSTL_fc <-
  aus_NZ_ETSSTL %>%
  forecast(h = length(rownames(aus_NZ_test_STL)))

aus_NZ_ETSSTL_fc %>%
  autoplot(aus_NZ_test_STL) +
  labs(x = "Quarter",
       y = "Arrivals",
       title = "Quarterly Arrivals from New Zealand",
       subtitle = paste("Seasonally Adjusted Values, ", aus_NZ_ETSSTL$ETS_STL))

#Model accuracy metrics
print(rbind(aus_NZ_ETSauto_fc %>% 
        accuracy(aus_NZ_test) %>% select(.model, RMSE:MAPE),
      aus_NZ_ETSAAA_fc %>%
        accuracy(aus_NZ_test) %>% select(.model, RMSE:MAPE),
      aus_NZ_SNAIVE_fc %>%
        accuracy(aus_NZ_test) %>% select(.model, RMSE:MAPE),
      aus_NZ_ETSSTL_fc %>%
        accuracy(aus_NZ_test_STL) %>% select(.model, RMSE:MAPE)
))


#QUESTION 11E

aus_NZ_ETSauto %>%
  gg_tsresiduals() +
  labs(title = "ETS(M,A,M)")


#QUESTION 11F

#CROSS-VALIDATION
#Create new tsibble - initial @ 30 rows, increase by 1 thereafter
aus_arrivals_NZtr <-
  aus_arrivals_NZ %>%
  stretch_tsibble(.init = 30, .step = 1) %>%
  relocate(Quarter, Origin, .id)

#Auto ETS
aus_NZ_ETSauto <-
  aus_arrivals_NZtr %>%
  model(ETS_auto = ETS(Arrivals))

aus_NZ_ETSauto_fc <-
  aus_NZ_ETSauto %>%
  forecast(h = 1)

#Additive ETS
aus_NZ_ETSAAA <-
  aus_arrivals_NZtr %>%
  model(ETS_AAA = ETS(log(Arrivals) ~ error("A") + trend("A") + season("A")))

aus_NZ_ETSAAA_fc <-
  aus_NZ_ETSAAA %>%
  forecast(h = 1)

#NAIVE Model
aus_NZ_SNAIVE <-
  aus_arrivals_NZtr %>%
  model(NAIVE_S = SNAIVE(Arrivals))

aus_NZ_SNAIVE_fc <-
  aus_NZ_SNAIVE %>%
  forecast(h = 1)

#STL Decomp on log-transformation then ETS
aus_larrivals_NZtr <-
  aus_arrivals_NZtr %>%
  mutate(logArrivals = log(Arrivals)) %>%
  select(Quarter,
         Origin,
         logArrivals)

aus_NZ_STL <-
  aus_larrivals_NZtr %>%
  model(STL(logArrivals)) %>%
  components()

aus_arrivals_NZ_STLtr <-
  aus_arrivals_NZtr %>%
  mutate(Arrivals = aus_NZ_STL$season_adjust)

aus_NZ_ETSSTL <-
  aus_arrivals_NZ_STLtr %>%
  model(ETS_STL = ETS(Arrivals))

aus_NZ_ETSSTL_fc <-
  aus_NZ_ETSSTL %>%
  forecast(h = 1)

print(rbind(aus_NZ_ETSauto_fc %>% accuracy(aus_arrivals_NZ),
            aus_NZ_ETSAAA_fc %>% accuracy(aus_arrivals_NZ),
            aus_NZ_SNAIVE_fc %>% accuracy(aus_arrivals_NZ),
            aus_NZ_ETSSTL_fc %>% accuracy(aus_arrivals_NZ_STL)
))


#QUESTION 12A

#Portland Cement Production dataset
aus_cement <-
  aus_production %>%
  select(Quarter,
         Cement)

#CROSS-VALIDATION Dataset
aus_cement_tr <-
  aus_cement %>%
  stretch_tsibble(.init = 5, .step = 1) %>%
  relocate(Quarter, .id)

#ETS Model Development
aus_cement_tr_ETS <-
  aus_cement_tr %>%
  model(ETSauto = ETS(Cement)) %>%
  forecast(h = 4) %>%
  group_by (.id) %>%
  mutate(h = row_number()) %>%
  ungroup()

aus_cement_tr_SNAIVE <-
  aus_cement_tr %>%
  model(NAIVE_S = SNAIVE(Cement)) %>%
  forecast(h = 4) %>%
  group_by (.id) %>%
  mutate(h = row_number()) %>%
  ungroup()


#QUESTION 12B

print(rbind(
  aus_cement_tr_ETS %>%
    accuracy(aus_cement, by = c("h", ".model")) %>% select(h, .model, RMSE),
  aus_cement_tr_SNAIVE %>% 
    accuracy(aus_cement, by = c("h", ".model")) %>% select(h, .model, RMSE)
))

aus_ETS_accuracy <-
  aus_cement_tr_ETS %>%
    accuracy(aus_cement, by = c("h", ".model"))

aus_SNAIVE_accuracy <-
  aus_cement_tr_SNAIVE %>%
    accuracy(aus_cement, by = c("h", ".model"))

aus_ETS_accuracy %>%
  ggplot(aes(x = h, y = RMSE)) +
  geom_point() +
  geom_point(data = aus_SNAIVE_accuracy,
             aes(x = h, y = RMSE),
             col = "red") +
  labs(x = "h-value (forecast period)",
       y = "RMSE Value",
       title = "RMSE Value Comparison",
       subtitle = "ETS (black) vs. SNAIVE (red)")


#QUESTION 13

#Beer Production
model_tsibble <-
  aus_production %>%
  filter(is.na(Beer)==FALSE) %>%
  mutate(model_var = Beer) %>%
  select(Quarter,
         model_var)

model_tsibble %>%
  autoplot(model_var) + 
  labs(x = "Quarter",
       y = "Beer Amount",
       title = "Quarterly Beer Production")

train_end <- length(rownames(model_tsibble))-12 #3 years
test_start <- train_end + 1
test_end <- length(rownames(model_tsibble))

model_train <- model_tsibble[1:train_end,]
model_test <- model_tsibble[test_start:test_end,]

lambda_model <-
  model_train %>%
  features(model_var, features = guerrero) %>%
  pull(lambda_guerrero)

if(lambda_model < 0.5) {

  model_house <-
    model_train %>%
    model(ETSauto = ETS(model_var),
          NAIVE_S = SNAIVE(model_var),
          mod_dcmp = decomposition_model(
            STL(log(model_var) ~ season(window = Inf)),
            ETS(season_adjust ~ season("N")), SNAIVE(season_year))
    )
  
} else {
  
  model_house <-
    model_train %>%
    model(ETSauto = ETS(model_var),
          NAIVE_S = SNAIVE(model_var),
          mod_dcmp = decomposition_model(
            STL(model_var ~ season(window = Inf)),
            ETS(season_adjust ~ season("N")), SNAIVE(season_year))
    )
  
}

model_house_fc <-
  model_house %>%
  forecast(h = length(rownames(model_test)))

model_house_fc %>%
  autoplot(model_test) +
  labs(x = "Quarter",
       y = "Beer Amount",
       title = "Quarterly Beer Production - 3-year Forecast",
       subtitle = "ETS v SNAIVE v Decomposition") +
  theme_bw()

print(model_house_fc %>% accuracy(model_test) %>% select(.model, RMSE:MAPE))


#Bricks Production
model_tsibble <-
  aus_production %>%
  filter(is.na(Bricks)==FALSE) %>%
  mutate(model_var = Bricks) %>%
  select(Quarter,
         model_var)

model_tsibble %>%
  autoplot(model_var) + 
  labs(x = "Quarter",
       y = "Bricks Amount",
       title = "Quarterly Bricks Production")

train_end <- length(rownames(model_tsibble))-12 #3 years
test_start <- train_end + 1
test_end <- length(rownames(model_tsibble))

model_train <- model_tsibble[1:train_end,]
model_test <- model_tsibble[test_start:test_end,]

lambda_model <-
  model_train %>%
  features(model_var, features = guerrero) %>%
  pull(lambda_guerrero)

if(lambda_model < 0.5) {
  
  model_house <-
    model_train %>%
    model(ETSauto = ETS(model_var),
          NAIVE_S = SNAIVE(model_var),
          mod_dcmp = decomposition_model(
            STL(log(model_var) ~ season(window = Inf)),
            ETS(season_adjust ~ season("N")), SNAIVE(season_year))
    )
  
} else {
  
  model_house <-
    model_train %>%
    model(ETSauto = ETS(model_var),
          NAIVE_S = SNAIVE(model_var),
          mod_dcmp = decomposition_model(
            STL(model_var ~ season(window = Inf)),
            ETS(season_adjust ~ season("N")), SNAIVE(season_year))
    )
  
}

model_house_fc <-
  model_house %>%
  forecast(h = length(rownames(model_test)))

model_house_fc %>%
  autoplot(model_test) +
  labs(x = "Quarter",
       y = "Bricks Amount",
       title = "Quarterly Bricks Production - 3-year Forecast",
       subtitle = "ETS v SNAIVE v Decomposition") +
  theme_bw()

print(model_house_fc %>% accuracy(model_test) %>% select(.model, RMSE:MAPE))


#Diabetes
model_tsibble <-
  PBS %>%
  filter(ATC2 == "A10") %>%
  summarize(model_var = sum(Cost))

model_tsibble %>%
  autoplot(model_var) + 
  labs(x = "Month",
       y = "Cost",
       title = "Monthly Cost Diabetes Subsidies")

train_end <- length(rownames(model_tsibble))-12 #3 years
test_start <- train_end + 1
test_end <- length(rownames(model_tsibble))

model_train <- model_tsibble[1:train_end,]
model_test <- model_tsibble[test_start:test_end,]

lambda_model <-
  model_train %>%
  features(model_var, features = guerrero) %>%
  pull(lambda_guerrero)

if(lambda_model < 0.5) {
  
  model_house <-
    model_train %>%
    model(ETSauto = ETS(model_var),
          NAIVE_S = SNAIVE(model_var),
          mod_dcmp = decomposition_model(
            STL(log(model_var) ~ season(window = Inf)),
            ETS(season_adjust ~ season("N")), SNAIVE(season_year))
    )
  
} else {
  
  model_house <-
    model_train %>%
    model(ETSauto = ETS(model_var),
          NAIVE_S = SNAIVE(model_var),
          mod_dcmp = decomposition_model(
            STL(model_var ~ season(window = Inf)),
            ETS(season_adjust ~ season("N")), SNAIVE(season_year))
    )
  
}

model_house_fc <-
  model_house %>%
  forecast(h = length(rownames(model_test)))

model_house_fc %>%
  autoplot(model_test) +
  labs(x = "Month",
       y = "Cost",
       title = "Monthly Cost of Diabetes Subsidies - 3-year Forecast",
       subtitle = "ETS v SNAIVE v Decomposition") +
  theme_bw()

print(model_house_fc %>% accuracy(model_test) %>% select(.model, RMSE:MAPE))


#Corticosteroids
model_tsibble <-
  PBS %>%
  filter(ATC2 == "H02") %>%
  summarize(model_var = sum(Cost))

model_tsibble %>%
  autoplot(model_var) + 
  labs(x = "Month",
       y = "Cost",
       title = "Monthly Cost Corticosteroids Subsidies")

train_end <- length(rownames(model_tsibble))-12 #3 years
test_start <- train_end + 1
test_end <- length(rownames(model_tsibble))

model_train <- model_tsibble[1:train_end,]
model_test <- model_tsibble[test_start:test_end,]

lambda_model <-
  model_train %>%
  features(model_var, features = guerrero) %>%
  pull(lambda_guerrero)

if(lambda_model < 0.5) {
  
  model_house <-
    model_train %>%
    model(ETSauto = ETS(model_var),
          NAIVE_S = SNAIVE(model_var),
          mod_dcmp = decomposition_model(
            STL(log(model_var) ~ season(window = Inf)),
            ETS(season_adjust ~ season("N")), SNAIVE(season_year))
    )
  
} else {
  
  model_house <-
    model_train %>%
    model(ETSauto = ETS(model_var),
          NAIVE_S = SNAIVE(model_var),
          mod_dcmp = decomposition_model(
            STL(model_var ~ season(window = Inf)),
            ETS(season_adjust ~ season("N")), SNAIVE(season_year))
    )
  
}

model_house_fc <-
  model_house %>%
  forecast(h = length(rownames(model_test)))

model_house_fc %>%
  autoplot(model_test) +
  labs(x = "Month",
       y = "Cost",
       title = "Monthly Cost of Corticosteroids Subsidies - 3-year Forecast",
       subtitle = "ETS v SNAIVE v Decomposition") +
  theme_bw()

print(model_house_fc %>% accuracy(model_test) %>% select(.model, RMSE:MAPE))


#Food Retailing
model_tsibble <-
  aus_retail %>%
  filter(Industry == "Food retailing") %>%
  summarize(model_var = mean(Turnover))

model_tsibble %>%
  autoplot(model_var) + 
  labs(x = "Month",
       y = "Turnover",
       title = "Monthly Turnover, Food Retailing")

train_end <- length(rownames(model_tsibble))-12 #3 years
test_start <- train_end + 1
test_end <- length(rownames(model_tsibble))

model_train <- model_tsibble[1:train_end,]
model_test <- model_tsibble[test_start:test_end,]

lambda_model <-
  model_train %>%
  features(model_var, features = guerrero) %>%
  pull(lambda_guerrero)

if(lambda_model < 0.5) {
  
  model_house <-
    model_train %>%
    model(ETSauto = ETS(model_var),
          NAIVE_S = SNAIVE(model_var),
          mod_dcmp = decomposition_model(
            STL(log(model_var) ~ season(window = Inf)),
            ETS(season_adjust ~ season("N")), SNAIVE(season_year))
    )
  
} else {
  
  model_house <-
    model_train %>%
    model(ETSauto = ETS(model_var),
          NAIVE_S = SNAIVE(model_var),
          mod_dcmp = decomposition_model(
            STL(model_var ~ season(window = Inf)),
            ETS(season_adjust ~ season("N")), SNAIVE(season_year))
    )
  
}

model_house_fc <-
  model_house %>%
  forecast(h = length(rownames(model_test)))

model_house_fc %>%
  autoplot(model_test) +
  labs(x = "Month",
       y = "Turnover",
       title = "Monthly Turnover, Food Retailing - 3-year Forecast",
       subtitle = "ETS v SNAIVE v Decomposition") +
  theme_bw()

print(model_house_fc %>% accuracy(model_test) %>% select(.model, RMSE:MAPE))


#QUESTION 14A - TOURISM

tourism_total <-
  tourism %>%
  summarize(Trips = sum(Trips))

tourism_train_end <- length(rownames(tourism_total))-8 #2 years of forecasts
tourism_test_start <- tourism_train_end + 1
tourism_test_end <- length(rownames(tourism_total))

tourism_train <- tourism_total[1:tourism_train_end,]
tourism_test <- tourism_total[tourism_test_start:tourism_test_end,]

tourism_train %>%
  autoplot(Trips) +
  labs(x = "Quarter",
       y = "Trips",
       title = "Quarterly Trips - Australia",
       subtitle = "Figure 1")

tourism_ETS <-
  tourism_train %>%
  model(ETSauto = ETS(Trips),
        ETS_MAdM = ETS(Trips ~ error("M") + trend("Ad") + season("M")))

tourism_ETS_fc <-
  tourism_ETS %>%
  forecast(h = length(rownames(tourism_test)))

tourism_ETS_fc %>%
  autoplot(tourism_test) +
  labs(x = "Quarter",
       y = "Trips",
       title = "Quarterly Trips - 2-year Forecast",
       subtitle = tourism_ETS$ETSauto)

tourism_ETS_fc %>% accuracy(tourism_test) %>% select(.model, RMSE:MAPE)

tourism_ETS %>%
  select(ETSauto) %>%
  gg_tsresiduals() +
  labs(title = tourism_ETS$ETSauto)

tourism_ETS %>%
  select(ETS_MAdM) %>%
  gg_tsresiduals() +
  labs(title = tourism_ETS$ETS_MAdM)


#QUESTION 14A, CLOSING PRICES

gafa_stock_close <- data.frame(gafa_stock)

gafa_stock_close2 <- 
  gafa_stock_close %>%
  mutate(Month = yearmonth(Date)) %>%
  select(Month, Symbol, Close)

gafa_stock_close3 <- 
  gafa_stock_close2 %>%
  group_by(Month, Symbol) %>%
  summarise_at(vars(Close), list(Close = mean)) %>%
  ungroup()

gafa_stock_closef <-
  gafa_stock_close3 %>%
  as_tsibble(key = Symbol, index = Month)
  
gafa_stock_symbols <- unique(gafa_stock_close$Symbol)

for(i in 1:length(gafa_stock_symbols)) {
  
  print(paste("Model results for ", gafa_stock_symbols[i]))
  
  model_tsibble <-
    gafa_stock_closef %>%
    filter(Symbol == gafa_stock_symbols[i])
  
  model_train_end <- length(rownames(model_tsibble)) - 3 #3-month forecast
  model_test_start <- model_train_end + 1
  model_test_end <- length(rownames(model_tsibble))
  
  model_train <- model_tsibble[1:model_train_end,]
  model_test <- model_tsibble[model_test_start:model_test_end,]
  
  model_train_ETS <-
    model_train %>%
    model(ETSauto = ETS(Close))
  
  model_train_fc <-
    model_train_ETS %>%
    forecast(h = length(rownames(model_test)))
  
  print(
    model_train_fc %>%
    autoplot(model_test) +
    labs(x = "Date",
         y = "Close Price",
         title = "Daily Close Price - 3-month Forecast",
         subtitle = paste(model_train_ETS$ETSauto, ", ",gafa_stock_symbols[i]))
  )
  
  print(
  model_train_ETS %>%
    gg_tsresiduals() +
    labs(title = paste(model_train_ETS$ETSauto, ", ", gafa_stock_symbols[i]))
  )
  
  print(model_train_fc %>% accuracy(model_test) %>% select(.model, RMSE:MAPE))
  
}


#QUESTION 14A, LYNX

pelt_lynx <-
  pelt %>%
  select(Year, Lynx) %>%
  as_tsibble(index = Year)

pelt_train_end <- length(rownames(pelt_lynx)) - 4
pelt_test_start <- pelt_train_end + 1
pelt_test_end <- length(rownames(pelt_lynx))

pelt_lynx_train <- pelt_lynx[1:pelt_train_end,]
pelt_lynx_test <- pelt_lynx[pelt_test_start:pelt_test_end,]

pelt_lynx[1:pelt_test_start,] %>%
  autoplot(Lynx) +
  labs(x = "Year",
      y = "Lynx",
      title = "Yearly Lynx") +
  geom_line(data = pelt_lynx_test,
            aes(x = Year,
                y = Lynx),
            col = "red")

pelt_lynx_ETS <-
  pelt_lynx_train %>%
  model(ETSauto = ETS(Lynx))

pelt_lynx_fc <-
  pelt_lynx_ETS %>%
  forecast(h = length(rownames(pelt_lynx_test)))

pelt_lynx_fc %>%
  autoplot(pelt_lynx_test) +
  labs(x = "Year",
       y = "Lynx",
       title = "Yearly Lynx - 4-year Forecast",
       subtitle = pelt_lynx_ETS$ETSauto)

print(pelt_lynx_fc %>% accuracy(pelt_lynx_test) %>% select(.model, RMSE:MAPE))

Hyndman, Chapter 8 Ex.

Timothy White

4/16/2022

Question 1

Part a

Part b

Question 2

Question 3

Question 4

Question 5

Part a

Part b

Part c

Part d

Part e

Part f

Question 6

Question 7

Question 8

Part a

Part b

Part c

Part d

Part e

Question 9

Question 10

Part a

Part b

Part c

Part d

Part e

Part f

Part g

Part h

Question 11

Part a

Part b

Part c

Part d

Part e

Part f

Question 12

Part a

Part b

Question 13

Beer Production

Bricks Production

Diabetes

Corticosteroids

Food Retailing Turnover

Question 14

Part a, Tourism Models

Part a, Closing Prices

Part a, Lynx Series

Part b

R Code