library(fpp3)
library(tsibble)
library(cowplot)

8.1

a.

We first look at the overall plot, with the one-step-ahead fitted values alongside the data in orange, representing the weighted average of all of the observations in the series \(y_1,…,y_T\). We can use it to visually understand how the model is attempting to fit itself to the data.

vic_pigs  <- aus_livestock %>% filter(Animal=="Pigs" & State =="Victoria" ) 


fit_pigs <- vic_pigs %>%
  model(  ETS(Count ~ error("A") + trend("N") + season("N"))) 
  
fc_pigs <- fit_pigs %>%
  forecast(h = 4) 

fc_pigs %>%
  autoplot(vic_pigs,level=NULL) +
  geom_line(aes(y = .fitted), col="#D55E00",
            data = augment(fit_pigs)) +
  labs(y=  "Number of Pigs Slaughtered", title="Pig Slaughter in Victoria") +
  guides(colour = "none")

We can use the report function to find the optimal values of \(\alpha\) and \(\ell_0\). We find our \(\alpha\) as 0.3221247 and our \(\ell_0\) as 100646.6.

fit_pigs %>% report()

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

We can more accurately view the forecast by looking at the last year given in the data.

fc_pigs %>%
  autoplot(filter(vic_pigs,year(Month) == 2018 ))  +
 
  labs(y=  "Number of Pigs Slaughtered", title="Pig Slaughter in Victoria") +
  guides(colour = "none")

b.

We can calculate our interval as shown below. We get an interval of \(76871 \le\hat{y} \le 113502\)

  fc_pigs$Count[1] + c(-1, 1) * 1.96 * sd(augment(fit_pigs)$.resid)

## <distribution[2]>
## [1] N(76871, 8.7e+07)  N(113502, 8.7e+07)

We see the R calculation is slightly different. It likely uses a combination of a different value than our 1.96 and accounts for the degrees of freedom more properly.

fc_pigs %>%
  mutate(interval = hilo(Count, 95)) %>% pull(interval)

## <hilo[4]>
## [1] [76854.79, 113518.3]95 [75927.17, 114445.9]95 [75042.22, 115330.9]95
## [4] [74194.54, 116178.6]95

8.5

a.

We view the plot for Germany below. As we can see, the volume of exports increases overtime, with a noticeable dip around 2010.

Germany <- global_economy %>% filter(Country=="Germany" & !is.na(Exports))
 
autoplot(Germany,Exports)

b.

fit_A_N_N_ger  <- Germany  %>%
model(  ETS(Exports ~ error("A") + trend("N") + season("N"))) 

fit_A_N_N_ger %>% forecast( h=10) %>%

autoplot(Germany)

c.

Our RMSE is shown below as 1.759358.

fit_A_N_N_ger %>% accuracy()

d.

We find that the (A,A,N) model has a lower RMSE which indicates that it performs better than our previous model.

fit_A_A_N_ger  <- Germany  %>%
model(  ETS(Exports ~ error("A") + trend("A") + season("N"))) 

fit_A_A_N_ger  %>% accuracy()

e.

We see that the ANN model doesn’t incorporate the trend and therefore levels off completely. This does not seem accurate, so we should use the AAN model here which properly understands that the future forecasts will continue to increase which seems the most likely with exports only continuing to grow.

fit_both <-  Germany  %>%
model(  ANN  = ETS(Exports ~ error("A") + trend("N") + season("N")),
        AAN =ETS(Exports ~ error("A") + trend("A") + season("N")) ) 

fit_both %>% forecast(h=10) %>% 
  autoplot(Germany, level=NULL) +
  labs(title="Forecast Comparison: Germany's Exports")

f.

As shown below, our calculated interval is very close to the R generated interval

ANN_RM <-  pull(accuracy(select(fit_both,ANN)),RMSE) 

AAN_RM <-  pull(accuracy(select(fit_both,AAN)),RMSE) 



fit_both %>% 
  select(ANN) %>%
  forecast(h=1)  %>%
 mutate(
    Low = Exports-1.96 * ANN_RM,   
    High =  Exports+ 1.96 *ANN_RM)  %>% pull(High,Low)

## <distribution[1]>
## N(44, 3.2) 
## N(51, 3.2)

fit_both %>% 
  select(AAN) %>%
  forecast(h=1)  %>%
 mutate(
    Low = Exports-1.96 * AAN_RM,   
    High =  Exports+ 1.96 *AAN_RM)  %>% pull(High,Low)

## <distribution[1]>
## N(45, 2.9) 
## N(51, 2.9)

 fit_both %>%
   forecast(h=1)  %>%
  mutate(interval = hilo(Exports, 95)) %>% pull(interval)

## <hilo[2]>
## [1] [43.71329, 50.75818]95 [44.59370, 51.24660]95

8.6

china <- global_economy %>%
  filter(Country == "China")

china %>% autoplot(GDP) +
  labs(title="Chinese GDP")

lam <- china %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
bc_china<-china %>% autoplot(box_cox(GDP, lambda))

We can see the various forecasts below. The simple ANN model works very similar to naive and predicts the same forecast as the last into the future. Damped and AAN work very similar for this data, likely due to there not being much of a trend occurring up until the later part of the data. We find that Box-Cox and log work very similar, with the log slope being less aggressive. Box-Cox damped looks like it might be a decent prediction, along with AAN and the damped prediction.

china_fit <- china %>%
  model(
    AAN  = ETS(GDP ~ error("A") + trend("A") + season("N")),
        Damped = ETS(GDP ~ error("A")+trend("Ad")+season("N")),
    ANN = ETS(GDP ~ error("A") + trend("N") + season("N")),
    `Box-Cox` = ETS(box_cox(GDP, lam)),
    `Box-Cox Damped` = ETS(box_cox(GDP, lam) ~ trend("Ad")),
    Log = ETS(log(GDP))
)




china_fit %>%
  forecast(h=10) %>%
  autoplot(china, level =NULL)+
  labs(title="Forecast: Chinese GDP 10 Years")

china_fit %>%  accuracy()

8.7

We see that the seasonal variance is not consistent over time, it is growing and changing. We therefore need to use multiplicative to deal with this.

aus_production %>% autoplot(Gas)

We can see below that the multiplicative and damped forecasts are very close. We can consider the non damped forecast as the potential max future production and the damped forecast to be a more conservative estimate. As the RMSE values are practically the same, both predictions are likely to occur so it depends on how conservative we want our estimates to be or not.

aus_fit <-  aus_production %>%
   model(
    Multiplicative = ETS(Gas ~ error("M") + trend("A") +
                                                season("M")),
    `Damped Multiplicative` = ETS(Gas ~ error("M") + 
                                    trend("Ad") + season("M"))
  )


aus_fc <- aus_fit  %>% forecast(h = "10 years")

aus_fc %>%
  autoplot(aus_production, level = NULL) +
  labs(title="Australian Gas Production ",
       y="Gas Production") +
  guides(colour = guide_legend(title = "Forecast"))

accuracy(aus_fit)

8.8

a.

set.seed(1221)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

As shown below, we do not have a consistent seasonal variation across our data levels. With that in mind, we can conclude that that additive method which requires a consistent seasonal cycle over time, is not the correct approach for this data.

cowplot::plot_grid(autoplot(myseries),autoplot( filter(myseries,year(Month) < 2000)) , autoplot( filter(myseries,year(Month) > 2000)) ,ncol=1 )

b.

We can see the non-damped forecast grows at a very fast rate. The damping does a great job of slowing down the growth of the foretasted trend by a reasonable amount.

myseries_fit <-  myseries %>%
  model(
    Multiplicative = ETS(Turnover ~ error("M") + trend("A") +
                                                season("M")),
    `Damped Multiplicative` = ETS(Turnover ~ error("M") + 
                                    trend("Ad") + season("M"))
  )

myseries_fc <- myseries_fit  %>% forecast(h = "10 years")

myseries_fc %>%
  autoplot(myseries, level = NULL) +
  labs(title="Australian retail",
       y="Turnover") +
  guides(colour = guide_legend(title = "Forecast"))

c.

In fact, we find that the RMSE for the damped method is lower, indicating that it better fits the data.

accuracy(myseries_fit)

d.

We see below that our residual plot is looking very good. Our acf balues are centered around zero and we have a normally distrbuted histogram.

myseries_fit %>% 
  select(`Damped Multiplicative`)  %>% gg_tsresiduals()

e.

myseries_train <- myseries %>%
  filter(year(Month) < 2011)

fit_compare <- myseries_train %>%
  model(
    `Damped Multiplicative` = ETS(Turnover ~ error("M") + 
                                            trend("Ad") +
                                            season("M")),
    Multiplicative = ETS(Turnover ~ error("M") + 
                                    trend("A") +
                                    season("M")),
    `Seasonal Naïve` = SNAIVE(Turnover)
  )

In the case of our specific data, it would not appear that Seasonal Naïve has been bested. This is likely due to the sudden and dramatic downwards spiral the Newspaper and Book industry has gone through due to the growing popularity of digital alternatives. Looking further at the RMSE values however, we find that the Seasonal Naïve approach has a significantly worse error value with the other two approaches being nearly identical. Further investigation is needed to find the best approach for our data.

 joined_data <-anti_join(myseries, myseries_train) 

 fit_compare %>%forecast(joined_data) %>%

  autoplot(joined_data,level = NULL) +
   guides(colour=guide_legend(title="Forecast")) +
  ggtitle('Comparison of Forecast Methods')

accuracy(fit_compare)

8.9

lambda <- myseries_train %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

We see below that the STL Boxcox forecast has the best overall RMSE value, indicating that it better fits the data than our other models. It may be a great choice for this data set.

fit_reteail_box <- myseries_train %>%
  model(
    `STL BoxCox` = STL(box_cox(Turnover, lambda) ~ season(window = "periodic"),
             robust = T),
    `ETS BoxCox` = ETS(box_cox(Turnover, lambda))
                       )
    
  


rbind(accuracy(fit_reteail_box),accuracy(fit_compare))

Data 624: Homework Five

Zachary Safir

2/26/2022

8.1

a.

b.

8.5

a.

b.

c.

d.

e.

f.

8.6

8.7

8.8

a.

b.

c.

d.

e.

8.9