Data624

Exercise 1. Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

a-Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ_0 , and generate forecasts for the next four months.

#head(aus_livestock)
View(aus_livestock)
#view(aus_livestock$Animal [18000:18100]) # %>% filter(Animal =="Lambs"))
min(aus_livestock$Month)

## <yearmonth[1]>
## [1] "1972 Jul"

aus3 <- aus_livestock %>%
                      filter(Animal == "Pigs", State == "Victoria")

fit <- aus3$Count %>%
       ses(h=4)
fit$model

## Simple exponential smoothing 
## 
## Call:
##  ses(y = ., h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.322 
## 
##   Initial states:
##     l = 100646.6098 
## 
##   sigma:  9353.115
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

autoplot(fit)

Based on the Simple Exponential Smoothing (ses) call, α = 0.322, l = 100646.6098

b-Compute a 95% prediction interval for the first forecast using ^y±1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

predi_low <- fit$mean - 1.96*(sd(fit$residuals)) #76871.23 
predi_high <- fit$mean + 1.96*(sd(fit$residuals)) #113502.3 
predi_high; predi_low

## Time Series:
## Start = 559 
## End = 562 
## Frequency = 1 
## [1] 113502.3 113502.3 113502.3 113502.3

## Time Series:
## Start = 559 
## End = 562 
## Frequency = 1 
## [1] 76871.23 76871.23 76871.23 76871.23

fit #76855.01....113518.5

##     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 559       95186.77 83200.27 107173.3 76855.01 113518.5
## 560       95186.77 82594.29 107779.3 75928.23 114445.3
## 561       95186.77 82016.16 108357.4 75044.05 115329.5
## 562       95186.77 81462.36 108911.2 74197.09 116176.5

Predicted interval = [76871.23, 113502.3] Interval produced by R = [76855.01, 113518.5] if we round these intervals, there is no delta…we fall into the same confidence interval. But if we consider the entire value, the interval produced by R is more narrowed than we obtained with predicted interval.

Exercise 5. Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

a- Plot the Exports series and discuss the main features of the data.

??global_economy

## starting httpd help server ... done

aus4 <- global_economy %>%
               filter(Country == "Australia")#%>%
               #model(RW(Population ~ drift())) %>%
               #forecast(h = "5 years") %>%
#view(aus4)              
                                                                                     #autoplot(global_economy) + labs(y = "Number of People", title = "Population of Australia from 1960 to 2025")

global_economy %>%
                filter(Country == "Australia") %>%
                                               autoplot(Exports) + 
                                               labs(y = "Percentage of Export relatively to GDP", title = "Exports of goods and services (% of GDP). of Australia from 1960 to 2020")

aus4a <- global_economy %>%
                filter(Country == "Australia") %>%
                #mutate(GDP = round(GDP,5))%>%
                mutate(Exports_R = round(((GDP*Exports)/100), 5))%>%
                dplyr::select(Year, GDP, Exports_R)
#view(aus4a)
#plot.ts(aus4a, plot.type = "single")
aus4a1 <- aus4a %>%
          gather(key = "variable", value = "value", -Year)

ggplot(aus4a1, aes(x = Year, y = value)) +
  geom_line(aes(color = variable)) +
  scale_color_manual(values = c("darkred", "steelblue"))+labs(y = "Percentage of Export relatively to GDP", title = "Exports of goods and services (% of GDP). of Australia from 1960 to 2020")

#ts.plot(aus4a,gpars= list(col=rainbow(3))) #, main = "Exports of goods and services (% of GDP). of Australia from 1960 to 2020", xlab = "Years", las=3, ylab = "Percentage of Export relatively to GDP")
#legend("right", names(aus4a[2:3]), col=2:3, lty=1)
#title("Exports of goods and services (% of GDP). of Australia from 1960 to 2020")
#max(aus4$Year)

We chose the Australia data and we observed that in the long run the GDP increased proportionally to the exports. So, overall Australia did good from 1960 to 2017. In fact, around year of 2000-2012, Australia had a sudden peak in their GDP which is explained by the rise of their exports. There were few recessions but didn’t last long. Australia economy shows a fast recovery.

b- Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

fit <- aus4 %>%
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))
fc <- fit %>%
  forecast(h = 5)
fc %>% 
  autoplot(aus4)+labs(y = "Percentage of Export relatively to GDP", title = "Exports of goods and services (% of GDP). of Australia from 1960 to 2020")

c-Compute the RMSE values for the training data.

fit %>%
  accuracy()

## # A tibble: 1 x 11
##   Country  .model         .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
##   <fct>    <chr>          <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 Austral~ "ETS(Exports ~ Trai~ 0.232  1.15 0.914  1.09  5.41 0.928 0.928 0.0125

RMSE = 1.146794

d-Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

fit1 <- aus4 %>%
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))
fc1 <- fit1 %>%
  forecast(h = 5)
fc1 %>% 
  autoplot(aus4)+labs(y = "Percentage of Export relatively to GDP", title = "Exports of goods and services (% of GDP). of Australia from 1960 to 2020")

fit1 %>%
  accuracy()

## # A tibble: 1 x 11
##   Country  .model      .type       ME  RMSE   MAE    MPE  MAPE  MASE RMSSE  ACF1
##   <fct>    <chr>       <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Austral~ "ETS(Expor~ Trai~ -7.46e-7  1.12 0.893 -0.387  5.39 0.907 0.904 0.109

Well, the ETS(A,A,N)model shows an upward trend line compared to the ETS(A,N,N)model. We think the ETS(A,A,N) model is the best.

f - Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

predi_low_ANN <- fc$.mean - 1.96*(1.146794) #18.35944
predi_high_ANN <- fc$.mean + 1.96*(1.146794) #22.85488 
predi_low_AAN <- fc1$.mean - 1.96*(1.116727) #19.19850 
predi_high_AAN <- fc1$.mean + 1.96*(1.116727) #23.57607 

#fc$Exports #N(21, 1.4)
#fc1$Exports #N(21, 1.3)

Predicted interval ETS(ANN)model = [18.35944,22.85488] Interval produced by R = N(21, 1.4)

Predicted interval ETS(AAN)model = [19.19850,23.57607] Interval produced by R = N(21, 1.3)

Exercise 6: Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

china <- global_economy %>%
               filter(Country == "China")#%>%
   
china %>%
      autoplot(GDP) + 
      labs(y = "Gross domestic product (in $USD).", title = "Gross domestic product of China from 1960 to 2020")

China GDP shows an exponential growth from year 2000+. No wonder, China is the factory of the world.

#lambda1 <- BoxCox.lambda(china$GDP)
#Box_cox = ETS (BoxCox(GDP, lambda1) ~ error("A") + trend("A") + season("N")),

fit <- china %>%
  #stretch_tsibble(.init = 10) %>%
  model(
    SES = ETS(GDP ~ error("A") + trend("N") + season("N")),
    Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
    Damped = ETS(GDP ~ error("A") + trend("Ad") +
                   season("N"))
  ) %>%
  forecast(h = 23) #%>%  accuracy(china)

fit %>%
  autoplot(china)+ 
      labs(y = "Gross domestic product (in $USD).", title = "Gross domestic product of China from 1960 to 2040")

As we can see, the Holt model shows a continuous exponential growth of China’s GDP.

Exercise 7. Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

??aus_production
aus1 <- aus_production
sum(is.na(aus1$Gas))

## [1] 0

# not good , need a fix...aus1 %>% filter(!is.na(Bricks))
#aus2 <- aus1 %>% na.omit(aus1$Gas)
#mydata = mydata[complete.cases(mydata), ]

#aus1 %>% drop_na(Bricks)

aus1 %>%
  autoplot(Gas)+  labs(y = "Gas production in petajoules", title = "Quarterly estimates of Gas production in Australia")

We don’t see a constant peak but rather an exponential trend for Australia Gas production. So,the multiplicative seasonality is more suitable.

# Short_hand    Method
# (N,N)     Simple exponential smoothing
# (A,N)     Holt’s linear method
# (Ad,N)    Additive damped trend method
# (A,A)     Additive Holt-Winters’ method
# (A,M)     Multiplicative Holt-Winters’ method
# (Ad,M)    Holt-Winters’ damped method 


fit <- aus1 %>%
  #stretch_tsibble(.init = 10) %>%
  model(
        #additive = ETS(Gas ~ error("A") + trend("A") +
        #                                        season("A")),
    multiplicative = ETS(Gas ~ error("M") + trend("A") +
                                                season("M")),
    additive_damped_simple = ETS(Gas ~ error("A") + trend("Ad") +
                                                season("N")),
    multiplicative_damped_simple = ETS(Gas ~ error("M") + trend("Ad") +
                                                season("M"))
    
    #SES = ETS(GDP ~ error("A") + trend("N") + season("N")),
    #Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
    #Damped = ETS(Gas ~ error("A") + trend("Ad") +                   season("N"))
  ) %>%
  forecast(h = 10) #%>%  accuracy(china)

fit %>%
  autoplot(aus1, level = NULL)+ 
      labs(y = "Gas production in petajoules", title = "Quarterly estimates of Gas production in Australia")

We clearly see that the multiplicative seasonality is suitable for the forecasting of Australia Gas production.

Exercise 8. Recall your retail time series data (from Exercise 8 in Section 2.10).

a-Why is multiplicative seasonality necessary for this series?

view(aus_retail)
#head(aus_retail)
??aus_retail
set.seed(1278)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

#aus8 <- aus_retail %>% 
#                   filter(Industry == "Takeaway food services")
myseries %>%
     autoplot(Turnover)+  
     labs(y = "Retail turnover in $Million AU", title = "   Australian Retail Trade Turnover")

We don’t see a constant peak, therefore, the multiplicative seasonality is more suitable.

b-Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

fit <- myseries %>%
  #stretch_tsibble(.init = 10) %>%
  model(
        additive = ETS(Turnover ~ error("A") + trend("A") +
                                                season("A")),
    multiplicative = ETS(Turnover ~ error("M") + trend("A") +
                                                season("M")),
    #additive_damped_simple = ETS(Gas ~ error("A") + trend("Ad") +
     #                                           season("N")),
    multiplicative_damped_simple = ETS(Turnover ~ error("M") + trend("Ad") +
                                                season("M"))
    
    #SES = ETS(GDP ~ error("A") + trend("N") + season("N")),
    #Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
    #Damped = ETS(Gas ~ error("A") + trend("Ad") +                   season("N"))
  ) %>%
  forecast(h = 10) #%>%  accuracy(china)

fit %>%
  autoplot(myseries, level = NULL)+ 
     labs(y = "Retail turnover in $Million AU", title = "   Australian Retail Trade Turnover")

c-Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

series1 <- myseries %>%
  stretch_tsibble(.init = 10) %>%
  model(
        additive = ETS(Turnover ~ error("A") + trend("A") +
                                                season("A")),
    multiplicative = ETS(Turnover ~ error("M") + trend("A") +
                                                season("M")),
    #additive_damped_simple = ETS(Gas ~ error("A") + trend("Ad") +
     #                                           season("N")),
    multiplicative_damped_simple = ETS(Turnover ~ error("M") + trend("Ad") +
                                                season("M"))
    
    #SES = ETS(GDP ~ error("A") + trend("N") + season("N")),
    #Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
    #Damped = ETS(Gas ~ error("A") + trend("Ad") +                   season("N"))
  ) %>%
  forecast(h = 1) %>%
                   accuracy(myseries)

## Warning: 8 errors (2 unique) encountered for additive
## [3] A seasonal ETS model cannot be used for this data.
## [5] Not enough data to estimate this ETS model.

## Warning: 8 errors (2 unique) encountered for multiplicative
## [3] A seasonal ETS model cannot be used for this data.
## [5] Not enough data to estimate this ETS model.

## Warning: 9 errors (2 unique) encountered for multiplicative_damped_simple
## [3] A seasonal ETS model cannot be used for this data.
## [6] Not enough data to estimate this ETS model.

## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
## 1 observation is missing at 2019 Jan

series1

## # A tibble: 3 x 12
##   .model   State Industry   .type       ME  RMSE   MAE     MPE  MAPE  MASE RMSSE
##   <chr>    <chr> <chr>      <chr>    <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 additive Tasm~ Clothing,~ Test  -0.0168   1.74  1.17 -0.664   5.92 0.625 0.630
## 2 multipl~ Tasm~ Clothing,~ Test  -0.00611  1.59  1.11 -0.447   5.78 0.592 0.573
## 3 multipl~ Tasm~ Clothing,~ Test   0.0843   1.59  1.10  0.0922  5.69 0.587 0.573
## # ... with 1 more variable: ACF1 <dbl>

d- Check that the residuals from the best method look like white noise.

 myseries %>%
 # stretch_tsibble(.init = 10) %>%
  model(
       # additive = ETS(Turnover ~ error("A") + trend("A") +
        #                                        season("A")),
    #multiplicative = ETS(Turnover ~ error("M") + trend("A") +
    #                                            season("M")),
    #additive_damped_simple = ETS(Gas ~ error("A") + trend("Ad") +
     #                                           season("N")),
    multiplicative_damped_simple = ETS(Turnover ~ error("M") + trend("Ad") +
                                                season("M"))

    #SES = ETS(GDP ~ error("A") + trend("N") + season("N")),
    #Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
    #Damped = ETS(Gas ~ error("A") + trend("Ad") +                   season("N"))
  ) %>%
  gg_tsresiduals()

#checkresiduals()

white noise distribution is any distribution that has:

Zero mean
A constant variance/standard deviation (does not change over time)
Zero autocorrelation at all lags

e-Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

train <- myseries %>%
                  filter(year(Month)<=2010)


fit_train <- train %>%
  #stretch_tsibble(.init = 10) %>%
  model(
        #additive = ETS(Turnover ~ error("A") + trend("A") +
        #                                        season("A")),
    multiplicative = ETS(Turnover ~ error("M") + trend("A") +
                                                season("M")),
    #additive_damped_simple = ETS(Gas ~ error("A") + trend("Ad") +
     #                                           season("N")),
    multiplicative_damped_simple = ETS(Turnover ~ error("M") + trend("Ad") +
                                                season("M"))
    
    #SES = ETS(GDP ~ error("A") + trend("N") + season("N")),
    #Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
    #Damped = ETS(Gas ~ error("A") + trend("Ad") +                   season("N"))
  ) %>%
  forecast(h = 10) #%>%  accuracy(china)

fit_train %>%
  autoplot(train, level = NULL)+ 
     labs(y = "Retail turnover in $Million AU", title = "   Australian Retail Trade Turnover")

test <- myseries %>%
                  filter(year(Month)>2010)

fit1_train <- train %>%
                    model(SNAIVE(Turnover))
fc <- fit1_train %>%
  forecast(new_data = anti_join(myseries, train))

## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")

fit_train %>%
  accuracy(myseries)

## # A tibble: 2 x 12
##   .model  State  Industry  .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>   <chr>  <chr>     <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 multip~ Tasma~ Clothing~ Test   1.50  3.37  2.57  5.07  9.90  1.88  1.91 0.642
## 2 multip~ Tasma~ Clothing~ Test   1.69  3.60  2.70  5.77 10.4   1.98  2.05 0.659

cat("Improvement")

## Improvement

fit1_train %>%
          accuracy()

## # A tibble: 1 x 12
##   State  Industry   .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr>      <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Tasma~ Clothing,~ SNAIV~ Trai~ 0.532  1.76  1.37  3.03  8.57     1     1 0.597

Exercise 9. For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

dcmp <- train %>%
  model(stl = STL(Turnover))
components(dcmp)

## # A dable: 345 x 9 [1M]
## # Key:     State, Industry, .model [1]
## # :        Turnover = trend + season_year + remainder
##    State  Industry          .model    Month Turnover trend season_year remainder
##    <chr>  <chr>             <chr>     <mth>    <dbl> <dbl>       <dbl>     <dbl>
##  1 Tasma~ Clothing, footwe~ stl    1982 Apr      8.7  8.69     -0.0948    0.101 
##  2 Tasma~ Clothing, footwe~ stl    1982 May      9.3  8.67      0.685    -0.0588
##  3 Tasma~ Clothing, footwe~ stl    1982 Jun      8.6  8.65     -0.158     0.104 
##  4 Tasma~ Clothing, footwe~ stl    1982 Jul      8.9  8.63     -0.296     0.561 
##  5 Tasma~ Clothing, footwe~ stl    1982 Aug      7.5  8.62     -1.09     -0.0312
##  6 Tasma~ Clothing, footwe~ stl    1982 Sep      7.5  8.61     -1.16      0.0500
##  7 Tasma~ Clothing, footwe~ stl    1982 Oct      7    8.60     -0.790    -0.807 
##  8 Tasma~ Clothing, footwe~ stl    1982 Nov      9    8.59      0.333     0.0803
##  9 Tasma~ Clothing, footwe~ stl    1982 Dec     13.8  8.58      5.60     -0.372 
## 10 Tasma~ Clothing, footwe~ stl    1983 Jan      6.8  8.57     -1.51     -0.257 
## # ... with 335 more rows, and 1 more variable: season_adjust <dbl>

components(dcmp) %>% autoplot()

min(myseries$Month)

## <yearmonth[1]>
## [1] "1982 Apr"

#train <- ts(as.vector(train), start = c(1982, 4), end = c(2010,12), frequency = 12 )
trains <- ts(data=train$Turnover, frequency=12)

# fit_train1 <-trains %>%
#                   stlm(
#                   s.window = 13,
#                   robust = TRUE,
#                    method = "ets",
#                 lambda1 = BoxCox.lambda(trains)
#   ) %>%
#   forecast(
#     h = 10,
#     lambda1 = BoxCox.lambda(trains)
#     )
#fit_train1 %>%
#  accuracy()

something wrong with lambdag

Data624_HW5

Alexis Mekueko

10/10/2021

Exercise 1. Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

Exercise 5. Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

Exercise 7. Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

Exercise 8. Recall your retail time series data (from Exercise 8 in Section 2.10).

Exercise 9. For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?