a-Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ_0 , and generate forecasts for the next four months.
#head(aus_livestock)
View(aus_livestock)
#view(aus_livestock$Animal [18000:18100]) # %>% filter(Animal =="Lambs"))
min(aus_livestock$Month)
## <yearmonth[1]>
## [1] "1972 Jul"
aus3 <- aus_livestock %>%
filter(Animal == "Pigs", State == "Victoria")
fit <- aus3$Count %>%
ses(h=4)
fit$model
## Simple exponential smoothing
##
## Call:
## ses(y = ., h = 4)
##
## Smoothing parameters:
## alpha = 0.322
##
## Initial states:
## l = 100646.6098
##
## sigma: 9353.115
##
## AIC AICc BIC
## 13737.10 13737.14 13750.07
autoplot(fit)
Based on the Simple Exponential Smoothing (ses) call, α = 0.322, l = 100646.6098
b-Compute a 95% prediction interval for the first forecast using ^y±1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.
predi_low <- fit$mean - 1.96*(sd(fit$residuals)) #76871.23
predi_high <- fit$mean + 1.96*(sd(fit$residuals)) #113502.3
predi_high; predi_low
## Time Series:
## Start = 559
## End = 562
## Frequency = 1
## [1] 113502.3 113502.3 113502.3 113502.3
## Time Series:
## Start = 559
## End = 562
## Frequency = 1
## [1] 76871.23 76871.23 76871.23 76871.23
fit #76855.01....113518.5
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 559 95186.77 83200.27 107173.3 76855.01 113518.5
## 560 95186.77 82594.29 107779.3 75928.23 114445.3
## 561 95186.77 82016.16 108357.4 75044.05 115329.5
## 562 95186.77 81462.36 108911.2 74197.09 116176.5
Predicted interval = [76871.23, 113502.3] Interval produced by R = [76855.01, 113518.5] if we round these intervals, there is no delta…we fall into the same confidence interval. But if we consider the entire value, the interval produced by R is more narrowed than we obtained with predicted interval.
a- Plot the Exports series and discuss the main features of the data.
??global_economy
## starting httpd help server ... done
aus4 <- global_economy %>%
filter(Country == "Australia")#%>%
#model(RW(Population ~ drift())) %>%
#forecast(h = "5 years") %>%
#view(aus4)
#autoplot(global_economy) + labs(y = "Number of People", title = "Population of Australia from 1960 to 2025")
global_economy %>%
filter(Country == "Australia") %>%
autoplot(Exports) +
labs(y = "Percentage of Export relatively to GDP", title = "Exports of goods and services (% of GDP). of Australia from 1960 to 2020")
aus4a <- global_economy %>%
filter(Country == "Australia") %>%
#mutate(GDP = round(GDP,5))%>%
mutate(Exports_R = round(((GDP*Exports)/100), 5))%>%
dplyr::select(Year, GDP, Exports_R)
#view(aus4a)
#plot.ts(aus4a, plot.type = "single")
aus4a1 <- aus4a %>%
gather(key = "variable", value = "value", -Year)
ggplot(aus4a1, aes(x = Year, y = value)) +
geom_line(aes(color = variable)) +
scale_color_manual(values = c("darkred", "steelblue"))+labs(y = "Percentage of Export relatively to GDP", title = "Exports of goods and services (% of GDP). of Australia from 1960 to 2020")
#ts.plot(aus4a,gpars= list(col=rainbow(3))) #, main = "Exports of goods and services (% of GDP). of Australia from 1960 to 2020", xlab = "Years", las=3, ylab = "Percentage of Export relatively to GDP")
#legend("right", names(aus4a[2:3]), col=2:3, lty=1)
#title("Exports of goods and services (% of GDP). of Australia from 1960 to 2020")
#max(aus4$Year)
We chose the Australia data and we observed that in the long run the GDP increased proportionally to the exports. So, overall Australia did good from 1960 to 2017. In fact, around year of 2000-2012, Australia had a sudden peak in their GDP which is explained by the rise of their exports. There were few recessions but didn’t last long. Australia economy shows a fast recovery.
b- Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.
fit <- aus4 %>%
model(ETS(Exports ~ error("A") + trend("N") + season("N")))
fc <- fit %>%
forecast(h = 5)
fc %>%
autoplot(aus4)+labs(y = "Percentage of Export relatively to GDP", title = "Exports of goods and services (% of GDP). of Australia from 1960 to 2020")
c-Compute the RMSE values for the training data.
fit %>%
accuracy()
## # A tibble: 1 x 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Austral~ "ETS(Exports ~ Trai~ 0.232 1.15 0.914 1.09 5.41 0.928 0.928 0.0125
RMSE = 1.146794
d-Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.
fit1 <- aus4 %>%
model(ETS(Exports ~ error("A") + trend("A") + season("N")))
fc1 <- fit1 %>%
forecast(h = 5)
fc1 %>%
autoplot(aus4)+labs(y = "Percentage of Export relatively to GDP", title = "Exports of goods and services (% of GDP). of Australia from 1960 to 2020")
fit1 %>%
accuracy()
## # A tibble: 1 x 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Austral~ "ETS(Expor~ Trai~ -7.46e-7 1.12 0.893 -0.387 5.39 0.907 0.904 0.109
Well, the ETS(A,A,N)model shows an upward trend line compared to the ETS(A,N,N)model. We think the ETS(A,A,N) model is the best.
f - Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.
predi_low_ANN <- fc$.mean - 1.96*(1.146794) #18.35944
predi_high_ANN <- fc$.mean + 1.96*(1.146794) #22.85488
predi_low_AAN <- fc1$.mean - 1.96*(1.116727) #19.19850
predi_high_AAN <- fc1$.mean + 1.96*(1.116727) #23.57607
#fc$Exports #N(21, 1.4)
#fc1$Exports #N(21, 1.3)
Predicted interval ETS(ANN)model = [18.35944,22.85488] Interval produced by R = N(21, 1.4)
Predicted interval ETS(AAN)model = [19.19850,23.57607] Interval produced by R = N(21, 1.3)
[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]
china <- global_economy %>%
filter(Country == "China")#%>%
china %>%
autoplot(GDP) +
labs(y = "Gross domestic product (in $USD).", title = "Gross domestic product of China from 1960 to 2020")
China GDP shows an exponential growth from year 2000+. No wonder, China is the factory of the world.
#lambda1 <- BoxCox.lambda(china$GDP)
#Box_cox = ETS (BoxCox(GDP, lambda1) ~ error("A") + trend("A") + season("N")),
fit <- china %>%
#stretch_tsibble(.init = 10) %>%
model(
SES = ETS(GDP ~ error("A") + trend("N") + season("N")),
Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
Damped = ETS(GDP ~ error("A") + trend("Ad") +
season("N"))
) %>%
forecast(h = 23) #%>% accuracy(china)
fit %>%
autoplot(china)+
labs(y = "Gross domestic product (in $USD).", title = "Gross domestic product of China from 1960 to 2040")
As we can see, the Holt model shows a continuous exponential growth of China’s GDP.
??aus_production
aus1 <- aus_production
sum(is.na(aus1$Gas))
## [1] 0
# not good , need a fix...aus1 %>% filter(!is.na(Bricks))
#aus2 <- aus1 %>% na.omit(aus1$Gas)
#mydata = mydata[complete.cases(mydata), ]
#aus1 %>% drop_na(Bricks)
aus1 %>%
autoplot(Gas)+ labs(y = "Gas production in petajoules", title = "Quarterly estimates of Gas production in Australia")
We don’t see a constant peak but rather an exponential trend for Australia Gas production. So,the multiplicative seasonality is more suitable.
# Short_hand Method
# (N,N) Simple exponential smoothing
# (A,N) Holt’s linear method
# (Ad,N) Additive damped trend method
# (A,A) Additive Holt-Winters’ method
# (A,M) Multiplicative Holt-Winters’ method
# (Ad,M) Holt-Winters’ damped method
fit <- aus1 %>%
#stretch_tsibble(.init = 10) %>%
model(
#additive = ETS(Gas ~ error("A") + trend("A") +
# season("A")),
multiplicative = ETS(Gas ~ error("M") + trend("A") +
season("M")),
additive_damped_simple = ETS(Gas ~ error("A") + trend("Ad") +
season("N")),
multiplicative_damped_simple = ETS(Gas ~ error("M") + trend("Ad") +
season("M"))
#SES = ETS(GDP ~ error("A") + trend("N") + season("N")),
#Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
#Damped = ETS(Gas ~ error("A") + trend("Ad") + season("N"))
) %>%
forecast(h = 10) #%>% accuracy(china)
fit %>%
autoplot(aus1, level = NULL)+
labs(y = "Gas production in petajoules", title = "Quarterly estimates of Gas production in Australia")
We clearly see that the multiplicative seasonality is suitable for the forecasting of Australia Gas production.
a-Why is multiplicative seasonality necessary for this series?
view(aus_retail)
#head(aus_retail)
??aus_retail
set.seed(1278)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
#aus8 <- aus_retail %>%
# filter(Industry == "Takeaway food services")
myseries %>%
autoplot(Turnover)+
labs(y = "Retail turnover in $Million AU", title = " Australian Retail Trade Turnover")
We don’t see a constant peak, therefore, the multiplicative seasonality is more suitable.
b-Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
fit <- myseries %>%
#stretch_tsibble(.init = 10) %>%
model(
additive = ETS(Turnover ~ error("A") + trend("A") +
season("A")),
multiplicative = ETS(Turnover ~ error("M") + trend("A") +
season("M")),
#additive_damped_simple = ETS(Gas ~ error("A") + trend("Ad") +
# season("N")),
multiplicative_damped_simple = ETS(Turnover ~ error("M") + trend("Ad") +
season("M"))
#SES = ETS(GDP ~ error("A") + trend("N") + season("N")),
#Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
#Damped = ETS(Gas ~ error("A") + trend("Ad") + season("N"))
) %>%
forecast(h = 10) #%>% accuracy(china)
fit %>%
autoplot(myseries, level = NULL)+
labs(y = "Retail turnover in $Million AU", title = " Australian Retail Trade Turnover")
c-Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
series1 <- myseries %>%
stretch_tsibble(.init = 10) %>%
model(
additive = ETS(Turnover ~ error("A") + trend("A") +
season("A")),
multiplicative = ETS(Turnover ~ error("M") + trend("A") +
season("M")),
#additive_damped_simple = ETS(Gas ~ error("A") + trend("Ad") +
# season("N")),
multiplicative_damped_simple = ETS(Turnover ~ error("M") + trend("Ad") +
season("M"))
#SES = ETS(GDP ~ error("A") + trend("N") + season("N")),
#Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
#Damped = ETS(Gas ~ error("A") + trend("Ad") + season("N"))
) %>%
forecast(h = 1) %>%
accuracy(myseries)
## Warning: 8 errors (2 unique) encountered for additive
## [3] A seasonal ETS model cannot be used for this data.
## [5] Not enough data to estimate this ETS model.
## Warning: 8 errors (2 unique) encountered for multiplicative
## [3] A seasonal ETS model cannot be used for this data.
## [5] Not enough data to estimate this ETS model.
## Warning: 9 errors (2 unique) encountered for multiplicative_damped_simple
## [3] A seasonal ETS model cannot be used for this data.
## [6] Not enough data to estimate this ETS model.
## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing.
## 1 observation is missing at 2019 Jan
series1
## # A tibble: 3 x 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 additive Tasm~ Clothing,~ Test -0.0168 1.74 1.17 -0.664 5.92 0.625 0.630
## 2 multipl~ Tasm~ Clothing,~ Test -0.00611 1.59 1.11 -0.447 5.78 0.592 0.573
## 3 multipl~ Tasm~ Clothing,~ Test 0.0843 1.59 1.10 0.0922 5.69 0.587 0.573
## # ... with 1 more variable: ACF1 <dbl>
d- Check that the residuals from the best method look like white noise.
myseries %>%
# stretch_tsibble(.init = 10) %>%
model(
# additive = ETS(Turnover ~ error("A") + trend("A") +
# season("A")),
#multiplicative = ETS(Turnover ~ error("M") + trend("A") +
# season("M")),
#additive_damped_simple = ETS(Gas ~ error("A") + trend("Ad") +
# season("N")),
multiplicative_damped_simple = ETS(Turnover ~ error("M") + trend("Ad") +
season("M"))
#SES = ETS(GDP ~ error("A") + trend("N") + season("N")),
#Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
#Damped = ETS(Gas ~ error("A") + trend("Ad") + season("N"))
) %>%
gg_tsresiduals()
#checkresiduals()
white noise distribution is any distribution that has:
Zero mean
A constant variance/standard deviation (does not change over time)
Zero autocorrelation at all lags
e-Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?
train <- myseries %>%
filter(year(Month)<=2010)
fit_train <- train %>%
#stretch_tsibble(.init = 10) %>%
model(
#additive = ETS(Turnover ~ error("A") + trend("A") +
# season("A")),
multiplicative = ETS(Turnover ~ error("M") + trend("A") +
season("M")),
#additive_damped_simple = ETS(Gas ~ error("A") + trend("Ad") +
# season("N")),
multiplicative_damped_simple = ETS(Turnover ~ error("M") + trend("Ad") +
season("M"))
#SES = ETS(GDP ~ error("A") + trend("N") + season("N")),
#Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
#Damped = ETS(Gas ~ error("A") + trend("Ad") + season("N"))
) %>%
forecast(h = 10) #%>% accuracy(china)
fit_train %>%
autoplot(train, level = NULL)+
labs(y = "Retail turnover in $Million AU", title = " Australian Retail Trade Turnover")
test <- myseries %>%
filter(year(Month)>2010)
fit1_train <- train %>%
model(SNAIVE(Turnover))
fc <- fit1_train %>%
forecast(new_data = anti_join(myseries, train))
## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")
fit_train %>%
accuracy(myseries)
## # A tibble: 2 x 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 multip~ Tasma~ Clothing~ Test 1.50 3.37 2.57 5.07 9.90 1.88 1.91 0.642
## 2 multip~ Tasma~ Clothing~ Test 1.69 3.60 2.70 5.77 10.4 1.98 2.05 0.659
cat("Improvement")
## Improvement
fit1_train %>%
accuracy()
## # A tibble: 1 x 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Tasma~ Clothing,~ SNAIV~ Trai~ 0.532 1.76 1.37 3.03 8.57 1 1 0.597
dcmp <- train %>%
model(stl = STL(Turnover))
components(dcmp)
## # A dable: 345 x 9 [1M]
## # Key: State, Industry, .model [1]
## # : Turnover = trend + season_year + remainder
## State Industry .model Month Turnover trend season_year remainder
## <chr> <chr> <chr> <mth> <dbl> <dbl> <dbl> <dbl>
## 1 Tasma~ Clothing, footwe~ stl 1982 Apr 8.7 8.69 -0.0948 0.101
## 2 Tasma~ Clothing, footwe~ stl 1982 May 9.3 8.67 0.685 -0.0588
## 3 Tasma~ Clothing, footwe~ stl 1982 Jun 8.6 8.65 -0.158 0.104
## 4 Tasma~ Clothing, footwe~ stl 1982 Jul 8.9 8.63 -0.296 0.561
## 5 Tasma~ Clothing, footwe~ stl 1982 Aug 7.5 8.62 -1.09 -0.0312
## 6 Tasma~ Clothing, footwe~ stl 1982 Sep 7.5 8.61 -1.16 0.0500
## 7 Tasma~ Clothing, footwe~ stl 1982 Oct 7 8.60 -0.790 -0.807
## 8 Tasma~ Clothing, footwe~ stl 1982 Nov 9 8.59 0.333 0.0803
## 9 Tasma~ Clothing, footwe~ stl 1982 Dec 13.8 8.58 5.60 -0.372
## 10 Tasma~ Clothing, footwe~ stl 1983 Jan 6.8 8.57 -1.51 -0.257
## # ... with 335 more rows, and 1 more variable: season_adjust <dbl>
components(dcmp) %>% autoplot()
min(myseries$Month)
## <yearmonth[1]>
## [1] "1982 Apr"
#train <- ts(as.vector(train), start = c(1982, 4), end = c(2010,12), frequency = 12 )
trains <- ts(data=train$Turnover, frequency=12)
# fit_train1 <-trains %>%
# stlm(
# s.window = 13,
# robust = TRUE,
# method = "ets",
# lambda1 = BoxCox.lambda(trains)
# ) %>%
# forecast(
# h = 10,
# lambda1 = BoxCox.lambda(trains)
# )
#fit_train1 %>%
# accuracy()
something wrong with lambdag