Optimal values: alpha = 0.3579, l0 = 95487.5.
library(fpp3)  # loads fable, feasts, tsibble, and tsibbledata (aus_livestock etc.)

avp <- aus_livestock %>% filter(Animal == "Pigs", State == "Victoria")
fit <- avp %>%
  model(
    si = ETS(Count)
  )
r <- fit %>% report()
## Series: Count
## Model: ETS(A,N,A)
## Smoothing parameters:
## alpha = 0.3579401
## gamma = 0.0001000139
##
## Initial states:
## l[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5] s[-6]
## 95487.5 2066.235 6717.311 -2809.562 -1341.299 -7750.173 -8483.418 5610.89
## s[-7] s[-8] s[-9] s[-10] s[-11]
## -579.8107 1215.464 -2827.091 1739.465 6441.989
##
## sigma^2: 60742898
##
## AIC AICc BIC
## 13545.38 13546.26 13610.24
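The estimated level and seasonal states behind these numbers can also be inspected graphically; a minimal sketch using fabletools' components():
# Decompose the fitted ETS model into its states and plot them
components(fit) %>% autoplot()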
f <- fit %>% forecast(h = 4)
f
## # A fable: 4 x 6 [1M]
## # Key: Animal, State, .model [1]
## Animal State .model Month Count .mean
## <fct> <fct> <chr> <mth> <dist> <dbl>
## 1 Pigs Victoria si 2019 Jan N(84425, 6.1e+07) 84425.
## 2 Pigs Victoria si 2019 Feb N(85158, 6.9e+07) 85158.
## 3 Pigs Victoria si 2019 Mar N(91567, 7.6e+07) 91567.
## 4 Pigs Victoria si 2019 Apr N(90098, 8.4e+07) 90098.
f %>% autoplot(avp %>% filter(Month >= yearmonth("2018 Jan")))
My CI is slightly narrower than R's; both look realistic given the past values.
s <- augment(fit) %>% pull(.resid) %>% sd()  # residual standard deviation
y <- f %>% pull(.mean) %>% head(1)           # first forecast mean
lbound <- y - (1.96 * s)
ubound <- y + (1.96 * s)
print(paste("My 95% CI : ", lbound, " ", ubound))
## [1] "My 95% CI : 69328.2490079102 99521.1652255165"
print(paste("R's 95% CI : ", f %>% pull(Count) %>% head(1)))
## [1] "R's 95% CI : N(84425, 6.1e+07)"
fr <- global_economy %>% filter(Country == "France")
The data is clearly trending upward over time, with fluctuations including a large dip in exports in the 1990s.
autoplot(fr, Exports) +
labs(title = "French Exports by % of GDP",
y = "% of GDP")
fit <- fr %>%
  model(
    ANN = ETS(Exports ~ error("A") + trend("N") + season("N"))
  )
r <- fit %>% report()
## Series: Exports
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.9998995
##
## Initial states:
## l[0]
## 14.3989
##
## sigma^2: 1.3745
##
## AIC AICc BIC
## 257.9200 258.3644 264.1013
f <- fit %>% forecast(h = 10)
f
## # A fable: 10 x 5 [1Y]
## # Key: Country, .model [1]
## Country .model Year Exports .mean
## <fct> <chr> <dbl> <dist> <dbl>
## 1 France ANN 2018 N(31, 1.4) 30.9
## 2 France ANN 2019 N(31, 2.7) 30.9
## 3 France ANN 2020 N(31, 4.1) 30.9
## 4 France ANN 2021 N(31, 5.5) 30.9
## 5 France ANN 2022 N(31, 6.9) 30.9
## 6 France ANN 2023 N(31, 8.2) 30.9
## 7 France ANN 2024 N(31, 9.6) 30.9
## 8 France ANN 2025 N(31, 11) 30.9
## 9 France ANN 2026 N(31, 12) 30.9
## 10 France ANN 2027 N(31, 14) 30.9
f %>% autoplot(fr) +
labs(title = "Forecast of French Exports by % of GDP",
y = "% of GDP")
RMSE = 1.152003
fit %>% accuracy()
## # A tibble: 1 x 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 France ANN Training 0.284 1.15 0.840 1.17 3.86 0.983 0.991 -0.00542
The AAN trended model accounts for the trend in the dataset. Since French exports as a percentage of GDP are trending strongly upward, the forecasts of the trended model will trend much more than those of the non-trended model. In this case the RMSE offers a small edge to the trended method. The trended method deserves strong consideration for this dataset because the series has trended strongly in recent years; however, it has not always trended historically, which lends some credence to using a different model. Even so, I think the trended model is the clear choice here.
fit2 <- fr %>%
  model(
    AAN = ETS(Exports ~ error("A") + trend("A") + season("N"))
  )
fit %>% accuracy()
## # A tibble: 1 x 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 France ANN Training 0.284 1.15 0.840 1.17 3.86 0.983 0.991 -0.00542
fit2 %>% accuracy()
## # A tibble: 1 x 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 France AAN Training -0.0111 1.12 0.800 -0.243 3.81 0.936 0.964 0.0264
f2 <- fit2 %>% forecast(h = 10)
f %>% autoplot(fr) +
  labs(title = "Forecast of French Exports by % of GDP, ANN Model",
       y = "% of GDP")
f2 %>% autoplot(fr) +
  labs(title = "Forecast of French Exports by % of GDP, AAN Model",
       y = "% of GDP")
I think the forecasts from the trended model, especially when graphed, are much more in line with the historical behaviour of French exports as a share of GDP. Considering the RMSE and MAE as well, the trended method is the better model.
Once again my intervals are slightly narrower than R's.
ANNRMSE <- fit %>% accuracy() %>% pull(RMSE)   # training RMSE, ANN model
AANRMSE <- fit2 %>% accuracy() %>% pull(RMSE)  # training RMSE, AAN model
y <- f %>% pull(.mean) %>% head(1)
lbound <- y - (1.96 * ANNRMSE)
ubound <- y + (1.96 * ANNRMSE)
print(paste("My 95% CI ANN: ", lbound, " ", ubound))
## [1] "My 95% CI ANN: 28.6238549283032 33.1397076931717"
print(paste("R's 95% CI ANN: ",f%>%pull(Exports)%>%head(1)))
## [1] "R's 95% CI ANN: N(31, 1.4)"
y <- f2 %>% pull(.mean) %>% head(1)
lbound <- y - (1.96 * AANRMSE)
ubound <- y + (1.96 * AANRMSE)
print(paste("My 95% CI AAN: ", lbound, " ", ubound))
## [1] "My 95% CI AAN: 28.9789600391914 33.3692025588516"
print(paste("R's 95% CI AAN: ",f%>%pull(Exports)%>%head(1)))
## [1] "R's 95% CI AAN: N(31, 1.4)"
We can see that Box-Cox transformed data tends to cause the forecasts to trend slightly upward compared to the untransformed series. Given the rapid increase in Chinese GDP, a Box-Cox trended model is likely the best choice.
ch <- global_economy %>% filter(Country == "China")
lambda <- ch %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
fit <- ch %>%
  model(
    ANN = ETS(GDP ~ error("A") + trend("N") + season("N")),
    AAN = ETS(GDP ~ error("A") + trend("A") + season("N")),
    Damped = ETS(GDP ~ error("A") + trend("Ad") + season("N")),
    bc_ANN = ETS(box_cox(GDP, lambda) ~ error("A") + trend("N") + season("N")),
    bc_AAN = ETS(box_cox(GDP, lambda) ~ error("A") + trend("A") + season("N")),
    bc_Damped = ETS(box_cox(GDP, lambda) ~ error("A") + trend("Ad") + season("N"))
  )
fit %>% glance()  # report() summarizes only a single model; glance() compares all six
f <- fit %>% forecast(h = 10)
f %>% autoplot(ch) +
labs(title = "Forecasts of Chinese GDP",
y = "GDP")
It is obvious that seasonality is needed in some form here due to the fluctuations. Multiplicative seasonality is preferred because the range of the seasonal peaks increases as time goes on. Adding damping makes the models slightly more accurate. Interestingly, damping the additive model produces the lowest RMSE, but given what we know about the growing seasonal range, the multiplicative models should be more reliable, which makes multiplicative_damp the best model.
fit <- aus_production %>%
model(
additive = ETS(Gas ~ error("A") + trend("A") +
season("A")),
multiplicative = ETS(Gas ~ error("M") + trend("A") +
season("M")),
additive_damp = ETS(Gas ~ error("A") + trend("Ad") +
season("A")),
multiplicative_damp = ETS(Gas ~ error("M") + trend("Ad") +
season("M"))
)
fc <- fit %>% forecast(h = 15)
fc %>%
autoplot(aus_production, level = NULL) +
labs(title="Australian Gas Production",
y="Gas Production in petajoules") +
guides(colour = guide_legend(title = "Forecast"))
fit %>% accuracy()
## # A tibble: 4 x 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 additive Traini… 0.00525 4.76 3.35 -4.69 10.9 0.600 0.628 0.0772
## 2 multiplicative Traini… -0.115 4.60 3.02 0.199 4.08 0.542 0.606 -0.0131
## 3 additive_damp Traini… 0.600 4.58 3.16 0.460 7.39 0.566 0.604 0.0660
## 4 multiplicative_… Traini… -0.00439 4.59 3.03 0.326 4.10 0.544 0.606 -0.0217
For the retail series, multiplicative seasonality is again necessary because the range of the seasonal peaks increases as time goes on.
set.seed(19865)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
autoplot(myseries, Turnover) +
labs(title = "Retail Turnover",
y = "$Million AUD")
fit <- myseries %>%
model(
additive = ETS(Turnover ~ error("A") + trend("A") +
season("A")),
multiplicative = ETS(Turnover ~ error("M") + trend("A") +
season("M")),
additive_damp = ETS(Turnover ~ error("A") + trend("Ad") +
season("A")),
multiplicative_damp = ETS(Turnover ~ error("M") + trend("Ad") +
season("M"))
)
fc <- fit %>% forecast(h = 15)
fc %>%
autoplot(myseries, level = NULL) +
labs(title="Australian Retail Turnover in $Million AUD",
y="$Million AUD") +
guides(colour = guide_legend(title = "Forecast"))
fit %>% accuracy()
## # A tibble: 4 x 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 South … Other ret… additi… Trai… -0.0871 6.10 4.52 -0.482 3.98 0.546 0.569
## 2 South … Other ret… multip… Trai… 0.152 5.45 3.80 -0.0486 2.82 0.458 0.508
## 3 South … Other ret… additi… Trai… 0.858 6.26 4.65 0.840 4.16 0.561 0.584
## 4 South … Other ret… multip… Trai… 0.482 5.48 3.82 0.325 2.84 0.461 0.511
## # … with 1 more variable: ACF1 <dbl>
The RMSE is clearly better for the multiplicative models than for the additive ones.
fit <- myseries %>%
  model(
    multiplicative_damp = ETS(Turnover ~ error("M") + trend("Ad") +
                                season("M"))
  )
fit %>% gg_tsresiduals()
The ACF plot shows several significant spikes, indicating the residuals are not white noise; a Ljung-Box test (sketched below) quantifies this.
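A minimal sketch using the feasts ljung_box feature, assuming lag = 24 (2m for monthly data):
# Small p-value => residuals are distinguishable from white noise
augment(fit) %>% features(.innov, ljung_box, lag = 24)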
Based on RMSE, these models beat the seasonal naive approach from Chapter 5.
train <- myseries %>% filter(year(Month) <= 2010)
test <- myseries %>% filter(year(Month) > 2010)
fit <- train %>%
model(
additive = ETS(Turnover ~ error("A") + trend("A") +
season("A")),
multiplicative = ETS(Turnover ~ error("M") + trend("A") +
season("M")),
additive_damp = ETS(Turnover ~ error("A") + trend("Ad") +
season("A")),
multiplicative_damp = ETS(Turnover ~ error("M") + trend("Ad") +
season("M"))
)
f <- fit %>% forecast(test)
f %>%
autoplot(myseries, level = NULL) +
labs(title="Australian Retail Turnover in $Million AUD",
y="$Million AUD") +
guides(colour = guide_legend(title = "Forecast"))
fit2 <- train %>%
model(SNAIVE(Turnover))
fc <- fit2 %>%
forecast(new_data = anti_join(myseries, train))
fc %>% autoplot(myseries)
fit %>% accuracy()
## # A tibble: 4 x 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 South … Other ret… additi… Trai… -0.0223 5.36 3.98 -0.497 4.07 0.531 0.564
## 2 South … Other ret… multip… Trai… 0.157 4.09 3.05 -0.0938 2.86 0.407 0.431
## 3 South … Other ret… additi… Trai… 0.701 5.55 4.08 0.516 4.16 0.546 0.584
## 4 South … Other ret… multip… Trai… 0.424 4.15 3.10 0.284 2.90 0.413 0.437
## # … with 1 more variable: ACF1 <dbl>
fit2 %>% accuracy()
## # A tibble: 1 x 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 South… Other re… SNAIVE… Trai… 6.31 9.50 7.49 5.84 6.97 1 1 0.605
fc %>% accuracy(myseries)
## # A tibble: 1 x 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(… South… Other r… Test 21.8 33.6 25.6 8.21 10.1 3.42 3.53 0.907
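Note that the ETS table above is training accuracy while the SNAIVE row just shown is test accuracy; for a like-for-like comparison, the ETS test accuracy can be computed from the forecasts already stored in f (a sketch):
# Test-set accuracy for the four ETS models, comparable to the SNAIVE test row
f %>% accuracy(myseries)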
The STL decomposition with a Box-Cox transformation is the most accurate on the training set.
lambda <- train %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
fit <- train %>%
  model(
    STL_BC = STL(box_cox(Turnover, lambda)),
    multiplicative = ETS(Turnover ~ error("M") + trend("A") +
                           season("M")),
    ETS_BC = ETS(box_cox(Turnover, lambda) ~ error("M") + trend("A") +
                   season("M"))
  )
fit %>% accuracy()
## # A tibble: 3 x 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 South … Other re… STL_BC Trai… 0.0869 3.36 2.41 0.00184 2.22 0.321 0.353
## 2 South … Other re… multip… Trai… 0.157 4.09 3.05 -0.0938 2.86 0.407 0.431
## 3 South … Other re… ETS_BC Trai… -0.480 4.77 3.23 -0.341 2.92 0.431 0.503
## # … with 1 more variable: ACF1 <dbl>
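One caveat: STL() on its own only decomposes the series and, to my knowledge, cannot generate forecasts directly. To forecast with it, fabletools' decomposition_model() can wrap the decomposition with a model for the seasonally adjusted series; a sketch, using ETS for season_adjust (the seasonal components default to SNAIVE):
# Combine the STL decomposition with an ETS model on the seasonally
# adjusted series so the decomposition can actually be forecast
fit_dcmp <- train %>%
  model(
    STL_BC_fc = decomposition_model(
      STL(box_cox(Turnover, lambda)),
      ETS(season_adjust)
    )
  )
fit_dcmp %>% forecast(test) %>% accuracy(myseries)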