Chapter 7 & 8 Questions

Calling all libraries

library(fpp)
## Loading required package: forecast
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Loading required package: fma
## Loading required package: expsmooth
## Loading required package: lmtest
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: tseries
library(fpp2)
## -- Attaching packages ---------------------------------------------- fpp2 2.4 --
## v ggplot2 3.3.5
## 
## 
## Attaching package: 'fpp2'
## The following objects are masked from 'package:fpp':
## 
##     ausair, ausbeer, austa, austourists, debitcards, departures,
##     elecequip, euretail, guinearice, oil, sunspotarea, usmelec
library(fpp3)
## -- Attaching packages -------------------------------------------- fpp3 0.4.0 --
## v tibble      3.1.5      v tsibble     1.1.0 
## v dplyr       1.0.7      v tsibbledata 0.3.0 
## v tidyr       1.1.4      v feasts      0.2.2 
## v lubridate   1.7.10     v fable       0.3.1
## -- Conflicts ------------------------------------------------- fpp3_conflicts --
## x lubridate::date()      masks base::date()
## x dplyr::filter()        masks stats::filter()
## x fabletools::forecast() masks forecast::forecast()
## x tsibble::index()       masks zoo::index()
## x tsibble::intersect()   masks base::intersect()
## x tsibble::interval()    masks lubridate::interval()
## x dplyr::lag()           masks stats::lag()
## x tsibble::setdiff()     masks base::setdiff()
## x tsibble::union()       masks base::union()
## 
## Attaching package: 'fpp3'
## The following object is masked from 'package:fpp2':
## 
##     insurance
## The following object is masked from 'package:fpp':
## 
##     insurance
library(forecast)
library(grid)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
library(seasonal)
## 
## Attaching package: 'seasonal'
## The following object is masked from 'package:tibble':
## 
##     view
library(urca)
library(Quandl)
## Loading required package: xts
## 
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
## 
##     first, last

Chapter 7

1. Consider the “pigs” series - the number of pigs slaughtered in Victoria each month. a. Use the ses() function in R to find the optimal values of alpha and l (the initial level), and generate forecasts for the next four months.

fc = ses(pigs, h = 4)
summary(fc)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665 
## 
## Error measures:
##                    ME    RMSE      MAE       MPE     MAPE      MASE       ACF1
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249 0.01282239
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Sep 1995       98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995       98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995       98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995       98816.41 83958.37 113674.4 76092.99 121539.8

Answer: alpha = 0.2971 l = 77260.0561

  1. Compute a 95% prediction interval for the first forecast using y hat +/- 1.96s where s = standard deviation of the residuals. Compare your interval with the interval produced by R.
# Prediction Interval produced by R
c(fc$upper[1, "95%"],fc$lower[1, "95%"])
##       95%       95% 
## 119020.84  78611.97
# Manual Prediction Interval
s = sd(fc$residuals)
c(fc$mean[1] + 1.96*s, fc$mean[1] - 1.96*s)
## [1] 118952.84  78679.97

Answer: The manually computed 95% prediction interval is slightly narrower than the interval produced by R, because R's interval uses the model's estimate of sigma rather than the sample standard deviation of the residuals.

  1. Write your own function to implement simple exponential smoothing. The function should take arguments y (the time series), alpha (smoothing parameter) and level (the initial level). It should return the forecast of the next observation in the series. Does it give the same forecast as ses()?
ses_manual = function(y, alpha, level){
  x = level
  for (i in 1:length(y)) {
    x = alpha*y[i] + (1 - alpha)*x
  }
  return(x)  # return the one-step-ahead forecast (so the function can be reused later)
}
# Find Alpha and Initial Level of fc
a = fc$model$par[1]
l = fc$model$par[2]

# run ses_manual()
ses_manual(y = pigs, alpha = a, level = l)
##    alpha 
## 98816.41

Answer: The ses_manual() function computed the same forecast as ses().

  1. Modify your function from the previous exercise to return the sum of squared errors rather than the forecast of the next observation. Then use the optim() function to find the optimal values of alpha and l. Do you get the same values as the ses() function?
ses_sse = function(y, alpha, level){
  SSE = 0
  x = level
  for (i in 1:length(y)) {
    error = y[i] - x
    SSE = SSE + error^2
    x = alpha*y[i] + (1 - alpha)*x
  }
  return(SSE)  # return the SSE (so the function can be passed to optim())
}
# Find Alpha and Initial Level of fc
a = fc$model$par[1]
l = fc$model$par[2]

ses_sse(y = pigs, alpha = a, level = l)
##           l 
## 19765613407
# using optim() to find the optimal values of alpha and l
# (a sketch: ses_sse is wrapped so that optim() can vary alpha and level together)
opt = optim(par = c(0.5, pigs[1]), fn = function(par) ses_sse(pigs, par[1], par[2]))
opt$par

The optimized values of alpha and l should be close to those reported by ses() above.

  1. Combine your previous two functions to produce a function which both finds the optimal values of alpha and the initial level, and produces a forecast of the next observation in the series. (A sketch follows below.)
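A sketch for this part (not from the original write-up): ses_optimal() is a hypothetical helper that minimizes the SSE with optim() and then calls ses_manual() with the optimized parameters.

ses_optimal = function(y) {
  # minimize the SSE over alpha and the initial level
  opt = optim(par = c(0.5, y[1]), fn = function(par) ses_sse(y, par[1], par[2]))
  alpha = opt$par[1]
  level = opt$par[2]
  # one-step-ahead forecast with the optimized parameters
  c(alpha = alpha, level = level, forecast = ses_manual(y, alpha, level))
}
ses_optimal(pigs)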

  2. Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

  1. Plot the series and discuss the main features of the data.
autoplot(books)

Answer: Over time, there is a slight upward trend in sales, along with some short-term fluctuation that could be weak seasonality. For most of the period, more hardcover books are sold than paperbacks. b. Use the ses() function to forecast each series, and plot the forecasts.

## Create Subsets of Paperback and Hardcover Books
paper = books[,1]
hard = books[,2]
## ses() for each type of book
fc_paper = ses(paper, h = 4)
fc_hard = ses(hard, h = 4)
## summaries of each ses
summary(fc_paper)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = paper, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.1685 
## 
##   Initial states:
##     l = 170.8271 
## 
##   sigma:  34.8183
## 
##      AIC     AICc      BIC 
## 318.9747 319.8978 323.1783 
## 
## Error measures:
##                    ME     RMSE     MAE       MPE     MAPE      MASE       ACF1
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303 -0.2117522
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       207.1097 162.4882 251.7311 138.8670 275.3523
## 32       207.1097 161.8589 252.3604 137.9046 276.3147
## 33       207.1097 161.2382 252.9811 136.9554 277.2639
## 34       207.1097 160.6259 253.5935 136.0188 278.2005
summary(fc_hard)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = hard, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.3283 
## 
##   Initial states:
##     l = 149.2861 
## 
##   sigma:  33.0517
## 
##      AIC     AICc      BIC 
## 315.8506 316.7737 320.0542 
## 
## Error measures:
##                    ME     RMSE      MAE      MPE     MAPE      MASE       ACF1
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887 -0.1417763
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       239.5601 197.2026 281.9176 174.7799 304.3403
## 32       239.5601 194.9788 284.1414 171.3788 307.7414
## 33       239.5601 192.8607 286.2595 168.1396 310.9806
## 34       239.5601 190.8347 288.2855 165.0410 314.0792
## Plot the forecasts
autoplot(books) + autolayer(fc_paper, series = "Paperback Forecast", PI = FALSE) +
  autolayer(fc_hard, series = "Hardcover Forecast", PI = FALSE)

c. Compute the RMSE values for the training data in each case.

accuracy(fc_paper)[2]
## [1] 33.63769
accuracy(fc_hard)[2]
## [1] 31.93101
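(Indexing accuracy() by name rather than by position is slightly more robust, since it does not rely on RMSE being the second column; a sketch using the same objects:)

accuracy(fc_paper)["Training set", "RMSE"]
accuracy(fc_hard)["Training set", "RMSE"]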
  1. We will continue with the daily sales of paperback and hardcover books in data set books.
  1. Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.
holt_paper = holt(paper, h = 4)
holt_hard = holt(hard, h = 4)
autoplot(books) + autolayer(holt_paper, series = "Paperback Forecast", PI = FALSE) +
  autolayer(holt_hard, series = "Hardcover Forecast", PI = FALSE)

b. Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES). Discuss the merits of the two forecasting methods for these data sets.

# Recalling RMSE from last question. 
accuracy(fc_paper)[2]
## [1] 33.63769
accuracy(fc_hard)[2]
## [1] 31.93101
# RMSE with Holt's Linear Method.
accuracy(holt_paper)[2]
## [1] 31.13692
accuracy(holt_hard)[2]
## [1] 27.19358

Answer: Holt's linear method adds a trend component with an additional smoothing parameter, beta*. With this addition, the training RMSE decreases for both paperback and hardcover books, so modeling the trend improves the in-sample fit for both series.

  1. Compare the forecasts for the two series using both methods. Which do you think is best?
summary(holt_paper)
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = paper, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 170.699 
##     b = 1.2621 
## 
##   sigma:  33.4464
## 
##      AIC     AICc      BIC 
## 318.3396 320.8396 325.3456 
## 
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
##                    ACF1
## Training set -0.1750792
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       209.4668 166.6035 252.3301 143.9130 275.0205
## 32       210.7177 167.8544 253.5811 145.1640 276.2715
## 33       211.9687 169.1054 254.8320 146.4149 277.5225
## 34       213.2197 170.3564 256.0830 147.6659 278.7735
summary(holt_hard)
## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = hard, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 147.7935 
##     b = 3.303 
## 
##   sigma:  29.2106
## 
##      AIC     AICc      BIC 
## 310.2148 312.7148 317.2208 
## 
## Error measures:
##                      ME     RMSE      MAE       MPE    MAPE      MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
##                     ACF1
## Training set -0.03245186
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       250.1739 212.7390 287.6087 192.9222 307.4256
## 32       253.4765 216.0416 290.9113 196.2248 310.7282
## 33       256.7791 219.3442 294.2140 199.5274 314.0308
## 34       260.0817 222.6468 297.5166 202.8300 317.3334

Answer: Looking at the error measures for all four models, we see a clear improvement when Holt's method is used, for both paperback and hardcover.

  1. Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses() and holt().
s_paper_fc = sd(fc_paper$residuals)
s_hard_fc = sd(fc_hard$residuals)
s_paper_holt = sd(holt_paper$residuals)
s_hard_holt = sd(holt_hard$residuals)
# Paperback SES
c(fc_paper$mean[1] + 1.96*s_paper_fc, fc_paper$mean[1] - 1.96*s_paper_fc)
## [1] 272.6230 141.5964
c(c(fc_paper$upper[1, "95%"],fc_paper$lower[1, "95%"]))
##      95%      95% 
## 275.3523 138.8670
# Hardcover SES
c(fc_hard$mean[1] + 1.96*s_hard_fc, fc_hard$mean[1] - 1.96*s_hard_fc)
## [1] 300.5354 178.5848
c(c(fc_hard$upper[1, "95%"],fc_hard$lower[1, "95%"]))
##      95%      95% 
## 304.3403 174.7799
# Paperback Holt
c(holt_paper$mean[1] + 1.96*s_paper_holt, holt_paper$mean[1] - 1.96*s_paper_holt)
## [1] 271.0945 147.8390
c(c(holt_paper$upper[1, "95%"],holt_paper$lower[1, "95%"]))
##      95%      95% 
## 275.0205 143.9130
# Hardcover Holt
c(holt_hard$mean[1] + 1.96*s_hard_holt, holt_hard$mean[1] - 1.96*s_hard_holt)
## [1] 304.3838 195.9640
c(c(holt_hard$upper[1, "95%"],holt_hard$lower[1, "95%"]))
##      95%      95% 
## 307.4256 192.9222

Answer: The manually computed prediction intervals are slightly narrower than the intervals produced by R. 7. For this exercise use data set eggs, the price of a dozen eggs in the United States, from 1900-1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts. Which model gives the best RMSE?

holt1 = holt(eggs, h=100)
holt2 = holt(eggs, h = 100, damped = T)
holt3 = holt(eggs, h = 100, lambda = "auto")
holt4 = holt(eggs, h = 100, damped = T, lambda = "auto")

plot1 = autoplot(holt1)
plot2 = autoplot(holt2)
plot3 = autoplot(holt3)
plot4 = autoplot(holt4)

grid.arrange(plot1, plot2, plot3, plot4, nrow = 2)

accuracy(holt1)[2]
## [1] 26.58219
accuracy(holt2)[2]
## [1] 26.54019
accuracy(holt3)[2]
## [1] 26.39364
accuracy(holt4)[2]
## [1] 26.53317

Answer: The Holt model with a Box-Cox transformation (holt3) produces the lowest RMSE.

  1. Recall your retail time series data (from Exercise 3 in Section 2.10).
retaildata = readxl::read_excel("C:/Users/Jake/Documents/PredAnalytics/retail.xlsx", skip=1)
retail_ts = ts(retaildata[,"A3349627V"], frequency = 12, start = c(1982,4))
autoplot(retail_ts)

a. Why is multiplicative seasonality necessary for this series? Answer: Multiplicative seasonality is necessary because the seasonal fluctuations grow in proportion to the level of the series as the trend increases.

  1. Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
fc1 = hw(retail_ts, seasonal = "multiplicative", h = 100)
fc2 = hw(retail_ts, seasonal = "multiplicative", h = 100, damped = T)

plot1 = autoplot(fc1)
plot2 = autoplot(fc2)

grid.arrange(plot1, plot2, nrow = 2)

c. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

accuracy(fc1)[2]
## [1] 5.916807
accuracy(fc2)[2]
## [1] 5.628031

Answer: I prefer the damped Holt-Winters' multiplicative method, since it gives the lower RMSE.

  1. Check that the residuals from the best method look like white noise.
checkresiduals(fc2)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' multiplicative method
## Q* = 100.48, df = 7, p-value < 2.2e-16
## 
## Model df: 17.   Total lags used: 24

Answer: The Ljung-Box test strongly rejects the null hypothesis (p-value < 2.2e-16), so the residuals do not look like white noise.

  1. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7?
train = window(retail_ts, end = c(2010, 12))
test = window(retail_ts, start = c(2011, 1))

# Holt Winters
train_fc = hw(train, damped = T, seasonal = "multiplicative")
accuracy(train_fc, test)[,2]
## Training set     Test set 
##     5.390419     8.694021
# naive
train_n = snaive(train, h = 100)
accuracy(train_n, test)[,2]
## Training set     Test set 
##     12.27525     32.04685

Answer: The damped Holt-Winters' method has a much lower test-set RMSE than the seasonal naive method, so yes, it beats the seasonal naive approach.

  1. For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
L = BoxCox.lambda(train)
fc1 = stlf(train, lambda = L)
fc2 = ets(seasadj(decompose(train, "multiplicative")))

accuracy(fc1, test)[,2]
## Training set     Test set 
##     4.827817     9.539044
sqrt(fc2$mse)
## [1] 5.260437

Answer: On the training data the STL + ETS approach has the lowest RMSE (4.83), but its test-set RMSE (9.54) is higher than that of the damped Holt-Winters' model (8.69), so it does not improve on the best previous forecasts on the test set.

  1. For this exercise use data set ukcars, the quarterly UK passenger vehicle production data from 1977Q1–2005Q1.
  1. Plot the data and describe the main features of the series.
autoplot(ukcars)

Answer: There is quarterly seasonality within each year, and a longer cyclical pattern is visible from roughly 1980 to 2000. If the data continued past 2005Q1, I would expect the level to start falling again, as it did from 1977 to 1980.

  1. Decompose the series using STL and obtain the seasonally adjusted data.
fc1 = stlf(seasadj(decompose(ukcars, "multiplicative")))
autoplot(fc1)

fc1
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005 Q2       388.1092 358.1275 418.0909 342.2562 433.9622
## 2005 Q3       427.9592 390.2526 465.6657 370.2919 485.6264
## 2005 Q4       401.6252 357.5268 445.7237 334.1825 469.0680
## 2006 Q1       399.3611 349.6866 449.0357 323.3905 475.3318
## 2006 Q2       388.1092 333.4242 442.7941 304.4758 471.7426
## 2006 Q3       427.9592 368.6858 487.2325 337.3084 518.6099
## 2006 Q4       401.6252 338.0940 465.1565 304.4626 498.7879
## 2007 Q1       399.3611 331.8400 466.8823 296.0965 502.6258
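The exercise asks for an STL decomposition, while the code above uses classical decompose(). A sketch (not from the original write-up) using stl() directly for the seasonal adjustment; ukcars_stl and ukcars_sa are just illustrative names:

ukcars_stl = stl(ukcars, s.window = "periodic")
ukcars_sa = seasadj(ukcars_stl)
autoplot(ukcars_sa)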
  1. Forecast the next two years of the series using an additive damped trend method applied to the seasonally adjusted data. (This can be done in one step using stlf() with arguments etsmodel=“AAN”, damped=TRUE.)
fc2 = stlf(seasadj(decompose(ukcars, "multiplicative")),etsmodel="AAN", damped=T)
autoplot(fc2)

fc2
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005 Q2       388.2758 357.7135 418.8381 341.5348 435.0168
## 2005 Q3       428.1237 390.4048 465.8427 370.4376 485.8099
## 2005 Q4       401.7879 358.0672 445.5087 334.9228 468.6531
## 2006 Q1       399.5220 350.5282 448.5159 324.5924 474.4517
## 2006 Q2       388.2684 334.5153 442.0215 306.0601 470.4767
## 2006 Q3       428.1168 369.9921 486.2416 339.2227 517.0110
## 2006 Q4       401.7815 339.5909 463.9721 306.6691 496.8938
## 2007 Q1       399.5160 333.5089 465.5231 298.5668 500.4652
  1. Forecast the next two years of the series using Holt’s linear method applied to the seasonally adjusted data (as before but with damped=FALSE).
fc3 = stlf(seasadj(decompose(ukcars, "multiplicative")),etsmodel="AAN", damped=F)
autoplot(fc3)

fc3
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005 Q2       389.4965 359.1261 419.8669 343.0490 435.9440
## 2005 Q3       430.3284 392.4549 468.2020 372.4058 488.2510
## 2005 Q4       404.9765 360.8563 449.0967 337.5005 472.4525
## 2006 Q1       403.6944 354.1070 453.2818 327.8570 479.5318
## 2006 Q2       393.4244 338.9142 447.9347 310.0582 476.7907
## 2006 Q3       434.2564 375.2313 493.2815 343.9852 524.5276
## 2006 Q4       408.9045 345.6850 472.1239 312.2186 505.5903
## 2007 Q1       407.6224 340.4690 474.7757 304.9202 510.3245
  1. Now use ets() to choose a seasonal model for the data.
fc4 = ets(seasadj(decompose(ukcars, "multiplicative")))
autoplot(fc4)

fc4
## ETS(A,N,N) 
## 
## Call:
##  ets(y = seasadj(decompose(ukcars, "multiplicative"))) 
## 
##   Smoothing parameters:
##     alpha = 0.5493 
## 
##   Initial states:
##     l = 319.8538 
## 
##   sigma:  27.4919
## 
##      AIC     AICc      BIC 
## 1287.116 1287.337 1295.299
forecast(fc4)
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005 Q2       404.2827 369.0504 439.5150 350.3996 458.1659
## 2005 Q3       404.2827 364.0856 444.4799 342.8065 465.7589
## 2005 Q4       404.2827 359.6699 448.8956 336.0533 472.5122
## 2006 Q1       404.2827 355.6535 452.9119 329.9108 478.6547
## 2006 Q2       404.2827 351.9444 456.6210 324.2382 484.3272
## 2006 Q3       404.2827 348.4814 460.0841 318.9419 489.6235
## 2006 Q4       404.2827 345.2210 463.3444 313.9556 494.6098
## 2007 Q1       404.2827 342.1314 466.4340 309.2305 499.3349
  1. Compare the RMSE of the ETS model with the RMSE of the models you obtained using STL decompositions. Which gives the better in-sample fits?
accuracy(fc1)[2]
## [1] 23.18687
accuracy(fc2)[2]
## [1] 23.31431
accuracy(fc3)[2]
## [1] 23.27491
accuracy(fc4)[2]
## [1] 27.24753

Answer: fc1, the stlf model on the seasonally adjusted data, has the lowest RMSE; all three STL-based models give better in-sample fits than the plain ETS model (fc4).

  1. Compare the forecasts from the three approaches. Which seems most reasonable? Answer: fc1 and fc2 look the most reasonable over the next eight quarters. My preference is fc1, because its forecasts look reasonable and it also has the lowest RMSE.

  2. Check the residuals of your preferred model.

checkresiduals(fc1)
## Warning in checkresiduals(fc1): The fitted degrees of freedom is based on the
## model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(A,N,N)
## Q* = 23.677, df = 6, p-value = 0.0005987
## 
## Model df: 2.   Total lags used: 8
  1. For this exercise use data set visitors, the monthly Australian short-term overseas visitors data, May 1985–April 2005.
  1. Make a time plot of your data and describe the main features of the series.
autoplot(visitors)

Answer: There is an increasing trend with seasonality.

  1. Split your data into a training set and a test set comprising the last two years of available data. Forecast the test set using Holt-Winters’ multiplicative method.
train = window(visitors, end = c(2003, 4))
test = window(visitors, start = c(2003, 5))

fc = hw(train, seasonal = "multiplicative")
autoplot(fc)

accuracy(fc, test)[,2]
## Training set     Test set 
##     17.15453     50.55002
  1. Why is multiplicative seasonality necessary here? Answer: The seasonal fluctuations grow with the level of the series as the trend increases, so a multiplicative seasonal component is needed.

  2. Forecast the two-year test set using each of the following methods:

  3. an ETS model;

fc1 = forecast(ets(train))
fc1
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## May 2003       289.1430 269.1593 309.1267 258.5805 319.7055
## Jun 2003       302.8945 278.0253 327.7638 264.8603 340.9288
## Jul 2003       365.1002 331.0208 399.1797 312.9802 417.2203
## Aug 2003       335.4355 300.7517 370.1193 282.3912 388.4798
## Sep 2003       316.4694 280.8376 352.1012 261.9752 370.9635
## Oct 2003       368.6538 324.0039 413.3037 300.3676 436.9399
## Nov 2003       389.4115 339.1366 439.6865 312.5226 466.3005
## Dec 2003       473.8472 409.0950 538.5994 374.8173 572.8771
## Jan 2004       345.3549 295.6837 395.0261 269.3894 421.3204
## Feb 2004       385.2385 327.1902 443.2868 296.4613 474.0157
## Mar 2004       376.2091 317.0470 435.3712 285.7285 466.6897
## Apr 2004       335.2930 280.4421 390.1438 251.4058 419.1801
## May 2004       289.2955 240.1998 338.3913 214.2101 364.3810
## Jun 2004       303.0511 249.8268 356.2754 221.6516 384.4507
## Jul 2004       365.2852 299.0326 431.5378 263.9606 466.6098
## Aug 2004       335.6020 272.8590 398.3450 239.6449 431.5591
## Sep 2004       316.6233 255.7066 377.5400 223.4593 409.7873
## Oct 2004       368.8295 295.9123 441.7466 257.3123 480.3466
## Nov 2004       389.5934 310.5530 468.6337 268.7116 510.4752
## Dec 2004       474.0640 375.4849 572.6431 323.3003 624.8278
## Jan 2005       345.5097 271.9494 419.0701 233.0089 458.0106
## Feb 2005       385.4077 301.4801 469.3354 257.0514 513.7641
## Mar 2005       376.3711 292.6172 460.1249 248.2806 504.4615
## Apr 2005       335.4344 259.2206 411.6482 218.8754 451.9934
  1. an additive ETS model applied to a Box-Cox transformed series;
L = BoxCox.lambda(train)
fc2 = forecast(ets(train, lambda = L))
fc2
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## May 2003       290.9802 272.3185 310.4376 262.7568 321.0647
## Jun 2003       313.7044 291.2802 337.1995 279.8357 350.0778
## Jul 2003       381.8695 353.5604 411.5832 339.1330 427.8913
## Aug 2003       336.1874 307.7440 366.2529 293.3303 382.8385
## Sep 2003       329.0056 298.8043 361.0849 283.5602 378.8435
## Oct 2003       366.3147 331.7830 403.0552 314.3769 423.4188
## Nov 2003       380.7326 343.2510 420.7253 324.4018 442.9365
## Dec 2003       470.2128 424.9720 518.4111 402.1925 545.1505
## Jan 2004       348.8481 309.8241 390.8706 290.3470 414.3624
## Feb 2004       388.8374 345.1487 435.8990 323.3496 462.2140
## Mar 2004       376.2235 331.7662 424.3051 309.6571 451.2668
## Apr 2004       332.3026 289.8007 378.5808 268.7826 404.6548
## May 2004       291.2614 250.0569 336.5545 229.8424 362.2430
## Jun 2004       313.9935 269.3939 363.0383 247.5212 390.8622
## Jul 2004       382.1907 330.0105 439.3501 304.3364 471.6906
## Aug 2004       336.4776 287.1721 390.8691 263.0563 421.7942
## Sep 2004       329.2861 279.4263 384.4794 255.1108 415.9353
## Oct 2004       366.6091 311.8031 427.1950 285.0445 461.6919
## Nov 2004       381.0283 323.5076 444.6813 295.4484 480.9505
## Dec 2004       470.5443 403.0139 544.8824 369.9245 587.0862
## Jan 2005       349.1167 292.4202 412.3697 264.9548 448.6131
## Feb 2005       389.1195 326.9890 458.3002 296.8408 497.8871
## Mar 2005       376.4941 314.5059 445.7620 284.5184 485.4954
## Apr 2005       332.5477 274.3524 398.0625 246.3800 435.8337
  1. a seasonal naïve method;
fc3 = snaive(train, h = 24)
fc3
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## May 2003          329.9 289.9718 369.8282 268.8351 390.9649
## Jun 2003          339.4 299.4718 379.3282 278.3351 400.4649
## Jul 2003          418.2 378.2718 458.1282 357.1351 479.2649
## Aug 2003          371.9 331.9718 411.8282 310.8351 432.9649
## Sep 2003          358.6 318.6718 398.5282 297.5351 419.6649
## Oct 2003          428.9 388.9718 468.8282 367.8351 489.9649
## Nov 2003          437.0 397.0718 476.9282 375.9351 498.0649
## Dec 2003          534.0 494.0718 573.9282 472.9351 595.0649
## Jan 2004          396.6 356.6718 436.5282 335.5351 457.6649
## Feb 2004          427.5 387.5718 467.4282 366.4351 488.5649
## Mar 2004          392.5 352.5718 432.4282 331.4351 453.5649
## Apr 2004          321.5 281.5718 361.4282 260.4351 382.5649
## May 2004          329.9 273.4330 386.3670 243.5412 416.2588
## Jun 2004          339.4 282.9330 395.8670 253.0412 425.7588
## Jul 2004          418.2 361.7330 474.6670 331.8412 504.5588
## Aug 2004          371.9 315.4330 428.3670 285.5412 458.2588
## Sep 2004          358.6 302.1330 415.0670 272.2412 444.9588
## Oct 2004          428.9 372.4330 485.3670 342.5412 515.2588
## Nov 2004          437.0 380.5330 493.4670 350.6412 523.3588
## Dec 2004          534.0 477.5330 590.4670 447.6412 620.3588
## Jan 2005          396.6 340.1330 453.0670 310.2412 482.9588
## Feb 2005          427.5 371.0330 483.9670 341.1412 513.8588
## Mar 2005          392.5 336.0330 448.9670 306.1412 478.8588
## Apr 2005          321.5 265.0330 377.9670 235.1412 407.8588
  1. an STL decomposition applied to the Box-Cox transformed data followed by an ETS model applied to the seasonally adjusted (transformed) data.
fc4 = stlf(train, lambda = L)
fc4
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## May 2003       290.5470 268.6167 312.4773 257.0076 324.0865
## Jun 2003       315.6715 289.9416 341.4013 276.3210 355.0219
## Jul 2003       390.7760 361.7081 419.8440 346.3204 435.2316
## Aug 2003       343.6693 311.5803 375.7583 294.5934 392.7452
## Sep 2003       340.6897 305.8140 375.5653 287.3520 394.0274
## Oct 2003       379.3255 341.8451 416.8058 322.0043 436.6467
## Nov 2003       395.6674 355.7285 435.6062 334.5862 456.7485
## Dec 2003       488.6070 446.3303 530.8837 423.9504 553.2637
## Jan 2004       367.5135 323.0004 412.0266 299.4366 435.5904
## Feb 2004       411.4145 364.7518 458.0771 340.0500 482.7789
## Mar 2004       399.6163 350.8792 448.3533 325.0793 474.1532
## Apr 2004       357.4219 306.6763 408.1675 279.8133 435.0306
## May 2004       310.3401 257.6443 363.0359 229.7488 390.9314
## Jun 2004       335.4645 280.8704 390.0586 251.9701 418.9590
## Jul 2004       410.5691 354.1233 467.0149 324.2427 496.8955
## Aug 2004       363.4623 305.2070 421.7177 274.3684 452.5563
## Sep 2004       360.4827 300.4560 420.5095 268.6797 452.2858
## Oct 2004       399.1186 337.3552 460.8819 304.6597 493.5775
## Nov 2004       415.4604 351.9924 478.9284 318.3945 512.5264
## Dec 2004       508.4001 443.2568 573.5434 408.7719 608.0282
## Jan 2005       387.3066 320.5150 454.0982 285.1576 489.4556
## Feb 2005       431.2075 362.7926 499.6224 326.5760 535.8391
## Mar 2005       419.4093 349.3944 489.4243 312.3307 526.4880
## Apr 2005       377.2150 305.6216 448.8084 267.7223 486.7077
fc5 = forecast(ets(seasadj(decompose(train, "multiplicative"))))
fc5
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## May 2003       358.4865 334.5560 382.4171 321.8880 395.0851
## Jun 2003       360.4866 331.9576 389.0155 316.8552 404.1179
## Jul 2003       362.4866 329.9576 395.0155 312.7378 412.2353
## Aug 2003       364.4866 328.3565 400.6166 309.2304 419.7427
## Sep 2003       366.4866 327.0447 405.9284 306.1654 426.8077
## Oct 2003       368.4866 325.9543 411.0189 303.4390 433.5341
## Nov 2003       370.4866 325.0398 415.9333 300.9818 439.9913
## Dec 2003       372.4866 324.2693 420.7038 298.7446 446.2285
## Jan 2004       374.4866 323.6189 425.3542 296.6913 452.2819
## Feb 2004       376.4866 323.0707 429.9024 294.7941 458.1791
## Mar 2004       378.4866 322.6104 434.3627 293.0314 463.9417
## Apr 2004       380.4866 322.2268 438.7463 291.3860 469.5872
## May 2004       382.4866 321.9107 443.0625 289.8437 475.1294
## Jun 2004       384.4866 321.6544 447.3188 288.3930 480.5801
## Jul 2004       386.4866 321.4515 451.5216 287.0240 485.9491
## Aug 2004       388.4866 321.2967 455.6764 285.7286 491.2446
## Sep 2004       390.4866 321.1854 459.7878 284.4996 496.4736
## Oct 2004       392.4866 321.1135 463.8597 283.3309 501.6423
## Nov 2004       394.4866 321.0776 467.8956 282.2172 506.7559
## Dec 2004       396.4866 321.0746 471.8985 281.1539 511.8192
## Jan 2005       398.4866 321.1019 475.8713 280.1369 516.8362
## Feb 2005       400.4866 321.1571 479.8161 279.1626 521.8106
## Mar 2005       402.4866 321.2380 483.7352 278.2276 526.7456
## Apr 2005       404.4866 321.3428 487.6304 277.3291 531.6441
  1. Which method gives the best forecasts? Does it pass the residual tests?
accuracy(fc1)[2]
## [1] 14.5348
accuracy(fc2)[2]
## [1] 14.97096
accuracy(fc3)[2]
## [1] 31.15613
accuracy(fc4)[2]
## [1] 13.06549
accuracy(fc5)[2]
## [1] 14.23343
checkresiduals(fc4)
## Warning in checkresiduals(fc4): The fitted degrees of freedom is based on the
## model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(M,A,N)
## Q* = 44.081, df = 20, p-value = 0.001468
## 
## Model df: 4.   Total lags used: 24

Answer: fc4, the STL forecast on the Box-Cox transformed data, has the lowest training RMSE, so it gives the best forecasts of the models tried. Its residuals are not perfect, though: the Ljung-Box test (p = 0.0015) still indicates some remaining autocorrelation, so it does not fully pass the residual tests.

  1. The fets() function below returns ETS forecasts.
  1. Apply tsCV() for a forecast horizon of h=4, for both ETS and seasonal naive methods to the qcement data.
fets = function(y, h){
  forecast(ets(y), h = h)
}

cv1 = tsCV(qcement, fets, h=4)
cv2 = tsCV(qcement, snaive, h=4)
  1. Compute the MSE of the resulting 4-step-ahead errors. Why are there missing values? Comment on which forecasts are more accurate. Is this what you expected?
mean(cv1^2, na.rm = T)
## [1] 0.01250954
mean(cv2^2, na.rm = T)
## [1] 0.01792147

Answer: The ETS model has a much lower (better) MSE than the seasonal naive model. The missing values appear because tsCV() returns NA wherever a forecast or its matching future observation is unavailable: mainly at the end of the series, where fewer than h future values remain, and at the earliest origins, where there is not yet enough data to fit the models.
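Since tsCV() with h = 4 returns a matrix of errors for horizons 1 through 4, the MSE can also be broken down by horizon; a quick sketch using the cv1 and cv2 objects above:

colMeans(cv1^2, na.rm = TRUE)
colMeans(cv2^2, na.rm = TRUE)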

  1. Compare ets(), snaive() and stlf() on the following six time series. For stlf(), you might need to use a Box-Cox transformation. Use a test set of three years to decide what gives the best forecasts. ausbeer, bricksq, dole, a10, h02, usmelec.
ausbeer.train = window(ausbeer, end = c(2007,2))
ausbeer.test = window(ausbeer, start = c(2007,3))
# ets
fc1.ausbeer = forecast(ets(ausbeer.train), h=12)
# naive
fc2.ausbeer = snaive(ausbeer.train, h=12)
# stlf
fc3.ausbeer = stlf(ausbeer.train, h = 12)
# accuracy
accuracy(fc1.ausbeer, ausbeer.test)
##                      ME      RMSE       MAE          MPE     MAPE      MASE
## Training set -0.3466095 15.781973 11.979955 -0.064073435 2.864311 0.7567076
## Test set      0.1272571  9.620828  8.919347  0.009981598 2.132836 0.5633859
##                    ACF1 Theil's U
## Training set -0.1942276        NA
## Test set      0.3763918 0.1783972
accuracy(fc2.ausbeer, ausbeer.test)
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set  3.336634 19.67005 15.83168  0.9044009 3.771370 1.0000000
## Test set     -2.916667 10.80509  9.75000 -0.6505927 2.338012 0.6158537
##                      ACF1 Theil's U
## Training set 0.0009690785        NA
## Test set     0.3254581810 0.1981463
accuracy(fc3.ausbeer, ausbeer.test)
##                       ME      RMSE       MAE         MPE     MAPE      MASE
## Training set -0.08449615 13.640828 10.495557 -0.02319029 2.523657 0.6629464
## Test set     -0.74491615  9.596294  9.014064 -0.20867707 2.159142 0.5693686
##                    ACF1 Theil's U
## Training set -0.1623773        NA
## Test set      0.3154543 0.1712144
bricksq.train = window(bricksq, end = c(1991,1))
bricksq.test = window(bricksq, start = c(1991,2))
# ets
fc1.bricksq = forecast(ets(bricksq.train), h=12)
# naive
fc2.bricksq = snaive(bricksq.train, h=12)
# stlf
fc3.bricksq = stlf(bricksq.train, h = 12)
# accuracy
accuracy(fc1.bricksq, bricksq.test)
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -0.420627 21.77283 15.37691 -0.1306659 3.788064 0.4254112
## Test set      6.291878 15.01072 13.21052  1.4030051 3.056918 0.3654767
##                    ACF1 Theil's U
## Training set 0.16821489        NA
## Test set     0.08041443 0.4489536
accuracy(fc2.bricksq, bricksq.test)
##                      ME     RMSE      MAE       MPE      MAPE     MASE
## Training set   7.708029 49.66883 36.14599  1.724113  8.858339 1.000000
## Test set     -33.750000 55.92033 47.58333 -7.691441 11.082932 1.316421
##                     ACF1 Theil's U
## Training set  0.79567285        NA
## Test set     -0.04907781  1.593665
accuracy(fc3.bricksq, bricksq.test)
##                    ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set 1.488067 19.83378 14.26588 0.3725927 3.516000 0.3946741 0.2530528
## Test set     9.221702 23.13525 19.43615 2.1126151 4.545542 0.5377126 0.3254647
##              Theil's U
## Training set        NA
## Test set     0.6962622
dole.train = window(dole, end=c(1989,7))
dole.test = window(dole, start=c(1989,8))
# ets
fc1.dole = forecast(ets(dole.train), h=12)
# naive
fc2.dole = snaive(dole.train, h=12)
# stlf
fc3.dole = stlf(dole.train, h = 12)
# accuracy
accuracy(fc1.dole, dole.test)
##                       ME     RMSE       MAE       MPE    MAPE      MASE
## Training set    98.00253 16130.58  9427.497 0.5518769 6.20867 0.2995409
## Test set     27788.36057 33298.90 27788.361 6.9987596 6.99876 0.8829225
##                   ACF1 Theil's U
## Training set 0.5139536        NA
## Test set     0.4229686  2.661928
accuracy(fc2.dole, dole.test)
##                     ME     RMSE      MAE       MPE     MAPE     MASE      ACF1
## Training set  12606.58 56384.06 31473.16  3.350334 27.73651 1.000000 0.9781058
## Test set     -28527.50 53010.22 47640.67 -7.974568 12.40354 1.513692 0.7245915
##              Theil's U
## Training set        NA
## Test set      4.091062
accuracy(fc3.dole, dole.test)
##                      ME      RMSE       MAE       MPE     MAPE      MASE
## Training set   256.3644  6248.781  3836.768 0.5425694 4.804379 0.1219060
## Test set     12251.8901 29920.856 20396.704 2.8597442 4.946181 0.6480666
##                    ACF1 Theil's U
## Training set 0.01337877        NA
## Test set     0.64087706  2.371564
a10.train = window(a10, end=c(2005,6))
a10.test = window(a10, start=c(2005,7))
# ets
fc1.a10 = forecast(ets(a10.train), h=12)
# naive
fc2.a10 = snaive(a10.train, h=12)
# stlf
fc3.a10 = stlf(a10.train, h = 12)
# accuracy
accuracy(fc1.a10, a10.test)
##                      ME     RMSE       MAE       MPE     MAPE      MASE
## Training set 0.04378049 0.488989 0.3542285 0.2011915 3.993267 0.3611299
## Test set     0.11379391 1.023291 0.8808195 0.2374252 5.155528 0.8979803
##                     ACF1 Theil's U
## Training set -0.05238173        NA
## Test set     -0.39464995 0.3207898
accuracy(fc2.a10, a10.test)
##                     ME     RMSE       MAE       MPE      MAPE     MASE
## Training set 0.9577207 1.177528 0.9808895 10.862831 11.157672 1.000000
## Test set     1.4016923 1.704054 1.4393090  7.698516  7.962797 1.467351
##                    ACF1 Theil's U
## Training set  0.3779746        NA
## Test set     -0.7220504 0.5840003
accuracy(fc3.a10, a10.test)
##                     ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.0554619 0.4582707 0.3448964 0.3743525 4.015951 0.3516159
## Test set     0.3270847 1.3596831 1.1499080 0.7849906 6.684667 1.1723114
##                     ACF1 Theil's U
## Training set  0.02360657        NA
## Test set     -0.16012041 0.4000005
h02.train = window(h02, end=c(2005,7))
h02.test = window(h02, start=c(2005,8))
# ets
fc1.h02 = forecast(ets(h02.train), h=12)
# naive
fc2.h02 = snaive(h02.train, h=12)
# stlf
fc3.h02 = stlf(h02.train, h = 12)
# accuracy
accuracy(fc1.h02, h02.test)
##                        ME       RMSE        MAE        MPE     MAPE      MASE
## Training set 0.0004513622 0.04579930 0.03426080 -0.2209876 4.560572 0.5745005
## Test set     0.0232785622 0.06402092 0.05309284  1.7574790 5.577604 0.8902845
##                     ACF1 Theil's U
## Training set  0.05057522        NA
## Test set     -0.38030773 0.3337125
accuracy(fc2.h02, h02.test)
##                       ME       RMSE        MAE       MPE     MAPE     MASE
## Training set  0.03774904 0.07171709 0.05963581  5.094248 8.197262 1.000000
## Test set     -0.01621003 0.07156409 0.05756597 -1.347556 6.137698 0.965292
##                    ACF1 Theil's U
## Training set 0.40080565        NA
## Test set     0.01845296 0.4285937
accuracy(fc3.h02, h02.test)
##                        ME       RMSE        MAE        MPE     MAPE      MASE
## Training set -0.000830773 0.04053338 0.03033702 -0.2685748 4.274993 0.5087047
## Test set      0.013985124 0.06423336 0.05508629  0.3780307 5.956852 0.9237115
##                     ACF1 Theil's U
## Training set -0.06958533        NA
## Test set     -0.10704436 0.3342446
usmelec.train = window(usmelec, end=c(2010,6))
usmelec.test = window(usmelec, start=c(2010,7))
# ets
fc1.usmelec = forecast(ets(usmelec.train), h=12)
# naive
fc2.usmelec = snaive(usmelec.train, h=12)
# stlf
fc3.usmelec = stlf(usmelec.train, h = 12)
# accuracy
accuracy(fc1.usmelec, usmelec.test)
##                       ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -0.07834851 7.447167 5.601963 -0.1168709 2.163495 0.6225637
## Test set     -1.17486480 7.974503 6.633269 -0.5181480 1.933084 0.7371760
##                   ACF1 Theil's U
## Training set 0.2719249        NA
## Test set     0.2142856 0.2257995
accuracy(fc2.usmelec, usmelec.test)
##                    ME     RMSE       MAE      MPE     MAPE     MASE      ACF1
## Training set 4.921564 11.58553  8.998217 2.000667 3.511648 1.000000 0.4841860
## Test set     9.155417 16.08331 12.285917 2.473758 3.402411 1.365372 0.4247762
##              Theil's U
## Training set        NA
## Test set      0.332999
accuracy(fc3.usmelec, usmelec.test)
##                      ME     RMSE      MAE         MPE     MAPE      MASE
## Training set -0.0657726 6.192160 4.682101 -0.06613967 1.826421 0.5203366
## Test set     -3.8976853 9.228437 7.981593 -1.34866348 2.401607 0.8870194
##                   ACF1 Theil's U
## Training set 0.1106876        NA
## Test set     0.2390297 0.2744262

14a. Use ets() on the following series: bicoal, chicken, dole, usdeaths, lynx, ibmclose, eggs. Does it always give good forecasts?

autoplot(forecast(ets(bicoal), h = 5))

autoplot(forecast(ets(chicken), h = 5))

autoplot(forecast(ets(dole), h = 5))

autoplot(forecast(ets(usdeaths), h = 5))

autoplot(forecast(ets(lynx), h = 5))

autoplot(forecast(ets(ibmclose), h = 5))

autoplot(forecast(ets(eggs), h = 5))

Answer: They do not always give good forecasts. b. Find an example where it does not work well. Can you figure out why? Answer: ets() does not work well for the lynx data set. The series is strongly cyclic, and exponential smoothing models have no way to capture cycles, so the forecasts quickly flatten out to a level that ignores the cyclical behavior.

  1. Show that the point forecasts from an ETS(M,A,M) model are the same as those obtained using Holt-Winters’ multiplicative method.
fc1 = ets(ausbeer, model = "MAM")
fc2 = hw(ausbeer, seasonal = "multiplicative", h = 100)
accuracy(fc1)
##                      ME     RMSE      MAE         MPE     MAPE      MASE
## Training set -0.3708737 15.56026 11.87092 -0.07125069 2.841161 0.7654043
##                   ACF1
## Training set -0.177729
accuracy(fc2)
##                      ME     RMSE      MAE         MPE     MAPE      MASE
## Training set -0.3297366 15.53971 11.86488 -0.06333093 2.841802 0.7650148
##                    ACF1
## Training set -0.2050978
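The comparison above is only via training accuracy; a sketch (not from the original write-up) comparing the point forecasts themselves by plotting them together:

autoplot(ausbeer) +
  autolayer(forecast(fc1, h = 12), series = "ETS(M,A,M)", PI = FALSE) +
  autolayer(fc2, series = "HW multiplicative", PI = FALSE)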
  1. Show that the forecast variance for an ETS(A,N,N) model is given by sigma_h^2 = sigma^2 * [1 + alpha^2 * (h - 1)].
fc = forecast(ets(ausbeer, model = "ANN"), h = 100)
summary(fc)
## 
## Forecast method: ETS(A,N,N)
## 
## Model Information:
## ETS(A,N,N) 
## 
## Call:
##  ets(y = ausbeer, model = "ANN") 
## 
##   Smoothing parameters:
##     alpha = 0.1488 
## 
##   Initial states:
##     l = 259.6135 
## 
##   sigma:  50.8804
## 
##      AIC     AICc      BIC 
## 2891.063 2891.175 2901.216 
## 
## Error measures:
##                    ME     RMSE      MAE       MPE     MAPE     MASE        ACF1
## Training set 4.984426 50.64645 40.86588 0.1806918 9.617963 2.634919 -0.04436312
## 
## Forecasts:
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2010 Q3       421.3169 356.1111 486.5227 321.5932 521.0406
## 2010 Q4       421.3169 355.3930 487.2408 320.4950 522.1388
## 2011 Q1       421.3169 354.6827 487.9511 319.4086 523.2252
## 2011 Q2       421.3169 353.9798 488.6540 318.3337 524.3001
## 2011 Q3       421.3169 353.2842 489.3496 317.2699 525.3639
## 2011 Q4       421.3169 352.5957 490.0381 316.2169 526.4169
## 2012 Q1       421.3169 351.9140 490.7198 315.1743 527.4595
## 2012 Q2       421.3169 351.2389 491.3949 314.1419 528.4919
## 2012 Q3       421.3169 350.5703 492.0635 313.1193 529.5145
## 2012 Q4       421.3169 349.9079 492.7259 312.1063 530.5275
## 2013 Q1       421.3169 349.2516 493.3822 311.1026 531.5312
## 2013 Q2       421.3169 348.6013 494.0325 310.1079 532.5259
## 2013 Q3       421.3169 347.9567 494.6771 309.1221 533.5117
## 2013 Q4       421.3169 347.3177 495.3161 308.1448 534.4890
## 2014 Q1       421.3169 346.6842 495.9496 307.1760 535.4578
## 2014 Q2       421.3169 346.0560 496.5778 306.2153 536.4186
## 2014 Q3       421.3169 345.4330 497.2008 305.2625 537.3713
## 2014 Q4       421.3169 344.8151 497.8187 304.3175 538.3163
## 2015 Q1       421.3169 344.2021 498.4317 303.3800 539.2538
## 2015 Q2       421.3169 343.5940 499.0398 302.4500 540.1838
## 2015 Q3       421.3169 342.9906 499.6432 301.5272 541.1066
## 2015 Q4       421.3169 342.3918 500.2420 300.6114 542.0224
## 2016 Q1       421.3169 341.7975 500.8363 299.7025 542.9313
## 2016 Q2       421.3169 341.2077 501.4261 298.8004 543.8334
## 2016 Q3       421.3169 340.6221 502.0117 297.9049 544.7289
## 2016 Q4       421.3169 340.0408 502.5930 297.0158 545.6180
## 2017 Q1       421.3169 339.4636 503.1703 296.1330 546.5008
## 2017 Q2       421.3169 338.8904 503.7434 295.2564 547.3774
## 2017 Q3       421.3169 338.3212 504.3126 294.3859 548.2479
## 2017 Q4       421.3169 337.7558 504.8780 293.5213 549.1125
## 2018 Q1       421.3169 337.1943 505.4395 292.6625 549.9713
## 2018 Q2       421.3169 336.6365 505.9973 291.8094 550.8244
## 2018 Q3       421.3169 336.0823 506.5515 290.9619 551.6719
## 2018 Q4       421.3169 335.5317 507.1021 290.1198 552.5140
## 2019 Q1       421.3169 334.9847 507.6491 289.2832 553.3506
## 2019 Q2       421.3169 334.4410 508.1928 288.4518 554.1821
## 2019 Q3       421.3169 333.9008 508.7330 287.6255 555.0083
## 2019 Q4       421.3169 333.3639 509.2699 286.8044 555.8294
## 2020 Q1       421.3169 332.8302 509.8036 285.9882 556.6456
## 2020 Q2       421.3169 332.2997 510.3341 285.1769 557.4569
## 2020 Q3       421.3169 331.7724 510.8614 284.3704 558.2634
## 2020 Q4       421.3169 331.2482 511.3856 283.5687 559.0651
## 2021 Q1       421.3169 330.7270 511.9068 282.7716 559.8623
## 2021 Q2       421.3169 330.2087 512.4251 281.9790 560.6548
## 2021 Q3       421.3169 329.6934 512.9404 281.1909 561.4429
## 2021 Q4       421.3169 329.1810 513.4528 280.4073 562.2265
## 2022 Q1       421.3169 328.6715 513.9623 279.6279 563.0059
## 2022 Q2       421.3169 328.1647 514.4691 278.8529 563.7809
## 2022 Q3       421.3169 327.6606 514.9732 278.0820 564.5518
## 2022 Q4       421.3169 327.1593 515.4745 277.3152 565.3186
## 2023 Q1       421.3169 326.6606 515.9732 276.5526 566.0813
## 2023 Q2       421.3169 326.1645 516.4693 275.7939 566.8399
## 2023 Q3       421.3169 325.6710 516.9628 275.0391 567.5947
## 2023 Q4       421.3169 325.1800 517.4538 274.2882 568.3456
## 2024 Q1       421.3169 324.6915 517.9423 273.5412 569.0926
## 2024 Q2       421.3169 324.2055 518.4283 272.7979 569.8359
## 2024 Q3       421.3169 323.7219 518.9119 272.0583 570.5756
## 2024 Q4       421.3169 323.2407 519.3931 271.3223 571.3115
## 2025 Q1       421.3169 322.7618 519.8720 270.5899 572.0439
## 2025 Q2       421.3169 322.2853 520.3485 269.8611 572.7727
## 2025 Q3       421.3169 321.8110 520.8228 269.1358 573.4980
## 2025 Q4       421.3169 321.3390 521.2948 268.4139 574.2199
## 2026 Q1       421.3169 320.8692 521.7646 267.6954 574.9384
## 2026 Q2       421.3169 320.4016 522.2322 266.9802 575.6536
## 2026 Q3       421.3169 319.9361 522.6977 266.2684 576.3654
## 2026 Q4       421.3169 319.4728 523.1610 265.5598 577.0740
## 2027 Q1       421.3169 319.0116 523.6223 264.8544 577.7794
## 2027 Q2       421.3169 318.5524 524.0814 264.1521 578.4817
## 2027 Q3       421.3169 318.0953 524.5385 263.4530 579.1808
## 2027 Q4       421.3169 317.6402 524.9936 262.7570 579.8768
## 2028 Q1       421.3169 317.1870 525.4468 262.0640 580.5698
## 2028 Q2       421.3169 316.7359 525.8979 261.3741 581.2597
## 2028 Q3       421.3169 316.2867 526.3471 260.6870 581.9468
## 2028 Q4       421.3169 315.8394 526.7944 260.0030 582.6308
## 2029 Q1       421.3169 315.3940 527.2398 259.3218 583.3120
## 2029 Q2       421.3169 314.9504 527.6834 258.6434 583.9904
## 2029 Q3       421.3169 314.5087 528.1251 257.9679 584.6659
## 2029 Q4       421.3169 314.0688 528.5650 257.2951 585.3387
## 2030 Q1       421.3169 313.6307 529.0031 256.6251 586.0087
## 2030 Q2       421.3169 313.1944 529.4394 255.9578 586.6760
## 2030 Q3       421.3169 312.7599 529.8739 255.2932 587.3406
## 2030 Q4       421.3169 312.3270 530.3068 254.6313 588.0025
## 2031 Q1       421.3169 311.8959 530.7379 253.9720 588.6618
## 2031 Q2       421.3169 311.4665 531.1673 253.3152 589.3186
## 2031 Q3       421.3169 311.0387 531.5951 252.6610 589.9728
## 2031 Q4       421.3169 310.6126 532.0212 252.0094 590.6244
## 2032 Q1       421.3169 310.1882 532.4456 251.3602 591.2736
## 2032 Q2       421.3169 309.7653 532.8685 250.7135 591.9203
## 2032 Q3       421.3169 309.3441 533.2897 250.0692 592.5646
## 2032 Q4       421.3169 308.9244 533.7094 249.4274 593.2064
## 2033 Q1       421.3169 308.5063 534.1275 248.7880 593.8458
## 2033 Q2       421.3169 308.0897 534.5441 248.1509 594.4829
## 2033 Q3       421.3169 307.6747 534.9591 247.5161 595.1177
## 2033 Q4       421.3169 307.2612 535.3727 246.8837 595.7501
## 2034 Q1       421.3169 306.8491 535.7847 246.2535 596.3803
## 2034 Q2       421.3169 306.4386 536.1953 245.6256 597.0082
## 2034 Q3       421.3169 306.0295 536.6044 245.0000 597.6338
## 2034 Q4       421.3169 305.6218 537.0120 244.3765 598.2573
## 2035 Q1       421.3169 305.2156 537.4182 243.7553 598.8785
## 2035 Q2       421.3169 304.8108 537.8230 243.1362 599.4976
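A sketch (not part of the original answer) checking the stated variance formula numerically against the intervals above; it assumes the standard ETS(A,N,N) result sigma_h^2 = sigma^2 * (1 + alpha^2 * (h - 1)):

alpha = fc$model$par["alpha"]
sigma2 = fc$model$sigma2
h = 1:100
manual_sd = sqrt(sigma2 * (1 + alpha^2 * (h - 1)))
# standard deviations implied by R's 95% intervals
r_sd = (fc$upper[, "95%"] - fc$mean) / qnorm(0.975)
all.equal(as.numeric(manual_sd), as.numeric(r_sd))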

Chapter 8

1. Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers. a. Explain the differences among these figures. Do they all indicate that the data are white noise? Answer: All three series are consistent with white noise; the first series simply has fewer observations, so its sample autocorrelations vary more around zero. b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise? Answer: The critical values are computed as +/-1.96/sqrt(N), so they move closer to zero as N increases. The sample autocorrelations of a shorter white-noise series are more variable, which is why the spikes look larger in the smaller samples even though all three series are white noise; as N grows, smaller autocorrelations become statistically significant.

  1. A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
ggtsdisplay(ibmclose)

Answer: Time series with trends or other persistent, non-mean-reverting behavior are not stationary. The ACF decays very slowly, which indicates non-stationary data, and the PACF has a single large spike close to 1 at lag 1 with only small values afterwards, which is typical of a random-walk-like series. Both plots suggest the series should be differenced.

  1. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
  1. usnetelec
ggtsdisplay(usnetelec)

ggtsdisplay(diff(usnetelec, 1))

b. usgdp

ggtsdisplay(usgdp)

L = BoxCox.lambda(usgdp)
ggtsdisplay(usgdp %>% BoxCox(L) %>% diff(1) %>% diff(1))

c. mcopper

ggtsdisplay(mcopper)

L = BoxCox.lambda(mcopper)
ggtsdisplay(mcopper %>% BoxCox(L) %>% diff(1))

d. enplanements

ggtsdisplay(enplanements)

L = BoxCox.lambda(enplanements)
ggtsdisplay(enplanements %>% BoxCox(L) %>% diff(12) %>% diff(1))

e. visitors

ggtsdisplay(visitors)

L = BoxCox.lambda(visitors)
ggtsdisplay(visitors %>% BoxCox(L) %>% diff(12) %>% diff(1))

4. For the enplanements data, write down the differences you chose above using backshift operator notation. Answer: A Box-Cox transformation was applied first, followed by a seasonal difference and a first difference, so the stationary series is (1 - B)(1 - B^12) w_t, where w_t denotes the Box-Cox transformed data.

  1. For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
ggtsdisplay(retail_ts)

L = BoxCox.lambda(retail_ts)
ggtsdisplay(retail_ts %>% BoxCox(L) %>% diff(1) %>% diff(12))

6. Use R to simulate and plot some data from simple ARIMA models. a. Use the following R code to generate data from an AR(1) model with phi = 0.6 and sigma^2 = 1. The process starts with y1 = 0.

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
  1. Produce a time plot for the series. How does the plot change as you change phi?
autoplot(y)
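A sketch (not in the original write-up) that wraps the simulation so several values of phi can be compared side by side; sim_ar1 is an illustrative helper name:

sim_ar1 = function(phi, n = 100) {
  y = ts(numeric(n))
  e = rnorm(n)
  for (i in 2:n) y[i] = phi * y[i - 1] + e[i]
  y
}
grid.arrange(autoplot(sim_ar1(0.1)), autoplot(sim_ar1(0.6)), autoplot(sim_ar1(0.95)),
             nrow = 3)

As phi approaches 1 the series wanders further from its mean and looks more like a random walk; the same wrapper idea can be reused to vary theta in the MA(1) simulation below.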

c. Write your own code to generate data from an MA(1) model with theta = 0.6 and sigma^2 = 1.

y = ts(numeric(100))
e = rnorm(100)
for(i in 2:100)
  y[i] = 0.6*e[i-1] + e[i]
  1. Produce a time plot for the series. How does the plot change as you change theta?
autoplot(y)

e. Generate data from an ARMA(1,1) model with phi = 0.6, theta = 0.6, and sigma^2 = 1.

y = ts(numeric(100))
e = rnorm(100)
for(i in 2:100)
  y[i] = 0.6*y[i-1] + 0.6*e[i-1] + e[i]

autoplot(y)

f. Generate data from an AR(2) model with phi1 = -0.8, and phi2 = 0.3 and sigma^2 = 1.

y = ts(numeric(100))
e = rnorm(100)
for(i in 3:100)
  y[i] = -0.8*y[i-1] + 0.3*y[i-2] + e[i]

autoplot(y)

Answer: The ARMA(1,1) series appears stationary: it fluctuates around a constant level with no apparent trend or seasonality. The AR(2) series with phi1 = -0.8 and phi2 = 0.3 is non-stationary: these coefficients violate the stationarity conditions, so the series oscillates with steadily growing amplitude.

  1. Consider wmurders, the number of women murdered each year in the United States.
  1. By studying appropriate graphs of the series in R, find an appropriate ARIMA(p, d, q) model for these data.
ggtsdisplay(wmurders)

d = wmurders %>% diff(1) %>% diff(1)
ggtsdisplay(d)

d %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.0458 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
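As a cross-check on the amount of differencing (a sketch, not part of the original answer), ndiffs() from the forecast package estimates how many first differences are needed:

ndiffs(wmurders)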
  1. Should you include a constant in the model? Explain. Answer: No. With d = 2, a constant would introduce a quadratic trend into the long-run forecasts, which is not appropriate for this series, so the constant should be omitted.

  2. Write this model in terms of the backshift operator. Answer: (1 - phi1*B)(1 - B)^2 y_t = (1 + theta1*B) epsilon_t (with no constant, consistent with part b).

  3. Fit the model using R and examine the residuals. Is the model satisfactory?

fc = Arima(wmurders, order = c(1,2,1))
checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
## 
## Model df: 2.   Total lags used: 10

Answer: The Ljung-Box p-value (0.13) is well above 0.05, so the residuals look like white noise and the model is satisfactory.

  1. Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
forecast(fc, h = 3)
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.470660 2.194836 2.746484 2.048824 2.892496
## 2006       2.363106 1.986351 2.739862 1.786908 2.939304
## 2007       2.252833 1.765391 2.740276 1.507354 2.998313
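A sketch (not in the original write-up) of the hand calculation for the first forecast: expanding (1 - phi*B)(1 - B)^2 y_t = (1 + theta*B) e_t gives y_{T+1} = (2 + phi) y_T - (1 + 2*phi) y_{T-1} + phi y_{T-2} + theta e_T, which can be evaluated with the fitted coefficients:

phi = coef(fc)["ar1"]
theta = coef(fc)["ma1"]
y = as.numeric(wmurders)
n = length(y)
e = as.numeric(residuals(fc))
as.numeric((2 + phi) * y[n] - (1 + 2 * phi) * y[n - 1] + phi * y[n - 2] + theta * e[n])

This should reproduce the 2005 point forecast shown above (about 2.47), up to rounding.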
  1. Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
autoplot(forecast(fc, h = 3))

g. Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

fc2 = auto.arima(wmurders, seasonal = FALSE, stepwise = FALSE, 
                 approximation = FALSE)
summary(fc2)
## Series: wmurders 
## ARIMA(0,2,3) 
## 
## Coefficients:
##           ma1     ma2      ma3
##       -1.0154  0.4324  -0.3217
## s.e.   0.1282  0.2278   0.1737
## 
## sigma^2 estimated as 0.04475:  log likelihood=7.77
## AIC=-7.54   AICc=-6.7   BIC=0.35
## 
## Training set error measures:
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01336585 0.2016929 0.1531053 -0.3332051 4.387024 0.9415259
##                     ACF1
## Training set -0.03193856
forecast(fc2, h = 3)
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.408817 2.137718 2.679916 1.994206 2.823428
## 2006       2.365555 1.985092 2.746018 1.783687 2.947423
## 2007       2.290976 1.753245 2.828706 1.468588 3.113363

The ARIMA(0,2,3) selected by auto.arima() has better log-likelihood and AIC than the ARIMA(1,2,1), so it is the preferable model.

  1. Consider austa, the total international visitors to Australia (in millions) for the period 1980-2015.
  1. Use auto.arima() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
fc = auto.arima(austa, seasonal = FALSE)
checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1) with drift
## Q* = 2.297, df = 5, p-value = 0.8067
## 
## Model df: 2.   Total lags used: 7
forecast(fc, h = 10)
##      Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
## 2016       7.108647 6.873183 7.344111 6.748536  7.468758
## 2017       7.282129 6.895831 7.668428 6.691337  7.872922
## 2018       7.455612 6.962652 7.948571 6.701695  8.209529
## 2019       7.629094 7.048756 8.209432 6.741543  8.516644
## 2020       7.802576 7.146393 8.458758 6.799031  8.806120
## 2021       7.976058 7.251932 8.700184 6.868603  9.083513
## 2022       8.149540 7.363321 8.935760 6.947121  9.351960
## 2023       8.323023 7.479266 9.166779 7.032609  9.613436
## 2024       8.496505 7.598893 9.394117 7.123725  9.869284
## 2025       8.669987 7.721572 9.618402 7.219512 10.120462
  1. Plot forecasts from an ARIMA(0,1,1) model with no drift and compare these to part a. Remove the MA term and plot again.
fc2 = Arima(austa, c(0,1,1))
checkresiduals(fc2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)
## Q* = 3.8403, df = 6, p-value = 0.6983
## 
## Model df: 1.   Total lags used: 7

Answer: The plots look very similar; the Ljung-Box statistics are slightly worse for the ARIMA(0,1,1) without drift (p = 0.70 vs 0.81 with drift). c. Plot forecasts from an ARIMA(2,1,3) model with drift. Remove the constant and see what happens.

fc3 = Arima(austa, order = c(2,1,3), include.drift = TRUE, method = "ML")
autoplot(forecast(fc3, h = 10), include = 10)

fc4 = Arima(austa, order = c(2,1,3), include.drift = FALSE, method = "ML")
autoplot(forecast(fc4, h = 10), include = 10)

d. Plot forecasts from an ARIMA(0,0,1) model with a constant. Remove the MA term and plot again.

fc5 = Arima(austa, order = c(0,0,1), include.constant = TRUE, method = "ML")
autoplot(forecast(fc5, h = 10), include = 10)

fc6 = Arima(austa, order = c(0,0,0), include.constant = TRUE, method = "ML")
autoplot(forecast(fc6, h = 10), include = 10)

e. Plot forecasts from an ARIMA(0,2,1) model with no constant.

fc7 = Arima(austa, order = c(0,2,1), method = "ML")
autoplot(forecast(fc7, h = 10), include = 10)

9. For the usgdp series: a. if necessary, find a suitable Box-Cox transformation for the data;

autoplot(usgdp)

L = BoxCox.lambda(usgdp)
BC = BoxCox(usgdp, lambda = L)
tsdisplay(BC)

b. fit a suitable ARIMA model to the transformed data using auto.arima();

fc = auto.arima(BC, lambda = L)
fc
## Series: BC 
## ARIMA(1,1,0) with drift 
## Box Cox transformation: lambda= 0.366352 
## 
## Coefficients:
##          ar1   drift
##       0.3213  0.0141
## s.e.  0.0617  0.0015
## 
## sigma^2 estimated as 0.0002318:  log likelihood=653.82
## AIC=-1301.63   AICc=-1301.53   BIC=-1291.24
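Note that BC is already the Box-Cox transformed series, so passing lambda = L here applies the transformation a second time (the same pattern appears in the mcopper and qgas fits later). A minimal sketch of the more conventional call, fitting the original series and letting auto.arima() handle the transformation (its output would differ from the output above):

fit = auto.arima(usgdp, lambda = BoxCox.lambda(usgdp))
summary(fit)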
c. try some other plausible models by experimenting with the orders chosen;
fc2 = Arima(BC, order = c(0,0,2), lambda = L)
fc3 = Arima(BC, order = c(0,1,2), lambda = L)
accuracy(fc)
##                        ME      RMSE       MAE         MPE      MAPE      MASE
## Training set 0.0004149997 0.1868365 0.1412417 0.001054993 0.2628386 0.1782806
##                     ACF1
## Training set -0.05059537
accuracy(fc2)
##                     ME     RMSE      MAE        MPE     MAPE     MASE      ACF1
## Training set 0.4233275 3.376962 2.876613 -0.5229817 5.131773 3.630969 0.9206221
accuracy(fc3)
##                     ME      RMSE       MAE       MPE      MAPE      MASE
## Training set 0.1029991 0.2159418 0.1693205 0.1817483 0.3085567 0.2137227
##                    ACF1
## Training set -0.1622126
d. choose what you think is the best model and check the residual diagnostics;
checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,0) with drift
## Q* = 10.408, df = 6, p-value = 0.1085
## 
## Model df: 2.   Total lags used: 8
e. produce forecasts of your fitted model. Do the forecasts look reasonable?
forecast(fc, h = 5)
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2006 Q2       81.19904 80.88299 81.51587 80.71600 81.68390
## 2006 Q3       81.44771 80.92341 81.97415 80.64673 82.25371
## 2006 Q4       81.68334 80.99111 82.37931 80.62618 82.74925
## 2007 Q1       81.91506 81.08190 82.75362 80.64303 83.19972
## 2007 Q2       82.14579 81.19003 83.10864 80.68696 83.62122
autoplot(forecast(fc, h = 5))

Answer: Yes; the point forecasts continue the steady upward trend of the series. f. compare the results with what you would obtain using ets().

forecast(ets(usgdp))
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2006 Q2       11508.06 11454.38 11561.75 11425.96 11590.16
## 2006 Q3       11612.53 11525.42 11699.64 11479.31 11745.76
## 2006 Q4       11717.00 11596.31 11837.69 11532.42 11901.58
## 2007 Q1       11821.47 11665.72 11977.22 11583.27 12059.67
## 2007 Q2       11925.94 11733.29 12118.59 11631.30 12220.57
## 2007 Q3       12030.41 11798.94 12261.87 11676.41 12384.40
## 2007 Q4       12134.87 11862.68 12407.07 11718.58 12551.17
## 2008 Q1       12239.34 11924.53 12554.15 11757.88 12720.80
autoplot(forecast(ets(usgdp)))
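One caveat for this comparison: the ets() forecasts above are on the original usgdp scale, while the ARIMA forecasts in part e are on the Box-Cox transformed (BC) scale. A sketch of putting the ARIMA point forecasts back on the original scale before comparing:

InvBoxCox(forecast(fc, h = 5)$mean, lambda = L)  # back-transform the ARIMA point forecasts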

10. Consider austourists, the quarterly visitor nights (in millions) spent by international tourists to Australia for the period 1999-2015.

ggtsdisplay(austourists)

a. Describe the time plot. Answer: There is an increasing trend over time together with a strong quarterly seasonal pattern. b. What can you learn from the ACF graph? Answer: The autocorrelations decrease slowly, with seasonal peaks at the quarterly lags. c. What can you learn from the PACF graph? Answer: The PACF pattern suggests a moving average term in the data. d. Produce plots of the seasonally differenced data (1 - B^4)Y_t. What model do these graphs suggest?

ggtsdisplay(diff(austourists, lag = 4))

e. Does auto.arima() give the same model that you chose? If not, which model do you think is better?

auto.arima(austourists)
## Series: austourists 
## ARIMA(1,0,0)(1,1,0)[4] with drift 
## 
## Coefficients:
##          ar1     sar1   drift
##       0.4705  -0.5305  0.5489
## s.e.  0.1154   0.1122  0.0864
## 
## sigma^2 estimated as 5.15:  log likelihood=-142.48
## AIC=292.97   AICc=293.65   BIC=301.6
Answer: auto.arima() selects ARIMA(1,0,0)(1,1,0)[4] with drift.

11. Consider usmelec, the total net generation of electricity (in billion kilowatt hours) by the U.S. electric industry (monthly for the period January 1973 – June 2013). In general there are two peaks per year: in mid-summer and mid-winter.
a. Examine the 12-month moving average of this series to see what kind of trend is involved.
m_avg = ma(usmelec, order = 12)
autoplot(m_avg)
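Overlaying the moving average on the original series makes the trend easier to see; a sketch in the style used in the text:

autoplot(usmelec, series = "Data") +
  autolayer(m_avg, series = "12-month moving average")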

Answer: The 12-month moving average shows a smooth, steadily increasing trend. b. Do the data need transforming? If so, find a suitable transformation.

L = BoxCox.lambda(usmelec)
BC = BoxCox(usmelec, lambda = L)
autoplot(BC)

c. Are the data stationary? If not, find an appropriate differencing which yields stationary data.

ndiffs(BC)
## [1] 1

Answer: No; one difference is needed to make the transformed data stationary. d. Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values?

fc1 = Arima(usmelec, order = c(12,1,0), seasonal = c(1,0,0), lambda = L)
fc2 = Arima(usmelec, order = c(6,2,0), seasonal = c(1,2,0), lambda = L)
fc1
## Series: usmelec 
## ARIMA(12,1,0)(1,0,0)[12] 
## Box Cox transformation: lambda= -0.5738331 
## 
## Coefficients:
##           ar1      ar2      ar3      ar4      ar5      ar6      ar7      ar8
##       -0.4200  -0.3639  -0.2116  -0.2446  -0.1964  -0.0705  -0.0800  -0.0583
## s.e.   0.0435   0.0474   0.0504   0.0515   0.0521   0.0526   0.0524   0.0521
##          ar9    ar10    ar11     ar12    sar1
##       0.0129  0.0892  0.1022  -0.3012  0.9605
## s.e.  0.0516  0.0506  0.0478   0.0442  0.0108
## 
## sigma^2 estimated as 1.636e-06:  log likelihood=2536.94
## AIC=-5045.88   AICc=-5044.99   BIC=-4987.3
fc2
## Series: usmelec 
## ARIMA(6,2,0)(1,2,0)[12] 
## Box Cox transformation: lambda= -0.5738331 
## 
## Coefficients:
##           ar1      ar2      ar3      ar4      ar5      ar6     sar1
##       -1.1636  -1.2053  -0.9518  -0.7682  -0.5109  -0.2117  -0.6027
## s.e.   0.0464   0.0679   0.0810   0.0813   0.0680   0.0468   0.0376
## 
## sigma^2 estimated as 4.776e-06:  log likelihood=2166.55
## AIC=-4317.09   AICc=-4316.78   BIC=-4284.04

Answer: Of the two models above, ARIMA(12,1,0)(1,0,0)[12] is better, with the lower AIC (-5045.9 vs -4317.1).

e. Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.
checkresiduals(fc1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(12,1,0)(1,0,0)[12]
## Q* = 63.02, df = 11, p-value = 2.535e-09
## 
## Model df: 13.   Total lags used: 24
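The Ljung-Box p-value above is essentially zero, so these residuals are not white noise. A sketch of letting auto.arima() search for a seasonal alternative (fit_s is just an illustrative name; the search is slow and was not run in the original):

fit_s = auto.arima(usmelec, lambda = L)
checkresiduals(fit_s)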
12. For the mcopper data:
a. if necessary, find a suitable Box-Cox transformation for the data;
autoplot(mcopper)

L = BoxCox.lambda(mcopper)
BC = BoxCox(mcopper, lambda = L)
autoplot(BC)

b. fit a suitable ARIMA model to the transformed data using auto.arima();

fc = auto.arima(BC, lambda = L)
fc
## Series: BC 
## ARIMA(0,1,1) 
## Box Cox transformation: lambda= 0.1919047 
## 
## Coefficients:
##          ma1
##       0.3754
## s.e.  0.0385
## 
## sigma^2 estimated as 0.0006993:  log likelihood=1246.8
## AIC=-2489.6   AICc=-2489.58   BIC=-2480.94
c. try some other plausible models by experimenting with the orders chosen;
fc2 = Arima(BC, lambda = L, order = c(1,0,0))
fc3 = Arima(BC, lambda = L, order = c(1,1,0))
fc2
## Series: BC 
## ARIMA(1,0,0) with non-zero mean 
## Box Cox transformation: lambda= 0.1919047 
## 
## Coefficients:
##          ar1    mean
##       0.9979  3.4059
## s.e.  0.0025  0.3453
## 
## sigma^2 estimated as 0.0007983:  log likelihood=1209.5
## AIC=-2413   AICc=-2412.96   BIC=-2400
fc3
## Series: BC 
## ARIMA(1,1,0) 
## Box Cox transformation: lambda= 0.1919047 
## 
## Coefficients:
##          ar1
##       0.3240
## s.e.  0.0399
## 
## sigma^2 estimated as 0.0007134:  log likelihood=1241.18
## AIC=-2478.37   AICc=-2478.35   BIC=-2469.7
d. choose what you think is the best model and check the residual diagnostics;
checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)
## Q* = 30.585, df = 23, p-value = 0.1333
## 
## Model df: 1.   Total lags used: 24
Answer: The ARIMA(0,1,1) selected by auto.arima() has the lowest AIC of the three models, and its residuals pass the Ljung-Box test (p = 0.13). e. produce forecasts of your fitted model. Do the forecasts look reasonable?
forecast(fc, h=10)
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2007       19.58034 19.20828 19.95820 19.01365 20.16060
## Feb 2007       19.58034 18.95106 20.22639 18.62462 20.57530
## Mar 2007       19.58034 18.77403 20.41440 18.35819 20.86741
## Apr 2007       19.58034 18.63076 20.56864 18.14337 21.10791
## May 2007       19.58034 18.50744 20.70294 17.95903 21.31793
## Jun 2007       19.58034 18.39770 20.82366 17.79544 21.50720
## Jul 2007       19.58034 18.29797 20.93437 17.64714 21.68119
## Aug 2007       19.58034 18.20602 21.03730 17.51071 21.84330
## Sep 2007       19.58034 18.12034 21.13396 17.38386 21.99584
## Oct 2007       19.58034 18.03984 21.22544 17.26493 22.14045
Answer: The point forecasts are flat at the last observed level, which looks only partly reasonable given the sharp rise at the end of the series, although the widening intervals reflect the growing uncertainty. f. compare the results with what you would obtain using ets().
forecast(ets(mcopper))
##          Point Forecast     Lo 80    Hi 80      Lo 95    Hi 95
## Jan 2007       3362.057 3089.9095 3634.205 2945.84316 3778.271
## Feb 2007       3303.048 2886.5408 3719.555 2666.05529 3940.040
## Mar 2007       3255.840 2712.0515 3799.629 2424.18712 4087.493
## Apr 2007       3218.074 2556.4686 3879.680 2206.23587 4229.912
## May 2007       3187.861 2415.5047 3960.217 2006.64386 4369.078
## Jun 2007       3163.691 2286.5145 4040.867 1822.16554 4505.216
## Jul 2007       3144.354 2167.5706 4121.138 1650.49263 4638.216
## Aug 2007       3128.885 2057.1539 4200.616 1489.81367 4767.956
## Sep 2007       3116.510 1954.0218 4278.998 1338.63788 4894.382
## Oct 2007       3106.609 1857.1395 4356.079 1195.71014 5017.509
## Nov 2007       3098.689 1765.6373 4431.741 1059.96220 5137.416
## Dec 2007       3092.353 1678.7802 4505.926  930.47992 5254.226
## Jan 2008       3087.284 1595.9457 4578.622  806.47889 5368.089
## Feb 2008       3083.229 1516.6053 4649.852  687.28494 5479.173
## Mar 2008       3079.985 1440.3094 4719.660  572.31785 5587.651
## Apr 2008       3077.389 1366.6751 4788.103  461.07766 5693.701
## May 2008       3075.313 1295.3751 4855.251  353.13295 5797.493
## Jun 2008       3073.652 1226.1299 4921.174  248.11080 5899.193
## Jul 2008       3072.323 1158.6994 4985.947  145.68821 5998.958
## Aug 2008       3071.260 1092.8773 5049.643   45.58475 6096.935
## Sep 2008       3070.410 1028.4857 5112.334  -52.44361 6193.263
## Oct 2008       3069.729  965.3704 5174.088 -148.60993 6288.068
## Nov 2008       3069.185  903.3978 5234.972 -243.10074 6381.471
## Dec 2008       3068.750  842.4512 5295.048 -336.07992 6473.579
Answer: The ets() point forecasts decline gradually while the ARIMA forecasts are flat; note that the ets() forecasts are on the original price scale, whereas the ARIMA forecasts above are on the Box-Cox scale, so they are not directly comparable without back-transforming.

13. Choose one of the following seasonal time series: hsales, auscafe, qauselec, qcement, qgas.
ggtsdisplay(qgas)

a. Do the data need transforming? If so, find a suitable transformation. Answer: Nothing strongly suggests a transformation is needed, although a Box-Cox transformation is applied below before differencing. b. Are the data stationary? If not, find an appropriate differencing which yields stationary data. Answer: No, the data are not stationary; ndiffs() below indicates differencing is required.

L = BoxCox.lambda(qgas)
BC = BoxCox(qgas, lambda = L)
ndiffs(BC)
## [1] 2
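Since qgas is strongly seasonal, it is also worth checking the seasonal differencing requirement; a one-line sketch (not run in the original):

nsdiffs(BC)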
c. Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values?
auto.arima(BC)
## Series: BC 
## ARIMA(0,1,0)(0,1,1)[4] 
## 
## Coefficients:
##         sma1
##       -0.694
## s.e.   0.069
## 
## sigma^2 estimated as 0.00464:  log likelihood=269.17
## AIC=-534.35   AICc=-534.29   BIC=-527.62
Arima(BC, order = c(0,1,0))
## Series: BC 
## ARIMA(0,1,0) 
## 
## sigma^2 estimated as 0.07193:  log likelihood=-22.33
## AIC=46.67   AICc=46.69   BIC=50.05
Arima(BC, order = c(0,1,1))
## Series: BC 
## ARIMA(0,1,1) 
## 
## Coefficients:
##          ma1
##       0.7789
## s.e.  0.0335
## 
## sigma^2 estimated as 0.05212:  log likelihood=12.66
## AIC=-21.31   AICc=-21.26   BIC=-14.55

Answer: The seasonal ARIMA(0,1,0)(0,1,1)[4] selected by auto.arima() has the lowest AIC (-534.35); of the non-seasonal candidates, ARIMA(0,1,1) is best. d. Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.

checkresiduals(Arima(BC, order = c(0,1,1)))

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)
## Q* = 572.63, df = 7, p-value < 2.2e-16
## 
## Model df: 1.   Total lags used: 8
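For comparison, a residual check of the seasonal ARIMA(0,1,0)(0,1,1)[4] selected by auto.arima() above (a sketch, not run in the original):

checkresiduals(Arima(BC, order = c(0,1,0), seasonal = c(0,1,1)))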

Answer: No; the Ljung-Box p-value is essentially zero, so the ARIMA(0,1,1) residuals are not white noise, and a seasonal model such as the one checked above fits better. e. Forecast the next 24 months of data using your preferred model.

forecast(Arima(BC, order = c(0,1,1)), h = 24)
##         Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
## 2010 Q3       7.285726 6.993151 7.578300 6.838272  7.733180
## 2010 Q4       7.285726 6.688666 7.882786 6.372602  8.198850
## 2011 Q1       7.285726 6.493665 8.077787 6.074373  8.497079
## 2011 Q2       7.285726 6.337970 8.233482 5.836259  8.735193
## 2011 Q3       7.285726 6.204467 8.366985 5.632083  8.939369
## 2011 Q4       7.285726 6.085725 8.485727 5.450483  9.120969
## 2012 Q1       7.285726 5.977719 8.593733 5.285302  9.286150
## 2012 Q2       7.285726 5.877975 8.693477 5.132757  9.438695
## 2012 Q3       7.285726 5.784845 8.786607 4.990327  9.581125
## 2012 Q4       7.285726 5.697166 8.874286 4.856233  9.715219
## 2013 Q1       7.285726 5.614079 8.957373 4.729163  9.842289
## 2013 Q2       7.285726 5.534931 9.036521 4.608116  9.963336
## 2013 Q3       7.285726 5.459209 9.112243 4.492309 10.079143
## 2013 Q4       7.285726 5.386504 9.184948 4.381116 10.190336
## 2014 Q1       7.285726 5.316481 9.254971 4.274026 10.297426
## 2014 Q2       7.285726 5.248864 9.322588 4.170615 10.400837
## 2014 Q3       7.285726 5.183421 9.388031 4.070528 10.500924
## 2014 Q4       7.285726 5.119954 9.451497 3.973464 10.597988
## 2015 Q1       7.285726 5.058295 9.513156 3.879165 10.692287
## 2015 Q2       7.285726 4.998298 9.573154 3.787407 10.784045
## 2015 Q3       7.285726 4.939834 9.631618 3.697994 10.873458
## 2015 Q4       7.285726 4.882793 9.688659 3.610757 10.960695
## 2016 Q1       7.285726 4.827074 9.744378 3.525543 11.045909
## 2016 Q2       7.285726 4.772591 9.798861 3.442217 11.129235
f. Compare the forecasts obtained using ets().
forecast(ets(BC))
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2010 Q3       7.364062 7.277022 7.451101 7.230946 7.497178
## 2010 Q4       7.028569 6.906037 7.151101 6.841172 7.215965
## 2011 Q1       6.904139 6.749318 7.058960 6.667361 7.140917
## 2011 Q2       7.241776 7.055811 7.427741 6.957367 7.526184
## 2011 Q3       7.391110 7.174165 7.608056 7.059321 7.722900
## 2011 Q4       7.055617 6.807792 7.303442 6.676602 7.434632
## 2012 Q1       6.931187 6.652173 7.210202 6.504472 7.357903
## 2012 Q2       7.268824 6.958176 7.579472 6.793729 7.743919
Answer: The ets() forecasts retain the seasonal pattern, whereas the flat non-seasonal ARIMA(0,1,1) forecasts do not, so the ets() forecasts look more plausible here.

14. For the same time series you used in the previous exercise, try using a non-seasonal model applied to the seasonally adjusted data obtained from STL. The stlf() function will make the calculations easy (with method = "arima"). Compare the forecasts with those obtained in the previous exercise. Which do you think is the best approach?
fc = stlf(BC, method = "arima")
fc
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2010 Q3       7.328233 7.258315 7.398151 7.221302 7.435163
## 2010 Q4       7.002417 6.901010 7.103823 6.847329 7.157505
## 2011 Q1       6.892495 6.758620 7.026370 6.687751 7.097239
## 2011 Q2       7.222952 7.049670 7.396233 6.957940 7.487963
## 2011 Q3       7.332189 7.123628 7.540749 7.013223 7.651154
## 2011 Q4       7.002113 6.758365 7.245862 6.629332 7.374895
## 2012 Q1       6.915498 6.636375 7.194621 6.488616 7.342380
## 2012 Q2       7.241571 6.928447 7.554696 6.762689 7.720454

Answer: The STL + ARIMA forecasts are similar to the ETS forecasts but with slightly narrower prediction intervals, so the STL approach looks like the best option here.

15. For your retail time series (Exercise 5 above):
a. develop an appropriate seasonal ARIMA model;
fc = auto.arima(retail_ts, seasonal = TRUE)
fc
## Series: retail_ts 
## ARIMA(3,1,3)(0,1,2)[12] 
## 
## Coefficients:
##          ar1     ar2     ar3      ma1      ma2     ma3     sma1     sma2
##       0.0132  0.6722  0.0864  -0.5178  -0.7786  0.3196  -0.2643  -0.1219
## s.e.  0.1846  0.1503  0.1274   0.1827   0.1724  0.1709   0.0556   0.0484
## 
## sigma^2 estimated as 49.43:  log likelihood=-1237.44
## AIC=2492.88   AICc=2493.38   BIC=2528.05
forecast(fc)
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2014       272.3899 263.3794 281.4004 258.6095 286.1703
## Feb 2014       233.6990 223.6436 243.7545 218.3206 249.0775
## Mar 2014       271.7203 261.0909 282.3496 255.4641 287.9764
## Apr 2014       259.1572 247.7883 270.5261 241.7700 276.5445
## May 2014       253.6492 241.9002 265.3982 235.6807 271.6177
## Jun 2014       246.6133 234.4211 258.8055 227.9670 265.2596
## Jul 2014       254.2708 241.8059 266.7357 235.2073 273.3342
## Aug 2014       268.6146 255.8629 281.3663 249.1125 288.1166
## Sep 2014       269.5092 256.5572 282.4613 249.7008 289.3177
## Oct 2014       274.0854 260.9358 287.2350 253.9748 294.1960
## Nov 2014       284.2037 270.9033 297.5041 263.8625 304.5449
## Dec 2014       406.2039 392.7601 419.6476 385.6434 426.7643
## Jan 2015       282.9976 267.1438 298.8514 258.7513 307.2439
## Feb 2015       246.1238 229.4987 262.7490 220.6979 271.5498
## Mar 2015       281.4956 264.3656 298.6256 255.2975 307.6937
## Apr 2015       269.9174 252.2041 287.6307 242.8273 297.0076
## May 2015       264.4830 246.4015 282.5644 236.8298 292.1362
## Jun 2015       257.7574 239.2813 276.2334 229.5007 286.0141
## Jul 2015       265.0810 246.3225 283.8395 236.3924 293.7696
## Aug 2015       278.7800 259.7381 297.8218 249.6580 307.9020
## Sep 2015       280.6464 261.3829 299.9100 251.1853 310.1075
## Oct 2015       285.8418 266.3639 305.3197 256.0529 315.6307
## Nov 2015       297.1241 277.4677 316.7805 267.0622 327.1860
## Dec 2015       418.3295 398.5036 438.1554 388.0084 448.6506
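Before comparing forecasts, a quick residual check of this seasonal ARIMA model (a sketch; its output is not shown in the original):

checkresiduals(fc)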
b. compare the forecasts with those you obtained in earlier chapters. Answer: The forecasts are similar, following the same upward trend and seasonal pattern as before.
16. Consider sheep, the sheep population of England and Wales from 1867–1939.
a. Produce a time plot of the time series.
autoplot(sheep)

c. By examining the ACF and PACF of the differenced data, explain why this model is appropriate.

ggtsdisplay(diff(sheep))

Answer: For the differenced series, the ACF decays while the PACF cuts off after lag 3, which supports fitting an AR(3) to the differences, i.e. ARIMA(3,1,0).

d. Without using the forecast function, calculate forecasts for the next three years (1940–1942). The fitted model gives the recursion y_t = y_{t-1} + 0.42(y_{t-1} - y_{t-2}) - 0.20(y_{t-2} - y_{t-3}) - 0.30(y_{t-3} - y_{t-4}) + e_t, applied to the last observed values of the series.
# 1940: plug in the last four observations (1665, 1627, 1791, 1797)
sheep1 = 1797 + 0.42*(1797 - 1791) - 0.20*(1791 - 1627) - 0.30*(1627 - 1665)
# 1941 and 1942: feed each forecast back into the recursion
sheep2 = sheep1 + 0.42*(sheep1 - 1797) - 0.20*(1797 - 1791) - 0.30*(1791 - 1627)
sheep3 = sheep2 + 0.42*(sheep2 - sheep1) - 0.20*(sheep1 - 1797) - 0.30*(1797 - 1791)
sheep1
## [1] 1778.12
sheep2
## [1] 1719.79
sheep3
## [1] 1697.268
e. Now fit the model in R and obtain the forecasts using forecast. How are they different from yours? Why?
fc = forecast(Arima(sheep, order = c(3,1,0)))
autoplot(fc)
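To compare with the hand calculations in part d, the first three point forecasts from the fitted model can be printed directly (a sketch); the values should be close, with any differences coming from R estimating its own unrounded coefficients rather than using the rounded 0.42, -0.20 and -0.30 above:

round(fc$mean[1:3], 2)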

17. The annual bituminous coal production in the United States from 1920 to 1968 is in data set bicoal. a. Produce a time plot of the data.

autoplot(bicoal)

c. Explain why this model was chosen using the ACF and PACF.

ggtsdisplay(bicoal)

Answer: The ACF decays slowly while the PACF has only a couple of significant spikes (at lags 1 and 3), which points to an autoregressive model. d. Without using the forecast function, calculate forecasts for the next three years (1969–1971).

# AR(4) recursion: y_t = 162 + 0.83*y_{t-1} - 0.34*y_{t-2} + 0.55*y_{t-3} - 0.38*y_{t-4},
# applied to the last four observed values (545, 552, 534, 512, most recent first)
a = 162 + 0.83*545 - 0.34*552 + 0.55*534 - 0.38*512
b = 162 + 0.83*a - 0.34*545 + 0.55*552 - 0.38*534
c = 162 + 0.83*b - 0.34*a + 0.55*545 - 0.38*552
a
## [1] 525.81
b
## [1] 513.8023
c
## [1] 499.6705
e. Now fit the model in R and obtain the forecasts from the same model. How are they different from yours? Why?
fc = forecast(Arima(bicoal, order = c(4,0,0)))
autoplot(fc)

Answer: The forecasts should be very close; any differences come from R using full-precision coefficient estimates rather than the rounded values above.

18. Before doing this exercise, you will need to install the Quandl package in R. a. Select a time series from Quandl. Then copy its short URL and import the data.

y = Quandl("EIA/AEO_2016_REF_NO_CPP_PRCE_NA_COMM_NA_NG_NA_SATL_Y13DLRPMCF_A",
           api_key="otkvu4KBWBAykqK83Zxi", type="ts")
b. Plot graphs of the data, and try to identify an appropriate ARIMA model.
autoplot(y)

fc = auto.arima(y, seasonal = F)
checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,2)
## Q* = 7.0725, df = 3, p-value = 0.06962
## 
## Model df: 2.   Total lags used: 5
c. Do residual diagnostic checking of your ARIMA model. Are the residuals white noise? Answer: Yes; the Ljung-Box p-value of 0.07 is above 0.05, so the residuals are consistent with white noise.
d. Use your chosen ARIMA model to forecast the next four years.
forecast(fc, h = 4)
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2041       11.75328 11.49143 12.01513 11.35281 12.15375
## 2042       11.78068 11.22669 12.33466 10.93342 12.62793
## 2043       11.78068 10.87335 12.68800 10.39303 13.16832
## 2044       11.78068 10.62327 12.93808 10.01057 13.55078
e. Now try to identify an appropriate ETS model.
fc2 = ets(y)
f. Do residual diagnostic checking of your ETS model. Are the residuals white noise?
checkresiduals(fc2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,N)
## Q* = 14.02, df = 3, p-value = 0.002878
## 
## Model df: 2.   Total lags used: 5
Answer: No; the Ljung-Box p-value of 0.003 indicates the ETS residuals are not white noise. g. Use your chosen ETS model to forecast the next four years.
forecast(fc2, h = 4)
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2041        11.7849 11.40500 12.16480 11.20390 12.36590
## 2042        11.7849 11.24767 12.32213 10.96328 12.60652
## 2043        11.7849 11.12694 12.44286 10.77864 12.79116
## 2044        11.7849 11.02516 12.54464 10.62298 12.94682
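A quick visual comparison of the two sets of forecasts (a sketch, not run in the original):

autoplot(y) +
  autolayer(forecast(fc, h = 4), series = "ARIMA", PI = FALSE) +
  autolayer(forecast(fc2, h = 4), series = "ETS", PI = FALSE)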
h. Which of the two models do you prefer? Answer: The two models give very similar forecasts, but the ARIMA residuals look much more like white noise (Ljung-Box p = 0.07 vs 0.003 for ETS), so I prefer the ARIMA model.