Chapter 7

  1. Consider the pigs series - the number of pigs slaughtered in Victoria each month. Use the ses function in R to find the optimal values of alpha and l0, and generate forecasts for the next four months. Compute a 95% prediction interval for the first forecast using y ± 1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.
##  Time-Series [1:188] from 1980 to 1996: 76378 71947 33873 96428 105084 ...
##         Jan    Feb    Mar    Apr    May    Jun
## 1980  76378  71947  33873  96428 105084  95741
## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665
## The optimal alpha is .3, and the optimal l0 is 77260
## The upper and lower 95% confidence intervals are 119020.843765435 & 78611.9684577467
## Using the formula, the upper and lower 95% confidence intervals calculated are 118952.844969765 & 78679.9672534162

  1. Write your own function to implement simple exponential smoothing. The function should take arguments y (the time series), alpha (the smoothing parameter α) and level (the initial level ℓ0). It should return the forecast of the next observation in the series. Does it give the same forecast as ses()?
## Forecast of next observation by SES function: 98816.4061115907
## Forecast of next observation by ses function:  98816.4061115907
## Forecast of next observation by SES function: 421.313649201742
## Forecast of next observation by ses function:  421.313649201742
  1. Modify your function from the previous exercise to return the sum of squared errors rather than the forecast of the next observation. Then use the optim() function to find the optimal values of α and ℓ0. Do you get the same values as the ses() function?
SES <- function(pars = c(alpha, l0), y){
  
  # change the first argument as vector of alpha and l0, rather than separate alpha and l0 because optim function wants to take a function that requires vector as its first argument as fn argument.
  
  error <- 0
  
  SSE <- 0
  
  alpha <- pars[1]
  
  l0 <- pars[2]
  
  y_hat <- l0
  
  for(index in 1:length(y)){
    error <- y[index] - y_hat
    SSE <- SSE + error^2
    
    y_hat <- alpha*y[index] + (1 - alpha)*y_hat 
  }
  
  return(SSE)
}

# compare ses and SES using pigs data
# set initial values as alpha = 0.5 and l0 = first observation value of data.

opt_SES_pigs <- optim(par = c(0.5, pigs[1]), y = pigs, fn = SES)

writeLines(paste(
  "Optimal parameters for the result of SES function: ",
  "\n",
  as.character(opt_SES_pigs$par[1]),
  ", ",
  as.character(opt_SES_pigs$par[2]),
  sep = ""
  ))
## Optimal parameters for the result of SES function: 
## 0.299008094014243, 76379.2653476235
writeLines(paste(
  "Parameters got from the result of ses function: ",
  "\n",
  as.character(ses_pigs$model$par[1]),
  ", ",
  as.character(ses_pigs$model$par[2]),
  sep = ""
))
## Parameters got from the result of ses function: 
## 0.297148833372095, 77260.0561458528
# In this case, alphas were almost same, but l0s were different.

# compare ses and SES using ausbeer data

# set initial values as alpha = 0.5 and l0 = first observation value of data.

opt_SES_ausbeer <- optim(par = c(0.5, ausbeer[1]), y = ausbeer, fn = SES)

writeLines(paste(
  "Optimal parameters for the result of SES function: ",
  "\n",
  as.character(opt_SES_ausbeer$par[1]),
  ", ",
  as.character(opt_SES_ausbeer$par[2]),
  sep = ""
  ))
## Optimal parameters for the result of SES function: 
## 0.148803623284766, 259.658459712619
writeLines(paste(
  "Parameters got from the result of ses function: ",
  "\n",
  as.character(ses_ausbeer$model$par[1]),
  ", ",
  as.character(ses_ausbeer$model$par[2]),
  sep = ""
))
## Parameters got from the result of ses function: 
## 0.148900381680056, 259.65918938777
# In this case, alphas were almost same regardless of the function used. And got the same result for l0s.
    1. Combine your previous two functions to produce a function which both finds the optimal values of alpha and
      l0, and produces a forecast of the next observation in the series.
## $Next_observation_forecast
## [1] 98814.63
## 
## $alpha
## [1] 0.2990081
## 
## $l0
## [1] 76379.27
## [1] "Next observation forecast by ses function"
## [1] 98816.41
## [1] "alpha calculated by ses function"
##     alpha 
## 0.2971488
## [1] "l0 calculated by ses function"
##        l 
## 77260.06
## $Next_observation_forecast
## [1] 421.3166
## 
## $alpha
## [1] 0.1488036
## 
## $l0
## [1] 259.6585
## [1] "Next observation forecast by ses function"
## [1] 421.3136
## [1] "alpha calculated by ses function"
##     alpha 
## 0.1489004
## [1] "l0 calculated by ses function"
##        l 
## 259.6592
  1. Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.
##  Time-Series [1:30, 1:2] from 1 to 30: 199 172 111 209 161 119 195 195 131 183 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "Paperback" "Hardcover"
## Time Series:
## Start = 1 
## End = 6 
## Frequency = 1 
##   Paperback Hardcover
## 1       199       139
## 2       172       128
## 3       111       172
## 4       209       139
## 5       161       191
## 6       119       168

## [1] 33.63769
## [1] 31.93101
## [1] 31.13692
## [1] 27.19358
## 95% PI of paperback sales calculated by holt function
##      95% 
## 275.0205
##     95% 
## 143.913
## 95% PI of paperback sales calculated by formula
## [1] 270.4951
## [1] 148.4384
## 95% PI of hardcover sales calculated by holt function
##      95% 
## 307.4256
##      95% 
## 192.9222
## 95% PI of hardcover sales calculated by formula
## [1] 303.4733
## [1] 196.8745
  1. For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900-1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]

Which model gives the best RMSE?

##  Time-Series [1:94] from 1900 to 1993: 277 315 315 321 315 ...
## Time Series:
## Start = 1900 
## End = 1905 
## Frequency = 1 
## [1] 276.79 315.42 314.87 321.25 314.54 317.92

## RMSE when using holt function
## [1] 26.58219
## RMSE when using holt function with damped option
## [1] 26.54019
## RMSE when using holt function with Box-Cox transformation
## [1] 1.032217
## RMSE when using holt function with damped option and Box-Cox transformation
## [1] 1.039187
  1. Recall your retail time series data (from Exercise 3 in Section 2.10).

## [1] 14.72762
## [1] 14.94306

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' multiplicative method
## Q* = 42.932, df = 7, p-value = 3.437e-07
## 
## Model df: 17.   Total lags used: 24

##                      ME       RMSE      MAE        MPE      MAPE      MASE
## Training set  0.4556121   8.681456  6.24903  0.2040939  3.151257 0.3916228
## Test set     94.7346169 111.911266 94.73462 24.2839784 24.283978 5.9369594
##                     ACF1 Theil's U
## Training set -0.01331859        NA
## Test set      0.60960299   1.90013

##                       ME      RMSE       MAE          MPE      MAPE      MASE
## Training set  0.03021223  9.107356  6.553533  0.001995484  3.293399 0.4107058
## Test set     78.34068365 94.806617 78.340684 19.945024968 19.945025 4.9095618
##                    ACF1 Theil's U
## Training set 0.02752875        NA
## Test set     0.52802701  1.613903
  1. For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

##                      ME      RMSE       MAE        MPE      MAPE      MASE
## Training set -0.6782982  8.583559  5.918078 -0.3254076  2.913104 0.3708823
## Test set     82.1015276 98.384220 82.101528 21.0189982 21.018998 5.1452516
##                    ACF1 Theil's U
## Training set 0.02704667        NA
## Test set     0.52161725  1.679783

##                      ME     RMSE       MAE       MPE      MAPE      MASE
## Training set -0.5020795 10.00826  6.851597 -0.391432  3.489759 0.4293853
## Test set     74.2529959 91.04491 74.252996 18.837766 18.837766 4.6533890
##                    ACF1 Theil's U
## Training set 0.09741266        NA
## Test set     0.48917501  1.549271
  1. For this exercise use data set ukcars, the quarterly UK passenger vehicle production data from 1977Q1-2005Q1.
##  Time-Series [1:113] from 1977 to 2005: 330 371 271 344 358 ...
##         Qtr1    Qtr2    Qtr3    Qtr4
## 1977 330.371 371.051 270.670 343.880
## 1978 358.491 362.822

## ETS(A,N,A) 
## 
## Call:
##  ets(y = ukcars) 
## 
##   Smoothing parameters:
##     alpha = 0.6199 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 314.2568 
##     s = -1.7579 -44.9601 21.1956 25.5223
## 
##   sigma:  25.9302
## 
##      AIC     AICc      BIC 
## 1277.752 1278.819 1296.844 
## 
## Training set error measures:
##                    ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 1.313884 25.23244 20.17907 -0.1570979 6.629003 0.6576259
##                    ACF1
## Training set 0.02573334

## [1] "Accuracy of STL + ETS(A, Ad, N) model"
##                    ME     RMSE      MAE        MPE     MAPE     MASE       ACF1
## Training set 1.551267 23.32113 18.48987 0.04121971 6.042764 0.602576 0.02262668
## [1] "Accuracy of STL + ETS(A, A, N) model"
##                      ME   RMSE     MAE        MPE    MAPE      MASE       ACF1
## Training set -0.3412727 23.295 18.1605 -0.5970778 5.98018 0.5918418 0.02103582
## [1] "Accuracy of ETS(A, N, A) model"
##                    ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 1.313884 25.23244 20.17907 -0.1570979 6.629003 0.6576259
##                    ACF1
## Training set 0.02573334

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(A,Ad,N)
## Q* = 24.138, df = 3, p-value = 2.337e-05
## 
## Model df: 5.   Total lags used: 8
  1. For this exercise use data set visitors, the monthly Australian short-term overseas visitors data, May 1985-April 2005.
##  Time-Series [1:240] from 1985 to 2005: 75.7 75.4 83.1 82.9 77.3 ...
##        May   Jun   Jul   Aug   Sep   Oct
## 1985  75.7  75.4  83.1  82.9  77.3 105.7

##                      ME     RMSE      MAE        MPE      MAPE      MASE
## Training set -0.9749466 14.06539 10.35763 -0.5792169  4.223204 0.3970304
## Test set     72.9189889 83.23541 75.89673 15.9157249 17.041927 2.9092868
##                   ACF1 Theil's U
## Training set 0.1356528        NA
## Test set     0.6901318  1.151065
##                      ME     RMSE      MAE        MPE      MAPE     MASE
## Training set  0.7640074 14.53480 10.57657  0.1048224  3.994788 0.405423
## Test set     72.1992664 80.23124 74.55285 15.9202832 16.822384 2.857773
##                     ACF1 Theil's U
## Training set -0.05311217        NA
## Test set      0.58716982  1.127269
##                     ME     RMSE      MAE        MPE      MAPE      MASE
## Training set  1.001363 14.97096 10.82396  0.1609336  3.974215 0.4149057
## Test set     69.458843 78.61032 72.41589 15.1662261 16.273089 2.7758586
##                     ACF1 Theil's U
## Training set -0.02535299        NA
## Test set      0.67684148  1.086953
##                    ME     RMSE      MAE      MPE      MAPE     MASE      ACF1
## Training set 17.29363 31.15613 26.08775 7.192445 10.285961 1.000000 0.6327669
## Test set     32.87083 50.30097 42.24583 6.640781  9.962647 1.619375 0.5725430
##              Theil's U
## Training set        NA
## Test set     0.6594016
##                      ME     RMSE       MAE         MPE     MAPE      MASE
## Training set  0.5803348 13.36431  9.551391  0.08767744  3.51950 0.3661256
## Test set     76.3637263 84.24658 78.028992 16.87750474 17.51578 2.9910209
##                     ACF1 Theil's U
## Training set -0.05924203        NA
## Test set      0.64749552  1.178154
## [1] 32.78675
## [1] 18.86439
## [1] 17.49642
## [1] 18.52985
## [1] 19.62107
  1. The fets function below returns ETS forecasts.

fets <- function(y, h) { forecast(ets(y), h = h) }

  1. Apply tsCV() for a forecast horizon of h=4, for both ETS and seasonal naive methods to the cement data, XXX. (Hint: use the newly created fets and the existing snaive functions as your forecast function arguments.)
  2. Compute the MSE of the resulting 4-steps-ahead errors. (Hint: make sure you remove missing values.) Why is there missing values? Comment on which forecasts are more accurate. Is this what you expected?

I can get MSE(Mean Squared Errors) by mean(tsCV(data, function, h = 4)^2, na.rm = TRUE).

  1. Compare ets, snaive and stlf on the following six time series. For stlf, you might need to use a Box-Cox transformation. Use a test set of three years to decide what gives the best forecasts. ausbeer, bricksq, dole, a10, h02, usmelec.
##  Time-Series [1:218] from 1956 to 2010: 284 213 227 308 262 228 236 320 272 233 ...
##      Qtr1 Qtr2 Qtr3 Qtr4
## 1956  284  213  227  308
## 1957  262  228

##                      ME      RMSE       MAE          MPE     MAPE      MASE
## Training set -0.3466095 15.781973 11.979955 -0.064073435 2.864311 0.7567076
## Test set      0.1272571  9.620828  8.919347  0.009981598 2.132836 0.5633859
##                    ACF1 Theil's U
## Training set -0.1942276        NA
## Test set      0.3763918 0.1783972
##                     ME     RMSE      MAE        MPE     MAPE      MASE
## Training set  3.336634 19.67005 15.83168  0.9044009 3.771370 1.0000000
## Test set     -2.916667 10.80509  9.75000 -0.6505927 2.338012 0.6158537
##                      ACF1 Theil's U
## Training set 0.0009690785        NA
## Test set     0.3254581810 0.1981463
##                      ME     RMSE      MAE        MPE     MAPE      MASE
## Training set  0.5704833 13.61609 9.816091  0.1279204 2.334194 0.6200283
## Test set     -4.0433841 10.11436 8.429747 -0.9670308 2.051915 0.5324605
##                    ACF1 Theil's U
## Training set -0.1131945        NA
## Test set      0.2865992 0.1689574

## $acc_ets
##                     ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set  1.228708 21.96146 15.81586 0.2606171 3.912882 0.4299638 0.2038074
## Test set     37.682916 42.36696 37.68292 8.4157549 8.415755 1.0244329 0.6190546
##              Theil's U
## Training set        NA
## Test set      1.116608
## 
## $acc_snaive
##                     ME     RMSE      MAE      MPE     MAPE      MASE
## Training set  6.438849 50.25482 36.78417 1.430237 8.999949 1.0000000
## Test set     15.333333 37.15284 33.66667 3.231487 7.549326 0.9152487
##                     ACF1 Theil's U
## Training set  0.81169827        NA
## Test set     -0.09314867 0.9538702
## 
## $acc_stlf
##                     ME     RMSE      MAE        MPE      MAPE      MASE
## Training set  1.318736 21.49082 14.41766  0.2855898  3.501287 0.3919528
## Test set     44.266931 50.20039 45.89301 10.0933739 10.487099 1.2476294
##                   ACF1 Theil's U
## Training set 0.1502759        NA
## Test set     0.1053316   1.31732
## 
## $acc_stlf_with_BoxCox
##                    ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set  1.42724 21.40466 14.30375 0.3185472 3.500221 0.3888560 0.1625098
## Test set     34.69481 40.13618 36.34649 7.7322091 8.132133 0.9881014 0.4746242
##              Theil's U
## Training set        NA
## Test set       1.04561

## $acc_ets
##                        ME      RMSE        MAE        MPE     MAPE      MASE
## Training set     98.00253  16130.58   9427.497  0.5518769  6.20867 0.2995409
## Test set     221647.42595 279668.65 221647.426 32.5447637 32.54476 7.0424271
##                   ACF1 Theil's U
## Training set 0.5139536        NA
## Test set     0.9394267  11.29943
## 
## $acc_snaive
##                     ME      RMSE       MAE       MPE     MAPE     MASE
## Training set  12606.58  56384.06  31473.16  3.350334 27.73651 1.000000
## Test set     160370.33 240830.92 186201.00 20.496197 27.38678 5.916184
##                   ACF1 Theil's U
## Training set 0.9781058        NA
## Test set     0.9208325  9.479785
## 
## $acc_stlf
##                       ME       RMSE       MAE        MPE      MAPE       MASE
## Training set    -96.4811   7801.958   4530.06 -0.2719573  5.116149  0.1439341
## Test set     328005.9874 407547.190 328005.99 48.6435815 48.643581 10.4217690
##                    ACF1 Theil's U
## Training set 0.08037493        NA
## Test set     0.93373958  16.57033
## 
## $acc_stlf_with_BoxCox
##                       ME       RMSE        MAE        MPE     MAPE      MASE
## Training set    145.1468   6688.083   3659.404  0.1464869  3.82385 0.1162706
## Test set     205385.6547 268135.992 207317.238 29.3540384 29.85051 6.5871126
##                     ACF1 Theil's U
## Training set -0.09446256        NA
## Test set      0.93778748  10.68587

## $acc_ets
##                      ME     RMSE       MAE       MPE     MAPE      MASE
## Training set 0.04378049 0.488989 0.3542285 0.2011915 3.993267 0.3611299
## Test set     1.62559331 2.562046 2.0101563 7.0011207 9.410027 2.0493197
##                     ACF1 Theil's U
## Training set -0.05238173        NA
## Test set      0.23062972 0.6929693
## 
## $acc_snaive
##                     ME     RMSE       MAE      MPE     MAPE     MASE      ACF1
## Training set 0.9577207 1.177528 0.9808895 10.86283 11.15767 1.000000 0.3779746
## Test set     4.3181585 5.180738 4.3306974 19.80542 19.89352 4.415071 0.6384822
##              Theil's U
## Training set        NA
## Test set      1.383765
## 
## $acc_stlf
##                      ME      RMSE       MAE       MPE      MAPE      MASE
## Training set 0.06704027 0.6809938 0.4297557 0.3200743  4.732989 0.4381286
## Test set     1.83324369 3.1118106 2.4570607 7.0963340 11.223584 2.5049311
##                     ACF1 Theil's U
## Training set -0.01970509        NA
## Test set      0.23918322 0.8089697
## 
## $acc_stlf_with_BoxCox
##                      ME      RMSE       MAE         MPE     MAPE      MASE
## Training set 0.01686093 0.4244591 0.3098511 0.003515845 3.594686 0.3158879
## Test set     1.25869582 2.2242431 1.8428354 5.032385108 8.701050 1.8787390
##                    ACF1 Theil's U
## Training set -0.1780966        NA
## Test set      0.1502463 0.5964321

## $acc_ets
##                       ME       RMSE        MAE          MPE     MAPE      MASE
## Training set 0.001532209 0.04653434 0.03463451 0.0008075938 4.575471 0.5850192
## Test set     0.023667588 0.07667744 0.06442193 1.7152319315 7.030603 1.0881653
##                     ACF1 Theil's U
## Training set  0.07982687        NA
## Test set     -0.12074883  0.450176
## 
## $acc_snaive
##                       ME       RMSE        MAE       MPE     MAPE    MASE
## Training set  0.03880677 0.07122149 0.05920234  5.220203 8.156509 1.00000
## Test set     -0.01479486 0.08551752 0.07161055 -1.308298 7.884556 1.20959
##                    ACF1 Theil's U
## Training set 0.40465289        NA
## Test set     0.02264221 0.5009677
## 
## $acc_stlf
##                       ME       RMSE        MAE        MPE     MAPE      MASE
## Training set 0.001987465 0.04609626 0.03396307 -0.1760405 4.751272 0.5736778
## Test set     0.046520958 0.09174622 0.07544541  3.5302072 8.001263 1.2743653
##                    ACF1 Theil's U
## Training set 0.01953927        NA
## Test set     0.05661406 0.5085785
## 
## $acc_stlf_with_BoxCox
##                       ME       RMSE        MAE        MPE     MAPE      MASE
## Training set 0.003628017 0.04230875 0.03068561 0.02682146 4.159581 0.5183175
## Test set     0.031706735 0.07574975 0.06381807 2.51742657 6.903027 1.0779653
##                     ACF1 Theil's U
## Training set -0.09600393        NA
## Test set     -0.25297574 0.4385025

## $acc_ets
##                        ME      RMSE       MAE        MPE     MAPE      MASE
## Training set  -0.07834851  7.447167  5.601963 -0.1168709 2.163495 0.6225637
## Test set     -10.20080652 15.291359 13.123441 -3.1908161 3.926338 1.4584491
##                   ACF1 Theil's U
## Training set 0.2719249        NA
## Test set     0.4679496 0.4859495
## 
## $acc_snaive
##                    ME     RMSE       MAE      MPE     MAPE     MASE      ACF1
## Training set 4.921564 11.58553  8.998217 2.000667 3.511648 1.000000 0.4841860
## Test set     5.475972 17.20037 13.494750 1.391437 3.767514 1.499714 0.2692544
##              Theil's U
## Training set        NA
## Test set     0.4765145
## 
## $acc_stlf
##                        ME      RMSE       MAE         MPE     MAPE      MASE
## Training set  -0.07064178  6.659696  4.907696 -0.07729202 1.910234 0.5454076
## Test set     -11.41131201 17.776488 15.918880 -3.66455560 4.779167 1.7691150
##                   ACF1 Theil's U
## Training set 0.1126383        NA
## Test set     0.5099758 0.5639709
## 
## $acc_stlf_with_BoxCox
##                       ME      RMSE       MAE         MPE     MAPE      MASE
## Training set  -0.1351692  6.474779  4.761143 -0.05026415 1.848092 0.5291207
## Test set     -19.9969503 24.406877 20.951753 -6.06977735 6.315723 2.3284338
##                    ACF1 Theil's U
## Training set 0.08255205        NA
## Test set     0.60877348 0.7777035

Chapter 8

  1. A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

  1. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

## 
##  Box-Ljung test
## 
## data:  diff(usnetelec)
## X-squared = 0.8508, df = 1, p-value = 0.3563
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(usnetelec)
## KPSS Level = 0.15848, Truncation lag parameter = 3, p-value = 0.1

## 
##  Box-Ljung test
## 
## data:  diff(usgdp)
## X-squared = 39.187, df = 1, p-value = 3.85e-10

## [1] 2

## 
##  Box-Ljung test
## 
## data:  diff(diff(usgdp))
## X-squared = 53.294, df = 1, p-value = 2.872e-13

## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(diff(usnetelec))
## KPSS Level = 0.098532, Truncation lag parameter = 3, p-value = 0.1

## 
##  Box-Ljung test
## 
## data:  diff(BoxCox(mcopper, lambda_mcopper))
## X-squared = 57.517, df = 1, p-value = 3.353e-14

## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(BoxCox(mcopper, lambda_mcopper))
## KPSS Level = 0.057275, Truncation lag parameter = 6, p-value = 0.1

## [1] 1
## [1] 1

## 
##  Box-Ljung test
## 
## data:  diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12))
## X-squared = 29.562, df = 1, p-value = 5.417e-08

## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12))
## KPSS Level = 0.042424, Truncation lag parameter = 5, p-value = 0.1

## [1] 1
## [1] 1

## 
##  Box-Ljung test
## 
## data:  diff(diff(BoxCox(visitors, lambda_visitors), lag = 12))
## X-squared = 21.804, df = 1, p-value = 3.02e-06

## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(diff(BoxCox(visitors, lambda_visitors), lag = 12))
## KPSS Level = 0.015833, Truncation lag parameter = 4, p-value = 0.1
  1. For the enplanements data, write down the differences you chose above using backshift operator notation.

the data needed 1 first difference, 1 seasonal difference after Box-Cox transformation. The model of the data can be written as ARIMA(0, 1, 0)(0, 1, 0)12 with Box-Cox transformation(lambda = -0.227).

The model expression using backshift operator notation B:

first equation : wt = (yt^(-0.227) - 1)/(-0.227)

second equation : (1 - B)(1 - B^12)wt = et, where et is a white noise series.

  1. For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

## [1] 1
## [1] 1
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(diff(BoxCox(retail.ts, BoxCox.lambda(retail.ts)), lag = 12))
## KPSS Level = 0.013817, Truncation lag parameter = 5, p-value = 0.1
  1. Use R to simulate and plot some data from simple ARIMA models.

  1. Consider the number of women murdered each year (per 100,000 standard population) in the United States. (Data set wmurders).

## [1] 2

## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(wmurders, differences = 2)
## KPSS Level = 0.045793, Truncation lag parameter = 3, p-value = 0.1

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,2)
## Q* = 11.764, df = 8, p-value = 0.1621
## 
## Model df: 2.   Total lags used: 10
## Time Series:
## Start = 2005 
## End = 2007 
## Frequency = 1 
## [1] 2.480525 2.374890 2.269256
## Series: wmurders 
## ARIMA(0,2,2) 
## 
## Coefficients:
##           ma1     ma2
##       -1.0181  0.1470
## s.e.   0.1220  0.1156
## 
## sigma^2 estimated as 0.04702:  log likelihood=6.03
## AIC=-6.06   AICc=-5.57   BIC=-0.15
## [1] 2.480523 2.374887 2.269252

##                      ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.0113461 0.2088162 0.1525773 -0.2403396 4.331729 0.9382785
##                     ACF1
## Training set -0.05094066
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
##                    ACF1
## Training set 0.02176343
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01336585 0.2016929 0.1531053 -0.3332051 4.387024 0.9415259
##                     ACF1
## Training set -0.03193856

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,2)
## Q* = 11.764, df = 8, p-value = 0.1621
## 
## Model df: 2.   Total lags used: 10

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,3)
## Q* = 10.706, df = 7, p-value = 0.152
## 
## Model df: 3.   Total lags used: 10
  1. Consider the total international visitors to Australia (in millions) for the period 1980-2015. (Data set austa.)

## Series: austa 
## ARIMA(0,1,1) with drift 
## 
## Coefficients:
##          ma1   drift
##       0.3006  0.1735
## s.e.  0.1647  0.0390
## 
## sigma^2 estimated as 0.03376:  log likelihood=10.62
## AIC=-15.24   AICc=-14.46   BIC=-10.57

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1) with drift
## Q* = 2.297, df = 5, p-value = 0.8067
## 
## Model df: 2.   Total lags used: 7

## Time Series:
## Start = 2016 
## End = 2025 
## Frequency = 1 
##      fc_austa_arima.0.1.1$upper.80% fc_austa_arima.0.1.1$upper.95%
## 2016                      0.1138117                     0.09101057
## 2017                      0.2039418                     0.22885273
## 2018                      0.2527212                     0.30345426
## 2019                      0.2886706                     0.35843413
## 2020                      0.3180942                     0.40343373
## 2021                      0.3434784                     0.44225553
## 2022                      0.3660754                     0.47681466
## 2023                      0.3866116                     0.50822207
## 2024                      0.4055495                     0.53718500
## 2025                      0.4232034                     0.56418442
## Time Series:
## Start = 2016 
## End = 2025 
## Frequency = 1 
##      fc_austa_arima.0.1.0$lower.80% fc_austa_arima.0.1.0$lower.95%
## 2016                   -0.199956384                    -0.22275751
## 2017                   -0.109826244                    -0.08491535
## 2018                   -0.061046922                    -0.01031382
## 2019                   -0.025097519                     0.04466605
## 2020                    0.004326137                     0.08966565
## 2021                    0.029710347                     0.12848745
## 2022                    0.052307350                     0.16304658
## 2023                    0.072843552                     0.19445399
## 2024                    0.091781392                     0.22341692
## 2025                    0.109435361                     0.25041634

  1. For the usgdp series:

## Series: usgdp 
## ARIMA(2,1,0) with drift 
## Box Cox transformation: lambda= 0.366352 
## 
## Coefficients:
##          ar1     ar2   drift
##       0.2795  0.1208  0.1829
## s.e.  0.0647  0.0648  0.0202
## 
## sigma^2 estimated as 0.03518:  log likelihood=61.56
## AIC=-115.11   AICc=-114.94   BIC=-101.26
## [1] 1

## Series: usgdp 
## ARIMA(1,1,0) 
## Box Cox transformation: lambda= 0.366352 
## 
## Coefficients:
##          ar1
##       0.6326
## s.e.  0.0504
## 
## sigma^2 estimated as 0.04384:  log likelihood=34.39
## AIC=-64.78   AICc=-64.73   BIC=-57.85

## Series: usgdp 
## ARIMA(1,1,0) with drift 
## Box Cox transformation: lambda= 0.366352 
## 
## Coefficients:
##          ar1   drift
##       0.3180  0.1831
## s.e.  0.0619  0.0179
## 
## sigma^2 estimated as 0.03555:  log likelihood=59.83
## AIC=-113.66   AICc=-113.56   BIC=-103.27

##                    ME    RMSE      MAE         MPE      MAPE      MASE
## Training set 1.195275 39.2224 29.29521 -0.01363259 0.6863491 0.1655687
##                     ACF1
## Training set -0.03824844
##                    ME     RMSE      MAE       MPE      MAPE     MASE       ACF1
## Training set 15.45449 45.49569 35.08393 0.3101283 0.7815664 0.198285 -0.3381619
##                    ME     RMSE     MAE         MPE      MAPE      MASE
## Training set 1.315796 39.90012 29.5802 -0.01678591 0.6834509 0.1671794
##                     ACF1
## Training set -0.08544569

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,1,0) with drift
## Q* = 6.5772, df = 5, p-value = 0.254
## 
## Model df: 3.   Total lags used: 8

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,0) with drift
## Q* = 10.274, df = 6, p-value = 0.1136
## 
## Model df: 2.   Total lags used: 8

  1. Consider austourists, the quarterly number of international tourists to Australia for the period 1999-2010. (Data set austourists.)

## Series: austourists 
## ARIMA(1,0,0)(1,1,0)[4] with drift 
## 
## Coefficients:
##          ar1     sar1   drift
##       0.4705  -0.5305  0.5489
## s.e.  0.1154   0.1122  0.0864
## 
## sigma^2 estimated as 5.15:  log likelihood=-142.48
## AIC=292.97   AICc=293.65   BIC=301.6

##                      ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 0.02200144 2.149384 1.620917 -0.7072593 4.388288 0.5378929
##                     ACF1
## Training set -0.06393238
##                      ME     RMSE      MAE        MPE    MAPE      MASE
## Training set 0.07547467 2.288429 1.661709 -0.3718168 4.49153 0.5514295
##                     ACF1
## Training set -0.03831574

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,0)(1,1,0)[4] with drift
## Q* = 4.0937, df = 5, p-value = 0.536
## 
## Model df: 3.   Total lags used: 8
## Series: austourists 
## ARIMA(1,0,0)(1,1,0)[4] with drift 
## 
## Coefficients:
##          ar1     sar1   drift
##       0.4705  -0.5305  0.5489
## s.e.  0.1154   0.1122  0.0864
## 
## sigma^2 estimated as 5.15:  log likelihood=-142.48
## AIC=292.97   AICc=293.65   BIC=301.6
  1. Consider the total net generation of electricity (in billion kilowatt hours) by the U.S. electric industry (monthly for the period January 1973 - June 2013). (Data set usmelec.) In general there are two peaks per year: in mid-summer and mid-winter.

## [1] 1
## [1] 1

## [1] -5081.506
## [1] -5080.441
## Series: usmelec 
## ARIMA(0,1,2)(0,1,1)[12] 
## Box Cox transformation: lambda= -0.5738331 
## 
## Coefficients:
##           ma1      ma2     sma1
##       -0.4317  -0.2552  -0.8536
## s.e.   0.0439   0.0440   0.0261
## 
## sigma^2 estimated as 1.284e-06:  log likelihood=2544.75
## AIC=-5081.51   AICc=-5081.42   BIC=-5064.87

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,2)(0,1,1)[12]
## Q* = 32.525, df = 21, p-value = 0.05176
## 
## Model df: 3.   Total lags used: 24
## Series: usmelec 
## ARIMA(1,1,3)(2,1,1)[12] 
## Box Cox transformation: lambda= -0.5738331 
## 
## Coefficients:
##          ar1      ma1      ma2     ma3    sar1     sar2     sma1
##       0.3952  -0.8194  -0.0468  0.0414  0.0403  -0.0934  -0.8462
## s.e.  0.3717   0.3734   0.1707  0.1079  0.0560   0.0533   0.0343
## 
## sigma^2 estimated as 1.278e-06:  log likelihood=2547.75
## AIC=-5079.5   AICc=-5079.19   BIC=-5046.23

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,3)(2,1,1)[12]
## Q* = 25.995, df = 17, p-value = 0.07454
## 
## Model df: 7.   Total lags used: 24
##        MSN YYYYMM    Value Column_Order
## 25 ELEGPUS 197301 0.159913            1
## 26 ELEGPUS 197302 0.143257            1
## 27 ELEGPUS 197303 0.147847            1
## 28 ELEGPUS 197304 0.139292            1
## 29 ELEGPUS 197305 0.147088            1
## 30 ELEGPUS 197306 0.160945            1
##                                          Description                  Unit Year
## 25 Electricity Net Generation, Electric Power Sector Billion Kilowatthours 1973
## 26 Electricity Net Generation, Electric Power Sector Billion Kilowatthours 1973
## 27 Electricity Net Generation, Electric Power Sector Billion Kilowatthours 1973
## 28 Electricity Net Generation, Electric Power Sector Billion Kilowatthours 1973
## 29 Electricity Net Generation, Electric Power Sector Billion Kilowatthours 1973
## 30 Electricity Net Generation, Electric Power Sector Billion Kilowatthours 1973
##    Month
## 25     1
## 26     2
## 27     3
## 28     4
## 29     5
## 30     6
##          MSN YYYYMM    Value Column_Order                Description
## 7012 ELTCPUS 201910 0.314442           11 Electricity End Use, Total
## 7013 ELTCPUS 201911 0.293909           11 Electricity End Use, Total
## 7014 ELTCPUS 201912 0.318200           11 Electricity End Use, Total
## 7016 ELTCPUS 202001 0.322578           11 Electricity End Use, Total
## 7017 ELTCPUS 202002 0.302213           11 Electricity End Use, Total
## 7018 ELTCPUS 202003 0.297063           11 Electricity End Use, Total
##                       Unit Year Month
## 7012 Billion Kilowatthours 2019    10
## 7013 Billion Kilowatthours 2019    11
## 7014 Billion Kilowatthours 2019    12
## 7016 Billion Kilowatthours 2020     1
## 7017 Billion Kilowatthours 2020     2
## 7018 Billion Kilowatthours 2020     3
##           Apr      May      Jun      Jul      Aug      Sep
## 2492 0.314442 0.293909 0.318200 0.322578 0.302213 0.297063
##                        ME       RMSE        MAE           MPE         MAPE
## Training set   -0.4393834   7.255962   5.331941 -1.768805e-01 2.012093e+00
## Test set     -341.2426831 342.841381 341.242683 -1.047474e+05 1.047474e+05
##                    MASE        ACF1 Theil's U
## Training set  0.5921258 -0.02547411        NA
## Test set     37.8958842  0.52432326  11272.24

  1. For the mcopper data:

## Series: mcopper 
## ARIMA(0,1,1) 
## Box Cox transformation: lambda= 0.1919047 
## 
## Coefficients:
##          ma1
##       0.3720
## s.e.  0.0388
## 
## sigma^2 estimated as 0.04997:  log likelihood=45.05
## AIC=-86.1   AICc=-86.08   BIC=-77.43
## [1] 1
## [1] 0

## Series: mcopper 
## ARIMA(1,1,0) 
## Box Cox transformation: lambda= 0.1919047 
## 
## Coefficients:
##          ar1
##       0.3231
## s.e.  0.0399
## 
## sigma^2 estimated as 0.05091:  log likelihood=39.83
## AIC=-75.66   AICc=-75.64   BIC=-66.99
## Series: mcopper 
## ARIMA(5,1,0) 
## Box Cox transformation: lambda= 0.1919047 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4     ar5
##       0.3706  -0.1475  0.0609  -0.0038  0.0134
## s.e.  0.0421   0.0450  0.0454   0.0450  0.0422
## 
## sigma^2 estimated as 0.05028:  log likelihood=45.32
## AIC=-78.63   AICc=-78.48   BIC=-52.63
## Series: mcopper 
## ARIMA(0,1,1) 
## Box Cox transformation: lambda= 0.1919047 
## 
## Coefficients:
##          ma1
##       0.3720
## s.e.  0.0388
## 
## sigma^2 estimated as 0.04997:  log likelihood=45.05
## AIC=-86.1   AICc=-86.08   BIC=-77.43

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)
## Q* = 22.913, df = 23, p-value = 0.4659
## 
## Model df: 1.   Total lags used: 24

  1. Choose one of the following seasonal time series: hsales, auscafe, qauselec, qcement, qgas.

## [1] 1
## [1] 1
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(qauselec, lag = 4)
## KPSS Level = 0.39382, Truncation lag parameter = 4, p-value = 0.07982

## Series: qauselec 
## ARIMA(0,0,1)(0,1,1)[4] 
## Box Cox transformation: lambda= 0.5195274 
## 
## Coefficients:
##          ma1    sma1
##       0.6495  0.2738
## s.e.  0.0746  0.0642
## 
## sigma^2 estimated as 0.03639:  log likelihood=51.5
## AIC=-97.01   AICc=-96.89   BIC=-86.91
## Series: qauselec 
## ARIMA(0,1,1)(0,1,1)[4] 
## Box Cox transformation: lambda= 0.5195274 
## 
## Coefficients:
##           ma1     sma1
##       -0.4976  -0.6910
## s.e.   0.0772   0.0418
## 
## sigma^2 estimated as 0.01435:  log likelihood=149.29
## AIC=-292.59   AICc=-292.47   BIC=-282.5
## Series: qauselec 
## ARIMA(0,1,1)(0,1,2)[4] 
## Box Cox transformation: lambda= 0.5195274 
## 
## Coefficients:
##           ma1     sma1    sma2
##       -0.5037  -0.7993  0.1249
## s.e.   0.0710   0.0870  0.0864
## 
## sigma^2 estimated as 0.01425:  log likelihood=150.36
## AIC=-292.73   AICc=-292.53   BIC=-279.28
## Series: qauselec 
## ARIMA(1,1,1)(1,1,2)[4] 
## Box Cox transformation: lambda= 0.5195274 
## 
## Coefficients:
##          ar1      ma1    sar1     sma1    sma2
##       0.2523  -0.6905  0.8878  -1.6954  0.7641
## s.e.  0.1562   0.1279  0.0864   0.0983  0.0772
## 
## sigma^2 estimated as 0.01345:  log likelihood=156.42
## AIC=-300.84   AICc=-300.43   BIC=-280.67

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,1)(1,1,2)[4]
## Q* = 10.605, df = 3, p-value = 0.01407
## 
## Model df: 5.   Total lags used: 8

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,1)(0,1,1)[4]
## Q* = 84.595, df = 6, p-value = 4.441e-16
## 
## Model df: 2.   Total lags used: 8

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)(0,1,1)[4]
## Q* = 15.102, df = 6, p-value = 0.01948
## 
## Model df: 2.   Total lags used: 8

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)(0,1,2)[4]
## Q* = 13.684, df = 5, p-value = 0.01775
## 
## Model df: 3.   Total lags used: 8

  1. For the same time series you used in the previous exercise, try using a non-seasonal model applied to the seasonally adjusted data obtained from STL. The stlf() function will make the calculations easy (with method=“arima”). Compare the forecasts with those obtained in the previous exercise. Which do you think is the best approach?

  1. For your retail time series (Exercise 5 above):

##                       ME     RMSE       MAE        MPE      MAPE      MASE
## Training set  0.03930634 13.76615  9.062442 -0.3131317  3.915038 0.4786227
## Test set     65.34294522 73.07251 65.822365 12.3651196 12.488809 3.4763343
##                      ACF1 Theil's U
## Training set -0.001873967        NA
## Test set      0.688276017  1.375289
##                      ME     RMSE       MAE        MPE     MAPE      MASE
## Training set -0.2307731 13.98123  8.610772 -0.1580526 3.659636 0.4547683
## Test set     51.0151385 57.78753 52.147109  9.6439187 9.935091 2.7540910
##                     ACF1 Theil's U
## Training set -0.02406204        NA
## Test set      0.69112460  1.097206
##                     ME      RMSE       MAE       MPE      MAPE     MASE
## Training set  11.39295  25.68062  18.93442  5.389008  8.518759 1.000000
## Test set     102.20588 109.36633 102.20588 19.718171 19.718171 5.397889
##                   ACF1 Theil's U
## Training set 0.7665687        NA
## Test set     0.7553498    2.0616
## [1] 1778.120 1719.790 1697.268
## Time Series:
## Start = 1940 
## End = 1942 
## Frequency = 1 
## [1] 1777.996 1718.869 1695.985
##      ar1      ar1      ar1 
## 1777.996 1718.869 1695.985
## [1] 525.8100 513.8023 499.6705
## Time Series:
## Start = 1969 
## End = 1971 
## Frequency = 1 
## [1] 526.2057 514.0658 500.0111
## [1] 526.2057 514.0658 500.0111