\[Chapter 7\]

1. Consider the pigs series — the number of pigs slaughtered in Victoria each month. a. Use the ses() function in R to find the optimal values of α and ℓ0, and generate forecasts for the next four months. Optimal alpha is 0.2971 and optimal ℓ0 is 77260.06

#Optimization of alpha
fc<-ses(pigs, h=4)
summary(fc)
## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665 
## 
## Error measures:
##                    ME    RMSE      MAE       MPE     MAPE      MASE       ACF1
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249 0.01282239
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Sep 1995       98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995       98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995       98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995       98816.41 83958.37 113674.4 76092.99 121539.8

b. Compute a 95% prediction interval for the first forecast using ^y±1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R. The intervals are very similar, though not identical: sd() of the residuals differs slightly from the model's σ estimate, so the hand-computed interval is a touch narrower

s<-sd(fc$residuals)
x<-fc$mean[1]

fc.sd.high<-x+1.96*s
fc.sd.low<-x-1.96*s

fc.sd.high
## [1] 118952.8
fc.sd.low
## [1] 78679.97
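
For comparison, R's own 95% bounds can be pulled straight from the forecast object:

fc$upper[1, "95%"]
## [1] 119020.8
fc$lower[1, "95%"]
## [1] 78611.97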

2. Write your own function to implement simple exponential smoothing. The function should take arguments y (the time series), alpha (the smoothing parameter α) and level (the initial level ℓ0). It should return the forecast of the next observation in the series. Does it give the same forecast as ses()? Yes; the two forecasts differ by only 0.04

#ses function
ses.fctn<-function(y, alpha, l0){
  #initialize yhat
  yhat<-l0
  #loop through observations
  for(z in 1:length(y)){
    yhat <- alpha*y[z] + (1 - alpha)*yhat 
  }
  return(yhat)
}

ses.fctn(pigs, 0.2971, 77260.0561)
## [1] 98816.45

3. Modify your function from the previous exercise to return the sum of squared errors rather than the forecast of the next observation. Then use the optim() function to find the optimal values of α and ℓ0. Do you get the same values as the ses() function? The values are very similar

#ses function
ses.error<-function(y, pars = c(alpha, l0)){
  error<-0
  sse<-0
  alpha<-pars[1]
  #initialize yhat
  l0 <- pars[2]
  yhat <- l0
  #loop through observations
  for(z in 1:length(y)){
    error <- y[z] - yhat
    sse <- sse + error^2
    
    yhat <- alpha*y[z] + (1 - alpha)*yhat 
  }
  return(sse)
}

optim.var<-optim(par = c(0.5, pigs[1]), y=pigs, fn=ses.error)
optim.var$par[1]
## [1] 0.2990081
optim.var$par[2]
## [1] 76379.27

4. Combine your previous two functions to produce a function which both finds the optimal values of α and ℓ0, and produces a forecast of the next observation in the series.

#combine functions
optim.ses<-function(y){
  optim.var<-optim(par = c(0.5, y[1]), y=y, fn=ses.error)
  alpha<-optim.var$par[1]
  l0<-optim.var$par[2]
  pred<-ses.fctn(y, alpha, l0)
  return(pred)
}

optim.ses(pigs)
## [1] 98814.55

5. Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books. a. Plot the series and discuss the main features of the data. Both series have an upward trend, though there is a lot of variability in the data

#plot
autoplot(books)

b. Use the ses() function to forecast each series, and plot the forecasts.

#SES forecasts for the next four days
fc.pb<-ses(books[,1], h=4)
fc.hc<-ses(books[,2], h=4)
fc.pb
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       207.1097 162.4882 251.7311 138.8670 275.3523
## 32       207.1097 161.8589 252.3604 137.9046 276.3147
## 33       207.1097 161.2382 252.9811 136.9554 277.2639
## 34       207.1097 160.6259 253.5935 136.0188 278.2005
autoplot(fc.pb)+
  xlab("Day") + ylab("Paperback") +
  ggtitle("Book Sales")

autoplot(fc.hc)+
  xlab("Day") + ylab("Hardcover") +
  ggtitle("Book Sales")

c. Compute the RMSE values for the training data in each case.

sqrt(mean(fc.pb$residuals^2))
## [1] 33.63769
sqrt(mean(fc.hc$residuals^2))
## [1] 31.93101
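
The same training RMSEs are also reported by accuracy(), which is the more common route:

#training-set accuracy measures; RMSE column
accuracy(fc.pb)[, "RMSE"]
accuracy(fc.hc)[, "RMSE"]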

6. We will continue with the daily sales of paperback and hardcover books in data set books. a. Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

#Holt forecasts
fc.pb.holt<-holt(books[,1], h=4)
fc.hc.holt<-holt(books[,2], h=4)

autoplot(fc.pb.holt)+
  xlab("Day") + ylab("Paperback") +
  ggtitle("Book Sales")

autoplot(fc.hc.holt)+
  xlab("Day") + ylab("Hardcover") +
  ggtitle("Book Sales")

b. Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets. While SES and Holt produce reasonably similar RMSE measures, Holt outperforms SES for both paperback and hardcover, even though Holt’s method uses one more parameter.

sqrt(mean(fc.pb.holt$residuals^2))
## [1] 31.13692
sqrt(mean(fc.hc.holt$residuals^2))
## [1] 27.19358

c. Compare the forecasts for the two series using both methods. Which do you think is best? I think Holt’s is better. Not only is the RMSE lower, but it captures the upward trend in the data, unlike SES, which is too naive in that its point forecasts ignore the trend.

d. Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt. Holt has narrower prediction intervals than SES

#get rmse
ses.pb.rmse<-sqrt(mean(fc.pb$residuals^2))
ses.hc.rmse<-sqrt(mean(fc.hc$residuals^2))
holt.pb.rmse<-sqrt(mean(fc.pb.holt$residuals^2))
holt.hc.rmse<-sqrt(mean(fc.hc.holt$residuals^2))

#ses pb
ses.pb.rmse.low<-fc.pb$mean[1]-1.96*ses.pb.rmse
ses.pb.rmse.high<-fc.pb$mean[1]+1.96*ses.pb.rmse

#ses hc
ses.hc.rmse.low<-fc.hc$mean[1]-1.96*ses.hc.rmse
ses.hc.rmse.high<-fc.hc$mean[1]+1.96*ses.hc.rmse

#holt pb
holt.pb.rmse.low<-fc.pb.holt$mean[1]-1.96*holt.pb.rmse
holt.pb.rmse.high<-fc.pb.holt$mean[1]+1.96*holt.pb.rmse

#holt hc
holt.hc.rmse.low<-fc.hc.holt$mean[1]-1.96*holt.hc.rmse
holt.hc.rmse.high<-fc.hc.holt$mean[1]+1.96*holt.hc.rmse


x<-c(ses.pb.rmse.low, holt.pb.rmse.low, ses.hc.rmse.low, holt.hc.rmse.low, ses.pb.rmse.high, holt.pb.rmse.high, ses.hc.rmse.high, holt.hc.rmse.high)

dim(x) <- c(4,2)
colnames(x)<-c("95% Low", "95% High")
rownames(x)<-c("SES PB", "Holt PB", "SES HC", "Holt HC")
knitr::kable(x, "pipe")
|        |  95% Low| 95% High|
|:-------|--------:|--------:|
|SES PB  | 141.1798| 273.0395|
|Holt PB | 148.4384| 270.4951|
|SES HC  | 176.9753| 302.1449|
|Holt HC | 196.8745| 303.4733|

7. For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]

Which model gives the best RMSE?

In this case, I fit a simple Holt model, a damped model, and a Holt model on Box-Cox transformed data. The simple forecast goes negative by 2020 and keeps falling, which is not a reasonable estimate for a price. The damped forecast is essentially a flat line at the last observation, which completely ignores the downward trend in prices. In short, the simple Holt forecast extrapolates the linear trend indefinitely, while the damped version flattens it out. The Box-Cox transformation does a little of both: it predicts a continuing downward trend with a decreasing slope, so the price approaches zero but, at least by the end of the forecast, doesn’t get there. This seems like the most reasonable forecast. Note that the Box-Cox model’s RMSE below is computed on the transformed scale, so its much smaller value is not directly comparable to the other two.

#plot eggs
autoplot(eggs)

#simple forecast
fc.holt.1<-holt(eggs, h=100)
fc.holt.1.rmse<-sqrt(mean(fc.holt.1$residuals^2))

#damped
fc.holt.2<-holt(eggs, damped=TRUE, h=100)
fc.holt.2.rmse<-sqrt(mean(fc.holt.2$residuals^2))

#box cox transformation
fc.holt.3<-holt(eggs, lambda=BoxCox.lambda(eggs), h=100)
fc.holt.3.rmse<-sqrt(mean(fc.holt.3$residuals^2))

autoplot(eggs) +
  autolayer(fc.holt.1, series="Simple", PI=FALSE) +
  autolayer(fc.holt.2, series="Damped", PI=FALSE) +
  autolayer(fc.holt.3, series="Box-Cox", PI=FALSE) +
  ggtitle("Forecasts For Eggs") + xlab("Year") +
  ylab("Price Of Dozens Of Eggs") +
  guides(colour=guide_legend(title="Forecast"))

x<-c(fc.holt.1.rmse, fc.holt.2.rmse, fc.holt.3.rmse)

dim(x) <- c(1,3)
colnames(x)<-c("Simple", "Damped", "Box-Cox")
rownames(x)<-c("RMSE")
knitr::kable(x, "pipe")
|     |   Simple|   Damped|  Box-Cox|
|:----|--------:|--------:|--------:|
|RMSE | 26.58219| 26.54019| 1.032217|
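
One caveat on that table: fc.holt.3$residuals are on the Box-Cox scale, so its RMSE is not directly comparable to the other two. A sketch of a scale-consistent check, assuming accuracy() reports errors on the original scale (it back-transforms the fitted values):

#training RMSE on the original scale for all three models
accuracy(fc.holt.1)[, "RMSE"]
accuracy(fc.holt.2)[, "RMSE"]
accuracy(fc.holt.3)[, "RMSE"]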

8. Recall your retail time series data (from Exercise 3 in Section 2.10). a. Why is multiplicative seasonality necessary for this series? Multiplicative seasonality is necessary because the seasonal variation changes in proportion to the level of the series

library(seasonal)
## Warning: package 'seasonal' was built under R version 4.0.3
ret.data <- readxl::read_excel("D:/predictive analytics/assignments/retail.xlsx", skip = 1)
ret <- ts(ret.data[, "A3349337W"], frequency = 12, start = c(1982, 4))

autoplot(ret)

ret %>% seas(x11="") -> fit
autoplot(fit) +
  ggtitle("X11 decomposition of retail")

b. Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

hw.fit <- hw(ret,seasonal="multiplicative", h=36)
hw.fit.damped <- hw(ret,damped=TRUE, seasonal="multiplicative", h=36)

autoplot(ret) +
  autolayer(hw.fit, series="HW multiplicative", PI=FALSE) +
  autolayer(hw.fit.damped, series="HW multiplicative damped", PI=FALSE) +
  xlab("Year") +
  ylab("Retail Sales")+
  xlim(c(2010, 2017))
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

c. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer? I would prefer the model without damping: it has a smaller RMSE, and retail sales tend to keep growing with population and other demographics, which is fairly monotonic

#one-step cross-validated errors
hw.cv.e<- tsCV(ret, hw, h=1, seasonal = "multiplicative")
hw.cv.e.damped<-tsCV(ret, hw, h=1, seasonal = "multiplicative", damped=TRUE)

sqrt(mean(hw.cv.e^2, na.rm = TRUE))
## [1] 15.9046
sqrt(mean(hw.cv.e.damped^2, na.rm = TRUE))
## [1] 15.98782

d. Check that the residuals from the best method look like white noise. They are close enough to white noise for me to accept the residuals here

checkresiduals(hw.fit)

e. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7? The seasonal naïve method beats Holt-Winters here, damped or not. This makes sense, as retail sales flattened out after 2011. Longer term, however, retail sales should return to growth as population and inflation increase spending, so the damped and naïve approaches would then do a worse job than Holt-Winters

ret.train<-window(ret, end=c(2010,12))
ret.test<-window(ret, start=c(2011, 1))

autoplot(ret) +
  autolayer(ret.train, series="Training") +
  autolayer(ret.test, series="Test")

naive.fit<-snaive(ret.train, h=36)
hw.fit.train <- hw(ret.train, seasonal="multiplicative", h=36)
hw.fit.damped.train <- hw(ret.train, seasonal="multiplicative", h=36, damped=TRUE)

accuracy(naive.fit, ret.test)
##                     ME     RMSE      MAE      MPE      MAPE      MASE      ACF1
## Training set  9.460661 26.30758 21.23363 4.655690 12.762886 1.0000000 0.8070166
## Test set     14.094444 20.91121 17.30556 3.802915  4.911945 0.8150068 0.5265917
##              Theil's U
## Training set        NA
## Test set     0.6200865
accuracy(hw.fit.train, ret.test)
##                      ME     RMSE       MAE        MPE     MAPE      MASE
## Training set -0.8547669 13.33918  9.850521 -0.6888481 6.402940 0.4639112
## Test set     -8.8886652 22.87216 17.728638 -3.0496673 5.336166 0.8349319
##                    ACF1 Theil's U
## Training set 0.07325471        NA
## Test set     0.80376528 0.7365416
accuracy(hw.fit.damped.train, ret.test)
##                      ME     RMSE       MAE        MPE     MAPE      MASE
## Training set  0.1340865 13.14634  9.755892 0.05575759 6.322517 0.4594546
## Test set     15.1173579 23.10869 18.810903 3.99168849 5.241861 0.8859013
##                    ACF1 Theil's U
## Training set 0.09808033        NA
## Test set     0.63781919 0.6747655
autoplot(ret) +
  autolayer(naive.fit, series="Naive", PI=FALSE) +
  autolayer(hw.fit.train, series="Holt-Winter", PI=FALSE)+
  xlim(c(2008, 2014))
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

9. For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set? The STL decomposition with a Box-Cox transformation performed worse on the test set

stl.fit.train<-ret.train%>%stlm(s.window=13, robust=TRUE, method="ets", lambda=BoxCox.lambda(ret.train))%>%forecast(h=36, lambda=BoxCox.lambda(ret.train))
## Warning in InvBoxCox(fcast$mean, lambda, biasadj, fcast): biasadj information
## not found, defaulting to FALSE.
accuracy(naive.fit, ret.test)
##                     ME     RMSE      MAE      MPE      MAPE      MASE      ACF1
## Training set  9.460661 26.30758 21.23363 4.655690 12.762886 1.0000000 0.8070166
## Test set     14.094444 20.91121 17.30556 3.802915  4.911945 0.8150068 0.5265917
##              Theil's U
## Training set        NA
## Test set     0.6200865
accuracy(hw.fit.train, ret.test)
##                      ME     RMSE       MAE        MPE     MAPE      MASE
## Training set -0.8547669 13.33918  9.850521 -0.6888481 6.402940 0.4639112
## Test set     -8.8886652 22.87216 17.728638 -3.0496673 5.336166 0.8349319
##                    ACF1 Theil's U
## Training set 0.07325471        NA
## Test set     0.80376528 0.7365416
accuracy(hw.fit.damped.train, ret.test)
##                      ME     RMSE       MAE        MPE     MAPE      MASE
## Training set  0.1340865 13.14634  9.755892 0.05575759 6.322517 0.4594546
## Test set     15.1173579 23.10869 18.810903 3.99168849 5.241861 0.8859013
##                    ACF1 Theil's U
## Training set 0.09808033        NA
## Test set     0.63781919 0.6747655
accuracy(stl.fit.train, ret.test)
##                     ME     RMSE       MAE       MPE     MAPE      MASE
## Training set  1.160974 12.03861  8.398332 0.8282614 5.354964 0.3955203
## Test set     17.744477 27.87913 21.913666 4.6529147 6.087432 1.0320262
##                    ACF1 Theil's U
## Training set 0.01707693        NA
## Test set     0.69510092 0.8203021
autoplot(ret) +
  autolayer(naive.fit, series="Naive", PI=FALSE) +
  autolayer(hw.fit.train, series="Holt-Winter", PI=FALSE)+
  autolayer(stl.fit.train, series="STL", PI=FALSE)+
  xlim(c(2008, 2014))
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

10. For this exercise use data set ukcars, the quarterly UK passenger vehicle production data from 1977Q1–2005Q1 a. Plot the data and describe the main features of the series. b. Decompose the series using STL and obtain the seasonally adjusted data. The data generally trend upward, apart from a dip near the beginning. The decomposition shows strong seasonality of roughly constant size, so an additive model is appropriate here

#autoplot
autoplot(ukcars)

#decomp
ukcars %>%
  stl(t.window=13, s.window="periodic", robust=TRUE) %>%
  autoplot()

c. Forecast the next two years of the series using an additive damped trend method applied to the seasonally adjusted data. (This can be done in one step using stlf() with arguments etsmodel="AAN", damped=TRUE.)

fcast.stlf <- stlf(ukcars, etsmodel="AAN", damped=TRUE, h=8)
autoplot(fcast.stlf, series="Forecast") + ylab("UK Cars") +
  ggtitle("Random Walk forecasts STLF")

d. Forecast the next two years of the series using Holt’s linear method applied to the seasonally adjusted data (as before but with damped=FALSE).

fcast.stlf.linear <- stlf(ukcars, etsmodel="AAN", damped=FALSE, h=8)
autoplot(fcast.stlf.linear, series="Forecast") + ylab("UK Cars") +
  ggtitle("Random Walk forecasts STLF")

e. Now use ets() to choose a seasonal model for the data.

ets.forecast<-ets(ukcars)
ets.forecast %>% forecast(h=8) %>%
  autoplot() +
  ylab("ukcars")

f. Compare the RMSE of the ETS model with the RMSE of the models you obtained using STL decompositions. Which gives the better in-sample fits? The STL models give better in-sample fits (lower training RMSE)

accuracy(fcast.stlf)
##                    ME     RMSE      MAE        MPE     MAPE     MASE       ACF1
## Training set 1.551267 23.32113 18.48987 0.04121971 6.042764 0.602576 0.02262668
accuracy(fcast.stlf.linear)
##                      ME   RMSE     MAE        MPE    MAPE      MASE       ACF1
## Training set -0.3412727 23.295 18.1605 -0.5970778 5.98018 0.5918418 0.02103582
accuracy(ets.forecast)
##                    ME     RMSE      MAE        MPE     MAPE      MASE
## Training set 1.313884 25.23244 20.17907 -0.1570979 6.629003 0.6576259
##                    ACF1
## Training set 0.02573334

g. Compare the forecasts from the three approaches. Which seems most reasonable? They are very similar, so any one of them would do, though I think the non-damped STLF is best

autoplot(ukcars) +
  autolayer(fcast.stlf, series="Damped STLF", PI=FALSE) +
  autolayer(fcast.stlf.linear, series="Not Damped STLF", PI=FALSE) +
  autolayer(forecast(ets.forecast, h=8), series="ETS", PI=FALSE) +
  xlab("Year") +
  ylab("UK cars")+
  xlim(c(2000, 2008))
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

h. Check the residuals of your preferred model. The Ljung-Box test below rejects white noise (p ≈ 0.0002), so some autocorrelation remains in the residuals and the model could likely be improved

checkresiduals(fcast.stlf.linear)
## Warning in checkresiduals(fcast.stlf.linear): The fitted degrees of freedom is
## based on the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ETS(A,A,N)
## Q* = 22.061, df = 4, p-value = 0.0001949
## 
## Model df: 4.   Total lags used: 8

11. For this exercise use data set visitors, the monthly Australian short-term overseas visitors data, May 1985–April 2005. a. Make a time plot of your data and describe the main features of the series.

#autoplot
autoplot(visitors)

#decomp
visitors %>%
  stl(t.window=13, s.window="periodic", robust=TRUE) %>%
  autoplot()

b. Split your data into a training set and a test set comprising the last two years of available data. Forecast the test set using Holt-Winters’ multiplicative method.

#train and test set
visitors.train<-window(visitors, end=c(2003,12))
visitors.test<-window(visitors, start=c(2004, 1))

visitors.hw.fit <- hw(visitors.train,seasonal="multiplicative", h=24)

autoplot(visitors)+
  autolayer(visitors.hw.fit, PI=FALSE) +
  xlab("Year") +
  ylab("Visitors")+
  xlim(c(1990, 2006))
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

c. Why is multiplicative seasonality necessary here? Because the seasonal variation increases with the level of the series over time
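
A rough numerical check of this, using only base R's aggregate(): if the seasonality is multiplicative, the within-year spread should grow with the yearly level.

#yearly level vs yearly spread of the training series
yr.mean <- aggregate(visitors.train, FUN = mean)
yr.sd   <- aggregate(visitors.train, FUN = sd)
cor(yr.mean, yr.sd)  #strongly positive => seasonal swings scale with the level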

d. Forecast the two-year test set using each of the following methods: i. an ETS model

ets.forecast.visitors <- forecast(ets(visitors.train), h=24)

ii. an additive ETS model applied to a Box-Cox transformed series;

ets.forecast.visitors.box <- forecast(ets(visitors.train, additive.only=TRUE, lambda = BoxCox.lambda(visitors.train)), h=24)

iii. a seasonal naïve method;

snaive.forecast.visitors <- snaive(visitors.train, h=24)

iv. an STL decomposition applied to the Box-Cox transformed data followed by an ETS model applied to the seasonally adjusted (transformed) data.

stlf.forecast.visitors <- stlf(visitors.train, t.window = 13, s.window = "periodic", h = 24, robust = TRUE, method = "ets", lambda = BoxCox.lambda(visitors.train))

Charts

#ETS model
autoplot(ets.forecast.visitors)+
  autolayer(visitors.test)+
  xlim(c(2000, 2005))
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Warning: Removed 176 row(s) containing missing values (geom_path).
## Warning: Removed 3 row(s) containing missing values (geom_path).

#ETS box model
autoplot(ets.forecast.visitors.box)+
  autolayer(visitors.test)+
  xlim(c(2000, 2005))
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Warning: Removed 176 row(s) containing missing values (geom_path).

## Warning: Removed 3 row(s) containing missing values (geom_path).

#snaive model
autoplot(snaive.forecast.visitors)+
  autolayer(visitors.test)+
  xlim(c(2000, 2005))
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Warning: Removed 176 row(s) containing missing values (geom_path).

## Warning: Removed 3 row(s) containing missing values (geom_path).

#stlf model
autoplot(stlf.forecast.visitors)+
  autolayer(visitors.test)+
  xlim(c(2000, 2005))
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Warning: Removed 177 row(s) containing missing values (geom_path).

## Warning: Removed 3 row(s) containing missing values (geom_path).

e. Which method gives the best forecasts? Does it pass the residual tests? The additive ETS model applied to a Box-Cox transformed series has the lowest test RMSE. Its residuals mostly pass, though the Ljung-Box p-value of 0.03 is marginal: it fails at the 5% level but passes at 1%

#ETS model
accuracy(ets.forecast.visitors, visitors.test)
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set  1.232439 14.95912 10.80756 0.2153496 4.044462 0.4138209
## Test set     12.617119 22.90797 16.63487 2.5783148 3.521617 0.6369485
##                      ACF1 Theil's U
## Training set -0.002560158        NA
## Test set      0.518168412 0.3411748
#ETS box model
accuracy(ets.forecast.visitors.box, visitors.test)
##                    ME     RMSE      MAE         MPE     MAPE      MASE
## Training set 1.168056 15.51547 11.22989  0.11241764 4.046645 0.4299918
## Test set     1.581019 20.22801 16.00030 -0.04770367 3.536397 0.6126507
##                     ACF1 Theil's U
## Training set -0.02809955        NA
## Test set      0.67007296 0.3064412
#snaive model
accuracy(snaive.forecast.visitors, visitors.test)
##                    ME     RMSE      MAE       MPE     MAPE     MASE      ACF1
## Training set 16.59292 31.24792 26.11651  6.836282 10.18912 1.000000 0.6514168
## Test set     50.58125 59.02680 50.58125 11.730236 11.73024 1.936754 0.6653705
##              Theil's U
## Training set        NA
## Test set     0.9308711
#stlf model
accuracy(stlf.forecast.visitors, visitors.test)
##                        ME     RMSE      MAE         MPE     MAPE      MASE
## Training set  -0.02532068 15.39434 11.08552 -0.09928427 4.107846 0.4244642
## Test set     -18.53595827 24.35689 20.18924 -4.57352057 4.909926 0.7730451
##                     ACF1 Theil's U
## Training set -0.02566624        NA
## Test set      0.12433192 0.3995668
checkresiduals(ets.forecast.visitors.box)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,Ad,A)
## Q* = 15.475, df = 7, p-value = 0.03037
## 
## Model df: 17.   Total lags used: 24

f. Compare the same four methods using time series cross-validation with the tsCV() function instead of using a training and test set. Do you come to the same conclusions? No, the STL forecast does the best, though not by much

fets_add_BoxCox <- function(y, h) {
  forecast(ets(y, lambda = BoxCox.lambda(y), additive.only = TRUE),h = h)
}
fstlm <- function(y, h) {
  forecast(stlm(y, lambda = BoxCox.lambda(y),s.window = 13,robust = TRUE,method = "ets"),h = h)
}
fets <- function(y, h) {
  forecast(ets(y),h = h)
}


sqrt(mean(tsCV(visitors, fets, h = 1)^2, na.rm = TRUE))
## [1] 18.52985
sqrt(mean(tsCV(visitors, fets_add_BoxCox, h = 1)^2, na.rm = TRUE))
## [1] 18.86439
sqrt(mean(tsCV(visitors, snaive, h = 1)^2, na.rm = TRUE))
## [1] 32.78675
sqrt(mean(tsCV(visitors, fstlm, h = 1)^2, na.rm = TRUE))
## [1] 17.49642

12. The fets() function below returns ETS forecasts.

fets <- function(y, h) {
 forecast(ets(y),
          h = h)
}

a. Apply tsCV() for a forecast horizon of h=4, for both ETS and seasonal naïve methods to the qcement data, (Hint: use the newly created fets() and the existing snaive() functions as your forecast function arguments.)

ets.cv<-tsCV(qcement, fets, h = 4)
snaive.cv<-tsCV(qcement, snaive, h = 4)

Compute the MSE of the resulting 4-step-ahead errors. (Hint: make sure you remove missing values.) Why are there missing values? Comment on which forecasts are more accurate. Is this what you expected? Missing values occur because near the end of the series there are not enough future observations left to score every forecast horizon. The ETS model is more accurate in terms of MSE, which I would expect over a seasonal naive model

#MSE
mean(ets.cv^2,na.rm=TRUE)
## [1] 0.01250954
mean(snaive.cv^2,na.rm=TRUE)
## [1] 0.01792147
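
Because h=4 was used, tsCV() returns a matrix of errors with one column per horizon, and the means above pool all four horizons. The strictly 4-step-ahead MSE comes from the fourth column (a sketch; these values will differ somewhat from the pooled ones):

#MSE of the 4-step-ahead errors only
mean(ets.cv[, 4]^2, na.rm=TRUE)
mean(snaive.cv[, 4]^2, na.rm=TRUE)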

13. Compare ets(), snaive() and stlf() on the following six time series. For stlf(), you might need to use a Box-Cox transformation. Use a test set of three years to decide what gives the best forecasts. ausbeer, bricksq, dole, a10, h02, usmelec. Compare functions

compare.methods<-function(y.train, y.test, period){
  #forecast horizon: three years
  h_param<-period*3

  naive.fit<-snaive(y.train, h=h_param)
  ets.fit<-forecast(ets(y.train), h=h_param)
  stlf.fit<-stlf(y.train, t.window = period+1, s.window = "periodic", h = h_param, robust = TRUE, lambda = BoxCox.lambda(y.train))

  #accuracy() returns a 2-row matrix (Training set, Test set);
  #by column-major indexing, element [4] is the test-set RMSE
  snaive.acc<-accuracy(naive.fit, y.test)[4]
  ets.acc<-accuracy(ets.fit, y.test)[4]
  stlf.acc<-accuracy(stlf.fit, y.test)[4]

  return(c(snaive.acc, ets.acc, stlf.acc))
}

ausbeer

#test and train set
ausbeer.train<-window(ausbeer, end=c(2007, 2))
ausbeer.test<-window(ausbeer, start=c(2007, 3))
#plot
autoplot(ausbeer.train, series="Training") +
autolayer(ausbeer.test, series="Test")

#get accuracy
ausbeer.acc<-compare.methods(ausbeer.train, ausbeer.test, 4)
ausbeer.acc
## [1] 10.805091  9.620828 14.619820

bricksq

#test and train set
bricksq.train<-window(bricksq, end=c(1991, 1))
bricksq.test<-window(bricksq, start=c(1991, 2))
#plot
autoplot(bricksq.train, series="Training") +
autolayer(bricksq.test, series="Test")

#get accuracy
bricksq.acc<-compare.methods(bricksq.train, bricksq.test, 4)

dole

#test and train set
dole.train<-window(dole, end=c(1989, 7))
dole.test<-window(dole, start=c(1989, 8))
#plot
autoplot(dole.train, series="Training") +
autolayer(dole.test, series="Test")

#get accuracy
dole.acc<-compare.methods(dole.train, dole.test, 12)

a10

#test and train set
a10.train<-window(a10, end=c(2005, 6))
a10.test<-window(a10, start=c(2005, 7))
#plot
autoplot(a10.train, series="Training") +
autolayer(a10.test, series="Test")

#get accuracy
a10.acc<-compare.methods(a10.train, a10.test, 12)

h02

#test and train set
h02.train<-window(h02, end=c(2005, 6))
h02.test<-window(h02, start=c(2005, 7))
#plot
autoplot(h02.train, series="Training") +
autolayer(h02.test, series="Test")

#get accuracy
h02.acc<-compare.methods(h02.train, h02.test, 12)

usmelec

#test and train set
usmelec.train<-window(usmelec, end=c(2005, 6))
usmelec.test<-window(usmelec, start=c(2005, 7))
#plot
autoplot(usmelec.train, series="Training") +
autolayer(usmelec.test, series="Test")

#get accuracy
usmelec.acc<-compare.methods(usmelec.train, usmelec.test, 12)

Comparison

#create matrix
x<-round(c(ausbeer.acc[1], bricksq.acc[1], dole.acc[1], a10.acc[1], h02.acc[1], usmelec.acc[1],
           ausbeer.acc[2], bricksq.acc[2], dole.acc[2], a10.acc[2], h02.acc[2], usmelec.acc[2],
           ausbeer.acc[3], bricksq.acc[3], dole.acc[3], a10.acc[3], h02.acc[3], usmelec.acc[3]), 2)

dim(x) <- c(6,3)
colnames(x)<-c("Naive", "ETS", "STLF")
rownames(x)<-c("Ausbeer", "bricksq", "dole", "a10", "h02", "usmelec")
knitr::kable(x, "pipe")
|        |     Naive|       ETS|      STLF|
|:-------|---------:|---------:|---------:|
|Ausbeer |     10.81|      9.62|     14.62|
|bricksq |     55.92|     15.01|     28.08|
|dole    | 240830.92| 279668.65| 272047.97|
|a10     |      5.18|      2.56|      2.15|
|h02     |      0.09|      0.08|      0.08|
|usmelec |     18.61|     12.07|     17.92|

14. a. Use ets() on the following series: bicoal, chicken, dole, usdeaths, lynx, ibmclose, eggs. Does it always give good forecasts? bicoal (Does better than Naive)

mean(tsCV(bicoal, fets, h=10)^2, na.rm = T)
## [1] 11565.88
mean(tsCV(bicoal, snaive, h=10)^2, na.rm = T)
## [1] 12081.3

chicken (Does worse than Naive)

mean(tsCV(chicken, fets, h=10)^2, na.rm = T)
## [1] 1339.194
mean(tsCV(chicken, snaive, h=10)^2, na.rm = T)
## [1] 1212.813

dole (Does better than Naive)

mean(tsCV(dole, fets, h=10)^2, na.rm = T)
## [1] 2758010177
mean(tsCV(dole, snaive, h=10)^2, na.rm = T)
## [1] 5331238533

usdeaths (Does better than Naive)

mean(tsCV(usdeaths, fets, h=10)^2, na.rm = T)
## [1] 1789494
mean(tsCV(usdeaths, snaive, h=10)^2, na.rm = T)
## [1] 556867.2

lynx (Does worse than Naive)

mean(tsCV(lynx, fets, h=10)^2, na.rm = T)
## [1] 5928549
mean(tsCV(lynx, snaive, h=10)^2, na.rm = T)
## [1] 4777830

ibmclose (Does worse than Naive)

mean(tsCV(ibmclose, fets, h=10)^2, na.rm = T)
## [1] 346.7712
mean(tsCV(ibmclose, snaive, h=10)^2, na.rm = T)
## [1] 322.2696

eggs (Does better than Naive)

mean(tsCV(eggs, fets, h=10)^2, na.rm = T)
## [1] 2257.868
mean(tsCV(eggs, snaive, h=10)^2, na.rm = T)
## [1] 2361.556

b. Find an example where it does not work well. Can you figure out why? The lynx series is a good example: it has no fixed-period seasonality, only cyclic behaviour with an irregular period, which ETS cannot capture. The series is also relatively flat overall, with no trend, which favours a naive forecast

lynx.fit<-ets(lynx)

lynx.fit %>% forecast(h=4) %>%
  autoplot() +
  ylab("International visitor night in Australia (millions)")

15. Show that the point forecasts from an ETS(M,A,M) model are the same as those obtained using Holt-Winters’ multiplicative method.

#ETS
usdeaths.ets<-ets(usdeaths, model="MAM")
usdeaths.ets.fcst<-forecast(usdeaths.ets, h=1)
usdeaths.ets.fcst$mean
##           Jan
## 1979 8233.107
#Holt-Winter
usdeaths.hw<-hw(usdeaths, seasonal="multiplicative", h=1)
usdeaths.hw$mean
##          Jan
## 1979 8217.64
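
The two point forecasts (8233.1 vs 8217.6) differ only because ets() and hw() estimate the parameters with slightly different optimizations; both methods share the same point-forecast equation \[ŷ_{t+h|t}=(ℓ_t+hb_t)s_{t+h−m(k+1)}\] where m is the seasonal period and k is the integer part of (h−1)/m, so identical parameter values would give identical forecasts.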

16. Show that the forecast variance for an ETS(A,N,N) model is given by: \[σ^2[1+α^2*(h−1)]\]

ibmclose.ets <- ets(ibmclose, model = "ANN")
ibmclose.ets.fcst <- forecast(ibmclose.ets,h=1)
ibmclose.ets.fcst
##     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 370       356.9995 347.6907 366.3083 342.7629 371.2361
#alpha is 0.9999, sigma is 7.2637; the forecast above is one step ahead
alpha<-0.9999
sigma<-7.2637
h<-1

ibmclose.ets.fcst$mean[1]-1.96*sigma*sqrt(1+(alpha^2)*(h-1))
## [1] 342.7626
ibmclose.ets.fcst$lower[1,'95%']
##      95% 
## 342.7629

Write down 95% prediction intervals for an ETS(A,N,N) model as a function of ℓT, α, h and σ, assuming normally distributed errors.

\[point forecast=ℓ_T\] \[forecast variance=σ^2[1+α^2(h−1)]\] \[Upper=ℓ_T+1.96σ\sqrt{1+α^2(h−1)}\] \[Lower=ℓ_T−1.96σ\sqrt{1+α^2(h−1)}\]
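
A minimal R sketch of this interval for general h, reusing the ℓ_T (the point forecast), α and σ estimated from ibmclose above (the function name ann.pi is just for illustration):

#95% PI for ETS(A,N,N): l_T +/- 1.96*sigma*sqrt(1 + alpha^2*(h-1))
ann.pi<-function(lT, alpha, sigma, h){
  half <- 1.96*sigma*sqrt(1 + alpha^2*(h - 1))
  c(lower = lT - half, upper = lT + half)
}
ann.pi(356.9995, 0.9999, 7.2637, h = 1)
##    lower    upper 
## 342.7626 371.2364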

\[Chapter 8\] 1. Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers. a. Explain the differences among these figures. Do they all indicate that the data are white noise? The main difference is the width of the 95% limits; all three figures are consistent with white noise. b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise? The critical values are ±1.96/√T, so they move closer to zero as the sample size T grows; likewise, with more observations the sample autocorrelations of white noise converge toward their true value of zero, so the spikes shrink
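
The shrinking limits follow directly from the ±1.96/√T formula:

#ACF critical values for T = 36, 360 and 1000
round(1.96/sqrt(c(36, 360, 1000)), 3)
## [1] 0.327 0.103 0.062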

2. A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced. The time plot wanders with no stable mean; the ACF decays very slowly and stays well outside the 95% limits for many lags; and the PACF has a single spike near 1 at lag 1. All three are signatures of a non-stationary series that should be differenced

autoplot(ibmclose)

#ACF plot
ggAcf(ibmclose)

#PACF plot
ggPacf(ibmclose)

3. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data. a.usnetelec

autoplot(usnetelec)

usnetelec.bc<-BoxCox(usnetelec,BoxCox.lambda(usnetelec))
summary(ur.kpss(usnetelec.bc))
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 1.4583 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
#Since test stat is much higher than 1pct value, we need to difference and try the test again
BoxCox(usnetelec,BoxCox.lambda(usnetelec))  %>% diff(differences = 2) %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.072 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ndiffs(BoxCox(usnetelec,BoxCox.lambda(usnetelec)))
## [1] 2
#ACF plot
ggAcf(diff(BoxCox(usnetelec,BoxCox.lambda(usnetelec)), differences = 2))

#PACF plot
ggPacf(diff(BoxCox(usnetelec,BoxCox.lambda(usnetelec)), differences = 2))

b.usgdp

autoplot(usgdp)

BoxCox(usgdp,BoxCox.lambda(usgdp)) %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 4.8114 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
#Since test stat is much higher than 1pct value, we need to difference and try the test again
BoxCox(usgdp,BoxCox.lambda(usgdp))  %>% diff(differences=2) %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.012 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ndiffs(BoxCox(usgdp,BoxCox.lambda(usgdp)))
## [1] 1
#ACF plot
ggAcf(diff(BoxCox(usgdp,BoxCox.lambda(usgdp)), differences=2))

#PACF plot
ggPacf(diff(BoxCox(usgdp,BoxCox.lambda(usgdp)), differences=2))

c.mcopper

autoplot(mcopper)

BoxCox(mcopper,BoxCox.lambda(mcopper)) %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 6.2659 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
#Since test stat is much higher than 1pct value, we need to difference and try test again
BoxCox(mcopper,BoxCox.lambda(mcopper))  %>% diff() %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 0.0573 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ndiffs(BoxCox(mcopper,BoxCox.lambda(mcopper)))
## [1] 1
#ACF plot
ggAcf(diff(BoxCox(mcopper,BoxCox.lambda(mcopper))))

#PACF plot
ggPacf(diff(BoxCox(mcopper,BoxCox.lambda(mcopper))))

d.enplanements

autoplot(enplanements)

BoxCox(enplanements,BoxCox.lambda(enplanements)) %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 4.3785 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
#Since test stat is much higher than 1pct value, we need to difference and try test again
BoxCox(enplanements,BoxCox.lambda(enplanements))  %>% diff() %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0151 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ndiffs(BoxCox(enplanements,BoxCox.lambda(enplanements)))
## [1] 1
#ACF plot
ggAcf(diff(BoxCox(enplanements,BoxCox.lambda(enplanements))))

#PACF plot
ggPacf(diff(BoxCox(enplanements,BoxCox.lambda(enplanements))))

e. visitors

autoplot(visitors)

BoxCox(visitors,BoxCox.lambda(visitors)) %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 4.5233 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
#Since test stat is much higher than 1pct value, we need to difference and try test again
BoxCox(visitors,BoxCox.lambda(visitors))  %>% diff() %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.0519 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ndiffs(BoxCox(visitors,BoxCox.lambda(visitors)))
## [1] 1
#ACF plot
ggAcf(diff(BoxCox(visitors,BoxCox.lambda(visitors))))

#PACF plot
ggPacf(diff(BoxCox(visitors,BoxCox.lambda(visitors))))

4. For the enplanements data, write down the differences you chose above using backshift operator notation. A single first difference of the Box-Cox transformed series: \[(1−B)w_t, where w_t=BoxCox(y_t,λ)\]
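
A quick numerical confirmation that diff() applies the (1−B) operator to the transformed series (illustrative only; output omitted):

w <- BoxCox(enplanements, BoxCox.lambda(enplanements))
head(diff(w))                #(1 - B)w_t
head(w[-1] - w[-length(w)])  #the same values via an explicit shift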

5. For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

ret.data <- readxl::read_excel("D:/predictive analytics/assignments/retail.xlsx", skip = 1)
ret <- ts(ret.data[, "A3349337W"], frequency = 12, start = c(1982, 4))
#plot
autoplot(ret)

ggtsdisplay(ret)

#box cox
BoxCox(ret,BoxCox.lambda(ret)) %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 5.8558 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ndiffs(BoxCox(ret,BoxCox.lambda(ret)))
## [1] 1
#diff with box cox, which passes the unit root test, but there are clearly remaining seasonality problems
BoxCox(ret,BoxCox.lambda(ret))  %>% diff() %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0229 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ggtsdisplay(diff(ret))

#add in seasonality
BoxCox(ret,BoxCox.lambda(ret))  %>% diff()%>% diff(12) %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.014 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ggtsdisplay(ret %>% BoxCox(BoxCox.lambda(ret)) %>% diff(1) %>% diff(12))

6. Use R to simulate and plot some data from simple ARIMA models. a. Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1. The process starts with y1=0

ar.1.fct<-function(phi, n=100){
  y <- ts(numeric(n))
  e <- rnorm(n)
  for(i in 2:n)
    y[i] <- phi*y[i-1] + e[i]
  return(y)
}
ar.1.6<-ar.1.fct(.6)
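
As a cross-check, base R's arima.sim() simulates the same AR(1) process; it uses a burn-in rather than starting at y1=0, so individual paths will differ from ar.1.fct():

#simulate 100 observations from an AR(1) with phi = 0.6
set.seed(123)
y.sim <- arima.sim(model = list(ar = 0.6), n = 100)
autoplot(y.sim)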

b. Produce a time plot for the series. How does the plot change as you change ϕ1? As ϕ1 increases, the variance of the series increases, consistent with the stationary AR(1) variance \[Var(y_t)=σ^2/(1−ϕ_1^2)\], and the series becomes smoother, with longer runs on one side of zero

ar.1.1<-ar.1.fct(.1)
ar.1.9<-ar.1.fct(.9)


autoplot(ar.1.1, series="phi=0.1")+
  autolayer(ar.1.6, series="phi=0.6")+
  autolayer(ar.1.9, series="phi=0.9")

c. Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.

ma.1.fct<-function(theta, n=100){
  y <- ts(numeric(n))
  e <- rnorm(n)
  e[1]<-0
  for(i in 2:n)
    y[i] <- theta*e[i-1] + e[i]
  return(y)
}
ma.1.6<-ma.1.fct(.6)

d. Produce a time plot for the series. How does the plot change as you change θ1? As θ1 increases, the variance of the series increases, matching \[Var(y_t)=σ^2(1+θ_1^2)\]

ma.1.1<-ma.1.fct(.1)
ma.1.9<-ma.1.fct(.9)


autoplot(ma.1.1, series="theta=0.1")+
  autolayer(ma.1.6, series="theta=0.6")+
  autolayer(ma.1.9, series="theta=0.9")

e. Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1.

arma.1.fct<-function(theta, phi, n=100){
  y <- ts(numeric(n))
  e <- rnorm(n)
  e[1]<-0
  for(i in 2:n)
    y[i] <- phi*y[i-1]+theta*e[i-1] + e[i]
  return(y)
}
arma.1.6<-arma.1.fct(0.6, 0.6)

f. Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)

ar.2.fct<-function(phi, phi2, n=100){
  y <- ts(numeric(n))
  e <- rnorm(n)
  #pure AR(2): needs the two previous values, so start at i = 3
  for(i in 3:n)
    y[i] <- phi*y[i-1] + phi2*y[i-2] + e[i]
  return(y)
}
ar.2.6<-ar.2.fct(-0.8, 0.3)

g. Graph the latter two series and compare them. The ARMA(1,1) series stays stationary with roughly constant variance, while the AR(2) series oscillates with rapidly growing amplitude as time goes on: with ϕ1=−0.8 and ϕ2=0.3 we have ϕ2−ϕ1=1.1>1, which violates the AR(2) stationarity conditions

autoplot(arma.1.6, series="ARMA")+
  autolayer(ar.2.6, series="AR.2")

7. Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States. a. By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data. An ARIMA(0,2,2) model seems appropriate here: two differences make the series stationary, and the ACF/PACF of the twice-differenced data suggest an MA(2) term

autoplot(wmurders)

#Trying differencing
wmurders.d1<-wmurders%>%diff()
autoplot(wmurders.d1)

wmurders.d1 %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.4697 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ggtsdisplay(wmurders.d1)

wmurders.d2<-wmurders%>%diff()%>%diff()
autoplot(wmurders.d2)

wmurders.d2 %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.0458 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ggtsdisplay(wmurders.d2)

b. Should you include a constant in the model? Explain. No. With d=2, including a constant would make the long-term forecasts follow a quadratic trend, which is dangerous when extrapolating; without a constant they follow a straight line

c. Write this model in terms of the backshift operator. \[(1−B)^2y_t=(1+θ_1B+θ_2B^2)e_t\]

d. Fit the model using R and examine the residuals. Is the model satisfactory? This model looks reasonable

(fit <- Arima(wmurders, order=c(0,2,2)))
## Series: wmurders 
## ARIMA(0,2,2) 
## 
## Coefficients:
##           ma1     ma2
##       -1.0181  0.1470
## s.e.   0.1220  0.1156
## 
## sigma^2 estimated as 0.04702:  log likelihood=6.03
## AIC=-6.06   AICc=-5.57   BIC=-0.15
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,2)
## Q* = 11.764, df = 8, p-value = 0.1621
## 
## Model df: 2.   Total lags used: 10

e. Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
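
The hand calculation below rearranges the ARIMA(0,2,2) equation for y_t, using the fitted θ1=−1.0181 and θ2=0.1470 and setting future errors to zero: \[y_t=2y_{t−1}−y_{t−2}+e_t+θ_1e_{t−1}+θ_2e_{t−2}\]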

summary(fit)
## Series: wmurders 
## ARIMA(0,2,2) 
## 
## Coefficients:
##           ma1     ma2
##       -1.0181  0.1470
## s.e.   0.1220  0.1156
## 
## sigma^2 estimated as 0.04702:  log likelihood=6.03
## AIC=-6.06   AICc=-5.57   BIC=-0.15
## 
## Training set error measures:
##                      ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.0113461 0.2088162 0.1525773 -0.2403396 4.331729 0.9382785
##                     ACF1
## Training set -0.05094066
fit.fcst<-forecast(fit, h=3)
fit.fcst
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.480525 2.202620 2.758430 2.055506 2.905544
## 2006       2.374890 1.985422 2.764359 1.779250 2.970531
## 2007       2.269256 1.772305 2.766206 1.509235 3.029276
years <- length(wmurders)
e <- fit$residuals
fc1 <- 2*wmurders[years] - wmurders[years - 1] - 1.0181*e[years] + 0.1470*e[years - 1]
fc2 <- 2*fc1 - wmurders[years] + 0.1470*e[years]
fc3 <- 2*fc2 - fc1

c(fc1, fc2, fc3)
## [1] 2.480523 2.374887 2.269252

f. Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

autoplot(fit.fcst)

g. Does auto.arima() give the same model you have chosen? If not, which model do you think is better? auto.arima() suggests an ARIMA(1,2,1) model, which is superior in terms of AIC, but not by much

auto.arima(wmurders)
## Series: wmurders 
## ARIMA(1,2,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.2434  -0.8261
## s.e.   0.1553   0.1143
## 
## sigma^2 estimated as 0.04632:  log likelihood=6.44
## AIC=-6.88   AICc=-6.39   BIC=-0.97

8. Consider austa, the total international visitors to Australia (in millions) for the period 1980-2015. a. Use auto.arima() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

austa.fit<-auto.arima(austa)
austa.fcst<-forecast(austa.fit, h=10)

austa.fit
## Series: austa 
## ARIMA(0,1,1) with drift 
## 
## Coefficients:
##          ma1   drift
##       0.3006  0.1735
## s.e.  0.1647  0.0390
## 
## sigma^2 estimated as 0.03376:  log likelihood=10.62
## AIC=-15.24   AICc=-14.46   BIC=-10.57
autoplot(austa.fcst)

b. Plot forecasts from an ARIMA(0,1,1) model with no drift and compare these to part a. Remove the MA term and plot again. Without drift, the forecasts level off to a horizontal line instead of trending upward; removing the MA term as well leaves a flat random-walk forecast

#no drift
austa.0.1.1<-Arima(austa, order=c(0,1,1), include.drift =FALSE)
austa.fcst.0.1.1<-forecast(austa.0.1.1, h=10)

austa.0.1.1
## Series: austa 
## ARIMA(0,1,1) 
## 
## Coefficients:
##          ma1
##       0.4936
## s.e.  0.1265
## 
## sigma^2 estimated as 0.04833:  log likelihood=3.73
## AIC=-3.45   AICc=-3.08   BIC=-0.34
autoplot(austa.fcst.0.1.1)

#no drift/MA
austa.0.1.0<-Arima(austa, order=c(0,1,0), include.drift =FALSE)
austa.fcst.0.1.0<-forecast(austa.0.1.0, h=10)

austa.0.1.0
## Series: austa 
## ARIMA(0,1,0) 
## 
## sigma^2 estimated as 0.06423:  log likelihood=-1.62
## AIC=5.24   AICc=5.36   BIC=6.8
autoplot(austa.fcst.0.1.0)

c. Plot forecasts from an ARIMA(2,1,3) model with drift. Remove the constant and see what happens.

#With constant
austa.2.1.3<-Arima(austa, order=c(2,1,3), include.drift =TRUE)
austa.fcst.2.1.3<-forecast(austa.2.1.3, h=10)

austa.2.1.3
## Series: austa 
## ARIMA(2,1,3) with drift 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2     ma3   drift
##       1.7648  -0.8513  -1.7684  0.5571  0.2224  0.1725
## s.e.  0.1136   0.1182   0.2433  0.4351  0.2150  0.0051
## 
## sigma^2 estimated as 0.0276:  log likelihood=13.72
## AIC=-13.44   AICc=-9.29   BIC=-2.55
autoplot(austa.fcst.2.1.3)

#Without constant: approximated by demeaning the series (note the drift
#term is still estimated, so the fit matches the model above)
m.austa<-austa-mean(austa)
austa.2.1.3.nc<-Arima(m.austa, order=c(2,1,3), include.drift =TRUE)
austa.fcst.2.1.3.nc<-forecast(austa.2.1.3.nc, h=10)

austa.2.1.3.nc
## Series: m.austa 
## ARIMA(2,1,3) with drift 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2     ma3   drift
##       1.7648  -0.8513  -1.7684  0.5571  0.2224  0.1725
## s.e.  0.1136   0.1182   0.2433  0.4351  0.2150  0.0051
## 
## sigma^2 estimated as 0.0276:  log likelihood=13.72
## AIC=-13.44   AICc=-9.29   BIC=-2.55
autoplot(austa.fcst.2.1.3.nc)

d. Plot forecasts from an ARIMA(0,0,1) model with a constant. Remove the MA term and plot again.

#With MA
austa.0.0.1<-Arima(austa, order=c(0,0,1))
austa.fcst.0.0.1<-forecast(austa.0.0.1, h=10)

austa.0.0.1
## Series: austa 
## ARIMA(0,0,1) with non-zero mean 
## 
## Coefficients:
##          ma1    mean
##       1.0000  3.5515
## s.e.  0.0907  0.3090
## 
## sigma^2 estimated as 0.9347:  log likelihood=-50.64
## AIC=107.28   AICc=108.03   BIC=112.04
autoplot(austa.fcst.0.0.1)

#Without MA
austa.0.0.0<-Arima(austa, order=c(0,0,0))
austa.fcst.0.0.0<-forecast(austa.0.0.0, h=10)

austa.0.0.0
## Series: austa 
## ARIMA(0,0,0) with non-zero mean 
## 
## Coefficients:
##         mean
##       3.5414
## s.e.  0.2972
## 
## sigma^2 estimated as 3.27:  log likelihood=-71.9
## AIC=147.8   AICc=148.17   BIC=150.97
autoplot(austa.fcst.0.0.0)

e. Plot forecasts from an ARIMA(0,2,1) model with no constant.

#ARIMA(0,2,1), no constant
m.austa<-austa-mean(austa)
austa.0.2.1<-Arima(m.austa, order=c(0,2,1))
austa.fcst.0.2.1<-forecast(austa.0.2.1, h=10)

austa.0.2.1
## Series: m.austa 
## ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.7263
## s.e.   0.2277
## 
## sigma^2 estimated as 0.0397:  log likelihood=6.74
## AIC=-9.48   AICc=-9.09   BIC=-6.43
autoplot(austa.fcst.0.2.1)

9. For the usgdp series a. if necessary, find a suitable Box-Cox transformation for the data;

usgdp.bc<-BoxCox(usgdp, lambda=BoxCox.lambda(usgdp))

b. fit a suitable ARIMA model to the transformed data using auto.arima();

gdp.arima<-auto.arima(usgdp.bc)
gdp.arima
## Series: usgdp.bc 
## ARIMA(2,1,0) with drift 
## 
## Coefficients:
##          ar1     ar2   drift
##       0.2795  0.1208  0.1829
## s.e.  0.0647  0.0648  0.0202
## 
## sigma^2 estimated as 0.03518:  log likelihood=61.56
## AIC=-115.11   AICc=-114.94   BIC=-101.26

c. try some other plausible models by experimenting with the orders chosen;

#different models
austa.0.0.1<-Arima(austa, order=c(0,0,1))
austa.0.1.0<-Arima(austa, order=c(0,1,0))
austa.1.0.0<-Arima(austa, order=c(1,0,0))
austa.0.1.1<-Arima(austa, order=c(0,1,1))
austa.1.0.1<-Arima(austa, order=c(1,0,1))
austa.1.1.0<-Arima(austa, order=c(1,1,0))
austa.2.1.0<-Arima(austa, order=c(2,1,0))

#get the AIC
AIC(austa.0.0.1)
## [1] 107.2847
AIC(austa.0.1.0)
## [1] 5.242931
AIC(austa.1.0.0)
## [1] 14.18156
AIC(austa.0.1.1)
## [1] -3.451137
AIC(austa.1.0.1)
## [1] 5.476199
AIC(austa.1.1.0)
## [1] -7.964581
AIC(austa.2.1.0)
## [1] -6.470481

d. choose what you think is the best model and check the residual diagnostics;

checkresiduals(austa.1.1.0)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,0)
## Q* = 5.1686, df = 6, p-value = 0.5224
## 
## Model df: 1.   Total lags used: 7

e. produce forecasts of your fitted model. Do the forecasts look reasonable? No; without a drift term the ARIMA(1,1,0) forecasts level off almost immediately and ignore the clear upward trend in the series

austa.1.1.0.fcst<-forecast(austa.1.1.0, h=12)
autoplot(austa.1.1.0.fcst)

f. compare the results with what you would obtain using ets() (with no transformation). ARIMA performs better (lower cross-validated MSE)

fets <- function(x, h) {
  forecast(ets(x), h = h)
}
farima <- function(x, h) {
  forecast(Arima(x, order=c(1,1,0)), h=h)
}


# Compute CV errors for ETS as e1
e1 <- tsCV(austa, fets, h=1)
# Compute CV errors for ARIMA as e2
e2 <- tsCV(austa, farima, h=1)
# Find MSE of each model class
mean(e1^2, na.rm=TRUE)
## [1] 0.05623684
mean(e2^2, na.rm=TRUE)
## [1] 0.04235035

10. Consider austourists, the quarterly visitor nights (in millions) spent by international tourists to Australia for the period 1999–2015. a. Describe the time plot. The plot is highly seasonal, with a fairly linear upward trend

autoplot(austourists)

b. What can you learn from the ACF graph? Even after differencing, the ACF shows large spikes at the seasonal lags: the data are not stationary and are clearly seasonal

#ACF plot
ggAcf(diff(austourists))

c. What can you learn from the PACF graph? The PACF of the differenced data has significant spikes at the first few lags, suggesting low-order autoregressive dependence

#PACF plot
ggPacf(diff(austourists))

d. Produce plots of the seasonally differenced data (1−B^4)y_t. What model do these graphs suggest? A single seasonal difference appears sufficient to make the series stationary; the remaining spikes suggest low-order AR terms

#seasonally differenced data
plot(diff(austourists, lag=4, differences=1))

ggtsdisplay(diff(austourists, lag=4, differences=1))

e. Does auto.arima() give the same model that you chose? If not, which model do you think is better? Yes, this matches what the graphs suggested: one seasonal difference, a non-seasonal AR(1) term, and a seasonal AR(1) term

auto.arima(austourists)
## Series: austourists 
## ARIMA(1,0,0)(1,1,0)[4] with drift 
## 
## Coefficients:
##          ar1     sar1   drift
##       0.4705  -0.5305  0.5489
## s.e.  0.1154   0.1122  0.0864
## 
## sigma^2 estimated as 5.15:  log likelihood=-142.48
## AIC=292.97   AICc=293.65   BIC=301.6

f. Write the model in terms of the backshift operator, then without using the backshift operator. \[(1−ϕ_1B)(1−Φ_1B^4)(1−B^4)y_t=c+e_t\] \[y_t=c+ϕ_1y_{t−1}+(1+Φ_1)y_{t−4}−ϕ_1(1+Φ_1)y_{t−5}−Φ_1y_{t−8}+ϕ_1Φ_1y_{t−9}+e_t\]

11. Consider usmelec, the total net generation of electricity (in billion kilowatt hours) by the U.S. electric industry (monthly for the period January 1973 – June 2013). In general there are two peaks per year: in mid-summer and mid-winter. a. Examine the 12-month moving average of this series to see what kind of trend is involved.

usmelec.ma<-ma(usmelec, order=12)

autoplot(usmelec.ma)+
  autolayer(usmelec)

b. Do the data need transforming? If so, find a suitable transformation.

usmelec %>% BoxCox(BoxCox.lambda(usmelec))%>% autoplot

usmelec.bc<- BoxCox(usmelec, BoxCox.lambda(usmelec))

c. Are the data stationary? If not, find an appropriate differencing which yields stationary data.

usmelec.bc %>% diff(lag=12) %>% diff()%>% ggtsdisplay()

d. Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values? ARIMA(1,1,2)(2,1,1)[12] does the best

usmelec.bc.1<-Arima(usmelec.bc, order=c(1,1,1), seasonal=c(0,1,0))
usmelec.bc.2<-Arima(usmelec.bc, order=c(1,1,2), seasonal=c(0,1,1))
usmelec.bc.3<-Arima(usmelec.bc, order=c(1,1,1), seasonal=c(1,1,0))
usmelec.bc.4<-Arima(usmelec.bc, order=c(1,1,2), seasonal=c(1,1,1))
usmelec.bc.5<-Arima(usmelec.bc, order=c(1,1,1), seasonal=c(2,1,0))
usmelec.bc.6<-Arima(usmelec.bc, order=c(1,1,2), seasonal=c(2,1,1))
usmelec.bc.7<-Arima(usmelec.bc, order=c(1,1,1), seasonal=c(3,1,0))
usmelec.bc.8<-Arima(usmelec.bc, order=c(1,1,2), seasonal=c(3,1,1))

AIC(usmelec.bc.1)
## [1] -4861.89
AIC(usmelec.bc.2)
## [1] -5080.929
AIC(usmelec.bc.3)
## [1] -4951.984
AIC(usmelec.bc.4)
## [1] -5080.284
AIC(usmelec.bc.5)
## [1] -5001.763
AIC(usmelec.bc.6)
## [1] -5081.348
AIC(usmelec.bc.7)
## [1] -5034.12
AIC(usmelec.bc.8)
## [1] -5080.753

e. Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better. The residuals do look like white noise (Ljung-Box p = 0.087), though there is some left skew in the histogram; not enough to be worried about, however

summary(usmelec.bc.6)
## Series: usmelec.bc 
## ARIMA(1,1,2)(2,1,1)[12] 
## 
## Coefficients:
##          ar1     ma1      ma2    sar1     sar2     sma1
##       0.2562  -0.681  -0.1017  0.0379  -0.0941  -0.8461
## s.e.  0.1902   0.194   0.1290  0.0556   0.0533   0.0342
## 
## sigma^2 estimated as 1.276e-06:  log likelihood=2547.67
## AIC=-5081.35   AICc=-5081.11   BIC=-5052.23
## 
## Training set error measures:
##                         ME        RMSE          MAE          MPE       MAPE
## Training set -5.846906e-05 0.001107131 0.0008377701 -0.003508599 0.05023644
##                   MASE         ACF1
## Training set 0.5585052 0.0009370377
checkresiduals(usmelec.bc.6)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,2)(2,1,1)[12]
## Q* = 26.609, df = 18, p-value = 0.08663
## 
## Model df: 6.   Total lags used: 24

f. Forecast the next 15 years of electricity generation by the U.S. electric industry. Get the latest figures from the EIA to check the accuracy of your forecasts. g. Eventually, the prediction intervals are so wide that the forecasts are not particularly useful. How many years of forecasts do you think are sufficiently accurate to be usable? The forecasts are useful in the short term, but after the first few years the prediction intervals become so wide that the point forecasts carry little information

autoplot(forecast(usmelec.bc.6, h=180))
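
Since the model was fitted to the Box-Cox transformed series, the forecasts need back-transforming before they can be compared with EIA figures. A minimal sketch (the comparison itself depends on the downloaded data, so it is left out):

#back-transform the point forecasts to billion kWh
lambda <- BoxCox.lambda(usmelec)
usmelec.fcst <- forecast(usmelec.bc.6, h=180)
InvBoxCox(usmelec.fcst$mean, lambda=lambda)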

12. For the mcopper data: a. If necessary, find a suitable Box-Cox transformation for the data;

mcopper %>% BoxCox(BoxCox.lambda(mcopper)) %>% autoplot()

mcopper.bc <- BoxCox(mcopper, BoxCox.lambda(mcopper))

b. fit a suitable ARIMA model to the transformed data using auto.arima();

arima.copper<-auto.arima(mcopper.bc)
arima.copper
## Series: mcopper.bc 
## ARIMA(0,1,1) 
## 
## Coefficients:
##          ma1
##       0.3720
## s.e.  0.0388
## 
## sigma^2 estimated as 0.04997:  log likelihood=45.05
## AIC=-86.1   AICc=-86.08   BIC=-77.43

c. Try some other plausible models by experimenting with the orders chosen;

arima.copper.1<-Arima(mcopper.bc, order=c(0,0,1))
arima.copper.2<-Arima(mcopper.bc, order=c(0,1,0))
arima.copper.3<-Arima(mcopper.bc, order=c(1,0,0))
arima.copper.4<-Arima(mcopper.bc, order=c(0,1,1))
arima.copper.5<-Arima(mcopper.bc, order=c(1,0,1))
arima.copper.6<-Arima(mcopper.bc, order=c(1,1,0))
arima.copper.7<-Arima(mcopper.bc, order=c(1,1,1))

d. Choose what you think is the best model and check the residual diagnostics; ARIMA(0,1,1) has the lowest AIC among the differenced models. (AIC values are only comparable between models with the same order of differencing, so the d=0 fits are not directly comparable with the d=1 fits.)

AIC(arima.copper.1)
## [1] 1784.89
AIC(arima.copper.2)
## [1] -15.69837
AIC(arima.copper.3)
## [1] -6.41073
AIC(arima.copper.4)
## [1] -86.0969
AIC(arima.copper.5)
## [1] -77.54979
AIC(arima.copper.6)
## [1] -75.65902
AIC(arima.copper.7)
## [1] -84.10448
checkresiduals(arima.copper.4)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)
## Q* = 22.913, df = 23, p-value = 0.4659
## 
## Model df: 1.   Total lags used: 24

e. Produce forecasts of your fitted model. Do the forecasts look reasonable? The prediction intervals look reasonable, but the flat point forecasts probably do not capture the large swings in the series.

autoplot(forecast(arima.copper.4, h=36))
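
These forecasts are on the transformed scale. As an alternative, passing lambda to Arima() makes forecast() back-transform automatically (a sketch):

fit.copper <- Arima(mcopper, order=c(0,1,1), lambda=BoxCox.lambda(mcopper))
autoplot(forecast(fit.copper, h=36))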

f. Compare the results with what you would obtain using ets() (with no transformation). The ARIMA model does slightly better on one-step cross-validation (MSE 0.0506 vs 0.0572 for ETS); note that both are evaluated on the transformed series below.

fets <- function(x, h) {
  forecast(ets(x), h = h)
}
farima <- function(x, h) {
  forecast(Arima(x, order=c(0,1,1)), h=h)
}


# Compute CV errors for ETS as e1
e1 <- tsCV(mcopper.bc, fets, h=1)
# Compute CV errors for ARIMA as e2
e2 <- tsCV(mcopper.bc, farima, h=1)
# Find MSE of each model class
mean(e1^2, na.rm=TRUE)
## [1] 0.05717561
mean(e2^2, na.rm=TRUE)
## [1] 0.0506208

13. Choose one of the following seasonal time series: hsales, auscafe, qauselec, qcement, qgas. a. Do the data need transforming? If so, find a suitable transformation. Using hsales: no transformation is needed, since the seasonal variation does not grow with the level of the series.

autoplot(hsales)

b. Are the data stationary? If not, find an appropriate differencing which yields stationary data. The data are not stationary; a seasonal difference yields a series that looks stationary.

ggtsdisplay(hsales)

hsales %>% diff(lag=12) %>% ggtsdisplay()

c. Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values? Searching over all (p,d,q)(P,D,Q) combinations with each order equal to 0 or 1 (the loop below prints each time a better model is found), ARIMA(1,0,0)(1,1,1)[12] has the lowest AIC.

#grid search over all orders in {0,1}; print whenever a better model is found
min.aic<-Inf
min.combo<-c(0,0,0,0,0,0)
for (a in 0:1){
  for (b in 0:1){
    for (c in 0:1){
      for(d in 0:1){
        for(e in 0:1){
          for(f in 0:1){
            mod<-Arima(hsales, order=c(a,b,c), seasonal=c(d,e,f))
            if (AIC(mod)<min.aic){
              best.hsale.arima<-mod
              min.combo<-c(a,b,c,d,e,f)
              min.aic<-AIC(mod)
              print(min.combo)
              print(min.aic)
            }
          }
        }
      }
    }
  }
}
## [1] 0 0 0 0 0 0
## [1] 2147.299
## [1] 0 0 0 0 0 1
## [1] 2039.32
## [1] 0 0 0 0 1 0
## [1] 1976.051
## [1] 0 0 0 1 1 1
## [1] 1928.429
## [1] 0 0 1 0 0 1
## [1] 1848.414
## [1] 0 0 1 0 1 0
## [1] 1805.108
## [1] 0 0 1 0 1 1
## [1] 1801.187
## [1] 0 0 1 1 1 1
## [1] 1745.596
## [1] 0 1 0 0 0 1
## [1] 1740.13
## [1] 0 1 0 0 1 0
## [1] 1690.202
## [1] 0 1 0 0 1 1
## [1] 1568.614
## [1] 0 1 0 1 1 1
## [1] 1567.972
## [1] 0 1 1 1 1 1
## [1] 1567.82
## [1] 1 0 0 0 1 1
## [1] 1563.88
## [1] 1 0 0 1 1 1
## [1] 1562.055
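
The same search can be done more thoroughly with auto.arima() once its stepwise shortcuts are disabled (a sketch, not run here):

auto.arima(hsales, stepwise=FALSE, approximation=FALSE)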

d. Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better. The residuals resemble white noise (Ljung-Box p-value = 0.57).

best.hsale.arima
## Series: hsales 
## ARIMA(1,0,0)(1,1,1)[12] 
## 
## Coefficients:
##          ar1    sar1     sma1
##       0.8934  0.1260  -1.0000
## s.e.  0.0289  0.0648   0.0483
## 
## sigma^2 estimated as 19.04:  log likelihood=-777.03
## AIC=1562.06   AICc=1562.21   BIC=1576.34
checkresiduals(best.hsale.arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,0)(1,1,1)[12]
## Q* = 19.239, df = 21, p-value = 0.5698
## 
## Model df: 3.   Total lags used: 24

e. Forecast the next 24 months of data using your preferred model.

autoplot(forecast(best.hsale.arima, h=24))

f. Compare the forecasts obtained using ets(). The seasonal ARIMA model has the lower one-step cross-validated MSE (22.6 vs 24.7 for ETS).

fets <- function(x, h) {
  forecast(ets(x), h = h)
}
farima <- function(x, h) {
  forecast(Arima(x, order=c(1,0,0), seasonal=c(1,1,1)), h=h)
}


# Compute CV errors for ETS as e1
e1 <- tsCV(hsales, fets, h=1)
# Compute CV errors for ARIMA as e2
e2 <- tsCV(hsales, farima, h=1)
# Find MSE of each model class
mean(e1^2, na.rm=TRUE)
## [1] 24.727
mean(e2^2, na.rm=TRUE)
## [1] 22.6424

14. For the same time series you used in the previous exercise, try using a non-seasonal model applied to the seasonally adjusted data obtained from STL. The stlf() function will make the calculations easy (with method="arima"). Compare the forecasts with those obtained in the previous exercise. Which do you think is the best approach? I prefer the seasonal ARIMA model: it captures the seasonality directly and had the better cross-validated accuracy above.

#stlf() decomposes the series with STL, forecasts the seasonally adjusted
#component with a non-seasonal ARIMA, then reseasonalises the forecasts
hsales.stl <- stlf(hsales, method="arima", h=24)
autoplot(hsales.stl)
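
To compare the two approaches directly, the forecasts can be overlaid on the series (a sketch reusing the objects defined above):

autoplot(hsales) +
  autolayer(forecast(best.hsale.arima, h=24), series="Seasonal ARIMA", PI=FALSE) +
  autolayer(hsales.stl, series="STL + ARIMA", PI=FALSE)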

15. For your retail time series (Exercise 5 above):

a. develop an appropriate seasonal ARIMA model;

ret.seas.arima<-auto.arima(ret)
ret.seas.arima
## Series: ret 
## ARIMA(1,0,1)(1,1,1)[12] with drift 
## 
## Coefficients:
##          ar1      ma1    sar1     sma1   drift
##       0.9047  -0.2277  0.3904  -0.8288  0.7500
## s.e.  0.0268   0.0635  0.0819   0.0536  0.1447
## 
## sigma^2 estimated as 169.2:  log likelihood=-1471.51
## AIC=2955.02   AICc=2955.25   BIC=2978.49
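
Before comparing forecasts, a quick residual check on this model (the same diagnostics used in the earlier exercises):

checkresiduals(ret.seas.arima)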

b. compare the forecasts with those you obtained in earlier chapters; The model used earlier is also seasonal, but has a (1,0,0) rather than a (1,0,1) non-seasonal part, so its forecasts are slightly worse.

c. Obtain up-to-date retail data from the ABS website (Cat 8501.0, Table 11), and compare your forecasts with the actual numbers. How good were the forecasts from the various models?

I could not find the proper table
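
If the table cannot be located by hand, the readabs package can download ABS catalogue tables directly; a sketch, assuming the package is installed and the catalogue number is still current:

library(readabs)
retail.abs <- read_abs("8501.0", tables = 11)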

16. Consider sheep, the sheep population of England and Wales from 1867–1939. a. Produce a time plot of the time series.

autoplot(sheep)

b. Assume you decide to fit the following model: \[y_t=y_{t−1}+ϕ_1(y_{t−1}−y_{t−2})+ϕ_2(y_{t−2}−y_{t−3})+ϕ_3(y_{t−3}−y_{t−4})+ε_t\] where ε_t is a white noise series. What sort of ARIMA model is this (i.e., what are p, d, and q)?

\[(1 - B)y_t - ϕ_1B(1 - B)y_t - ϕ_2B^2(1 - B)y_t - ϕ_3B^3(1 - B)y_t = ε_t\] \[(1 - ϕ_1B - ϕ_2B^2 - ϕ_3B^3)(1 - B)y_t = ε_t\] It is an ARIMA(3,1,0) model.

c. By examining the ACF and PACF of the differenced data, explain why this model is appropriate. The PACF of the differenced series has significant spikes at the first three lags and is negligible afterwards, while the ACF decays more gradually; that pattern points to an AR(3) for the differenced data, i.e. ARIMA(3,1,0).

#ACF plot
ggAcf(diff(sheep))

#PACF plot
ggPacf(diff(sheep))
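
The same information can be shown in one display (equivalent to the two plots above):

#ACF and PACF in one plot
ggtsdisplay(diff(sheep))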

d. The last five values of the series are given below (1935–1939): 1648, 1665, 1627, 1791, 1797. Using the estimated coefficients ϕ_1=0.42, ϕ_2=−0.20, ϕ_3=−0.30 and setting the future error terms to zero in \[y_t=y_{t−1}+ϕ_1(y_{t−1}−y_{t−2})+ϕ_2(y_{t−2}−y_{t−3})+ϕ_3(y_{t−3}−y_{t−4})+ε_t\] gives:

1940 \[y_t=1797+0.42(1797−1791)-0.2(1791−1627)-0.3(1627−1665)\] \[y_t=1797+2.52-32.8+11.4\] \[y_t=1778.12\]

1941 \[y_t=1778.12+0.42(1778.12−1797)-0.2(1797−1791)-0.3(1791−1627)\] \[y_t=1778.12-7.93-1.2-49.2\] \[y_t=1719.79\]

1942 \[y_t=1719.79+0.42(1719.79−1778.12)-0.2(1778.12−1797)-0.3(1797−1791)\] \[y_t=1719.79-24.5+3.78-1.8\] \[y_t=1697.27\]
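
The recursion is easy to script as a check (a sketch; the coefficients and the 1935–1939 values are those given in the exercise):

phi <- c(0.42, -0.20, -0.30)
y.hand <- c(1648, 1665, 1627, 1791, 1797)
for (i in 1:3) {
  n <- length(y.hand)
  #y_t = y_{t-1} + phi1*(y_{t-1}-y_{t-2}) + phi2*(y_{t-2}-y_{t-3}) + phi3*(y_{t-3}-y_{t-4})
  y.hand <- c(y.hand, y.hand[n] + sum(phi * (y.hand[n:(n-2)] - y.hand[(n-1):(n-3)])))
}
#forecasts for 1940-1942
tail(y.hand, 3)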

e. Now fit the model in R and obtain the forecasts using forecast. How are they different from yours? Why? The forecasts are slightly different because R estimates its own coefficients from the data rather than using the rounded values (0.42, −0.20, −0.30) from the hand calculation; the future error terms are set to zero in both cases.

forecast(Arima(sheep, order=c(3,1,0)))
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1940       1777.996 1687.454 1868.537 1639.524 1916.467
## 1941       1718.869 1561.544 1876.193 1478.261 1959.476
## 1942       1695.985 1494.151 1897.819 1387.306 2004.664
## 1943       1704.067 1482.975 1925.160 1365.935 2042.199
## 1944       1730.084 1499.953 1960.215 1378.129 2082.039
## 1945       1746.371 1508.361 1984.381 1382.366 2110.376
## 1946       1745.518 1495.743 1995.292 1363.520 2127.515
## 1947       1733.953 1468.206 1999.699 1327.529 2140.377
## 1948       1724.299 1442.092 2006.506 1292.701 2155.898
## 1949       1722.829 1426.874 2018.784 1270.204 2175.453

18. Before doing this exercise, you will need to install the Quandl package in R using install.packages(“Quandl”) a. Select a time series from Quandl. Then copy its short URL and import the data using y <- Quandl(“?????”, api_key=“?????”, type=“ts”) (Replace each ????? with the appropriate values.)

library(Quandl)
y <- Quandl("UMICH/SOC1", api_key="?????", type="ts", start_date="1980-01-01") #substitute your own API key

b. Plot graphs of the data, and try to identify an appropriate ARIMA model.

y.arima<-auto.arima(y)
y.arima
## Series: y 
## ARIMA(2,1,1)(1,0,0)[12] 
## 
## Coefficients:
##          ar1      ar2      ma1    sar1
##       0.7740  -0.1038  -0.8036  0.0676
## s.e.  0.1172   0.0515   0.1104  0.0488
## 
## sigma^2 estimated as 15.44:  log likelihood=-1358.38
## AIC=2726.77   AICc=2726.89   BIC=2747.72

c. Do residual diagnostic checking of your ARIMA model. Are the residuals white noise? The residuals look like white noise (Ljung-Box p-value = 0.54).

checkresiduals(y.arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,1,1)(1,0,0)[12]
## Q* = 18.694, df = 20, p-value = 0.5418
## 
## Model df: 4.   Total lags used: 24

d. Use your chosen ARIMA model to forecast the next four years.

y.fcst<-forecast(y.arima, h=48)
autoplot(y.fcst)

e. Now try to identify an appropriate ETS model.

y.ets<-ets(y)

f. Do residual diagnostic checking of your ETS model. Are the residuals white noise? The Ljung-Box p-value (0.049) sits right on the 5% boundary, so there is weak evidence of remaining autocorrelation; the residuals are only borderline white noise.

checkresiduals(y.ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,N)
## Q* = 34.022, df = 22, p-value = 0.04887
## 
## Model df: 2.   Total lags used: 24

g. Use your chosen ETS model to forecast the next four years.

y.ets.fcst<-forecast(y.ets, h=48)
autoplot(y.ets.fcst)
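
As in the earlier exercises, a one-step tsCV comparison is a more objective check than eyeballing the intervals. A sketch reusing the fets helper defined above, with the ARIMA order auto.arima() selected (farima.y is my name for the wrapper):

farima.y <- function(x, h) {
  forecast(Arima(x, order=c(2,1,1), seasonal=c(1,0,0)), h=h)
}
e1 <- tsCV(y, fets, h=1)
e2 <- tsCV(y, farima.y, h=1)
mean(e1^2, na.rm=TRUE)
mean(e2^2, na.rm=TRUE)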

h. Which of the two models do you prefer? I prefer the ARIMA model: its prediction intervals are noticeably narrower while still believable, whereas the ETS intervals are far too wide.