library(fpp2)
## Warning: package 'fpp2' was built under R version 4.0.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## -- Attaching packages ------------------------------------ fpp2 2.4 --
## v ggplot2 3.3.0 v fma 2.4
## v forecast 8.12 v expsmooth 2.3
## Warning: package 'forecast' was built under R version 4.0.2
## Warning: package 'fma' was built under R version 4.0.3
## Warning: package 'expsmooth' was built under R version 4.0.3
##
library(tseries)
## Warning: package 'tseries' was built under R version 4.0.2
1.Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
1a.Explain the differences among these figures. Do they all indicate that the data are white noise?
The autocorrelation spikes are largest for the small sample and shrink as the sample size grows, and the critical-value bounds narrow as well. Yes, all three figures indicate white noise, because every autocorrelation stays within the blue bounds.
1b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
The critical values are ±1.96/√T, so they move closer to zero as the sample size T increases. The autocorrelations differ across the figures because each is estimated from a different random sample, and estimates based on fewer observations are more variable.
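A quick added sketch makes the first point concrete: the dashed bounds are ±1.96/√T, so they shrink as T grows.
# Critical-value bounds for the three sample sizes in Figure 8.31
n.obs = c(36, 360, 1000)
round(1.96 / sqrt(n.obs), 3)   # roughly 0.327, 0.103, 0.062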
2.A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
autoplot(ibmclose)
ggAcf(ibmclose)
ggPacf(ibmclose)
The time plot wanders with long runs up and down and has no stable mean. The ACF decays very slowly from values near one, and the PACF has a single dominant spike near one at lag 1. All three are signatures of a non-stationary series that should be differenced.
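As a supporting sketch (an addition, using the tseries package loaded above), a KPSS unit-root test should reject level stationarity for the raw prices but not for the differenced series:
kpss.test(ibmclose)         # small p-value: reject level stationarity
kpss.test(diff(ibmclose))   # larger p-value expected: differencing removes the trend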
3.For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
box8.3 = function(x) {
  lam = BoxCox.lambda(x)                 # select a Box-Cox lambda automatically
  transformed = BoxCox(x, lambda = lam)  # apply the transformation
  tsdisplay(transformed)                 # plot the series with its ACF and PACF
  return(paste("BClam = ", round(lam, 4)))
}
3a. usnetelec
box8.3(usnetelec)
## [1] "BClam = 0.5168"
One first difference is needed.
3b. usgdp
box8.3(usgdp)
## [1] "BClam = 0.3664"
One first difference is needed.
3c. mcopper
box8.3(mcopper)
## [1] "BClam = 0.1919"
One first difference is needed.
3d. enplanements
box8.3(enplanements)
## [1] "BClam = -0.2269"
One seasonal difference and one first difference are needed.
3e. visitors
box8.3(visitors)
## [1] "BClam = 0.2775"
One seasonal difference and one first difference are needed.
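These choices can be cross-checked with nsdiffs() and ndiffs(), which estimate the required orders directly; a sketch for the visitors series (the same idea applies to each):
v = BoxCox(visitors, BoxCox.lambda(visitors))
nsdiffs(v)                  # expected: 1 seasonal difference
ndiffs(diff(v, lag = 12))   # expected: 1 first difference after seasonal differencing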
4.For the enplanements data, write down the differences you chose above using backshift operator notation.
The differences chosen were one seasonal difference and one first difference of the Box-Cox transformed series wt, which in backshift notation is (1−B)(1−B^12)wt.
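As an added sketch, the same double difference can be computed with diff() to verify that the result looks stationary:
w.enp = BoxCox(enplanements, BoxCox.lambda(enplanements))
ggtsdisplay(diff(diff(w.enp, lag = 12)))   # (1-B)(1-B^12) applied to the transformed series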
5.For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
library(readxl)
library(seasonal)
## Warning: package 'seasonal' was built under R version 4.0.3
retail= read_excel("C:/Users/burtkb/Downloads/retail.xlsx", skip=1)
retail.ts = ts(retail[,"A3349627V"], frequency = 12, start = c(1982,4))
autoplot(retail.ts)
ndiffs(retail.ts)
## [1] 1
nsdiffs(retail.ts)
## [1] 1
lam.retail = BoxCox.lambda(retail.ts)
lam.retail
## [1] -0.0579656
autoplot(BoxCox(retail.ts, lam.retail))
With the Box-Cox transformation (λ ≈ −0.058), one seasonal difference and one first difference (nsdiffs() and ndiffs() both return 1) are needed for stationarity.
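A short confirmation sketch (an addition): apply both differences to the transformed series and inspect the result.
retail.st = diff(diff(BoxCox(retail.ts, lam.retail), lag = 12))
ggtsdisplay(retail.st)   # should look roughly stationary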
6a. Use the following R code to generate data from an AR(1) model with ϕ1 = 0.6 and σ2 = 1. The process starts with y1 = 0.
set.seed(1)   # for reproducibility
my1 = function(phi) {
  y <- ts(numeric(100))   # the process starts with y1 = 0
  e <- rnorm(100)         # errors with sigma^2 = 1
  for (i in 2:100)
    y[i] <- phi * y[i-1] + e[i]   # AR(1): y_t = phi * y_{t-1} + e_t
  return(y)
}
autoplot(my1(0.6))
6c.Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.
my2 = function(theta, seed.my) {
  set.seed(seed.my)
  y <- ts(numeric(100))
  e <- rnorm(100)   # errors with sigma^2 = 1
  for (i in 2:100) {
    y[i] <- e[i] + theta * e[i-1]   # MA(1): y_t = e_t + theta * e_{t-1}
  }
  return(autoplot(y))
}
6d.Produce a time plot for the series. How does the plot change as you change θ1 ?
my2(.2,1)
my2(1,1)
As θ1 increases, consecutive observations share more of the same error term, so the series becomes smoother; the variance also grows, since Var(yt) = (1 + θ1^2)σ2.
6e.Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1.
set.seed(1)
my.ar = ts(numeric(50))
e = rnorm(50)   # errors with sigma^2 = 1
for(i in 2:50){
  my.ar[i] = 0.6*my.ar[i-1] + 0.6*e[i-1] + e[i]   # ARMA(1,1)
}
6f.Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)
set.seed(1)
my.ar2 = ts(numeric(50))
e2 = rnorm(50)   # errors with sigma^2 = 1
for(i in 3:50){
  my.ar2[i] = -0.8*my.ar2[i-1] + 0.3*my.ar2[i-2] + e2[i]   # AR(2)
}
6g.Graph the latter two series and compare them.
autoplot(my.ar)
autoplot(my.ar2)
The ARMA(1,1) series is stationary and fluctuates around zero, while the AR(2) series oscillates with ever-increasing amplitude because its parameters violate the stationarity conditions.
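A small added check of the non-stationarity claim: the roots of the AR polynomial 1 − ϕ1z − ϕ2z^2 must all lie outside the unit circle, and here one does not.
Mod(polyroot(c(1, 0.8, -0.3)))   # coefficients c(1, -phi1, -phi2); moduli ~0.93 and ~3.59, one inside the unit circle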
7.Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
7a.By studying appropriate graphs of the series in R, find an appropriate ARIMA( p,d,q) model for these data.
autoplot(wmurders)
wmurders %>% ndiffs()
## [1] 2
wmurders %>% diff() %>% ggtsdisplay()
w.ar = auto.arima(wmurders)
summary(w.ar)
## Series: wmurders
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
## ACF1
## Training set 0.02176343
checkresiduals(w.ar)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
##
## Model df: 2. Total lags used: 10
ndiffs() suggests d = 2; fitting the series gives ARIMA(1,2,1), whose Ljung-Box p-value of 0.13 indicates white-noise residuals.
7b.Should you include a constant in the model? Explain.
No. With d = 2, including a constant would induce a quadratic trend in the long-run forecasts, which is not plausible for this series.
7c.Write this model in terms of the backshift operator.
With the fitted ARIMA(1,2,1) coefficients (ϕ1 = −0.2434, θ1 = −0.8261): (1 + 0.2434B)(1 − B)^2 yt = (1 − 0.8261B)et.
7d.Fit the model using R and examine the residuals. Is the model satisfactory?
The ARIMA(1,2,1) model fitted above is satisfactory: checkresiduals() shows a Ljung-Box p-value of 0.13, so the residuals are consistent with white noise.
7e.Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
forecast(w.ar,3)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.470660 2.194836 2.746484 2.048824 2.892496
## 2006 2.363106 1.986351 2.739862 1.786908 2.939304
## 2007 2.252833 1.765391 2.740276 1.507354 2.998313
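A minimal hand check (an addition, using the fitted coefficients above): expanding the backshift form gives yt = (2+ϕ1)yt−1 − (1+2ϕ1)yt−2 + ϕ1yt−3 + et + θ1et−1, so the one-step forecast can be computed directly.
phi = -0.2434; theta = -0.8261
y = as.numeric(wmurders); n = length(y)
e.last = as.numeric(tail(residuals(w.ar), 1))   # last in-sample residual
(2 + phi)*y[n] - (1 + 2*phi)*y[n-1] + phi*y[n-2] + theta*e.last
# should be close to the 2005 point forecast of 2.4707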
7f.Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
autoplot(forecast(w.ar,3))
7g.Does auto.arima() give the same model you have chosen? If not, which model do you think is better?
Yes; auto.arima() selected the same ARIMA(1,2,1) model used above.
8.Consider austa, the total international visitors to Australia (in millions) for the period 1980-2015.
8a.Use auto.arima() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
austamod = auto.arima(austa)
checkresiduals(austamod)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1) with drift
## Q* = 2.297, df = 5, p-value = 0.8067
##
## Model df: 2. Total lags used: 7
autoplot(forecast(austamod))
auto.arima() chose ARIMA(0,1,1) with drift; the Ljung-Box p-value of 0.81 shows the residuals look like white noise.
8b.Plot forecasts from an ARIMA(0,1,1) model with no drift and compare these to part a. Remove the MA term and plot again.
austa011 = forecast(Arima(austa, order = c(0,1,1)), h=10)
autoplot(austa011)
austa010 = forecast(Arima(austa, order = c(0,1,0)), h=10)
autoplot(austa010)
Without the drift the forecasts are flat rather than trending upward as in part a; dropping the MA term as well gives naive random-walk forecasts, flat at the last observed value.
8c.Plot forecasts from an ARIMA(2,1,3) model with drift. Remove the constant and see what happens.
austa213 = forecast(Arima(austa, c(2,1,3), include.drift = TRUE), h=10)
autoplot(austa213)
Removing the constant (dropping include.drift) removes the upward trend from the long-run forecasts; for this series the (2,1,3) model without drift can even fail to estimate (a non-stationary AR error in some versions of the forecast package).
8d.Plot forecasts from an ARIMA(0,0,1) model with a constant. Remove the MA term and plot again.
austa001 = forecast(Arima(austa, c(0,0,1), include.constant = TRUE), h=10)   # include.constant belongs in Arima(), not forecast()
autoplot(austa001)
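The second half of the question (remove the MA term and plot again) can be sketched the same way (an addition): ARIMA(0,0,0) with a constant simply forecasts the sample mean.
austa000 = forecast(Arima(austa, c(0,0,0), include.constant = TRUE), h = 10)
autoplot(austa000)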
8e.Plot forecasts from an ARIMA(0,2,1) model with no constant.
austa021 = forecast(Arima(austa, c(0,2,1), include.constant = FALSE), h=10)
autoplot(austa021)
9.For the usgdp series:
9a.if necessary, find a suitable Box-Cox transformation for the data;
str(usgdp)
## Time-Series [1:237] from 1947 to 2006: 1570 1569 1568 1591 1616 ...
autoplot(usgdp)
autoplot(BoxCox(usgdp,BoxCox.lambda(usgdp)))
bc.usgdp = BoxCox.lambda(usgdp)   # lambda is about 0.366
9b.fit a suitable ARIMA model to the transformed data using auto.arima();
ar.usgdp = auto.arima(usgdp, lambda = bc.usgdp)
checkresiduals(ar.usgdp)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,0) with drift
## Q* = 6.5772, df = 5, p-value = 0.254
##
## Model df: 3. Total lags used: 8
summary(ar.usgdp)
## Series: usgdp
## ARIMA(2,1,0) with drift
## Box Cox transformation: lambda= 0.366352
##
## Coefficients:
## ar1 ar2 drift
## 0.2795 0.1208 0.1829
## s.e. 0.0647 0.0648 0.0202
##
## sigma^2 estimated as 0.03518: log likelihood=61.56
## AIC=-115.11 AICc=-114.94 BIC=-101.26
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 1.195275 39.2224 29.29521 -0.01363259 0.6863491 0.1655687
## ACF1
## Training set -0.03824844
9c.try some other plausible models by experimenting with the orders chosen;
usgdp001 = forecast(arima(usgdp, c(0, 0, 1)), h=10)   # arima() already includes a mean when d = 0
autoplot(usgdp001)
summary(usgdp001)
##
## Forecast method: ARIMA(0,0,1) with non-zero mean
##
## Model Information:
##
## Call:
## arima(x = usgdp, order = c(0, 0, 1))
##
## Coefficients:
## ma1 intercept
## 1.0000 5174.3663
## s.e. 0.0113 180.8188
##
## sigma^2 estimated as 1945380: log likelihood = -2055.02, aic = 4116.04
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 4.604078 1394.769 1176.846 -18.56536 32.04501 6.651219 0.9698935
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2006 Q2 8241.316 6450.098 10032.535 5501.884 10980.749
## 2006 Q3 5174.366 2646.505 7702.227 1308.337 9040.396
## 2006 Q4 5174.366 2646.505 7702.227 1308.337 9040.396
## 2007 Q1 5174.366 2646.505 7702.227 1308.337 9040.396
## 2007 Q2 5174.366 2646.505 7702.227 1308.337 9040.396
## 2007 Q3 5174.366 2646.505 7702.227 1308.337 9040.396
## 2007 Q4 5174.366 2646.505 7702.227 1308.337 9040.396
## 2008 Q1 5174.366 2646.505 7702.227 1308.337 9040.396
## 2008 Q2 5174.366 2646.505 7702.227 1308.337 9040.396
## 2008 Q3 5174.366 2646.505 7702.227 1308.337 9040.396
usgdp213 = forecast(Arima(usgdp, c(2,1,3), include.drift = TRUE), h=10)
autoplot(usgdp213)
summary(usgdp213)
##
## Forecast method: ARIMA(2,1,3) with drift
##
## Model Information:
## Series: usgdp
## ARIMA(2,1,3) with drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 ma3 drift
## 0.1575 0.5256 0.1514 -0.2358 -0.0737 42.1169
## s.e. 0.2880 0.1862 0.2958 0.1794 0.1148 6.8481
##
## sigma^2 estimated as 1650: log likelihood=-1206.21
## AIC=2426.42 AICc=2426.91 BIC=2450.66
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.2689172 40.01301 30.38976 -0.09767027 0.7220301 0.1717548
## ACF1
## Training set -0.0004712095
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2006 Q2 11484.19 11432.14 11536.24 11404.58 11563.80
## 2006 Q3 11573.06 11487.32 11658.80 11441.94 11704.19
## 2006 Q4 11636.14 11514.88 11757.39 11450.69 11821.58
## 2007 Q1 11706.13 11553.24 11859.02 11472.30 11939.95
## 2007 Q2 11763.65 11578.99 11948.31 11481.24 12046.06
## 2007 Q3 11822.84 11608.38 12037.31 11494.85 12150.84
## 2007 Q4 11875.74 11632.20 12119.29 11503.27 12248.22
## 2008 Q1 11928.53 11657.33 12199.74 11513.76 12343.31
## 2008 Q2 11978.00 11680.09 12275.91 11522.39 12433.61
## 2008 Q3 12026.88 11703.45 12350.32 11532.24 12521.53
9d.choose what you think is the best model and check the residual diagnostics;
#usgdp213 was the better model: its training RMSE (40.0) is a fraction of usgdp001's (1394.8), and its forecasts follow the trend instead of reverting to the sample mean
checkresiduals(usgdp213)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,3) with drift
## Q* = 7.5675, df = 3, p-value = 0.05585
##
## Model df: 6. Total lags used: 9
9e.produce forecasts of your fitted model. Do the forecasts look reasonable?
autoplot(usgdp213)   # forecasts from the chosen ARIMA(2,1,3)-with-drift model
The forecasts continue the upward trend of the series and look reasonable.
9f.compare the results with what you would obtain using ets() (with no transformation).
ets.usgdp = ets(usgdp)
summary(ets.usgdp)
## ETS(A,A,N)
##
## Call:
## ets(y = usgdp)
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 0.278
##
## Initial states:
## l = 1557.4589
## b = 18.6862
##
## sigma: 41.8895
##
## AIC AICc BIC
## 3072.303 3072.562 3089.643
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.30185 41.53448 31.85324 0.02319607 0.7543931 0.180026 0.07596492
checkresiduals(ets.usgdp)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,A,N)
## Q* = 21.461, df = 4, p-value = 0.0002565
##
## Model df: 4. Total lags used: 8
autoplot(forecast(ets.usgdp,h=10))
The ETS(A,A,N) residuals fail the Ljung-Box test (p = 0.0003) and its training RMSE (41.5) is slightly worse, so the ARIMA model is preferred.
10.Consider austourists, the quarterly visitor nights (in millions) spent by international tourists to Australia for the period 1999–2015.
10a.Describe the time plot.
autoplot(austourists)
ggseasonplot(austourists)
There’s strong quarterly seasonality, an upward trend, and seasonal variation that grows with the level of the series.
10b.What can you learn from the ACF graph?
ggAcf(austourists)
The autocorrelations decay slowly and peak at the seasonal lags (4, 8, 12, …), reflecting both the trend and the quarterly seasonality.
10c.What can you learn from the PACF graph?
ggPacf(austourists)
The PACF has a large spike at lag 1 with a few smaller significant spikes, suggesting an autoregressive component once the seasonality is handled.
10d.Produce plots of the seasonally differenced data (1−B^4)Yt. What model do these graphs suggest?
autoplot(diff(austourists, lag=4))
par(mfrow=c(1,2))
acf(diff(austourists,lag=4))
pacf(diff(austourists,lag=4))
The graphs suggest an AR(1) on the seasonally differenced data with a seasonal AR term: ARIMA(1,0,0)(1,1,0)[4].
10e.Does auto.arima() give the same model that you chose? If not, which model do you think is better?
auto.arima(austourists)
## Series: austourists
## ARIMA(1,0,0)(1,1,0)[4] with drift
##
## Coefficients:
## ar1 sar1 drift
## 0.4705 -0.5305 0.5489
## s.e. 0.1154 0.1122 0.0864
##
## sigma^2 estimated as 5.15: log likelihood=-142.48
## AIC=292.97 AICc=293.65 BIC=301.6
auto.arima() gave ARIMA(1,0,0)(1,1,0)[4] with drift, matching the model suggested by the plots, so it is a good choice.
11.Consider usmelec, the total net generation of electricity (in billion kilowatt hours) by the U.S. electric industry.
11a.Examine the 12-month moving average of this series to see what kind of trend is involved.
usmelec.12 = ma(usmelec, order = 12)
autoplot(usmelec.12)
There’s an upward trend.
11b.Do the data need transforming? If so, find a suitable transformation.
Yes; the seasonal variation grows with the level of the series, so a Box-Cox transformation is used to stabilize it.
bc.usmelec = BoxCox.lambda(usmelec)
bc.usmelec2 = BoxCox(usmelec, lambda=bc.usmelec)
autoplot(bc.usmelec2)
11c.Are the data stationary? If not, find an appropriate differencing which yields stationary data.
#the data are not stationary
ndiffs(bc.usmelec2)
## [1] 1
nsdiffs(bc.usmelec2)
## [1] 1
#one seasonal difference and one first difference will be used
11d.Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values?
plot(decompose(bc.usmelec2))
#model1
bc.us = Arima(usmelec, order = c(12, 1, 0), seasonal = c(1, 0, 0), lambda = 0)
bc.us
## Series: usmelec
## ARIMA(12,1,0)(1,0,0)[12]
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8
## -0.4196 -0.3817 -0.2097 -0.2566 -0.2020 -0.0813 -0.0862 -0.0655
## s.e. 0.0436 0.0474 0.0506 0.0517 0.0523 0.0529 0.0526 0.0523
## ar9 ar10 ar11 ar12 sar1
## 0.0091 0.1031 0.1144 -0.2935 0.9610
## s.e. 0.0518 0.0508 0.0477 0.0440 0.0107
##
## sigma^2 estimated as 0.0009182: log likelihood=1000.95
## AIC=-1973.91 AICc=-1973.02 BIC=-1915.33
bc.us2 = Arima(usmelec, order = c(6, 1, 0), seasonal = c(3, 1, 0), lambda = 0)
bc.us2
## Series: usmelec
## ARIMA(6,1,0)(3,1,0)[12]
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 sar1 sar2
## -0.4448 -0.4031 -0.2257 -0.2341 -0.1611 -0.0692 -0.6594 -0.4746
## s.e. 0.0460 0.0500 0.0522 0.0521 0.0500 0.0461 0.0453 0.0506
## sar3
## -0.2839
## s.e. 0.0457
##
## sigma^2 estimated as 0.0007708: log likelihood=1024.58
## AIC=-2029.16 AICc=-2028.68 BIC=-1987.57
summary(bc.us)
## Series: usmelec
## ARIMA(12,1,0)(1,0,0)[12]
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8
## -0.4196 -0.3817 -0.2097 -0.2566 -0.2020 -0.0813 -0.0862 -0.0655
## s.e. 0.0436 0.0474 0.0506 0.0517 0.0523 0.0529 0.0526 0.0523
## ar9 ar10 ar11 ar12 sar1
## 0.0091 0.1031 0.1144 -0.2935 0.9610
## s.e. 0.0518 0.0508 0.0477 0.0440 0.0107
##
## sigma^2 estimated as 0.0009182: log likelihood=1000.95
## AIC=-1973.91 AICc=-1973.02 BIC=-1915.33
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.08301885 8.249448 6.124246 -0.03015208 2.308762 0.6801134
## ACF1
## Training set 0.001685155
summary(bc.us2)
## Series: usmelec
## ARIMA(6,1,0)(3,1,0)[12]
## Box Cox transformation: lambda= 0
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 sar1 sar2
## -0.4448 -0.4031 -0.2257 -0.2341 -0.1611 -0.0692 -0.6594 -0.4746
## s.e. 0.0460 0.0500 0.0522 0.0521 0.0500 0.0461 0.0453 0.0506
## sar3
## -0.2839
## s.e. 0.0457
##
## sigma^2 estimated as 0.0007708: log likelihood=1024.58
## AIC=-2029.16 AICc=-2028.68 BIC=-1987.57
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.03805136 7.531002 5.64684 -0.05110048 2.120754 0.6270962
## ACF1
## Training set -0.02571378
bc.us2 has the lower AIC (−2029.16 vs −1973.91), so it is the better of the two (noting that AIC comparisons are only strictly valid between models with the same differencing).
11e.Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.
checkresiduals(bc.us2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(6,1,0)(3,1,0)[12]
## Q* = 41.028, df = 15, p-value = 0.0003167
##
## Model df: 9. Total lags used: 24
#no, the residuals do not resemble white noise (Ljung-Box p < 0.001), so another model is tried below
auto.us = auto.arima(usmelec, seasonal = TRUE, stepwise = FALSE, approximation = FALSE, lambda = "auto")
summary(auto.us)
## Series: usmelec
## ARIMA(1,1,1)(2,1,1)[12]
## Box Cox transformation: lambda= -0.5738331
##
## Coefficients:
## ar1 ma1 sar1 sar2 sma1
## 0.3888 -0.8265 0.0403 -0.0958 -0.8471
## s.e. 0.0630 0.0375 0.0555 0.0531 0.0341
##
## sigma^2 estimated as 1.274e-06: log likelihood=2547.36
## AIC=-5082.72 AICc=-5082.54 BIC=-5057.76
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.5001989 7.235656 5.302245 -0.1959073 2.002005 0.588828
## ACF1
## Training set -0.02860273
checkresiduals(auto.us)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,1)(2,1,1)[12]
## Q* = 28.012, df = 19, p-value = 0.08319
##
## Model df: 5. Total lags used: 24
#auto.arima() produced ARIMA(1,1,1)(2,1,1)[12]; its residuals pass the Ljung-Box test at the 5% level (p = 0.083), so they are much closer to white noise
11f.Forecast the next 15 years of electricity generation by the U.S. electric industry. Get the latest figures from the EIA to check the accuracy of your forecasts.
energy <- read.csv("C:/Users/burtkb/Downloads/MER_T07_02A.csv")
names(energy) = c('month', 'elec.ind')
energy.ts = ts(energy$elec.ind, start= c(1973,1), frequency = 12)
autoplot(energy.ts)
f.energy = forecast(auto.us, h=15*12)
autoplot(f.energy)
11g.Eventually, the prediction intervals are so wide that the forecasts are not particularly useful. How many years of forecasts do you think are sufficiently accurate to be usable?
Perhaps the next 3-4 years, but not much more than that.
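A rough added sketch of how quickly the intervals widen (assumes the f.energy object above; relative width = width of the 95% interval divided by the point forecast):
rel.width = (f.energy$upper[, "95%"] - f.energy$lower[, "95%"]) / f.energy$mean
round(tapply(rel.width, rep(1:15, each = 12), mean), 2)   # average relative width per forecast year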
12.For the mcopper series:
12a.if necessary, find a suitable Box-Cox transformation for the data;
str(mcopper)
## Time-Series [1:564] from 1960 to 2007: 255 260 249 258 244 ...
autoplot(mcopper)
autoplot(BoxCox(mcopper, BoxCox.lambda(mcopper)))
bc.mcopper = BoxCox.lambda(mcopper)
12b.fit a suitable ARIMA model to the transformed data using auto.arima();
auto.mcopper = auto.arima(mcopper, lambda= bc.mcopper)
auto.mcopper
## Series: mcopper
## ARIMA(0,1,1)
## Box Cox transformation: lambda= 0.1919047
##
## Coefficients:
## ma1
## 0.3720
## s.e. 0.0388
##
## sigma^2 estimated as 0.04997: log likelihood=45.05
## AIC=-86.1 AICc=-86.08 BIC=-77.43
12c.try some other plausible models by experimenting with the orders chosen;
ndiffs(mcopper)
## [1] 1
nsdiffs(mcopper)
## [1] 0
ggtsdisplay(diff(mcopper))
mcop001 = Arima(mcopper,order = c(0, 0, 1), lambda = bc.mcopper)
summary(mcop001)
## Series: mcopper
## ARIMA(0,0,1) with non-zero mean
## Box Cox transformation: lambda= 0.1919047
##
## Coefficients:
## ma1 mean
## 0.9446 13.8817
## s.e. 0.0090 0.0956
##
## sigma^2 estimated as 1.371: log likelihood=-889.45
## AIC=1784.89 AICc=1784.93 BIC=1797.9
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 98.69302 372.2165 232.1022 -8.806098 28.3679 1.044277 0.9136532
mcop510 = Arima(mcopper,order = c(5, 1, 0), lambda = bc.mcopper)
summary(mcop510)
## Series: mcopper
## ARIMA(5,1,0)
## Box Cox transformation: lambda= 0.1919047
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5
## 0.3706 -0.1475 0.0609 -0.0038 0.0134
## s.e. 0.0421 0.0450 0.0454 0.0450 0.0422
##
## sigma^2 estimated as 0.05028: log likelihood=45.32
## AIC=-78.63 AICc=-78.48 BIC=-52.63
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 3.263194 77.22921 44.98219 0.1566706 4.310128 0.2023845
## ACF1
## Training set -0.08666255
12d.choose what you think is the best model and check the residual diagnostics;
#mcop510 is the better model: its training RMSE (77.2) is far below mcop001's (372.2), and its residuals pass the Ljung-Box test below
checkresiduals(mcop510)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(5,1,0)
## Q* = 22.089, df = 19, p-value = 0.2798
##
## Model df: 5. Total lags used: 24
12e.produce forecasts of your fitted model. Do the forecasts look reasonable?
autoplot(forecast(mcop510))
The point forecasts level off near the final values with rapidly widening intervals; they look plausible, although the model cannot anticipate a fall from the recent price spike.
12f.compare the results with what you would obtain using ets() (with no transformation).
f.mcopper = ets(mcopper)
f.mcopper
## ETS(M,Ad,N)
##
## Call:
## ets(y = mcopper)
##
## Smoothing parameters:
## alpha = 0.9999
## beta = 0.2141
## phi = 0.8
##
## Initial states:
## l = 265.0541
## b = -3.9142
##
## sigma: 0.0632
##
## AIC AICc BIC
## 8052.038 8052.189 8078.049
autoplot(f.mcopper)
The ETS(M,Ad,N) model gives broadly similar forecasts; since the ARIMA(5,1,0) residuals passed the Ljung-Box test, the ARIMA model is preferred.
13.Choose one of the following seasonal time series: hsales, auscafe, qauselec, qcement, qgas.
13a.Do the data need transforming? If so, find a suitable transformation.
str(qauselec)
## Time-Series [1:218] from 1956 to 2010: 3.92 4.44 4.81 4.42 4.34 ...
autoplot(qauselec)
bc.qauselec = BoxCox.lambda(qauselec)
Yes: the seasonal variation grows with the level of the series, so a Box-Cox transformation (λ ≈ 0.52) is applied.
13b.Are the data stationary? If not, find an appropriate differencing which yields stationary data.
nsdiffs(qauselec)
## [1] 1
ndiffs(qauselec)
## [1] 1
#one seasonal difference is needed; after seasonally differencing, the KPSS test below does not reject stationarity (p = 0.08), so no further differencing is required
kpss.test(diff(qauselec, lag = 4))
##
## KPSS Test for Level Stationarity
##
## data: diff(qauselec, lag = 4)
## KPSS Level = 0.39382, Truncation lag parameter = 4, p-value = 0.07982
13c.Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values?
ggtsdisplay(diff(qauselec))
q.ar = auto.arima(qauselec)
summary(q.ar)
## Series: qauselec
## ARIMA(1,0,1)(1,1,2)[4] with drift
##
## Coefficients:
## ar1 ma1 sar1 sma1 sma2 drift
## 0.9158 -0.4411 0.8628 -1.6921 0.7737 0.2299
## s.e. 0.0467 0.0942 0.1420 0.1425 0.0954 0.0456
##
## sigma^2 estimated as 0.5082: log likelihood=-230.58
## AIC=475.16 AICc=475.71 BIC=498.72
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.02390707 0.6963191 0.4466824 -0.2166928 1.802291 0.3957901
## ACF1
## Training set 0.01152729
checkresiduals(q.ar)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,1)(1,1,2)[4] with drift
## Q* = 14.822, df = 3, p-value = 0.001975
##
## Model df: 6. Total lags used: 9
#auto.arima() chose ARIMA(1,0,1)(1,1,2)[4] with drift
q.ar2 = Arima(
qauselec, lambda = bc.qauselec,
order = c(0, 1, 1), seasonal = c(0, 1, 1)
)
summary(q.ar2)
## Series: qauselec
## ARIMA(0,1,1)(0,1,1)[4]
## Box Cox transformation: lambda= 0.5195274
##
## Coefficients:
## ma1 sma1
## -0.4976 -0.6910
## s.e. 0.0772 0.0418
##
## sigma^2 estimated as 0.01435: log likelihood=149.29
## AIC=-292.59 AICc=-292.47 BIC=-282.5
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.02793129 0.7303451 0.462043 -0.02475615 1.510161 0.4094006
## ACF1
## Training set 0.02546758
checkresiduals(q.ar2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)(0,1,1)[4]
## Q* = 15.102, df = 6, p-value = 0.01948
##
## Model df: 2. Total lags used: 8
#q.ar2 has the much lower AIC (−292.6 vs 475.2), though the comparison is not strictly fair because q.ar2 is fitted to Box-Cox transformed data; among the transformed models it is the best so far
13d.Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.
#No, the q.ar2 residuals do not resemble white noise (Ljung-Box p = 0.019).
checkresiduals(q.ar)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,1)(1,1,2)[4] with drift
## Q* = 14.822, df = 3, p-value = 0.001975
##
## Model df: 6. Total lags used: 9
#The auto.arima model fares no better (p = 0.002), so further alternatives are tried below.
q.ar3 = Arima(
qauselec, lambda = bc.qauselec,
order = c(0, 0, 1), seasonal = c(2, 1, 0))
summary(q.ar3)
## Series: qauselec
## ARIMA(0,0,1)(2,1,0)[4]
## Box Cox transformation: lambda= 0.5195274
##
## Coefficients:
## ma1 sar1 sar2
## 0.3484 0.2460 0.4663
## s.e. 0.0604 0.0665 0.0707
##
## sigma^2 estimated as 0.02878: log likelihood=75.96
## AIC=-143.93 AICc=-143.74 BIC=-130.46
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.1823278 1.018533 0.6302581 1.022655 2.197158 0.5584503
## ACF1
## Training set -0.05061457
checkresiduals(q.ar3)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1)(2,1,0)[4]
## Q* = 59.611, df = 5, p-value = 1.463e-11
##
## Model df: 3. Total lags used: 8
q.ar4 = Arima(
qauselec, lambda = bc.qauselec,
order = c(5, 0, 1), seasonal = c(1, 1, 2))
summary(q.ar4)
## Series: qauselec
## ARIMA(5,0,1)(1,1,2)[4]
## Box Cox transformation: lambda= 0.5195274
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ma1 sar1 sma1 sma2
## -0.2878 0.6709 0.2138 0.1418 0.1885 0.8560 0.9131 -1.7098 0.7806
## s.e. 0.1506 0.1105 0.0879 0.0921 0.0712 0.1427 0.0983 0.0954 0.0704
##
## sigma^2 estimated as 0.01339: log likelihood=159
## AIC=-298 AICc=-296.92 BIC=-264.34
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.02457217 0.6959356 0.4342435 0.2356301 1.423507 0.3847684
## ACF1
## Training set -0.03257386
checkresiduals(q.ar4)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(5,0,1)(1,1,2)[4]
## Q* = 10.927, df = 3, p-value = 0.01213
##
## Model df: 9. Total lags used: 12
#q.ar4 was best in this case: lowest AIC (−298.0) among the transformed models and the least problematic residuals (p = 0.012), though still not quite white noise
13e.Forecast the next 24 months of data using your preferred model.
f.qauselec = forecast(q.ar4, h = 8)   # 24 months = 8 quarters for this quarterly series
autoplot(f.qauselec)
13f.Compare the forecasts obtained using ets().
f.qauselec.ets = forecast(ets(qauselec), h = 8)   # 24 months = 8 quarters
autoplot(f.qauselec.ets)
Both sets of forecasts show the same seasonal pattern and a continuing upward trend; the differences are small.
14.For the same time series you used in the previous exercise, try using a non-seasonal model applied to the seasonally adjusted data obtained from STL. The stlf() function will make the calculations easy (with method=“arima”). Compare the forecasts with those obtained in the previous exercise. Which do you think is the best approach?
#stlf
f.q.stlf = stlf(qauselec, h = 8, s.window = 5, robust = TRUE, method = "arima", lambda = BoxCox.lambda(qauselec))   # lambda must be passed by name (positionally it would be taken as h)
autoplot(f.q.stlf)
#ets
autoplot(f.qauselec.ets)
#arima
autoplot(f.qauselec)
checkresiduals(f.q.stlf)
## Warning in checkresiduals(f.q.stlf): The fitted degrees of freedom is based on
## the model used for the seasonally adjusted data.
##
## Ljung-Box test
##
## data: Residuals from STL + ARIMA(0,1,1) with drift
## Q* = 20.232, df = 6, p-value = 0.002518
##
## Model df: 2. Total lags used: 8
checkresiduals(f.qauselec.ets)
##
## Ljung-Box test
##
## data: Residuals from ETS(M,A,M)
## Q* = 14.968, df = 3, p-value = 0.001845
##
## Model df: 8. Total lags used: 11
checkresiduals(f.qauselec)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(5,0,1)(1,1,2)[4]
## Q* = 10.927, df = 3, p-value = 0.01213
##
## Model df: 9. Total lags used: 12
#All three sets of residuals fail the Ljung-Box test, but I think the ets model may be the best approach here
15.For your retail time series (Exercise 5 above):
15a.develop an appropriate seasonal ARIMA model;
retail.ar = forecast(auto.arima(retail.ts))
15b.compare the forecasts with those you obtained in earlier chapters;
#chapter 7
retail.hw = hw(retail.ts, seasonal = "multiplicative")
autoplot(retail.hw)
retail.hw.damped = hw(retail.ts, seasonal = "multiplicative", damped = TRUE)
autoplot(retail.hw.damped)
retail.sn = snaive(retail.ts)
autoplot(retail.sn)
accuracy(retail.ar)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.3317156 6.834438 4.561364 -0.002734279 3.776002 0.4869783
## ACF1
## Training set -0.00360398
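For a fuller comparison (an added sketch), the training accuracy of the Chapter 7 benchmark models can be computed the same way:
accuracy(retail.hw)
accuracy(retail.hw.damped)
accuracy(retail.sn)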
15c.Obtain up-to-date retail data from the ABS website (Cat 8501.0, Table 11), and compare your forecasts with the actual numbers. How good were the forecasts from the various models?
library(readxl)
retail2 <- read_excel("C:/Users/burtkb/Downloads/8501011.xlsx", sheet = "Data1", skip = 1)
## New names:
## * `50.4` -> `50.4...14`
## * `` -> ...45
## * `` -> ...46
## * `18.399999999999999` -> `18.399999999999999...56`
## * `11.3` -> `11.3...67`
## * ...
retail2a = ts(retail2[,"A3349873A"], frequency=12, start=c(1982,4))
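A sketch of the comparison itself (an addition; the split point below is hypothetical and depends on where the training data end): evaluate each set of forecasts against the newer observations.
retail.new = window(retail2a, start = c(2014, 1))   # hypothetical start of the hold-out period
accuracy(retail.ar, retail.new)
accuracy(retail.hw, retail.new)
accuracy(retail.sn, retail.new)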
16.Consider sheep, the sheep population of England and Wales from 1867–1939.
16a.Produce a time plot of the time series.
autoplot(sheep)
16c.By examining the ACF and PACF of the differenced data, explain why an ARIMA(3,1,0) model is appropriate.
ggtsdisplay(diff(sheep))
#The ACF decays while the PACF has significant spikes only within the first three lags, cutting off after lag 3, which is consistent with an AR(3) for the differenced series, i.e. ARIMA(3,1,0)
16d.The last five values of the series are given below:
sheep1 = 1797 + 0.42*(1797 - 1791) -0.20*(1791 - 1627) - 0.30*(1627 - 1665)
sheep2 = sheep1 + 0.42*(sheep1 - 1797) -0.20*(1797 - 1791) - 0.30*(1791 - 1627)
sheep3 = sheep2 + 0.42*(sheep2 - sheep1) -0.20*(sheep1 - 1797) - 0.30*(1797 - 1791)
c(sheep1, sheep2, sheep3)
## [1] 1778.120 1719.790 1697.268
16e.Now fit the model in R and obtain the forecasts using forecast. How are they different from yours? Why?
f.sheep = forecast(Arima(sheep, order = c(3, 1, 0)))
autoplot(f.sheep)
f.sheep$mean
## Time Series:
## Start = 1940
## End = 1949
## Frequency = 1
## [1] 1777.996 1718.869 1695.985 1704.067 1730.084 1746.371 1745.518 1733.953
## [9] 1724.299 1722.829
#they are very close; the hand calculations are slightly higher because they use the rounded coefficients from the question, whereas R estimates the coefficients from the data
17.The annual bituminous coal production in the United States from 1920 to 1968 is in data set bicoal.
17a.Produce a time plot of the data.
autoplot(bicoal)
17c.Explain why an AR(4) model was chosen, using the ACF and PACF.
ggAcf(bicoal, lag.max = 36)
ggPacf(bicoal, lag.max = 36)
#The ACF decays while the PACF cuts off after lag 4 (significant spikes at lags 1 and 4), consistent with an AR(4) model
17d.The last five values of the series are given below.
c0 = 162   # intercept; avoid naming it c, which would mask base::c
p1 = 0.83
p2 = -0.34
p3 = 0.55
p4 = -0.38
b1 = c0 + p1*545 + p2*552 + p3*534 + p4*512
b2 = c0 + p1*b1 + p2*545 + p3*552 + p4*534
b3 = c0 + p1*b2 + p2*b1 + p3*545 + p4*552
c(b1, b2, b3)
## [1] 525.8100 513.8023 499.6705
17e.Now fit the model in R and obtain the forecasts from the same model. How are they different from yours? Why?
f.bi = forecast(Arima(bicoal, order = c(4, 0, 0)))
autoplot(f.bi)
f.bi$mean
## Time Series:
## Start = 1969
## End = 1978
## Frequency = 1
## [1] 527.6291 517.1923 503.8051 489.2909 482.6041 478.5776 474.5657 474.4001
## [9] 475.9463 476.5973
#they are very close; the hand calculations are slightly lower because they use the rounded coefficients from the question rather than R's estimates
18.Before doing this exercise, you will need to install the Quandl package in R using install.packages("Quandl").
18a.Select a time series from Quandl. Then copy its short URL and import the data using
library(Quandl)
## Warning: package 'Quandl' was built under R version 4.0.3
## Loading required package: xts
## Warning: package 'xts' was built under R version 4.0.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.0.2
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
ngd = Quandl("EIA/AEO_2016_REF_NO_CPP_PRCE_NA_COMM_NA_NG_NA_SATL_Y13DLRPMCF_A", api_key="otkvu4KBWBAykqK83Zxi", type="ts")
str(ngd)
## Time-Series [1:27] from 2014 to 2040: 10.2 9.52 8.85 9.27 9.91 ...
head(ngd)
## Time Series:
## Start = 2014
## End = 2019
## Frequency = 1
## [1] 10.202365 9.518894 8.852833 9.270261 9.914866 10.537735
18b.Plot graphs of the data, and try to identify an appropriate ARIMA model.
autoplot(ngd)
ngd %>% ndiffs()
## [1] 1
ggtsdisplay(diff(ngd))
ngd.ar = auto.arima(ngd)
summary(ngd.ar)
## Series: ngd
## ARIMA(0,1,2)
##
## Coefficients:
## ma1 ma2
## 0.8682 0.8879
## s.e. 0.1313 0.2520
##
## sigma^2 estimated as 0.04139: log likelihood=3.92
## AIC=-1.83 AICc=-0.74 BIC=1.94
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0239043 0.1918064 0.129958 0.2266583 1.239587 0.7440255
## ACF1
## Training set -0.04993646
#ARIMA(0,1,2)
18c.Do residual diagnostic checking of your ARIMA model. Are the residuals white noise?
checkresiduals(ngd.ar)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,2)
## Q* = 7.0725, df = 3, p-value = 0.06962
##
## Model df: 2. Total lags used: 5
#the residuals pass the Ljung-Box test only marginally (p = 0.07), so I will try another model
ngd110 = Arima(ngd, order = c(1, 1, 0))
checkresiduals(ngd110)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,0)
## Q* = 7.6448, df = 4, p-value = 0.1055
##
## Model df: 1. Total lags used: 5
#the residuals now resemble white noise (p = 0.11)
18d.Use your chosen ARIMA model to forecast the next four years.
f.ngd110 = forecast(ngd110, h = 4)
autoplot(f.ngd110)
18e.Now try to identify an appropriate ETS model.
ngd.ets = ets(ngd)
summary(ngd.ets)
## ETS(A,N,N)
##
## Call:
## ets(y = ngd)
##
## Smoothing parameters:
## alpha = 0.9999
##
## Initial states:
## l = 10.2023
##
## sigma: 0.2964
##
## AIC AICc BIC
## 27.24963 28.29311 31.13714
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.05862132 0.285245 0.1682127 0.4917003 1.663627 0.9630386
## ACF1
## Training set 0.5005758
autoplot(ngd.ets)
18f.Do residual diagnostic checking of your ETS model. Are the residuals white noise?
checkresiduals(ngd.ets)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,N)
## Q* = 14.02, df = 3, p-value = 0.002878
##
## Model df: 2. Total lags used: 5
#No: the residuals are not white noise (Ljung-Box p = 0.003)
18g.Use your chosen ETS model to forecast the next four years.
f.ngd.ets = forecast(ngd.ets, h = 4)
autoplot(f.ngd.ets)
18h.Which of the two models do you prefer?
I prefer the ARIMA model: its residuals pass the Ljung-Box test, while the ETS residuals do not.
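A final added sketch putting the two side by side (assumes the ngd110 and ngd.ets fits above):
rbind(ARIMA = accuracy(ngd110)["Training set", ],
      ETS   = accuracy(ngd.ets)["Training set", ])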