library(fpp2)
## Warning: package 'fpp2' was built under R version 4.0.3
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## -- Attaching packages ------------------------------------ fpp2 2.4 --
## v ggplot2   3.3.0     v fma       2.4  
## v forecast  8.12      v expsmooth 2.3
## Warning: package 'forecast' was built under R version 4.0.2
## Warning: package 'fma' was built under R version 4.0.3
## Warning: package 'expsmooth' was built under R version 4.0.3
## 
library(tseries)
## Warning: package 'tseries' was built under R version 4.0.2

1.Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

1a.Explain the differences among these figures. Do they all indicate that the data are white noise?

The autocorrelation spikes shrink as the sample size grows, because the estimates become more precise with more data. Yes, all three figures indicate white noise: in each case the autocorrelations stay within the blue critical-value bounds.

1b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values are at ±1.96/√T, where T is the length of the series, so they move closer to zero as the sample size increases. The autocorrelations differ across figures because each is computed from a different random sample; with more observations, the sample autocorrelations of white noise are closer to zero.
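A quick numerical check (a sketch; ±1.96/√T is the bound that ggAcf() draws at the 95% level):

bounds = sapply(c(36, 360, 1000), function(n) round(1.96/sqrt(n), 3))
names(bounds) = c("T=36", "T=360", "T=1000")
bounds  # the critical bounds shrink in proportion to 1/sqrt(T)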

2.A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

autoplot(ibmclose)

ggAcf(ibmclose)

ggPacf(ibmclose)

All three plots show that the series is non-stationary: the time plot wanders with no stable mean, the ACF decays very slowly from values near one, and the PACF has a single large spike near one at lag 1. Together these indicate the series should be first-differenced.
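A formal follow-up (a sketch using functions loaded with fpp2): ndiffs() applies a unit-root test to suggest how many differences are needed, and the differenced series can then be re-inspected.

ndiffs(ibmclose)             # number of first differences suggested by a unit-root test
ggtsdisplay(diff(ibmclose))  # differenced series with its ACF and PACF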

3.For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

box8.3 = function(x) {
  # estimate the Box-Cox parameter, transform the series, and display it
  lam = BoxCox.lambda(x)
  transformed = BoxCox(x, lambda = lam)
  tsdisplay(transformed)
  return(paste("BClam = ", round(lam, 4)))
}

3a. usnetelec

box8.3(usnetelec)

## [1] "BClam =  0.5168"

One first difference is needed.

3b. usgdp

box8.3(usgdp)

## [1] "BClam =  0.3664"

One first difference is needed.

3c. mcopper

box8.3(mcopper)

## [1] "BClam =  0.1919"

One first difference is needed.

3d. enplanements

box8.3(enplanements)

## [1] "BClam =  -0.2269"

One first difference and one seasonal difference are needed.

3e. visitors

box8.3(visitors)

## [1] "BClam =  0.2775"

One first difference and one seasonal difference are needed.

4.For the enplanements data, write down the differences you chose above using backshift operator notation.

One first difference and one seasonal difference were chosen above. With wt denoting the Box-Cox transformed series (λ = −0.2269), the differenced series in backshift notation is (1 − B)(1 − B^12)wt.
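A sketch to confirm the notation against R's diff() (diff(x, lag = 12) applies (1 − B^12), and a further diff() applies (1 − B); w is our name for the transformed series):

w = BoxCox(enplanements, BoxCox.lambda(enplanements))
autoplot(diff(diff(w, lag = 12)))  # (1 - B)(1 - B^12) applied to the transformed series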
5.For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
library(readxl)
library(seasonal)
## Warning: package 'seasonal' was built under R version 4.0.3
retail= read_excel("C:/Users/burtkb/Downloads/retail.xlsx", skip=1)
retail.ts = ts(retail[,"A3349627V"], frequency = 12, start = c(1982,4))
autoplot(retail.ts)

ndiffs(retail.ts)
## [1] 1
nsdiffs(retail.ts)
## [1] 1
lam.retail = BoxCox.lambda(retail.ts)
lam.retail
## [1] -0.0579656
autoplot(BoxCox(retail.ts, lam.retail))

After the Box-Cox transformation (λ = -0.058), one seasonal difference and one first difference are needed to obtain stationary data.

6.Use R to simulate and plot some data from simple ARIMA models.

6a. Use the following R code to generate data from an AR(1) model with ϕ1 =0.6 and σ 2 = 1. The process starts with y1=0.

y <- ts(numeric(100))
e <- rnorm(100)
# AR(1): each value is 0.6 times the previous value plus white noise
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]

autoplot(y)
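To see how the series changes as ϕ1 varies, the simulation can be wrapped in a small function (a sketch; the name my.ar1 is ours):

my.ar1 = function(phi, n = 100, seed = 1) {
  set.seed(seed)           # fix the errors so only phi changes between calls
  y <- ts(numeric(n))
  e <- rnorm(n)
  for(i in 2:n) y[i] <- phi*y[i-1] + e[i]
  autoplot(y)
}
my.ar1(0.9)   # phi near 1: smoother, more persistent series
my.ar1(0.1)   # phi near 0: close to white noise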

6c.Write your own code to generate data from an MA(1) model with θ1 = 0.6 and σ² = 1.

my2 = function(theta, seed.my){
  set.seed(seed.my)
  y <- ts(numeric(100))
  e <- rnorm(100)
  # MA(1): each value depends on the current and previous errors
  for(i in 2:100){
    y[i] <- theta*e[i-1] + e[i]
  }
  autoplot(y)
}

6d.Produce a time plot for the series. How does the plot change as you change θ1?

my2(.2,1)

my2(1,1)

As θ1 increases, the lag-one dependence and the variance of the series both increase (for an MA(1), the variance is (1 + θ1²)σ²), so the plot becomes more variable with stronger short runs, while remaining stationary.

6e.Generate data from an ARMA(1,1) model with ϕ1 = 0.6, θ1 = 0.6 and σ² = 1.

set.seed(1)  # seed added for reproducibility
my.ar = ts(numeric(50))
e = rnorm(50)
# ARMA(1,1): AR term on the previous value plus an MA term on the previous error
for(i in 2:50){
   my.ar[i] = 0.6*my.ar[i-1] + 0.6*e[i-1] + e[i]
}

6f.Generate data from an AR(2) model with ϕ1 = −0.8, ϕ2 = 0.3 and σ² = 1. (Note that these parameters will give a non-stationary series.)

my.ar2 = ts(numeric(50))
e = rnorm(50)
# AR(2): note that phi2 multiplies the value two steps back, not one
for(i in 3:50){
   my.ar2[i] = -0.8*my.ar2[i-1] + 0.3*my.ar2[i-2] + e[i]
}

6g.Graph the latter two series and compare them.

autoplot(my.ar)

autoplot(my.ar2)

The ARMA(1,1) series fluctuates around a stable mean, while the AR(2) series with these parameters oscillates with ever-increasing amplitude, confirming that it is non-stationary.

7.Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

7a.By studying appropriate graphs of the series in R, find an appropriate ARIMA( p,d,q) model for these data.

autoplot(wmurders)

wmurders %>% ndiffs()
## [1] 2
wmurders %>% diff() %>% ggtsdisplay()

w.ar = auto.arima(wmurders)
summary(w.ar)
## Series: wmurders 
## ARIMA(1,2,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.2434  -0.8261
## s.e.   0.1553   0.1143
## 
## sigma^2 estimated as 0.04632:  log likelihood=6.44
## AIC=-6.88   AICc=-6.39   BIC=-0.97
## 
## Training set error measures:
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
##                    ACF1
## Training set 0.02176343
checkresiduals(w.ar)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
## 
## Model df: 2.   Total lags used: 10

7b.Should you include a constant in the model? Explain.

No. With d = 2, including a constant would induce a quadratic trend in the long-run forecasts, which is not plausible for this series.

7c.Write this model in terms of the backshift operator.

Using the estimated ARIMA(1,2,1) coefficients: (1 + 0.2434B)(1 − B)²yt = (1 − 0.8261B)εt.

7d.Fit the model using R and examine the residuals. Is the model satisfactory?

Yes, the ARIMA(1,2,1) model above is satisfactory: the Ljung-Box test on its residuals gives p = 0.1335, so the residuals resemble white noise.

7e.Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

forecast(w.ar,3)
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.470660 2.194836 2.746484 2.048824 2.892496
## 2006       2.363106 1.986351 2.739862 1.786908 2.939304
## 2007       2.252833 1.765391 2.740276 1.507354 2.998313
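To check the first forecast by hand (a sketch; phi and theta are copied from the summary above, f1 is our name, and the future error is set to its mean of zero), expand (1 − ϕB)(1 − B)²yt = (1 + θB)εt into yt = (2 + ϕ)yt−1 − (1 + 2ϕ)yt−2 + ϕyt−3 + εt + θεt−1:

phi = -0.2434
theta = -0.8261
y = as.numeric(wmurders)
n = length(y)
e = residuals(w.ar)  # in-sample residuals supply the theta*e term
f1 = (2 + phi)*y[n] - (1 + 2*phi)*y[n-1] + phi*y[n-2] + theta*e[n]
f1                   # should be close to the 2005 point forecast of 2.4707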

7f.Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

autoplot(forecast(w.ar,3))

7g.Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

Yes. The model above was chosen with auto.arima(), so it gives the same ARIMA(1,2,1).

8.Consider austa, the total international visitors to Australia (in millions) for the period 1980-2015.

8a.Use auto.arima() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

austamod = auto.arima(austa)
checkresiduals(austamod)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1) with drift
## Q* = 2.297, df = 5, p-value = 0.8067
## 
## Model df: 2.   Total lags used: 7
autoplot(forecast(austamod))

ARIMA(0,1,1) with drift was selected, and the Ljung-Box test (p = 0.8067) indicates the residuals look like white noise.

8b.Plot forecasts from an ARIMA(0,1,1) model with no drift and compare these to part a. Remove the MA term and plot again.

austa011 = forecast(arima(austa, c(0,1,1)), h=10)
autoplot(austa011)

austa010 = forecast(arima(austa, c(0,1,0)), h=10)
autoplot(austa010)

8c.Plot forecasts from an ARIMA(2,1,3) model with drift. Remove the constant and see what happens.

austa213 = forecast(Arima(austa, c(2,1,3), include.drift = TRUE), h=10)

autoplot(austa213)
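Removing the constant can be sketched as below (assumptions: austa213nc is our name, and the optimizer may fail for these orders on this short series, so the call is wrapped in try()):

austa213nc = try(Arima(austa, c(2,1,3), include.constant = FALSE))
if (!inherits(austa213nc, "try-error")) autoplot(forecast(austa213nc, h = 10))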

8d.Plot forecasts from an ARIMA(0,0,1) model with a constant. Remove the MA term and plot again.

austa001 = forecast(Arima(austa, c(0,0,1), include.constant = TRUE), h = 10)
autoplot(austa001)
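Removing the MA term as well gives ARIMA(0,0,0) with a constant, i.e. a white-noise model around the mean, so the forecasts should be flat (a sketch; austa000 is our name):

austa000 = forecast(Arima(austa, c(0,0,0), include.constant = TRUE), h = 10)
autoplot(austa000)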

8e.Plot forecasts from an ARIMA(0,2,1) model with no constant.

austa021 = forecast(Arima(austa, c(0,2,1), include.constant = FALSE), h = 10)
autoplot(austa021)

9.For the usgdp series:

9a.if necessary, find a suitable Box-Cox transformation for the data;

str(usgdp)
##  Time-Series [1:237] from 1947 to 2006: 1570 1569 1568 1591 1616 ...
autoplot(usgdp)

bc.usgdp = BoxCox.lambda(usgdp)
autoplot(BoxCox(usgdp, bc.usgdp))

9b.fit a suitable ARIMA model to the transformed data using auto.arima();

ar.usgdp = auto.arima(usgdp, lambda = bc.usgdp)
checkresiduals(ar.usgdp)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,1,0) with drift
## Q* = 6.5772, df = 5, p-value = 0.254
## 
## Model df: 3.   Total lags used: 8
summary(ar.usgdp)
## Series: usgdp 
## ARIMA(2,1,0) with drift 
## Box Cox transformation: lambda= 0.366352 
## 
## Coefficients:
##          ar1     ar2   drift
##       0.2795  0.1208  0.1829
## s.e.  0.0647  0.0648  0.0202
## 
## sigma^2 estimated as 0.03518:  log likelihood=61.56
## AIC=-115.11   AICc=-114.94   BIC=-101.26
## 
## Training set error measures:
##                    ME    RMSE      MAE         MPE      MAPE      MASE
## Training set 1.195275 39.2224 29.29521 -0.01363259 0.6863491 0.1655687
##                     ACF1
## Training set -0.03824844

9c.try some other plausible models by experimenting with the orders chosen;

usgdp001 = forecast(Arima(usgdp, c(0,0,1), include.constant = TRUE), h = 10)
autoplot(usgdp001)

summary(usgdp001)
## 
## Forecast method: ARIMA(0,0,1) with non-zero mean
## 
## Model Information:
## 
## Call:
## arima(x = usgdp, order = c(0, 0, 1))
## 
## Coefficients:
##          ma1  intercept
##       1.0000  5174.3663
## s.e.  0.0113   180.8188
## 
## sigma^2 estimated as 1945380:  log likelihood = -2055.02,  aic = 4116.04
## 
## Error measures:
##                    ME     RMSE      MAE       MPE     MAPE     MASE      ACF1
## Training set 4.604078 1394.769 1176.846 -18.56536 32.04501 6.651219 0.9698935
## 
## Forecasts:
##         Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
## 2006 Q2       8241.316 6450.098 10032.535 5501.884 10980.749
## 2006 Q3       5174.366 2646.505  7702.227 1308.337  9040.396
## 2006 Q4       5174.366 2646.505  7702.227 1308.337  9040.396
## 2007 Q1       5174.366 2646.505  7702.227 1308.337  9040.396
## 2007 Q2       5174.366 2646.505  7702.227 1308.337  9040.396
## 2007 Q3       5174.366 2646.505  7702.227 1308.337  9040.396
## 2007 Q4       5174.366 2646.505  7702.227 1308.337  9040.396
## 2008 Q1       5174.366 2646.505  7702.227 1308.337  9040.396
## 2008 Q2       5174.366 2646.505  7702.227 1308.337  9040.396
## 2008 Q3       5174.366 2646.505  7702.227 1308.337  9040.396
usgdp213 = forecast(Arima(usgdp, c(2,1,3), include.drift = TRUE), h=10)
autoplot(usgdp213)

summary(usgdp213)
## 
## Forecast method: ARIMA(2,1,3) with drift
## 
## Model Information:
## Series: usgdp 
## ARIMA(2,1,3) with drift 
## 
## Coefficients:
##          ar1     ar2     ma1      ma2      ma3    drift
##       0.1575  0.5256  0.1514  -0.2358  -0.0737  42.1169
## s.e.  0.2880  0.1862  0.2958   0.1794   0.1148   6.8481
## 
## sigma^2 estimated as 1650:  log likelihood=-1206.21
## AIC=2426.42   AICc=2426.91   BIC=2450.66
## 
## Error measures:
##                     ME     RMSE      MAE         MPE      MAPE      MASE
## Training set 0.2689172 40.01301 30.38976 -0.09767027 0.7220301 0.1717548
##                       ACF1
## Training set -0.0004712095
## 
## Forecasts:
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2006 Q2       11484.19 11432.14 11536.24 11404.58 11563.80
## 2006 Q3       11573.06 11487.32 11658.80 11441.94 11704.19
## 2006 Q4       11636.14 11514.88 11757.39 11450.69 11821.58
## 2007 Q1       11706.13 11553.24 11859.02 11472.30 11939.95
## 2007 Q2       11763.65 11578.99 11948.31 11481.24 12046.06
## 2007 Q3       11822.84 11608.38 12037.31 11494.85 12150.84
## 2007 Q4       11875.74 11632.20 12119.29 11503.27 12248.22
## 2008 Q1       11928.53 11657.33 12199.74 11513.76 12343.31
## 2008 Q2       11978.00 11680.09 12275.91 11522.39 12433.61
## 2008 Q3       12026.88 11703.45 12350.32 11532.24 12521.53

9d.choose what you think is the best model and check the residual diagnostics;

#usgdp213 is the better of the two experimental models: its AIC (2426.42) is far below that of usgdp001 (4116.04) and its error measures are much smaller. Its residuals are borderline white noise (Ljung-Box p = 0.056).
checkresiduals(usgdp213)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,1,3) with drift
## Q* = 7.5675, df = 3, p-value = 0.05585
## 
## Model df: 6.   Total lags used: 9

9e.produce forecasts of your fitted model. Do the forecasts look reasonable?

autoplot(usgdp213)

The forecasts continue the series' upward trend, so they look reasonable.

9f.compare the results with what you would obtain using ets() (with no transformation).

ets.usgdp = ets(usgdp)
summary(ets.usgdp)
## ETS(A,A,N) 
## 
## Call:
##  ets(y = usgdp) 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 0.278 
## 
##   Initial states:
##     l = 1557.4589 
##     b = 18.6862 
## 
##   sigma:  41.8895
## 
##      AIC     AICc      BIC 
## 3072.303 3072.562 3089.643 
## 
## Training set error measures:
##                   ME     RMSE      MAE        MPE      MAPE     MASE       ACF1
## Training set 1.30185 41.53448 31.85324 0.02319607 0.7543931 0.180026 0.07596492
checkresiduals(ets.usgdp)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,A,N)
## Q* = 21.461, df = 4, p-value = 0.0002565
## 
## Model df: 4.   Total lags used: 8
autoplot(forecast(ets.usgdp,h=10))

The ETS(A,A,N) residuals fail the Ljung-Box test (p = 0.00026), while the ARIMA residuals do not, so the ARIMA approach is preferred here.

10.Consider austourists, the quarterly visitor nights (in millions) spent by international tourists to Australia for the period 1999–2015.

10a.Describe the time plot.

autoplot(austourists)

ggseasonplot(austourists)

There is strong quarterly seasonality and an upward trend, and the size of the seasonal swings grows with the level of the series.

10b.What can you learn from the ACF graph?

ggAcf(austourists)

The autocorrelations decay slowly and peak at the seasonal lags (4, 8, 12, ...), indicating both trend and seasonality.

10c.What can you learn from the PACF graph?

ggPacf(austourists)

The PACF has a handful of significant spikes at the early lags before cutting off, which points to an autoregressive component.

10d.Produce plots of the seasonally differenced data (1 − B^4)Yt. What model do these graphs suggest?

autoplot(diff(austourists, lag=4))

par(mfrow=c(1,2))
acf(diff(austourists,lag=4))
pacf(diff(austourists, lag=4))

The graphs suggest an ARIMA(1,1,0) model.

10e.Does auto.arima() give the same model that you chose? If not, which model do you think is better?

auto.arima(austourists)
## Series: austourists 
## ARIMA(1,0,0)(1,1,0)[4] with drift 
## 
## Coefficients:
##          ar1     sar1   drift
##       0.4705  -0.5305  0.5489
## s.e.  0.1154   0.1122  0.0864
## 
## sigma^2 estimated as 5.15:  log likelihood=-142.48
## AIC=292.97   AICc=293.65   BIC=301.6

No: auto.arima() selected ARIMA(1,0,0)(1,1,0)[4] with drift rather than the model above. The auto.arima() model is the better choice.

11.Consider usmelec, the total net generation of electricity (in billion kilowatt hours) by the U.S. electric industry (monthly for the period January 1973 – June 2013). In general there are two peaks per year: in mid-summer and mid-winter.

11a.Examine the 12-month moving average of this series to see what kind of trend is involved.

usmelec.12 = ma(usmelec, order = 12)
autoplot(usmelec.12)

There’s an upward trend.

11b.Do the data need transforming? If so, find a suitable transformation.

Yes. The seasonal variation grows with the level of the series, so a Box-Cox transformation is applied.

bc.usmelec = BoxCox.lambda(usmelec)
bc.usmelec2 = BoxCox(usmelec, lambda=bc.usmelec)
autoplot(bc.usmelec2)

11c.Are the data stationary? If not, find an appropriate differencing which yields stationary data.

#the data are not stationary: there is both trend and seasonality
ndiffs(bc.usmelec2)
## [1] 1
nsdiffs(bc.usmelec2)
## [1] 1
#one first difference and one seasonal difference are indicated
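Applying the indicated differences (a sketch; usmelec.diff is our name, and diff with lag = 12 is the seasonal difference):

usmelec.diff = diff(diff(bc.usmelec2, lag = 12))
ggtsdisplay(usmelec.diff)  # inspect the differenced series and its ACF/PACF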

11d.Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values?

plot(decompose(bc.usmelec2))

#model1
bc.us = Arima(usmelec, order = c(12, 1, 0), seasonal = c(1, 0, 0), lambda = 0)
bc.us
## Series: usmelec 
## ARIMA(12,1,0)(1,0,0)[12] 
## Box Cox transformation: lambda= 0 
## 
## Coefficients:
##           ar1      ar2      ar3      ar4      ar5      ar6      ar7      ar8
##       -0.4196  -0.3817  -0.2097  -0.2566  -0.2020  -0.0813  -0.0862  -0.0655
## s.e.   0.0436   0.0474   0.0506   0.0517   0.0523   0.0529   0.0526   0.0523
##          ar9    ar10    ar11     ar12    sar1
##       0.0091  0.1031  0.1144  -0.2935  0.9610
## s.e.  0.0518  0.0508  0.0477   0.0440  0.0107
## 
## sigma^2 estimated as 0.0009182:  log likelihood=1000.95
## AIC=-1973.91   AICc=-1973.02   BIC=-1915.33
bc.us2 = Arima(usmelec, order = c(6, 1, 0), seasonal = c(3, 1, 0), lambda = 0)
bc.us2
## Series: usmelec 
## ARIMA(6,1,0)(3,1,0)[12] 
## Box Cox transformation: lambda= 0 
## 
## Coefficients:
##           ar1      ar2      ar3      ar4      ar5      ar6     sar1     sar2
##       -0.4448  -0.4031  -0.2257  -0.2341  -0.1611  -0.0692  -0.6594  -0.4746
## s.e.   0.0460   0.0500   0.0522   0.0521   0.0500   0.0461   0.0453   0.0506
##          sar3
##       -0.2839
## s.e.   0.0457
## 
## sigma^2 estimated as 0.0007708:  log likelihood=1024.58
## AIC=-2029.16   AICc=-2028.68   BIC=-1987.57
summary(bc.us)
## Series: usmelec 
## ARIMA(12,1,0)(1,0,0)[12] 
## Box Cox transformation: lambda= 0 
## 
## Coefficients:
##           ar1      ar2      ar3      ar4      ar5      ar6      ar7      ar8
##       -0.4196  -0.3817  -0.2097  -0.2566  -0.2020  -0.0813  -0.0862  -0.0655
## s.e.   0.0436   0.0474   0.0506   0.0517   0.0523   0.0529   0.0526   0.0523
##          ar9    ar10    ar11     ar12    sar1
##       0.0091  0.1031  0.1144  -0.2935  0.9610
## s.e.  0.0518  0.0508  0.0477   0.0440  0.0107
## 
## sigma^2 estimated as 0.0009182:  log likelihood=1000.95
## AIC=-1973.91   AICc=-1973.02   BIC=-1915.33
## 
## Training set error measures:
##                      ME     RMSE      MAE         MPE     MAPE      MASE
## Training set 0.08301885 8.249448 6.124246 -0.03015208 2.308762 0.6801134
##                     ACF1
## Training set 0.001685155
summary(bc.us2)
## Series: usmelec 
## ARIMA(6,1,0)(3,1,0)[12] 
## Box Cox transformation: lambda= 0 
## 
## Coefficients:
##           ar1      ar2      ar3      ar4      ar5      ar6     sar1     sar2
##       -0.4448  -0.4031  -0.2257  -0.2341  -0.1611  -0.0692  -0.6594  -0.4746
## s.e.   0.0460   0.0500   0.0522   0.0521   0.0500   0.0461   0.0453   0.0506
##          sar3
##       -0.2839
## s.e.   0.0457
## 
## sigma^2 estimated as 0.0007708:  log likelihood=1024.58
## AIC=-2029.16   AICc=-2028.68   BIC=-1987.57
## 
## Training set error measures:
##                       ME     RMSE     MAE         MPE     MAPE      MASE
## Training set -0.03805136 7.531002 5.64684 -0.05110048 2.120754 0.6270962
##                     ACF1
## Training set -0.02571378

bc.us2 is the better of the two models according to AIC (-2029.16 vs -1973.91).

11e.Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.

checkresiduals(bc.us2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(6,1,0)(3,1,0)[12]
## Q* = 41.028, df = 15, p-value = 0.0003167
## 
## Model df: 9.   Total lags used: 24
#no, the residuals do not resemble white noise (Ljung-Box p = 0.0003), so another model is tried below

auto.us = auto.arima(usmelec, seasonal = TRUE, stepwise = FALSE, approximation = FALSE, lambda = "auto")
summary(auto.us)
## Series: usmelec 
## ARIMA(1,1,1)(2,1,1)[12] 
## Box Cox transformation: lambda= -0.5738331 
## 
## Coefficients:
##          ar1      ma1    sar1     sar2     sma1
##       0.3888  -0.8265  0.0403  -0.0958  -0.8471
## s.e.  0.0630   0.0375  0.0555   0.0531   0.0341
## 
## sigma^2 estimated as 1.274e-06:  log likelihood=2547.36
## AIC=-5082.72   AICc=-5082.54   BIC=-5057.76
## 
## Training set error measures:
##                      ME     RMSE      MAE        MPE     MAPE     MASE
## Training set -0.5001989 7.235656 5.302245 -0.1959073 2.002005 0.588828
##                     ACF1
## Training set -0.02860273
checkresiduals(auto.us)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,1)(2,1,1)[12]
## Q* = 28.012, df = 19, p-value = 0.08319
## 
## Model df: 5.   Total lags used: 24
#auto.arima produced ARIMA(1,1,1)(2,1,1)[12], whose residuals are much closer to white noise (p = 0.083)

11f.Forecast the next 15 years of electricity generation by the U.S. electric industry. Get the latest figures from the EIA to check the accuracy of your forecasts.

energy <- read.csv("C:/Users/burtkb/Downloads/MER_T07_02A.csv")
names(energy) = c('month', 'elec.ind')
energy.ts = ts(energy$elec.ind, start= c(1973,1), frequency = 12)
autoplot(energy.ts)

f.energy = forecast(auto.us, h=15*12)
autoplot(f.energy)
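As a rough numeric check (a sketch; it assumes the EIA series is in the same units as usmelec and that the test window starts in July 2013, where usmelec ends):

accuracy(f.energy, window(energy.ts, start = c(2013, 7)))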

11g.Eventually, the prediction intervals are so wide that the forecasts are not particularly useful. How many years of forecasts do you think are sufficiently accurate to be usable?

Perhaps the next 3-4 years, but not much more than that.

12.For the mcopper data:

12a.if necessary, find a suitable Box-Cox transformation for the data;

str(mcopper)
##  Time-Series [1:564] from 1960 to 2007: 255 260 249 258 244 ...
autoplot(mcopper)

bc.mcopper = BoxCox.lambda(mcopper)
autoplot(BoxCox(mcopper, bc.mcopper))

12b.fit a suitable ARIMA model to the transformed data using auto.arima();

auto.mcopper = auto.arima(mcopper, lambda= bc.mcopper)
auto.mcopper
## Series: mcopper 
## ARIMA(0,1,1) 
## Box Cox transformation: lambda= 0.1919047 
## 
## Coefficients:
##          ma1
##       0.3720
## s.e.  0.0388
## 
## sigma^2 estimated as 0.04997:  log likelihood=45.05
## AIC=-86.1   AICc=-86.08   BIC=-77.43

12c.try some other plausible models by experimenting with the orders chosen;

ndiffs(mcopper)
## [1] 1
nsdiffs(mcopper)
## [1] 0
ggtsdisplay(diff(mcopper))

mcop001 = Arima(mcopper,order = c(0, 0, 1), lambda = bc.mcopper)
summary(mcop001)
## Series: mcopper 
## ARIMA(0,0,1) with non-zero mean 
## Box Cox transformation: lambda= 0.1919047 
## 
## Coefficients:
##          ma1     mean
##       0.9446  13.8817
## s.e.  0.0090   0.0956
## 
## sigma^2 estimated as 1.371:  log likelihood=-889.45
## AIC=1784.89   AICc=1784.93   BIC=1797.9
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE    MAPE     MASE      ACF1
## Training set 98.69302 372.2165 232.1022 -8.806098 28.3679 1.044277 0.9136532
mcop510 = Arima(mcopper,order = c(5, 1, 0), lambda = bc.mcopper)
summary(mcop510)
## Series: mcopper 
## ARIMA(5,1,0) 
## Box Cox transformation: lambda= 0.1919047 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4     ar5
##       0.3706  -0.1475  0.0609  -0.0038  0.0134
## s.e.  0.0421   0.0450  0.0454   0.0450  0.0422
## 
## sigma^2 estimated as 0.05028:  log likelihood=45.32
## AIC=-78.63   AICc=-78.48   BIC=-52.63
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 3.263194 77.22921 44.98219 0.1566706 4.310128 0.2023845
##                     ACF1
## Training set -0.08666255

12d.choose what you think is the best model and check the residual diagnostics;

#mcop510 is the better of the two experimental models: its error measures are far smaller than mcop001's (RMSE 77.2 vs 372.2) and its residuals behave well
checkresiduals(mcop510)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(5,1,0)
## Q* = 22.089, df = 19, p-value = 0.2798
## 
## Model df: 5.   Total lags used: 24

12e.produce forecasts of your fitted model. Do the forecasts look reasonable?

autoplot(forecast(mcop510))

12f.compare the results with what you would obtain using ets() (with no transformation).

f.mcopper = ets(mcopper)
f.mcopper
## ETS(M,Ad,N) 
## 
## Call:
##  ets(y = mcopper) 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 0.2141 
##     phi   = 0.8 
## 
##   Initial states:
##     l = 265.0541 
##     b = -3.9142 
## 
##   sigma:  0.0632
## 
##      AIC     AICc      BIC 
## 8052.038 8052.189 8078.049
autoplot(f.mcopper)

13.Choose one of the following seasonal time series: hsales, auscafe, qauselec, qcement, qgas.

13a.Do the data need transforming? If so, find a suitable transformation.

str(qauselec)
##  Time-Series [1:218] from 1956 to 2010: 3.92 4.44 4.81 4.42 4.34 ...
autoplot(qauselec)

bc.qauselec = BoxCox.lambda(qauselec)

13b.Are the data stationary? If not, find an appropriate differencing which yields stationary data.

nsdiffs(qauselec)
## [1] 1
ndiffs(qauselec)
## [1] 1
#one seasonal difference and one first difference are indicated; check the seasonally differenced data with a KPSS test
kpss.test(diff(qauselec, lag = 4))
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(qauselec, lag = 4)
## KPSS Level = 0.39382, Truncation lag parameter = 4, p-value = 0.07982
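Because the KPSS p-value (0.08) is only marginal, the doubly differenced series can be tested as well (a sketch using kpss.test from tseries, loaded above):

kpss.test(diff(diff(qauselec, lag = 4)))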

13c.Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values?

ggtsdisplay(diff(qauselec))

q.ar = auto.arima(qauselec)
summary(q.ar)
## Series: qauselec 
## ARIMA(1,0,1)(1,1,2)[4] with drift 
## 
## Coefficients:
##          ar1      ma1    sar1     sma1    sma2   drift
##       0.9158  -0.4411  0.8628  -1.6921  0.7737  0.2299
## s.e.  0.0467   0.0942  0.1420   0.1425  0.0954  0.0456
## 
## sigma^2 estimated as 0.5082:  log likelihood=-230.58
## AIC=475.16   AICc=475.71   BIC=498.72
## 
## Training set error measures:
##                      ME      RMSE       MAE        MPE     MAPE      MASE
## Training set 0.02390707 0.6963191 0.4466824 -0.2166928 1.802291 0.3957901
##                    ACF1
## Training set 0.01152729
checkresiduals(q.ar)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,1)(1,1,2)[4] with drift
## Q* = 14.822, df = 3, p-value = 0.001975
## 
## Model df: 6.   Total lags used: 9
#auto.arima chose ARIMA(1,0,1)(1,1,2)[4] with drift, but its residuals fail the Ljung-Box test (p = 0.002)
q.ar2 = Arima(
  qauselec, lambda = bc.qauselec,
  order = c(0, 1, 1), seasonal = c(0, 1, 1)
)
summary(q.ar2)
## Series: qauselec 
## ARIMA(0,1,1)(0,1,1)[4] 
## Box Cox transformation: lambda= 0.5195274 
## 
## Coefficients:
##           ma1     sma1
##       -0.4976  -0.6910
## s.e.   0.0772   0.0418
## 
## sigma^2 estimated as 0.01435:  log likelihood=149.29
## AIC=-292.59   AICc=-292.47   BIC=-282.5
## 
## Training set error measures:
##                       ME      RMSE      MAE         MPE     MAPE      MASE
## Training set -0.02793129 0.7303451 0.462043 -0.02475615 1.510161 0.4094006
##                    ACF1
## Training set 0.02546758
checkresiduals(q.ar2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)(0,1,1)[4]
## Q* = 15.102, df = 6, p-value = 0.01948
## 
## Model df: 2.   Total lags used: 8
#based on the reported AICs, q.ar2 appears better than q.ar, although the AICs are not strictly comparable because q.ar2 was fit to Box-Cox transformed data

13d.Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.

#No, the residuals do not resemble white noise (Ljung-Box p = 0.019 for q.ar2), so more models are tried below.

checkresiduals(q.ar)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,1)(1,1,2)[4] with drift
## Q* = 14.822, df = 3, p-value = 0.001975
## 
## Model df: 6.   Total lags used: 9
#The q.ar residuals are also not white noise (p = 0.002), so two more specifications are tried.
q.ar3 = Arima(
  qauselec, lambda = bc.qauselec,
  order = c(0, 0, 1), seasonal = c(2, 1, 0))
summary(q.ar3)
## Series: qauselec 
## ARIMA(0,0,1)(2,1,0)[4] 
## Box Cox transformation: lambda= 0.5195274 
## 
## Coefficients:
##          ma1    sar1    sar2
##       0.3484  0.2460  0.4663
## s.e.  0.0604  0.0665  0.0707
## 
## sigma^2 estimated as 0.02878:  log likelihood=75.96
## AIC=-143.93   AICc=-143.74   BIC=-130.46
## 
## Training set error measures:
##                     ME     RMSE       MAE      MPE     MAPE      MASE
## Training set 0.1823278 1.018533 0.6302581 1.022655 2.197158 0.5584503
##                     ACF1
## Training set -0.05061457
checkresiduals(q.ar3)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,1)(2,1,0)[4]
## Q* = 59.611, df = 5, p-value = 1.463e-11
## 
## Model df: 3.   Total lags used: 8
q.ar4 = Arima(
  qauselec, lambda = bc.qauselec,
  order = c(5, 0, 1), seasonal = c(1, 1, 2))
summary(q.ar4)
## Series: qauselec 
## ARIMA(5,0,1)(1,1,2)[4] 
## Box Cox transformation: lambda= 0.5195274 
## 
## Coefficients:
##           ar1     ar2     ar3     ar4     ar5     ma1    sar1     sma1    sma2
##       -0.2878  0.6709  0.2138  0.1418  0.1885  0.8560  0.9131  -1.7098  0.7806
## s.e.   0.1506  0.1105  0.0879  0.0921  0.0712  0.1427  0.0983   0.0954  0.0704
## 
## sigma^2 estimated as 0.01339:  log likelihood=159
## AIC=-298   AICc=-296.92   BIC=-264.34
## 
## Training set error measures:
##                      ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.02457217 0.6959356 0.4342435 0.2356301 1.423507 0.3847684
##                     ACF1
## Training set -0.03257386
checkresiduals(q.ar4)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(5,0,1)(1,1,2)[4]
## Q* = 10.927, df = 3, p-value = 0.01213
## 
## Model df: 9.   Total lags used: 12
#q.ar4 was best in this case: it has the lowest AICc of the Box-Cox models (-296.92) and its residuals come closest to white noise (p = 0.012)

13e.Forecast the next 24 months of data using your preferred model.

f.qauselec = forecast(q.ar4, h = 24)
autoplot(f.qauselec)

13f.Compare the forecasts obtained using ets().

f.qauselec.ets = forecast(ets(qauselec), h = 24)
autoplot(f.qauselec.ets)
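A quick numeric comparison of the two approaches (a sketch; accuracy() on a forecast object reports its training-set error measures):

accuracy(f.qauselec)
accuracy(f.qauselec.ets)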

14.For the same time series you used in the previous exercise, try using a non-seasonal model applied to the seasonally adjusted data obtained from STL. The stlf() function will make the calculations easy (with method=“arima”). Compare the forecasts with those obtained in the previous exercise. Which do you think is the best approach?

#stlf
f.q.stlf = stlf(qauselec, h = 24, lambda = BoxCox.lambda(qauselec), s.window = 5, robust = TRUE, method = "arima")
autoplot(f.q.stlf)

#ets
autoplot(f.qauselec.ets)

#arima
autoplot(f.qauselec)

checkresiduals(f.q.stlf)
## Warning in checkresiduals(f.q.stlf): The fitted degrees of freedom is based on
## the model used for the seasonally adjusted data.

## 
##  Ljung-Box test
## 
## data:  Residuals from STL +  ARIMA(0,1,1) with drift
## Q* = 20.232, df = 6, p-value = 0.002518
## 
## Model df: 2.   Total lags used: 8
checkresiduals(f.qauselec.ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,A,M)
## Q* = 14.968, df = 3, p-value = 0.001845
## 
## Model df: 8.   Total lags used: 11
checkresiduals(f.qauselec)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(5,0,1)(1,1,2)[4]
## Q* = 10.927, df = 3, p-value = 0.01213
## 
## Model df: 9.   Total lags used: 12
#None of the three residual checks passes the Ljung-Box test, but I think the ets model may be the best approach.

15.For your retail time series (Exercise 5 above):

15a.develop an appropriate seasonal ARIMA model;

retail.ar = forecast(auto.arima(retail.ts))

15b.compare the forecasts with those you obtained in earlier chapters;

#chapter 7
retail.hw = hw(retail.ts, seasonal = "multiplicative")
autoplot(retail.hw)

retail.hw.damped = hw(retail.ts, seasonal = "multiplicative", damped = TRUE)
autoplot(retail.hw.damped)

retail.sn = snaive(retail.ts)
autoplot(retail.sn)

accuracy(retail.ar)
##                     ME     RMSE      MAE          MPE     MAPE      MASE
## Training set 0.3317156 6.834438 4.561364 -0.002734279 3.776002 0.4869783
##                     ACF1
## Training set -0.00360398
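For a like-for-like comparison with the earlier chapters' models (a sketch; all of the objects above are forecast objects, so the same call applies):

accuracy(retail.hw)
accuracy(retail.hw.damped)
accuracy(retail.sn)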

15c.Obtain up-to-date retail data from the ABS website (Cat 8501.0, Table 11), and compare your forecasts with the actual numbers. How good were the forecasts from the various models?

library(readxl)
retail2 <- read_excel("C:/Users/burtkb/Downloads/8501011.xlsx", sheet = "Data1", skip = 1)
## New names:
## * `50.4` -> `50.4...14`
## * `` -> ...45
## * `` -> ...46
## * `18.399999999999999` -> `18.399999999999999...56`
## * `11.3` -> `11.3...67`
## * ...
retail2a = ts(retail2[,"A3349873A"], frequency=12, start=c(1982,4))  # assumes the updated file exposes the series-ID column; adjust the skip value above if the IDs sit lower in the sheet

16.Consider sheep, the sheep population of England and Wales from 1867–1939.

16a.Produce a time plot of the time series.

autoplot(sheep)

16c.By examining the ACF and PACF of the differenced data, explain why this model is appropriate.

ggtsdisplay(diff(sheep))

#The ACF shows slowly decaying autocorrelation, while the PACF has significant spikes only at lags 1 and 3; an AR(3) on the differenced series, i.e. ARIMA(3,1,0), is therefore appropriate.

16d.The last five values of the series are given below:

# ARIMA(3,1,0) recursion: each forecast is the last value plus an AR(3) in the differences
sheep1 = 1797 + 0.42*(1797 - 1791) - 0.20*(1791 - 1627) - 0.30*(1627 - 1665)
sheep2 = sheep1 + 0.42*(sheep1 - 1797) - 0.20*(1797 - 1791) - 0.30*(1791 - 1627)
sheep3 = sheep2 + 0.42*(sheep2 - sheep1) - 0.20*(sheep1 - 1797) - 0.30*(1797 - 1791)
c(sheep1, sheep2, sheep3)
## [1] 1778.120 1719.790 1697.268

16e.Now fit the model in R and obtain the forecasts using forecast. How are they different from yours? Why?

f.sheep = forecast(Arima(sheep, order = c(3, 1, 0)))
autoplot(f.sheep)

f.sheep$mean
## Time Series:
## Start = 1940 
## End = 1949 
## Frequency = 1 
##  [1] 1777.996 1718.869 1695.985 1704.067 1730.084 1746.371 1745.518 1733.953
##  [9] 1724.299 1722.829
#they are very close, but mine are slightly higher; the small differences arise because R estimates the coefficients rather than using the rounded values 0.42, -0.20 and -0.30

17.The annual bituminous coal production in the United States from 1920 to 1968 is in data set bicoal.

17a.Produce a time plot of the data.

autoplot(bicoal)

17c.Explain why this model was chosen using the ACF and PACF.

ggAcf(bicoal, lag.max = 36)

ggPacf(bicoal, lag.max = 36)

#The ACF shows decaying autocorrelation, while the PACF has significant spikes only at lags 1 and 4, which supports an AR(4) model, i.e. ARIMA(4,0,0).

17d.The last five values of the series are given below.

# AR(4) recursion with the given coefficients; the constant is named `const`
# to avoid masking base::c
const = 162
p1 = 0.83
p2 = -0.34
p3 = 0.55
p4 = -0.38
b1 = const + p1*545 + p2*552 + p3*534 + p4*512
b2 = const + p1*b1 + p2*545 + p3*552 + p4*534
b3 = const + p1*b2 + p2*b1 + p3*545 + p4*552
c(b1, b2, b3)
## [1] 525.8100 513.8023 499.6705

17e.Now fit the model in R and obtain the forecasts from the same model. How are they different from yours? Why?

f.bi = forecast(Arima(bicoal, order = c(4, 0, 0)))
autoplot(f.bi)

f.bi$mean
## Time Series:
## Start = 1969 
## End = 1978 
## Frequency = 1 
##  [1] 527.6291 517.1923 503.8051 489.2909 482.6041 478.5776 474.5657 474.4001
##  [9] 475.9463 476.5973
#they are very close, but mine are slightly lower; again the differences come from using rounded coefficients

18.Before doing this exercise, you will need to install the Quandl package in R using install.packages("Quandl").

18a.Select a time series from Quandl. Then copy its short URL and import the data using Quandl() with type="ts", as in the call below.

library(Quandl)
## Warning: package 'Quandl' was built under R version 4.0.3
## Loading required package: xts
## Warning: package 'xts' was built under R version 4.0.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.0.2
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
ngd = Quandl("EIA/AEO_2016_REF_NO_CPP_PRCE_NA_COMM_NA_NG_NA_SATL_Y13DLRPMCF_A", api_key="YOUR_API_KEY", type="ts")  # substitute your own Quandl API key
str(ngd)
##  Time-Series [1:27] from 2014 to 2040: 10.2 9.52 8.85 9.27 9.91 ...
head(ngd)
## Time Series:
## Start = 2014 
## End = 2019 
## Frequency = 1 
## [1] 10.202365  9.518894  8.852833  9.270261  9.914866 10.537735

18b.Plot graphs of the data, and try to identify an appropriate ARIMA model.

autoplot(ngd)

ngd %>% ndiffs()
## [1] 1
ggtsdisplay(diff(ngd))

ngd.ar = auto.arima(ngd)
summary(ngd.ar)
## Series: ngd 
## ARIMA(0,1,2) 
## 
## Coefficients:
##          ma1     ma2
##       0.8682  0.8879
## s.e.  0.1313  0.2520
## 
## sigma^2 estimated as 0.04139:  log likelihood=3.92
## AIC=-1.83   AICc=-0.74   BIC=1.94
## 
## Training set error measures:
##                     ME      RMSE      MAE       MPE     MAPE      MASE
## Training set 0.0239043 0.1918064 0.129958 0.2266583 1.239587 0.7440255
##                     ACF1
## Training set -0.04993646
#ARIMA(0,1,2)

18c.Do residual diagnostic checking of your ARIMA model. Are the residuals white noise?

checkresiduals(ngd.ar)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,2)
## Q* = 7.0725, df = 3, p-value = 0.06962
## 
## Model df: 2.   Total lags used: 5
#the residuals are borderline white noise (p = 0.070); I will try another model

ngd110 = Arima(ngd, order = c(1, 1, 0))
checkresiduals(ngd110)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,0)
## Q* = 7.6448, df = 4, p-value = 0.1055
## 
## Model df: 1.   Total lags used: 5
#the residuals resemble white noise (p = 0.106)

18d.Use your chosen ARIMA model to forecast the next four years.

f.ngd110 = forecast(ngd110, h = 4)
autoplot(f.ngd110)

18e.Now try to identify an appropriate ETS model.

ngd.ets = ets(ngd)
summary(ngd.ets)
## ETS(A,N,N) 
## 
## Call:
##  ets(y = ngd) 
## 
##   Smoothing parameters:
##     alpha = 0.9999 
## 
##   Initial states:
##     l = 10.2023 
## 
##   sigma:  0.2964
## 
##      AIC     AICc      BIC 
## 27.24963 28.29311 31.13714 
## 
## Training set error measures:
##                      ME     RMSE       MAE       MPE     MAPE      MASE
## Training set 0.05862132 0.285245 0.1682127 0.4917003 1.663627 0.9630386
##                   ACF1
## Training set 0.5005758
autoplot(ngd.ets)

18f.Do residual diagnostic checking of your ETS model. Are the residuals white noise?

checkresiduals(ngd.ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,N)
## Q* = 14.02, df = 3, p-value = 0.002878
## 
## Model df: 2.   Total lags used: 5
#not white noise: the Ljung-Box test rejects (p = 0.003)

18g.Use your chosen ETS model to forecast the next four years.

f.ngd.ets = forecast(ngd.ets, h = 4)
autoplot(f.ngd.ets)

18h.Which of the two models do you prefer?

I prefer the ARIMA model: its residuals resemble white noise, while the ETS residuals do not (p = 0.003).