library(fpp2)
## ── Attaching packages ──────────────────────────────────────── fpp2 2.4 ──
## ✓ ggplot2   3.3.2     ✓ fma       2.4
## ✓ forecast  8.13      ✓ expsmooth 2.3
library(ggplot2)  # already attached by fpp2, loaded explicitly here for clarity
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
usnetelec
usgdp
mcopper
enplanements
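Before working through the series one at a time, all four lambda values can be computed at once (a quick sketch of my own; the per-series plots follow below):
# Sketch: Guerrero-selected lambda for each of the four series at once.
sapply(list(usnetelec = usnetelec, usgdp = usgdp,
            mcopper = mcopper, enplanements = enplanements),
       BoxCox.lambda)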
plot(usnetelec)
lambda <- BoxCox.lambda(usnetelec)  # lambda selected by Guerrero's method (the default)
lambda
## [1] 0.5167714
autoplot(BoxCox(usnetelec, lambda))
plot(usgdp)
lambda2 <- BoxCox.lambda(usgdp)
lambda2
## [1] 0.366352
autoplot(BoxCox(usgdp, lambda2))
plot(mcopper)
lambda3 <- BoxCox.lambda(mcopper)
lambda3
## [1] 0.1919047
autoplot(BoxCox(mcopper, lambda3))
plot(enplanements)
lambda4 <- BoxCox.lambda(enplanements)
lambda4
## [1] -0.2269461
autoplot(BoxCox(enplanements, lambda4))
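Whichever lambda is used, forecasts produced on the transformed scale must eventually be back-transformed; the forecast package provides InvBoxCox() for this. A quick sanity check (my own sketch; `transformed` is just a scratch variable):
# Sketch: InvBoxCox() undoes BoxCox() for the same lambda.
transformed <- BoxCox(enplanements, lambda4)
all.equal(InvBoxCox(transformed, lambda4), enplanements)  # expect TRUE (up to tolerance)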
Why is a Box-Cox transformation unhelpful for the cangas data?
plot(cangas)
lambda5 <- BoxCox.lambda(cangas)
lambda5
## [1] 0.5767759
autoplot(BoxCox(cangas, lambda5))
Comparing the plots before and after applying the Box-Cox transformation, we do not see an obvious change. The variation in cangas is widest from roughly 1978 to 1993 and then narrows again, so it does not increase or decrease uniformly with the level of the series. Because a Box-Cox transformation can only rescale variation monotonically with the level, it cannot stabilise variance that behaves in this non-monotonic way.
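One way to see this non-monotonic behaviour directly is to plot the year-by-year spread of the series (a rough sketch of my own):
# Sketch: within-year standard deviation of cangas. If the reasoning above
# is right, the spread rises towards the late 1970s, stays wide until the
# early 1990s, and then narrows, rather than growing steadily with the level.
autoplot(aggregate(cangas, nfrequency = 1, FUN = sd))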
What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?
I will select column A3349874C from the retail data.
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[, "A3349874C"],
           frequency = 12, start = c(1982, 4))
autoplot(myts)
lambda_myts <- BoxCox.lambda(myts)
lambda_myts
## [1] -0.01469377
autoplot(BoxCox(myts, lambda_myts))
The plot shows an upward trend and clear seasonality. The selected lambda, -0.01469377, is very close to zero, so the transformation is essentially a log transformation. The Box-Cox transformation smooths the drastic changes after 2010 and makes the seasonal variation look more uniform across the series.
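Because the selected lambda is so close to zero, a plain log transformation should give an almost identical picture (a quick sketch):
# Sketch: with lambda ~ 0, BoxCox() is essentially log(), so this plot
# should be nearly indistinguishable from the one above.
autoplot(log(myts))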
For your retail time series (from Exercise 3 in Section 2.10):
Split the data into two parts using
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
Check that your data have been split appropriately by producing the following plot.
autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")
Calculate forecasts using snaive applied to myts.train.
fc <- snaive(myts.train)
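Before computing accuracy, it can help to plot the forecasts against the held-out data (a quick sketch):
# Sketch: seasonal naive forecasts (default horizon of two seasonal cycles)
# overlaid with the actual test-period observations.
autoplot(fc) + autolayer(myts.test, series="Test data")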
Compare the accuracy of your forecasts against the actual values stored in myts.test.
accuracy(fc,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 6.122823 12.45603 8.666967 5.906047 7.848706 1.000000
## Test set -31.379167 34.36303 31.379167 -18.882634 18.882634 3.620548
## ACF1 Theil's U
## Training set 0.5367142 NA
## Test set 0.1146989 0.8221791
Check the residuals.
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 301.98, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Do the residuals appear to be uncorrelated and normally distributed?
In the autocorrelation plot, lags 1 through 9 are all above the blue dashed significance bounds, indicating that the residuals are positively correlated. The histogram confirms that the mean of the residuals is greater than zero; their distribution is roughly normal apart from a few outliers. The Ljung-Box test gives a p-value below 2.2e-16, so we conclude that the residuals are distinguishable from a white noise series.
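The same conclusions can be reached by hand from the residuals themselves (a sketch; `res` is a scratch variable of mine):
# Sketch: extract the residuals, check their mean, and rerun the
# Ljung-Box test directly; it should match checkresiduals(fc) above.
res <- residuals(fc)
mean(res, na.rm = TRUE)                      # noticeably above zero
Box.test(res, lag = 24, type = "Ljung-Box")  # tiny p-value: not white noise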
How sensitive are the accuracy measures to the training/test split?
accuracy(fc,myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 6.122823 12.45603 8.666967 5.906047 7.848706 1.000000
## Test set -31.379167 34.36303 31.379167 -18.882634 18.882634 3.620548
## ACF1 Theil's U
## Training set 0.5367142 NA
## Test set 0.1146989 0.8221791
The accuracy measures are quite sensitive to the training/test split. Every test-set measure is far worse than its training-set counterpart: the MASE rises from 1.00 on the training set to 3.62 on the test set, and the ME and MPE change sign because the seasonal naive forecasts consistently overshoot the post-2010 values, making all of the test-set residuals negative. The mean error shows the largest swing for the same reason.
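Rather than re-running the same split, the sensitivity can be checked directly by moving the split point and recomputing the measures (a sketch; the 2008/2009 cut-off is an arbitrary choice of mine):
# Sketch: an earlier, arbitrary split to see how much the accuracy
# measures move when the training/test boundary changes.
myts.train2 <- window(myts, end = c(2008, 12))
myts.test2  <- window(myts, start = 2009)
fc2 <- snaive(myts.train2, h = length(myts.test2))
accuracy(fc2, myts.test2)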